How The Crew was ported to PlayStation 4
And why PC is so important for the next-gen launch.
Ubisoft Reflections rounded off day two of last week's Develop conference with an intriguing talk, tantalisingly entitled "Tips and Tricks for Porting to Next-Gen". For Digital Foundry, it was a must-see presentation primarily because the vast majority - and perhaps even all - of the multi-platform games we'll be playing on Xbox One and PlayStation 4 by the end of the year have been derived from PC code, necessitating some level of porting across to the new hardware.
It's an intriguing state of affairs, especially if you're a PC gamer with a reasonably powerful computer. Once upon a time your hardware was a porting target, sometimes with only minimal effort put into the conversion. Now, PC is lead platform. E3 2013 starkly demonstrated that the availability of final production console hardware for developers is exceptionally limited, with many games destined for console hardware running on "target" PC systems. It makes sense that PC takes centre-stage during the development effort, simply because games take upwards of two years to develop and actual console hardware wasn't available until very recently.
Ubisoft's highly promising next-gen racer, The Crew, received its E3 debut last month, with the company's first gameplay demo running on PC hardware. It's a new project from ex-Test Drive Unlimited staff who've formed a new studio - Ivory Tower - and are producing the core PC version of the game (and, we suspect, the Xbox One version). However, what's curious here is that it's the UK-based Ubisoft Reflections tech team that is entirely responsible for the PS4 edition, while other staff in the Newcastle studio produce additional content for the game - specifically sound, script, the skill challenges and, remarkably, the entire state of Texas.
For the PS4 staff, the task facing them looked rather onerous, with the developer taking on a massive codebase generated by an entirely separate studio, the initial aim simply to compile it on the new Sony hardware and try to get some kind of image on-screen.
"We started off with a large codebase - there were about 12,000 source files. And we started with a 64-bit Windows version of the engine using D3D11," says Reflections' expert programmer (yes, that is an actual job title), Dr. Chris Jenner.
"It's important to start with a 64-bit version because obviously the [PS4] hardware is 64-bit so it's nice to get those 32-bit/64-bit issues out of the way before you start worrying about the platform specifics. The initial aim of our work was to get the PS4 version to feature-parity with the Windows version."
Sony has made a big deal about the accessibility of the PS4 hardware, and a key element of that would be the quality of the toolchain - the series of programs used to create compiled code. For the PS4 developers, the use of the established Visual Studio environment proves to be a key benefit, and the extent to which Sony has acknowledged and supported cross-platform game-makers is self-evident. There are even options within Sony's compiler specifically added in order to increase compatibility with the Microsoft counterpart used in compiling DirectX 11 games.
"One thing that definitely helped getting the game to work was that the engine uses quite a lot of middleware. Middleware supporters have been very active on PS4, so there are versions of all the middleware we wanted available," Jenner continues.
"It takes a bit of work and a bit of time to integrate as SDKs change to get new versions of the middleware you're after, so that can feel like a full-time job at times, but as the platform settles down and the SDK changes become less significant getting closer to launch that becomes less of an issue."
More crucial is how the 8GB of RAM in the PlayStation 4 is utilised. This unified pool is a significant advantage over platforms like PC and PS3, where CPU and graphics RAM take the form of two entirely separate pools of memory. The PS4 operates a system where memory is allocated either to the CPU or GPU, using two separate memory buses.
"One's called the Onion, one's called the Garlic bus. Onion is mapped through the CPU caches... This allows the CPU to have good access to memory," explains Jenner.
"Garlic bypasses the CPU caches and has very high bandwidth suitable for graphics programming, which goes straight to the GPU. It's important to think about how you're allocating your memory based on what you're going to put in there."
Jenner wouldn't go into details on the levels of bandwidth available for each bus owing to confidentiality agreements, but based on our information the GPU has full access to the 176GB/s bandwidth of the PS4's GDDR5 via Garlic, while the Onion gets by with a significantly lower amount, somewhere in the 20GB/s region (this ExtremeTech analysis of the PS4 APU is a good read). Whatever the precise figure is for the more constrained CPU area, Jenner would only confirm that it's "enough". Optimising the PS4 version of The Crew once the team did manage to get the code compiling required some serious work in deciding what data would be the best fit for each area of memory.
"The first performance problem we had was not allocating memory correctly... So the Onion bus is very good for system stuff and can be accessed by the CPU. The Garlic is very good for rendering resources and can get a lot of data into the GPU," Jenner reveals.
"One issue we had was that we had some of our shaders allocated in Garlic but the constant writing code actually had to read something from the shaders to understand what it was meant to be writing - and because that was in Garlic memory, that was a very slow read because it's not going through the CPU caches. That was one issue we had to sort out early on, making sure that everything is split into the correct memory regions otherwise that can really slow you down."
So elements like main system heap (containing the main store of game variables), key shader data, and render targets that need to be read by the CPU are allocated to Onion memory, while more GPU-focused elements like vertex and texture data, shader code and the majority of the render targets are kept in the ultra-wide Garlic memory.
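To make that split concrete, here is a minimal sketch of the placement rule in Python. It is purely illustrative: the category names and the `choose_bus` helper are our own invention, not part of Sony's SDK or Reflections' engine, and a real allocator would deal in actual memory mappings rather than labels.

```python
from enum import Enum


class Bus(Enum):
    ONION = "onion"    # mapped through the CPU caches, good for CPU access
    GARLIC = "garlic"  # bypasses the CPU caches, high bandwidth to the GPU


# Hypothetical resource categories, named after the examples in the article.
CPU_FOCUSED = {"system_heap", "key_shader_data"}
GPU_FOCUSED = {"vertex_data", "texture_data", "shader_code", "render_target"}


def choose_bus(kind: str, cpu_reads: bool = False) -> Bus:
    """Pick a memory region for a resource.

    Anything the CPU has to read goes to Onion, even if it is nominally
    a GPU resource, because CPU reads from Garlic skip the caches.
    """
    if kind in CPU_FOCUSED or cpu_reads:
        return Bus.ONION
    if kind in GPU_FOCUSED:
        return Bus.GARLIC
    raise ValueError(f"unknown resource kind: {kind}")
```

Under this rule a texture lands in Garlic, while a render target flagged with `cpu_reads=True` is pushed to Onion, which mirrors the shader read-back problem Jenner describes.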
A more crucial issue is that, while the PS4 toolchain is designed to be familiar to those working on PC, the new Sony hardware doesn't use the DirectX API, so Sony has supplied two of its own.
"The graphics APIs are brand new - they don't have any legacy baggage, so they're quite clean, well thought-out and match the hardware really well," says Reflections' expert programmer Simon O'Connor.
"At the lowest level there's an API called GNM. That gives you nearly full control of the GPU. It gives you a lot of potential power and flexibility on how you program things. Driving the GPU at that level means more work."
Sony has talked about its lower-level API at GDC, but wouldn't disclose its name, so at least now we know what it's called (the PS3 equivalent is GCM, for what it's worth) but what about the "wrapper" code supplied by Sony that is supposed to make development simpler?
"Most people start with the GNMX API which wraps around GNM and manages the more esoteric GPU details in a way that's a lot more familiar if you're used to platforms like D3D11. We started with the high-level one but eventually we moved to the low-level API because it suits our uses a little better," says O'Connor, explaining that while GNMX is a lot simpler to work with, it removes much of the custom access to the PS4 GPU, and also incurs a significant CPU hit.
A lot of work was put into the move to the lower-level GNM, and in the process the tech team found out just how much work DirectX does in the background in terms of memory allocation and resource management. Moving to GNM meant that the developers had to take on the burden there themselves, as O'Connor explains:
"The Crew uses a subset of the D3D11 feature-set, so that subset is for the most part easily portable to the PS4 API. But the PS4 is a console not a PC, so a lot of things that are done for you by D3D on PC - you have to do that yourself. It means there's more DIY to do but it gives you a hell of a lot more control over what you can do with the system."
Another key area of the game is its programmable pixel shaders. Reflections' experience suggests that the PlayStation Shader Language (PSSL) is very similar indeed to the HLSL standard in DirectX 11, with just subtle differences that were eliminated for the most part through pre-process macros and what O'Connor calls a "regex search and replace" for more complicated differences.
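For a flavour of what a "regex search and replace" pass over shader source might look like, here is a small Python sketch. The talk did not detail the actual HLSL-to-PSSL differences, so the right-hand names in the rules below are placeholders made up for illustration and are not real PSSL keywords; only the approach is the point.

```python
import re

# Placeholder rewrite rules. The replacement names are invented for this
# sketch and are NOT real PSSL keywords.
RULES = [
    (re.compile(r"\bcbuffer\s+(\w+)"), r"TARGET_CBUFFER \1"),
    (re.compile(r"\bSV_(\w+)"), r"TARGET_\1"),
]


def port_shader(source: str) -> str:
    """Apply each rewrite rule in order to the shader source text."""
    for pattern, replacement in RULES:
        source = pattern.sub(replacement, source)
    return source


hlsl = (
    "cbuffer PerFrame { float4 sun; };\n"
    "float4 main() : SV_Target { return sun; }"
)
ported = port_shader(hlsl)
```

Keeping the rules in a single table means the same script can be re-run over the whole shader library whenever the PC source changes.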
At the Ubisoft E3 event, the PC version of The Crew was running at 30 frames per second, but the first working compilation of the PS4 codebase wasn't quite so hot, operating at around 10fps.
"The PS4 SDK comes with a nice CPU profiling tool which we used very early on which has been very useful for us in finding out where the high-level bottlenecks were in our code," says Chris Jenner, referring to a Sony tool known as Razor.
"Our game is architected to have two main CPU threads, one of which is running the simulation, the other of which is drawing the scene and they run in parallel. Both of those threads can then fork out to extra processors to really run lots of work in parallel."
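The two-thread layout Jenner describes can be sketched in a few lines of Python. This is a toy model under our own assumptions (a one-slot hand-off queue and a made-up `car_x` value standing in for game state), not a description of how The Crew's engine passes data between threads.

```python
import queue
import threading
from typing import List


def run_frames(frame_count: int) -> List[str]:
    """Run a simulation thread and a render thread side by side."""
    handoff: queue.Queue = queue.Queue(maxsize=1)  # one frame in flight
    drawn: List[str] = []

    def simulate() -> None:
        for frame in range(frame_count):
            # Hypothetical game state: a car moving along the x axis.
            handoff.put({"frame": frame, "car_x": frame * 1.5})
        handoff.put(None)  # sentinel: no more frames

    def render() -> None:
        while (state := handoff.get()) is not None:
            drawn.append(f"frame {state['frame']}: car at x={state['car_x']}")

    threads = [threading.Thread(target=simulate), threading.Thread(target=render)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return drawn
```

With a one-slot queue the simulation can work on the next frame while the renderer draws the current one, which is the overlap that makes a slow render thread so visible in a profiler.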
Perhaps not surprisingly, it was the render thread that proved to be the bottleneck, particularly in terms of setting up the programmable pixel shaders - the "constants" being the main issue. Constants are the data supplied to the shader that aren't vertices or textures - elements like the position of the object, the colour of sunlight or the exact position of bones in a skeletally animated object. A shader needs anything from dozens to hundreds of these constants, and factoring in the amount of shader work in a modern game, it can present a significant bottleneck.
"We had a couple of solutions to fix this, one of which was to reduce the time spent setting constants in the render thread and the other one was to load-balance across the different cores by multi-threading our command buffer generation," says Jenner, also revealing that this is a whole lot easier than it was on PS3 owing to the fact that all CPU cores have access to main memory.
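The load-balancing idea, splitting command buffer generation across cores, looks roughly like this in outline. The sketch is in Python for readability, so it shows the structure rather than a real speed-up (CPython threads will not accelerate pure-Python work), and the command tuples are hypothetical stand-ins for real GPU commands.

```python
from concurrent.futures import ThreadPoolExecutor


def build_commands(draws):
    """Build a command list for one slice of the scene."""
    commands = []
    for mesh, constants in draws:
        commands.append(("set_constants", constants))
        commands.append(("draw", mesh))
    return commands


def build_frame(draws, workers=4):
    """Split the draw list into slices and build each slice on a worker."""
    chunk = max(1, -(-len(draws) // workers))  # ceiling division
    slices = [draws[i:i + chunk] for i in range(0, len(draws), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so draw order is kept.
        per_slice = list(pool.map(build_commands, slices))
    return [command for commands in per_slice for command in commands]
```

Each worker writes to its own list, so no locking is needed until the per-slice results are stitched back together in order.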
"The other thing we did is to look at constant setting. GNMX - which is Sony's graphics engine - has a component called the Constant Update Engine which handles setting all the constants that need to go to the GPU. That was slower than we would have liked. It was taking up a lot of CPU time. Now Sony has actually improved this, so in later releases of the SDK there is a faster version of the CUE, but we decided we'd handle this ourselves because we have a lot [more] knowledge about how our engine accesses data and when things need to be updated than the more general-purpose implementation... So we can actually make this faster than the version we had at the time."
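Jenner didn't say how Reflections' replacement works, but one common way to exploit knowledge of "when things need to be updated" is to track which constants have actually changed and send only those. The Python sketch below assumes that approach; the `ConstantCache` class and its upload counter are hypothetical and do not describe Sony's CUE or Reflections' code.

```python
class ConstantCache:
    """Tracks shader constants and uploads only the ones that changed."""

    def __init__(self):
        self._values = {}
        self._dirty = set()
        self.uploads = 0  # counts simulated writes to the GPU

    def set(self, name, value):
        """Record a constant, marking it dirty only if the value differs."""
        if name not in self._values or self._values[name] != value:
            self._values[name] = value
            self._dirty.add(name)

    def flush(self):
        """Return the constants that must be sent before the next draw."""
        pending = {name: self._values[name] for name in sorted(self._dirty)}
        self.uploads += len(pending)
        self._dirty.clear()
        return pending
```

A value like the colour of sunlight is set every frame but rarely changes, so after the first flush it costs nothing, whereas an object's position is re-sent whenever it moves.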
In general, from a performance perspective, it seems that Sony's SDK is just about where it needs to be right now, in contrast to the Microsoft equivalent, where techs are still working on very significant improvements that will drive improved GPU throughput. We asked the Reflections team if they expect their optimisation efforts to be aided by revised, improved versions of the Sony development environment. In essence, is the GPU "driver" still being optimised?
"The SDK is changing all the time, [but] it's changing less quickly than it was six months ago," Chris Jenner says.
"We're getting near to the final state, we're not expecting huge performance changes, just finalisation of features. It's a lot more stable than it was early on. We haven't had to do any changes for a while."
With the basic porting complete, the Ubisoft Reflections team is now ramping up its staff in order to complete the PS4 game ready for the Q1 2014 release, but the core engineering effort in moving The Crew across to PlayStation 4 was accomplished in six months with a team of just two to three people working on it. Overall, Reflections felt that the process of porting over the PC codebase was fairly simple and straightforward.
What we didn't find out is how the Xbox One version is faring, or who is producing it. Our bet is on the Ivory Tower studio producing it in tandem with the PC version, owing to the use of the DirectX 11 API on two platforms. But Xbox One and PS4 both have much in common from an architectural standpoint, and questions we have about collaboration between the console teams resulting in optimisations common to both console versions remain unanswered for now.
Simon O'Connor did point out that Reflections expects its work on The Crew to end up being much more than a simple, feature-complete port. This is an opportunity to explore what the new hardware is capable of, and there's a sense that the PlayStation 4's graphics hardware is not being fully exploited.
"The PS4's GPU is very programmable. There's a lot of power in there that we're just not using yet. So what we want to do are some PS4-specific things for our rendering but within reason - it's a cross-platform game so we can't do too much that's PS4-specific," he reveals.
"There are two things we want to look into: asynchronous compute where we can actually run compute jobs in parallel... We [also] have low-level access to the fragment-processing hardware which allows us to do some quite interesting things with anti-aliasing and a few other effects."
The standard porting process at the beginning of the Xbox 360/PS3 era seemed to be a case of targeting a lead platform and then removing features for subsequent ports, or alternatively taking a hit to performance. While multi-platform next-gen titles see the consoles take on target rather than lead platform status, there's clearly the realisation that the new machines are capable of more, and that much is to be gained by exploring platform-specific features. If Reflections can indeed achieve feature parity with the PC version, and then tailor the codebase to suit the strengths of Mark Cerny's "super-charged PC architecture", The Crew should be one to watch out for.