Tech Interview: Trials Fusion
Digital Foundry vs RedLynx in an extensive multi-platform Q&A.
With the arrival of a new Trials game, it's the Digital Foundry tradition to accompany the launch with an in-depth tech interview with RedLynx's tech mastermind, lead graphics programmer Sebastian Aaltonen - aka sebbbi. Whereas previous interviews have concentrated on how RedLynx coaxed stunning effects, physics and performance from Xbox 360 hardware, the topic of conversation shifts somewhat here: Trials Fusion is the studio's first simultaneous release, multi-platform project - and a cross-generation game to boot.
Here we'll learn how RedLynx prototyped the new game, what the new consoles add to the mix and how the company approached the Xbox 360 version of the game. There's also an in-depth conversation on the thorny subject of the Xbox One's 32MB of ESRAM - does optimising a game for the seemingly meagre amount of available scratchpad memory actually hold back developers from getting the most out of PS4 and PC versions of the game? And what's the score with games that run at 720p on Xbox One and 1080p on PlayStation 4? You'll find out here. On top of that, we talk Mantle, DirectX 12, GPU compute and much, much more.
But before we begin, a quick clarification - these deep-dive interviews aren't easy to arrange - and to provide the kind of depth we strive for, they often need to be set up way ahead of time. In the case of Trials Fusion, questions were submitted to RedLynx before the Xbox One 800p to 900p patch was common knowledge, and before we'd seen any of the console versions of the game. However, we did get some hands-on time with the game via the PC beta. You can read our full thoughts on the final game in our recently published four-way Face-Off.
"After spending so many years writing games exclusively on Xbox 360, we were really thrilled about the possibilities of the next-generation hardware. The GPU performance is a huge jump ahead..."
We started the Trials Fusion project in 2012 using high-end PCs. We knew the rough target specs of the next-generation hardware, so we bought PC GPUs that approximated the next-generation GPU performance as closely as possible.
After spending so many years writing games exclusively on Xbox 360, we were really thrilled about the possibilities of the next-generation hardware. The GPU performance is a huge jump ahead, especially when you consider all the efficiency gains provided by compute shaders and other architectural improvements.
Console gaming is now more online oriented than ever, and hardware manufacturers are pushing new online features such as game recording to online video services. Trials has always been a connected experience. In the 'single player mode' you always see your friends' results, you play against their ghosts, watch their replays and compete against them in the online leaderboards. Trials Fusion is our third game to include a user-created content creation and sharing system, and we have expanded the possibilities even further. As a part of Ubisoft we have access to their big server infrastructure and we can fully customise it to match our needs. This brings entirely new possibilities for Track Central - our user-created content sharing platform.
I have good news for you: a guaranteed 60fps is actually now a part of the official Trials brand guidelines!
The locked 60fps goal impacts both our content production and our programming practices, as every single frame needs to finish in a tight 16.6 millisecond budget. For programming this means that we favour algorithms that have good worst-case performance instead of good average performance. Fluctuation of frame cost is the biggest enemy of reaching a locked frame-rate. Data reuse is also very important. Each frame in a 60fps game is highly similar to the previous one generated 16.6 milliseconds ago. This provides many possibilities for data re-use and ultimately better performance and quality. GPU and CPU cycles should be spent on important things instead of calculating the same things over and over again.
For content production, a locked 60fps means tight quality control. We report a bug to our level design team for every single location that dips under 60fps. To help ensure quality, we have implemented automated frame-rate monitoring. Our development build sends frame-rate statistics to our monitoring server from every single checkpoint area of every level in the game on all four platforms. This data includes the min, max, average and median load of all our CPU threads and the GPU. We analyse this data and use it to focus both code and content optimisation efforts during the project.
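As a rough illustration of the kind of per-checkpoint summary described above, here is a minimal Python sketch. It works on frame times rather than per-thread load, and the function name, the sample values and the report fields are assumptions made for illustration, not RedLynx's monitoring code.

```python
from statistics import mean, median

FRAME_BUDGET_MS = 1000.0 / 60.0  # about 16.67 ms per frame at 60fps


def summarise_checkpoint(frame_times_ms):
    """Summarise the frame times (in ms) recorded in one checkpoint area."""
    over_budget = [t for t in frame_times_ms if t > FRAME_BUDGET_MS]
    return {
        "min": min(frame_times_ms),
        "max": max(frame_times_ms),
        "average": mean(frame_times_ms),
        "median": median(frame_times_ms),
        "frames_over_budget": len(over_budget),
        # Any frame over budget means the area dipped under 60fps.
        "needs_bug_report": bool(over_budget),
    }


# Hypothetical samples from a single checkpoint area.
samples = [15.2, 15.9, 16.1, 17.4, 15.8]
report = summarise_checkpoint(samples)
```

Flagging on the worst frame rather than the average matches the stated policy of reporting every location that dips under 60fps.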
"The locked 60fps goal impacts both our content production and our programming practices, as every single frame needs to finish in a tight 16.6ms budget."
It's true that the next-gen consoles and high-end PCs have more than 10x available memory compared to the Xbox 360. However the speed of hard drives in general hasn't improved that much in the last ten years, and now we need to support 1080p and beyond (2560x1600 is starting to be quite common on PC). Higher resolution and a more complex physically based lighting model mean that we need to load approximately four times as much texture data as we did in Trials Evolution.
Our team (still) hates long loading screens with a passion. The goal of Trials Fusion was to keep level loading screens around five seconds long. Longer loading screens were never really considered, as we feel they ruin the game flow. We don't want players to become bored while waiting and eventually stop playing the game.
A short loading time goal and much higher content quality mean that streaming is actually now more important for us than ever. Virtual texturing also frees our artists and level designers completely from texture memory budgets. This helps content production, and is especially important for user created content. We don't want to limit the creativity of our users by limiting the variations of objects they could use in a single level.
In the in-game editor, you can freely roam around the 16 square kilometre game world. It would be impossible to keep the whole world in memory at once. Streaming is, thus, very important for seamless navigation inside the editor, and also inside some user-created games, such as first person shooters or large scale adventure games.
"Next-gen consoles... have more than 10x available memory compared to the Xbox 360. However the speed of hard drives in general hasn't improved that much in the last ten years, and now we need to support 1080p and beyond."
But interestingly, the thing that other developers most frequently ask me about virtual texturing is decal rendering. Virtual texturing allows us to blend all the decals to the texture page cache (and reuse them for hundreds of frames) instead of blending them every frame (at 60fps) to the back buffer. This saves a huge amount of GPU cycles. Because the decals are dirt cheap to render, artists can use a lot of them, allowing for a much more varied look. Developers are getting tired of repeating tiled texture surfaces, and virtual texturing can help them to get the variety they need. As soon as developers understand that they don't need to bake everything to the disc like id Software did in Rage in order to achieve high texture variety, they will become much more interested in virtual texturing.
We recently noticed that the console launch version and the first day update of Trials Fusion had a serious bug regarding data streaming and frame-rate optimisations. Our latest shader optimisations weren't included in these builds because of a build script error. These issues will be fixed in the next patch and finally the players can enjoy the locked 60fps also on Xbox One, and much reduced texture popping on PS4.
The FMX stunt system is completely physics based. Our rider is a powered ragdoll connected to the bike. We control the rider's physics joints by emulating the forces that a real human being would apply if he wanted to change his pose. As our whole world is already physics based, the trick system didn't require any changes to the core physics engine itself.
In Trials Fusion we moved to a physically-based HDR rendering pipeline. This has been a common trend among AAA next-gen console titles. This new pipeline allows artists to create materials that work properly in all lighting environments, no matter where you place them. This is especially important for us, since user created content is an important part of the Trials games.
Both glossy and shiny materials now look much more natural compared to the old lighting model. We also emulate the human eye iris adjusting to changing lighting conditions. Bloom and tone mapping were also completely rewritten to provide the next-gen quality we were looking for.
Our new tone-mapping algorithm separates the luminance and chromaticity of the pixel colour to eliminate the saturation loss problem caused by the commonly used RGB tone-mapping algorithms. This was really important for us, because we didn't want washed out colours in our bright new future world. The difference in colour quality is striking compared to the tone mapping algorithms generally used by last-generation games.
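To make the saturation point concrete, the following Python sketch compares a per-channel curve with a luminance-only curve. The interview does not say which operator RedLynx uses, so the simple Reinhard curve, the Rec. 709 luminance weights, the clamp and the sample colour are all assumptions chosen for illustration.

```python
def luminance(rgb):
    """Rec. 709 luminance of a linear RGB colour (assumed weighting)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def reinhard(x):
    """Simple Reinhard curve, used here only as a stand-in operator."""
    return x / (1.0 + x)


def tonemap_per_channel(rgb):
    """Common RGB approach: compress each channel independently."""
    return tuple(reinhard(c) for c in rgb)


def tonemap_luminance(rgb):
    """Compress luminance only, then rescale RGB so channel ratios survive."""
    y = luminance(rgb)
    if y <= 0.0:
        return (0.0, 0.0, 0.0)
    scale = reinhard(y) / y
    # Very saturated colours can still exceed 1.0, so clamp as a crude fallback.
    return tuple(min(c * scale, 1.0) for c in rgb)


hdr = (2.0, 2.0, 1.0)  # hypothetical bright yellowish HDR pixel
per_channel = tonemap_per_channel(hdr)
lum_based = tonemap_luminance(hdr)
```

For this pixel the luminance-based result keeps the original 2:1 ratio between red and blue, while the per-channel result pulls the channels towards each other, which is the washed-out look the answer refers to.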
We also added high quality multi-resolution horizon based ambient occlusion (MHBAO). This algorithm separates local (high frequency) and distant (low frequency) occluders. The result is highly convincing and natural looking. The ambient occlusion system is fully integrated to our physically-based lighting pipeline and plays a big part in areas that are not directly hit by a light source.
On the Xbox 360 we use our own custom version of FXAA. We apply the filtering only to high contrast areas to eliminate the blurriness often associated with post-process anti-aliasing filters. This quality improvement also increases the algorithm performance, so this choice was a no-brainer really.
On next-generation consoles we also use FXAA at launch, because we had to prioritise resolution and frame-rate over antialiasing quality. FXAA looks surprisingly good at 1080p and a locked 60fps. The smaller pixel size reduces the problems caused by the lack of sub-pixel information, and the locked 60fps frame-rate reduces the edge-crawling problem (compared to 30fps games). In general I am now much happier about the quality of post-process antialiasing than I was in last-generation (sub-HD, often 30fps) games.
More advanced algorithms such as SMAA and CMAA provide a minor quality improvement over (properly configured) FXAA at a minor performance cost. We have been evaluating various algorithms, and it is likely that we will switch to a better algorithm in a forthcoming patch. There are multiple feature updates planned for the game after launch, so we still have plenty of time to do small improvements to the rendering pipeline.
"Both competing consoles are now closer to each other than ever. While the last-generation consoles required a lot of custom console-specific optimisations, now most of the optimisations help both of them."
Actually that 32MB discussion wasn't about our engine; it was about an unoptimised prototype of another developer. The general consensus in that discussion thread was that it is not possible to fit a fully featured G-buffer to a 32MB memory pool. I, of course, had to disagree, and formulate a buffer layout that had all the same data tightly packed and encoded to fit to the target size.
Modern GPUs, such as AMD Graphics Core Next (GCN), have full rate integer processing (except for 32-bit integer multiplies) and are able to perform a combined shift + mask instruction in a single cycle. With these tools, we can do very fast bit packing. For every value stored to memory, you should analyse the numeric distribution and range, and determine the best mapping to encode it with the fewest bits while still preserving the desired quality. Packing data tightly is very important in achieving the best possible performance, as bandwidth is usually the limiting factor in GPU compute kernels on modern hardware.
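The shift-and-mask packing described above can be sketched in a few lines of Python. The 32-bit layout below (field names and bit widths) is purely hypothetical and is not the G-buffer layout from the discussion; it only shows how values with a known range are quantised and combined into one word.

```python
# Hypothetical layout: (field name, bit width). The widths add up to 32 bits.
LAYOUT = (("roughness", 6), ("metalness", 2), ("normal_x", 12), ("normal_y", 12))


def quantise(value, bits):
    """Map a float in [0, 1] to the nearest unsigned code of the given width."""
    max_code = (1 << bits) - 1
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * max_code + 0.5)


def dequantise(code, bits):
    """Map an unsigned code back to a float in [0, 1]."""
    return code / ((1 << bits) - 1)


def pack(values):
    """Pack all fields into a single 32-bit word using shifts and ORs."""
    word = 0
    shift = 0
    for name, bits in LAYOUT:
        word |= quantise(values[name], bits) << shift
        shift += bits
    return word


def unpack(word):
    """Recover every field with one shift and one mask per field."""
    out = {}
    shift = 0
    for name, bits in LAYOUT:
        mask = (1 << bits) - 1
        out[name] = dequantise((word >> shift) & mask, bits)
        shift += bits
    return out


example = {"roughness": 0.35, "metalness": 1.0, "normal_x": 0.5, "normal_y": 0.25}
word = pack(example)
restored = unpack(word)
```

Choosing the width per field is the step the answer stresses: a value that only needs a handful of distinct levels gets a few bits, and the bits saved go to fields where quantisation error would be visible.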
With compute shaders you can also do processing 'in-place' (output data to same buffer as input) just like you usually do in cache-optimised CPU code. For example you can write your RGBA16F lighting output on top of your first two G-buffers, and save eight bytes per pixel storage cost. There are also two additional performance bonuses with this method: It guarantees that memory writes always occur to L1 cache, as the G-buffer read from the same buffer has just loaded that cache line to L1. Also direct memory writes from a compute shader sidestep ROPs completely. You will never be fill-rate bound this way.
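The eight-bytes-per-pixel figure follows from RGBA16F having four channels of two bytes each. As a quick worked example in Python, assuming render resolutions of 1920x1080 and 1600x900 (the resolutions are the only inputs, and the helper name is made up):

```python
BYTES_PER_F16_CHANNEL = 2
RGBA_CHANNELS = 4


def lighting_target_bytes(width, height):
    """Size of a separate RGBA16F lighting target at the given resolution."""
    return width * height * RGBA_CHANNELS * BYTES_PER_F16_CHANNEL


# Storage avoided by writing the lighting result over the first two G-buffers.
saved_1080p = lighting_target_bytes(1920, 1080)
saved_900p = lighting_target_bytes(1600, 900)
saved_1080p_mib = saved_1080p / (1024 * 1024)
saved_900p_mib = saved_900p / (1024 * 1024)
```

That works out to roughly 15.8 MiB at 1080p and roughly 11.0 MiB at 900p, a sizeable share of a 32MB scratchpad.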
We had our own unique way of maximising the ESRAM usage. We used an Excel sheet to track the lifetime of each resource during a frame. We split the frame into four passes (shadows, G-buffer rendering, lighting and post processing), and then tried to fit as many live resources as possible into ESRAM simultaneously in each pass, while also keeping in ESRAM as many of the resources needed later as we could. This was a really successful strategy and allowed us to utilise over 95 per cent of ESRAM space in three out of our four passes. We also plan to automate this process in the future using an algorithm similar to those used by compilers to do register allocation and spilling.
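The spreadsheet exercise can be approximated with a small lifetime-based greedy allocator. The Python sketch below is an assumption-laden illustration: the resource names, sizes and lifetimes are invented, the largest-first ordering is just one possible heuristic, and address fragmentation inside ESRAM is ignored.

```python
MB = 1024 * 1024
ESRAM_BYTES = 32 * MB
PASSES = ("shadows", "gbuffer", "lighting", "post")

# (name, size in bytes, first pass index, last pass index). All values invented.
RESOURCES = [
    ("shadow_map", 14 * MB, 0, 2),
    ("gbuffer_0", 8 * MB, 1, 2),
    ("gbuffer_1", 8 * MB, 1, 2),
    ("depth", 6 * MB, 1, 3),
    ("hdr_colour", 12 * MB, 2, 3),
]


def assign_to_esram(resources, capacity=ESRAM_BYTES):
    """Greedily place resources in ESRAM, largest first.

    A resource is placed only if it fits in every pass of its lifetime;
    otherwise it spills to main memory, much like a spilled register.
    """
    used = [0] * len(PASSES)
    placed = []
    for name, size, first, last in sorted(resources, key=lambda r: -r[1]):
        live = range(first, last + 1)
        if all(used[p] + size <= capacity for p in live):
            for p in live:
                used[p] += size
            placed.append(name)
    return placed, used


placed, used = assign_to_esram(RESOURCES)
utilisation = [u / ESRAM_BYTES for u in used]
```

With this made-up data the lighting pass fills ESRAM completely while two G-buffer targets spill, which shows why the choice of what to keep resident across passes matters more than any single pass in isolation.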
Both competing consoles are now closer to each other than ever. While the last-generation consoles required a lot of custom console-specific optimisations, now most of the optimisations help both of them.
Optimising the render target size to fit it better to the fast ESRAM scratchpad reduces bandwidth cost, and that boosts performance on PS4 and PC GPUs as well. Optimising for data locality helps all GPUs with caches. Intel has quite big L3 (and even L4) caches in their GPUs and Nvidia's new Maxwell GPUs have 8x bigger L2 caches than their older (mainstream) Kepler GPUs. Writing memory/cache optimised code has become really important for GPUs as well, and the trend seems to be continuing.
"Launch games never show the true long term potential of the consoles... Developers needed to start programming their next-gen engines before they have access to the final hardware. Lots of educated guesses must be made, and hitting them all right isn't easy."
Launch games never show the true long term potential of the consoles. Locked 60fps is a very hard goal for any launch title. Developers needed to start programming their next-gen engines before they have access to the final hardware. Lots of educated guesses must be made, and hitting them all right isn't easy.
In our case we started at 720p on both next-gen consoles, because we wanted to ensure that our game play programmers could fine tune the game mechanics and physics using a build that was running smoothly. Hitting our target frame rate (60fps) was more important for us than hitting a certain resolution at the beginning of the project.
In the end, we got very close to platform parity between the next-generation consoles. Both consoles are running the game at locked 60fps, with identical shader and effect quality and with identical content (textures, models and levels). Rendering resolution is the only difference between the platforms. PS4 renders at a slightly higher resolution (1080p) than Xbox One (900p).
We are proud of the rendering resolutions we achieved on both next-generation consoles, as there are only a handful of games that have achieved similar resolutions on either console at locked 60fps.
For Trials Fusion, more GPU performance was the preferred choice over more CPU performance, as it is a cross-generation game. The game logic had to be designed as cross-generation, because we wanted to have all the same levels available on all platforms. Graphics quality on the other hand is easy to scale up without messing up the gameplay, and this is where a fast GPU really helps.
We had to run the same code base on Xbox 360 and on next-generation consoles. Fortunately Xbox 360 has six hardware threads, so each thread could be easily mapped to a separate physical CPU core on next-gen consoles. In addition to our six main threads, we are running one worker thread per core. This system is used to process jobs that don't have tight timing requirements, such as data streaming and terrain mesh generation.
"Currently OpenGL is the most feature-rich graphics API on PC. OpenGL 4.4 exposes most of the new hardware features of AMD GCN and Nvidia Kepler GPUs that are not yet exposed in DirectX 11."
Generally the new CPUs were running our old PPC-optimised code very well. We only had to rewrite a few VMX128 optimised loops using AVX instructions to allow a higher number of simultaneously active animations and physics objects. In the end we decided to double the complexity limitations of our in-game editor compared to the Xbox 360 version, allowing the users to build larger and more dynamic tracks for the next-generation consoles.
I love GPU compute! You can do many things more efficiently using compute than using a pixel shader. Unfortunately for Trials Fusion we couldn't use that much GPU compute, as we needed to run the same game also on Xbox 360 and on DirectX 10.1 compatible PCs.
However things will change radically in the future when we no longer need to support last-generation consoles and DirectX 10 PC GPUs. At that point we can run the whole graphics engine inside the GPU, freeing CPU cores for improved physics simulation and gameplay, and simultaneously allowing massive rendering performance improvements. I am eagerly waiting to see what kinds of crazy things developers will achieve with these new consoles once we know them as well as we know the last-generation ones.
Next-generation console GPUs are quite close to modern PC GPUs. Many shader optimisations done on consoles also help all modern PC GPUs. On the CPU side, we now have out of order execution and the same x86-64 and AVX instruction sets on both PC and consoles, making it easy to directly port most algorithms (and optimisations) between the platforms.
Currently OpenGL is the most feature-rich graphics API on PC. OpenGL 4.4 exposes most of the new hardware features of AMD GCN and Nvidia Kepler GPUs that are not yet exposed in DirectX 11. Features such as indirect multi draw call, bindless resources and sparse textures are very important for us in the future.
DirectX 12 is expected to expose pretty much the same GPU features as OpenGL 4.4 does, while reducing the draw call overhead near to Mantle level. Add solid driver support and cross-vendor GPU support (Nvidia, AMD and Intel are all backing DirectX 12) and the other options are not looking that interesting anymore.
OpenGL 4.4 still remains a solid choice if you need to support older operating systems, and makes porting to new platforms such as Steam Machines easier. It's definitely going to be an interesting battle, but there are still too many unknowns to predict the result yet.
Microsoft has announced that DirectX 12 has several efficiency improvements over DirectX 11. It seems to be a very well-designed API. As a long time console developer, I love to get my hands dirty with the low-level resource handling and data synchronisation also on PC. This will allow developers to create games that will never drop frames. On current high-level PC APIs, you can get unexpected stalls, because the GPU driver chooses to do memory reallocation or transfer some data unexpectedly through the slow PCI Express bus.
Xbox 360 got a big boost from the low-level graphics API. We managed to hit up to 10k draw calls per frame (at 60fps) in Trials Evolution using the low-level Xbox 360 graphics API, as discussed in our earlier interview. We are eagerly waiting to get our hands dirty with DirectX 12. It's definitely possible that Xbox One will also get a performance boost from a new low-level API.
"We are eagerly waiting to get our hands dirty with DirectX 12. It's definitely possible that Xbox One will also get a performance boost from a new low-level API."
If we have the same API on both the console and PC, porting and code maintenance will also be easier. However consoles have unified memory and PC doesn't, so there still needs to be multiple code paths, for example with data streaming from HDD to GPU. The same is true for any CPU+GPU interoperation. If you need to move the data between them, you most likely want to select a different algorithm for PC, because PCI Express has much lower bandwidth and much higher latency than the direct unified memory access of consoles.
The Trials Evolution code base was evolved from the Trials HD code base. Both were designed heavily around Xbox 360 architecture, and most of the code didn't even compile on PC. When the Trials team started working on Fusion, a secondary team had to port this difficult console-centric code base to PC. This would have been a difficult task for any team. I think they handled it quite well in these difficult circumstances.
Trials Fusion, on the other hand, has been developed for PC from the start. The first thing we did was port our engine to PC and DirectX 11 and re-factor our resource management. Because we are using data streaming so heavily, the resource management code that had been hand-optimised for the console's unified memory architecture became a huge performance problem for PC. We had to handle dynamic resources quite differently on PC and make many changes to our virtual texturing implementation to get the fastest performance out of the PC architecture.
The PC was our lead platform during the first half of the project. All of our new next-generation rendering techniques were first programmed on PC, because next-generation hardware wasn't available at that time. The result is very impressive. We can now run next-gen graphics on the same PCs that had problems running Trials Evolution: Gold Edition, ported from Xbox 360 one year ago.
I have been personally involved in the technical discussion with players having problems in our beta forums. We have already fixed compatibility problems on laptop configurations with Nvidia Optimus switchable graphics, fixed graphics corruption issues on Intel integrated GPUs and optimised the PC rendering performance by 10-40 per cent, allowing us to lower the PC minimum hardware requirements.
My five-year-old home computer (Core 2 Quad 2.4GHz with Radeon 5850) now runs Trials Fusion smoothly at a locked 60fps at 1680x1050 - the native resolution of my display. We have also announced that we will bring the same six DLCs and a season pass to the PC version of the game. The PC version will be supported for a long time.
"We got very close to platform parity between the next-generation consoles. Both consoles are running the game at locked 60fps, with identical shader and effect quality and with identical content (textures, models and levels)."
The graphics settings screen wasn't working in the first PC beta build, meaning all settings resulted in the same quality. So you didn't see any PC specific extra effects yet. For PC ultra settings, we have played with ideas to enable higher quality particle effects, better bokeh depth of field effect, improved anti-aliasing and longer view distance.
We can now allow our users to do more complex creations, because the next-generation consoles have more memory and faster CPUs. Levels can be larger and can contain more physics and animation. We also implemented a new keyframe animation system to make it much easier to populate the levels' backgrounds with large amounts of animated objects. This also helped our own level production a lot. The levels feel much more dynamic than they were previously.
Now that we are part of Ubisoft, we also have access to their servers, and we can customise the server side code to meet our needs better. We have planned to bring many improvements to Track Central after the game launches.
The hardcore gamer inside me enjoyed the super hard 'ninja-difficulty' tracks very much. Some of the best ninja tracks looked as good as our own creations, and had many innovative obstacles.
As a programmer, I was really surprised by the quality and innovation of the skill games. Users replicated many arcade hits, such as Tetris, WipEout and Missile Command using the in-game editor. But the drum machine that allowed you to record your own plays to the leaderboard, and play them back as replays was the most memorable one. It utilised both our in-game editor tools and our deterministic online replay system to the fullest.
Trials Fusion has been designed to support continuous feature improvements and content additions. So far we have announced that we are going to bring at least six DLCs and a season pass for all platforms. The season pass is priced very affordably. Divide that €19.90 by six and you get about €3.32 per DLC. We are also planning to bring many post launch feature updates to the game for free.
"From the beginning of the project it has been clear that we wanted to bring the full Trials Fusion experience to the Xbox 360: All the environments, all the levels, same bikes (handling identically) and locked 60fps gameplay."
Xbox 360 is still a very important platform for us. Trials HD and Trials Evolution are both among the best-selling Xbox Live Arcade games, and many of our fans haven't yet moved to the next-generation consoles.
From the beginning of the project it has been clear that we wanted to bring the full Trials Fusion experience to the Xbox 360: All the environments, all the levels, same bikes (handling identically) and locked 60fps gameplay. That wasn't an easy task. Last-generation consoles weren't designed with HDR (floating point) rendering in mind, and we definitely needed a full HDR pipeline for our new physically based lighting model. On Xbox 360 we had to use RGBM encoding in multiple stages of our pipeline to avoid performance bottlenecks. This had its own complications and required multiple workarounds. However the end result is very good.
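For readers unfamiliar with RGBM, the Python sketch below shows the general idea: an HDR colour is stored in four 8-bit channels, with a shared multiplier in the fourth. The range constant of 6.0, the rounding choices and the sample colour are assumptions for illustration; the interview does not describe RedLynx's exact encoding.

```python
import math

MAX_RANGE = 6.0  # assumed brightest representable value, not from the interview


def encode_rgbm(rgb, max_range=MAX_RANGE):
    """Encode a linear HDR colour as four 8-bit codes (R, G, B, multiplier)."""
    scaled = [min(max(c, 0.0) / max_range, 1.0) for c in rgb]
    m = max(max(scaled), 1.0 / 255.0)
    # Round the multiplier up so the divided colour never exceeds 1.0.
    m_code = min(math.ceil(m * 255.0), 255)
    m = m_code / 255.0
    rgb_codes = [int(c / m * 255.0 + 0.5) for c in scaled]
    return (*rgb_codes, m_code)


def decode_rgbm(rgbm, max_range=MAX_RANGE):
    """Reconstruct the HDR colour from its RGBM codes."""
    r, g, b, m_code = rgbm
    m = m_code / 255.0
    return tuple(c / 255.0 * m * max_range for c in (r, g, b))


hdr_pixel = (3.0, 1.5, 0.2)  # hypothetical HDR value above 1.0
encoded = encode_rgbm(hdr_pixel)
decoded = decode_rgbm(encoded)
```

The appeal on older hardware is that the encoded value fits an ordinary 8-bit-per-channel render target while still covering a range well above 1.0. The cost is extra encode and decode maths plus precision that depends on the multiplier.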
The Xbox 360 uses the same assets as the next-generation consoles. Our tools handle some data reductions automatically: baking ambient occlusion into the textures instead of calculating it dynamically, and removing one geometry LOD level and one texture mip level from our virtual texture. Together these reduce the data size enough to fit into Xbox 360 memory and the 2GB limit of the XBLA downloadable package.
There are many clever Xbox 360 optimisations. We dynamically generate a 16x16x16 3D lookup texture that converts the colour from tone-mapped HSL (hue, saturation, luminance) colour-space to the final gamma corrected sRGB colour-space and simultaneously applies exposure compensation, colour tinting, saturation and contrast adjustments to the pixel colour. All this heavy math can thus be replaced with a single texture read instruction. This saved lots of ALU instructions from our post-processing shader, and allowed us to use the new high quality tone-mapping pipeline also on Xbox 360. The lookup table generation is very fast (it's only 4096 pixels in total), so we refresh it every frame to reflect changing environment and camera properties. As an additional bonus this lookup solved the usual Xbox 360 PWL gamma ramp black crush problems (providing a 16 piece linear ramp instead of the default 4 piece linear ramp).
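The lookup-table trick can be sketched in plain Python. This is a simplified stand-in: the table below is indexed by three generic 0-1 coordinates rather than HSL, the grade function and its constants are invented, and the trilinear sampler only mimics what a filtered 3D texture read would do on the GPU.

```python
SIZE = 16  # 16 x 16 x 16 = 4096 entries, matching the table size mentioned above


def grade(x, y, z):
    """Stand-in for the expensive per-pixel maths (all constants are made up)."""
    def channel(c):
        c = min(c * 1.2, 1.0)                          # exposure compensation
        c = min(max((c - 0.5) * 1.1 + 0.5, 0.0), 1.0)  # contrast adjustment
        return c ** (1.0 / 2.2)                        # gamma, approximating sRGB
    return (channel(x), channel(y), channel(z))


def build_lut(size=SIZE):
    """Evaluate the expensive function once per grid point."""
    step = 1.0 / (size - 1)
    return [[[grade(i * step, j * step, k * step)
              for k in range(size)]
             for j in range(size)]
            for i in range(size)]


def sample_lut(lut, x, y, z):
    """Trilinear lookup, standing in for a single filtered texture read."""
    size = len(lut)

    def split(v):
        pos = min(max(v, 0.0), 1.0) * (size - 1)
        lo = min(int(pos), size - 2)
        return lo, pos - lo

    (i, fx), (j, fy), (k, fz) = split(x), split(y), split(z)
    result = []
    for ch in range(3):
        acc = 0.0
        for di, wx in ((0, 1.0 - fx), (1, fx)):
            for dj, wy in ((0, 1.0 - fy), (1, fy)):
                for dk, wz in ((0, 1.0 - fz), (1, fz)):
                    acc += wx * wy * wz * lut[i + di][j + dj][k + dk][ch]
        result.append(acc)
    return tuple(result)


lut = build_lut()
approx = sample_lut(lut, 0.5, 0.25, 0.75)
exact = grade(0.5, 0.25, 0.75)
```

Because the table is so small, rebuilding it whenever the grading parameters change is cheap, and every pixel then pays for one lookup instead of the full chain of adjustments.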
We also added a new layering feature to the in-game level editor tool to help our level designers. This feature allows the level designers to create the level driving line and gameplay as a single shared layer, and put the background decoration into a separate layer. Level designers can then easily switch layers on/off in the editor without needing to reload the level, making it easy and productive to create the same level simultaneously for both next-generation consoles and Xbox 360. Next-generation background and decoration layers have more objects and more dynamic action, while the same gameplay layer is used on both versions of the game.
Sebastian Aaltonen is lead graphics programmer at RedLynx.