Inside Unreal Engine 5: how Epic delivers its generational leap
A closer look at the new Nanite and Lumen technology.
Epic's reveal of Unreal Engine 5 running in real-time on PlayStation 5 delivered one of the seismic news events of the year and our first real 'taste' of the future of gaming. Delivering a true generational leap in sheer density of detail, alongside the complete elimination of LOD pop-in, UE5 adopts a radical approach to processing geometry, in combination with advanced global illumination technology. The end result is quite unlike anything we've seen before, but what is the actual nature of the new renderer? How does it deliver this next-gen leap - and are there any drawbacks?
Watching the online reaction to the tech trailer has thrown up some interesting questions, but some baffling responses too. The fixation on the main character squeezing through a crevice was particularly puzzling, but to make things clear, this is obviously a creative decision, not a means of slowing down the character to load in more data - it really is that simple. Meanwhile, the dynamic resolution with a modal 1440p pixel count has also drawn some negative reaction. We have access to 20 uncompressed grabs from the trailer: they defy traditional pixel counting techniques. When the overall presentation looks this good, this detailed, with solid temporal stability (ie, no flicker or shimmer frame to frame), resolution becomes less important - the continuation of a trend we've seen since the arrival of the mid-generation console refreshes. As we said almost two years ago now, next-gen shouldn't be about 'true 4K'. The game has moved on and, to put it frankly, GPU resources are better spent elsewhere.
Some interesting topics have been raised, however. The 'one triangle per pixel' approach of UE5 was demonstrated with 30fps content, so there are questions about how good 60fps content may look. There have also been some interesting points raised about how the system works with dynamic geometry, as well as transparencies like hair or foliage. Memory management is a hot topic too: a big part of the UE5 story is how original, full fidelity assets can be used unaltered, unoptimised, in-game - so how is this processed? This, in turn, raises further questions about the storage streaming bandwidth required, an area where PlayStation 5 excels. So, to what extent is the Lumen in the Land of Nanite tech demo leveraging that immense 5.5GB/s of uncompressed storage bandwidth? Hopefully we'll learn more soon.
Core to the innovation in Unreal Engine 5 is the system dubbed Nanite, the micro-polygon renderer that delivers the unprecedented detail seen in the tech demo. The concepts of the micro-polygon engine aren't new - they are used extensively in movie CGI. The starting point is the same, but the execution in games is different. Models are authored at that same very high quality, with millions of polygons per model, but no lower quality versions with bespoke normal maps are created for in-game usage. Instead, Nanite scales the geometric detail of that high quality model up and down in real time. Higher detail does not have to be faked with a baked-out normal map - the way game models were previously made. Key Epic Games staff members helped to further our understanding of the technology by answering a few key questions.
"With Nanite, we don't have to bake normal maps from a high-resolution model to a low-resolution game asset; we can import the high-resolution model directly in the engine. Unreal Engine supports Virtual Texturing, which means we can texture our models with many 8K textures without overloading the GPU." Jerome Platteaux, Epic's special projects art director, told Digital Foundry. He says that each asset has 8K texture for base colour, another 8K texture for metalness/roughness and a final 8K texture for the normal map. But this isn't a traditional normal map used to approximate higher detail, but rather a tiling texture for surface details.
"For example, the statue of the warrior that you can see in the temple is made of eight pieces (head, torso, arms, legs, etc). Each piece has a set of three textures (base colour, metalness/roughness, and normal maps for tiny scratches). So, we end up with eight sets of 8K textures, for a total of 24 8K textures for one statue alone," he adds.
Since detail is tied to the number of pixels a model occupies on screen, there is no hard cut-off - no LOD 'popping' as you see in current rendering systems. Likewise, it should not exhibit the 'boiling' look of standard displacement mapping, as seen on the ground terrain in a game like 2015's Star Wars Battlefront (which still holds up beautifully today, it has to be said). And lastly, it can represent horizontal expansion and decimation of detail - so individual rocks on the ground can increase and decrease in quality in a more natural way. Ultimately, this micro-polygon method makes a large difference to asset creation, as level of detail versions of models and normal maps no longer have to be authored, saving time, memory and even draw calls for the various versions - though the memory cost of the very large source models remains unknown.
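The core idea - matching triangle density to pixel density - can be sketched in isolation. Below is a minimal C++ illustration under our own assumptions (a power-of-two LOD chain and a simple pixel-footprint metric); Nanite's actual cluster hierarchy and screen-space error tests are far more sophisticated.

```cpp
// A minimal sketch of pixel-driven LOD selection (our own illustration, not
// Nanite's code): pick the coarsest LOD whose triangles still project to
// roughly one pixel or less on screen.
#include <cmath>
#include <cstdio>

// Returns how many times the source triangle size can be doubled (each
// doubling = one coarser LOD) before triangles exceed ~1 pixel on screen.
int selectLod(float triangleWorldSize,   // edge length of a full-detail triangle (metres)
              float distanceToCamera,    // metres
              float verticalFovRadians,
              int   screenHeightPixels)
{
    // World-space size covered by a single pixel at this distance.
    float worldPerPixel =
        2.0f * distanceToCamera * std::tan(verticalFovRadians * 0.5f)
        / static_cast<float>(screenHeightPixels);

    int lod = 0;
    float triSize = triangleWorldSize;
    // Each coarser LOD roughly doubles triangle edge length (quartering count).
    while (triSize * 2.0f <= worldPerPixel) {
        triSize *= 2.0f;
        ++lod;
    }
    return lod;
}

int main() {
    // A 5mm source triangle viewed from 2m and from 50m at 1440p:
    std::printf("near: LOD %d\n", selectLod(0.005f, 2.0f, 1.0f, 1440));  // LOD 0
    std::printf("far:  LOD %d\n", selectLod(0.005f, 50.0f, 1.0f, 1440)); // LOD 2
}
```

The key property is that the decision is driven entirely by the pixel footprint at the current distance and resolution, so detail scales continuously with the camera rather than snapping at artist-placed distance thresholds.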
Also key to Nanite's impressive fidelity is how micro-detail is shadowed - essential in grounding everything in the game world and key to achieving a realistic presentation. It's the reason why small detail works so well in a game like the latest Call of Duty: Modern Warfare, where standard shadow maps are too low-resolution to realistically ground micro detail, and where ray tracing does such a good job by comparison. In lieu of triangle-based, hardware-accelerated ray tracing, the UE5 demo on PlayStation 5 utilises screen-space shadowing, as seen in current generation games, to cover small details, combined with a virtualised shadow map.
"There is a very short screen-space trace to get the sample point off of the surface to avoid biasing or self-shadowing artifacts," explains Brian Karis. "These are commonly called 'Peter Panning', where the shadow doesn't appear to be connected to the caster because it was biased away, and shadow acne in the opposite direction where a shadow map wasn't biased enough and a surface partly shadows itself, oftentimes in a dotted or striped pattern. We avoid both with our filtering method, which uses a short screen-space trace in combination with a few other tricks.
"Really, the core method here, and the reason there is such a jump in shadow fidelity, is virtual shadow maps. This is basically virtual textures but for shadow maps. Nanite enables a number of things we simply couldn't do before, such as rendering into virtualised shadow maps very efficiently. We pick the resolution of the virtual shadow map for each pixel such that the texels are pixel-sized, so roughly one texel per pixel, and thus razor sharp shadows. This effectively gives us 16K shadow maps for every light in the demo where previously we'd use maybe 2K at most. High resolution is great, but we want physically plausible soft shadows, so we extended some of our previous work on denoising ray-traced shadows to filter shadow map shadows and give us those nice penumbras."
We were also really curious about exactly how geometry is processed, whether Nanite uses a fully software-based raw compute approach (which would work well across all systems, including PC GPUs that don't support the full DirectX 12 Ultimate feature set) or whether Epic taps into the power of mesh shaders - or primitive shaders, as Sony describes them for PlayStation 5. The answer is intriguing.
"The vast majority of triangles are software rasterised using hyper-optimised compute shaders specifically designed for the advantages we can exploit," explains Brian Karis. "As a result, we've been able to leave hardware rasterisers in the dust at this specific task. Software rasterisation is a core component of Nanite that allows it to achieve what it does. We can't beat hardware rasterisers in all cases though so we'll use hardware when we've determined it's the faster path. On PlayStation 5 we use primitive shaders for that path which is considerably faster than using the old pipeline we had before with vertex shaders."
The other fundamental technology debuting in the Unreal Engine 5 tech demo is Lumen - Epic's answer to one of the holy grails of rendering: real-time dynamic global illumination. Lumen delivers bounced lighting - light distributed around the scene after the first hit of direct illumination - via a form of ray tracing that doesn't touch triangles. So, in the demo, the sun hits a surface like a rock, informing how it should be shaded, and light then bounces onwards from the rock, tinted by its colour. The demo delivers radical transformations of scene lighting in real-time - and there's even a short section in the video dedicated to showing the effect turned on and off.
Most games approximate global illumination by pre-calculating it through a system called light maps, generated offline and essentially 'baked' into the scene via textures. With this, a scene has global illumination, but the lights cannot move, and the lighting - and the objects affected by it - is completely static. The lighting is essentially attached to the surface of the objects in the game scene. In addition, lightmaps only affect diffuse lighting, so specular lighting - reflections like those found on metals, water and other shiny materials - has to be handled in a different manner, through cube maps or screen-space reflections.
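A toy sketch makes the limitation obvious: the indirect term is just a texture fetch, filled in by an offline bake (all names here are our own, for illustration). Move a light at runtime and the direct term responds, but the baked bounce lighting is frozen in place.

```cpp
// Why baked lightmaps are static: indirect lighting is a lookup into data
// computed offline for one fixed lighting setup. Illustrative sketch only.
#include <vector>

struct Color { float r, g, b; };

struct Lightmap {
    int w, h;
    std::vector<Color> texels;              // filled by an offline bake
    Color sample(float u, float v) const {  // nearest-neighbour for brevity
        int x = static_cast<int>(u * (w - 1));
        int y = static_cast<int>(v * (h - 1));
        return texels[y * w + x];
    }
};

Color shade(const Color& direct, const Lightmap& baked, float u, float v) {
    Color indirect = baked.sample(u, v);
    // 'direct' responds to light movement at runtime; 'indirect' cannot,
    // because it was precomputed and stored in the texture.
    return { direct.r + indirect.r, direct.g + indirect.g, direct.b + indirect.b };
}
```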
4A Games' Metro Exodus offers up a potential solution to this with hardware-accelerated ray tracing used to generate dynamic global illumination, but the cost is significant - as is the case with many RT solutions. Lumen is a 'lighter' real-time alternative to offline light map global illumination that uses a combination of tracing techniques for the final image. Lumen provides multiple bounces of indirect lighting for sunlight, and in the demo, the same effect is seen from the main character's flashlight.
"Lumen uses ray tracing to solve indirect lighting, but not triangle ray tracing," explains Daniel Wright, technical director of graphics at Epic. "Lumen traces rays against a scene representation consisting of signed distance fields, voxels and height fields. As a result, it requires no special ray tracing hardware."
To achieve fully dynamic real-time GI, Lumen has a specific hierarchy. "Lumen uses a combination of different techniques to efficiently trace rays," continues Wright. "Screen-space traces handle tiny details, mesh signed distance field traces handle medium-scale light transfer and voxel traces handle large scale light transfer."
Lumen uses a combination of techniques then: to cover bounce lighting from larger objects and surfaces, it does not trace triangles, but uses voxels instead, which are boxy representations of the scene's geometry. For medium-sized objects Lumen then traces against signed distance fields which are best described as another slightly simplified version of the scene geometry. And finally, the smallest details in the scene are traced in screen-space, much like the screen-space global illumination we saw demoed in Gears of War 5 on Xbox Series X. By utilising varying levels of detail for object size and utilising screen-space information for the most complex smaller detail, Lumen saves on GPU time when compared to hardware triangle ray tracing.
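The middle tier of that hierarchy is built on a classic algorithm: sphere tracing a signed distance field, where the distance value itself tells the ray how far it can safely step. Here's a minimal, self-contained sketch - the scene and function names are ours, not Lumen's:

```cpp
// Sphere tracing a signed distance field: the SDF value is always a safe
// distance to step, so rays march quickly through empty space and slow down
// near surfaces. A generic illustration of the technique, not Lumen's code.
#include <cmath>
#include <cstdio>
#include <functional>
#include <optional>

struct Vec3 { float x, y, z; };

static Vec3 at(Vec3 o, Vec3 d, float t) {   // origin + direction * t
    return {o.x + d.x * t, o.y + d.y * t, o.z + d.z * t};
}

using SDF = std::function<float(Vec3)>;     // distance to the nearest surface

std::optional<Vec3> traceSDF(const SDF& sdf, Vec3 origin, Vec3 dir, float maxT) {
    float t = 0.0f;
    for (int i = 0; i < 64 && t < maxT; ++i) {
        Vec3 p = at(origin, dir, t);
        float d = sdf(p);
        if (d < 1e-3f) return p;            // close enough: report a hit
        t += d;                             // step by the safe distance
    }
    return std::nullopt;                    // no hit within range
}

int main() {
    // Unit sphere at the origin as a stand-in 'simplified scene representation'.
    SDF sphere = [](Vec3 p) {
        return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z) - 1.0f;
    };
    if (auto hit = traceSDF(sphere, {0.f, 0.f, -3.f}, {0.f, 0.f, 1.f}, 10.f))
        std::printf("hit at z = %.3f\n", hit->z); // ~-1.0: the sphere's near face
}
```

In a full Lumen-style pipeline, a trace like this would sit between a short screen-space trace (tried first, for fine detail) and a coarse voxel trace used as the long-range fallback.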
Another crucial technique for maintaining performance is temporal accumulation, where the results of light bounces are accumulated over time, frame to frame. For example, as the light moves around at the beginning of the demo video, if you watch attentively, you can see that the bounced lighting moves at a slightly staggered rate compared to the direct lighting. The developers mention 'infinite bounces' - reminiscent of surface caching - a way to store light on geometry over time with a sort of feedback loop. This allows for many bounces of diffuse lighting, but can induce a touch of latency when the lighting moves rapidly.
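The generic form of temporal accumulation is simple to show - an exponential moving average of the indirect lighting, converging over several frames. The blend factor below is our own assumption, and Lumen's reprojection and caching are far more involved, but the latency trade-off is the same:

```cpp
// Temporal accumulation in its simplest form: blend this frame's noisy
// bounce lighting into a history buffer. Lower alpha = smoother but laggier.
#include <cstdio>

float accumulate(float history, float current, float alpha = 0.1f) {
    return history + alpha * (current - history);
}

int main() {
    float history = 0.0f;      // scene starts with no bounce light
    const float target = 1.0f; // a light has just switched on
    for (int frame = 1; frame <= 30; ++frame)
        history = accumulate(history, target);
    std::printf("bounce light after 30 frames: %.3f\n", history); // ~0.958
}
```

With a blend factor of 0.1, around 30 frames are needed to reach roughly 96 per cent of the new value - a full second at the demo's 30fps - which matches the kind of staggered bounce response visible when the lighting changes rapidly.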
Lumen is fascinating and the quality of the results speaks for itself. Similar to Nanite, what we like about the presentation of these technologies in the tech demo is their authenticity. The impact is overwhelming, but as you start to peel back the layers, how Epic achieves this level of fidelity becomes a little clearer. The firm has been transparent about how it achieves its results, went straight to real-time rendering with its engine reveal (no 'in-engine' chicanery) and has been open about performance and limitations. Fundamentally, it's the best approach for a firm that's all about supplying crucial tools to the games industry and beyond.
Due to the nature of Unreal Engine itself, there's nowhere to hide - the engine and its source will be fully available. Does the new technology have some kind of innate limitation? If so, we'll find out when it launches, if not sooner via preview feedback, but we'd be surprised if Epic doesn't deliver. Let's remember the fundamental nature of Unreal Engine: it's a versatile toolbox designed for developers to deliver virtually any type of game. Producing a next-gen renderer whose applications are limited would be self-defeating for Epic. That said, the notion of transitioning away from current modelling techniques to Megascans does sound like a seismic shift in the way games are made, with some profound implications, as has been noted.
Of course, we're still around six months away from the release of the new consoles, while Unreal Engine 5 itself isn't set for its public debut until early 2021. The latter point suggests that there's still a lot of engineering work in the offing, but in the short term at least, Brian Karis says that further details on Lumen, Nanite and the creation of the tech demo will arrive shortly - and we can't wait to find out more.