Inside the Scorpio Engine: the processor architecture deep dive
How the chip that powers the next Xbox was made.
Editor's note: This one is for the hardcore. We've already covered Project Scorpio's hardware specs in broader detail, and posted critical analysis of everything we've seen so far, but for those of you hungry for more detail, who want to know absolutely everything shared with us, this is the place to be. We'll be running a similar deep dive on the construction of the retail console tomorrow.
If there's a recurring theme in our discussions with the hardware architects of Project Scorpio's new processor, it's customisation. And to be fair, PlayStation 4 system architect Mark Cerny made very similar points when I met up with him last year to discuss PS4 Pro. "It's not a process of calling up AMD and saying I'll take this part, this part and this part," says Kevin Gammill, Group Program Manager of the Xbox Core Platform. "A lot of really specific custom work went into this."
Of course, the base hardware designs across the various components and blocks within the Scorpio Engine SoC (system on chip) are indeed based on technology derived from AMD. The CPU technology has been customised to the point where Microsoft no longer refers to the cores as Jaguar, but that is clearly where the Project Scorpio CPU design started. Similarly, Scorpio's Radeon graphics core has features from AMD's latest Polaris architecture, but there is no equivalent part to it in the PC space. We've moved on from PS4 and Xbox One, where the basic GPU configurations (compute units, texture units, ROPs) at least mapped relatively closely to off-the-shelf PC parts.
"It's a completely unique design... you wouldn't be able to buy this anywhere else and really, we created this is in conjunction with AMD and it is a nice unique part for Scorpio," says Nick Baker, Distinguished Engineer, Silicon.
"The few high-level constraints and goals in the programme were to really say, we wanted to be the most powerful 4K gaming console. The other key area was to maintain full back-compat with Xbox One and One S titles and also to retain features we added in on those consoles to allow Xbox 360 compatibility as well. If you look at how we came up with the architecture those were the most important goals we wanted to keep in mind," Baker continues. "Key differentiators for Xbox that we wanted to retain and extend for Scorpio: we have CPU/GPU coherency, the GPU virtualisation support, the audio processing that's being used for spatial audio. We have the powerful and flexible display output processor we're using for super-sampling for 1080p TVs, for example, and we kept and extended and improved the GameDVR support too."
Years before any silicon arrived back from chip manufacturing giant TSMC, the Xbox team began by carrying out a vast range of simulation and analysis. As Project Scorpio is effectively a mid-generation refresh (an extension of the existing console designed primarily for 4K screens), existing game code captured at a granular level via the PIX (Performance Investigator for Xbox) tool could be run on potential hardware designs, well before Microsoft went to AMD.
"We wanted to run simulations to make sure we were meeting performance goals and also back-compat, and to verify that the design was going to work correctly," says Nick Baker. "We took the full design database in-house. We had several hardware emulators that let us speed up the simulations. Typically, you get something like 1Hz or one cycle per second if you run on standard simulators. These things let us now run at half a MHz or more, which is 500 thousand cycles per second. We had these running 24/7 for several months, calculating effectively one hundred trillion cycles - it's absolutely amazing what we managed to achieve here. There are parts of the design that were designed at Microsoft. Microsoft engineers handed logic to AMD for integration."
| | Project Scorpio | Xbox One | PS4 Pro |
|---|---|---|---|
| CPU | Eight custom x86 cores clocked at 2.3GHz | Eight custom Jaguar cores clocked at 1.75GHz | Eight Jaguar cores clocked at 2.1GHz |
| GPU | 40 customised compute units at 1172MHz | 12 GCN compute units at 853MHz (Xbox One S: 914MHz) | 36 improved GCN compute units at 911MHz |
| Memory | 12GB GDDR5 | 8GB DDR3/32MB ESRAM | 8GB GDDR5 |
| Memory Bandwidth | 326GB/s | DDR3: 68GB/s, ESRAM at max 204GB/s (Xbox One S: 219GB/s) | 218GB/s |
| Hard Drive | 1TB 2.5-inch | 500GB/1TB/2TB 2.5-inch | 1TB 2.5-inch |
| Optical Drive | 4K UHD Blu-ray | Blu-ray (Xbox One S: 4K UHD) | Blu-ray |
"We did multiple PIX captures from every single game and ran them on the emulator," Andrew Goossen, Technical Fellow, Graphics, tells us - a process that proved invaluable for validating Scorpio's back-compat capabilities. "We did over 30,000 emulator runs, which is a big contributor to Nick's total cycle count because we had to make sure that we were going to land with that 100 per cent compatibility [with Xbox One]."
As we move deeper into the Scorpio Engine's hardware design during our presentation, the basic layout of the chip is displayed on-screen in the meeting room. It only represents two mask layers out of about 60 in the complete design, but it reveals the general plan of the full processor. It's a 360mm² die and, size-wise, it's very similar to the Xbox One processor's 363mm², but the geography is very, very different. The move from 28nm planar fabrication to TSMC's 16nm FinFET, coupled with the omission of ESRAM, represents a radical change overall. More components now fit in the same area of silicon and four clusters of Radeon compute units dominate. The GPU block sits to the left, with two much smaller quad-core CPU clusters to the right, occupying a fraction of the overall area. Between them, the two CPU clusters carry a total of 4MB of L2 cache.
"We needed to run a fair amount of simulation to figure out what really we wanted to achieve with the CPUs as far as the design was concerned," says Baker. "In dealing with a full custom CPU design, you can't just go and say 'super-size this'. It's not that easy. Try doing that and quite often you'll find that somewhere else, you'll get counter intuitive results from that. We really wanted to be targeted with what we wanted to do."
It's at this point in the presentation where it's clear that the hopes of the most hardcore users weren't going to be realised: there is no Ryzen technology in Project Scorpio. Timelines, cost and die area, not to mention Microsoft confirming eight CPU cores last year, had realistically ruled that out anyway, but hopes had lingered on, with some shots stoking expectation.
"On the CPU side of things, we could still meet our design goals with the custom changes we made. At the end of the day we are still a consumer product. We want to hit the price-points where consumers want to purchase this. It's about balancing the two," explains Kevin Gammill.
It's a reminder that however advanced the hardware may be overall, 'bang for the buck' remains a crucial factor in the physical make-up of a new console processor. The Microsoft architects drew on Xbox One's existing customised design and aimed to double down on areas where clear performance wins were possible.
"Typically for CPU, the top two items are frequency and memory latency. If the CPU has data, the faster it can process it, the quicker the result, but it also means that if it doesn't have the data, it sits there idle, so latency is a big component. On frequency, we pushed it up to 2.3GHz" explains Nick Baker "On the latency, a couple of the areas we tackled, one was all the queues coming back from the memory interface, we sped those up as well. Specifically, within the core, because we're running a virtualised OS environment, we wanted to optimise how memory translation operations happen so there are some key changes inside the core to speed those things up. The end result is that not only does the CPU run faster, it also runs more efficiently meaning more power for you at the end."
Adding to the list of enhancements, Microsoft improved CPU/GPU coherency performance and sped up the GPU command processor, allowing it to offload a lot of work from the CPU, specifically with DirectX 12 engines. However, looking at the layout of the Scorpio Engine, the space occupied by the GPU dwarfs the CPU area: the GPU-to-CPU area ratio is much, much larger than on either Xbox One or PlayStation 4.
"There are four shader engines on Scorpio, each shader engine has ten CUs," Baker explains. "The design from AMD lets us choose a lot of options in terms of number of SEs, CUs [compute units], the render back-end, the RBs that do the pixel blending, the cache sizes... and rather than make those as high as possible we needed to understand what was the best bang for the buck... Aside from that there were also 60 or so specific targeted changes throughout the pipeline. Everything from various memory sizes, queue sizes, features to make sure that back-compat went as smoothly as possible as well."
In actual fact, there are 11 compute units in each shader engine. However, one CU in each SE is disabled: it allows processors with small defects to still make their way into final consoles (fully enabled chips are available only for developers in the new Scorpio dev kit). The Scorpio Engine's GPU is targeted specifically for 4K rendering with ultra HD textures, with support for HDR and wide colour gamut. Microsoft's guiding principles involved making it as easy as possible for developers to access the full power of the hardware.
"This is reflected in everything. You'll see it in the developer kit we give to developers. You'll see it in terms of the performance goals - we want to make it easy as possible for developers to show that ultimate true 4K quality," says Andrew Goossen. "The other guiding principle was to minimise the work in terms of features. We didn't want developers to go and take advantage of some quirky new console-specific hardware feature to get that performance. Our guiding principles were around making it easy as possible for developers to showcase and exhibit the performance and power of Scorpio."
The Xbox One architecture was chosen as the base to better facilitate backwards compatibility, but similar to Sony's PlayStation 4 Pro, Microsoft had the option to factor in features from later Radeon architectures. There's some crossover in the elements both platform holders factored into their custom designs, but a good degree of variance too.
"We have Polaris features in Scorpio that we've picked up. Some of the big ones are delta colour compression, so that helps us out on our bandwidth, both for 4K textures and 4K rendering solutions to achieve that," says Goossen. "It's typically quite easy for the developers to integrate and then also more transparently we picked up some geometry and quad-scheduling improvements AMD has done in the Polaris architecture."
According to Goossen, some performance optimisations from the upcoming AMD Vega architecture factor into the Scorpio Engine's design, but other features that made it into PS4 Pro - for example, double-rate FP16 processing - do not. However, customisation was extensive elsewhere. Microsoft's GPU command processor implementation of DX12 has provided big wins for Xbox One developers, and it's set for expansion in Scorpio.
"Despite that, we're actually faster on the GPU too. You might think, oh you're sending more commands to the GPU now, maybe you're slowing down the GPU," suggests Goossen. "Well, very rarely are we draw-bound on the command processor. And the nice thing is that even when we are draw-bound now in D3D12, again we are more efficient even from the GPU perspective, because we're built-in. We don't have a very big and noisy and abstract interface we have to deal with. We just have the logic built right into the command processor, and in the command processor we can do more optimisations than we can in the driver."
But it's Project Scorpio's status as a console designed to enhance current-gen titles with 4K functionality that did most to define the characteristics of the silicon. Typically, platform holders create console hardware and the developers do their best to fully exploit it. This time around, Microsoft could profile the game engines that would run on Scorpio and optimise the design to get the most out of the content.
"What we did was to take representative PIX captures from all of our top developers," says Goossen. "By hand we went through them and then extrapolated what the work involved would be for that game to support a 4K render resolution. We didn't want to apply a blanket upscale on all render targets, because as you know there are various intermediate renderings that will not scale as much. We went through those by hand and marked places where we didn't need to do resolves anymore because we have DCC, and there's other things we can save."
The Radeon GPU architecture can be built into a final console design with a range of different hardware configurations, and armed with this profiling data, Goossen and his team could effectively test-run the code on prospective designs before going to AMD with a target spec.
"Now we had a model for all of our top-selling Xbox One games where we could tweak the configuration for the number of CUs, the clock, the memory bandwidth, the number of RBs [render back-ends], the number of SEs [shader engines], the cache size," he says. "We could tweak our design and figure out what was the most optimal configuration. It was incredibly valuable for us to be able to make those trade-offs because ultimately these Xbox One titles are the ones that we wanted to get up to 4K."
The final Scorpio configuration achieves its performance objectives by hitting six teraflops of rendering power, and there's a clear emphasis on pushing GPU frequencies significantly higher than anything we've seen in any console powered by core AMD technology.
"This has two wonderful virtues from my perspective - as you know, the clock drives all the various different parts of the pipeline so it raises all boats," explains Andrew Goossen. "I don't get imbalances in my pipeline or introduce new bottlenecks or anything like that. The second one is that for the pixel pushing power we didn't need as much area, we didn't need as many CUs to hit that. It saves area - a pretty important consideration. We were 853MHz in Xbox One, we dialed it up to 1.172 GHz (1172MHz). That's a 37 per cent increase in clock, more than our CPU clock relatively. The next big one: we have 40 CUs. When you take 1172 multiplied by 40 multiplied by 64 for ops multiplied by 2 FLOPS per op, you get exactly 6.0TF."
Compared to Xbox One, the number of shader engines doubles, which, combined with the frequency boost, sees triangle and vertex rate rise by 2.7x. The GPU's L2 cache gets a 4x increase, which Goossen says is there to target 4K performance.
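The 2.7x figure follows from the two changes together. The sketch assumes front-end geometry throughput scales with the number of shader engines multiplied by clock speed (a simplification), and takes Xbox One's shader engine count as two, half of Scorpio's four, as the doubling implies.

```python
# Assumption: triangle/vertex rate scales linearly with shader engines x clock.
XBOX_ONE_SES, SCORPIO_SES = 2, 4
XBOX_ONE_MHZ, SCORPIO_MHZ = 853, 1172

geometry_ratio = (SCORPIO_SES / XBOX_ONE_SES) * (SCORPIO_MHZ / XBOX_ONE_MHZ)
print(f"Triangle/vertex rate: {geometry_ratio:.1f}x")
```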
"Those are kind of the big items but we also leveraged the fact that we understand the AMD architecture really, really well now and how well it does on our games," adds Goossen "So we were able to go through and examine a lot of the internal queues and buffers and caches and FIFOs that make up this very deep pipeline that if you can find the right areas that are causing bottlenecks, for very small area we could increase those sizes and get effective wins. This was a very big focus of ours to go through and you basically really leverage that understanding of having those years of looking at performance on the Xbox One."
Certainly in the PC space, GPU performance tends to scale with bandwidth: the more you increase compute power, the faster the memory you need to get the best performance. It's an area where Sony were constrained in the PS4 Pro design. Having settled on 8GB of RAM, the only way to increase bandwidth and maintain compatibility was to swap in faster modules. There's a 2.3x increase in compute power over the base PS4, but only a 24 per cent increase in bandwidth. Microsoft, by contrast, ditched Xbox One's DDR3 and ESRAM combo and moved to GDDR5.
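The PS4 Pro ratios check out against the base PS4's published configuration, which isn't listed in our table: 18 CUs at 800MHz and 176GB/s of GDDR5 bandwidth. Using the same 64-lanes-per-CU, 2-FLOPs-per-clock convention, the bandwidth available per teraflop drops by almost half on PS4 Pro. The helper name is illustrative.

```python
def tflops(cus: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS assuming 64 lanes per CU and 2 FLOPs per lane per clock."""
    return cus * 64 * 2 * clock_mhz / 1e6


ps4_tf, ps4_bw = tflops(18, 800), 176.0   # base PS4 (not in the comparison table)
pro_tf, pro_bw = tflops(36, 911), 218.0   # PS4 Pro

compute_ratio = pro_tf / ps4_tf
bandwidth_gain = pro_bw / ps4_bw - 1
ps4_gbs_per_tf = ps4_bw / ps4_tf
pro_gbs_per_tf = pro_bw / pro_tf

print(f"Compute: {compute_ratio:.1f}x, bandwidth: +{bandwidth_gain:.0%}")
print(f"GB/s per TF: PS4 {ps4_gbs_per_tf:.0f}, PS4 Pro {pro_gbs_per_tf:.0f}")
```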
"For 4K assets, textures get larger and render targets get larger as well. This means a couple of things - you need more space, you need more bandwidth," explains Nick Baker. "The question though was how much? We'd hate to build this GPU and then end up having to be memory-starved. All the analysis that Andrew was talking about, we were able to look at the effect of different memory bandwidths, and it quickly led us to needing more than 300GB/s memory bandwidth. In the end we ended up choosing 326GB/s. On Scorpio we are using a 384-bit GDDR5 interface - that is 12 channels. Each channel is 32 bits, and then 6.8GHz on the signalling so you multiply those up and you get the 326GB/s."
Baker explains that with those variables in place, the decision in targeting the amount of memory Project Scorpio would address is essentially made for you, and in our E3 2016 Scorpio speculation, that same logic led us to conclude that the machine would indeed deliver 12GB of capacity. Four gigs is reserved for the system (an extra 1GB there is utilised for a full 4K dashboard), leaving 8GB for game developers - a substantial increase over the 5GB used on Xbox One, and indeed PlayStation 4 Pro.
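Why the capacity is "essentially made for you": assuming one 8Gbit (1GB) GDDR5 device on each 32-bit channel, a 12-channel interface lands on 12GB. That device density is our assumption, not a figure Microsoft gave us. The split below simply restates the reservations described in the paragraph above, with Xbox One's system share inferred from its 8GB total and 5GB game allocation.

```python
# Assumption: one 8Gbit (1GB) GDDR5 device on each 32-bit channel.
CHANNELS = 12
GB_PER_DEVICE = 1

total_gb = CHANNELS * GB_PER_DEVICE
scorpio_system_gb = 4                      # includes the extra 1GB for the 4K dashboard
scorpio_game_gb = total_gb - scorpio_system_gb

xbox_one_total_gb, xbox_one_game_gb = 8, 5
xbox_one_system_gb = xbox_one_total_gb - xbox_one_game_gb

print(f"Scorpio: {total_gb}GB total, {scorpio_game_gb}GB games, {scorpio_system_gb}GB system")
print(f"Xbox One: {xbox_one_total_gb}GB total, {xbox_one_game_gb}GB games, {xbox_one_system_gb}GB system")
```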
"Aside from the interface you also want to make sure that the data can get from the memory interface to the internals efficiently as well," adds Baker. "We already talked about the latency for the CPU, for example, but other areas particularly when you're dealing with real-time data for video output and such, you want to make sure that those aren't starved as well - and so we went through and looked and rejigged a lot of the quality of service that happens on the chip, to get the bandwidth around effectively. Also with render targets and textures being larger we went and tweaked the page tables for the GPU as well."
"We really like this memory solution because it did solve our two primary challenges in terms of providing 4K render targets and 4K textures. We considered various different options but the 384-bit 12GB config really made a lot of sense," Goossen says.
"And we have DCC on top of that as well," adds Baker.
ESRAM, a component of the Xbox One SoC, is surplus to requirements with the move to a fully unified pool of GDDR5, but there is no doubt that having 32MB of memory within the SoC does have a latency advantage. However, the balance of the new design effectively factors that out.
"The memory system we've got, we've got enough bandwidth to more than cover what we go from ESRAM," says Baker. "We simply go and use our virtual memory system to map the 32MB of physical address that the old games thought they got into 32MB in the GDDR5. So latency is higher but in terms of aggregate performance, the improved bandwidth and improved GPU performance means we don't hit any issues."
Also within the SoC, the audio processing block returns from Xbox One - required for compatibility, but beefed up at the system level to allow for spatial surround support for formats like Dolby Atmos for Home Theater, Atmos for Headphones, plus Microsoft's own Windows Sonic for Headphones, an HRTF-based solution developed by the HoloLens team. New for Scorpio is technology derived from AMD's latest GPU-accelerated media hardware. GameDVR supports 4K at 60fps, and it also uses HEVC encoding, meaning much higher quality video at the same bit-rate. HDR support in GameDVR actually gives Scorpio an advantage over conventional external capture equipment, which - right now at least - does not support high dynamic range.
"We can do full fidelity, incredibly high bit-rate GameDVR recording of your 4K60 experience," says Andrew Goossen. "You'll be able to play back locally at full fidelity and when you upload to YouTube you can automatically transcode - you can send up the raw thing as well, but typically we'll be doing a transcode to h.264 as part of that. We're also supporting HDR and SDR GameDVR so you'll be able to enjoy the full fidelity of the HDR experience, the challenge being - as you know - getting more platforms so you can actually view these. You'll certainly be able to view them locally. People with Xbox One S will be able to view them and hopefully we'll getting the industry moving to view these HDR videos as well."
The design aims for Project Scorpio can be distilled down into two very specific goals: both 900p and 1080p Xbox One titles need to scale up to native 4K. It's actually a significant extension of what Microsoft promised last year at E3, which specifically addressed running 1080p Xbox One titles at ultra HD. The scope is wider now. Since the Xbox One versions of multiplatform games often run at 900p where their PS4 counterparts hit 1080p, targeting 900p Xbox One titles too implies that the same kind of scalability is on the cards for PS4 1080p games as well.
"We wanted [native 1080p Xbox One games] to run at full native 4K with a rock-solid frame-rate with a whole bunch of performance left over to showcase and actually improve the visual experience in many other ways beyond render resolution," Andrew Goossen tells us. "And then our other goal was that we wanted to get 900p games up to full native 4K. That's a little bit harder. Some of 900p games - day one port - they should be running fine, solid at 2160p. For other games it's going to be more work than you'll traditionally do in terms of console optimisation but we wanted to get those 900p games at 2160p."
Those are the calculations the Xbox team made in formulating the design, but developers are free to use the power of the processor as they see fit.
"Every game is different, every developer is different. The developers know best what techniques make the most impact for their games," explains Goossen. "We are perfectly happy with developers choosing a bunch of other techniques that are possible. We have hardware techniques for making checkerboarding very efficient. If developers want to go for checkerboarding, that's great. We've also heard from a bunch of our partners that they're actually finding that they prefer TAA [temporal anti-aliasing] with upscaling rather than checkerboarding. They do that, that's great. We don't impose any sort of requirements on them."
There are restrictions, however. Similar to PlayStation 4 Pro, Microsoft requires developers to run their games at the same frame-rate or better than the equivalent Xbox One version of the title. And all performance or high resolution rendering modes need to be available to all users, regardless of the screen they have attached to the console.
"They're perfectly allowed to have multiple solutions they support, but they have to ask the user which one. They have to have a good default and they have to ask the user if they want to switch to another one," says Goossen - good news of course, but our hope is that developers will instead opt for the ability to swap between modes in-game.
"In terms of the panoply of implementations we're going to see for Scorpio native games, I expect quite a range. I wouldn't be surprised to see games running at 1080p on Xbox One... they might use checkerboard and then they use the remaining GPU to really impact visual quality," Goossen continues.
"For the very small handful of titles that run at 720p today, our expectation is that they can checkerboard up to native 4K if they want to do that. I also expect variations of titles that are perhaps running at 900p at 30fps on Xbox One today that they can leverage the 31 per cent boost to CPU clock along with a bunch of other optimisations in conjunction with our D3D12 offload to potentially offer 1080p60 rather than 900p30. It's totally up to developers."
Microsoft didn't delve too deeply into specifics on the checkerboarding support that Scorpio possesses at the hardware level. However, Andrew Goossen tells us that the GPU supports extensions that allow depth and ID buffers to be efficiently rendered at full native resolution, while colour buffers can be rendered at half resolution with full pixel shader efficiency. Based on conversations last year with Mark Cerny, there is some commonality in approach here with some aspects of PlayStation 4 Pro's design, but we can expect some variation in customisations: despite both working with AMD, we're reliably informed that neither Sony nor Microsoft is at all aware of the other's designs before they are publicly unveiled.
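Microsoft didn't detail its implementation, but the general idea behind checkerboard rendering with a full-resolution ID buffer can be sketched. Each frame shades only half the pixels in an alternating checkerboard; a missing pixel is rebuilt from shaded neighbours that belong to the same object according to the ID buffer, which stops colour bleeding across object edges, and falls back to the previous frame when no such neighbour exists. This is a generic illustration with hypothetical names, not Scorpio's or PS4 Pro's method, and it ignores the motion vectors and reprojection that real implementations typically rely on.

```python
from typing import List, Optional

Image = List[List[float]]


def shaded_this_frame(x: int, y: int, frame: int) -> bool:
    """Checkerboard mask whose parity flips every frame."""
    return (x + y + frame) % 2 == 0


def reconstruct(colour: List[List[Optional[float]]], ids: List[List[int]],
                previous: Image, frame: int) -> Image:
    """Fill unshaded pixels from same-object neighbours, else from the last frame."""
    height, width = len(ids), len(ids[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if shaded_this_frame(x, y, frame):
                out[y][x] = colour[y][x]
                continue
            # The four direct neighbours have the opposite parity, so they were shaded.
            same_object = [
                colour[ny][nx]
                for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                if 0 <= nx < width and 0 <= ny < height and ids[ny][nx] == ids[y][x]
            ]
            if same_object:
                out[y][x] = sum(same_object) / len(same_object)
            else:
                out[y][x] = previous[y][x]  # no same-object neighbour: reuse history
    return out


# Two objects (IDs 1 and 2) split down the middle of a 4x4 frame.
truth = [[10.0 if x < 2 else 50.0 for x in range(4)] for y in range(4)]
ids = [[1 if x < 2 else 2 for x in range(4)] for y in range(4)]
half_shaded = [[truth[y][x] if shaded_this_frame(x, y, 0) else None for x in range(4)]
               for y in range(4)]
print(reconstruct(half_shaded, ids, previous=[[0.0] * 4 for _ in range(4)], frame=0))
```

Without the ID check, pixels along the boundary would average 10 and 50 and smear the edge; with it, this simple frame is rebuilt exactly.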
What is clear is that both companies had very different design priorities. Sony doubled down on checkerboarding support at a hardware level for addressing a 4K display because effectively there was no other choice: a 2.3x compute boost and only a modest bump to memory bandwidth over PS4 ruled out native 4K on top-tier titles. Microsoft's focus is clearly on pursuing higher native resolutions - the stops were pulled out on memory bandwidth and processing power, plus there's the focus on customising the silicon according to content.
But to what extent do those customisations elevate Scorpio beyond a PC equipped with a notional, baseline Radeon equivalent to Scorpio's GPU, with no customisation but 'the same teraflops'? After the presentation, that's exactly what I asked Microsoft in the first of a couple of follow-up rounds of questions conducted over email.
"Our performance analysis and modelling was so core to the entire design process of optimisation and adjustments that I don't have a specific example to call out," says Andrew Goossen. "We put every change we considered through the model. But in terms of 'more from your teraflops', I will point out that Scorpio has significant performance benefits relative to PC:
"Microsoft has made continual improvements to the shader compiler. We see significant performance wins for Xbox game content relative to compiling the same shaders on PC. [Secondly], 'to the metal' API and shader extension support allows developer to optimise in ways that simply can't be done on PC cards. [Finally], PIX provides low level analysis and insight that, in conjunction with 'to the metal' support, allows developers to make the most of the console GPU. These technologies are all already mature and familiar to developers, so Scorpio games will benefit from the get go."
We've already seen some evidence of the results of this approach with Forza Motorsport - the demo we saw provided stability at 4K on par with a higher-end Nvidia GPU, but this is just one shaky comparison point this early on. Once third-party titles are shown on Scorpio, we should have a better grip on how easy it is for developers to tap into the power of the processor. And rest assured, we'll be there with complete analysis when the time is right.
We learned about Project Scorpio at an exclusive briefing at Xbox HQ. Microsoft paid for travel and accommodation.