Switch overclocking: how powerful is the fully unlocked Tegra X1?
Demanding games tested with CPU and GPU pushed to the max.
The Nintendo Switch has delivered a host of handheld miracles since it first launched in March 2017. What initially looked like last-gen console power packed into a handheld has given us so much more, to the point where the system plays host to id Tech 6 ports - and even an upcoming conversion of The Witcher 3. Low-level access to Nvidia's Tegra X1 has seen some astonishing results over the last couple of years - but it's capable of more. Much more.
Whether it's down to heat dissipation or battery life concerns, the fact is that Nintendo's hybrid console runs at significantly lower clocks compared to the Tegra X1's stock specification, meaning lower theoretical performance. The firm has gradually unlocked more of the console's power for developers, but we're still a long way from a fully unleashed Tegra X1 - unless you have an older, hackable version of the hardware. This kind of modification is not recommended and may lead to your console being banned from online services, or worse still, being rendered completely inoperable. Regardless, I was eager to test the full extent of the system's potential across a range of power-hungry games.
We've previously used a homebrew tool called sys-clk to examine the frequencies of the Switch processor in order to nail down how Nintendo has gifted more power to developers over time - mostly in handheld mode. However, sys-clk has another function - overclocking the console beyond Nintendo's limits, topping out at the stock clocks Nvidia set for its original Tegra X1 design. Using Nintendo's docked configuration as a comparison point, GPU frequencies can be pushed up by an additional 20 per cent, while the CPU can enjoy an additional 75 per cent.
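The percentages above follow directly from the quoted frequencies. As a quick sanity check, here is a minimal Python sketch of the arithmetic; the dictionary and function names are my own for illustration and have nothing to do with how sys-clk itself works.

```python
# Clock figures quoted in the article, in MHz.
DOCKED = {"cpu": 1020.0, "gpu": 768.0}
TEGRA_MAX = {"cpu": 1785.0, "gpu": 921.0}


def uplift_percent(stock_mhz: float, overclocked_mhz: float) -> float:
    """Percentage increase going from the stock clock to the overclocked one."""
    return (overclocked_mhz / stock_mhz - 1.0) * 100.0


# Uplift of the Tegra X1's maximum clocks over Nintendo's docked configuration.
uplifts = {part: uplift_percent(DOCKED[part], TEGRA_MAX[part]) for part in DOCKED}

for part, pct in uplifts.items():
    print(f"{part.upper()}: +{pct:.1f} per cent over docked")
```

The GPU figure comes out fractionally under 20 per cent, which rounds to the number quoted above.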
Obviously, I was fascinated to see what the real-world impact of the overclock would be. By targeting a range of games with performance challenges and observing the improvement brought about by higher CPU and GPU frequencies, we should get a better idea of the problems faced by developers. The results are not altogether surprising, but do bring the primary bottlenecks of the Switch hardware into focus - and from my perspective, the big takeaway is that I don't think it's the graphics hardware that's holding back developers.
The first order of business was to check out Dragon Quest Builders 2, which has massive performance problems running more ambitious user-made content - to the point where we measured a minimum of seven frames per second in both docked and mobile modes. Switch's GPU power and memory bandwidth are reduced in its portable configuration, but the CPU clock remains the same at 1020MHz. Identical performance in both modes suggests that the CPU is the limitation - and that turns out to be the case in DQB2. Overclocking the GPU to its limits does nothing, while running the CPU at 1785MHz delivers performance improvements of up to 40 per cent. That doesn't translate into anything like a smooth frame-rate when your base performance is so low, but what it establishes is that increased CPU frequencies can make a substantial difference to game performance.
The more games we tested, the more the CPU bottleneck was exposed. Mortal Kombat 11 is, by and large, an excellent port - but it does have performance issues in some areas. Overclocking the GPU on its own yielded poor results, while a CPU OC delivered far greater improvements to frame-rate. Combining the two - unsurprisingly - nigh on locks us to the target 60fps with just minor drops. The key takeaway is that the extra graphics power only comes into play once the CPU limitation is addressed.
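The reasoning behind these tests is simple enough to write down: overclock one component at a time and see which change actually moves the frame-rate. The Python sketch below is only a rough illustration of that logic. The function name, the 5 per cent threshold and the sample frame-rates are all hypothetical and are not measurements from the testing.

```python
def likely_bottleneck(stock_fps, gpu_oc_fps, cpu_oc_fps, threshold=0.05):
    """Guess the limiting component from three frame-rate measurements.

    threshold is the minimum relative gain treated as meaningful
    (an arbitrary 5 per cent here, not a figure from the article).
    """
    gpu_gain = gpu_oc_fps / stock_fps - 1.0
    cpu_gain = cpu_oc_fps / stock_fps - 1.0
    if cpu_gain < threshold and gpu_gain < threshold:
        # Neither overclock helps, so something else is the limit.
        return "neither (possibly memory bandwidth)"
    return "cpu" if cpu_gain >= gpu_gain else "gpu"


# Illustrative numbers only: the GPU overclock does nothing, the CPU overclock helps.
print(likely_bottleneck(stock_fps=10.0, gpu_oc_fps=10.0, cpu_oc_fps=14.0))
```

A heuristic like this can't capture the Mortal Kombat 11 case, where GPU gains only appear once the CPU limit is lifted. Covering that would need a fourth measurement with both overclocks applied.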
The same situation is found in Wolfenstein Youngblood, another frankly astonishing id Tech 6 Switch conversion from Austin-based developer, Panic Button. Performance in this title is a fairly smooth 30fps, but it can drop down to the mid-20s in firefights. The frame-rate drops are mostly addressed by pushing up CPU clocks, and once done, extra GPU power does little - except to increase dynamic resolution. It's not a game-changer, but it is a noticeable improvement - though it's the increased CPU performance that makes the biggest difference here. Going back to Panic Button's original Doom 2016 conversion, there are similar gains - and in all cases, frame-pacing issues at 30fps found in these conversions are also less noticeable.
Clock | Tegra Max Clocks | Switch Docked | Portable #1 | Portable #2 | Portable #3 | Loading 'Boost' Mode |
---|---|---|---|---|---|---|
CPU Clock | 1785MHz | 1020MHz | 1020MHz | 1020MHz | 1020MHz | 1785MHz |
GPU Clock | 921MHz | 768MHz | 307.2MHz | 384MHz | 460MHz | Title/Mode Dependent |
EMC Clock | 1600MHz | 1600MHz | 1331MHz | 1331MHz | 1331MHz | Title/Mode Dependent |
The takeaway from most of the overclocking testing is that despite Switch's relatively meagre GPU power, developers are managing to scale their projects graphically to the capabilities of the hardware - and that makes sense. With so many platforms out there, and so many different PCs to cater for, games are built to scale their GPU requirements, whether it's from resolution scaling or reducing the quality of specific features. That developers have been able to do so to the point where Wolfenstein, Hellblade, Mortal Kombat 11 or The Witcher 3 become possible on a mobile chipset is still a huge achievement. However, games have less scalability built into their design on the CPU side, and this seems to present more of a challenge.
There is another daunting limitation within the Switch - memory bandwidth. While docked, the memory controller runs at 1600MHz - and that's the hard limit of the Tegra X1, so it can't be overclocked further, and this can present issues. Whether overclocking the CPU alone or the CPU and GPU together, Saints Row The Third shows little palpable improvement. Only by using handheld mode (which swaps out 1080p rendering for 720p) is the bandwidth requirement lessened to the point where decent performance gains kick in. In effectively dropping from 1080p to 720p, an extra 20fps is commonplace while in some scenarios, there's even a better than 2x improvement clock for clock.
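To put the resolution drop in perspective, the short Python sketch below compares pixel counts. It assumes the standard 1920x1080 and 1280x720 frame sizes. Treating memory traffic as roughly proportional to pixel count is my simplification, not something measured here.

```python
def pixels(width: int, height: int) -> int:
    """Total pixels in a frame of the given dimensions."""
    return width * height


# Assumed standard frame sizes for 1080p and 720p.
docked = pixels(1920, 1080)
handheld = pixels(1280, 720)

# How many times more pixels the docked target has to fill per frame.
ratio = docked / handheld
print(f"1080p has {ratio:.2f}x the pixels of 720p")
```

With well over twice as many pixels to fill at 1080p, it is at least plausible that a bandwidth-bound game could more than double its frame-rate at 720p.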
Memory bandwidth also seems to be the main challenge facing the Korok Forest in Zelda: Breath of the Wild. Incremental gains are found by overclocking the CPU, with a more profound shift when both CPU and GPU are pushed to their limits - but a perfect 30fps still proves elusive. It's another scenario where I suspect memory bandwidth is the limiting factor, one that may be eased by using the lower resolution mobile mode.
Switch overclocking: the drawbacks
There's no such thing as a free lunch when it comes to extracting extra performance from silicon and obviously there's a reason why Nintendo didn't bring Switch to market with the full power of the Tegra X1 unlocked. For starters, the more performance you have, the less battery life you can expect - and even on stock clocks, a game like Fast RMX can drain the battery in just 2.5 hours. Secondly, the Switch is a handheld and while its processor is actively cooled, with a design intended to work efficiently in both handheld and docked modes, the cooler is hardly the most powerful solution on the market.
Comparing stock docked clocks to a full OC running Wolfenstein Youngblood, power consumption on Switch could rise by a maximum of 25 per cent from around 15W to 20W at worst - a sample comparison image from the video is presented below. Meanwhile, temperatures increase too: using the processor's temperature sensor data, Wolfenstein peaked at about 60 degrees Celsius on stock clocks, rising to 64 degrees when the CPU was pushed to 1785MHz. Pushing the GPU to 921MHz from its stock 768MHz (in addition to the CPU OC) saw temperatures increase to a maximum of 67 degrees. The Tegra X1 throttles at around 83 degrees, so this may not sound like too much of a problem, and performance was rock solid for me - but it's worth remembering that these temperatures are significantly higher than stock despite the much more active fan.
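For a sense of how much thermal margin remains, the Python sketch below subtracts each peak temperature quoted above from the approximate throttle point. The variable names are illustrative only, and the figures are the rounded values from this test rather than anything more precise.

```python
# Approximate throttle point for the Tegra X1, as quoted above (degrees Celsius).
THROTTLE_C = 83

# Peak temperatures recorded in Wolfenstein Youngblood for each configuration.
peaks_c = {"stock": 60, "cpu_oc": 64, "cpu_gpu_oc": 67}

# Remaining margin before throttling for each configuration.
headroom = {name: THROTTLE_C - temp for name, temp in peaks_c.items()}

for name, margin in headroom.items():
    print(f"{name}: {margin} degrees below the throttle point")
```

Even the full overclock leaves a double-digit margin in this one game, though a hotter room or a more demanding title could narrow it.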
However, these increases are with both CPU and GPU pushed to their limits - and the option remains for Nintendo to increase stock clocks incrementally, as we've already seen in titles like Super Mario Odyssey, Zelda: Breath of the Wild and Mortal Kombat 11 in handheld mode. The emphasis to date has been on increasing GPU power, but my results suggest that the experience could be improved significantly with a CPU upclock - even moving to the next step up (1220MHz) could deliver significantly smoother gameplay. I tried Wolfenstein Youngblood at this CPU speed and while the results weren't as impressive as with a higher clock, it was still a noticeably smoother experience than stock with far fewer deviations from 30fps. In common with many overclocks, the chances are that a law of diminishing returns kicks in the harder you push frequencies.
Switch: an evolving platform?
When developer documentation first leaked, Switch's Tegra specs in mobile mode were limited to say the least, with developers expected to ship games with just a 307.2MHz GPU clock. Before launch, this was increased to 384MHz, while select titles have seen a further GPU uptick to 460MHz. Meanwhile, the loading time 'boost mode' momentarily increases CPU clocks to the full 1785MHz to facilitate faster decompression and by extension, shorter loading.
The suggestion seems to be that Nintendo has set the baseline low - likely for battery life reasons - but is willing to experiment in unlocking more of Tegra X1's potential if it doesn't impact the user experience. It'll be interesting to analyse power consumption of the processor's various frequency bands to see what the likely impact on battery life and thermals would be, and what options the platform holder may choose to work with in future. There is the sense that Nintendo is more than willing to experiment with the capabilities of its own hardware.
Overclocking the Switch has been a very useful exercise - it's allowed us to look under the hood of a contemporary console platform in a way that simply hasn't been possible in the past. We've seen the limits developers have to work with and what bottlenecks remain, even with overclocking in place. In truth, I could have spent a lot more time with this; I've not touched handheld tests yet, short of running Saints Row The Third with full clocks enabled. It's something I plan to find time for and at some point I also hope to tap into the SoC's power consumption in mobile mode - just to quantify what level of efficiency Nvidia and Nintendo attain in bringing games like Doom 2016 and The Witcher 3 to Switch in a portable form factor.
In the meantime, a new Switch model revision has just arrived using an even more power-efficient processor - we may not have the same level of access to the internals as we do with the existing model, but the new data we have from this deep dive into the Switch's capabilities will prove useful in assessing the changes and improvements delivered by the new chip. Our unit was imported from Hong Kong, and we'll be publishing a review later this week.