Beyond live TV - what the Xbox One user interface means for gamers
Digital Foundry on the next-gen dash, voice control... and Kinect.
"Xbox on."
We aren't in the habit of talking to our consoles. It feels weird - if not plain silly. We don't like the way that voice commands take longer to register than a good old-fashioned button press and we're unsure if it's actually going to work even if we do pluck up the courage to vocally command our consoles. And what's the point of using voice to turn on our games machine if we're reaching for our controller anyway? And how useful is voice control if we still need remotes for our TV and sound system?
What was a novelty on the original Kinect - if a technically impressive one - now forms a fundamental basis of interaction with Xbox One, and from what we saw at Gamescom, parallels with Xbox 360 hold little water. Voice control on Xbox One actually works and it seems to be genuinely useful. It should be faster than using the joypad for certain functions, but ideally you'd use both in concert. And as for Xbox's isolation from the rest of the components in your system, that is now clearly a thing of the past.
"It'll work with any TV, any amplifiers. It'll work with any AV equipment at home," says Xbox director of product planning Albert Penello. "When we say 'Xbox On' we can actually light up your whole entire system and control everything just with voice."
With full system integration, voice control comes into its own. Penello likens Xbox One's incorporation with your equipment to the Harmony universal remote, except that your voice is the key. You can walk into your lounge and your voice powers up everything: you won't need to reach for your remote or Xbox controller, you won't need to use multiple doobries to access each part of your system. Two words and you're up and running. Xbox One is always on. Even in its low-power sleep state, it should fully reactivate before your HDTV gets around to displaying its image. From a dashboard perspective at least, the days of waiting around for your console to boot should now be a thing of the past.
For this technology to work effectively, three things need to happen. First of all, Kinect's voice recognition tech has to just work - no mean feat considering the number of languages supported, and the range of accents it has to process accurately. Secondly, Xbox One needs to know exactly what equipment is in the room in order to speak each device's own infra-red based language. And finally, and perhaps most crucially, the IR signals that emanate from the console always need to register with the target hardware.
"Voice control on Xbox One actually works and it seems to be genuinely useful. It should be faster than using the joypad for certain functions, but ideally you would use both in concert."
At the press booth, Albert Penello faces some challenges. The amount of ambient noise in the background is immense - the Gamescom din is hardly comparable with living room conditions and he needs to address Kinect loudly and clearly for the message to get through.
"It's noisy in here and unnaturally loud so I have to kind of yell at Kinect," he laments.
Integration with your home system comes via a simple set-up procedure, but Penello's instructions are clearly making their way through to the TV he has set up in the presentation room. And it's all happening without any kind of traditional IR blaster visible in the room.
"Xbox mute. Xbox unmute. Xbox volume up. Xbox volume down."
"That's not going through wires," says Xbox group manager of corporate PR, David Dennis. "That's Kinect blasting the infra-red codes to the TV, the TV picks it up and that's the TV UI changing."
The audio demo shows the strengths and the weaknesses of voice control in working with your AV hardware. Muting and unmuting are on/off functions clearly suited to the tech. Volume adjustments operate on a scale and really require continuous button presses or better still, turning a knob - voice control can't really achieve the same effect without monotonous repetition. Turning volume down by an appreciable amount would take a long, long time.
But what is impressive is that every voice command registered by Kinect is flawlessly transmitted to the relevant kit in the room. It works so well because Kinect itself is the IR blaster. Microsoft has augmented the new Kinect sensor with an IR transmitter in order to see the environment even in pitch black conditions. The upshot of this is that the technology works by drenching the entire room in infra-red light. Forget little LEDs attached to wires you dangle in front of your set-top box; if that's an IR blaster, the Kinect solution is effectively the equivalent of going nuclear. Debug Kinect tools allow you to see what the IR sensor sees - complete blanket infra-red coverage of the whole room. It's hard to imagine a scenario where this form of IR blasting wouldn't work.
The core interface and Kinect personalisation
Penello is now demonstrating the core user interface, and he's keen to point out the authenticity of the presentation, warts and all.
"We are running on real kits, real hardware - these are near-final boxes. There's no PC hidden underneath anywhere, there are no wires going back behind. Everything I'm actually showing is running on this box right here," he says.
"Forget little LEDs attached to wires dangled in front of your AV equipment; if that's an IR blaster, Kinect's infra-red solution is the equivalent of going nuclear."
"The software is still early. It's not demo software, this is not software we put together just for the show. It's actually real take-home beta software, it's what our developers are using and what our internal testers are testing against. That's the good news, the bad news is it's still development software which means we have run into at least one hiccup in every demo - it's real code."
From what we can see, the user interface is looking slick and close to final - very different to the semi-broken UI we saw in the Wired coverage that ran simultaneously with the original Xbox One reveal. The only debug data we see comes from voice control - processed instructions are displayed in the top-right corner along with a number from zero to one. Penello's commands all get a 0.96 to 0.97 "rating" and we're subsequently told that this is the internal time (in seconds) taken to process the voice command.
"We have a lot of history with Xbox 360 in revising our dash. With Xbox One, we've tried to go with a more elegant and simplified user interface. So this is my home screen, the big tile there shows me the latest game application I'm running. Over to the right is our store, where you'll simply browse for games, movies, music, applications," Penello begins, before the screen suddenly changes with new content arriving in the tiles.
"It just signed me in. Kinect can actually see me and it knows that I'm talking and engaging with the system and so essentially it has downloaded my profile from Xbox Live and immediately signed me in. And you'll notice everything's changed. It's filled in the most recent applications I was using."
The recognition data and profile data are two separate things. As on Xbox 360, there is a calibration sequence in which Kinect on Xbox One learns about you from your skeleton and face. This is held locally. However, the customised UI elements that populate the screen are stored in the cloud, pulled across via Xbox Live. You can even set up "pins" - or favourites, if you will - that you attach to the home screen for swifter navigation.
"Once you've signed into the console it will recognise you and it'll customise [the dash]. If you think about 360, and moving between different Xboxes you have to have a memory unit or a USB thumb drive with all your save games and everything," Penello says. "All of that goes away with Xbox One. All you need to know is your Xbox Live ID and all of your save games and settings follow you wherever you go."
"The dash can log in up to six users tracked by Kinect, with voice commands bringing up profile-specific data like friends lists, Achievements and social feeds."
And all of this works with a number of users simultaneously logged into the system. Kinect itself is capable of independently tracking six different people, and this is reflected in the dash functionality.
How Kinect handles voice commands from multiple users
"The other thing I'm sure you've gone through is the idea of having more than one user on Xbox 360, which is actually pretty complicated. Maybe you guys have experienced roommates or significant others who might want to play a game that screwed up your Achievements, took your Gamerscore," Penello says. "It's very difficult to switch between users. On Xbox One we can actually have up to six people logged into the system at the same time and using Kinect we can recognise you, we can know you're talking and we can deliver a customised experience for that person."
The demo doesn't quite go to plan - David Dennis is addressing Kinect while Albert Penello is talking to the assembled press, so the sensor isn't quite sure where the commands are coming from, and which data to process. Accessing the people tab doesn't work, so the men try something else.
"Xbox go to Achievements," commands Penello.
"Xbox go to Achievements," repeats Dennis.
Both men are logged into the Xbox One dash, and both are able to access their individual data - be it Achievements, friends lists, whatever. Either of them could run Forza Motorsport 5 or any other game via a voice command and the system would ready their individual save state when the title loads. With digital delivery, all games in the library are accessible via voice as well as the standard controller. With the reversal to disc-based DRM, the system isn't as streamlined for physical purchases - you'll still need to put your disc in the drive.
But what really impresses us in the presentation is the ability for Kinect to distinguish between the two men and bring up personalised content. How does it work? We know that Kinect recognises voice commands, but surely it can't distinguish individual voices. It turns out that other sensor data is used to make that determination.
"It knows our skeletons, it knows our faces. Kinect has an array microphone. It can see my face, it can see me talking, it can hear me and because it sees my skeleton, it can isolate that it's me," Penello explains.
"If you went into a development screen it would show you two skeletons standing here. It would show Albert talking. It would show me standing here but not talking," adds Dennis.
"There's another little trick in there. There's an LED emitter in [the joypad] that shows who's controlling which controller and that David is holding controller two and I'm holding controller one. Between all four of those things - who's talking, where they're standing, where the sound is coming from and the controller - it can snap to the right person."
"IR emitters in the new Xbox One controllers working in combination with skeletal tracking and input from the multi-array mic allow the OS to recognise individual voice commands and tailor the dash accordingly."
"Xbox go to Marble Maze."
Penello is demonstrating how you can load a game simply by asking for it. Again, this strikes us as a means by which you can do things a lot faster with Xbox One. Conceivably, with voice control, we could have turned on our entire AV system, and have our game of choice loading before we've even sat down on the couch.
"Marble Maze looks like a triple-A game [to the system]. So even though it's a silly demo game we've run, imagine it's something awesome like Ryse or Dead Rising. This is just something the dev team built to test some of the features," explains Penello.
"One of the things about Achievements for instance is that they show people that you've completed a task but they don't show you how you completed the task. The Xbox One is always storing the last five minutes of any gameplay at any given time, so imagine you're in a fighting game, a shooter or something and pulled off an amazing trick and just had to tell people about it... all you would have to say is 'Xbox record that' and what it's actually going to do in the background is take the last 30 seconds of buffer and compress it into a video I can see and share with my friends."
Snapped apps and video upload
And he does just that, in the process giving us our first look at the quality of the recorded video as it plays back on-screen. The Marble Maze demo itself runs at 60fps, and it's clear to see that the playback is a locked 30fps, and there is a hit to quality. But bearing in mind the quality of the available delivery platforms, it's still clearly a cut above YouTube. It'll be interesting to see whether that video undergoes another generation of encoding server-side or whether the original Xbox One stream is used.
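The rolling capture behind "Xbox record that" can be sketched as a fixed-size buffer that silently drops its oldest frames. The five-minute window and 30-second clip come from Penello's description; the class and method names, and the 30fps capture rate, are assumptions for this example, and a real system would store compressed video rather than raw frames.

```python
from collections import deque

FPS = 30                  # assumed capture rate
BUFFER_SECONDS = 5 * 60   # "the last five minutes of any gameplay"
CLIP_SECONDS = 30         # "the last 30 seconds of buffer"


class GameplayBuffer:
    """Hypothetical rolling buffer of the most recent gameplay frames."""

    def __init__(self, fps=FPS, buffer_seconds=BUFFER_SECONDS):
        self.fps = fps
        # A deque with maxlen discards the oldest frame once it is full.
        self.frames = deque(maxlen=fps * buffer_seconds)

    def push(self, frame):
        self.frames.append(frame)

    def record_that(self, clip_seconds=CLIP_SECONDS):
        """Return the most recent clip_seconds of frames as a list."""
        wanted = self.fps * clip_seconds
        return list(self.frames)[-wanted:]


# Small numbers so the behaviour is easy to follow: 2fps, 10-second window.
buf = GameplayBuffer(fps=2, buffer_seconds=10)
for frame_number in range(50):
    buf.push(frame_number)
clip = buf.record_that(clip_seconds=3)
```

The point of the design is that recording is always running, so the command only has to copy out the tail of the buffer after the impressive moment has already happened.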
"Notice I'm still in the game. I'm going to play the clip, it'll take a second to load. I can then take the video into our editor, trim it down, save it and then upload it. And then go back to the game," Penello says.
"Xbox go to Marble Maze - and you'll see I'm instantly back in the game right where I was. Of course that's one way to record video. I could do it manually... Xbox snap upload. And now the upload app is snapped to the side of the game. I'm still playing and I can actually go and set-up a manual record session recording up to five minutes of gameplay, editing later."
"When you upload it, it means you can share it with friends, it shows up in feeds and all that," adds Dennis.
With the upload app operating side-by-side with Marble Maze, we notice that the gameplay screen is squeezed horizontally and shunted to the left to accommodate the arrival of the new app. We're told that game developers have the option to choose what happens to the presentation - whether to squeeze the screen horizontally like this or to choose another option.
"Snapping in apps alongside gameplay, media or other apps works exactly as advertised with no adverse performance impact to speak of."
"Xbox unsnap."
Next up, switching between games. Microsoft says that loading up different apps is instant and from what we've seen it seems to occur with no impact to game performance. It's pretty impressive stuff. But swapping between games is different - the majority of the system RAM is used for gaming and there's no way to easily cache off 5GB of data, so here we do see loading times.
"Xbox go to Reflex."
Interestingly, there are a few seconds before the Xbox One processes the command. Penello explains that it's enough time to cancel the command, should a troublesome sibling try to chuck you off your game by using voice to load up a new one. It's probably the best solution available for the problem, but it does conjure up untold scenarios where voice control could be used for mischief. However, Reflex itself exposes more Kinect secrets that genuinely impress us.
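A grace period like this is simple to model: the command is held until a deadline and only runs if nobody has cancelled it first. The sketch below is our own, with a hypothetical `PendingCommand` class and an assumed three-second delay, since we were only told the window lasts "a few seconds".

```python
class PendingCommand:
    """Hypothetical voice command that waits out a cancel window before running."""

    def __init__(self, name, issued_at, delay=3.0):
        self.name = name
        self.deadline = issued_at + delay  # the delay value is an assumption
        self.cancelled = False

    def cancel(self, now):
        """Cancel if still inside the window; return whether it is cancelled."""
        if now < self.deadline:
            self.cancelled = True
        return self.cancelled

    def should_run(self, now):
        """True once the window has passed without a cancel."""
        return not self.cancelled and now >= self.deadline


# Times are plain seconds passed in by the caller, to keep the sketch deterministic.
switch = PendingCommand("go to Reflex", issued_at=0.0)
too_early = switch.should_run(now=1.0)
was_cancelled = switch.cancel(now=2.0)
runs_later = switch.should_run(now=5.0)
```

Once the window has closed, a late cancel has no effect, which is the trade-off: the longer the delay, the safer you are from mischief, but the slower every legitimate switch becomes.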
The Reflex demo: Kinect and the first-person shooter
"Reflex is something we did to test how you could use the new Kinect sensor in a first-person shooter, so again this is not a real game, it's just a tech demo, but we wanted to test that we know when people are playing games, they're moving around, they're interacting - how do we capture that motion? And also how fast and how precise the new sensor is," Penello explains.
"So this is just a generic first-person shooter we've knocked up for testing purposes. You might notice that I'm getting shot at from things I can't see and want to put on my x-ray vision. If I just tap my head I can now see all the hidden characters in the game and you'll notice that that tap is as fast as a button press. The other thing is that - and it's not tuned super-great right now in this demo - but I can actually select things with my hands (which is not the part I want to demonstrate now) but also use my voice... Fire missiles... you can imagine that I can have yet another button with my voice to do certain actions."
We've spent more time than we care to mention flapping our hands around in front of high-speed cameras in order to measure end-to-end Kinect latency. While we'd take any comment that Kinect is as fast as a button press with significantly more than a pinch of salt, we can't deny that it looks quick. Really quick. In fact, much faster than the Kinect Sports Rivals demo we'd played on the show floor prior to arriving at this presentation.
"The Reflex FPS demo is pretty impressive stuff - we're seeing response that's seemingly faster than any previous Kinect experience we've had."
"The last thing is the ability to lean and take my kinetic energy. Obviously I'm standing up right now but this demo would work just fine if I were sitting and it actually knows that I'm putting my controller up as a shield, or that I want to lean. And all those things can happen simultaneously - and basically, instantly," Penello continues.
Firing missiles brings up a new Xbox system prompt - the "magic moment". Developers can place these in the game and auto-save gameplay video that they deem to be important and shareable.
"If there's a massive boss and the developer knows that the player will have to do something super-cool to finish, the game can set a flag to automatically record that moment," explains Penello. "And that's what happened there - using my voice to fire missiles is flagged as a magic moment and that causes the magic moment to be recorded."
Microsoft has radically overhauled the friends system from the 100-slot system used on Xbox 360. You have an expanded friends list, favourites, plus the ability to follow and be followed - Twitter-style (with the requisite blocking functionality, should you need it). Sharing of video is a key part of this, suggesting a massive bandwidth requirement system-side, which may explain why these features are for Live Gold subscribers only.
Snapped apps: capabilities and limitations
But our focus is technology, and we're impressed with the app functions and the integration of Snap. The question is, how does it function and what are its limits?
"We've talked about the Xbox OS where we have three different operating systems running at the same time. We have a very small operating system layer that's just the code, we have what we call the application layer so that the applications in the OS run independently of the games," Penello says.
"That's why you can do Snap and that's why I can switch independently between apps without affecting the game performance. The game gets the vast majority of resources for the console. The OS and applications run in a totally different part and they don't affect each other. Two games can't run simultaneously because they take up the bulk of the resources. So when I start a new game I have to unload the old game and load the new game, but I can still have up to four applications running in the background. That was instantly switching, as you were seeing."
The dash does seem to be resident in RAM even during gameplay, so could we snap in the UI during gameplay and adjust system settings?
"The more advanced dash in combination with four apps resident in memory at any one time begins to explain why 3GB of Xbox One's RAM is reserved for the system."
"It's snapping the apps that were in the system. You can't snap the UI," David Dennis answers. "Imagine the Madden game and Madden app running simultaneously. Or think of Halo Waypoint as it is right now. Like you go out to Halo Waypoint and it's aggregating everything you do. Think of a Halo Waypoint app snapped as you're actually playing the game. It's updating the feeds. That's an app and a game operating at the same time and talking to each other."
"You could have YouTube snapped to a game, you could Netflix snapped to a game, you could have YouTube snapped to Netflix," adds Penello, pointing out that apps can be snapped to both games - and other apps.
While snapping in the UI itself is off the table, it seems that tasks we might use the dash for can be snapped in as individual apps.
"You can 'Snap party' and that'll show you everything people are playing and then you can join in games," Penello says.
Following a swift Q+A with the audience, the show's over, but we hang about afterwards to introduce ourselves and ask a few more questions about the Xbox One architecture. It's difficult to avoid the sense that we're talking with really enthusiastic people who have been a little hamstrung in the ways that they can talk about their product - whether it's the Xbox One silicon or the user interface. We get the impression that this is about to change, significantly.
It's also interesting to note that live TV only got a single, solitary mention in the presentation (when one of the press noticed a live TV Achievement in Albert Penello's profile - which he was at pains to point out carries no Gamerscore). This was a presentation tailored to a gaming audience, designed to highlight how useful the Kinect integration is even if you're not into full-body motion gaming, while at the same time exploring how the dash has evolved to become more dynamic, media-rich and - yes - more social.
While it's difficult to imagine Kinect voice control taking over exclusively from the joypad, the notion of using it in combination with the controller now makes a lot more sense. Regardless of how simplified and elegant the Xbox One dash is, there's going to be a lot of real estate to cover, and the ability to skip that with voice can't be sniffed at. Similarly, the idea of dealing with multiple logged-in players using the Kinect functionality seems really cool in theory - it's a little buggy in its current state, but this is beta software and we can expect that to be resolved.
Perhaps the biggest takeaway from the briefing is the transformation in the general perception of the user interface and its relevance to gamers. The Xbox One reveal concentrated too much on live TV to the point where core gamers felt genuinely excluded. At the end of this presentation we came away impressed, wanting to see more. Penello sums it up succinctly, using language uncannily similar to what we've been hearing from other sources close to Microsoft ever since the initial Xbox One debut:
"It's impossible to talk about it. When you see it, you understand."