Mass Effect Andromeda and the quest for great facial animation
Nexus generation.
It could totally have been them. Talk to game developers about the clips of Mass Effect Andromeda's dialogue scenes that have been circulating for the past couple of weeks and you'll get a thousand-yard stare. The kind you see from a soldier who's just been left unscathed by a shell that hit a platoon-mate right beside them. Many of Andromeda's dramatic issues are common to narrative-heavy games, and all developers know they could equally have fallen victim to the indefensible attacks that have been directed at BioWare and EA's staff.
Witness Naughty Dog's Jonathan Cooper, who worked on the first Mass Effect and took to Twitter to explain the challenges animators face, emphasising the pressure that modern expectations and cost/benefit tradeoffs placed on Andromeda.
Facial animation and dialogue are hard. They involve the talents, experience and technology of a raft of different specialisms: graphics engineers making shaders for skin and eyes, lighting artists, scriptwriters, 3D modellers, voice actors and, of course, animators. And they operate in the subjective and shifting world of drama and emotion, and against an ever-rising bar of expectation.
Yesterday's state of the art is today's garbage, and game characters are currently at a difficult stage in their journey through the uncanny valley. Faces can often look incredibly lifelike (I've repeatedly been asked if I'm watching a drama during Horizon: Zero Dawn's dialogue scenes), but as Andromeda proves, a sudden gurn and an awkward pause can undo the effect of hours of careful craft in a few gif-able seconds.
Andromeda scrabbles on the scree that lies on the slopes leading out of the uncanny valley. It's far more ambitious than previous Mass Effect games, with more lines of dialogue than Mass Effect 2 and 3 combined (for comparison, Mass Effect 3 has 40,000 lines). And Andromeda features 1,200 speaking characters, which is double the number in Mass Effect 3.
So that's a lot of facial animation. And, don't forget, these lines are delivered in five languages: English, French, German, Italian and Spanish. All in all, it represents an extraordinary amount of work, more than BioWare has ever tackled before in a Mass Effect.
The work begins with sculpting character models and rigging them. Rigging is the process of making an articulated skeleton for a 3D character, providing the axes of motion for each joint, allowing a leg to be a leg and a finger a finger. As an idea of how much work it is to rig a lead character in a game, it took Crystal Dynamics a full month to rig Lara Croft for 2013's Tomb Raider.
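If you want a feel for what a rig actually is under the hood, here's a rough sketch in Python: a hierarchy of named joints, each with an offset from its parent and its own permitted axis of rotation, where rotating one joint carries everything below it along. The joint names and numbers are invented purely for illustration; a production rig built in a tool like Maya is vastly more involved.

```python
import numpy as np

class Joint:
    """One bone in an articulated skeleton: a name, a parent,
    a fixed offset from that parent, and a rotation of its own."""
    def __init__(self, name, parent=None, offset=(0.0, 0.0, 0.0)):
        self.name = name
        self.parent = parent
        self.offset = np.array(offset)   # bind-pose position relative to the parent
        self.rotation = np.eye(3)        # current local rotation (identity = bind pose)

    def set_rotation_z(self, degrees):
        """Rotate about the local z axis, the kind of constrained
        axis of motion a rigger gives a knee or a knuckle."""
        a = np.radians(degrees)
        self.rotation = np.array([[np.cos(a), -np.sin(a), 0.0],
                                  [np.sin(a),  np.cos(a), 0.0],
                                  [0.0,        0.0,       1.0]])

    def world_rotation(self):
        if self.parent is None:
            return self.rotation
        return self.parent.world_rotation() @ self.rotation

    def world_position(self):
        """Walk up the parent chain, accumulating rotations and offsets."""
        if self.parent is None:
            return self.offset
        return self.parent.world_position() + self.parent.world_rotation() @ self.offset

# A three-joint 'leg': bend the knee and the ankle follows.
hip = Joint("hip")
knee = Joint("knee", parent=hip, offset=(0.0, -0.45, 0.0))
ankle = Joint("ankle", parent=knee, offset=(0.0, -0.45, 0.0))

knee.set_rotation_z(30.0)       # animating one joint...
print(ankle.world_position())   # ...moves everything below it
```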
But bodies are one thing and faces are another. Nuanced facial expressions are the result of many 'bones' moving the mesh of the face around. Five years ago a face would typically feature 30 to 50 bones, which limited how realistic it could look. Today, game engines can handle far more: a face like Nathan Drake's in Uncharted 4 is made up of 300 to 500 bones and can feature between 200 and 400 blend shapes, which model smooth transitions between facial expressions.
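Blend shapes are easier to picture with a toy example. Each shape stores how far every vertex moves away from the neutral face, and the final face is the neutral mesh plus a weighted sum of those deltas, which is where the smooth in-between expressions come from. The shapes and numbers below are made up purely for illustration; a real face mesh has thousands of vertices.

```python
import numpy as np

# A toy 'face' of five vertices in 3D.
neutral = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [2.0, 0.0, 0.0],
                    [0.5, 1.0, 0.0],
                    [1.5, 1.0, 0.0]])

# Each blend shape is stored as per-vertex deltas from the neutral pose.
blend_shapes = {
    "smile":      np.array([[0.0, 0.1, 0.0], [0.0, 0.0, 0.0], [0.0, 0.1, 0.0],
                            [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
    "brow_raise": np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                            [0.0, 0.15, 0.0], [0.0, 0.15, 0.0]]),
}

def evaluate_face(weights):
    """Final face = neutral mesh + weighted sum of blend-shape deltas.
    Weights between 0 and 1 give the smooth in-between expressions."""
    face = neutral.copy()
    for name, w in weights.items():
        face += w * blend_shapes[name]
    return face

# Halfway into a smile with the brows fully raised.
print(evaluate_face({"smile": 0.5, "brow_raise": 1.0}))
```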
These rigs are then given life with animation data, and for many cinematic scenes today that comes from motion capture. As David Barton, producer at motion capture specialist Cubic Motion, says, scan-based rigs are becoming more common. These are built from detailed scans of an actor's face across 70 to 80 different expressions.
Cubic Motion, which provided motion capture for Horizon: Zero Dawn's cinematics, produces algorithms called solvers, which take these scans and use them to interpret highly accurate and nuanced facial animation from motion capture data. "So you might take a smile, and break that down into ten different controls on the rig, and our tech then takes the actor's performance and does a solve for each of those controls to create the animation."
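Cubic Motion hasn't published how its solvers work, but the basic shape of the problem can be sketched: find the rig control values whose predicted pose best matches the points tracked from the actor's face. Here's a toy linear version solved with least squares, with everything from the number of controls to the noise level invented for illustration; the real thing copes with non-linear rigs and far messier data.

```python
import numpy as np

# Toy setup: 4 tracked points on the face (x, y), flattened to a vector of 8.
# Each rig control (say, 'smile_left' or 'jaw_open') moves those points by a
# known amount when fully on: the columns of the basis matrix.
rng = np.random.default_rng(0)
num_points, num_controls = 4, 3
basis = rng.normal(size=(num_points * 2, num_controls))   # effect of each control
neutral = rng.normal(size=num_points * 2)                 # resting positions

# Pretend the actor hit this expression: 70% smile, 20% jaw open.
true_weights = np.array([0.7, 0.2, 0.0])
captured = neutral + basis @ true_weights + rng.normal(scale=0.01, size=num_points * 2)

# The 'solve': which control values best explain the captured points?
solved, *_ = np.linalg.lstsq(basis, captured - neutral, rcond=None)
solved = np.clip(solved, 0.0, 1.0)   # rig controls usually live in [0, 1]

print("true:  ", true_weights)
print("solved:", solved.round(3))
```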
If you're going to go even higher spec, you can animate wrinkle maps, which simulate the way skin crinkles, and diffuse maps, which simulate the movement of blood beneath the skin as it stretches and bunches on the face. "That's what we're always pushing for," says Barton.
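The principle is the same one driving the rest of the face: the expression weights that move the mesh can also drive how strongly a wrinkle or diffuse map blends in. A tiny, made-up sketch of that blend, with toy four-by-four grids standing in for texture maps:

```python
import numpy as np

# Tiny stand-ins for texture maps: 4x4 greyscale intensity grids.
flat_skin      = np.zeros((4, 4))
forehead_folds = np.array([[0.0, 0.8, 0.8, 0.0]] * 4)   # 'wrinkle' strength

def wrinkle_blend(brow_raise):
    """Blend the wrinkle map in as the brow-raise expression weight rises,
    so creases appear only when the face actually folds."""
    return (1.0 - brow_raise) * flat_skin + brow_raise * forehead_folds

print(wrinkle_blend(0.0))   # relaxed brow: no wrinkles
print(wrinkle_blend(0.9))   # raised brow: folds fade in
```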
The scale of Andromeda, from the number of characters and amount of dialogue to the fact that your Ryder is completely customisable, means much of this just isn't possible. For games like it, as well as The Witcher 3 and Horizon: Zero Dawn, automation is the answer.
Andromeda uses technology from OC3 Entertainment, which makes FaceFX, middleware that uses audio files to create lip and face animation for 3D models. It's everywhere, adopted by The Division, Fallout 4, Batman: Arkham Knight and many more big releases. It works by making a set of poses for phoneme shapes, the basic lip poses we form when we say Oh or Aa or Mm, and then applying them to the sound of the recorded dialogue.
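FaceFX's internals aren't public, but the phoneme-to-pose idea itself is simple enough to sketch: take the dialogue's phonemes and their timings (in practice derived from the recorded audio), map each one to a mouth pose, and blend between neighbouring poses as the line plays. Every timing and pose name below is invented for illustration.

```python
# Illustrative phoneme timings for a spoken line (in practice these come
# from analysing the recorded audio against the script).
phonemes = [("HH", 0.00), ("EH", 0.08), ("L", 0.18), ("OW", 0.25), ("sil", 0.45)]

# Map phonemes to mouth poses ('visemes'): the Oh, Aa, Mm shapes.
viseme_for = {"HH": "open_slight", "EH": "open_mid", "L": "tongue_up",
              "OW": "round", "sil": "closed"}

def mouth_pose_at(t):
    """Find the two phonemes either side of time t and blend their visemes,
    so the mouth eases between shapes instead of snapping."""
    for (p0, t0), (p1, t1) in zip(phonemes, phonemes[1:]):
        if t0 <= t < t1:
            alpha = (t - t0) / (t1 - t0)   # 0 at the first phoneme, 1 at the next
            return {viseme_for[p0]: 1.0 - alpha, viseme_for[p1]: alpha}
    return {viseme_for[phonemes[-1][0]]: 1.0}

# Sample at 30 frames a second and feed the weights to the face rig.
for frame in range(6):
    t = frame / 30.0
    print(round(t, 3), mouth_pose_at(t))
```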
CD Projekt RED handled The Witcher 3's 35 hours of dialogue with similar tech, an in-house tool which also added body movements and camera angles to the mix. It's difficult to compare with Andromeda since BioWare hasn't shown off its tools, but The Witcher 3's dialogue scenes benefited from a policy that treated generated animation as only a starting point. On the basis that it's easier to modify than to start afresh, animators could take generated scenes and tweak them freely, placing incidental gestures and new camera angles, even moving character positions and poses. Technical director Piotr Tomsiński demonstrated the tool at GDC last year.
But there's more to the equation. Much facial animation completely omits the little twitches of eye, expression and head that really make us human. "Humans move like an insect, jittering and firing off and super irregular," says Ninja Theory co-founder Tameem Antoniades. "There's an almost imperceptible amount of noise on human beings all the time that has to be there, otherwise you don't believe they're real."
Horizon: Zero Dawn is a recent showcase of how effective this can be: Aloy's eyes constantly shiver and glance, giving her an often unnerving sense of life. Dead eyes are one of the biggest issues for game characters, because they're what we tend to focus on. Jittering helps, as does ensuring the eyes converge correctly on the subject they're meant to be looking at.
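Neither trick is exotic in principle. A rough sketch of both, converging each eye on the same target and adding a little per-frame noise, might look like the snippet below; the jitter magnitude and function name are guesses purely for illustration, not anyone's shipped code.

```python
import numpy as np

rng = np.random.default_rng(42)

def gaze_directions(left_eye, right_eye, target, jitter=0.01):
    """Aim both eyes at the same target so their gaze lines converge,
    then add a touch of per-frame noise so they never sit perfectly still."""
    dirs = []
    for eye in (left_eye, right_eye):
        d = np.asarray(target, dtype=float) - np.asarray(eye, dtype=float)
        d /= np.linalg.norm(d)                  # unit vector toward the target
        d += rng.normal(scale=jitter, size=3)   # the almost imperceptible 'noise'
        dirs.append(d / np.linalg.norm(d))
    return dirs

# Eyes 6cm apart, looking at a character a metre and a half away.
left, right = gaze_directions([-0.03, 0.0, 0.0], [0.03, 0.0, 0.0], [0.0, 0.0, 1.5])
print(left, right)
```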
And then there are graphics issues. A good eye needs good shaders to mimic the way fluids coat it and the way light refracts between the surface and the iris. And a good face needs good lighting. Here, Andromeda faces a tricky problem. With its world dynamically lit by its engine, Frostbite, conversations often simply take place in badly lit spots, leaving characters looking flat. For some reason, BioWare doesn't place lights for Andromeda's dialogue scenes in the way it did for previous Mass Effects, or the way Guerrilla does for Horizon's.
Ideally, no one would use generative tech for facial animation. Sorry, FaceFX. And that ideal is rapidly becoming a reality.
One trailblazer is Ninja Theory, which has a back catalogue that pretty much represents a history of the tech. 2007's Heavenly Sword was one of the first motion-captured games, but its animations were heavily worked on by WETA's animators before they were added to the game. 2010's Enslaved was built with Ninja Theory's own facial solver, which Antoniades says got them 80 to 90% of the way to the final animations. "The result was still uncanny valley; you then need an artist to go in and say, 'No this is what the actor means, this is the intent.'"
2013's DmC used the same solver, but captured performances with full computer vision rather than cameras tracking ping-pong-ball-style markers on the body, which freed up preparation time and opened up the physical space of the performance. And now, in Hellblade, it's using Cubic Motion's solver, a head-mounted camera system by a company called Technoprops, which shoots an actor's face from 100 angles, and a facial rig by a company called 3Lateral. "They make the best realtime facial rigs in the business," says Antoniades. This process gets them 95% of the way to usable facial animation; Cubic Motion works a little on the output and Ninja Theory rarely touches it.
The process is so streamlined that Ninja Theory demonstrated the technique live on stage at the computer graphics conference Siggraph last year, animating directly from performance into Unreal 4, and won the award for Best Real-Time Graphics & Interactivity.
Hellblade doesn't have nearly the volume of dialogue that Mass Effect does, but it shows that capturing clean and detailed animation directly from a voice actor's face as they perform lines is becoming attainable. It means developers will save hugely by not having to clean the animation up, though it's possible that they'll have to invest more in the original performance, chasing the right take and asking more of their actors.
It is, however, far too easy to look at the best examples in other games and expect Andromeda to operate at their level in every aspect. When you consider that all the hours of dialogue in a game as extensive as Andromeda are the result of so many factors, you can appreciate the challenge of making every second perfect, especially when animation is competing for budget with so many other aspects of the game, from combat systems to level creation.
Still, for a game about relationships it's perhaps not completely out of order to expect better representations of its characters. Bit by bit, practical technology is getting better, and it's possible, as Jonathan Cooper noted, that Andromeda might be a moment that underscores the importance of animation in a game's overall budget. We all benefit from getting to spend time with videogame characters who behave with natural subtlety. When we feel we're playing with people, games come to life.