How developers really deal with bugs
They're everywhere.
Everybody knows bugs. There are funny ones and stupid ones. There are annoying ones and actually-damaging ones. But however they manifest themselves, bugs sit right between a game's maker and its player, a sudden manifestation of mistakes that have been made, a crack in the simulation, a bump right back down to Earth.
The player side of the experience of bugs is straightforward. They raise amusement, irritation and sometimes spluttering anger, and they should all be fixed. But players don't really know so much about the developer experience. That's despite the relationship between players and developers growing closer than ever over the past 10 or so years. In the era of internet-delivered patches, Early Access and the rise of indie development, players are caught in the swirl of the development process as they pore over changelogs and offer feedback.
"It's a mixed blessing, isn't it, the fact that you can release your game and people can tell you that it's broken and you can talk to them about it and then fix it," says Ricky Haggett, developer of Hohokum, Frobisher Says, and most recently of delightful space Rogue-like Loot Rascals. "That's amazing, and it's also incredibly stressful. You also feel very exposed."
"It can be a very emotional thing," agrees Cliff Harris of Positech Games, veteran of Lionhead and Elixir Studios, and sole developer of the Democracy series, Gratuitous Space Battles and currently car factory sim Production Line.
"There's a general misconception, I think, that when there's a bug players think the developer doesn't care because we've got their money. Especially in the days of Steam refunds, we've only temporarily got their money; they can easily take it back. Any bug that's in my game, unless it's in the sound middleware, will be my mistake, where I've fucked up. And I know it, and I can't pretend it was not me. You can feel the serotonin levels drop every time you see a bug report, or the word 'crash'. It really does drag you down."
Paradroid developer Andrew Braybrook on C64 bugs
6502 assembler is very unforgiving. It had no way of just stopping and letting you see the exact point of failure and we didn't have a debugger of any kind in the early days. Imagine then, that a game might be playing fine for 20 minutes, and then suddenly stops. This is exactly what happened to Paradroid in September 1985, less than a month before it was due to go off for duplication and release.
Usually if there is a bug present it means something isn't working at all, but Paradroid was apparently behaving itself, right up to the point of critical failure. With no indication of which area of the code was causing it, I spent three days reading the whole codebase. When I went home at the end of the third day I had narrowed it down to the collision detection system. At about 7pm I had eaten my dinner and had an inspired moment. I figured out what I'd done wrong: I had used the wrong index value in the robot data table.
There's a table of 24 different robot types, holding entries for the robot number, top speed, armour rating, starting energy and weaponry. Also, there's a table of 16 robots currently on the deck, holding position, energy, and speed. If you use the 24-element index on the 16-element table then any of the last eight values of that index would cause it to read invalid data and potentially write to data beyond the end of the table. It was only making this mistake when resolving collisions, so you might not notice that a messenger robot has more armour than it should, but you do when a big robot crashes into another and the game stops! I went out into the garden and had a good scream. I had found my mistake.
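The indexing mistake Braybrook describes can be sketched in a few lines. This is an illustration in Python rather than 6502 assembler, and the memory layout, names and values are all assumptions; only the two table sizes come from his account. Python would normally raise an error on a bad index, so the sketch models the C64's flat, unchecked memory as a single list.

```python
NUM_ROBOT_TYPES = 24   # size of the robot type table (from the account)
NUM_DECK_ROBOTS = 16   # size of the on-deck robot table (from the account)

# Hypothetical flat memory: the 16-entry deck table sits directly in front of
# unrelated game data, with nothing to stop an index running past the end.
DECK_ENERGY_BASE = 0
memory = [100] * NUM_DECK_ROBOTS + [0xEA] * 8   # deck energy, then other data


def read_deck_energy(index):
    """Read an energy value with no bounds check, like indexed addressing."""
    return memory[DECK_ENERGY_BASE + index]


def is_valid_deck_index(index):
    """The check the hardware never performs for you."""
    return 0 <= index < NUM_DECK_ROBOTS


# Robot type indices that are wrong to use on the deck table.
bad_indices = [i for i in range(NUM_ROBOT_TYPES) if not is_valid_deck_index(i)]

print(len(bad_indices))       # 8: the last eight type indices overrun the table
print(read_deck_energy(3))    # a genuine deck entry
print(read_deck_energy(20))   # silently returns the unrelated data instead
```

Nothing fails at the moment of the bad read, which is why the game could run for 20 minutes before the damage showed.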
All developers have deep pride in their work, or at least strive for it. So when bugs happen, spontaneously emerging from the incredible complexity with which their systems interlock, developers feel bad, even though they know bugs are almost inevitable. But it is not until after a game ships that the real bug reports start coming in.
"Sometimes I get emails about bugs," says Harris. "I've got a support forum on my website where people post bugs, though often they'll post them in the discussion forum. I get personal Facebook messages. I get messages on the game's Facebook page. I get replies to threads on Reddit, and posts in the wrong forum on Steam and in the right forum on Steam. And then, every time I make an announcement, there are reports in the comments. Oh, and YouTube, every time I do a video someone says the game will crash."
Sometimes reports explain in detail what machine the player has, at what point in the game the bug appears, and what they were doing. Sometimes they'll include save games. "But I often get emails that say, 'Your game is broken, please fix,'" says Harris. "I don't even necessarily know what game it is. Throw me a bone here! And you get some very angry people as well, which doesn't help at all."
Fireproof's Rob Dodd on the pain of reproducing bugs
I was working on an FPS several years ago where enemies, when killed, would drop their weapon. The weapons would become physical and fall to the floor. A bug report came in that very rarely, the gun would fall straight through the floor. This was a big deal, because at times the game relied on you being able to collect a specific gun. There's a bunch of reasons why things might fall through the ground in a game. Seeing it happen was no use; I needed to make it reproducible, so I set up a bit of code that spawned a gun every second, each with a random velocity, spin and height, in different positions around a level. It would keep track of each one, and if after ten seconds the gun was below the ground, it would report the exact starting parameters.
I left it running overnight and came in the next morning to find the game had crashed a few hours in, but in the hours it survived it had thrown a few thousand guns, and a couple of them had fallen through. I changed my testbed to spawn guns with those starting parameters, and suddenly I had a steady stream gracefully spinning towards the ground, and dropping straight through it. The fix was easy - it was to do with the collision system not being set up for the guns to spin as much as they do in rare cases - one line to clamp the spin.
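A testbed of the kind Dodd describes might look something like the sketch below. It is not his code: the physics is a deliberately tiny stand-in, and the spin threshold, parameter ranges and function names are all invented for illustration. What it keeps from his account is the shape of the approach: spawn guns with random starting parameters, record the ones that end up below the floor, replay them, then clamp the spin.

```python
import random

FLOOR_HEIGHT = 0.0
MAX_SAFE_SPIN = 50.0   # hypothetical limit the toy collision check can handle


def drop_gun(height, velocity, spin, clamp_spin=False):
    """Toy stand-in for the physics engine: returns the gun's resting height.

    Assumption for illustration only: the floor check fails when the gun
    spins faster than MAX_SAFE_SPIN. Height and velocity are accepted but
    ignored by this toy model.
    """
    if clamp_spin:
        spin = max(-MAX_SAFE_SPIN, min(MAX_SAFE_SPIN, spin))
    if abs(spin) > MAX_SAFE_SPIN:
        return FLOOR_HEIGHT - 10.0   # fell through the floor
    return FLOOR_HEIGHT              # landed on the floor


def find_failing_drops(attempts, seed=0):
    """Spawn guns with random parameters and record any that fall through."""
    rng = random.Random(seed)        # seeded so a run can be repeated
    failures = []
    for _ in range(attempts):
        params = {
            "height": rng.uniform(1.0, 5.0),
            "velocity": rng.uniform(-3.0, 3.0),
            "spin": rng.gauss(0.0, 15.0),   # large spins are rare
        }
        if drop_gun(**params) < FLOOR_HEIGHT:
            failures.append(params)
    return failures


failures = find_failing_drops(attempts=100_000)
print(f"{len(failures)} of 100000 guns fell through the floor")

# Replaying the recorded parameters reproduces the bug every time...
assert all(drop_gun(**p) < FLOOR_HEIGHT for p in failures)
# ...and clamping the spin makes the same drops land safely.
assert all(drop_gun(**p, clamp_spin=True) == FLOOR_HEIGHT for p in failures)
```

The useful part is the list of failing parameters: once a rare failure can be replayed on demand, it can be watched, measured and fixed.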
As a developer, it's hard to keep hold of the thought that angry bug reports are actually expressions of passion for a game. But simply replying to an angry player can often immediately turn an aggressor into someone far more reasonable. Harris sees it as a natural response to a world in which dealing with monolithic organisations like Google and Microsoft is like shouting into a void. It's often a surprise to find the support email address of a game has a human being at the end of it.
"I try to reply to them straight away, no matter what time it is, say sorry, and ask them for more information," says Haggett. "People are just generally cool; we're lucky enough not to experience any people who are dicks. And once you get past the initial apologising and getting people helping, it's actually positive human interaction, it's people reaching out to a developer and engaging with them. I love having a dialogue with people who play my games."
Next, a developer needs to log the issue. While Harris, who works alone, just logs them in his calendar with a rough date for fixing them, large developers will use support ticket management systems like Zendesk, coordinating the efforts of community managers, QA teams and the programmers who'll be actually working on the fixes. Professional systems are a long way from the way they'd often be managed in the 1990s.
"One thing I find astounding is thinking back on how primitive the bug reporting and fixing process used to be," says Dorian Hart, a programmer and designer who worked at Looking Glass and Irrational. "When we worked on Underworld II and System Shock, there was no dedicated bug reporting software. Testers and developers would email the QA lead, who would compile a big list. Then, once a day, we'd have a big team bug meeting where the QA lead would read every bug out loud, one at a time. Whoever was most responsible would raise their hand and agree to address it. If it was a bug that someone already had, they'd shout out 'Dupe!' which would often start an argument about whether the two bugs truly had the same root cause. Similar discussions would start for declarations of 'Not a bug!' or if there was disagreement about who was the right person to address it."
Joris Dormans on missing the missing bosses in Unexplored
When we moved Unexplored out of Early Access and into full release, we made a stupid mistake in one of the last patches before releasing. Basically we set the number of bosses to be generated to zero. It took us a week or so to realise we just shipped the game without any bosses apart from the final one - a player from Early Access brought the issue to our attention. We fixed it and very quickly had a patch that unleashed 50 new bosses unto the game as our first update. The other players seemed to take it pretty well. It's a good thing we're an indie team releasing on an online platform with unlimited updates. You can get away with things like this.
However reports are managed, the real work is in figuring out the cause. "Debugging is like detective work, you have to spot the clues, ask the right questions, and examine the crime scene," says Andrew Braybrook, developer of Commodore 64 classic Paradroid. "It can't be done to order, or on a budget, but it has to be done. On the C64 it also had to be done before the game is released." Back then, the codebase was pretty small, and since programmers tended to work alone, all the code was theirs and so they knew how it all worked. "That gives a significant advantage, because you're not looking for someone else's mistake in someone else's code. Most bugs I could find and fix in minutes."
"Almost all of it hinges on whether I can reproduce it," says Harris, who codes his own game engines, and therefore can see and work on pretty much every aspect of his games. "Generally speaking, if I can see a crash, bang, it's fixed." That's why developers need detailed information on the conditions that were in place when a player encounters a bug. If a developer can reproduce a bug, they can look at what the computer is doing at the point of failure, and through that, figure out its cause. Often, then, the *real* work is in discovering the rare combinations of events and variables that are the source.
But then there are the other, even more frustrating, kinds of bugs. Harris talks about 'Heisenbugs', which disappear or change during the act of running debug processes to examine them, making them very difficult to identify. Charles Randall, who's worked at many developers, including Bioware Edmonton, Ubisoft Montreal and Capybara Games, talks of 'meta-bugs', which arise not from code but the compiler, which converts code into the instructions that run on the computer itself.
"Blaming the compiler is the 'It's not lupus!' moment of game development," he says. "But when it *is*, you are in for a world of pain. On MDK 2, the guy working on the sound code had an issue where a particular game sound refused to play. When debugging it, he found out that the code wasn't actually executing the playSound() function. After much investigation, we took an educated guess that it was a name-mangling issue and renamed the function to something like pleaseLordSatanPlaySound() and it fixed the issue. As far as I know, it shipped that way."
Charles Randall on not fixing a bug in Assassin's Creed 2
There was an ongoing issue in Assassin's Creed 2 that I couldn't solve with missing animations in combat. I could never figure out what led to the exact combination of circumstances which triggered the bug. It haunted me for well over a year but I could detect it in code, and... just make it work. Not properly, mind you. When I detected the error case, I just played another animation. I'm assuming there's a rare issue in the game where you'll see an animation that doesn't sync up, but no one ever complained, so I guess at the end of the day it was a valid fix. Sometimes making a bug disappear is the next best thing to actually fixing it.
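The detect-and-cover approach Randall describes can be shown in miniature. The animation names and the lookup table below are hypothetical, and the real game's animation system would have looked nothing like this; the sketch only shows the pattern of catching the error case and substituting something acceptable when the root cause is unknown.

```python
# Hypothetical animation table: name -> length in frames.
ANIMATIONS = {"parry": 12, "counter": 18, "generic_hit": 10}
FALLBACK_ANIMATION = "generic_hit"


def pick_animation(requested):
    """Return the requested animation, or a stand-in if it is missing."""
    if requested in ANIMATIONS:
        return requested
    # Root cause unknown: detect the bad case and cover it rather than fail.
    return FALLBACK_ANIMATION


print(pick_animation("parry"))          # the normal path
print(pick_animation("disarm_combo"))   # the error case, covered by the fallback
```

The cost is the one he names: the stand-in may not sync up with what is happening on screen, but the player sees something rather than nothing.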
And then, sometimes, the report isn't a bug at all. "I'm sure gamers think it's bollocks, but so many times when people say a game won't run, they just need to update their video card drivers," says Harris. "It sounds such hand-wavy bollocks, like you're buying time, but with startup crashes, 80 per cent of them are about updating drivers." On both Steam and PS4 versions, Haggett had players whose games crashed on startup for no apparent reason. A cause was never discovered, but reinstalling the game completely fixed it. "We were like, 'Wow, reinstalling. That's still a thing.'"
Once fixed, issuing updates today is easy, even on console, where the process is now largely automated. A common misconception is that the certification process that console makers impose on all releases on their platforms is about catching bugs. Not at all: it's for ensuring they comply with the platform's rules. Loot Rascals was certified from a build that had various crash bugs. Issuing a patch on PS4, for example, generally takes just a couple of days, and is free.
And sometimes, just sometimes, a bug simply isn't fixed. This is rarer than you might think - remember developers' pride in their work - and therefore when it does happen, it's down to a business decision. "If someone said the latest update to Windows means Redshirt doesn't run any more, I wouldn't fix it," says Harris. "I'd just stop selling it. It's not worth it. Coders are emotionally embarrassed by bugs, we really hate them more than anyone because we know we fucked up. So you don't want to leave it there, unless it's a sensible business decision. You always want it to be perfect. It's never a coder decision."
Teddy Dief on the difference between bugs and exploits in Hyper Light Drifter
I remember showing Hyper Light Drifter at a convention in 2013. I'd been having a dream time, getting to show our game off and watch people enjoy it. I also hadn't slept the night before so we could get the build ready. Late in the day, this cocky kid rolls up to the booth and says, “I'm gonna break your collision,” and starts dashing into walls over and over. I told him he couldn't. He insisted he could. We argued back and forth for about 10 minutes. I argued. With a young child. But he didn't find a bug. Two years later, my fellow designer-coder Beau Blyth and I watched Awesome Games Done Quick together. We watched speedrunners abuse glitches in Ocarina of Time to jump through walls and skip entire levels. And for the first time I wondered: if someone *did* break our collision... would it be kinda cool?
Six months after that, we released Hyper Light Drifter and it took about two days for a speedrunner to figure out how to get through our impenetrable walls. He used a glitch we'd never tried, purposefully getting trapped in crystal and having it force him inside of a wall, at which point he could roam freely. We thought about fixing this. Alx Preston edited some of our level designs to keep crystals away from important walls as a start. But ultimately we chose not to completely fix it. OK, we weren't totally sure how, without a major overhaul. So instead of blocking players from this exploit, I decided to just let them do it... but kill them after a few seconds. It felt quick enough to keep speedrunners from doing anything *too* crazy, but slow enough so an unlucky casual player would have time to realise they were somewhere they shouldn't have been. Sometimes you just kill the player, and hope they forgive you. Please forgive me.
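The "let them in, then kill them" rule Dief describes amounts to a countdown. The sketch below is a guess at the general shape, not the game's code: the three-second delay, the class and its field names are assumptions, as is resetting the timer when the player gets back out.

```python
KILL_DELAY_SECONDS = 3.0   # hypothetical; the account only says "a few seconds"


class Player:
    def __init__(self):
        self.alive = True
        self.time_in_wall = 0.0

    def update(self, dt, inside_wall):
        """Advance one frame; dt is the frame time in seconds."""
        if not self.alive:
            return
        if inside_wall:
            self.time_in_wall += dt
            if self.time_in_wall >= KILL_DELAY_SECONDS:
                self.alive = False    # out of bounds for too long
        else:
            self.time_in_wall = 0.0   # back in bounds: reset the countdown


player = Player()
for _ in range(5):                    # 2.5 seconds inside a wall
    player.update(0.5, inside_wall=True)
print(player.alive)                   # still alive, with time to realise
player.update(0.5, inside_wall=True)  # 3.0 seconds reached
print(player.alive)                   # no longer alive
```

The length of the delay is the design decision: short enough to limit what a speedrunner can do, long enough for an unlucky player to notice where they are.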
Illustrations by Anni Sayers.