The following post was inspired by several conversations and practices that I’ve had over the years regarding the subject of “ducking” in video games. Sometimes people feel like this is an unnecessary feature in an audio engine, but given the unpredictable nature of player behavior, I feel it’s a bit foolish to dismiss it out of hand.
Oftentimes in movies, television shows, radio broadcasts, and so on, when important dialogue starts to play, the rest of the audioscape (including sound effects and music) is lowered in volume. This is called “ducking”. It frees up headroom in the mix so the listener gets important information that they might otherwise miss because of everything else going on in the soundtrack at that moment.
This is fine and dandy in a linear/time-locked medium like film, but in an interactive environment where “anything goes”, it becomes a much more complicated issue. One common problem in today’s action games is “player fatigue”. Player fatigue can simply be described as “not giving the player a break”. This is basic game design philosophy that comes down to pacing, but audio can play a very significant role here, primarily because if anything moves in the game world, it will more than likely make a sound as well. If the action is relentless for too long, you can wear the player’s ears (and brain) out. If that happens then the following can easily occur:
1) Player turns down the volume (ew!)
2) Player stops playing the game (holy God no!)
So how does a development team avoid this? There are a myriad of elements that go into approaching this problem. In an ideal scenario, the audio team would be involved with all level layout meetings (and continued status update meetings throughout the rest of the project) to help with audio “pace” throughout the game. Much like a great piece of music, a game has a “rhythm”. It has establishing motifs and themes, it has gradual builds and rising action, it has massive climaxes, it has denouements, and it has resolves. If it’s a constant climax, the player will get exhausted and probably pretty frustrated after a while.
In addition, as a project gets closer and closer to final “lock-down”, it becomes more and more important that the audio department is aware of any changes that occur at the design level. For example, if a new battle encounter is added to a section of the game where there wasn’t one before, the “rhythm” of the level has now changed. The audio department needs to be able to go through the levels and do a “final mix” of the entire game from top to bottom after design has completed any major reworks, to make sure that the “aural integrity” remains intact throughout the shipping process.
Besides having the audio team “in the know” of what’s going on from a game design standpoint, there are some basic system elements that need to be addressed. At the very top of the list would be “priorities”. There are certain sounds that are more important than others. A voice-over line that tells the player what they should be doing is far more useful than a looping cricket call, for example. This is where a priority system comes into play. The human ear cannot distinguish hundreds of independent channels of audio at any given time. It becomes a wall of sound that the player cannot discern, so there needs to be a limit put in place. Generally, the programming staff will set a maximum number of channels that can play simultaneously at the beginning of the project. From there, different “sets” of sounds will be given different priorities. That means that when the channel limit is reached, the higher-priority sounds will trump anything else that is playing through the mixer at that time. After the high-priority sounds have finished playing, the lower-priority sounds will pick up where they left off. If the priority system is set up properly and the sound sets themselves are organized in an intelligent manner, the player will never know that certain sounds have stopped playing and started again.
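To make the priority idea a bit more concrete, here is a rough sketch of a channel limiter in Python. Everything in it is hypothetical and only for illustration: the `Voice` and `VoiceLimiter` names, the "higher number means more important" convention, and the idea of parking bumped sounds in a `suspended` list are my assumptions, not how any particular engine does it.

```python
from dataclasses import dataclass


@dataclass
class Voice:
    name: str
    priority: int  # assumption: a higher number means a more important sound


class VoiceLimiter:
    """Hypothetical channel limiter: caps simultaneous voices by priority."""

    def __init__(self, max_voices):
        self.max_voices = max_voices
        self.playing = []    # voices currently audible
        self.suspended = []  # voices bumped by something more important

    def play(self, voice):
        # Free channel available: just play the sound.
        if len(self.playing) < self.max_voices:
            self.playing.append(voice)
            return True
        # No free channel: only bump the least important voice if the
        # new sound outranks it.
        lowest = min(self.playing, key=lambda v: v.priority)
        if voice.priority > lowest.priority:
            self.playing.remove(lowest)
            self.suspended.append(lowest)
            self.playing.append(voice)
            return True
        return False  # the new sound is not important enough to be heard

    def finish(self, voice):
        # A voice ended, so hand its channel to the best suspended voice.
        self.playing.remove(voice)
        if self.suspended:
            best = max(self.suspended, key=lambda v: v.priority)
            self.suspended.remove(best)
            self.playing.append(best)


limiter = VoiceLimiter(max_voices=2)
crickets = Voice("crickets_loop", priority=1)
wind = Voice("wind_loop", priority=2)
story_vo = Voice("story_vo_line", priority=10)

limiter.play(crickets)
limiter.play(wind)
limiter.play(story_vo)    # bumps the crickets, the lowest priority voice
limiter.finish(story_vo)  # the crickets pick up again
```

A real engine would also have to decide whether a bumped loop resumes from where it stopped or restarts, and how quickly it fades in and out so the swap stays inaudible. This sketch ignores all of that.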
The next subject that I’ll go over is the idea of bus hierarchy. Busses have multiple names: Sound Groups, Channel Groups, Volume Groups, SoundClasses, Event Categories, and so on. For the purposes of this blog, I will refer to all of these as “busses”. Busses and sub-busses are essentially groups and sub-groups of certain types of sounds. We lump things together for two major reasons. One of them is purely for organizational purposes. The other and more practical reason is to be able to affect similar types of sounds all in one fell swoop. So if all of the player’s footsteps on all different types of surfaces (concrete, marble, grass, mud, snow, gravel, broken glass, etc.) have to be lowered in volume and pitch, we wouldn’t change the values on each individual sound, we’d do it on the bus that the footsteps live in. This simplifies the process significantly and leaves a smaller margin for error when trying to troubleshoot audio settings and volume tweaks.
Now we obviously need to make sure that the bus hierarchy is split up and grouped in a sensible way. We don’t want things to get messy, because that would defeat what we’re trying to do, which is keeping things nice and tidy and ready for ducking!
Now here is a very basic but common bus hierarchy breakdown:

So, at the top of the chain you have the Master Bus. Think of this as the volume control on your TV. If you adjust this, you adjust everything in the mix. Then you just follow the chain down from sub-bus to sub-bus. You’d want to categorize similar types of sounds within the sub-busses. For example, you might want to put all physics-based sounds in a sub-bus under SFX, and then all bullet impacts on a separate sub-bus under “weapons”, which is also under SFX.
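If it helps to see the "follow the chain down" idea in code, here is a small Python sketch. The `Bus` class and its `effective_volume` method are made up for this post, and the layout only uses the busses named in the text (Master, SFX, Physics, Weapons, Bullet Impacts, Voice, Story VO), so don't read it as a complete hierarchy. I'm also assuming volumes are simple linear gains that multiply down the chain, which is one common way to do it but not the only one.

```python
class Bus:
    """Hypothetical mixer bus: a named group with a volume and a parent."""

    def __init__(self, name, parent=None, volume=1.0):
        self.name = name
        self.parent = parent
        self.volume = volume  # assumption: linear gain, 1.0 means unchanged

    def effective_volume(self):
        # Multiply this bus's gain by every ancestor's gain up to Master.
        gain = self.volume
        bus = self.parent
        while bus is not None:
            gain *= bus.volume
            bus = bus.parent
        return gain


master = Bus("Master")
sfx = Bus("SFX", parent=master)
physics = Bus("Physics", parent=sfx)
weapons = Bus("Weapons", parent=sfx)
bullet_impacts = Bus("Bullet Impacts", parent=weapons)
voice = Bus("Voice", parent=master)
story_vo = Bus("Story VO", parent=voice)

# One tweak on SFX reaches everything underneath it and nothing else.
sfx.volume = 0.5
impacts_gain = bullet_impacts.effective_volume()  # halved along with SFX
story_gain = story_vo.effective_volume()          # untouched by the SFX change
```

Turning down `master.volume` instead would scale every bus in the tree, which is the "volume control on your TV" behavior.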
Now in terms of ducking (specifically ducking when voice over is occurring), you’d want to put all sounds that you do NOT want ducked in the same bus. For this example, we’ll put them all under the Story VO sub-bus under the Voice main.
So now, in theory, you’re all set up for ducking. Whenever a file is triggered on the Story VO bus, the system would need to tell all of the other busses to dip by a specified amount. There are multiple additional tweaks to take into account. Do you delay the playback of the actual file itself to allow the rest of the audio mixer to do a gradual fade down, so you don’t get a drastic drop in the mix? Do you run a compressor on the other busses that is sidechained to the Story VO bus? Do you have all of the sounds fade back up after the file is done, or do they pop back in quickly?
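Here is one way the simplest version of that could look, sketched in Python. Treat it as an illustration under my own assumptions rather than a recipe: `DuckingMixer`, `vo_started`, `vo_finished`, and `update` are hypothetical names, the gains are linear, the fades are straight linear ramps, both fade times are assumed to be greater than zero, and there is no sidechain compressor or delayed VO playback here.

```python
class DuckingMixer:
    """Hypothetical ducker: dips every bus except the protected one."""

    def __init__(self, bus_names, protected="Story VO",
                 duck_gain=0.4, fade_down=0.2, fade_up=0.8):
        self.gains = {name: 1.0 for name in bus_names}
        self.protected = protected  # bus that is never ducked
        self.duck_gain = duck_gain  # gain the other busses dip to
        self.fade_down = fade_down  # seconds to reach the ducked level
        self.fade_up = fade_up      # seconds to recover afterwards
        self.active_vo = 0          # protected files currently playing

    def vo_started(self):
        self.active_vo += 1

    def vo_finished(self):
        self.active_vo = max(0, self.active_vo - 1)

    def update(self, dt):
        """Advance the fades by dt seconds (call once per frame)."""
        ducking = self.active_vo > 0
        target = self.duck_gain if ducking else 1.0
        fade_time = self.fade_down if ducking else self.fade_up
        # Largest gain change allowed this frame for a linear ramp.
        step = (1.0 - self.duck_gain) * dt / fade_time
        for name, gain in self.gains.items():
            if name == self.protected:
                continue
            if gain > target:
                self.gains[name] = max(target, gain - step)
            else:
                self.gains[name] = min(target, gain + step)


mixer = DuckingMixer(["SFX", "Music", "Story VO"],
                     duck_gain=0.5, fade_down=0.5, fade_up=1.0)
mixer.vo_started()   # a Story VO file was triggered
mixer.update(0.25)   # halfway through the fade down
mixer.update(0.25)   # SFX and Music are now fully ducked
mixer.vo_finished()  # the line is over
mixer.update(1.0)    # everything has faded back up
```

Counting active VO files instead of using a simple on/off flag means back-to-back or overlapping lines keep the mix ducked until the last one ends. Separate `fade_down` and `fade_up` times cover the "gradual fade or quick pop" question, since you can tune each direction on its own.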
A lot of these kinds of ideas are, of course, all dependent on the limitations of your technology and how robust you want the system to be. The provided bus hierarchy is extremely basic and has room to grow, but keep in mind that if your system allows for growth, you need to keep tabs on every sound in the game and which bus it lives in. This can get confusing if you break busses up at the micro level, so you’ll need to find a balance.
Anyway…there’s a quick intro to ducking and some common practices in approaching a solution.
This post got huge.
Special thanks to Dave, Paul, Ed, Chris and Adam for looking over everything and providing feedback!