So happy you’ve joined us! I’m videogame composer Winifred Phillips. Welcome back to our four-part discussion of the role that music plays in Virtual Reality video games! These articles are based on the presentation I gave at this year’s gathering of the famous Game Developers Conference in San Francisco. My talk was entitled Music in Virtual Reality (I’ve included the official description of my talk at the end of this article). If you haven’t read the previous two articles, you’ll find them here:
During my GDC presentation, I focused on three important questions for VR video game composers:
Do we compose our music in 3D or 2D?
Do we structure our music to be Diegetic or Non-Diegetic?
Do we focus our music on enhancing player Comfort or Performance?
While attempting to answer these questions during my GDC talk, I discussed my work on four of my own VR game projects – the Bebylon: Battle Royale arena combat game from Kite & Lightning, the Dragon Front strategy game from High Voltage Software, the Fail Factory comedy game from Armature Studio, and the Scraper: First Strike shooter/RPG from Labrodex Inc.
In these articles, I’ve been sharing the discussions and conclusions that formed the basis of my GDC talk, including numerous examples from these four VR game projects. So now let’s look at the second of our three questions:
Hey everybody! I’m video game composer Winifred Phillips. At this year’s Game Developers Conference in San Francisco, I was pleased to give a presentation entitled Music in Virtual Reality (I’ve included the official description of my talk at the end of this article). While I’ve enjoyed discussing the role of music in virtual reality in previous articles posted here, the GDC talk gave me the opportunity to pull a lot of those ideas together and present a more concentrated exploration of the practice of music composition for VR games. It occurred to me that such a focused discussion might be interesting to share in this forum as well. So, with that in mind, I’m excited to begin a four-part article series based on my GDC 2018 presentation!
Most visual artists in the game industry are familiar with a concept known as the “Uncanny Valley,” but it isn’t a problem that typically occupies the attention of sound designers and game music composers. However, with the imminent arrival of virtual reality, that situation may drastically change. Audio folks may have to begin wrestling with the problem right alongside their visual arts counterparts. I’ll explore that issue during the course of this blog, but first let’s start with a basic definition: what is the Uncanny Valley?
Here’s the graphic that is typically shown to illustrate the Uncanny Valley concept. The idea is this: human physical attributes can be endearing. We like human qualities when we see them attached to inhuman things like robots. It makes them cute and relatable. However, as they start getting more and more human in appearance, the cuteness starts going away, and the skin-crawling creepiness begins. The ick-factor reaches maximum in an amorphous no-man’s land right before absolute realism would theoretically be attained. In this realm of horrors known as the “Uncanny Valley,” we see that the appearance of the human-like creature is not close enough to be real, but close enough to be really disturbing. Don’t take my word for it, though. Here’s a great video from the Extra Credits video series that explores the meaning of the Uncanny Valley in more detail:
So, now we’ve explored what the Uncanny Valley means to visual artists, but how does this phenomenon impact the realm of audio?
Spatial Audio – Reconstructing Reality or Creating Illusion?
The idea of an audio equivalent for the Uncanny Valley was suggested by Francis Rumsey during a presentation he gave in May 2014 at the Audio Engineering Society Chicago Section Meeting, which took place at Shure Incorporated in Niles, Illinois. Francis Rumsey holds a PhD in Audio Engineering from the University of Surrey and is currently the chair of the Technical Council of the Audio Engineering Society. His talk was entitled “Spatial Audio – Reconstructing Reality or Creating Illusion?”
Francis Rumsey, chair of the AES Technical Council
In his excellent 90-minute presentation (available for viewing in its entirety by AES members), Francis Rumsey explores the history of spatial audio in detail, examining the long-term effort to reach perfect simulations of natural acoustic spaces. He examines the divergent philosophies of top audio engineers, who approach the problem from a creative/artistic point of view, and acousticians, who want to solve the dilemma mathematically by virtue of a perfect wave field synthesis technique. Along the way, he asks whether spatial audio is really meant to recreate the best version of reality, or instead to conjure up an entertaining artistic illusion. This leads him to the main thesis of his talk:
Sound Design in VR: Almost Perfect Isn’t Perfect Enough
Rumsey suggests that as spatial audio approaches the top-most levels of realism, it begins to stimulate a more critical part of the brain. Why does it do this? Because human listeners react very strongly to a quality we call “naturalness.” We have a great depth of experience in the way environmental sound behaves in the world. We know how it reflects and reverberates, how objects may obstruct the sound or change its perceived timbre. As a simulated aural environment approaches perfect spatial realism and timbral fidelity, our brains begin to compare the simulation to our own remembered experiences of real audio environments, and we start to react negatively to subtle defects in an otherwise perfect simulation. “It sounds almost real,” we think, “but something about it is strange. It’s just wrong, it doesn’t add up.”
Take as an example this Oculus VR video demonstrating GenAudio’s AstoundSound 3D RTI positional 3D audio plugin. While the audio positioning is awesome and impressive, the demo does not incorporate any obstruction or occlusion effects (as the plugin makers readily admit). This makes the demo useful for us in examining the effects of subtle imperfections in an otherwise convincing 3D aural environment. The imperfections become especially pronounced when the gamer walks into the Tuscan house, but the sound of the outdoor fountain continues without any of the muffling obstruction effects one would expect to hear in those circumstances.
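To make the missing-occlusion point concrete, here’s a minimal sketch (with illustrative names and parameter values that aren’t drawn from the AstoundSound plugin or any real engine) of how an occlusion effect can be approximated: a one-pole low-pass filter muffles the high frequencies, and the cutoff slides downward as the obstruction grows:

```python
import numpy as np

def one_pole_lowpass(signal, cutoff_hz, sample_rate=44100):
    """Simple one-pole low-pass filter: a crude stand-in for the
    muffling that an obstructing wall imposes on a sound source."""
    # Standard one-pole coefficient for the requested cutoff.
    x = np.exp(-2.0 * np.pi * cutoff_hz / sample_rate)
    out = np.empty_like(signal)
    prev = 0.0
    for i, s in enumerate(signal):
        prev = (1.0 - x) * s + x * prev
        out[i] = prev
    return out

def apply_occlusion(signal, occlusion, sample_rate=44100):
    """occlusion: 0.0 (clear line of sight) .. 1.0 (fully walled off).
    Interpolates the cutoff between 'open' and 'muffled' and also
    attenuates the overall level as the occlusion grows."""
    cutoff = 12000.0 * (1.0 - occlusion) + 500.0 * occlusion
    gain = 1.0 - 0.5 * occlusion
    return gain * one_pole_lowpass(signal, cutoff, sample_rate)
```

In a real engine the occlusion amount would typically come from a ray-cast between listener and source; had the Tuscan house demo applied something even this simple when the player stepped indoors, the fountain would have dulled convincingly instead of continuing at full brightness.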
Voice in VR: The Uncanny Valley of Spatial Voice
During the presentation, Rumsey shared some of the research from Glenn Dickins, the Technical Architect of the Convergence Team at Dolby Laboratories. Dickins had applied the theory of the Uncanny Valley to vocal recordings. The sound of the human voice in a spatial environment is exceedingly familiar to us as human beings, much in the same way that human appearance and movement are both ingrained in our consciousness. Because of this familiarity, vocal recordings in a spatial environment such as 3D positional audio can be particularly vulnerable to the Uncanny Valley effect. Very small and subtle degradation in the audio output of a spatially localized voice recording may trigger a sense of deep-rooted unease.
Glenn Dickins of Dolby Laboratories
As we move into three-dimensional audio environments for virtual reality games, the sorts of sound compression typically used in video game design may become problematic, particularly in relation to voice recordings in games. While a typical gamer might not recognize that a vocal recording had been compressed, the gamer might nevertheless feel that there was something “not quite right” in the sound of the characters’ voices. Compression subtly changes the vocal sound in ways that are usually unnoticeable, but may become disruptive in a VR aural environment in which imperfections have the potential to nudge the audio into the Uncanny Valley.
Music in VR: Some Good News
While I’ve talked in this blog before about the importance of defining the role that music should play in the three-dimensional aural environment of a virtual reality game, Francis Rumsey offers an entirely different viewpoint in his talk. He thinks that when it comes to music, listeners don’t really care about spatial audio. That might be good news for game composers, because it may mean that music plays no role in the Uncanny Valley effect.
Describing a study that was conducted to determine how both naive and experienced listeners perceived spatial audio, Rumsey showed that when it came to listening to music, the spatial positioning wasn’t considered tremendously important. Sound quality was held to be absolutely crucial, but this desire was neither heightened nor lessened by spatial considerations. So does this mean that when it comes to music, listeners have an enhanced suspension of disbelief? Are they willing to accept music into their VR world, even if it isn’t realistically positioned within the 3D space? If so, then this would mean that non-diegetic music (i.e. music that isn’t occurring within the fictional world of the game) may not need to be spatially positioned as carefully as either voice or sound design elements of the aural environment. This may prove useful to audio teams, who may turn to music as a reassuring agent in the soundscape, binding the aural environment together and promoting emotional investment and immersion. However, music’s role in virtual reality may not conform to the way in which listeners react to spatially positioned music in other situations. At any rate, the issue certainly needs further study and experimentation to clarify the role that non-diegetic music should play in a VR game.
For other types of music in VR, the situation may be much simpler. Music doesn’t always have to occupy the traditional “underscore” role that it typically serves during gameplay. In a “music visualizer” VR experience, spatial positioning may become entirely unnecessary, because the music is serving the purpose of pure foreground entertainment (much the same way that music entertains listeners on its own). Here’s a preview of a musically-reactive virtual world in the upcoming “music visualizer” game Harmonix Music VR, created by the developer of the famous and popular game series Rock Band and Dance Central:
Rumsey concluded his talk with the observation that near accurate may be worse than not particularly accurate… in other words, if it’s supposed to sound real, then it had better sound perfectly real. Otherwise, it might be better to opt for a stylized audio environment that exaggerates and heightens the world rather than faithfully reproducing it. I hope you enjoyed this blog, and please let me know what you think in the comments below!
Winifred Phillips is an award-winning video game music composer whose most recent project is the triple-A first person shooter Homefront: The Revolution. Her credits include five of the most famous and popular franchises in video gaming: Assassin’s Creed, LittleBigPlanet, Total War, God of War, and The Sims. She is the author of the award-winning bestseller A COMPOSER’S GUIDE TO GAME MUSIC, published by the Massachusetts Institute of Technology Press. As a VR game music expert, she writes frequently on the future of music in virtual reality video games. Follow her on Twitter @winphillips.
Ready or not, virtual reality is coming! Three virtual reality headsets are on their way to market and expected to hit retail in either late 2015 or sometime in 2016. These virtual reality systems are:
VR is expected to make a big splash in the gaming industry, with many studios already well underway with development of games that support the new VR experience. Clearly, VR will have a profound impact on the visual side of game development, and certainly sound design and voice performances will be impacted by the demands of such an immersive experience… but what about music? How does music fit into VR?
At GDC 2015, a presentation entitled “Environmental Audio and Processing for VR” laid out the technology of audio design and implementation for Sony’s Project Morpheus system. While the talk concentrated mainly on sound design concerns, speaker Nicholas Ward-Foxton (audio programmer for Sony Computer Entertainment) touched upon voice-over and music issues as well. Let’s explore his excellent discussion of audio implementation for a virtual space, and ponder how music fits into this brave new virtual world.
Nicholas Ward-Foxton, during his GDC 2015 talk.
But first, let’s get a brief overview on audio in VR:
3D Positional Audio
All three VR systems feature some sort of positional audio, meant to achieve a full 3D audio effect. By applying the principles of 3D audio, sounds will always seem to originate from the virtual world in a realistic way, according to the location of the sound-creating object, the force/loudness of the sound being emitted, the acoustic character of the space in which the sound occurs, and the influence of obstructing, reflecting and absorbing objects in the surrounding environment. The goal is to create a soundscape that seems perfectly fused with the visual reality presented to the player. Everything the player hears seems to issue from the virtual world with acoustic qualities that consistently confirm an atmosphere of perfect realism.
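As a rough illustration of the positional idea, here’s a toy calculation of how a source’s distance and direction might translate into left/right gains. All names and formulas here are my own illustrative simplification — real VR systems use HRTF processing rather than simple inverse-distance rolloff and stereo panning:

```python
import math

def positional_gains(listener_pos, source_pos, ref_distance=1.0):
    """Toy 3D-audio gain calculation: inverse-distance attenuation
    plus constant-power stereo panning from the source's azimuth.
    Positions are (x, y, z); +x is the listener's right, +z is forward."""
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    dz = source_pos[2] - listener_pos[2]
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Inverse-distance rolloff, clamped so nearby sources don't blow up.
    attenuation = ref_distance / max(distance, ref_distance)
    # Azimuth in the horizontal plane: 0 = straight ahead, +90 deg = right.
    azimuth = math.atan2(dx, dz)
    # Constant-power pan law keeps perceived loudness steady across the arc.
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))
    angle = (pan + 1.0) * math.pi / 4  # maps pan -1..1 onto 0..pi/2
    left = attenuation * math.cos(angle)
    right = attenuation * math.sin(angle)
    return left, right
```

A source straight ahead lands equally in both ears; one off to the right favors the right channel; and a distant source arrives quieter than a near one — which is the broad behavior a positional audio system has to deliver before any of the subtler HRTF cues come into play.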
To get a greater appreciation of the power of 3D audio, let’s listen to the famous “Virtual Barber Shop” audio illusion, created by QSound Labs to demonstrate the power of Binaural audio.
Head Tracking and Head-Related Transfer Function
According to Nicholas Ward-Foxton’s GDC talk, to make the three-dimensional audio more powerful in a virtual space, the VR systems need to keep track of the player’s head movements and adjust the audio positioning accordingly. With this kind of head tracking, sounds swing around the player when turning or looking about. This effect helps to offset an issue of concern regarding the differences in head size and ear placement between individuals. In short, people have differently sized noggins, and their perception of audio (including the 3D positioning of sounds) will differ as a result. This dependence on the unique anatomical details of the individual listener is known as Head-Related Transfer Function. There’s an excellent article explaining Head-Related Transfer Function on the “How Stuff Works” site.
Head-Related Transfer Function can complicate things when trying to create a convincing three-dimensional soundscape. When listening to identical binaural audio content, one person may not interpret aural signals the same way another would, and might estimate that sounds are positioned differently. Fortunately, head tracking comes to the rescue here. As Ward-Foxton explained during his talk, when we move our heads about and then listen to the way that the sounds shift in relation to our movements, our brains are able to adjust to any differences in the way that sounds are reaching us, and our estimation of the spatial origination of individual sounds becomes much more reliable. So the personal agency of the gaming experience is a critical element in completing the immersive aural world.
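The head-tracking idea above can be sketched in just a few lines: the source keeps a fixed azimuth in the world, and the angle handed to the spatializer is simply that azimuth minus the tracked head yaw. This is illustrative code, not taken from any actual VR SDK:

```python
import math

def relative_azimuth(source_azimuth_world, head_yaw):
    """Angle of a sound source relative to the listener's ears.
    Both angles are in radians; the world azimuth is fixed to the scene,
    while the head yaw comes from the headset tracker. As the head turns
    right, a world-anchored source swings left by the same amount --
    exactly what keeps it 'pinned' in place in the virtual world."""
    rel = source_azimuth_world - head_yaw
    # Wrap into (-pi, pi] so the panner always takes the short way round.
    while rel <= -math.pi:
        rel += 2 * math.pi
    while rel > math.pi:
        rel -= 2 * math.pi
    return rel
```

So a source sitting 90 degrees to the player’s right moves to dead ahead the moment the player turns to face it — and it’s this consistent swinging of sounds with head motion that lets the brain calibrate away individual HRTF differences, as Ward-Foxton described.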
Music, Narration, and the Voice of God
Now, here’s where we start talking about problems relating directly to music in a VR game. Nicholas Ward-Foxton’s talk touched briefly on the issues facing music in VR by exploring the two classifications that music may fall into. When we’re playing a typical video game, we usually encounter both diegetic and non-diegetic audio content. Diegetic audio consists of sound elements that are happening in the fictional world of the game, such as environment sounds, sound effects, and music being emitted by in-game sources such as radios, public address systems, NPC musicians, etc. On the other hand, non-diegetic audio consists of sound elements that we understand to be outside the world of the story and its characters, such as a voice-over narration, or the game’s musical score. We know that the game characters can’t hear these things, but it doesn’t bother us that we can hear them. That’s just a part of the narrative.
VR changes all that. When we hear a disembodied, floating voice from within a virtual environment, we sometimes feel, according to Ward-Foxton, as though we are hearing the voice of God. Likewise, when we hear music in a VR game, we may sometimes perceive it as though it were God’s underscore. I wrote about the problems of music breaking immersion as it related to mixing game music in surround sound in Chapter 13 of my book, A Composer’s Guide to Game Music, but the problem becomes even more pronounced in VR. When an entire game is urging us to suspend our disbelief fully and become completely immersed, the sudden intrusion of the voice of the Almighty supported by the beautiful strains of the holy symphony orchestra has the potential to be pretty disruptive.
The harpist of the Almighty, hovering somewhere in the VR world…
So, what can we do about it? For non-diegetic narration, Ward-Foxton suggested that the voice would have to be contextualized within the in-game narrative in order for the “voice of God” effect to be averted. In other words, the narration needs to come from some explainable in-game source, such as a radio, a telephone, or some other logical sound conveyance that exists in the virtual world. That solution, however, doesn’t work for music, so it’s time to start thinking outside the box.
Voice in our heads
During the Q&A portion of Ward-Foxton’s talk, an audience member asked a very interesting question. When the player is assuming the role of a specific character in the game, and that character speaks, how can the audio system make the resulting spoken voice sound the way it would to the ears of the speaker? After all, whenever any of us speak aloud, we don’t hear our voices the way others do. Instead, we hear our own voice through the resonant medium of our bodies, rising from our larynx and shaped by the resonances (formants) of our own unique vocal tract. That’s why most of us perceive our voices as being deeper and richer than they sound when we hear them in a recording.
Ward-Foxton suggested that processing and pitch alteration might create the effect of a lower, deeper voice, helping to make the sound seem more internal and resonant (the way it would sound to the actual speaker). However, he also mentioned another approach to this issue earlier in his talk, and I think this particular approach might be an interesting solution for the “music of God” problem as well.
“I wanted to talk about proximity,” said Ward-Foxton, “because it’s a really powerful effect in VR, especially audio-wise.” Referencing the Virtual Barber Shop audio demo from QSound Labs, Ward-Foxton talked about the power of sounds that seem to be happening “right in your personal space.” In order to give sounds that intensely intimate feeling when they become very close, Ward-Foxton’s team would apply dynamic compression and bass boost to the sounds, in order to simulate the Proximity Effect.
The Proximity Effect is a phenomenon related to the physical construction of microphones, making them prone to add extra bass and richness when the source of the recording draws very close to the recording apparatus. This concept is demonstrated and explained in much more depth in this video produced by Dr. Alexander J. Turner for the blog Nerds Central:
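Here’s one way the “dynamic compression and bass boost” recipe that Ward-Foxton describes might be sketched in code. The filter design and every parameter value below are my own illustrative guesses — nothing here is measured from a real microphone or taken from Sony’s implementation:

```python
import numpy as np

def simulate_proximity(signal, sample_rate=44100,
                       bass_cutoff_hz=250.0, bass_boost=2.0,
                       threshold=0.5, ratio=4.0):
    """Rough proximity-effect simulation: boost the low end using a
    one-pole low-pass as a crude low shelf, then apply static hard-knee
    compression to thicken the result and pull it 'in close'."""
    # One-pole low-pass isolates the bass region to be boosted.
    x = np.exp(-2.0 * np.pi * bass_cutoff_hz / sample_rate)
    lows = np.empty_like(signal)
    prev = 0.0
    for i, s in enumerate(signal):
        prev = (1.0 - x) * s + x * prev
        lows[i] = prev
    boosted = signal + (bass_boost - 1.0) * lows  # low-shelf-style boost
    # Static hard-knee compression on the instantaneous level.
    out = np.where(
        np.abs(boosted) > threshold,
        np.sign(boosted) * (threshold + (np.abs(boosted) - threshold) / ratio),
        boosted,
    )
    return out
```

Run a low tone and a high tone through this and the low tone comes out noticeably louder while peaks stay tamed — the same “extra bass and richness” character the Proximity Effect adds when a source gets very close to a microphone.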
So, if simulating the Proximity Effect can make a voice sound like it’s coming from within, as Ward-Foxton suggests, can applying some of the principles of the Proximity Effect make the music sound like it’s coming from within, too?
Music in our heads
This was the thought that crossed my mind during this part of Ward-Foxton’s talk on “Environmental Audio and Processing for VR.” In traditional music recording, instruments are assigned a position on the stereo spectrum, and the breadth from left to right can feel quite wide. Meanwhile, the instruments (especially in orchestral recordings) are often recorded in an acoustic space that would be described as “live,” or reverberant to some degree. This natural reverberance is widely regarded as desirable for an acoustic or orchestral recording, since it creates a sensation of natural space and allows the sounds of the instruments to blend with the assistance of the sonic reflections from the recording environment. However, it also creates a sensation of distance between the listener and the musicians. The music doesn’t seem to be invading our personal space. It’s set back from us, and the musicians are also spread out around us in a large arc shape.
So, in VR, these musicians would be invisibly hovering in the distance, their sounds emitting from defined positions in the stereo spectrum. Moreover, the invisible musicians would fly around as we turn our heads, maintaining their position in relation to our ears, even as the sound design elements of the in-game environment remain consistently true to their places of origin in the VR world. Essentially, we’re listening to the Almighty’s holy symphony orchestra. So, how can we fix this?
One possible approach might be to record our music with a much more intimate feel. Instead of choosing reverberant spaces, we might record in perfectly neutral spaces and then add very subtle amounts of room reflection to assist in a proper blend without disrupting the sensation of intimacy. Likewise, we might somewhat limit the stereo positioning of our instruments, moving them a bit more towards the center. Finally, a bit of prudently applied compression and EQ might add the extra warmth and intimacy needed in order to make the music feel close and personal. Now, the music isn’t “out there” in the game world. Now, the music is in our heads.
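The “move the instruments a bit more towards the center” step, at least, is easy to demonstrate: a standard mid/side width control pulls a finished stereo mix inward without touching its center content. This is a generic mixing technique rather than anything specific to VR tooling:

```python
import numpy as np

def narrow_stereo(left, right, width=0.5):
    """Mid/side width control: width=1.0 leaves the mix untouched,
    width=0.0 collapses it to mono. Shrinking the side signal pulls
    wide-panned instruments toward the center, one cheap way to make
    a stereo music bed feel closer and more 'in the head' rather than
    spread across a distant arc."""
    mid = 0.5 * (left + right)    # what the two channels share
    side = 0.5 * (left - right)   # what makes them different
    side *= width                 # shrink the stereo image
    return mid + side, mid - side
```

Combined with a drier recording space and a touch of warmth from EQ and compression, narrowing the image like this is one plausible route toward the intimate, personal-space quality described above.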
Music in VR
It will be interesting to see the audio experimentation that is sure to take place in the first wave of VR games. So far, we’ve only been privy to tech demos showing the power of the VR systems, but the music in these tech demos has given us a brief peek at what music in VR might be like in the future. So far, it’s been fairly sparse and subtle… possibly a response to the “music of the Almighty” problem. It is interesting to see how this music interacts with the gameplay experience. Ward-Foxton mentioned two particular tech demos during his talk. Here’s the first, called “Street Luge.”
The simple music of this demo, while quite sparse, does include some deep, bassy tones and some dry, close-recorded percussion. The stereo breadth also appears to be a bit narrow, though this may not have been intentional.
The second tech demo mentioned during Ward-Foxton’s talk was “The Deep.”
The music of this tech demo is limited to a few atmospheric synth tones and a couple of jump-scare stingers, underscored by a deep low pulse. Again, the music doesn’t seem to have a particularly wide stereo spectrum, but this may not have been a deliberate choice.
I hope you enjoyed this exploration of some of the concepts included in Nicholas Ward-Foxton’s talk at GDC 2015, along with my own speculation about possible approaches to problems related to non-diegetic music in virtual reality. Please let me know what you think in the comments!