Text-to-Speech Shouting: Our Most Requested Capability

Zeena Qureshi

CEO & Co-founder

April 21, 2021

Unveiling another first: a shouting AI voice

The AI technology behind text-to-speech is constantly evolving, and here at Sonantic, we’re at the forefront of these developments. Shouting is one of the newest additions to our expressive AI-voice technology, bringing a powerful new layer of intensity to our already captivating voices. 

Sonantic is the first company to develop a realistic shouting style for AI voices, and we’re proud to be pioneering this breakthrough. For our clients in the gaming and film industries, this level of projection from text-to-speech technology has been long awaited. This frequently requested addition helps studios develop compelling narratives in new games and films for a wider range of situations, such as action, spatial awareness, combat barks, and reactions.

Experience our shouting feature at work

As you can hear in the demo below, shouting allows for more realistic storytelling and a more immersive experience. Because showing is better than telling, we’ve created a scene to illustrate how it helps take game development to the next level.

The scene is a story most people are familiar with: a simple, everyday interaction with a loved one devolving into an argument, seemingly out of nothing. We hear a woman coming home to her male partner. She’s exhausted from a long day at work, he’s frustrated that she’s late again but didn’t let him know, and dinner is now cold. We can quickly understand and probably relate to what they’re both feeling. What starts as a civil conversation, the woman suggesting they reheat dinner, quickly escalates, and the projection and pacing of their voices develop until they’re both shouting. 

As the conversation becomes more heated, we can feel the resentment both characters have bubbling beneath the surface. The issues in their relationship clearly run much deeper than just her being late again and dinner getting cold. Through the buildup to the shouting, we discover he feels she doesn’t care how her repeated actions affect him, and she thinks he’s too needy and negative. As the narrative designer, Gaël van den Bossche, explains, “The shouting allows for the scene to have an emotional climax, where rational discussion falls away and the argument descends into trying to hurt each other.” The shouting begins and there’s name-calling, a threat to leave, and a clear, quick escalation. The de-escalation is equally fast, and we learn these kinds of arguments are frequent in their relationship when the woman apologizes and the man calmly asks, “Why do we always do this?” The changes in projection allow us to interpret far more than we could with just the words themselves.

Behind the demo

According to van den Bossche, “The idea behind writing this demo was to have a scene which showcases a steadily escalating display of emotion from the models. What starts as a disagreement over a trivial thing turns into a full-on shouting match over deeper issues in the relationship of the characters.”

Projection is its own vocal control and offers an additional method for shifting tone and conveying a different message. While in this demo the speakers are shouting from anger, you can be angry without shouting, and you can shout without anger. This is an important distinction, and it shows how Sonantic’s new AI shouting voice can capture the nuances of conversation.

This kind of natural projection and escalation is an everyday occurrence, and shouting was the feature studios requested most from Sonantic in 2020.

Modeling exertion 

It’s no surprise studios have been yearning for this feature, since previous attempts by academic researchers and other companies to model shouting have failed. As Sonantic’s Co-Founder and CTO John Flynn explains, “How do you get the AI to model shouting convincingly? It's quite difficult. When you first try, it just sounds like a person talking normally, just turned up in volume. To accurately model shouting we had to do some hard research to focus the models on the exertion and strain of the voice. This then gives that realistic sound - it's true shouting.”
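
The gap between “turned up in volume” and real exertion can be illustrated with a toy signal. The Python sketch below is only an illustration under our own assumptions, not Sonantic's model or method. The "voice" is a hypothetical sum of three sine waves, the helper names are made up for this example, and the "strained" version simply assumes that vocal effort shifts energy toward higher harmonics. It shows that multiplying a waveform by a gain raises its loudness (RMS) but leaves its spectral balance unchanged, while the strained version changes the balance itself.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (fine for short toy signals)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def rms(samples):
    """Root-mean-square level: a simple measure of loudness."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def spectral_centroid(samples):
    """Magnitude-weighted mean bin index: a rough proxy for spectral balance."""
    mags = magnitude_spectrum(samples)
    return sum(k * m for k, m in enumerate(mags)) / sum(mags)


def toy_voice(h1, h2, h3, n=64):
    """Hypothetical 'voice': a fundamental at bin 4 plus two harmonics."""
    return [
        h1 * math.sin(2 * math.pi * 4 * t / n)
        + h2 * math.sin(2 * math.pi * 8 * t / n)
        + h3 * math.sin(2 * math.pi * 12 * t / n)
        for t in range(n)
    ]


# Normal speech stand-in: strong fundamental, weak harmonics.
normal = toy_voice(1.0, 0.3, 0.1)

# "Turned up in volume": the same waveform at four times the amplitude.
louder = [4.0 * x for x in normal]

# Assumed stand-in for exertion: harmonics boosted relative to the fundamental.
strained = toy_voice(1.0, 0.8, 0.6)

for name, signal in [("normal", normal), ("louder", louder), ("strained", strained)]:
    print(f"{name:9s} rms={rms(signal):.3f} centroid={spectral_centroid(signal):.3f}")
```

In this toy setup, the louder signal has a higher RMS but the same centroid as the normal one, whereas the strained signal has a higher centroid. A gain knob alone cannot produce that second kind of change.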

To create this believable, true shouting, it was first necessary to model the sound of vocal cords under loud stress. Sonantic’s researchers worked in close collaboration with voice actors to ensure their performances would give the models the best possible data of exerted voice to learn from. According to Flynn, this was a vital step in the development process since “successful machine learning is always as much about the data as it is about the algorithm.”

More still to come

The game industry thrives on advances that make narratives more immersive. All of Sonantic’s AI voices support studios in creating believable characters in captivating stories, and our latest release adds shouting as another level of projection.

Sonantic’s platform is evolving alongside its technology, and our team of experts works to continuously expand our text-to-speech capabilities. Enhancing creativity, slashing costs, and streamlining workflows are top priorities for our clients, and we’re keeping those needs at the forefront of our development. We hope you’ll enjoy and benefit from our new shouting feature, and our many revolutionary capabilities still to come.
