"The company says the key to its advance is the incorporation of non-speech sounds into its audio; training its AI models to recreate those small intakes of breath — tiny scoffs and half-hidden chuckles — that give real speech its stamp of biological authenticity," according to the Verge's James Vincent.
In the spotlight is Sonantic, an AI voice startup whose goal is to become "the CGI of audio." Although rough products in this emerging sector have appeared in the past, the technology has made many leaps in the last few years.
As Vincent reports, "Emotional choices for delivery include anger, fear, sadness, happiness, and joy, and, with this week’s update, flirtatious, coy, teasing, and boasting. A 'director mode' allows for even more tweaking: the pitch of a voice can be adjusted, the intensity of delivery dialed up or down, and those little non-speech vocalizations like laughs and breaths inserted."
After being given a line of script, the AI was able to produce, ad hoc and without manual adjustments, the following moods: flirty, pleasing, teased, cheerful, and casual. The results were mixed. Further refinement, however, could make the output even more convincing for any user.
So far, Sonantic has worked with Mercedes-Benz, gaming companies, and film producers. But the breadth of applications this type of technology opens up does raise some ethical concerns. Could it empower digital scam artists in previously unimaginable ways?
Vincent closes his review with the question: "If AI voices can convincingly flirt, what might they persuade you to do?"