Posted on Tuesday, January 29, 02013 by Kelsey Westphal
Speech recognition software is everywhere: businesses use it to streamline customer phone calls, digital dictation software lets you speak emails and essays, and, most recently, the iPhone’s surprisingly cheeky Siri can call, text, or look up information online with just a few verbal commands. With the aid of deep neural networks, a mathematical technique patterned after human brain behavior, researchers at the University of Toronto and Microsoft Research have found a way to increase the accuracy of speech recognition to around 85%. This complex and relatively new technology is promising on its own, but when integrated with advanced translation software, it has been used to produce a prototype of what could one day become a simultaneous personal translator, not unlike the iconic Universal Translator of Star Trek.
Though not mounted on a communicator pin or ready to communicate with aliens, this technology is still highly advanced and works in multiple steps. First, the original speech is translated word-for-word into the second language. Next, the translated words are rearranged into grammatically appropriate phrases in the target language. The resulting translation is then spoken, not in the stilted, metallic voice of a computer, but in your own voice! To do this, the system needs an hour or so of recordings of your voice, along with recordings of a native speaker of the target language, in order to preserve the speaker’s vocal identity while also creating comprehensible expressions in another language.
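For the curious, the first two steps can be mimicked with a toy sketch in Python. Everything in it is invented for illustration: the five-word English-to-French lexicon, the part-of-speech tags, and the single reordering rule (adjectives move after their nouns) are assumptions, not the method used in the actual prototype, and the third step, speaking the result in your own voice, is left out entirely.

```python
# Toy illustration only: the lexicon and reordering rule are invented for this
# example and are not the method used in the Microsoft Research prototype.

# Hypothetical English-to-French lexicon: word -> (translation, part of speech)
LEXICON = {
    "the": ("la", "DET"),
    "red": ("rouge", "ADJ"),
    "car": ("voiture", "NOUN"),
    "is": ("est", "VERB"),
    "fast": ("rapide", "ADJ"),
}


def translate_word_for_word(sentence):
    """Step 1: look up each source word, keeping the source word order."""
    tagged = []
    for word in sentence.lower().split():
        # Unknown words pass through unchanged with an "UNK" tag.
        tagged.append(LEXICON.get(word, (word, "UNK")))
    return tagged


def reorder(tagged):
    """Step 2: apply one target-language rule: an adjective follows its noun."""
    result = list(tagged)
    i = 0
    while i < len(result) - 1:
        if result[i][1] == "ADJ" and result[i + 1][1] == "NOUN":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return result


def translate(sentence):
    """Run both text steps; speech synthesis (step 3) is out of scope here."""
    words = reorder(translate_word_for_word(sentence))
    return " ".join(word for word, _ in words)


print(translate("the red car is fast"))  # la voiture rouge est rapide
```

Even this tiny example shows why the second step matters: the word-for-word pass alone would produce "la rouge voiture", which a French speaker would understand but never say.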
There are still some kinks to work out, of course, but the possibilities this suggests for overcoming language boundaries are worth thinking about. Conversations between cultures could become more balanced: neither party would feel as though they were “imposing” their language on the other, and both could speak in the tongue they find most amenable. In diplomacy, business, travel and the arts, this new translation tool could produce profound breakthroughs in communication and, more importantly, in understanding between cultures and people. As anyone who has used a translation site knows, computer-generated translations can often go comically awry, and this program certainly runs the same risk of miscommunication as any other. All the same, the thought of hearing your own voice in another language is a bizarre and fascinating prospect, one that will hopefully attract researchers and language lovers alike to search for solutions.
One can only hope that this technology will be adapted to serve not only speakers of Chinese or French but also speakers of lesser-known languages. One positive development on this front is Microsoft’s addition of Haitian Creole and Hmong to its Bing translation service. This slow but thorough aggregation of diverse languages will ideally ensure that eventually no language community, however small, is left without a voice in global discourse, even if it is a computer-generated one.
If you would like to see this process in action, check out the video above of Microsoft's Chief Research Officer Rick Rashid speaking in English to a Chinese audience.