Our findings from the recently concluded Interactive Voice-enabled alerting and situational reporting pilot revealed that Speech-To-Text and Text-To-Speech were impossible to apply to audio sent over low-quality transmission networks (listen to this audio to get a sense of how bad it can be). One could sample at much higher frequencies, but that produces an extremely large file, megabytes in size, which may take hours to multicast; hence, it is not recommended for critical life-saving communications. Our conclusions were drawn mainly from the situational reporting functions.
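To put rough numbers on the file-size problem, here is a back-of-the-envelope sketch in Python. The figures are my own assumptions for illustration, not measurements from the pilot: one minute of uncompressed PCM audio, telephone-grade sampling (8 kHz, 8-bit, mono) versus CD-grade sampling (44.1 kHz, 16-bit, stereo), and a hypothetical 9.6 kbit/s link. The function names are made up for this example.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Size of uncompressed PCM audio in bytes, ignoring container headers."""
    return sample_rate_hz * (bit_depth // 8) * channels * duration_s


def transfer_seconds(size_bytes, link_bps):
    """Ideal transfer time over a link of link_bps bits per second (no overhead)."""
    return size_bytes * 8 / link_bps


# Assumed values: a one-minute message and a 9.6 kbit/s link.
DURATION_S = 60
LINK_BPS = 9_600

telephone = pcm_size_bytes(8_000, 8, 1, DURATION_S)     # 480,000 bytes
cd_quality = pcm_size_bytes(44_100, 16, 2, DURATION_S)  # 10,584,000 bytes

for label, size in (("telephone", telephone), ("CD quality", cd_quality)):
    minutes = transfer_seconds(size, LINK_BPS) / 60
    print(f"{label}: {size / 1e6:.2f} MB, about {minutes:.0f} min to send")
```

Under these assumptions the telephone-grade clip takes under seven minutes to send, while the CD-grade clip takes roughly two and a half hours, and that is before any retransmissions or protocol overhead. Compression would shrink both figures, but the ratio between them is the point.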
The U.S. has evidence to support the suspension of Common Alerting Protocol (CAP) text-to-speech transformations for emergency alert broadcasts. This evidence emerged during its efforts to deploy the Commercial Mobile Alert System.
An interesting excerpt from the article: "Many of those in my community have a hard time understanding the current version of text to speech. In other words, us old folks can’t hear what the computer is saying. There’s also the issue of geographical differences in words. For example, is “soda” and “pop” the same as “soda pop” or “Coke”? If one were to write “I’d like a Coke and fries”, the computer will read that, but the hearer may need more information, e.g. “We don’t serve Coke, is Royal Crown Cola OK?”"
These facts are from a CAP working document that is attempting to establish the file size of CAP messages in order to reduce alerting latencies and transmission bottlenecks and to accommodate terminal device constraints. These are important issues when one has to consider alerting over non-IPv6-enabled video and audio terminal devices. File size also affects web servers when a horde of users tries to load web pages with critical emergency alert information. I will write more on this topic after the report is finalized.