Bad voice quality suspends Automatic Speech Recognition for Emergency Communication

Posted on June 5, 2012

We have established that the voice quality over currently available GSM networks is too poor for converting voice messages to text. These findings are from the Voice-enabled ICTs for Disaster Management project, which field tested the use of an Interactive Voice Response system for extending emergency communications to the last mile.

Situational reports received from Community Emergency Response Team members, through their mobile phones, resulted in a Mean Opinion Score (MOS) of less than 4.0, on a scale of 1.0 to 5.0. The trial resulted in a MOS of 3.39 under a speaker-dependent scenario and 3.52 under a speaker-independent scenario.

When we calculated the difference between the speaker-dependent and speaker-independent MOS scores for each of the test sites (Districts), the result was the graphs to the left. The plots form an interesting out-of-phase sinusoidal pattern; however, interpreting the effect is a challenging problem.
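For readers who want to repeat the per-site comparison on their own data, the calculation can be sketched roughly as below. This is only an illustration and not the project's actual analysis code: the function name `mos_differences`, the dictionary layout, and the direction of the subtraction (speaker-independent minus speaker-dependent) are all assumptions. The per-district scores are not given in this post, so the example uses only the two overall averages reported above.

```python
def mos_differences(dependent, independent):
    """Return speaker-independent minus speaker-dependent MOS per site.

    Both arguments map a site name to its MOS. Sites missing from either
    mapping are skipped. The subtraction direction is an assumption made
    for this sketch.
    """
    return {
        site: round(independent[site] - dependent[site], 2)
        for site in dependent
        if site in independent
    }


# Only the overall averages appear in this post; "overall" is a
# placeholder label, not a real district name.
overall = mos_differences({"overall": 3.39}, {"overall": 3.52})
print(overall)
```

With per-district scores in place of the single placeholder entry, the same function would give one difference per district, which could then be plotted.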

The findings were presented at the IEEE 9th Joint International Conference on Computer Science and Software Engineering (JCSSE2012). The paper, titled “Interactive Voice Response Uncertainties for Emergency Communication Suspends Automation”, discusses the difficulties and recommends that speech-to-text transformations be postponed until cellular networks improve, for example with 3G or 4G technology. The slides were screened in Bangkok during the talk on June 1st.
