Everyone is looking for the killer app that can serve the non-digizens (non-digital citizens). There is a lot of hype about smartphones, but practical, field-level thinkers have realized that voice is the better solution. CGNet Swara, a citizen journalism project; TCS Innovation Lab’s work on the use of speech for querying railway information [1]; and IITM-RTBI’s Agriculture Information exchange are a few of the many Interactive Voice Response (IVR) enabled solutions that are taking shape in the region.
The key reasons for the innovation surrounding IVR are to overcome the problems with keypad entry (pressing the WXYZ key three times for Y) and with traditional English-based applications. It doesn’t get easier than pressing a few digits to dial a number and then speaking your mind or listening to a message. A larger challenge is addressing the multitude of languages, with Asia home to hundreds of them. Voice is not restricted by language character sets.
The paper “Challenges of implementing Standardized Emergency Data Exchange with Interactive Voice Response in Sri Lanka” was presented at the International Telecommunications Society (ITS) India 2012 conference. It discussed results from a feasibility study that investigated the possibility of extending the Freedom Fone IVR to Sarvodaya Community Emergency Response Team (CERT) members. It also examined transforming the voice to text and then representing the information categorically in the Sahana disaster management system. The categorical information is important for analysis that facilitates rapid decision support, such as scheduling resources for various response activities. Just as maps present data visually, I foresee future Sahana tools accommodating voice as a way to communicate the same information.
A typical situational reporting activity would have CERT members call the Freedom Fone IVR to report field observations. Those voice messages are processed by an operator at the incident management center, and the parsed information is entered into the Sahana system. An Automatic Speech Recognition (ASR) system could, ideally, replace this laborious human process. However, two key shortcomings hold back such an innovation:
- Low-quality voice transmission over cellular networks makes it harder to gather information from the noisy audio recordings; such noisy audio cannot reliably be subjected to any kind of automated speech-to-text transformation
- Emergency communication requires large-vocabulary continuous speech processing; ASR technology, to date, is best suited for keyword recognition, not for large-vocabulary continuous speech
The first problem is prevalent in all cases; the TCS authors also recognized it in their paper presented at the ITS India 2012 conference. There are software algorithms that can be trained to cancel environmental noise. The second problem does not apply to the TCS railway information case because that case is based on a fixed set of attributes, namely times and locations (keywords). Hence, an ASR can be trained quite easily to adapt to the railway scenario. In contrast, the factors associated with emergency communications are far more sparse and cumbersome for present-day ASR to handle. However, research in this area can foster developments in inclusive technologies to serve such public goods.
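To illustrate the contrast between the two cases, here is a minimal Python sketch of keyword-style extraction. It assumes the speech has already been turned into text, which is itself the hard part discussed above. The station list, the time pattern, and the function name `extract_slots` are hypothetical, chosen only for illustration; none of them come from the TCS or Freedom Fone systems.

```python
import re

# Hypothetical fixed vocabulary for a railway query (illustration only).
STATIONS = {"mumbai", "thane", "pune", "chennai"}

# Hypothetical pattern for spoken times such as "6 pm" or "10:30 am".
TIME_PATTERN = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)\b")


def extract_slots(transcript):
    """Pick out known station names and time expressions from a transcript."""
    text = transcript.lower()
    words = re.findall(r"[a-z]+", text)
    stations = [w for w in words if w in STATIONS]
    times = [m.group(0) for m in TIME_PATTERN.finditer(text)]
    return {"stations": stations, "times": times}


railway = extract_slots("When does the train from Thane to Pune leave after 6 pm?")
emergency = extract_slots(
    "The bridge near the temple has collapsed and three families are stranded"
)
print(railway)
print(emergency)
```

The railway query fills its slots from a small, closed vocabulary, while the emergency report matches nothing: its meaning lies in free-form description that a fixed keyword list cannot anticipate.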
[1] “Challenges in Enabling Speech as a Service Channel for Indian Railway Scenario”. Charudatta Jadhav, Imran Ahmed, Meghna Pandharipande, Venkatakrishna T, Mithun BS, Vrushali Kulkarni, Chitralekha Bhat, Arun Pande, Sunil Kumar Kopparapu. TCS Innovation Labs – Mumbai, Tata Consultancy Services, Yantra Park, Thane (West), Maharashtra 400601, India.