LIRNEasia is a regional ICT policy and regulation think tank active across the Asia Pacific


Utility of mHealth epidemiological surveillance hinges on quality of data

Ratio between noisy and clean data for India and Sri Lanka

The fidelity of digitized data in the Real-Time Biosurveillance Program (RTBP) was not promising. In Sri Lanka, data entry personnel who were technically capable but had no medical knowledge produced up to 45% noisy data (second stacked graph). In contrast, the Indian nurses, who were medically trained but less fluent in mobile phone usage, were less prone to producing noisy data. The Indian health workers had an incentive to enter data accurately: erroneous data would produce false alarms, and they would need to respond to these false alarms, or the alarms would portray a bad image of the health situation in their area. The Sri Lankan data entry personnel had no incentive beyond collecting a paycheck for the data entry work they did. The data was submitted through the mHealthSurvey mobile software, which works on less expensive Java-enabled handhelds.

High counts of fever cases in a single location on a single day

The RTBP envisions that hospital data is submitted each day; hence the real-time expectation. However, there were irregularities, with data being entered in batches. Batch entry is perfectly fine provided the actual patient visitation time (case-date) is recorded. Moreover, some personnel were submitting high counts of fever cases for a single location on a single day. This is easily explained: the recruited data entry personnel were required to submit an average number of records per month, and it is possible that they were cheating to meet their quota and receive their full paycheck. In the statistical analysis tools this would appear as an unusual escalation of fever cases for that day. Similar malicious coding of health records related to measles and tetanus disease burdens was found in the analyses carried out by the research analyst, Lujie Chen (Auton Lab).
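To make the pattern concrete, here is a minimal sketch of how such single-day escalations could be flagged. It is not the method used by the Auton Lab or the RTBP tools; the record layout (location, case-date, diagnosis), the function name `flag_spikes`, the clinic name, and the threshold values are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import median


def flag_spikes(records, diagnosis="fever", factor=3.0, min_count=5):
    """Return sorted (location, case_date, count) tuples whose daily count
    of `diagnosis` exceeds `factor` times that location's median daily count.

    `records` is an iterable of (location, case_date, diagnosis) tuples.
    This layout is hypothetical, not the mHealthSurvey format.
    """
    # Count matching cases per (location, case_date).
    daily = Counter(
        (loc, day) for loc, day, dx in records if dx.lower() == diagnosis
    )

    # Collect each location's daily counts to form a simple baseline.
    # Days with no matching cases are not part of the baseline.
    by_location = defaultdict(list)
    for (loc, _day), n in daily.items():
        by_location[loc].append(n)

    flagged = []
    for (loc, day), n in daily.items():
        baseline = median(by_location[loc])
        if n >= min_count and n > factor * baseline:
            flagged.append((loc, day, n))
    return sorted(flagged)


# Made-up sample: four ordinary days, then one day with a batch of fever cases.
sample = []
for day in ["2010-06-01", "2010-06-02", "2010-06-03", "2010-06-04"]:
    sample += [("clinic_a", day, "Fever")] * 2
sample += [("clinic_a", "2010-06-05", "Fever")] * 20
sample += [("clinic_a", "2010-06-05", "Cough")] * 3

flagged = flag_spikes(sample)
print(flagged)
```

The median is used as the baseline because a single inflated day barely moves it, whereas a mean would be pulled upward by the very spike being looked for. A flagged day is only a candidate for review: a genuine outbreak would produce the same signature.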

Lujie also observed biases in the preliminary diagnosis of certain diseases that manifest in similar ways, for example those with flu-like symptoms. Doctors' preferred diagnosis for such cases can vary among common cold, cough, and respiratory tract infection. This variation in doctor preference may dilute signals and reduce the ability to quickly detect emerging outbreaks.

Low-quality data invalidating epidemiological surveillance is a challenge that the RTBP faces (click to view the paper on “automated detection of data entry errors”). These findings were presented at the International Society for Disease Surveillance (ISDS) 2010 conference in Park City, Utah, USA, by our colleagues from Carnegie Mellon University's Auton Lab.

Click to view the ISDS 2010 data quality slides

A paper on the “challenges of introducing disease surveillance technology in developing countries” was also submitted to the ISDS 2010 conference.

