Data Quality


Hernan Galperin from DIRSI organized a session entitled "Data for development: the good, the bad and the ugly." Martin Hilbert was originally billed as the star speaker who would tell the audience about the wonders of big data. Well, he did not turn up. So it was left to LIRNEasia, where we actually get our hands dirty analyzing big data of relevance to our primary clients, the poor of the developing world, to talk about big data. The slides are here.
The fidelity of digitized data in the Real-Time Biosurveillance Program (RTBP) was not promising. In particular, the data entry personnel in Sri Lanka, who were technically capable but had no medical knowledge, were producing up to 45% noisy data (second stacked graph). In contrast, the Indian nurses, who were medically trained but less fluent in mobile phone usage, were less prone to producing noisy data. The Indian health workers had an incentive to be accurate: erroneous data would produce false alarms, and they would need to respond to these false alarms, or the alarms would portray a bad image of the health situation in their area. The Sri Lankan data entry personnel had no incentive besides picking up a paycheck for the data entry work they did. The data was submitted through the mHealthSurvey mobile software, which works on less expensive Java-enabled handhelds. The RTBP envisions that hospital data is submitted each day, hence the real-time expectations.
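To make the idea of a "noisy data" share concrete, here is a minimal sketch of how such a rate could be computed per group of data entry personnel. It is not the RTBP's actual method: the vocabulary `KNOWN_TERMS`, the rule in `is_noisy`, the field names `site` and `symptom`, and the sample records are all hypothetical and made up for illustration.

```python
from collections import Counter

# Hypothetical vocabulary of accepted terms; not taken from the RTBP.
KNOWN_TERMS = {"fever", "cough", "diarrhea", "headache"}


def is_noisy(record):
    """Assumed rule: a record is noisy if its symptom is missing or unrecognized."""
    symptom = record.get("symptom", "").strip().lower()
    return symptom not in KNOWN_TERMS


def noise_rate_by_group(records, group_key="site"):
    """Return the fraction of noisy records for each group."""
    totals = Counter()
    noisy = Counter()
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if is_noisy(record):
            noisy[group] += 1
    return {group: noisy[group] / totals[group] for group in totals}


# Made-up records for illustration only; these are not RTBP data.
sample = [
    {"site": "A", "symptom": "fever"},
    {"site": "A", "symptom": "fevr"},      # misspelling counts as noise
    {"site": "B", "symptom": "Cough"},     # case differences are tolerated
    {"site": "B", "symptom": "headache"},
]
rates = noise_rate_by_group(sample)
print(rates)
```

A real check would need a medically informed vocabulary and probably fuzzy matching, which is part of why personnel without medical knowledge may struggle to catch their own errors.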