Yesterday I listened sporadically to a live-streamed conference on Big Data. The sporadic listening was not intentional: I am in Dili, Timor Leste, where most connectivity is via satellite, with latencies in the 700ms range.
Anyway, the focus was not on big data per se. They talked about all sorts of things, mostly open data (in the parts I heard) and crowd-sourced data.
But the classic story about flu prediction made me think. Are analytics of search behavior equally useful in developed and developing countries?
The flu story is about the use of Google search terms to predict the outbreak and spread of influenza across the United States, much faster than conventional epidemiological methods. It relies on the by-products of millions of individual actions, in the form of people looking up flu-related terms on Google.
In a developing country like Sri Lanka, it was estimated that there were 2.8 million Internet users (14 percent of the population). A great majority of them use mobile handsets and dongles to connect to the Internet. The preliminary district-level findings from the 2011-12 census show highly disparate levels of household Internet access from the home, ranging from a high of 26.9 percent in the district where the capital, Colombo, is located, to a low of 4.5 percent in the remote Moneragala District.
It is unlikely that an analysis of searches by Sri Lankan users will yield the kind of “representativeness” that the much higher number of Internet users in the United States makes possible. The cultures of searching for medical information may also differ between countries. And the English-language thesaurus building that would have been part of the US study would be difficult to replicate in countries where multiple languages are used.
So, as in everything else, I’d think we need to critically assess and adapt even the stories we tell about big data in developing countries.