Big data in developing v developed countries

Posted on February 25, 2014  /  3 Comments

Yesterday I listened sporadically to a live-streamed conference on Big Data. The sporadic listening was not intentional: I am in Dili, Timor-Leste, where most connectivity is via satellite, with latencies in the 700ms range.

Anyway, the focus was not on big data per se. They talked about all sorts of things, mostly open data (in the parts I heard) and crowd-sourced data.

But the classic story about flu prediction made me think. Are analytics of search behavior equally useful in developed and developing countries?

The flu story is about the use of Google search terms to predict the outbreak and spread of influenza across the United States much faster than conventional epidemiological methods. It relies on the by-products of millions of individual actions: people looking up flu-related search terms on Google.

Sri Lanka, a developing country, was estimated to have 2.8 million Internet users (14 percent of the population). The great majority of them use mobile handsets and dongles to connect to the Internet. Preliminary district-level findings from the 2011-12 census show a highly disparate pattern of household Internet access from the home, ranging from a high of 26.9 percent in the district where the capital, Colombo, is located, to a low of 4.5 percent in the remote Moneragala District.

An analysis of searches by Sri Lankan users is unlikely to yield the kind of “representativeness” that the much higher number of Internet users in the United States provides. The culture of searching for medical information may also differ from country to country. And the English-language thesaurus building that would have been part of the US study would be difficult to replicate in countries where multiple languages are used.

So, as in everything else, I’d think we need to critically assess and adapt even the stories we tell about big data in developing countries.


  1. I largely agree. I was surprised not to see any mention of Google Dengue Trends on here though – any thoughts on that? They’re doing essentially the same thing with dengue fever in nine or so developing countries:

    Like you, I’m skeptical of this data, though I haven’t looked at Google’s methodology closely enough to see what they’re basing these numbers on or how they’re assessing their success. (Good points on multiple languages and cultural differences for Googling medical conditions.) Sri Lanka’s not on their list, but I know many of the countries covered face some of the same issues.

  2. To look beyond Google Flu Trends: big data includes online media, radio, telecommunications, economic transactions, energy consumption, remote sensing, and program data (e.g. logistics).

    To respond to the question you posed ‘are analytics of search behaviour equally useful in developed and developing countries?’

    The technology for both producing and analysing this data exists, and its use will only increase over time. Our challenge is to use it appropriately, so that our interpretations accurately reflect reality. Digital exhaust data is not designed for our purposes, so this means matching our programmatic goals with appropriate data sources and ensuring that our interpretation of the data acknowledges its inherent limitations and biases.

    Going back to your example of Google search data: it is not representative of everyone, but then, does it have to be? It depends on how you want to use the data. If you want to gain global-level insights on influenza, Google Flu Trends could be a great tool. If you want to use it to make sub-national policy decisions in East Timor or Sri Lanka, then you would most likely be better off with other sources of information.

    Linking big data sources to the context will require a fundamental shift in the way that we design and implement monitoring and evaluation (M&E) activities. We are not designing M&E solutions by developing indicators on the front-end – we are mining data on the back-end. This means that we need to work in two directions simultaneously to identify existing programmatic activities and existing data sources, and then connect them to gain actionable insights.

    The key will be in developing a framework through which we can determine which types of projects and data sources are likely to be a good match. Once this framework is in place, it will require the close collaboration of the development sector and big data domain specialists to critically assess contextual factors and the available data sources, before moving forward to develop and test an actionable M&E tool.

    I am currently producing a toolkit for applying big data for M&E at Pulse Lab Jakarta, United Nations Global Pulse. We are keen to collaborate and gain insights from other projects. I would be very happy to hear your thoughts and can be reached by e-mail at

  3. It appears that the flu story is not working even for the developed countries: