What is big data?

Posted on November 18, 2015

I spent the last two days at a meeting on big data in the global south. The sixty people in the room had no shared understanding of big data, which led to some interesting discussions. Then someone stated that he wished big data would be defined.

Big data is characterized by volume and variety. The third of the 3 Vs, velocity, is irrelevant, as has been argued by many, including Viktor Mayer-Schönberger. What matters is volume and variety that cannot be handled by conventional software and visualization techniques.

Data that fulfill the 2 Vs may be categorized as either non-behavioral or behavioral big data. The former is not related to human behavior and does not require the public policy interventions that many see as necessary. Examples are data from the Hubble Telescope and the content of all the books datafied by Google.

Behavioral big data falls within the realm of social science and is a legitimate concern for public policy. Transaction-generated data is another way of describing this kind of big data. But it is important to always ask whether variability is present. For example, however big the list of Aadhaar numbers and the names associated with them, it does not constitute big data; it is a static, structured database. But when data are generated by transactions using Aadhaar, that is indeed big data.

