The New York Times carried a story on “big data for development” that featured Global Pulse, the UN initiative seeking to harness the potential of data to address development questions, much like what we are doing in our current research. The efforts by Global Pulse and a growing collection of scientists at universities, companies and nonprofit groups have been given the label “Big Data for development.” It is a field of great opportunity and challenge. The goal, the scientists involved agree, is to bring real-time monitoring and prediction to development and aid programs. Projects and policies, they say, can move faster, adapt to changing circumstances and be more effective, helping to lift more communities out of poverty and even save lives.
It was one thing for Gmail to ask “did you intend to attach a document to this email?” based on your use of the word “attached” in the email. But it moves things to a whole new level when an app analyzes your digital breadcrumbs and tells you stuff that you haven’t even thought about. The services guess what you want to know based on the digital breadcrumbs you leave, like calendar entries, e-mails, social network activity and the places you take your phone. Many use outside services for things like coupons, news and traffic.
The ethic of reciprocity is perhaps the most fundamental principle governing human interaction. I once studied this in some depth for the purpose of teaching the interconnection of all things. My favorite was Rabbi Hillel’s formulation: “That which is hateful to you, do not do to your fellow. That is the whole Torah; the rest is the explanation; go and learn it.” (Talmud, Shabbat 31a, the “Great Principle”). So now, Russia wants the ethic of reciprocity applied to the metadata, the collection of which President Obama said was no problem at all.
We were not quite ready to start talking about the privacy issues surrounding the massive amounts of data generated by telcos in the course of making it possible for people to communicate, but recent news events are accelerating the schedule. I thought it might be useful to start with this quote from someone I used to work with in the 1990s: “American laws and American policy view the content of communications as the most private and the most valuable, but that is backwards today,” said Marc Rotenberg, the executive director of the Electronic Privacy Information Center, a Washington group. “The information associated with communications today is often more significant than the communications itself, and the people who do the data mining know that.” Full report.
Returning to the privacy field after a break of more than 10 years, I was struck by how inappropriate the old notice and consent approaches would be for what was actually happening on the ground. Here is an attempt to evolve new principles. I have not had time to fully digest it yet. Traditional approaches are no longer fit for the purposes for which they were designed, for several reasons:
• They fail to account for the possibility that new and beneficial uses for the data will be discovered, long after the time of collection.
• They do not account for networked data architectures that lower the cost of data collection, transfer and processing to nearly zero, and enable multiuser access to a single piece of data.
Alex Pentland of MIT has been working on mobile big data (as have we at LIRNEasia). Here is a snippet of an interview in the NYT: Q: The phone tracks our movements, as well as our calls and texts, so it can reveal a lot about our daily lives. What did you learn about yourself by studying your own cellphone data? A: That I’m very predictable. We tend to pay attention only to the new things in our lives.
In the old days one needed supercomputers to analyze big data. American Express was the second largest customer for Cray after the NSA. Then you could do the analysis on normal computers, but with fancy software like T Cube. Now Microsoft plans on building these capabilities into Excel. Next year’s version of the Excel spreadsheet program, part of the Office suite of software, will be able to comb very large amounts of data.
In 1992, I wrote parts of a report for the National Regulatory Research Institute in the US on privacy and competitive implications for transaction-generated information (a term that has been eclipsed by the less informative “big data” in recent times). We covered all utilities, including electricity. Burns, Robert; Samarajiva, Rohan & Mukherjee, Roopali (1992) Customer information: Privacy and competitive implications, NRRI 92-11. Columbus OH: National Regulatory Research Institute. Now, 20 years later, the issue is hot, the subject of a BBC story: The EDPS report voices concern over the “potential intrusiveness” of smart meters, which it says can track what members of a household do in the privacy of their homes.
Acxiom does a lot more than just analyze streams of transaction-generated information (our definition of big data). But TGI is an important element of what goes into Acxiom’s machines. Few consumers have ever heard of Acxiom. But analysts say it has amassed the world’s largest commercial database on consumers, and that it wants to know much, much more. Its servers process more than 50 trillion data “transactions” a year.
I wrote about consumer transaction-generated information in the 1990s. Companies collected and analyzed data from sales points and loyalty programs. But it became sexy only recently. Why? It should not be too surprising that a Google-created entity should have this bent.