Big Data 4 Development

by Keshan de Silva and Yudhanjaya Wijeratne
One of the most useful datasets we have is a collection of pseudonymized call detail records (CDRs) covering all of Sri Lanka, largely from the year 2013. Because Sri Lanka has extremely high mobile coverage and subscription rates (the country is actually oversubscribed: there are more subscriptions than people, an artifact of people owning multiple SIMs), this dataset is well suited to analysis at big data scale. We recently used it to examine attendance at the annual Nallur festival in Jaffna, Sri Lanka. Using CDRs, we were able to measure the increase in the population of the region during the festival. A longer write-up on Medium explains the importance of the festival and our reasons for choosing it.
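To illustrate the kind of computation involved (this is not the pipeline used in the study), here is a minimal Python sketch. It assumes each CDR has already been reduced to a hypothetical `(subscriber_id, day, region)` tuple, counts the distinct pseudonymized subscribers seen in a region each day, and compares the average on festival days with the average on other days. The record layout, function names, subscribers and dates are all made up for illustration.

```python
from collections import defaultdict
from datetime import date

def daily_unique_subscribers(records, region):
    """Count distinct pseudonymized subscribers seen in `region` on each day.

    `records` is an iterable of (subscriber_id, day, region) tuples, a
    hypothetical pre-processed form of the raw CDRs.
    """
    seen = defaultdict(set)
    for subscriber_id, day, rec_region in records:
        if rec_region == region:
            seen[day].add(subscriber_id)  # repeated calls on one day count once
    return {day: len(ids) for day, ids in seen.items()}

def festival_uplift(daily_counts, festival_days):
    """Mean daily count on festival days divided by the mean on all other days."""
    fest = [n for day, n in daily_counts.items() if day in festival_days]
    base = [n for day, n in daily_counts.items() if day not in festival_days]
    if not fest or not base:
        raise ValueError("need at least one festival day and one baseline day")
    return (sum(fest) / len(fest)) / (sum(base) / len(base))

# Toy example with made-up subscribers and dates.
records = [
    ("a", date(2013, 8, 1), "Jaffna"),
    ("b", date(2013, 8, 1), "Jaffna"),
    ("a", date(2013, 8, 1), "Jaffna"),
    ("a", date(2013, 8, 10), "Jaffna"),
    ("b", date(2013, 8, 10), "Jaffna"),
    ("c", date(2013, 8, 10), "Jaffna"),
    ("d", date(2013, 8, 10), "Jaffna"),
    ("e", date(2013, 8, 10), "Colombo"),
]
counts = daily_unique_subscribers(records, "Jaffna")
print(festival_uplift(counts, {date(2013, 8, 10)}))  # 2.0
```

A real analysis would also need to account for ordinary day-to-day variation in phone use and for subscribers who make no calls on a given day; this sketch ignores both.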
Blumenstock, J. E., Maldeniya, D., & Lokanathan, S.
A confluence is the junction of two rivers, especially rivers of approximately equal width. My session at SAARC Law 2017 is entitled Confluence of Law and Technology. The way I see it, there is no alternative but to relax the requirement that the metaphorical rivers be of equal width. Unless, of course, we define law in Lessig's manner: East Coast Code being old-style law, ink on paper interpreted by judges, and West Coast Code being self-enforcing rules built into hardware and software. So, coming as I do from the tech side of the world, I worked up a set of slides.
The inaugural board meeting of the Global Partnership for Sustainable Development Data (GPSDD, better known by its Twitter handle @data4SDGs) was held on the 22nd of September. I participated as a GPSDD board member. GPSDD has made significant achievements since its inception, culminating in high-level support for the need for good data to measure the SDGs, with many nation states making statements at the UN General Assembly, which concluded just two days before the board meeting. But countries saying the right things (i.e.
Perera-Gomez, T. & Lokanathan, S.
A team of GIS experts at LIRNEasia is building an open re-demarcation tool to encourage trust in the process of electoral reforms.
Governments should not be flying blind. Now the tools of big data are available to reduce their ignorance. But we will not be able to use big data effectively if the narrative is dominated by utopian hype and dystopian scaremongering. For that we need effective, fit-for-purpose public policy and regulation for big data (including algorithms), not remnants of 1970s thinking such as informed consent and strict purpose specification. For example, these shibboleths provide no remedy for the real harms caused by insecure data storage.
Big data is a team sport. We have people with different skill sets on our team. I can’t code, but I sit in on meetings where arcane details of software are discussed. Our coders spend most of their time on analytics, but they also think about broader issues such as fairness. So here is a snippet that caught the eye of Lasantha Fernando: If you’ve ever applied for a loan or checked your credit score, algorithms have played a role in your life.
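Fairness can be measured in many ways. One common and simple measure (not necessarily the one our team uses) is the disparate impact ratio: the approval rate of one group divided by that of a reference group, often compared against the "four-fifths" rule of thumb from US employment guidance. The minimal sketch below assumes loan decisions arrive as hypothetical `(group, approved)` pairs; the groups and numbers are made up.

```python
def approval_rates(decisions):
    """Approval rate per group from (group, approved) pairs."""
    totals, approved = {}, {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if ok else 0)
    return {group: approved[group] / totals[group] for group in totals}

def disparate_impact_ratio(decisions, protected, reference):
    """Approval rate of `protected` divided by that of `reference`."""
    rates = approval_rates(decisions)
    return rates[protected] / rates[reference]

# Made-up decisions: group A approved 8 of 10, group B approved 4 of 10.
decisions = [("A", i < 8) for i in range(10)] + [("B", i < 4) for i in range(10)]
ratio = disparate_impact_ratio(decisions, protected="B", reference="A")
flagged = ratio < 0.8  # four-fifths rule of thumb
print(ratio, flagged)  # 0.5 True
```

A low ratio is a signal to investigate, not proof of unfairness, and other fairness criteria (such as equal error rates across groups) can conflict with this one.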
Linnet Taylor correctly points out that US case law does not apply outside the US. However, the third-party doctrine set out in Smith v. Maryland differentiated between transaction-generated data on a telecom network and the content of what was communicated. Now there is likely to be a different governing precedent for those under US law: The Supreme Court agreed on Monday to decide whether the government needs a warrant to obtain information from cellphone companies showing their customers’ locations. The Supreme Court has limited the government’s ability to use GPS devices to track suspects’ movements, and it has required a warrant to search cellphones. The new case, Carpenter v.
The digital world is awash with data. Millions of users generate information via thousands of sources every day. This data is then consumed for purposes ranging from business to entertainment. Is there a purpose and potential for big data beyond business and entertainment? The big data team at LIRNEasia is trying to answer this question.
Professor Gregg Vanderheiden has a record of achievements in enabling the differently abled to use technology such as personal computers and automated teller machines. Through Raising the Floor, an international organization that he established, Professor Vanderheiden is working on an ambitious initiative to create a platform that will make it possible for various interfaces to “morph” into forms accessible to users with disabilities (which includes many people who are not ordinarily identified as such). For the interfaces to be fully responsive to the unique needs of each user, the platform would have to know about their preferences and behaviors. Raising the Floor is taking up the question of how to put strong safeguards in place for these data and ensure that harms are avoided. To this end, it convened expert groups in Geneva and Washington DC.
I’ve been working on privacy since 1991. I guess when one has been engaged with a subject deeply, one escapes the bubble effect: that of believing that one particular issue or value is paramount. But I now interact with many people who seem to think that privacy is a paramount value, even if some of the “safeguards” they want to put in place would make it practically impossible to use big data for the public good. Humans understand through analogical reasoning. So perhaps what we want to do with big data for the public good can be better understood through an analogy with medical research that uses leftover material from medical procedures?
Preparing for a session of the Privacy Advisory Group of UN Global Pulse and the UN Privacy Policy Group on 17-18 April, I had cause to reflect on some moves to develop new definitions (sensitive data, metadata and microdata). I may change my mind after listening to the deliberations, but here’s my starting position: Definitions are developed with some purpose in mind. A definition that is appropriate for one purpose may not be useful for another. Definitions embody assumptions and agendas. I believe that personally identifiable information (PII), a venerable category of data deeply embedded in privacy theory and practice, is the only category of data requiring hard protection.
Fernando, L., Perera, A. S., Lokanathan, S., Ghouse, A.
LIRNEasia research fellow Dharshana Kasthurirathna, Ph.D., presented a paper, ‘Detecting Geographically Distributed Communities using Community Networks,’ at the International Workshop on Mining for Actionable Insights in Social Networks, held in conjunction with the Tenth ACM International Web Search and Data Mining Conference in Cambridge in February 2017. The paper was co-authored by three LIRNEasia research fellows (Dharshana Kasthurirathna, Madhushi Bandara, Danaja Maldeniya) and Mahendra Piraveenan from the University of Sydney. Following the presentation, the authors were invited to extend the paper for submission to a special issue of the Elsevier journal Information Systems, with a draft journal paper due in April 2017.
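The method of the paper is not described here, so the following is only a generic illustration of community detection on a call graph, using a deliberately simple heuristic swapped in for clarity: drop edges whose endpoints share no common neighbours (typical of bridges between groups), then take connected components. Nodes could be subscribers or cell-tower regions; the edge-list input and the function names are assumptions.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency sets from (u, v) pairs such as caller/callee pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj

def embeddedness_communities(edges, min_common=1):
    """Drop weakly embedded edges, then return connected components as sets."""
    adj = build_graph(edges)
    # Keep an edge only if its endpoints share at least `min_common` neighbours;
    # bridges between otherwise separate groups usually share none.
    strong = {
        u: {v for v in adj[u] if len(adj[u] & adj[v]) >= min_common}
        for u in adj
    }
    seen, communities = set(), []
    for start in sorted(strong):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for nb in strong[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        communities.append(component)
    return communities

# Two tightly knit triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(embeddedness_communities(edges))  # [{1, 2, 3}, {4, 5, 6}]
```

Real call graphs are far noisier; established methods such as modularity optimization or label propagation are the usual starting points.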
In July 2016, the Global Partnership for Sustainable Development Data announced a new multi-million dollar funding initiative to support collaborative data innovations for sustainable development. The University of Tokyo and Colombo-based LIRNEasia are among the winners in the pilot round of this initiative. Their proposal, entitled “Dynamic Census,” aims to improve the existing census approach by deriving insights from mobile operators’ call detail records (CDR). It will supplement population and housing census data with dynamic aspects of population distribution, capturing changes in population distribution over time at high frequency.
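A common building block for CDR-based population estimates (assumed here; the project's actual method may differ) is to infer each subscriber's "home" as the region where they are most often seen at night, then count subscribers per home region. Repeating this for each period yields a time series of population distribution. The record layout, the 21:00 to 06:00 night window, and the function names below are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

NIGHT_START, NIGHT_END = 21, 6  # hypothetical night window: 21:00 to 05:59

def is_night(ts):
    return ts.hour >= NIGHT_START or ts.hour < NIGHT_END

def home_regions(records):
    """Map each subscriber to the region where they appear most often at night."""
    night_counts = defaultdict(Counter)
    for subscriber_id, ts, region in records:
        if is_night(ts):
            night_counts[subscriber_id][region] += 1
    # Highest count wins; ties are broken alphabetically so results are reproducible.
    return {
        sub: min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for sub, counts in night_counts.items()
    }

def population_over_time(records, period_of):
    """Subscribers per inferred home region, recomputed separately for each period."""
    by_period = defaultdict(list)
    for record in records:
        by_period[period_of(record[1])].append(record)
    return {
        period: Counter(home_regions(recs).values())
        for period, recs in sorted(by_period.items())
    }

# Made-up records: (subscriber_id, timestamp, region).
records = [
    ("a", datetime(2013, 3, 1, 23), "Colombo"),
    ("a", datetime(2013, 3, 2, 2), "Colombo"),
    ("a", datetime(2013, 3, 2, 14), "Kandy"),  # daytime: ignored for home location
    ("b", datetime(2013, 3, 1, 22), "Kandy"),
    ("a", datetime(2013, 4, 1, 23), "Kandy"),
    ("b", datetime(2013, 4, 1, 22), "Kandy"),
]
# Monthly distribution: March has one home each in Colombo and Kandy, April has two in Kandy.
monthly = population_over_time(records, lambda ts: (ts.year, ts.month))
print(monthly)
```

Subscribers with no night-time activity in a period are simply not counted, and subscription counts are not people counts, so a real estimate would still need scaling and bias correction against census figures.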