On two occasions I have been asked [by members of Parliament], “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
— Charles Babbage, Passages from the Life of a Philosopher (1864), Chap. 5, 59.
We live in a society where machines, algorithms and humans intertwine; where the “consensual hallucination” of cyberspace is no longer a separate part of our lives, but a swamp through which we wade, leaving data trails for the world to see; where the wrong people put the wrong figures into the wrong machines and wonder why the answer isn’t right. Terms like “Big Data” and “AI” have become Rorschach blots on the public consciousness.
LIRNEasia’s role is to participate in the public policy dialogue around our algorithmically-inclined society with critical research and technical expertise. Since 2013, as a cross-disciplinary team of data scientists, lawyers, and social scientists, we have conducted our own analyses and engaged deeply with policy makers and with private, data-heavy organizations.
Rohan Samarajiva and Ramathi Bandaranayake presented preliminary findings from our work on risk communication during COVID-19.
This policy brief details guidance on making decisions in a pandemic.
Wijeratne, Y., de Silva, N. (2020). Sinhala Language Corpora and Stopwords from a Decade of Sri Lankan Facebook. LIRNEasia. Last updated: July 13, 2020.
This paper presents two colloquial Sinhala language corpora extracted from Facebook, as well as a list of algorithmically derived stopwords.
Corpus-Alpha
The larger of the two corpora spans trilingual text posted by 533 Sri Lankan Facebook pages, including politics, media, celebrities, and other categories, from 2010 to 2020. It contains 28,825,820 to 29,549,672 words of text, mostly in Sinhala, English and Tamil (the three main languages used in Sri Lanka). It contains URLs, punctuation and other noise, making it more suitable for discourse analysis and the study of codemixing in colloquial Sinhala.
Corpus-Sinhala-Redux
The smaller corpus amounts to 5,402,76 words of only Sinhala text extracted from Corpus-Alpha. It has been cleaned of URLs, punctuation and noise. Both corpora have markers for their date of creation, page of origin, and content type.
License
These datasets are released under the principles of Open Access. As such, this work is licensed under a Creative Commons 4.0 CC BY licence: you may distribute, remix, adapt, and build upon this work, even commercially, as long as you credit the authors for […]
A research brief which explores the key data sources, algorithmic techniques and roadblocks in applying remote sensing techniques for development.
A white paper exploring how bias in algorithms and data affects development problems, especially when it interacts with socio-legal systems.
This tour d’horizon examines the possible uses of data to help stop or slow the spread of COVID-19 directly. It gives weight to what can be done in the short term.
A research paper exploring an alternative approach to address the concern of privacy in sharing big data datasets by generating privacy-preserving artificial call detail records (CDRs) in accordance with the desired macro features of the dataset.
A white paper outlining the development of an alternative socioeconomic index for Sri Lanka, using principal component analysis (PCA) and publicly available census data.