For more than a year, we at LIRNEasia have been working on the analysis of images. The NYT story on Stanford researchers working with Google Street View images describes the potential well:
For computers, as for humans, reading and observation are two distinct ways to understand the world, Mr. Lieberman Aiden said. In that sense, he said, “computers don’t have one hand tied behind their backs anymore.”
Text has been easier for A.I. to handle, because words have discrete characters — 26 letters, in the case of English. That makes it much closer to the natural language of computers than the freehand chaos of imagery. But image recognition technology, much of it developed by major technology companies, has improved greatly in recent years.
The Stanford project gives a glimpse at the potential. By pulling the vehicles’ makes, models and years from the images, and then linking that information with other data sources, the project was able to predict factors like pollution and voting patterns at the neighborhood level.
“This kind of social analysis using image data is a new tool to draw insights,” said Timnit Gebru, who led the Stanford research effort. The research has been published in stages, the most recent in late November in the Proceedings of the National Academy of Sciences.
The whole concept of economically and socially homogeneous “blocks,” or areas defined by zip codes, is foreign to developing countries. Perhaps we will become like that once the developers do their thing. But not right now. I found slum housing in every single municipal ward in my city, Colombo (except perhaps Colombo 1, which has a total population in the double digits). So we need to adapt these models with care to our circumstances.