The full title of Mauricio's talk is "NYPL Labs Building Inspector: Extracting Data from Historic Maps."
From the OpenVis Conf website:
Mauricio enjoys playing with code, objects and all things interactive. He is currently an interaction designer at NYPL Labs, The New York Public Library’s digital innovation unit.
The New York Public Library map collection contains hundreds of atlases and maps spanning several centuries. Among them are US insurance atlases from the 19th and early 20th centuries. These atlases offer a wealth of geographic information about buildings in New York City such as addresses, building materials, height and use. However, this data is currently 'trapped' in these atlases, unavailable for public research outside of the NYPL map room.
The Building Inspector is the latest tool by NYPL Labs to extract data from these atlases through a combination of computational (vectorization, computer vision, alpha shapes) and human (crowdsourcing, game design concepts) processes.
This session will describe the workflow & computational methods behind the Building Inspector and provide additional information on uses of this data as well as the implications of open access to historical data.
It's a lot of work to manually extract data from printed historical maps.
Digitizing the maps is a first step, so that people outside NYC can access them. The second step is "geo-rectification," or aligning the maps with OpenStreetMap. About 120,000 building footprints were produced in three years by staff and volunteers--but that covered just one historical year of maps. Asking around, they eventually found someone--Mike Resig, the brother of John Resig who also presented this morning, in fact--who worked to develop an automated way to process the maps.
What is a building, in these maps? It has these qualities:
But the maps aren't perfect, with gaps, artifacts, and wear and tear on the paper. First, they made the map black and white.
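The talk did not go into how the black-and-white conversion was done. As a rough sketch of the idea only, assuming a grayscale scan stored as rows of 0-255 values and a fixed cutoff (both the `binarize` name and the 128 threshold are my own placeholders, not from the talk), it could look like this:

```python
def binarize(gray, threshold=128):
    """Return a black-and-white copy of a grayscale image.

    `gray` is a list of rows, each a list of 0-255 brightness values.
    Pixels darker than `threshold` become 0 (ink); all others become
    255 (paper). The threshold value is an illustrative assumption.
    """
    return [[0 if px < threshold else 255 for px in row] for row in gray]


scan = [
    [250, 245, 30, 240],
    [248, 40, 35, 242],
]
print(binarize(scan))
```

A real scan with stains and faded ink would probably need a threshold tuned per map sheet rather than one fixed number.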
Then they used a Python script to interpret the lines as polygons, and then they simplified these polygons. They then used a set of (brute-force) methods to create a set of points that describe the outline of each building.
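Which simplification method the script uses was not stated. To give a feel for what "simplifying" an outline means, here is a minimal sketch of one common approach, the Douglas-Peucker algorithm, which I am assuming purely for illustration; the function names and the tolerance are hypothetical:

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: drop points closer than `tolerance` to the
    straight line between the kept points around them."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify each half separately.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# A wobbly traced outline (an open path around three sides of a box).
outline = [(0, 0), (5, 0.1), (10, 0), (10, 5), (10, 10), (0, 10)]
print(simplify(outline, tolerance=0.5))
```

Here the slight wobble at (5, 0.1) and the redundant midpoint at (10, 5) are dropped, leaving only the corners.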
Once this is done, they sample the color within each building, find the average color and compare it to the color of paper in order to determine where there is shading.
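The color check described above can be sketched roughly as follows. This assumes the sampled pixels are RGB tuples and that "different from paper" means a simple Euclidean distance above a cutoff; the function names, the paper color and the cutoff of 30 are placeholders of mine, not values from the talk:

```python
import math


def average_color(pixels):
    """Mean RGB value of a list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def is_shaded(pixels, paper_color, min_distance=30.0):
    """True if the average sampled color is far enough from the paper
    color to count as shading. `min_distance` is an assumed cutoff."""
    return math.dist(average_color(pixels), paper_color) > min_distance


paper = (240, 230, 200)
pink_building = [(230, 150, 150)] * 4
blank_building = [(238, 229, 198), (242, 231, 202)]
print(is_shaded(pink_building, paper))
print(is_shaded(blank_building, paper))
```

Since the paper itself yellows unevenly, a real version would presumably sample the paper color near each building rather than use one value for the whole sheet.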
This results in a set of polygons that contains errors, but the process is fast and accurate enough more often than not.
To verify the data, the work is then crowdsourced, and each footprint is validated by multiple inspectors--and this was the point where many of us heard of the project. The building inspection can be done on the web or on a smartphone with simple 'YES,' 'NO,' and 'FIX' choices. There's even a button that lets you tweet how many buildings you've checked, and people will sometimes check tens or hundreds of thousands of buildings.
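How the votes from multiple inspectors get combined was not spelled out in the talk. One plausible way to do it, sketched here under my own assumptions (a minimum of three votes and a 75% agreement share are made-up numbers, and `consensus` is a hypothetical name), is to accept an answer only once enough inspectors agree:

```python
from collections import Counter


def consensus(votes, min_votes=3, min_share=0.75):
    """Return the agreed answer for one footprint, or None if undecided.

    `votes` is a list of 'YES', 'NO' or 'FIX' strings. The footprint is
    decided only when at least `min_votes` people have voted and the
    most common answer makes up at least `min_share` of all votes.
    Both limits are illustrative assumptions.
    """
    if len(votes) < min_votes:
        return None
    choice, count = Counter(votes).most_common(1)[0]
    return choice if count / len(votes) >= min_share else None


print(consensus(["YES", "YES", "YES", "NO"]))
print(consensus(["YES", "NO", "FIX"]))
print(consensus(["YES", "YES"]))
```

Footprints that come back as None would simply stay in the queue until more inspectors have looked at them.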
Based on the success of this work, New York Public Library is now thinking about what else they can work on. They decided to crowdsource fixing the footprints and reading addresses, as well as classifying the color.
Thanks for reading!