Automating Extracting GIS Data from Scanned Maps – GIS Lounge

By Caitlin Dempsey | November 8, 2013

The New York Public Library Labs (NYPL Labs) has posted the code for its open source map-vectorizer project on GitHub. The project seeks to automate (“like OCR for maps”) the process of extracting polygon and attribute information from old scanned maps. The code was developed to extract building information from New York City insurance atlases published in the 19th and early 20th centuries; the NYPL holds hundreds of these atlases, containing thousands of map sheets.

As NYPL Labs explains on the project's README page, the process has saved thousands of hours in creating GIS data from old scanned maps: “[I]t took NYPL staff coordinating a small army of volunteers three years to produce 170,000 polygons with attributes (from just four of hundreds of atlases at NYPL).” It now takes closer to 24 hours to generate a comparable number of polygons with some basic metadata.

Currently, the map-vectorizer project can extract polygon shapes and color attribute information from scanned maps.  Future planned enhancements include extracting dot presence, dot count, and dot type (full vs outline).

To use the project code, the following dependencies must already be installed on your machine: Python with OpenCV, ImageMagick, R, GIMP, and GDAL Tools (full details are available on the GitHub project page).
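
Before running the project, it can help to confirm that these tools are actually present. The following is a minimal sketch that uses only the Python standard library. The module and executable names (`cv2`, `magick`/`convert`, `Rscript`, `gimp`, `gdalinfo`, `ogr2ogr`) are assumptions based on common installations, not names taken from the project's documentation, and may differ on your platform.

```python
import importlib.util
import shutil

# Assumed names: the article lists the tools but not the module or
# executable names, so these are common defaults and may need adjusting.
PYTHON_MODULES = {"OpenCV": "cv2"}
EXECUTABLES = {
    "ImageMagick": ["magick", "convert"],
    "R": ["Rscript", "R"],
    "GIMP": ["gimp"],
    "GDAL Tools": ["gdalinfo", "ogr2ogr"],
}


def check_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return a dict mapping each dependency name to True if it was found."""
    found = {}
    # A Python module counts as present if an import spec can be located.
    for name, module in modules.items():
        found[name] = importlib.util.find_spec(module) is not None
    # A tool counts as present if any candidate executable is on the PATH.
    for name, candidates in executables.items():
        found[name] = any(shutil.which(exe) for exe in candidates)
    return found


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A tool reported as missing may simply be installed under a different name or outside the `PATH`, so treat the result as a first check rather than a definitive answer.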

(Originally posted 3 November 2013)