University Consortium for Geographic Information Science
Machines Reading Maps
Tuesday, May 04, 2021, 2:00 PM - 3:00 PM EDT
Category: Webinars

Description: Historical maps contain detailed geographic information that is difficult to find elsewhere, but they typically exist as scanned images without searchable metadata. Existing approaches for making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate their metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could reduce the manual work required, but the recognition results are individual words instead of location phrases (e.g., “Black” and “Mountain” vs. “Black Mountain”). Also, these recognized words are plain text and do not have semantic labels (e.g., place vs. road names). In an ongoing project in collaboration with the Alan Turing Institute, the British Library, the Library of Congress, and the University of Southern California Digital Library, we are developing a machine-learning map processing pipeline that automatically reads text from thousands of scanned historical maps and makes that text meaningful. The machine learning pipeline will transform how cultural heritage institutions can enrich and expose metadata about their digitized map collections. As preliminary work for this project, this talk will present an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata linked to large external geospatial knowledge bases. The linked metadata support complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over state-of-the-art methods.
The code has been made publicly available as modules of the Google AI Kartta Labs project at https://github.com/kartta-labs/Project.
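
The three ideas in the description (merging OCR words into location phrases, linking phrases to a knowledge base, and querying the linked metadata) can be illustrated with a small sketch. This is not mapKurator's implementation. The `Word` class, the function names, the distance thresholds, and the toy gazetteer with its invented elevation values are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR result: recognized text plus its bounding box in pixels."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def group_into_phrases(words, max_gap_ratio=1.0, max_y_ratio=0.5):
    """Greedily merge words that share a text line and sit close together.

    Thresholds are expressed relative to the text height and are
    illustrative assumptions, not tuned values.
    """
    phrases = []  # each phrase is a list of Word, ordered left to right
    for word in sorted(words, key=lambda wd: wd.x):
        for phrase in phrases:
            last = phrase[-1]
            gap = word.x - (last.x + last.w)
            same_line = abs(word.y - last.y) <= max_y_ratio * last.h
            if same_line and 0 <= gap <= max_gap_ratio * last.h:
                phrase.append(word)
                break
        else:
            # No existing phrase accepts this word, so it starts a new one.
            phrases.append([word])
    return [" ".join(wd.text for wd in phrase) for phrase in phrases]


# Hypothetical stand-in for an external geospatial knowledge base.
# The records below are invented for illustration, not real gazetteer data.
TOY_GAZETTEER = {
    "black mountain": {"type": "peak", "elevation_m": 1200, "state": "California"},
    "low hill": {"type": "peak", "elevation_m": 300, "state": "California"},
}


def link_phrases(phrases, gazetteer):
    """Keep only phrases that match a gazetteer entry and attach its attributes."""
    return [
        dict(gazetteer[p.lower()], name=p)
        for p in phrases
        if p.lower() in gazetteer
    ]


def maps_with_high_peaks(map_index, min_elevation_m, state):
    """Return ids of maps that contain a linked peak above the given elevation."""
    return sorted(
        map_id
        for map_id, entities in map_index.items()
        if any(
            e["type"] == "peak"
            and e["elevation_m"] > min_elevation_m
            and e["state"] == state
            for e in entities
        )
    )


# Hypothetical OCR output for two scanned maps.
ocr_output = {
    "map_001": [
        Word("Black", 100, 50, 60, 20),
        Word("Mountain", 170, 52, 90, 20),
        Word("Road", 400, 300, 50, 20),
    ],
    "map_002": [Word("Low", 30, 40, 40, 20), Word("Hill", 78, 40, 40, 20)],
}

map_index = {
    map_id: link_phrases(group_into_phrases(words), TOY_GAZETTEER)
    for map_id, words in ocr_output.items()
}

print(maps_with_high_peaks(map_index, 1000, "California"))
```

In this toy setup, "Black" and "Mountain" are merged because they share a line and the gap between them is smaller than the text height, while "Road" stays separate. A real pipeline would also need to handle curved or rotated labels and ambiguous place names, which this sketch ignores.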

Presenter: Yao-Yi Chiang is an Associate Professor (Research) of Spatial Sciences in the Spatial Sciences Institute and Associate Director of the Data Science Institute, University of Southern California, Viterbi School of Engineering. His general area of research is artificial intelligence and data science, with a focus on information integration and spatial data analytics. He develops computer algorithms and applications that discover, collect, fuse, and analyze data from heterogeneous sources to solve real-world problems. Chiang is an expert in digital map processing, pattern recognition, geospatial information systems (GIS), and predictive analytics. In addition, he teaches data mining, spatial databases, and mobile GIS.

A recording of this presentation can be found here.