A new collaboration between University Libraries and the Department of History at Virginia Tech is opening the doors to the past by making early 20th century real estate documents searchable.

Associate Professor LaDale Winling is leading research to track the path of racial housing covenants in early 20th century Chicago. To do this, Winling needs access to the text of hundreds of old real estate covenants held in archives. The challenge is that these documents have only been preserved physically and cannot be searched digitally.

This is where University Libraries steps in. Through optical character recognition (OCR) technology, library experts are extracting the text from scanned images of these documents, improving readability, and making the contents searchable. University Libraries Data Informatics Consultant Chreston Miller is heading up the effort.

"I'm passionate about finding solutions to challenging problems and making an impact," Miller said. "It was extremely rewarding to discover a way to process these scanned documents and extract as much text as possible. Now we can help unlock insights into key questions about our country's past."

Racial covenants uncovered

Real estate and racial segregation are tightly linked in American society. “There have been numerous ways that this exclusion has been wrought upon the landscapes and communities of American cities,” said Winling. “I am someone who wants to shine a bright light on these injustices and to show that the way things are isn’t the way things have to be. Segregation was actively created, practiced, and reinforced in numerous ways and we have to be just as active and intentional to understand how it came to be, and just as active and intentional in order to undo the work of segregation.”

The history department and University Libraries have long been collaborators on digital projects. “This is a continuation of a great relationship,” said Winling. “This kind of collaboration with the library enhances our ability to take on big projects and to explore important research topics in a variety of ways that would not be possible with individual research alone.”

Optical character recognition makes text more readable and searchable.

No needle in a haystack

The bulk of the project has concentrated on Chicago, but Winling has linked up with other researchers across the country. “They are everywhere,” said Winling. “I have graduate students researching locally and we have even found dozens of these in Montgomery County [in Virginia]. There is almost no community that existed in the first half of the 20th century untouched by racial covenants.”

Because racial covenants were so widespread, they are difficult to identify systematically. “They are so quotidian, they do not stand out,” said Winling. “It’s not just a needle in a haystack, they are hay in a haystack.”

This challenge alone led Winling to reach out to Miller. “Having large document collections meant that we would have to develop some automated ways of doing some research, because we could not look at every single document by eye to draw our conclusions,” said Winling. “Thus OCR, natural language processing, and visualization have become a major part of our research process.” 

Obstacles of digitizing old documents

The project has had its challenges. Although the documents are scanned at high resolution, the quality of the text in the old documents makes accurate optical character recognition difficult. Age has faded the ink, leaving letter strokes thin, faint, or even broken. The background is often not white or near white, which reduces the contrast between text and page. However, Miller has developed customized pre-processing methods for the images to improve text extraction. He has also secured access to high-accuracy optical character recognition software from John Snow Labs.
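The article does not describe Miller's pre-processing methods, so the following is only a minimal sketch of two generic techniques often applied to faded scans with off-white backgrounds: linear contrast stretching followed by Otsu thresholding. The function names, the tiny sample page, and the choice of techniques are all illustrative assumptions, not the project's actual pipeline.

```python
from typing import List

# Hypothetical representation: a grayscale page as rows of 0-255 values.
Image = List[List[int]]


def stretch_contrast(img: Image) -> Image:
    """Rescale so the darkest pixel maps to 0 and the lightest to 255."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in img]


def otsu_threshold(img: Image) -> int:
    """Return the level t that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def clean_scan(img: Image) -> Image:
    """Stretch contrast, then binarize: ink becomes 0, background 255."""
    stretched = stretch_contrast(img)
    t = otsu_threshold(stretched)
    return [[0 if p <= t else 255 for p in row] for row in stretched]


# Assumed sample: faded ink (about 150-160) on a yellowed page (about 200).
faded_page = [
    [200, 201, 199, 200],
    [200, 150, 155, 200],
    [198, 152, 160, 205],
    [200, 200, 202, 200],
]
cleaned = clean_scan(faded_page)
```

In this toy example the ink and the background differ by only about 50 gray levels before processing. After stretching and thresholding they are pure black and pure white, which is the kind of contrast an OCR engine generally handles better. Real scans would likely also need steps such as noise removal and deskewing.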

The completion of this project will have wide-ranging effects. Most directly, it will provide Winling the information he needs to complete his research into housing segregation. More broadly, it showcases innovative digital solutions from University Libraries to open access to primary sources. This benefits the research community, the university, and the general public.

As for what lies ahead, this is just the start. "Figuring out solutions for digitizing historical documents paves the way for answering so many key questions about our shared history," said Miller.
