HyperAI

Researchers Use Machine Learning to Ethically Automate Early Modern Text Transcription


In the past two decades, mass digitization has revolutionized scholarly research, giving scholars faster and more efficient access to historical texts. Digital transcriptions let scholars search for specific keywords, saving valuable time and breaking down the barriers that once confined research to physical archives and libraries. However, growing reliance on digital transcriptions has raised ethical concerns about the labor involved in creating them. A recent article in The Sixteenth Century Journal by Serena Strecker and Kimberly Lifton offers strategies for researchers to obtain transcriptions of digitized early modern texts ethically.

The article, titled "Unlocking the Digitized Archive of Early Modern Print: The Automatic Transcription of Early Modern Printed Books," opens with a concise overview of the two main types of transcription software. Optical Character Recognition (OCR) software works well for late 19th- and 20th-century works but struggles with the irregularities of early modern print. This limitation has led researchers to turn to Handwritten Text Recognition (HTR) technology. One of the leading HTR tools, Transkribus, lets users either apply publicly available transcription models or create and train their own custom models.

Strecker and Lifton detail how to use Transkribus to generate highly accurate transcriptions, noting that the software can transform the workflow of early modern scholars. They tested various HTR models on pages from four 16th-century exempla collections, highlighting that Transkribus can tailor a transcription model in just five basic steps. By using Transkribus's public models to create training data, researchers can develop their own precise transcription models.

This approach not only improves efficiency but also reduces reliance on outsourced labor, such as that of graduate students or workers in the Global South, which the authors consider problematic because of the inequities and ethical issues it can involve. "The accurate and automated transcription of early modern print is no longer a distant goal but a present reality," Strecker and Lifton assert. They emphasize the importance of balancing human labor and machine learning technology as the field of early modern studies evolves: "Scholars must insist on ethical labor practices to avoid exacerbating inequalities within the academic hierarchy or perpetuating the legacies of colonialism." Their study underscores that ethical considerations should guide the adoption and development of machine learning technologies in academic research, so that technological progress does not come at the expense of fair and equitable practices.

For more information, see the full article: Serena Strecker et al., "Unlocking the Digitized Archive of Early Modern Print: The Automatic Transcription of Early Modern Printed Books," The Sixteenth Century Journal (2025). DOI: 10.1086/735052.
