RCR Forum: Digital Humanities Research: Acquiring and Preparing a Corpus of Texts (GS717.07)

Name: RCR Forum: Digital Humanities Research: Acquiring and Preparing a Corpus of Texts (GS717.07)
Start: 2023-03-21T09:30:00-04:00

Before you can undertake automated text analysis, you need to obtain a corpus of digitized texts and, in many cases, prepare them for further processing. This hands-on digital humanities workshop focuses on the technical dimensions of corpus development. We will explore the risks and benefits of optical character recognition (OCR); file formatting and naming issues; organization strategies for large corpora; problems of data cleaning and preparation; common sources of textual research data; and common legal concerns around the use of textual corpora.

More information: Will Shaw (william.shaw@duke.edu)

RCR_forums: Digital Humanities Research - 3.21.2023