More and more historical texts are becoming available in digital form. Digitization of paper documents is motivated by the aim of preserving cultural heritage and making it more accessible, both to laypeople and scholars. As digital images cannot be searched for text, digitization projects increasingly strive to create digital text, which can be searched and otherwise automatically processed, in addition to facsimiles. Indeed, the emerging field of digital humanities heavily relies on the availability of digital text for its studies. Together with the increasing availability of historical texts in digital form, there is a growing interest in applying natural language processing (NLP) methods and tools to historical texts. However, the specific linguistic properties of historical texts, in particular the lack of standardized orthography, pose special challenges for NLP.

1. Introduction -- 1.1 Historical languages and modern languages -- 1.2 Intended audience -- 1.3 Outline -- 2. NLP and digital humanities -- 2.1 Origins of digital humanities -- 2.2 Convergence of NLP and digital humanities -- 2.3 Summary -- 3. Spelling in historical texts -- 3.1 The role of orthography in NLP -- 3.2 Spelling and historical texts -- 3.3 Summary -- 4. Acquiring historical texts -- 4.1 Digitization of historical texts -- 4.2 Scanning -- 4.3 Optical character recognition -- 4.4 Manual text entry -- 4.5 Computer-assisted transcription -- 4.6 Summary -- 5. Text encoding and annotation schemes -- 5.1 Unicode for historical text -- 5.2 TEI for historical texts -- 5.3 Summary -- 6. Handling spelling variation -- 6.1 Spelling canonicalization -- 6.2 Edit distance -- 6.3 Approaches for handling spelling variation in historical texts -- 6.4 Detecting and correcting OCR errors -- 6.5 Limits of spelling canonicalization -- 6.6 Summary -- 7. NLP tools for historical languages -- 7.1 Part-of-speech tagging -- 7.2 Lemmatization and morphological analysis -- 7.3 Syntactic parsing -- 7.4 Summary -- 8. Historical corpora -- 8.1 Arabic -- 8.2 Chinese -- 8.3 Dutch -- 8.4 English -- 8.5 French -- 8.6 German -- 8.7 Nordic languages -- 8.8 Latin and ancient Greek -- 8.9 Portuguese -- 8.10 Summary -- 9. Conclusion -- Bibliography -- Author's biography