Abstract
We present a public PyLaia text recognition model, accompanied by a baseline model for the layout of community regulations in Yiddish and a dataset of Yiddish texts printed in the Vaybertaytsh typeface. The model was built using legal documents, namely regulations written by the Ashkenazi Jewish community in Amsterdam during the 18th century. Such a model is needed because Vaybertaytsh differs substantially from other Yiddish and Hebrew typefaces, and existing text recognition models for Yiddish are dedicated to handwritten texts or to substantially different typefaces. The paper gives a short description of the dataset, its unique characteristics, and how it can be used further. It then explains the process of training the text recognition model and specifies the challenges encountered, along with strategies for coping with them. The model is publicly accessible via Transkribus, and the complete dataset used to train the model is available via Figshare. The models and dataset offer valuable contributions to the digital humanities, specifically for research on linguistics, Jewish history and related fields.
Original language | English |
---|---|
Article number | 35 |
Pages (from-to) | 1-10 |
Number of pages | 10 |
Journal | Journal of Open Humanities Data |
Volume | 10 |
DOIs | |
Publication status | Published - 6 May 2024 |
Bibliographical note
Publisher Copyright: © 2024 The Author(s).
Research programs
- RSM ORG