Nuremberg Letterbooks: A Multi-Transcriptional Dataset of Early 15th Century Manuscripts for Document Analysis.

Authors: Nina Brolich, Anna Bub, Simon Bürcky, Vincent Christlein, Peter Fleischmann, Mechthild Habermann, Klaus Herbers, Julian Krenz, Andreas Maier, Martin Mayr, Katharina Neumeier

Language: eng

Classification: 133.594 Types or schools of astrology originating in or associated with a

Publication information: England : Scientific Data, 2025

Physical description:

Collection: NCBI

ID: 746438

Most datasets in the field of document analysis use highly standardized labels, which simplify specific tasks but often produce outputs that are not directly applicable to humanities research. The Nuremberg Letterbooks dataset, which comprises historical documents from the early 15th century, addresses this gap by providing multiple types of transcriptions together with accompanying metadata. This allows the development of methods that are more closely aligned with the needs of the humanities. The dataset includes 4 books containing 1,711 labeled pages written by 10 scribes. Three types of transcriptions are provided for handwritten text recognition: basic, diplomatic, and regularized. For the diplomatic and regularized transcriptions, versions with and without expanded abbreviations are also available. Because the writer can change within a page, writer identification is supported by a combination of letter ID and writer ID. Additionally, we provide metadata, including line bounding boxes and text regions. In the technical validation, we established baselines for various tasks, demonstrating data consistency and providing benchmarks for future research to build upon.
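The abstract describes transcriptions in five variants (basic, plus diplomatic and regularized, each with and without expanded abbreviations) paired with letter IDs, writer IDs, and line bounding boxes. The following is a minimal Python sketch of how such line records might be held in memory and grouped for writer identification. All field names, the variant keys, and the bounding-box convention are hypothetical assumptions for illustration; they are not the dataset's actual file format or schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical record layout: field names are illustrative only and are
# NOT taken from the actual Nuremberg Letterbooks release files.
@dataclass
class LineSample:
    page: str
    letter_id: str
    writer_id: str
    bbox: tuple[int, int, int, int]  # assumed (x_min, y_min, x_max, y_max) in pixels
    transcriptions: dict[str, str] = field(default_factory=dict)

# Hypothetical keys for the five transcription variants named in the abstract.
VARIANTS = (
    "basic",
    "diplomatic",
    "diplomatic_expanded",
    "regularized",
    "regularized_expanded",
)

def group_by_letter_and_writer(lines):
    """Group lines by the (letter_id, writer_id) pair instead of by page,
    since a single page may contain text written by more than one scribe."""
    groups = defaultdict(list)
    for line in lines:
        groups[(line.letter_id, line.writer_id)].append(line)
    return dict(groups)

# Usage: one page whose lines come from two letters by two different writers.
lines = [
    LineSample("p1", "L1", "W3", (0, 0, 100, 20), {"basic": "..."}),
    LineSample("p1", "L2", "W5", (0, 25, 100, 45)),
    LineSample("p1", "L1", "W3", (0, 50, 100, 70)),
]
groups = group_by_letter_and_writer(lines)
for key, members in groups.items():
    print(key, len(members))
```

Grouping by the letter and writer pair rather than by page reflects the abstract's point that writers can change within a page, so a single page-level writer label would mix scribes.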
