Transcription+
Transcription+ (full title: ISO 24624:2016 - Transcription of spoken language: Resources, Documentation and multilingual demo corpus) is a cooperation project within the Text+ consortium of the National Research Data Infrastructure (NFDI).
The project runs from January 2026 to September 2026 and is carried out by Thomas Schmidt at the Institute for Communication Studies of the University of Duisburg-Essen.
The goal of the project is to improve documentation and tool support for the standard ISO 24624:2016 - Transcription of spoken language.
Documentation & Resources
Existing documentation of the standard will be revised, completed and published on a website. Resources such as XML schemas and XSLT stylesheets will be gathered from different sources, harmonized, documented and assembled in a GitHub repository.
Demo Corpus
An improved version of the multilingual EXMARaLDA demo corpus will be made available in the standard format. The demo corpus will also be made accessible through an instance of the ZuMult platform.
Web services
An extended version of the TEILIcht web services (Fisseni/Schmidt 2020) will be published. The web services serve to convert, normalize, annotate or visualize transcript documents in the ISO/TEI format.
The project starts in January 2026. Watch this page for news, and feel free to contact me with any questions.
References
Schmidt, Thomas (2011): A TEI-based approach to standardising spoken language transcription. In: Journal of the Text Encoding Initiative 1. 2011.
Fisseni, Bernhard / Schmidt, Thomas (2020): CLARIN Web Services for TEI-annotated Transcripts of Spoken Language. In: Selected Papers from the CLARIN Annual Conference 2019. Linköping University Electronic Press: Linköping, pp. 12–22.
Hedeland, Hanna / Schmidt, Thomas (2022): The TEI-based ISO Standard ‘Transcription of spoken language’ as an Exchange Format within CLARIN and beyond. In: Selected Papers from the CLARIN Annual Conference 2021. Linköping University Electronic Press: Linköping, pp. 34–45.
Werthmann, Antonina (2025): From spoken language data to TEI-based ISO standard. In: Bański, Piotr / Heid, Ulrich / Herzberg, Laura (eds.): Harmonizing language data. Standards for linguistic resources. (= Digital Linguistics 4). Berlin / Boston: de Gruyter, pp. 145–168.