About LinguisticBits
Supporting linguist(ic)s
LinguisticBits is a subsidiary of MusicalBits GmbH. LinguisticBits is run by Dr. Thomas Schmidt.
After twenty-two years of building corpus tools, heading research projects, running corpus platforms, archives and infrastructure units and supporting linguists in their work with corpora, I found that some tasks are best tackled by research projects within academic institutions, and some are better outsourced to professional service providers. LinguisticBits is a partner for the latter.
The mission of LinguisticBits is to effectively and efficiently support (people in) research projects and data centres in technical matters around corpora and corpus workflows.
- Name: Dr. Thomas Schmidt
What I do
LinguisticBits can advise your research project or your institution on sustainable data management, choice of technology and other aspects of corpus work.
Workflow development
LinguisticBits can help you set up, configure and optimise efficient, effective and sustainable workflows for your corpus, from data acquisition to data dissemination.
Software development
LinguisticBits can develop software for data entry, conversion, visualisation and analysis. Java and XML technologies are our speciality. We also work with Python, C#, JavaScript and R.
EXMARaLDA support and training
LinguisticBits can offer your project reliable and competent support for the EXMARaLDA tools. We offer EXMARaLDA support licences and on-site or online training courses for EXMARaLDA.
Data curation
LinguisticBits can spice up your corpus data, make it consistent and standards-compliant, enrich it with automatic annotation methods and get it ready for analysis and publication.
Research partner
LinguisticBits can act as a partner for your research project, supporting you in conceptualising your proposal and work plans and taking on work packages.
About myself
Education and experience
I studied General Linguistics, Mathematics, English, French, Artificial Intelligence, Computer Science and Electrical Engineering at the University of Kaiserslautern, the Johannes Gutenberg-University Mainz, the University of Edinburgh, the FU Berlin and the Université Paris VIII (not strictly in that order; for details, see my LinkedIn profile). I hold an M.A. in General Linguistics, a European Master's Degree of Linguistics and an intermediate diploma (≈ B.A.) in mathematics and computer science. I received a PhD (summa cum laude) in Germanic Linguistics from the University of Dortmund in 2004. The title of my doctoral thesis translates as "Computer-aided transcription: Modelling and visualising spoken language by text-technological means".
I have worked for the University of Hamburg, the Institute for the German Language in Mannheim, the University of Basel, the Berlin-Brandenburg Academy of Sciences, Philips Speech Processing and the European Language Resource Distribution Agency (again, not in that order; for details, see my LinkedIn profile). Before LinguisticBits, I did freelance work for Texas Instruments and CASIO, for the Free University of Bozen, for the Universities of Ghent, Olomouc, Düsseldorf, Duisburg-Essen and Hamburg, and for the Institute for the German Language.
I spent a post-doc year at the International Computer Science Institute in Berkeley, CA, and was a visiting researcher at the Middle East Technical University in Ankara and at the University of Texas at Austin.
I initiated the Hamburg Centre for Language Corpora (HZSK) and served as its managing director. I headed the Archive for Spoken German (AGD) at the Leibniz Institute for the German Language for ten years and represented it in the Committee for Data Access of the RatSWD.
Research profile
I was principal investigator in more than 10 research projects (see Projects section) and have published more than 100 research papers and books in the field of corpus linguistics and computational lexicography (see my ORCiD profile). I have acted and am acting as a reviewer for numerous workshops, conferences and journals and for the German Academic Exchange Service (DAAD), the Haut Conseil de l’évaluation de la recherche et de l’enseignement supérieur (Hcéres), the Swiss National Science Foundation (SNF) and SwissUniversities, and for the Ministry of Education, Youth and Sports of the Czech Republic. I am or was a member of the scientific advisory boards of NITE, GeWiss, Camomile, KompAS, INEL, and Oral-History.Digital. Besides numerous smaller workshops, I organised the GSCL Conference 2011 in Hamburg and was one of the program chairs for the 58. Jahrestagung des Leibniz-Instituts für Deutsche Sprache.
I am an expert (senior/leader) Java developer (both desktop and server) and an expert in XML and surrounding technologies (XSLT, XPath etc.). I also do Python and C#, SQL and HTML/CSS/JavaScript. I am the lead developer of the Kicktionary, and I have developed large parts of the Database for Spoken German and ZuMult.
I speak and write German, English and French.

Contract work I do or did

News & Activities
Date(s) | Activity | Link(s) |
26 June 2025 | Talk "Beyond words – a case study in corpus approaches to interaction and multimodality" at IPrA-Panel «Corpus Pragmatics» in Brisbane | Conference program |
25 April 2025 | Presentation of the TGDP ZuMult platform at 49th SGAS Annual Symposium, Milwaukee | Symposium website |
18 December 2024 | New EXMARaLDA version online | Blog post |
01 December 2024 | First official version of the ZuMult platform for the Texas German Dialect project online | Platform at UT Austin |
7/8 October 2024 | Workshop for the ALMA-B project at KU Eichstätt | |
27 June 2024 | Public beta version of the ZuMult platform for the Texas German Dialect project online | Platform at UT Austin |
5 June 2024 | New EXMARaLDA version online | Blog post |
May 2024 | Three EXMARaLDA workshops for a team of researchers at Universidade Federal de Minas Gerais, Brazil | |
13 March 2024 | EXMARaLDA workshop at UC Louvain-La-Neuve, part of séminaire "Récolte, nettoyage et enrichissement de corpus" | Seminar program |
31 January 2024 | Talk on 'Music, Artificial Intelligence and Linguistics' at the University of Texas at Austin | Announcement |
14 November 2023 | 'Pioniergeist' Award for MusicalBits | LinkedIn post |
10/11 November 2023 | EXMARaLDA Workshop for the "Symposion Deutschdidaktik" at the University of Hildesheim | SDD website |
20 October 2023 | EXMARaLDA Support licences officially available | EXMARaLDA Blog post |
13 October 2023 | Talk "Getting ready for TGDA 2.0 – Enriching the Texas German Dialect Corpus for (comparative) corpus analyses" at German Abroad 5 | Conference website |
9 October 2023 | Data management workshop at German Abroad 5 | Conference website |
29 September 2023 | Keynote at Kick-off event "CLARIN-CH Working Group on sensitive data management" | website |
15 September 2023 | Workshop on multimodality in CMC data at the International Conference on CMC and Social Media Corpora for the Humanities (Mannheim) | Conference website |
15 September 2023 | EXMARaLDA workshop for the International Graduate School German Jordanian University / PH Freiburg | GJU homepage |
12 September 2023 | Workshop "Annotation and Interoperability" for the Mezzanine project at the University of Maribor | Project homepage |
06 August 2023 | New publication: Christian Fandrych, Thomas Schmidt, Franziska Wallner, Kai Wörner (eds.): Zugänge zu mündlichen Korpora für DaF und DaZ: Das ZuMult-Projekt. KorDaF (Korpora Deutsch als Fremdsprache). Jahrgang 3 • Ausgabe 1 • 2023 | Journal page |
20 July 2023 | New official EXMARaLDA version | EXMARaLDA website |
14 July 2023 | EXMARaLDA workshop for the International Graduate School German Jordanian University / PH Freiburg | GJU homepage |
22 June 2023 | Keynote: "Manual and automated, qualitative and quantitative approaches to spoken interaction" as a contribution to the workshop "Computational and Quantitative Approaches to Multimodal Video Analysis - CAMVA 2023" at the University of Zürich | Workshop page |
23 May 2023 | New publication: Marc Kupietz and Thomas Schmidt (eds.): Neue Entwicklungen in der Korpuslandschaft der Germanistik. Beiträge zur IDS-Methodenmesse 2022. Tübingen: Narr Francke Attempto. | Publisher's page |
8 May 2023 | Corpus of Spoken Spanish in Equatorial Guinea completed | Blog post |
05 March 2023 | Republished in Open Access: Thomas Schmidt (2005/2023): Computergestützte Transkription - Modellierung und Visualisierung gesprochener Sprache mit texttechnologischen Mitteln. Dissertation (Universität Dortmund). Frankfurt a.M.: Peter Lang / Göttingen: Verlag für Gesprächsforschung. | Publisher's page |
10 February 2023 | Talk (with Hanna Hedeland): "Best Practices, Werkzeuge Workflows und Standards zur Erschließung audiovisueller Sammlungen" as a contribution to the workshop "“Hört, hört!” – Zum Umgang mit Audio in den DH" at the University of Wuppertal | Workshop program |
6/7 February 2023 | Two-day EXMARaLDA training course at the University of Basel | EXMARaLDA page |
30 January 2023 | New publication: Arnulf Deppermann, Christian Fandrych, Marc Kupietz and Thomas Schmidt (eds.): Korpora in der germanistischen Sprachwissenschaft: Mündlich, schriftlich, multimedial. Band 2022 der Reihe Jahrbuch des Instituts für Deutsche Sprache. Berlin: de Gruyter. | DeGruyter page |
13 January 2023 | Talk: "Accéder aux corpus oraux: méthodes et technologies" as a contribution to the conference "Qu’est-ce que (se) représenter la parole ? Hommage à Gabriel Bergounioux" at the Université d'Orléans | Workshop page |
23 December 2022 | EXMARaLDA Christmas Previews are online. | EXMARaLDA blog post |
12 December 2022 | Lecture: "Mündliche Korpora - Manuelle und automatisierte Herangehensweisen an Gespräche und gesprochene Sprache" in the lecture series "Computer, Mensch, Sprache – interdisziplinäre Perspektiven an der Schnittstelle Sprachforschung/Informatik" at the University of Oldenburg | Series program |
21 October 2022 | Evaluation results of the EXMARaLDA mini survey are online. | EXMARaLDA blog post |
Research and Software Projects

Get in Touch
MusicalBits GmbH
Nahestraße 28
D-55411 Bingen
+49 6721 3096931