About LinguisticBits

Supporting linguist(ic)s

LinguisticBits is a subsidiary of MusicalBits GmbH. LinguisticBits is run by Dr. Thomas Schmidt.

After twenty-two years of building corpus tools, heading research projects, running corpus platforms, archives and infrastructure units and supporting linguists in their work with corpora, I found that some tasks are best tackled by research projects within academic institutions, and some are better outsourced to professional service providers. LinguisticBits is a partner for the latter.

The mission of LinguisticBits is to effectively and efficiently support (people in) research projects and data centres in technical matters around corpora and corpus workflows.

ORCID profile


What I Do

About myself

Education and experience


I studied General Linguistics, Mathematics, English, French, Artificial Intelligence, Computer Science and Electrical Engineering at the University of Kaiserslautern, the Johannes Gutenberg-University Mainz, the University of Edinburgh, the FU Berlin and the Université Paris VIII (not strictly in that order; for details, see my LinkedIn profile). I have an M.A. in General Linguistics, a European Master's Degree of Linguistics and an intermediate diploma (≈ B.A.) in mathematics and computer science. I received a PhD (summa cum laude) in Germanic Linguistics from the University of Dortmund in 2004. The title of my doctoral thesis translates as "Computer-aided transcription: Modelling and visualising spoken language by text-technological means".


I worked for the University of Hamburg, the Institute for the German Language in Mannheim, the University of Basel, the Berlin-Brandenburg Academy of Sciences, Philips Speech Processing and the European Language Resource Distribution Agency (again, not in that order; for details, see my LinkedIn profile). Before LinguisticBits, I did freelance work for Texas Instruments and CASIO, for the Free University of Bozen, for the Universities of Ghent, Olomouc, Düsseldorf, Duisburg-Essen and Hamburg, and for the Institute for the German Language.

I spent a post-doc year at the International Computer Science Institute in Berkeley, CA, and was a visiting researcher at the Middle East Technical University in Ankara and at the University of Texas at Austin.

I initiated the Hamburg Centre of Language Corpora (HZSK) and served as its managing director. I headed the Archive for Spoken German (AGD) at the Leibniz Institute for the German Language for ten years and represented it in the Committee for Data Access of the RatSWD.

Research profile

I was principal investigator in more than 10 research projects (see Projects section) and have published more than 100 research papers and books in the field of corpus linguistics and computational lexicography (see my ORCID profile). I have acted as a reviewer for numerous workshops, conferences and journals, as well as for the German Academic Exchange Service (DAAD), the Haut Conseil de l’évaluation de la recherche et de l’enseignement supérieur (Hcéres), the Swiss National Science Foundation (SNF), SwissUniversities, and the Ministry of Education, Youth and Sports of the Czech Republic. I am or was a member of the scientific advisory boards of NITE, GeWiss, Camomile, KompAS, INEL, and Oral-History.Digital. Besides numerous smaller workshops, I organised the GSCL Conference 2011 in Hamburg and was one of the program chairs for the 58th annual conference (Jahrestagung) of the Leibniz Institute for the German Language.


I am an experienced Java developer (both desktop and server) and an expert in XML and surrounding technologies (XSLT, XPath etc.). I also work with Python, C#, SQL and HTML/CSS/JavaScript. I am the lead developer of EXMARaLDA and the Kicktionary, and I have developed large parts of the Database for Spoken German and ZuMult.
I speak and write German, English and French.


Contract work I do or did

Leibniz Institute for the German Language (IDS)

The development of the transcription editor FOLKER and the annotation tool OrthoNormal was originally commissioned by the FOLK project at the IDS.

University of Hamburg

For the HZSK at the University of Hamburg, I integrated CLARIN web services into the EXMARaLDA system and worked out a business model for EXMARaLDA.

Heinrich Heine-Universität Düsseldorf

For the Institute for German Linguistics at HHU, I helped develop, curate and make accessible online the "Düsseldorfer Gesprächskorpus".

Universität Duisburg-Essen

For a project at the German Institute of the University of Duisburg-Essen, I developed extensions and output methods for the EXMARaLDA Partitur-Editor and the EXAKT tool.

TU Dortmund

For the MuM-Multi project at the Universities of Dortmund and Hamburg, I developed a conversion workflow for data transcribed and annotated in Transana.

Free University of Bozen

For the Komma Corpus at the Free University of Bozen, I developed a conversion, tokenisation and part-of-speech tagging workflow for data transcribed in ELAN.

Ghent University

For the corpus of Southern Dutch Dialects at Ghent University, I prototyped a part-of-speech tagging workflow.

University of Basel

For RISE at the University of Basel, I provide EXMARaLDA support and expertise on ASR technology for the humanities and social sciences.

Endangered Languages Archive

For the Endangered Languages Archive at BBAW, I provide consulting for the development of a Latin American portal.


I act as an expert reviewer for SwissUniversities' Open Science funding programs.

University of Texas, Austin

I work with the Texas German Dialect Archive at the University of Texas at Austin to enhance and curate the corpus data of the Texas German Dialect Project.

University of Basel

For the Institute of Romance Languages at the University of Basel, I help develop and curate a sociolinguistic corpus of Spanish in Equatorial Guinea.


News & Activities

Date(s) Link(s)
22 June 2023 Keynote:
"Manual and automated, qualitative and quantitative approaches to spoken interaction"
as a contribution to the workshop "Computational and Quantitative Approaches to Multimodal Video Analysis - CAMVA 2023" at the University of Zürich
Workshop page
05 March 2023 Republished in Open Access:
Thomas Schmidt (2005/2023):
Computergestützte Transkription - Modellierung und Visualisierung gesprochener Sprache mit texttechnologischen Mitteln.
Dissertation (Universität Dortmund). Frankfurt a.M.: Peter Lang / Göttingen: Verlag für Gesprächsforschung.
Publisher's page
10 February 2023 Talk (with Hanna Hedeland):
"Best Practices, Werkzeuge Workflows und Standards zur Erschließung audiovisueller Sammlungen"
as a contribution to the workshop "“Hört, hört!” – Zum Umgang mit Audio in den DH" at the University of Wuppertal
Workshop program
6/7 February 2023 Two-day EXMARaLDA training course at the University of Basel
EXMARaLDA page
30 January 2023 New publication:
Arnulf Deppermann, Christian Fandrych, Marc Kupietz and Thomas Schmidt (eds.):
Korpora in der germanistischen Sprachwissenschaft: Mündlich, schriftlich, multimedial.
Volume 2022 of the series Jahrbuch des Instituts für Deutsche Sprache. Berlin: De Gruyter.
De Gruyter page
13 January 2023 Talk:
"Accéder aux corpus oraux: méthodes et technologies"
as a contribution to the conference "Qu’est-ce que (se) représenter la parole ? Hommage à Gabriel Bergounioux" at the Université d'Orléans
Workshop page
23 December 2022 EXMARaLDA Christmas Previews are online.
EXMARaLDA blog post
12 December 2022 Lecture:
"Mündliche Korpora - Manuelle und automatisierte Herangehensweisen an Gespräche und gesprochene Sprache"
in the lecture series "Computer, Mensch, Sprache – interdisziplinäre Perspektiven an der Schnittstelle Sprachforschung/Informatik" at the University of Oldenburg
Series program
21 October 2022 Evaluation results of the EXMARaLDA mini survey are online.
EXMARaLDA blog post


Research and Software Projects


Get in Touch


MusicalBits GmbH
Franz-Kirsten-Str. 1
D-55411 Bingen

+49 6721 3096931
