LinguisticBits

For Linguist(ic)s

About LinguisticBits

Supporting linguist(ic)s

LinguisticBits, a subsidiary of MusicalBits GmbH, is run by Dr. Thomas Schmidt.

After twenty-two years of building corpus tools, heading research projects, running corpus platforms, archives and infrastructure units, and supporting linguists in their work with corpora, I have found that some tasks are best tackled by research projects within academic institutions, while others are better outsourced to professional service providers. LinguisticBits is a partner for the latter.

The mission of LinguisticBits is to effectively and efficiently support (people in) research projects and data centres in technical matters around corpora and corpus workflows.

Name: Dr. Thomas Schmidt
Email: thomas@linguisticbits.de

ORCID profile

Services

What I do

Consulting

LinguisticBits can advise your research project or your institution in sustainable data management, choice of technology and other aspects of corpus work.

Workflow development

LinguisticBits can help you set up, configure and optimise efficient, effective and sustainable workflows for your corpus, from data acquisition to data dissemination.

Software development

LinguisticBits can develop software for data entry, conversion, visualisation and analysis. Java and XML technologies are our speciality. We also work with Python, C#, JavaScript and R.

EXMARaLDA support and training

LinguisticBits can offer your project reliable and competent support for the EXMARaLDA tools. We offer EXMARaLDA support licences and on-site or online training courses for EXMARaLDA.

Data curation

LinguisticBits can spice up your corpus data, make it consistent and standards-conformant, enrich it with automatic annotation methods, and make it ready for analysis and publication.

Research partner

LinguisticBits can act as a partner for your research project, supporting you in conceptualizing your proposal and work plans and taking on work packages.

About myself

Education and experience

Education

I studied General Linguistics, Mathematics, English, French, Artificial Intelligence, Computer Science and Electrical Engineering at the University of Kaiserslautern, the Johannes Gutenberg University Mainz, the University of Edinburgh, the FU Berlin and the Université Paris VIII (not strictly in that order; for details, see my LinkedIn profile). I hold an M.A. in General Linguistics, a European Master's Degree in Linguistics and an intermediate diploma (≈ B.A.) in mathematics and computer science. I received my PhD (summa cum laude) in Germanic Linguistics from the University of Dortmund in 2004. The title of my doctoral thesis translates as "Computer-aided transcription: Modelling and visualising spoken language by text-technological means".

Experience

I have worked for the University of Hamburg, the Institute for the German Language in Mannheim, the University of Basel, the Berlin-Brandenburg Academy of Sciences, Philips Speech Processing and the European Language Resources Distribution Agency (again, not in that order; for details, see my LinkedIn profile). I currently hold a 50% position at the University of Duisburg-Essen in the ZAT project. Before founding LinguisticBits, I did freelance work for Texas Instruments and CASIO, for the Free University of Bozen, for the Universities of Ghent, Olomouc, Düsseldorf, Duisburg-Essen and Hamburg, and for the Institute for the German Language.

I spent a post-doc year at the International Computer Science Institute in Berkeley, CA, and was a visiting researcher at the Middle East Technical University in Ankara and at the University of Texas at Austin.

I initiated the Hamburg Centre for Language Corpora (HZSK) and served as its managing director. I headed the Archive for Spoken German (AGD) at the Leibniz Institute for the German Language for ten years and represented it on the Committee for Data Access of the RatSWD (German Data Forum).

Research profile

I have been principal investigator in more than 10 research projects (see Projects section) and have published more than 100 research papers and books in the fields of corpus linguistics and computational lexicography (see my ORCID profile). I have acted, and continue to act, as a reviewer for numerous workshops, conferences and journals, for the German Academic Exchange Service (DAAD), the Haut Conseil de l’évaluation de la recherche et de l’enseignement supérieur (Hcéres), the Swiss National Science Foundation (SNF) and SwissUniversities, and for the Ministry of Education, Youth and Sports of the Czech Republic. I am or have been a member of the scientific advisory boards of NITE, GeWiss, Camomile, KompAS, INEL and Oral-History.Digital. Besides numerous smaller workshops, I organised the GSCL Conference 2011 in Hamburg and was one of the programme chairs of the 58th Annual Conference (58. Jahrestagung) of the Leibniz Institute for the German Language.

Skills

I am an expert (senior/lead) Java developer (both desktop and server) and an expert in XML and related technologies (XSLT, XPath etc.). I also work with Python, C#, SQL and HTML/CSS/JavaScript. I am the lead developer of EXMARaLDA and the Kicktionary, and I have developed large parts of the Database for Spoken German and ZuMult.
I speak and write German, English and French.

ORCID profile LinkedIn profile Profile @ IDS Profile @ Kicktionary

References

Contract work I do or did

Leibniz Institute for the German Language

News

News & Activities

Date(s)	Event	Link(s)
June 2026	Talk at panel "Data in CA: Preparing, sharing, and reusing recordings from naturally occurring interaction" at ICCA in Edmonton, Alberta (Canada)	Conference website
19/20 March 2026	Talk at workshop "Listening to the Past: Digital Approaches to the History of Sound and Language" at GHI Washington	Workshop page
11 March 2026	Contribution "Die ZuMult-Plattform als Instrument für sprachvergleichende Analysen auf mündlichen Daten" at the IDS Annual Conference (project and methods fair)	Conference website
05 March 2026	Talk "Transcription+ - Dokumentation und Tool-Support für den ISO-Standard zur Transkription audiovisueller Daten" at the workshop "Text+ Collections: Sprach- und Textdaten für Gesellschaft, Gesundheit und Medizin"	Workshop website
12 January 2026	New version of ZuMult@TGDP at UT Austin	Blog post
8 January 2026	New official EXMARaLDA version	Blog post
1 January 2026	Start of NFDI/Text+ cooperation project: Transcription+	Preliminary project page
8 December 2025	New publication: Schmidt, Thomas (2025): Représenter et accéder à la parole dans les corpus oraux : diversification et adaptation des méthodes et technologies.	Volume at De Gruyter
1 December 2025	New publication: Frick, Elena and Schmidt, Thomas. "Querying spoken language data".	Paper in Open Access
1 October 2025	Talk "Putting things on top of other things - The ZuMult platform for multimodal corpora and its ecosystem" at CLARIN Annual Conference 2025, Vienna	Conference website
26 September 2025	Poster at Market of Opportunities at the NFDI4Culture Community Plenary 5, Mainz	Plenary website
24 September 2025	Poster "Ein Multi-Tool für den didaktischen und linguistischen Zugang zu Korpora" at Conference 'Große Lernerkorpora - Möglichkeiten und Grenzen', Leipzig	Conference website
27-29 August 2025	Participation in the scoping workshop "Künstliche Intelligenz und regionale Identität – Potenziale der Dialektologie im Zeitalter digitaler Transformation", Hannover
21 August 2025	Talk "Complex Interactional Corpora: Workflows and Platform for video-based Studies of Embodied Interaction and Human-Machine Interaction" at the International Workshop of Embodied Interaction & Embodied Intelligence, Essen
10 July 2025	Poster "Open Multimodal data and qualitative exploration of audiovisual language corpora - the ZuMult corpus platform in 2025" at Workshop "Corpus linguistics 2040: Which data, which methods, which models?", Mannheim	Poster on Zenodo
26 June 2025	Talk "Beyond words – a case study in corpus approaches to interaction and multimodality" at the IPrA panel "Corpus Pragmatics" in Brisbane	Conference program
25 April 2025	Presentation of the TGDP ZuMult platform at "Varietätenkontakt: Phänomene - Methoden - Theorien" (TU Dortmund)	Conference website
25 April 2025	Presentation of the TGDP ZuMult platform at 49th SGAS Annual Symposium, Milwaukee	Symposium website
16 April 2025	Talk "Speeding up the game - The Kicktionary and AI Lexicography" at UT Austin	Talk announcement
11 April 2025	New publication "Sprachkorpora im Deutschunterricht"	Publisher's website
20 March 2025	Poster presentation "Reading, listening to and watching concordances of audiovisual interaction corpora" with Elena Frick at the symposium "Reading Concordances in the 21st Century"	Poster on Zenodo
11 March 2025	Talk with Atiba Pertila at "Kick-Off Open Up - New Research Spaces for the Humanities and Cultural Studies" (Volkswagen Foundation)	Event booklet
18 December 2024	New EXMARaLDA version online	Blog post
01 December 2024	First official version of the ZuMult platform for the Texas German Dialect project online	Platform at UT Austin
7/8 October 2024	Workshop for the ALMA-B project at KU Eichstätt
27 June 2024	Public beta version of the ZuMult platform for the Texas German Dialect project online	Platform at UT Austin
5 June 2024	New EXMARaLDA version online	Blog post
May 2024	Three EXMARaLDA workshops for a team of researchers at Universidade Federal de Minas Gerais, Brazil
13 March 2024	EXMARaLDA workshop at UCLouvain, Louvain-la-Neuve, part of the seminar "Récolte, nettoyage et enrichissement de corpus"	Seminar program
31 January 2024	Talk on 'Music, Artificial Intelligence and Linguistics' at University of Texas in Austin	Announcement
14 November 2023	'Pioniergeist' Award for MusicalBits	LinkedIn post
10/11 November 2023	EXMARaLDA Workshop for the "Symposion Deutschdidaktik" at the University of Hildesheim	SDD website
20 October 2023	EXMARaLDA support licences officially available	EXMARaLDA Blog post
13 October 2023	Talk "Getting ready for TGDA 2.0 – Enriching the Texas German Dialect Corpus for (comparative) corpus analyses" at German Abroad 5	Conference website
9 October 2023	Data management workshop at German Abroad 5	Conference website
29 September 2023	Keynote at Kick-off event "CLARIN-CH Working Group on sensitive data management"	Clarin.ch website
15 September 2023	Workshop on multimodality in CMC data at the International Conference on CMC and Social Media Corpora for the Humanities (Mannheim)	Conference website
15 September 2023	EXMARaLDA workshop for the International Graduate School German Jordanian University / PH Freiburg	GJU homepage
12 September 2023	Workshop "Annotation and Interoperability" for the Mezzanine project at the University of Maribor	Project homepage
06 August 2023	New publication: Christian Fandrych, Thomas Schmidt, Franziska Wallner, Kai Wörner (eds.): Zugänge zu mündlichen Korpora für DaF und DaZ: Das ZuMult-Projekt. KorDaF (Korpora Deutsch als Fremdsprache), Vol. 3, No. 1, 2023	Journal page
20 July 2023	New official EXMARaLDA version	EXMARaLDA website
14 July 2023	EXMARaLDA workshop for the International Graduate School German Jordanian University / PH Freiburg	GJU homepage
22 June 2023	Keynote: "Manual and automated, qualitative and quantitative approaches to spoken interaction" as a contribution to the workshop "Computational and Quantitative Approaches to Multimodal Video Analysis - CAMVA 2023" at the University of Zürich	Workshop page
23 May 2023	New publication: Marc Kupietz and Thomas Schmidt (eds.): Neue Entwicklungen in der Korpuslandschaft der Germanistik. Beiträge zur IDS-Methodenmesse 2022. Tübingen: Narr Francke Attempto.	Publisher's page
8 May 2023	Corpus of Spoken Spanish in Equatorial Guinea completed	Blog post
05 March 2023	Republished in Open Access: Thomas Schmidt (2005/2023): Computergestützte Transkription - Modellierung und Visualisierung gesprochener Sprache mit texttechnologischen Mitteln. Dissertation (Universität Dortmund). Frankfurt a.M.: Peter Lang / Göttingen: Verlag für Gesprächsforschung.	Publisher's page
10 February 2023	Talk (with Hanna Hedeland): "Best Practices, Werkzeuge, Workflows und Standards zur Erschließung audiovisueller Sammlungen" as a contribution to the workshop "'Hört, hört!' – Zum Umgang mit Audio in den DH" at the University of Wuppertal	Workshop program
6/7 February 2023	Two-day EXMARaLDA training course at the University of Basel	EXMARaLDA page
30 January 2023	New publication: Arnulf Deppermann, Christian Fandrych, Marc Kupietz and Thomas Schmidt (eds.): Korpora in der germanistischen Sprachwissenschaft: Mündlich, schriftlich, multimedial. Volume 2022 of the series Jahrbuch des Instituts für Deutsche Sprache. Berlin: De Gruyter.	De Gruyter page
13 January 2023	Talk: "Accéder aux corpus oraux: méthodes et technologies" as a contribution to the conference "Qu’est-ce que (se) représenter la parole ? Hommage à Gabriel Bergounioux" at the Université d'Orléans	Workshop page
23 December 2022	EXMARaLDA Christmas Previews are online.	EXMARaLDA blog post
12 December 2022	Lecture: "Mündliche Korpora - Manuelle und automatisierte Herangehensweisen an Gespräche und gesprochene Sprache" in the lecture series "Computer, Mensch, Sprache – interdisziplinäre Perspektiven an der Schnittstelle Sprachforschung/Informatik" at the University of Oldenburg	Series program
21 October 2022	Evaluation results of the EXMARaLDA mini survey are online.	EXMARaLDA blog post