Transcription is a translation between forms of data, most commonly to convert audio-visual recordings to text in qualitative and quantitative research. It should match the analytic and methodological aims of the research. Whilst transcription is often part of the analysis process, it also enhances the sharing, disclosure and reuse potential of research data.
Full transcription is recommended for data sharing.

Screenshot of the manual correction (in SubtitleEdit) of a transcription that was generated with a (Vocapia) ASR engine.

If the transcription is done with ASR or forced alignment, each transcribed/spoken word automatically gets a start and end time. This makes it possible to access the AV-files directly at word level: clicking a selected fragment in the search window can play that fragment aloud.
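How word-level timing supports such a lookup can be illustrated with a minimal Python sketch. The `Word` class, the sample words and the `find_fragment` function are hypothetical and only serve the illustration; real ASR or forced-alignment output will have its own format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


# Hypothetical word-level output for a short fragment.
words = [
    Word("the", 12.40, 12.52),
    Word("interview", 12.52, 13.10),
    Word("took", 13.10, 13.35),
    Word("place", 13.35, 13.80),
    Word("in", 13.80, 13.90),
    Word("Utrecht", 13.90, 14.55),
]


def find_fragment(words, query):
    """Return (start, end) of the first occurrence of the query phrase, or None."""
    target = [token.lower() for token in query.split()]
    if not target:
        return None
    for i in range(len(words) - len(target) + 1):
        window = words[i:i + len(target)]
        if [w.text.lower() for w in window] == target:
            return window[0].start, window[-1].end
    return None


print(find_fragment(words, "took place"))
```

The returned start and end times are what a player would need in order to jump to the fragment in the AV-file.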

Separation of content and presentation

Transcripts contain (a lot of) information that can be parsed by computers and humans. Human parsing is robust against small errors, but computer parsing is not.
The content is therefore best written in XML (or JSON) using UTF-8. XML enforces a structured way of storing the data, making it possible to parse the transcripts unambiguously with a computer.
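As an illustration, the following Python sketch writes a small transcript to UTF-8 encoded XML and reads it back. The element and attribute names (`transcript`, `turn`, `w`, `speaker`, `start`, `end`) are assumptions made for this example, not a prescribed schema.

```python
import xml.etree.ElementTree as ET


def build_transcript(turns):
    """Serialise turns, given as (speaker, [(word, start, end), ...]), to UTF-8 XML."""
    root = ET.Element("transcript")
    for speaker, words in turns:
        turn = ET.SubElement(root, "turn", speaker=speaker)
        for text, start, end in words:
            w = ET.SubElement(turn, "w", start=f"{start:.2f}", end=f"{end:.2f}")
            w.text = text
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def read_words(xml_bytes):
    """Parse the XML back into (speaker, word, start, end) tuples."""
    root = ET.fromstring(xml_bytes)
    return [
        (turn.get("speaker"), w.text, float(w.get("start")), float(w.get("end")))
        for turn in root.iter("turn")
        for w in turn.iter("w")
    ]


xml_bytes = build_transcript([
    ("interviewer", [("Waar", 0.00, 0.31), ("woonde", 0.31, 0.70), ("u", 0.70, 0.82)]),
    ("narrator", [("In", 1.20, 1.33), ("Curaçao", 1.33, 2.05)]),
])
print(xml_bytes.decode("utf-8"))
print(read_words(xml_bytes))
```

Because every word sits in its own element with explicit attributes, the reader does not have to guess where a word, a turn or a time stamp begins or ends, and non-ASCII characters such as the "ç" survive the round trip.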

Storing the transcripts in a word-processor or print-oriented format (e.g. docx or pdf) is therefore not recommended. Small, barely noticeable errors may break the parsing of the transcript. A less suitable font is one example: Rl (R followed by a lowercase L) and RI (R followed by an uppercase i) look the same in Helvetica, but clearly different in Courier.
The same is true for the use of a hard return (RETURN) and a soft return (SHIFT-RETURN). To the human eye they look the same, but to a computer they are different characters, so parsing may go wrong.
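Both pitfalls can be made visible with a short Python sketch. It assumes, for the sake of the example, that the soft return was exported as the Unicode character U+2028 (LINE SEPARATOR); the actual character depends on the editor and the export route. The speaker codes and the `naive_turns` parser are hypothetical.

```python
import unicodedata


def describe(text):
    """List the code point and Unicode name of every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<control>")) for ch in text]


# Two speaker codes that look identical in some sans-serif fonts.
print(describe("Rl"))  # R followed by a lowercase L
print(describe("RI"))  # R followed by an uppercase i

# Assumption: the soft return ended up as U+2028 in the exported text.
hard_return = "R1: Hello.\nR2: Good morning."
soft_return = "R1: Hello.\u2028R2: Good morning."


def naive_turns(text):
    """A simple parser that expects one speaker turn per line feed."""
    return text.split("\n")


print(len(naive_turns(hard_return)))  # 2 turns found
print(len(naive_turns(soft_return)))  # 1 turn found: the second speaker is missed
```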


XSLT schema: 3 XSLT-files (left) for export to third-party software and 3 XSLT-files (right) for reading by humans.


When presenting the transcripts, XSLT-files can be used to generate a human-readable document that

  1. shows just the information that is desired (for example all information or only the text of the transcript) 
  2. presents the information in the look-and-feel of the institution (font, size, colours, etc.), including logos and standard text.

Finally, if the layout of the transcripts needs to be modified, only one XSLT-file needs to be changed (instead of hundreds of Word files).
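The principle can be sketched in Python. Note the substitution: Python's standard library has no XSLT processor, so two plain functions stand in for two stylesheets here. The XML structure and the function names are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# One content file, independent of any presentation.
SOURCE = """<transcript>
  <turn speaker="interviewer" start="0.00"><text>Where did you grow up?</text></turn>
  <turn speaker="narrator" start="2.10"><text>In a small village near the coast.</text></turn>
</transcript>"""


def render_full(root):
    """Presentation 1: all information (time, speaker and text)."""
    return "\n".join(
        f"[{turn.get('start')}s] {turn.get('speaker').upper()}: {turn.findtext('text')}"
        for turn in root.iter("turn")
    )


def render_text_only(root):
    """Presentation 2: only the text of the transcript."""
    return "\n".join(turn.findtext("text") for turn in root.iter("turn"))


root = ET.fromstring(SOURCE)
print(render_full(root))
print()
print(render_text_only(root))
```

Changing the layout of every transcript then means changing one rendering function (or, in the XSLT setting, one stylesheet), while the content files stay untouched.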

Use of transcripts in third-party software

When planning the structure of the transcription template, best practice is to:

  • Consider compatibility with the import features of qualitative data analysis software. Which information is required or nice to have in that particular analysis package, and which information cannot be used there (in which case, does it make sense to collect it in the transcription documents)? Again, an XSLT-file can be used to generate XML-files that can be imported into the third-party software.
    Moreover, different XSLT-files can be used to generate different export files for different third-party software (for example one XSLT for export to ATLAS.ti, another XSLT for export to MAXQDA).
  • Write transcriber instructions or guidelines to get consistency in the transcripts, especially when different people make or correct the transcriptions. How to deal with non-verbal, unintelligible or inaudible speech? How to write foreign or dialect words? How to mark sensitive information for later anonymisation?
  • Provide a translation, or at least a summary, of each interview in English when the speech is in another language.
  • Never trust the transcription results of ASR software (automatic speech recognition) without checking them. ASR keeps getting better, but the software cannot recognise words that are not in its vocabulary (jargon, foreign and dialect words, acronyms, abbreviations, etc.).
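One way to organise the manual check of ASR output is to list the words the recogniser was least sure about. The sketch below assumes that the ASR output carries a per-word confidence score between 0 and 1; whether such a score is available, and under which name, depends on the engine. The data, the `confidence` field and the threshold are hypothetical.

```python
# Hypothetical ASR output: text, start time in seconds and a confidence score.
asr_words = [
    {"text": "the", "start": 4.10, "confidence": 0.98},
    {"text": "period", "start": 4.22, "confidence": 0.95},
    {"text": "of", "start": 4.60, "confidence": 0.97},
    {"text": "for", "start": 4.71, "confidence": 0.41},     # possibly a misheard dialect word
    {"text": "sailing", "start": 4.90, "confidence": 0.38},
]


def words_to_review(words, threshold=0.80):
    """Return (start, text) for every word whose confidence is below the threshold."""
    return [(w["start"], w["text"]) for w in words if w["confidence"] < threshold]


for start, text in words_to_review(asr_words):
    print(f"check at {start:.2f}s: {text}")
```

Such a list only prioritises the correction work. A word can be recognised wrongly with high confidence, so the complete transcript still needs to be read against the audio.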

Transcription methods

Transcription methods depend very much on your theoretical and methodological approach, and can vary between disciplines.

  • A thematic sociological research project usually requires a denaturalised approach, i.e. most like written language (Bucholtz, 2000), because the focus is on the content of what was said and the themes that emerge from that.
  • A project using conversation analysis would use a naturalised approach, i.e. most like speech, whereby a transcriber seeks to capture all the sounds they hear and uses a range of symbols to represent particular features of speech in addition to the spoken words; for example the length of pauses, laughter, overlapping speech, turn-taking or intonation.
  • A psychosocial method transcript may include detailed notes on emotional reactions, physical orientation, body language, use of space, as well as the psycho-dynamics in the relationship between the interviewer and interviewee.
  • Some transcribers may try to make a transcript look correct in grammar and punctuation, considerably changing the sense of flow and dynamics of the spoken interaction. Transcription should capture the essence of the spoken word, but need not go as far as the naturalised approach. This kind of transcript is, in combination with forced alignment, often used for the automatic generation of subtitles.
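The subtitle generation mentioned in the last point can be sketched as follows, assuming that forced alignment has delivered a list of (word, start, end) tuples with times in seconds. The sketch writes the widely used SubRip (.srt) layout and simply starts a new subtitle after a fixed number of words. The function names and the `max_words` parameter are choices made for this illustration; real subtitling tools also take sentence boundaries and reading speed into account.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the time notation used in .srt files."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group aligned words into numbered subtitles of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(word for word, _, _ in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)


# Hypothetical forced-alignment result: (word, start, end).
aligned = [
    ("So", 0.0, 0.2), ("we", 0.2, 0.35), ("moved", 0.35, 0.8),
    ("to", 0.8, 0.9), ("the", 0.9, 1.0), ("city", 1.0, 1.5),
]
print(words_to_srt(aligned, max_words=3))
```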

Reference: Bucholtz, M. (2000) The Politics of Transcription. Journal of Pragmatics 32: 1439-1465.

(this text is partly based on the information on the UK Data Service website)


  • Data Documentation Initiative (DDI), XML: social science data; mandatory and optional metadata elements for study description, data file description and variable description; codebook version (DDI2 or DDI-C) and lifecycle version (DDI3 or DDI-L)
  • Dublin Core (DC), XML: basic, generic, discipline-agnostic; web resources; 15 (optional) metadata elements
  • Text Encoding Initiative (TEI), XML: for mark-up of textual data, e.g. turn takers in speech, typos, formatting text on screen
  • DataCite, XML and RDF: publishing digital datasets with a persistent identifier (DOI); five mandatory and multiple recommended/optional elements; discipline-agnostic
  • ISO 19115, XML: geographic information
  • QuDex (Qualitative Data Exchange), XML: rich file-level description, document coding and annotation, and intra-collection relationships. Allows identification of data objects such as an interview transcript or audio recording; the relationship to another data object or part of the data; descriptive categories at the object level, e.g. interview characteristics, interview setting; and rich annotation of parts of the data
  • Common European Research Information Format (CERIF), XML: record research information about people, projects, outputs (publications, patents, products), funding, events, facilities, equipment
  • Metadata Encoding and Transmission Standard (METS), XML: encoding descriptive, administrative and structural metadata regarding objects within a digital library
  • Metadata preparation/markup guidelines, QualiBank: QualiBank Processing Procedures
  • Metadata preparation/markup procedures, QualiBank: Qualitative data collection ingest processing procedures



All about documentation for using the technology for and in OH-projects