There are various software packages for transcribing spoken content. They can be divided into free/open-source versus commercial, and device-based versus web-based, each with its own pros and cons. Moreover, it is always possible to transcribe spoken content with just a text editor.

The advantage of transcription software, however, is that it combines "playing" the recording with "typing" the text, and that the resulting transcription is time-aligned, i.e. the start and end time of each text fragment (a word, a few words, a phrase or even a paragraph) is known. This time alignment makes it possible to search for spoken words and to generate subtitles.
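To make the benefit of time alignment concrete, here is a minimal Python sketch. It assumes a transcription stored as a list of (start seconds, end seconds, text) tuples; that layout and the helper names `srt_timestamp`, `to_srt` and `find_word` are hypothetical illustrations, not the export format of any particular transcription package. It writes the segments out as SRT subtitles and looks up where a spoken word occurs.

```python
from typing import List, Tuple

# Hypothetical layout of a time-aligned transcription:
# (start in seconds, end in seconds, text of the fragment).
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Turn time-aligned segments into numbered SRT subtitle blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


def find_word(segments: List[Segment], word: str) -> List[float]:
    """Return the start times of all segments in which the word is spoken."""
    word = word.lower()
    return [
        start
        for start, _, text in segments
        if word in (w.strip(".,!?;:\"'").lower() for w in text.split())
    ]


segments = [
    (0.0, 3.2, "I was born in 1938."),
    (3.2, 7.5, "My father worked in the harbour."),
]
print(to_srt(segments))
print(find_word(segments, "harbour"))  # [3.2]
```

Without the start and end times, neither the subtitle file nor the jump to the spoken word would be possible; a plain-text transcript only supports searching the text itself.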

Transcriptions made with an ordinary text editor (Notepad, Word, etc.) lack this time alignment: the result is just text. However, aligning this text with the audio recording through forced alignment yields the same kind of time-aligned transcription as dedicated transcription software.

The "time resolution" of the transcription depends on the human editor, who selects either short fragments (words or even phonemes) or rather long fragments (paragraphs). Another commonly used method of time alignment is to place time-stamps at a fixed interval (e.g. every 30 seconds or every 5 minutes).
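When time-stamps are typed into a plain-text transcript at a fixed interval, the text between two markers can be treated as one time-aligned segment. The sketch below assumes a hypothetical marker format such as `[00:00:30]`; real transcription conventions differ, so the regular expression would need adapting. Each segment ends where the next marker begins, and the last one ends at the recording length if it is supplied.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical time-stamp marker typed into the text, e.g. [00:00:30] (hh:mm:ss).
MARKER = re.compile(r"\[(\d{1,2}):(\d{2}):(\d{2})\]")


def parse_timestamped_text(
    text: str, duration: Optional[float] = None
) -> List[Tuple[float, Optional[float], str]]:
    """Split a plain-text transcript at time-stamp markers into (start, end, text).

    Each segment ends where the next marker starts; the last segment ends at
    `duration` (recording length in seconds) if given, otherwise None.
    Any text before the first marker is ignored in this sketch.
    """
    matches = list(MARKER.finditer(text))
    starts, chunks = [], []
    for i, m in enumerate(matches):
        hh, mm, ss = (int(g) for g in m.groups())
        starts.append(float(hh * 3600 + mm * 60 + ss))
        chunk_end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append(text[m.end():chunk_end].strip())
    # The end of each segment is the start of the next one.
    ends: List[Optional[float]] = starts[1:] + [duration]
    return list(zip(starts, ends, chunks))


transcript = (
    "[00:00:00] Where were you born? "
    "[00:00:30] In a small village near the coast. "
    "[00:01:00] We moved to the city after the war."
)
for segment in parse_timestamped_text(transcript, duration=84.0):
    print(segment)
```

The segments this produces are only as precise as the chosen interval: a 5-minute interval gives 5-minute segments, which is why a later alignment step is useful.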

Once the transcription is made (with a text editor, with or without time-stamps, or with dedicated transcription software at the utterance level), a final forced alignment determines the start and end times of each word more precisely and, if desired, the start and end times of the spoken phonemes.
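To show what word-level timing looks like, and what forced alignment improves on, the sketch below uses a deliberately naive stand-in: it spreads an utterance's start and end time over its words in proportion to their length in characters. This is NOT forced alignment. A forced aligner compares the text with the audio using an acoustic model, so it can account for pauses and changes in speaking rate; this hypothetical helper `naive_word_times` ignores the audio entirely and only illustrates the (word, start, end) output format.

```python
from typing import List, Tuple

# (word, start in seconds, end in seconds)
WordTiming = Tuple[str, float, float]


def naive_word_times(start: float, end: float, text: str) -> List[WordTiming]:
    """Estimate word times by dividing [start, end] in proportion to word length.

    Not forced alignment: no audio is consulted, so pauses and speaking rate
    are ignored. The result is only a rough estimate in word-level format.
    """
    words = text.split()
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    span = end - start
    timings = []
    t = start
    for w in words:
        duration = span * len(w) / total_chars
        timings.append((w, round(t, 3), round(t + duration, 3)))
        t += duration
    return timings


for timing in naive_word_times(3.2, 7.5, "My father worked in the harbour."):
    print(timing)
```

A forced aligner would return the same kind of list, but with times derived from the recording rather than from word length.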

For oral historians, time alignment at the utterance level will usually be "enough", but modern technology makes it very easy to automatically add finer granularity to time-aligned transcriptions.