Once the ASR has finished, the results can be downloaded. However, each ASR-engine seems to have its own set of output formats: sometimes a special XML-file, sometimes a CSV-file, and sometimes something else.
Moreover, if the output is XML, each ASR-engine seems to use its own XML-schema.
We therefore wrote software that reads the output of the ASR-engines we support and transforms it into one of the following output formats:
- SRT: the standard subtitle format, supported by nearly all video and audio players (such as VLC)
- VTT: the web version of the SRT-format, supported by all modern browsers
- Karaoke: an HTML-file in which each recognised word is linked to its position in the audio-file. Clicking on a word plays the audio from that word onwards. Words that have been played are highlighted.
- CHA: the file format used by CHILDES (spring 2020: work in progress).
The software (FromTo) can be downloaded for 64-bit Windows and 64-bit macOS under DOWNLOADS below.
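To make the difference between the two subtitle formats concrete, here is a minimal Python sketch that writes the same cues as SRT and as VTT. It is not the FromTo code: the function names and the `(start, end, text)` cue tuples are assumptions made for this illustration. It only relies on the documented differences between the formats (SRT numbers its cues and uses a comma before the milliseconds; VTT starts with a `WEBVTT` line and uses a dot).

```python
def format_timestamp(seconds, decimal_mark):
    """Format a time in seconds as HH:MM:SS<mark>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(cues):
    """Render (start, end, text) cues as SRT: numbered cues, comma before ms."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(cues):
    """Render the same cues as WebVTT: WEBVTT header, dot before ms."""
    blocks = ["WEBVTT\n"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        blocks.append(f"{timing}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    example = [(0.0, 1.2, "Can you tell me"), (1.2, 2.5, "about that episode?")]
    print(to_srt(example))
    print(to_vtt(example))
```

In practice the cue boundaries would come from the word timings in the ASR output; how FromTo groups words into cues is not described here.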
DOWNLOADS
- Written by Arjan van Hessen
Here you can download various programs that can be used for the "elaboration" of AV-files and text-files.
What | Description | Apple | Windows |
FromTo | The FromTo program: software that converts the output of various ASR-engines into subtitles (SRT and VTT) and the Karaoke-style output | FromTo_MacOS.zip | FromTo_Win64.zip |
ASRcorrector | The ASRcorrector program: software you can use to correct the output of an ASR-engine. | | |
Manual TXT2CXML
- Written by Maureen Haaker & Arjan van Hessen
Background
Most transcriptions made by humans are written with a word-processor like Microsoft Word. Word has a lot of functionality, is well known, is available on nearly every computer and is considered the default word-processor.
However, the result is stored as doc or docx, a proprietary Microsoft format, and it is nearly always unstructured. People can write whatever they want in the transcription. They do so with other people (i.e. readers) in mind, but normally not with a computer in mind.
For example, one can write:
John: Can you tell me about that episode? (Mary starts crying and John waits for 5 minutes).
Mary: yes, I can. So,.......
For human readers it is clear that the part between the brackets is a comment and not part of the transcription. For the computer, however, this is not clear at all.
So, in order to do something with the transcripts other than reading them, the transcripts need to be structured.
TXT2CXML
To structure the hand-made transcripts, a software program, TXT2CXML, was written. TXT2CXML (text-to-CXML) converts a transcription document into a more structured CXML-file. CXML stands for Conversational XML and is a protocol created by Telecats (Enschede, NL) for the transcriptions of the Dutch Parliament.
What the program does
The program tries to figure out who the speakers in a transcript are. It does so by assuming that the speaker is the first word of a new line if that word is followed by a colon and a space (= ": ").
Lines without an initial word followed by this colon+space are considered to belong to the previous turn.
Additionally, empty lines are skipped (and not recorded as a speaker).
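The rules above can be sketched in a few lines of Python. This is an illustration only, not the TXT2CXML source: the name `parse_turns`, the dictionary layout and the handling of text before the first speaker are assumptions. The pattern follows the one-word rule as stated, so a label containing a space (such as "Respondent 1") would need a looser pattern.

```python
import re

# First word of the line, followed by a colon and a space.
SPEAKER_LINE = re.compile(r"^(\S+): (.*)$")


def parse_turns(text):
    """Split a plain-text transcript into speaker turns."""
    turns = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # empty lines are skipped
        match = SPEAKER_LINE.match(line)
        if match:
            # A new speaker label starts a new turn.
            turns.append({"speaker": match.group(1), "text": match.group(2)})
        elif turns:
            # No label: the line belongs to the previous turn.
            turns[-1]["text"] += " " + line
        else:
            # Assumption: text before the first label gets no speaker.
            turns.append({"speaker": None, "text": line})
    return turns


if __name__ == "__main__":
    sample = (
        "john: I felt a sleep\n"
        "\n"
        "mary: what did you do?\n"
        "I mean, once you realized that....\n"
    )
    for turn in parse_turns(sample):
        print(turn["speaker"], "->", turn["text"])
    print("turns:", len(parse_turns(sample)))
```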
Example
Original | Result |
john: I felt a sleep | john: I felt a sleep |
mary: what did you do? I mean, once you realized that.... | mary: what did you do? I mean, once you realized that.... |
Finally, the program counts the number of turns.
The result is stored in an XML-file with the file extension *.cxml. It can be read and processed like any other XML-file. However, to increase the readability, an XSLT-file is provided that converts the cxml-file into a more readable HTML-file (see the example in Fig. 1).
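As a rough illustration of storing turns as XML, the sketch below uses Python's standard `xml.etree.ElementTree` module. The element and attribute names (`transcript`, `turn`, `nr`, `speaker`, `turns`) are invented for this example and are not the real CXML schema, which is not described in this manual.

```python
import xml.etree.ElementTree as ET


def turns_to_xml(turns):
    """Serialise (speaker, text) pairs to an XML string.

    The tag and attribute names are hypothetical, NOT the actual CXML schema.
    """
    root = ET.Element("transcript", {"turns": str(len(turns))})
    for number, (speaker, text) in enumerate(turns, start=1):
        turn = ET.SubElement(root, "turn", {"nr": str(number), "speaker": speaker})
        turn.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(turns_to_xml([("john", "I felt a sleep"), ("mary", "what did you do?")]))
```

Because the output is ordinary XML, any XML parser or XSLT processor can read it.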
Preprocessing the transcription
Sometimes the transcription contains a lot of additional text and other unusable information, such as lines, file locations on the hard disk, footnotes and more. Text can be added, modified and deleted in TXT2CXML, but the program lacks the full functionality of a modern word-processor. So, before saving the transcription as UTF-8 text only, one may use the word-processor to do some search-and-replace operations (for example, unify the way a city is written by replacing all mentions of ‘Rome’ by ‘Roma’).
Using TXT2CXML
Fig. 2: Screenshot of the “Save As” command
As said, many transcription documents are made in MS Word. To use this program, however, files cannot be saved as .doc or .docx; they need to be UTF-8 text. This can be done easily within Microsoft Word: simply save the transcription as "plain UTF-8 text" by using the Save As... command.
The resulting txt-document can be read into the TXT2CXML program.
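If it is unclear whether a saved file really is UTF-8, a quick check outside TXT2CXML can help. The helper below is a hypothetical sketch (the name `decode_transcript` is an assumption): it decodes the raw bytes as UTF-8, removes a byte order mark if one is present at the start, and raises an error for files in another encoding.

```python
def decode_transcript(data):
    """Decode raw file bytes as UTF-8, dropping a leading byte order mark.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    # The "utf-8-sig" codec behaves like UTF-8 but strips a leading BOM.
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    good = "Mary: yes, I can. So, café".encode("utf-8")
    print(decode_transcript(good))
    try:
        decode_transcript("café".encode("latin-1"))
    except UnicodeDecodeError:
        print("not UTF-8: save the file again as plain UTF-8 text")
```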
Opening the UTF-8-Text-only transcription
The first step is to open the txt-file. The file is read, processed and shown in the "Original" tab.
Identifying speakers
Besides the text of the document, some metadata is "calculated", e.g. the number of speaker turns, the number of different speakers, etc. The speakers found are shown in a small table. By default, the program detects the speaker IDs (for example, "AvH: I was wondering..." → Speaker-ID = AvH). Name, gender, role and description of the speaker are initially set to unknown.
Fig. 3: Speaker metadata table. The orange arrow points to the "Original" tab, the green arrow to the small table with the (3) speakers.
Abbreviation is the ID as written in the transcript, Name is the full name (if available and desirable), Role is the role of the speaker in the interview.
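The calculation of this metadata can be pictured as follows. This is a sketch under assumptions, not the program's code: the function name and the dictionary layout are invented, while the column names and the "unknown" default follow the description above.

```python
def build_speaker_table(turn_speakers):
    """Build one table row per distinct speaker ID, in order of first appearance."""
    table = {}
    for abbreviation in turn_speakers:
        if abbreviation not in table:
            table[abbreviation] = {
                "Abbreviation": abbreviation,  # the ID as written in the transcript
                "Name": "unknown",
                "Gender": "unknown",
                "Role": "unknown",
                "Description": "unknown",
            }
    return list(table.values())


if __name__ == "__main__":
    speakers_per_turn = ["Interviewer", "Respondent", "Interviewer", "Respondent 1"]
    rows = build_speaker_table(speakers_per_turn)
    print("turns:", len(speakers_per_turn))
    print("different speakers:", len(rows))
    for row in rows:
        print(row)
```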
Editing Speakers
There are two main ways of editing the speaker metadata. The first is to edit the transcription itself. For example, in Figure 3, three speakers are shown: Interviewer, Respondent, and Respondent 1. The easiest way to amend these speakers is to edit the transcription by changing all occurrences of Respondent 1 to Respondent (if they are indeed the same person). After editing the transcript, click “Recaluc Metadata” on the speaker metadata table. The table is then recalculated and correctly shows 2 speakers (see Figure 4).
Fig. 4: The recalculated speaker metadata table
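When many transcripts need the same correction, the renaming step could also be done outside the program before loading the file. The sketch below is an assumption-laden illustration (the function `rename_speaker` is hypothetical): it replaces a speaker label only where it appears at the start of a line followed by a colon and a space, so mentions of the name inside the spoken text are left alone.

```python
import re


def rename_speaker(text, old, new):
    """Replace the speaker label `old` by `new` at the start of lines only."""
    pattern = re.compile(rf"^{re.escape(old)}: ", flags=re.MULTILINE)
    # A function is used as replacement so `new` is inserted literally.
    return pattern.sub(lambda match: f"{new}: ", text)


if __name__ == "__main__":
    transcript = (
        "Interviewer: Can you tell me about that episode?\n"
        "Respondent 1: yes, I can.\n"
        "Respondent: So, as I said...\n"
    )
    print(rename_speaker(transcript, "Respondent 1", "Respondent"))
```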
The other way to edit speaker metadata is to modify it directly in the speaker metadata table (see the green arrow in Fig. 3). To edit a cell, click on it and replace the old text with the new text. All cells can be modified except the Abbreviation (= speaker ID) in the first column.
The result is a more complete and informative description of the speakers in the interview (see Fig. 5).
Fig. 5: Modified speaker metadata table
Writing CXML
The final step in converting your transcript to CXML is to save it as a CXML-file. This is done by clicking the “Write CXML” button. The filename will automatically be the same as that of the input UTF-8 text-only transcription file, except that the file extension .txt is replaced by .cxml.
The file location is based on the default settings in the settings tab of the TXT2CXML program.
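The naming rule can be expressed with Python's `pathlib`. This is only a sketch of the rule as described: the function name `cxml_output_path` and the `output_dir` parameter (standing in for the location configured in the settings) are assumptions.

```python
from pathlib import Path


def cxml_output_path(input_path, output_dir):
    """Return the output path: same file name as the input, .txt replaced by .cxml."""
    cxml_name = Path(input_path).with_suffix(".cxml").name
    # `output_dir` stands in for the configured default location (an assumption).
    return Path(output_dir) / cxml_name


if __name__ == "__main__":
    print(cxml_output_path("transcripts/interview01.txt", "output"))
```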