ASR (Automatic Speech Recognition) is the transformation of spoken content into its textual representation. Put simply: speech-to-text.

Speech recognition can be done by a program on your own computer (Windows, Linux, macOS) or via a web service. The advantage of software on your own computer is that you are completely in control. Especially when you have (very) sensitive data that may not leave the building, it makes sense to run the ASR on your own computer.

Local ASR

One of the best "personal" ASR packages (for English and other languages) is Dragon Dictate. The mobile version of Dragon Dictate can be downloaded for free and works reasonably well if you speak clearly and there is not too much background noise.
However, as the name suggests, Dragon Dictate is designed for dictating messages, reports, chats and e-mails. That makes it less suitable for recognising Oral History (OH) interviews or other narratives: OH interviews are stored in audio files (often more than one per interview), and it is those files that you want to have recognised.

One option is re-speaking: you repeat aloud what is said in the interview. That may be a literal repetition, with all the hesitations, repetitions and errors, or a (grammatically correct) summary of what was said. This may work fine, but it is an enormous amount of work if you have a lot of interviews.

Web-based ASR

The advantage of web-based ASR is that, ideally, you just upload a bunch of files to a server and receive a message once the recognition is done. No fast computer is needed and the ASR software is always up to date. The drawback is that you have to pay for the recognition, either with money or with your data! Companies like Google or Microsoft offer free ASR services (mostly only for English), but you pay with your audio data. Your data is used to improve the software (that's OK) and maybe for other, as yet unknown, purposes. If you have sensitive data, this may therefore be a less suitable solution.

WebASR

The Sheffield University based WebASR is the world's first web-based fully functioning automatic speech recogniser. It's free and very easy to use: just convert your audio files into a suitable format, upload them and, after a short processing period, the transcript will be available in a range of formats including PDF.
You can use the web interface for uploading and retrieving files or you may write your own software and connect to WebASR via their API if you want to process a batch of files programmatically.

Now it's easier than ever to have access to state-of-the-art speech technologies, including transcription of meetings, lectures and general media, speaker diarisation of lectures and media, and even automatic translation of lectures. Depending on the type of recording (meetings, media, lectures, interviews), you may add text documents to improve the Language Model (LM). WebASR can only recognise English spoken content.

To use the service, you must be registered. The recognition results are good and come in a number of formats. A small tool is available to convert the WebASR output into subtitles (srt-files). The data you upload will remain on the hard disks of Sheffield University (you "pay" with your data). However, in case of sensitive data, you may contact them to see what is possible.
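
To give an idea of what such a conversion to subtitles involves, here is a minimal Python sketch. It is not the tool mentioned above. It assumes the recognised words are already available as (start, end, word) tuples with times in seconds; the function names and the grouping rules (at most 7 words per subtitle, a new subtitle after a pause of more than 1 second) are our own illustrative choices.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7, max_gap=1.0):
    """Group (start, end, word) tuples into subtitles and return SRT text.

    max_words and max_gap are illustrative defaults, not WebASR settings.
    """
    cues = []
    current = []
    for word in words:
        # Start a new subtitle when the current one is full
        # or when the speaker paused longer than max_gap seconds.
        if current and (len(current) >= max_words
                        or word[0] - current[-1][1] > max_gap):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for number, cue in enumerate(cues, start=1):
        start = format_timestamp(cue[0][0])   # start of the first word
        end = format_timestamp(cue[-1][1])    # end of the last word
        text = " ".join(w[2] for w in cue)
        blocks.append(f"{number}\n{start} --> {end}\n{text}\n")
    # SRT subtitles are separated by an empty line.
    return "\n".join(blocks)
```

For example, `format_timestamp(3661.5)` gives `01:01:01,500`, and writing the string returned by `words_to_srt(...)` to a file with the extension .srt produces a subtitle file that most video players can load.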

Output format

  • a PDF-file with all the recognised speech, without time information (i.e. when each word was spoken)
  • a TTML-file (Timed Text Markup Language)
  • an XML-file with two kinds of elements, containing the start time, end time, recognised word and the adjacent tri-phones of that word
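
If you want to process the timed XML output yourself, the words and their times can be read with Python's standard library. The sketch below is only an illustration: the element name `word` and the attribute names `start` and `end` are assumptions, since the actual WebASR schema is not described here. Check a real output file and adjust the names accordingly.

```python
import xml.etree.ElementTree as ET

# Hypothetical example input; the real WebASR XML may look different.
SAMPLE = """<transcript>
  <word start="0.00" end="0.40">hello</word>
  <word start="0.50" end="0.90">world</word>
</transcript>"""


def read_words(xml_text, tag="word", start_attr="start", end_attr="end"):
    """Return a list of (start, end, word) tuples from timed XML.

    The tag and attribute names are parameters because they are
    assumptions; pass the names used in your own files.
    """
    root = ET.fromstring(xml_text)
    words = []
    for elem in root.iter(tag):
        text = (elem.text or "").strip()
        start = elem.get(start_attr)
        end = elem.get(end_attr)
        # Skip elements that lack a word or timing information.
        if not text or start is None or end is None:
            continue
        words.append((float(start), float(end), text))
    return words


if __name__ == "__main__":
    for start, end, word in read_words(SAMPLE):
        print(f"{start:7.2f} {end:7.2f}  {word}")
```

Keeping the element and attribute names as parameters means the same function can be reused once the real names are known, without changing the parsing logic.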