As a follow-up to the CLARIN-PLUS workshops on Oral History (OH) archives in Oxford (April 2016) and Utrecht (December 2016), the Arezzo workshop is meant to finalize the setup of a transcription chain for OH interviews.
The envisaged outcome of the Arezzo workshop is an implementation plan for an OH transcription chain that can be integrated into the CLARIN infrastructure. Once the implementation plan is written, it will be submitted to CLARIN ERIC for final approval. Funding has already been reserved.
The second workshop (10-12 May 2017) in Arezzo is a two-day workshop for a maximum of 30 participants (by invitation only).
The main goals of the workshop are to:
- finalize the proposal for the "ideal transcription chain" for oral historians
- find necessary colleagues/partners
- identify possible (CLARIN) hosts for OH transcription services for the three languages.
The location is very near the Arezzo railway station, and the historical centre is less than 10 minutes away on foot.
Directions: from the railway station, walk through the underpass to Campo di Marte and take the exit on the right; walk straight on to the traffic light, cross the road, and walk against the direction of the traffic. After a few metres you will find the Campus on your left.
Here you can find a virtual tour of the Campus.
Here is the draft version of the workshop programme. The times are indicative: some parts may need more or less time than expected.
Wednesday 10 May
|14:15||Overview||Henk van den Heuvel|
|14:30||Transcription chain||Henk van den Heuvel||The various building blocks of a transcription chain, as discussed in Utrecht workshop.|
|14:45||A/D conversion||Arjan van Hessen|
ASR-tools: Full Speech Recognition for different languages
|15:00||ASR tools, English||Thomas Hain|
|15:20||ASR tools, Dutch||Roeland Ordelman|
|15:40||ASR tools, Dutch||Henk van den Heuvel|
ASR-tools: Alignment of audio and transcripts for various languages
|16:15||WebMAUS||John Coleman &
|16:30||Italian Alignment||Piero Cosi|
|16:45||Experience feedback||Graham Gibbs||Participants report on their experiences with the ASR and alignment tools|
|17:15||DIY||Arjan van Hessen||Discussion about desired formats of the ASR tools. What do you want to get back from the ASR engine?|
|18:30||Close of first day||Silvia Calamai|
Thursday 11 May
|9:15||Buon Giorno||Henk van den Heuvel|
Transcription: Guidelines, Standards, Editors, Crowdsourcing
|9:25||Transcription guidelines||Stef Scagliola & Silvia Calamai||Various standards, best practices for Oral History|
|9:45||Manual transcription correction services||Arjan van Hessen||What is available for individual researchers (for example SubtitleEdit)|
|10:00||Web-based annotation editors||Christoph Draxler||Portal for individual researchers and in a crowdsourcing environment|
|11:15||Crowdsourcing||Arjan van Hessen||CrowdFlower (bought by Appen in 2020): crowdsourcing strategies and transcription correction|
|11:25||Discussion||All||Participants report on their experiences with transcription services and crowdsourcing platforms|
|12:00||Hands-on experience||Arjan van Hessen & Christoph Draxler||Correct your own transcriptions, set up a crowdsourcing experiment where people can help you with the transcriptions, and try out the transcription guidelines (what works and what is missing)|
Metadata: Guidelines, Standards, Editors
|14:00||Metadata||Stef Scagliola & Louise Corti||Overview of standards, relevant categories, language of metadata, translation, etc.|
|14:30||Metadata editor||Henk van den Heuvel||A metadata editor as implemented at CLST|
|14:45||Discussion||All||Participants report on their experiences with metadata editing|
Presentations on data management/hosting in NL, UK, IT ((persistent) archiving options)
|15:15||National Infra: NL||Rene van Horik||About the data infrastructure in the country and how our services could fit into that & access to data, tools, metadata for the research community at large & IPR / informed consent / ethical issues|
|15:30||National Infra: UK||Louise Corti||About the data infrastructure in the country and how our services could fit into that & access to data, tools, metadata for the research community at large & IPR / informed consent / ethical issues|
|15:45||National Infra: IT||Monica Monachini||About the data infrastructure in the country and how our services could fit into that & access to data, tools, metadata for the research community at large & IPR / informed consent / ethical issues|
|16:00||National Infra: CZ||Pavel Stranak|
|16:20||Discussion||Henk van den Heuvel|
|18:00||Close of meeting||Silvia Calamai|
Friday 12 May
|9:45||Buongiorno||Henk van den Heuvel||Summary of day 2 and overview of day 3|
|10:00||Wrapping up||Henk van den Heuvel|
|10:30||Proposal||Arjan van Hessen||Concluding actions for finalising the implementation proposal|
|11:45||Time schedule||Arjan van Hessen||Setup of the time schedules for the next months: from workshop to proposal.|
|12:15||Plan for a publication||Stef Scagliola||How to set up publications based on the work done in this workshop|
|14:00||Adjourn||Henk van den Heuvel & Silvia Calamai|
At this moment (1 May 2017) the following persons have confirmed their participation in the workshop.
|IT||Dipartimento di Scienze della formazione, scienze umane e della comunicazione interculturale||Linguistics|
|IT||Dipartimento Culture e civiltà, Università di Verona||Oral History|
|IT||Institute of Cognitive Sciences and Technologies||Infrastructure|
|IT||Dipartimento di Ingegneria elettrica e delle Tecnologie dell'Informazione||Language and Speech Technology|
|LU||Faculté des Lettres, des Sciences Humaines, des Arts et des Sciences de l'Education||Oral History|
|CZ||Charles University, Czech Republic||Language and Speech Technology|
|D||Institut für Phonetik und Sprachverarbeitung, Ludwig-Maximilians-Universität München||Language and Speech Technology|
|UK||UK Data Archive||Oral History|
|UK||University of Huddersfield||Oral History|
|UK||Department of Sociology, University of Essex||Oral History|
|UK||Phonetics Laboratory, University of Oxford||Language and Speech Technology|
|UK||Bodleian Libraries, University of Oxford||Infrastructure|
|UK||Speech and Hearing Group at the Department of Computer Science, University of Sheffield||Language and Speech Technology|
|NL||International Institute of Social History||Oral History|
|NL||Netherlands Institute for Sound and Vision||Language and Speech Technology|
|NL||CLARIAH - WP5||Oral History|
|NL||CLST, Radboud University||Language and Speech Technology|
|NL||HMI, University of Twente||Language and Speech Technology|
More images of the workshop can be found here.
Comments on the ASR-engines
Here are the comments of various participants on the three ASR engines (English, Dutch and Italian). The majority of the comments deal with the Sheffield WebASR engine, some with the Radboud ASR engine, and just one comment is about the Italian ASR engine. This is because the Italian ASR engine was not online at the time of the workshop; it could only be accessed via Piero Cosi's computer.
Before you can use the web-based engine, you need an approved registration from the speech group of the University of Sheffield. WebASR is at this moment the most fully featured ASR engine.
I tried it with some YouTube files. The first was a journalistic interview and the result was good, apart from some misrecognitions. The two voices of interviewer and interviewee were not separated. The second file was an oral history interview with an elderly Irish woman speaking about her family and origins (probably the worst choice I could have made) and the result was, understandably, awful. It would be better to have a .doc or similar output file.
Activated login; uploaded an MP3 of "Child in Time" by Deep Purple. Updated the XML metadata to select the segments to transcribe. Got a transcription quite distant from the original text. No problem, it had to be so. I'll try another wav at work.
Riccardo Del Gratta
Activated login. Uploaded 1 x mp3 (anthropology) and 2 x wav files (popper and HLS). Didn't realise one could upload metadata such as an interview summary (though for a short clip it might not be so useful?). Got a zip with TTML (didn't understand the format?), XML and PDF files. It actually did a very good job indeed, although the speech was very clear and pronounced. No punctuation!! However, once downloaded, the file names are not intuitive; it would be better if they included the name of the input file. Also, extract RTF as another default setting? Hard to match up. The XML contains really detailed metadata with phonetic output! Converted the PDF to Word to compare against my own gold-standard transcript. Pretty good!
'Alex interview start.wav' gives 'Alex interview start Transcript.pdf'.
Does not pick up different speakers and misses start of text.
Gives good transcript of single speaker. But with some missing repeats and no acronyms.
Lecture recording did less well. Poor sound at start and two speakers. Lots of mistakes but editable text.
When I first tried to use this programme, I had several 500 errors. I did eventually get it to work for my audio files after trying at a different time; however, the transcripts were not particularly useful. This, however, is clearly down to the quality of the audio: these were mp3s with background noise, elderly voices, etc. I will check other audio files here to experiment with best practices for sound recording. A very valuable resource; I will run all audio files through here first (and hopefully find a good method for reliably recording high-quality audio).
Signed up and got a reply after a few hours (is there a person or a machine that reacts?). Successfully uploaded audio and downloaded the outcome as a PDF: running text with no speaker turns. It took me 1 hour to correct and edit the transcription of 5 minutes of audio. I had no clue about alternative output formats that meet this requirement already, and if presented as XML I can't make sense of it. I need something readable with speaker turns and time codes; that saves me time and makes the use of ASR an advantage.

2nd attempt, after the presentations: I tried to upload an English-speaking focus group to WebASR. First I had to convert an MPEG audio file of 1.15 hours into something manageable. I used Movavi converter on my own Mac, and that was easy. Then I tried Audacity to cut a 5-minute fragment. That was a tedious task, as the functionality to mark the segment was not clear and I had to ask for help. I then managed to cut a fragment, but the Audacity interface was not quite clear on whether it was really 5 minutes. Then I uploaded the audio and tried to create a text file with my notes on the focus group, to upload and improve the performance of the ASR; when I asked Arjan, it appeared that this function is not supported yet. All this took me an entire hour. I had expected that the speed with which the document was uploaded would reflect the speed of processing, but it took more time; it has been processing for some time now, about 30 minutes. Then I asked Thomas, and it appeared that I had not uploaded 5 minutes but 1.15 hours: the problem is that you must not save the project but export the fragment in Audacity; if you forget, it will give the original file a new name.
Several errors in the beginning; Thomas helped me in the workshop. Results are not as good as the Dutch results.
- works fine for me
- extra info on the expected (remaining) processing time would be nice
- the names of the output files have no correspondence to the name of the audio input file, which is confusing!
Henk van den Heuvel
Before you can use the web-based engine, you need an approved registration from CLST at Radboud University. Currently only 16 kHz, 16-bit, mono wav files are supported.
Easy to use, but I had no idea how to convert the file. I tried oTranscribe immediately after this, and then went back. It worked fluently, but the outcome was not very good: short interventions are missing, bad transcription. It would be useful to have an info sheet that summarizes the weaknesses of ASR; I would then have understood from the outset why some parts of the interview were badly transcribed (speaker turns, different languages in one recording, etc.).
First attempts failed because it was not clear what to upload and what went wrong. On-site help during the workshop in Arezzo made it very easy. Some very good results, some poorer results. Not sure whether the results are influenced more by the lack of domain-specific vocabulary or by the quality of the recording.
Some remarks on missing features and unclear manuals, after testing the engine during and after the Arezzo workshop.
- Input audio is limited to wav, 16 kHz, 16-bit (this should be mentioned)
- Project names and file names: alphanumeric characters should be allowed
- The program suggests multiple audio file upload but works on single files only
- Output formats should be clearly explained
Henk van den Heuvel
The Windows-based engine has to be downloaded and installed on a personal computer.
The results Piero shows are encouraging. It is important to stress the quality requirements for audio documents used to train the system (e.g. no overlapping voices, good audio quality, standard speech, orthographic accuracy in the transcription, correct identification of disfluencies). I think a collection of recent and brand-new interviews could help this process.