USERS AND THEIR SCHOLARLY PRACTICES IN A MULTIDISCIPLINARY WORLD
Over the past two years, researchers from various backgrounds have been working on the exploitation of digital techniques and tools for working with oral history (OH) data. These endeavours have been supported by CLARIN (Common Language Resources and Technology Infrastructure), the European infrastructure for harmonizing and sharing data and tools for linguists, which aims to broaden its audience, for example to the social science community. It intends to realize this objective by reaching out to scholars who are interested in cross-disciplinary approaches and who work with interviews, oral histories and qualitative data.
CLARIN is interested in how it can better support the diversity of practices among social scientists, oral historians and digital humanists, and how it can lower barriers to the uptake of its resources and technologies. By supporting workshops, it can gather researcher-oriented feedback and suggestions for refining and improving its resources.
The third workshop in our programme, held with CLARIN's support, will take place in Munich, Germany on 19-21 September 2018 and will focus on the analysis phase of the research process. It builds on the work done during previous workshops on creating automatically generated transcripts. By bringing together CLARIN technologies for speech retrieval and alignment from different countries, a prototype 'Transcription Chain' has been developed, which will also be tested during the workshop.
The invitation-only workshop aims to gather evidence on scholars' everyday practices when working with OH data, by documenting and comparing these engrained practices and by venturing into different approaches and methods, for example by using unfamiliar annotation tools. For the purpose of this workshop we have agreed to work with audiovisual and textual data in four languages: Dutch, English, German and Italian, and will have prepared some materials for participants to work with, in language groups.
Some preparatory work has to be done prior to the workshop: completing a matrix of user approaches and tools for your specialty, analysing some extracts of data and reviewing some documentation that we will send in advance. We estimate that this will take between 5 and 10 hours of your time; this 'homework' is designed to draw on the various kinds of expertise participants will bring to the workshop.
No specific technical knowledge is required.
All participants need to be able to commit to the full three days - from Wednesday lunchtime on 19 September to lunchtime on Friday 21 September. We are able to pay up to a maximum of €275 towards your economy class travel. Please book your travel well in advance, and check with us first if you need to. Hotels and meals will be provided from Wednesday lunchtime to Friday lunch. We may be able to arrange pick-ups from airports if people's travel plans coincide.
Below is the draft version of the workshop programme. The times and chairs are an indication only.
Wednesday 19 September
Overview and demos
|14:00 - 14:15||Welcome and overview of the workshop|
|14:15 - 15:30||Very brief summary of landscape(s)|
|15:30 - 16:00||Coffee|
|16:00 - 18:00||Demonstration of the TChain workflow Christoph Draxler and Arjan van Hessen
Introduction to the workshop sessions/resources Louise and Maureen
Expected outcomes and evaluation method of the workshop Norah and Max Broekhuizen
Introductions in language groups|
Thursday 20 September
Information exchange: envisaging and embracing others’ techniques and tools
|9:15 - 9:30||Assemble into language groups|
|9:30 - 11:00||Session 1: The Transcription Chain (TChain) (hands on) Arjan and Christoph|
|11:00 - 11:30||Coffee|
|11:30 - 13:00||Session 2: Researcher annotation tools (hands on) Liliana Melgar and Silvana di Gregorio|
|13:00 - 14:00||Lunch|
|14:00 - 15:45||Continue Session 2: Researcher annotation tools Liliana Melgar and Silvana di Gregorio|
|15:45 - 16:15||Coffee|
|16:15 - 18:00||Session 3: On-the-fly linguistic tools (hands on) Jeannine Beeken|
Friday 21 September
Information exchange: envisaging and embracing others’ techniques and tools
|9:00 - 11:00||Session 4: Preprocessing and textometry tools (hands on) Florentina|
|11:00 - 11:15||Coffee|
|11:15 - 13:15||Session 5: Emotion recognition tools: audio and video Khiet Truong|
|13:15 - 13:30||Short summary and adjourn|
|13:30 - 14:00||Optional lunch|
The following persons have confirmed their availability for the workshop.
Florentina Armaselu is a research scientist at the Luxembourg Centre for Contemporary and Digital History (C2DH) of the University of Luxembourg. She holds a PhD in Comparative Literature and an MSc in Computer Science with a thesis in Computational Linguistics, from the University of Montreal, Canada. Her research focuses on text analysis and text encoding, human computer interaction and digital editions, digital history and literary studies.
Oral history connection: Florentina has been involved in a project applying language technology to the analysis of oral history data in European Integration. The project will be presented at the CLARIN 2018 Conference (Florentina Armaselu, Elena Danescu and François Klein, Oral History and Linguistic Analysis. A Study in Digital and Contemporary European History).
Jeannine Beeken is a linguist with a background in lexicology and lexical semantics, morphology and formal syntax. She received her PhD in 1991 from the University of Leuven, later initiated the Dutch/Flemish agency for Human Language Technology, and was the scientific and managing director of what is now called the Institute for the Dutch Language in Leiden. She currently works at the UK Data Service at Essex, where she serves as the key liaison between discovery and technical staff, communicating requirements, evolving rules and ensuring implementation of metadata and resource discovery systems, including vocabularies.
Oral history connection: Jeannine is interested in the discoverability of all data the UKDS harvests, at study, variable and question level. She is currently working on metadata, ontologies and semi-automatic indexing.
Max Broekhuizen is a bachelor student of history at VU University, mostly interested in modern history and migration history. He recently started to work with Norah Karrouche on the digitisation of the House of Stories Archive of interviews with Rotterdam migrants.
Oral history connection: Max's interest in oral history stems from his interest in migrant stories. By interviewing, he finds he can most fully immerse himself in people's worlds and experiences. In this workshop, he hopes to learn more about how to analyze these stories with digital tools.
Silvia Calamai (workshop committee)
Is Associate Professor in Linguistics and Sociolinguistics at the University of Siena and the scientific co-coordinator of the Project on Tuscan Oral archives Grammo-foni (Gra.fo) “Le soffitte della voce” (2010-14). Her main research interests are sociophonetics, oral archives and dialectology. She is a member of the CLARIN Legal Issues Committee and is currently coordinating the project Chinese Culture and Languages in Italy (Wenzhou University & Siena University) and the scientific committee of the Historical Archive of the Arezzo psychiatric hospital (2017-). She is on the board of Italian Association of Speech Sciences and of Sonorités - Bulletin de l’AFAS Association française des détenteurs de documents audiovisuels et sonores.
Oral history connection: Silvia has been working extensively with oral history archives and transcripts through her work in the Gra.fo project. She re-uses oral history data for linguistic analysis (mainly in the realm of sociophonetics and morphology) and for teaching her Sociolinguistics classes (about transcription and linguistic variation).
Leads the Centro di Sonologia Computazionale at the University of Padova, is responsible for the course "Fundamentals of Computer Science" in the Department of Information Engineering, and is a lecturer for the course "Sound and Music Computing". His main research interests are 1) speech and music restoration, 2) expressive information processing, 3) auditory displays, and 4) musical cultural heritage preservation and exploitation. He has authored over 200 publications, is a member of the editorial board of the Journal of New Music Research, and has sat on technical committees at several conferences. He is CEO of Audioinnova, a university spin-off enterprise rooted in a research laboratory, which aims to commercially promote research results.
Oral history connection:
Is a linguist with a background in phonetics, phonology, language acquisition and language change. She received her PhD in 2006 from the Scuola Normale Superiore, Pisa, where she currently works as a researcher in experimental phonology.
Oral history connection: Chiara’s interest in oral history stems from recent work on the transmission of sociolinguistic information in the speech of adult migrants abroad, where migrants’ narratives about past and present life experiences constitute the major source of linguistic data to analyze.
Louise Corti (workshop committee)
Louise Corti has been an Associate Director at the UK Data Archive since 2000 and currently leads the UK Data Service Collections Development and Data Publishing teams. Her current research activities focus on standards and technologies for ingesting, archiving and presenting digital social science data, particularly using open source infrastructures and tools. She has published widely on research data management for the social sciences.
Oral history connection: Before joining the Archive in 2000, Louise helped establish the world's first national qualitative data archive, Qualidata, as its Deputy Director from 1994. There she helped pioneer the systematic archiving of qualitative data, in collections including oral and life story interviews. She led the development of QualiBank, an online publishing system for in-depth interviews.
Silvana di Gregorio
Silvana is a sociologist and a former academic. She has been involved with training, consulting and publishing about qualitative data analysis software since 1995. For 16 years she ran her own training/consultancy business - SdG Associates - focussed on qualitative analysis using a range of software packages, delivering training worldwide. Since 2013 she has worked for QSR International, the developers of NVivo, until 2016 as their Training and Research Consultancy Manager in Europe, the Middle East and Africa. She is currently Director of QSR Research, working closely with the user community and potential users to ascertain their analysis software needs.
Oral history connection: Her PhD was based on the life histories of older people in Leeds, UK, born between 1885 and 1905, looking at the relationship between their earlier work, health and family histories and how they were able to manage in later life.
Christoph Draxler (workshop committee)
Is a computer scientist by education and has been working at the phonetics lab at LMU Munich since 1991. His research interests include web-based services, crowdsourcing and speech databases. He is the project manager of the local CLARIN-D centre "Bavarian Archive for Speech Signals" (BAS), together with Florian Schiel. The BAS offers a number of web services and standalone software tools for processing spoken language, e.g. SpeechRecorder for high-quality scripted audio recordings, G2P for automatic grapheme-to-phoneme conversion, WebMAUS for automatic word alignment and phoneme segmentation of speech, Octra to facilitate the creation of orthographic transcripts, and the Emu WebApp for detailed phonetic analysis of speech data. These tools and services support many languages, and they are free to academic users.
Oral history connection: Christoph has co-developed the CLARIN Transcription Chain for oral history data.
Maureen Haaker (workshop committee)
Works at the UK Data Service at the University of Essex and the University of Suffolk, specialising in working with qualitative data, including training around effective and ethical ways to process and publish data, as well as basic research methods classes that cover qualitative methodologies. In addition to teaching, she is a student herself, halfway through her PhD in Sociology on subjectivity during pregnancy.
Oral history connection: Maureen uses the Biographical Narrative Interview Method (BNIM), an oral history interview style informed by a psychoanalytical approach. She has worked extensively to prepare and publish oral history transcripts through her work at the UK Data Service, and has re-used oral history data to create teaching resources, case studies, webinars, etc.
Is Professor of Speech and Audio Technology at the University of Sheffield, Head of the Speech and Hearing Group, its subgroup on Machine Intelligence for Natural Interfaces, and the Voicebase Centre for Speech and Language Technologies. He has a track record in speech recognition, diarisation, speaker identification, accent identification, emotion recognition, machine translation and more, as well as in core technologies such as machine learning and signal processing. His team is responsible for the service webasr.org, which provides recognition, alignment, diarisation and translation services for large audio files.
Oral history connection: Thomas is interested in how speech technology can be adapted to suit the needs of OH, and in which attributes would be interesting. Webasr.org is adaptable in its input and output formats. If you are interested in using it, please get in touch!
Arjan van Hessen (workshop committee)
I have been a researcher in the field of Language and Speech Technology (HLT) for the past two decades. I work at two universities (Twente and Utrecht) and a commercial company (Telecats). At all three affiliations I mainly work on the disclosure of spoken content with the help of technology.
My initial interest was in human-machine communication, where ASR was used to automatically process telephone calls. In recent years my interest has shifted to the disclosure of spoken content: meetings, interviews, radio and TV content and more. One of the current focus points is the adaptation of language models to improve recognition, and the use of "AI" to go from spoken speech to written text.
Oral History connection: how to improve the recognition results of (often) older interviews and how to connect parts of the spoken content to external documents like newspapers, books and articles.
Is an Associate Professor of History and Cultural Studies in the programme 'Citizenship and the Humanisation of the Public Sector' at the University of Humanistic Studies in Utrecht, specialising in memory culture ('family memory'), generation, recognition and transitional justice research.
Oral history connection: Her research 'Narrated (In)justice', which uses oral history to study narratives of injustice, examines how post-war and post-colonial memories in the Netherlands interact. She is interviewing in three communities (Dutch Jewish, Indo/Indonesian, and Surinamese and Dutch Antillean). Her research on Trauma & Resilience, an intergenerational Holocaust study from an existential perspective, focuses on 11 Jewish families over three generations; trained interviewers who are spiritual counsellors use the biographic narrative interpretive (Wengraf) method and a life history calendar to explore experiences and narratives of trauma, resilience, (moral) injury, generation, generationality, positionality, recognition and (in)justice.
Norah Karrouche specializes in historical culture in Morocco and Algeria and among Maghrebi communities in Europe, and in the history and anthropology of mobility in the Mediterranean. In 2016, she started working as a scholar in CLARIAH-NL at the Erasmus Studio, Erasmus University Rotterdam.
Oral history connection: Norah is interested in the uses of oral history in migration research, both as fieldwork method and as research ethic. She used oral history as part of her dissertation research on the representation of the past in life stories of Moroccan migrants.
Has a doctorate in history, is a research associate at the Institute for History and Biography at the FernUniversität in Hagen, and heads the archive "Deutsches Gedächtnis".
Oral history connection: Since the 1980s Almut has conducted, documented and analysed oral history interviews in various research and documentation projects. Over the past 25 years she has built up an archive of biographical narrative interviews, which now comprises 3000 audio and video interviews, dealing with issues of long-term archiving, digitisation, metadata standards, online provision, and computer-aided access and analysis procedures. She is currently conducting an interview project on the history of the Freie Universität Berlin and is working with the Data Center for the Humanities of the University of Cologne and the Fraunhofer Institute for Intelligent Analysis and Information Systems on the improvement of language technologies for the analysis and archiving of qualitative interviews (automatic speech recognition, forced alignment).
Liliana Melgar holds a PhD in information science and is currently a postdoctoral researcher at the University of Amsterdam for the CLARIAH-NL project (the Dutch infrastructure for digital humanities and social sciences). Her research focuses on scholarly annotations and the tools that support the research process of humanities scholars, more specifically media scholars and oral historians. She approaches this both from the methodological side and from the technical side, as a user requirements analyst. She conducts user studies for CLARIAH, as well as testing and dissemination of CLARIAH tools. Her other research interests include information behaviour, multimedia access and retrieval, classification theory, and film studies.
Oral history connection: As part of the CLARIAH project, Liliana works with oral historians (e.g. Norah Karrouche and Susan Hogervoorst, among others), gathering their requirements for the infrastructure.
is a Senior Research Officer on the oral history project 'National Service Life Stories' at the University of Essex. His doctoral research at Queen Mary University of London used oral history to explore how understandings of the First World War shaped military enlistment and conceptions of masculinity in the Second World War. He co-convenes the Oral History Seminar at the Institute of Historical Research, London.
Oral history connection: Joel uses oral history to research the social and cultural history of war and masculinity in twentieth century Britain, because it provides unrivalled opportunities to understand subjective experience and to challenge the assumed influence of popular culture and hegemonic gender discourses. He is interested in oral history and popular memory; with practical methods of analysing and managing large quantities of oral history data; and with analytical approaches that preserve the variations in subjective experience.
Is Associate Professor of Digital Information Studies in the Department of Information Studies, UCL, where she leads the Digital Humanities MA/MSc programme. She is also Deputy Director of the UCL Centre for Digital Humanities and on the leadership group of the UCL Centre for Critical Heritage. She has published widely on the history of Digital Humanities, and her work has been translated into a number of languages, including Russian, Polish and Chinese. Her research projects include a Leverhulme-funded collaboration with the British Museum on the manuscript catalogues of Sir Hans Sloane, a historical newspaper data mining project, and the Marie Curie action 'Critical Heritage Studies and the Future of Europe'.
Oral history connection: Researching the history of Digital Humanities using an oral history methodology is Julianne's passion! This work most recently resulted in the 2016 open access book Computation and the Humanities: Towards an Oral History of Digital Humanities. She is finishing work on a book about the female keypunch operators who worked on Roberto Busa's Index Thomisticus project, again drawing heavily on an oral history methodology.
Is a historian and research associate at the Center for Digital Systems (CeDiS) of Freie Universität Berlin, where he co-coordinates the E-Research & E-Publishing department. Holding an M.A. in migration history and a PhD on the visual history of tourism, he has published articles and edited volumes on forced labour, tourism research, migration history, visual history, remembrance cultures and online education. He previously worked at the Ravensbrück Memorial Museum and for the Berlin Office for Compensation of Nazi Victims, and is active in the Berlin History Workshop.
Oral history connection: Since 2008, he has worked with digital interview collections, mainly the online archive "Forced Labor 1939-1945. Memory and History", containing around 600 interviews with transcriptions, translations and additional material. He has also developed interview-based digital learning environments as well as a smartphone application.
Is a student in the PhD programme in Historical, Geographical and Anthropological Studies (University of Padova, Ca' Foscari University of Venice, University of Verona). She is currently working on the Italian psychiatric reform of the 1970s, and in particular on the reform project of the Psychiatric Hospital of Arezzo.
Oral history connection: For her research Caterina has collected 35 interviews, in order to understand the subjective experience of those who worked through and lived the introduction of new methods and approaches to mental illness. In February 2017 Caterina attended the workshop Memorie Immaginate, organised by LabOr, Laboratorio di Storia Orale, University of Padova, and she took part as an auditor in the CLARIN workshop in Arezzo (10-12/03/2017). She represents the Italian Oral History Association (AISO).
received a Master's degree in Electronic Engineering from the University of Padova (1996), and a PhD degree in Audiovisual Studies from the University of Udine (2007). He is with the Sound and Music Computing Group of the Dept of Information Engineering and his main research interests are in sound and speech restoration, interactive multimodal systems, models for expressiveness in music. He has participated in 11 national and international research projects, among them: 2010-2012 DREAM (Digital Reworking/reappropriation of ElectroAcoustic Music), Culture 2007; 2009-2012 SRSnet: Smart Multi-Resource-Aware Sensor Network, Interreg IV; 2005-2006 Preservation and On-line Fruition of the Audio Documents from the European Archives of Ethnic Music, Culture 2000; 2004-2008 ENACTIVE (Enactive Interfaces), European Network of Excellence; 2000-2003 MEGA-Multisensory Expressive Gesture Applications, IST-1999-20410. He is responsible for the course Computer Architectures for the undergraduate degrees of the Information Engineering area.
Stefania Scagliola (workshop committee)
Stefania (Stef) Scagliola is a historian specialised in digital history, with an emphasis on oral history collections. She is presently working on a digital teaching platform at the Luxembourg Centre for Contemporary and Digital History (C2DH) at the University of Luxembourg.
Oral history connection: From 2006 to 2011 she was the coordinator of an oral history project conducted at the Netherlands Institute for Veterans, resulting in a collection of 1000 audio interviews with a representative number of Dutch veterans of war and military missions. From 2011 to 2016 she was a postdoc researcher at the Erasmus Studio for e-research, where she was involved in feeding the methodology of the veterans project into the video oral history project Post Yugoslav Voices. She was involved in several usability studies for scholarly users and in developing a methodology for the re-use of oral history archives in a multidisciplinary setting.
Khiet Truong is an assistant professor in the Human Media Interaction group, University of Twente, actively working in the fields of affective computing and social signal processing. She holds a master's degree in Computational Linguistics (Utrecht University) and a PhD in Computer Science, for which she investigated emotion recognition in speech and automatic laughter detection. Khiet supervises PhD students and master's/bachelor's students of Interaction Technology and Creative Technology, and teaches courses in Speech Processing, Affective Computing and Foundations of Interaction Technology. She is an elected executive member of the Association for the Advancement of Affective Computing (AAAC).
Oral history connection: Khiet’s interests lie in the automatic analysis and understanding of verbal and nonverbal, specifically vocal behaviors in human-human and human-machine interaction, and the design of socially interactive technology to support human needs.
For questions related to the workshop, please contact:
Institut für Phonetik und Sprachverarbeitung
Tel: +49 1577 1894251
UK Data Service
Tel: +44 7960 053 281
UK Data Service
Tel: +44 7510 137 255
Two blogs were written shortly after the workshop by Stef Scagliola and Arjan van Hessen.
At the CLARIN-days in October (Pisa, Italy) a poster was presented.
In December 2018, three papers and a workshop proposal were submitted for the DH2019 conference in Utrecht; unfortunately, two of the papers were rejected.
Evaluation of the OH portal (Arjan van Hessen)
During the successful and enjoyable workshop in Arezzo (May 2017), it became clear that, if done properly, automatic transcription of interviews could be useful for getting a quick overview of what was said. The participants in Arezzo were aware of the imperfections of automatic speech recognition (ASR) and knew that recognition quality decreases when audio quality is low, when there is background noise, or when the speech is spoken in (heavy) dialect.
What was explicitly asked for in Arezzo was to keep the portal to be developed as simple as possible: making few or no demands on the audio input, giving clear instructions, and using as little technical jargon as possible.
After the request to build the portal was approved in autumn 2017, the team of Christoph Draxler at the LMU in München started to build the first version of the OH portal. An upgraded beta version (1.0.0) of the portal was presented to the participants of the 2018 workshop in München.
The idea behind the OH portal is simple. You go to the website (https://www.phonetik.uni-muenchen.de/apps/oh-portal), select one or more sound files, upload them, and download the automatically generated transcription. Currently, audio files must be formatted as wav files, but in the near future the portal itself will transform a range of submitted audio formats into the correct wav format. What is already possible is that it does not matter at which sample frequency the files were recorded, or whether they are mono or stereo. In the case of stereo, the portal asks the user whether he or she wants to process the two audio channels separately or together (i.e. mixed down to one signal). If the user chooses separately, the channels are processed one after the other, so that when an interview was recorded with two speakers each on their own channel, you can better separate the speakers, determine turn-taking, and even get a better recognition result.
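To make the "process channels separately" idea concrete, here is a minimal sketch (not the portal's actual code) of splitting a 16-bit stereo wav into two mono files, one per speaker, using only Python's standard library; the function name and file paths are our own invention.

```python
import wave

def split_stereo(path, left_out, right_out):
    """Split a 16-bit stereo WAV into two mono WAV files, one per channel.
    Hypothetical helper for illustration, not the OH portal's implementation."""
    with wave.open(path, "rb") as src:
        assert src.getnchannels() == 2, "expected a stereo file"
        assert src.getsampwidth() == 2, "expected 16-bit samples"
        framerate = src.getframerate()
        frames = src.readframes(src.getnframes())
    # Interleaved 16-bit frames: [L0, R0, L1, R1, ...]; 4 bytes per stereo frame.
    left, right = bytearray(), bytearray()
    for i in range(0, len(frames), 4):
        left += frames[i:i + 2]        # first 2 bytes: left sample
        right += frames[i + 2:i + 4]   # next 2 bytes: right sample
    for out_path, data in ((left_out, bytes(left)), (right_out, bytes(right))):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)        # mono output
            dst.setsampwidth(2)
            dst.setframerate(framerate)
            dst.writeframes(data)
```

Each resulting mono file can then be sent to the recognizer on its own, which is what makes per-speaker turn-taking recoverable afterwards.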
Within the portal, a button lets you select the wav files to be recognised from your own computer. After clicking it, a selection window opens (see below) in which you can set the different options.
At the moment, the choices made in the "Verify Files" window apply to all the selected files: you cannot select different languages or recognisers when you select more than one audio file.
Once the choices have been made, you can start the process via the button.
The audio files are uploaded and then processed. As mentioned, if a stereo file is included, you will be asked how you want to process it.
It is widely recognised that speech recognition hardly ever works flawlessly. Depending on the quality of the recordings, the way of speaking, the words and jargon used by the various speakers, their accents, and the presence of background noise, speech recognition will be more or less successful. With good recordings and clear, coherent speech, an error rate of less than 10% is possible for the four languages in the current portal (En, Nl, It and De).
But even with very good recognition, something can go wrong. The Manual Transcription button offers the possibility to correct the recognition results. However, editing the recognised text breaks the connection between the recognised words and the times at which they were spoken in the audio file. After the automatically obtained transcription has been corrected manually, you can restore this connection by choosing Word alignment. The ASR engine will redo the job, but this time it knows exactly what was said. The result is a transcription in which it is known exactly when every word was pronounced. This makes it possible to automatically generate subtitles, or to make a karaoke version in which each word is highlighted as it is played.
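As a sketch of how word-level timestamps turn into subtitles, the function below groups aligned words into SRT cues. The input shape (word, start, end in seconds) and the grouping thresholds are our own illustrative assumptions, not the portal's actual format.

```python
def to_srt(words, max_gap=0.8, max_words=8):
    """Turn word-level timestamps into SRT subtitle text.
    `words` is a list of (word, start_sec, end_sec) tuples, the kind of
    data word alignment yields. Grouping rules here are illustrative:
    start a new cue after a pause of max_gap seconds or max_words words."""
    def stamp(t):
        # Format seconds as the SRT timestamp HH:MM:SS,mmm
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        ms = round((s - int(s)) * 1000)
        return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{ms:03d}"

    cues, current = [], []
    for w in words:
        if current and (w[1] - current[-1][2] > max_gap or len(current) >= max_words):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    lines = []
    for i, cue in enumerate(cues, 1):
        lines += [str(i),
                  f"{stamp(cue[0][1])} --> {stamp(cue[-1][2])}",
                  " ".join(w[0] for w in cue),
                  ""]
    return "\n".join(lines)
```

A karaoke display would use the same per-word times, highlighting each word during its (start, end) interval instead of grouping words into cues.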
During the first day of the 2018 workshop, the way automatic speech recognition works, the choices made and the problems that appeared when building the portal, were explained to the audience. The current OH portal is a web service that "collects" the audio files and then, depending on the choices made, forwards the files to the different speech recognizers (WebASR in Sheffield, LST-NL in Nijmegen, LST-En in Enschede, EML-D/It in Germany).
Each recognizer returns its output in its own format. To get a uniform result, the OH portal must rewrite each recognizer's output to one of the selected standards. When additional languages (and hence recognizers) are added in the near future, this rewriting step has to be implemented for each of them.
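The rewriting step amounts to mapping each recognizer's format onto one common word list. The sketch below shows the idea with two invented input shapes; the real formats of WebASR, LST and EML differ, and each needs its own branch like these.

```python
def normalise(recognizer, payload):
    """Map recognizer-specific output to one common word list:
    [{"word": str, "start": float, "end": float}, ...].
    Both input shapes below are invented for illustration; real
    recognizer outputs differ and each needs its own converter."""
    if recognizer == "webasr":
        # Hypothetical shape: list of dicts with millisecond offsets.
        return [{"word": w["text"],
                 "start": w["offset_ms"] / 1000,
                 "end": (w["offset_ms"] + w["duration_ms"]) / 1000}
                for w in payload]
    if recognizer == "eml":
        # Hypothetical shape: tab-separated lines "start<TAB>end<TAB>word".
        out = []
        for line in payload.strip().splitlines():
            start, end, word = line.split("\t")
            out.append({"word": word, "start": float(start), "end": float(end)})
        return out
    raise ValueError(f"unknown recognizer: {recognizer}")
```

Once everything is in this common shape, exporting to any of the portal's output standards only has to be written once.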
Commercial versus Open Source
Many more recognizers are available (and for more languages than the current four), but they are almost all (semi-)commercial. It is very easy to connect the excellently performing Google recognizer, and in the beta versions of the OH portal this was done. But there is a price to pay.
Paying with money is usually not the problem, because it typically amounts to just a few euros per hour of speech. But almost always the audio data is stored on the discs of the commercial parties for extra training, testing or other purposes. And that is often a problem, because the content of many interviews is sensitive and likely subject to the GDPR. Even in our situation with a (reliable) portal - where all user data are removed 24 hours after processing - it may be a problem, because collection owners may expressly state that the data cannot leave the "building" without permission, or may not yet have put GDPR-compliant processing agreements in place.
As a safety measure, it was therefore decided to remove the commercial recognizers as an option for use of the OH portal during the workshop, and open collections were used as far as possible for testing purposes.
At the first evaluation on Wednesday afternoon, however, participants questioned whether it might be useful to restore this “commercial” option, and to explicitly indicate that the recognizers X, Y and Z are "commercial and that they will probably keep your data on their disks”. It is then up to the users to decide whether or not to use these recognizers. This is something we will consider for the next version(s).
On Thursday morning, following a short demonstration of how the portal could be used, participants were invited to upload a short sound fragment (their own sound file or one available via the workshop portal) and to recognize, edit, align and finally download the results.
In most cases this worked well, but the systems of the LMU were unable to cope with 20 users in parallel, so error messages appeared and some participants had to wait a very long time for the results of a five-minute fragment.
The biggest problems were solved overnight by the team of Christoph, but scalability is certainly something to look at for the next version. Fortunately, most participants were very pleased with the simplicity of the portal. The only thing that turned out to be tricky was extracting the audio from video interviews and/or converting special formats (e.g. *.wma or *.mp3) into the prescribed *.wav format.
Technically this transformation is a piece of cake, but where users do not know how to do it and do not have the right software on their computer, this may be a barrier. The future option to do it in the portal was therefore greeted with enthusiasm.
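For users who do have command-line software available, the conversion mentioned above is indeed straightforward with a tool such as ffmpeg. The sketch below only builds the command to run; the 16 kHz mono settings are a common choice for speech recognizers, but the exact requirements of a given portal should be checked.

```python
from pathlib import Path

def ffmpeg_to_wav(src, sample_rate=16000):
    """Build an ffmpeg command that converts common audio/video formats
    (*.mp3, *.wma, video containers) to mono PCM *.wav.
    16 kHz mono is a typical speech-recognition setting; adjust to what
    the target portal actually prescribes."""
    dst = str(Path(src).with_suffix(".wav"))
    return ["ffmpeg", "-y", "-i", str(src),
            "-ac", "1",               # mix down to mono
            "-ar", str(sample_rate),  # resample
            dst]

cmd = ffmpeg_to_wav("interview.mp3")
# To actually run it: subprocess.run(cmd, check=True)
```

Wrapping this one-liner in a small script (or, as planned, in the portal itself) removes exactly the barrier the participants ran into.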
Most participants were more than satisfied with the recognition results and understood that automatic speech recognition of sound fragments that were barely audible was a difficult, if not impossible, task.
Participants asked whether additional output formats could be included so that they could import the results of the entire process directly into their own systems (Zwangsarbeit Archiv, ELAN), and whether XML files marked up in, say, TEI (Text Encoding Initiative) could be exported for use in onward tools. Technically this is no problem, but we cannot support all formats of all OH projects. The portal builders have indicated that in the short term they will look at potentially interesting export formats and add these to the current set.
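To make the TEI export idea concrete, here is a deliberately minimal, TEI-flavoured sketch. It is not a valid instance of the full TEI schema (real transcriptions follow the TEI Guidelines, e.g. the "Transcriptions of Speech" module); it only shows how timed, speaker-attributed utterances could be serialized as XML.

```python
import xml.etree.ElementTree as ET

def segments_to_tei(segments):
    """Wrap timed utterances in a minimal TEI-flavoured <u> structure.
    This is only a sketch of the idea; real TEI output would need the
    full header and schema-conformant attributes."""
    body = ET.Element("body")
    for seg in segments:
        u = ET.SubElement(body, "u", who=seg["speaker"],
                          start=str(seg["start"]), end=str(seg["end"]))
        u.text = seg["text"]
    return ET.tostring(body, encoding="unicode")

xml = segments_to_tei([
    {"speaker": "INT", "start": 0.0, "end": 2.1, "text": "Where were you born?"},
])
```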
In general, the participants were satisfied with the opportunities presented by the OH portal. Everyone could, after some help with converting the sound files, process their files, correct the automatic transcription manually, re-align it and download the final results. The fact that the load of the services was too high due to the simultaneous use of 20+ participants, which caused the systems to fail, was actually the only thing that went wrong during the hands-on session. For the builders of the portal, however, it was a useful wake-up call.
In the coming months the scaling problem will be solved and several other recognizers (both commercial and non-commercial) will be added. During the CLARIN conference days in Pisa we will see which other CLARIN participants have a recognizer available and would like to participate in this OH portal.
Finally, given that participants at Munich were testing other stand-alone CLARIN and non-CLARIN speech processing and analysis tools, the idea of extending the T-Chain to an "A-Chain" (annotation and analysis) might be useful, thereby offering a more seamless journey from audio recording to an annotated (knowledge-rich) interview.
At the moment Kaldi is the most popular platform for Deep Neural Network (DNN) based speech recognition. The Dutch and English recognizers are already working with Kaldi, and in both Germany and Italy scholars are working on a Kaldi-based recognizer for their own language. Because it would be a shame to reinvent the wheel several times, it was agreed to investigate to what extent we can join forces and work together on Kaldi-based recognizers.
Arjan van Hessen
Oral History under scrutiny in München (Stef Scagliola & Louise Corti)
Cross disciplinary overtures between linguists, historians and social scientists
Heading to München at the end of September offers the spectacle of cheerful Germans wearing dirndls and lederhosen, celebrating their Oktoberfest with remarkable patriotism and tons of good beer. This year, the Bavarian stronghold hosted another cheerful gathering of a dedicated community: a CLARIN multidisciplinary workshop in which scholars in the fields of speech technology, social sciences, human computer interaction, oral history and linguistics engaged with each others' methods and digital tools. The idea is that, as the use of language and speech is a common practice in all these scholarly fields, the use of a digital tool that is already mainstream in a parallel discipline could open up new perspectives and approaches for searching, finding, selecting, processing and interpreting data.
This was the fourth workshop supported by CLARIN ERIC, a European Research Infrastructure Consortium for Language Resources and Technology, offering a digital infrastructure that gives access to text and speech corpora and language technology tools for humanities scholars. One of CLARIN's objectives is to reach out to social science and humanities scholars in order to assess how the CLARIN assets can be taken up by disciplines other than (computational) linguistics and language technology.
At the first two workshops in Oxford (2016) and Utrecht (2016), we assessed the potential of bringing together state-of-the-art speech technology, descriptive and analytical tools for linguistic analysis, and oral history data, to open up massive amounts of interview data and analyse them in new, often unexpected ways. The website oralhistory.eu was also set up to communicate the group's work across disciplines. In Arezzo in 2017, the first challenge was taken up: applying speech recognition software to Italian, German, Dutch and English oral history data and evaluating the experiences of scholars. The reasons why CLARIN can make a difference in the world of oral history are explained in a series of short multilingual videoclips with speech technologist Henk van den Heuvel, linguist Silvia Calamai, and data curator Louise Corti.
Arezzo yielded a roadmap for the development of a Transcription Chain (T-Chain), in which various open source tools are combined to support transcription and alignment of audio and text in various languages. In München we had the opportunity for ‘the proof of the pudding’, testing the prototype of the T-Chain, known as the OH Portal, with data that had been pre-selected and prepared by the workshop organisers and sessions leaders.
In our München workshop, we devoted 2 days to experimenting with 5 tools, building on the homework the participants were asked to do, i.e. install software and become familiar with it. The tools ranged from annotation of digital sources (ELAN and NVivo) to linguistic identification and information extraction. These were applied to text and audio-visual sources, with the intent of detecting language and speech features by looking at concordances and correlations, processing syntactic tree structures, searching for named entities, and applying emotion recognition (VOYANT, Stanford NLPCore, TXM and Praat). Some participants struggled to download software, which suggested a lack of basic technical proficiency. This could turn out to be a significant barrier to the use of open source tools, as many of them require a little more familiarity with, say, laptop operating systems. It was useful to have language technologists sitting amongst the scholars, witnessing first-hand some of the really basic challenges in getting started.
Sessions were conducted in four language groups (Dutch, English, German and Italian) and comprised 5-6 people (linguists, oral historians, social scientists and digital humanities scholars); a formal group evaluation followed each session. Their feedback suggested an overall positive experience. However, some of the approaches, for example, language features identified through concordances, such as use of particular and co-occurrent words and multi-word expressions in an interview, were very new to some of the scholars. Due to unfamiliar terminology and the unknown/unusual methodology of linguistic research, some people initially really struggled to comprehend how they worked and what their purpose was. However, we also witnessed some pleasing ‘Eureka’ moments and ‘Aha-Erlebnisse’ where scholars appreciated how (much) such analytic tools might help complement their own approaches to working with OH data, enabling them to elucidate features of spoken language, in addition to content.
In Arezzo, one key takeaway message was to keep on developing the OH portal, and to keep it as simple as possible by making no or just a few demands on the audio input, having clear instructions and using as little technical jargon as possible. In autumn 2017, the team of Christoph Draxler at the LMU in München started to build the first version of the OH portal. Version 1.0.0 of the portal was presented to participants of the September 2018 workshop in München.
The overall assessment was that the portal met what was required: it is easy to use, the different steps are clear and the final results/outputs are easy to download.
Hiccups: Scalability and Conversion
The biggest problem encountered in the München workshop was scalability: the computers of the LMU could not handle 25 simultaneous requests to process an audio file. The problem was solved overnight by the team of Christoph, but scalability is certainly something to work on in the next version. Moreover, it would also be very welcome if the portal could give users an estimate of the waiting time, or the certainty that the T-Chain is actually processing the request and is not stuck because of an error or a crash. It is this uncertainty that can strongly discourage the uptake of such technology. Another issue that was a challenge to the participants was extracting the audio from video interviews, which are increasingly becoming mainstream, and/or converting the huge variety of formats (e.g. *.wma or *.mp3) into the prescribed *.wav format. For the time being this is the only format that is supported by the T-Chain. See the detailed blog by Arjan van Hessen and Christoph Draxler for an evaluation of the OH Portal in München.
Landscape of disciplines
During the workshop’s introduction, participants from a variety of disciplines provided insights into how oral histories are approached and analysed in their respective disciplines. Perhaps not surprisingly, every discipline consists of distinct sub-disciplines that use different approaches and often dispute the usefulness of, or are ignorant about, each other’s methods and tools. In fact, talking about ‘linguistics’ is a simplification, just as the term ‘oral history’ is an aggregation of a huge variety of approaches to interpreting interviews about people’s personal pasts. For instance, whereas most oral historians will approach an oral history interview as an intersubjective account of a past experience, some historians might wish to approach the same source as a factual testimony of an event. A social scientist may want to compare differences in recounting the past across the study’s interviewees. These approaches represent distinct analytical frameworks and may require different analytic tools. To illustrate this variety of landscapes within even one single discipline, we had invited, in advance, workshop participants to provide a couple of typical ‘research trajectories’ that reflected their own approach(es) to working with oral history data. A high-level simplified journey of an oral historian’s work with data looks something like this:
During the workshop, leaders of the four sessions covering data annotation, analysis and interpretation, were also invited to provide a brief sketch and characterization of the different approaches: a parade of disciplinary landscapes.
Presentation: CLARIN-OH_Munich18_Session0_Introduction.pdf
These yielded many insights into how specific practices are the same yet have been assigned different names over time, or how the same term may signify different things in a different discipline. For instance, social scientific and historical approaches are actually quite similar, but reflection on analytic frameworks (i.e. content analysis, discourse analysis, narrative analysis) is rather weak in oral historians' methodologies, where oral history is first and foremost seen as an interviewing method. With these disciplinary overviews and insights in mind, we set out to explore whether or not the same annotation, linguistic and emotion recognition tools can cater to the needs of historians, social scientists and linguists in the same way. Examples of their typical workflows are shown below.
|Three screenshots of xxxx|
Researcher Annotation Tools
Annotation tools are familiar to linguists, oral historians and social scientists alike, but the way these tools are used and the terminology to describe what is being done varies considerably. Participants were given the opportunity to work with two different annotation tools: NVivo, a proprietary software designed with social scientists in mind, and ELAN, an open source tool favoured by linguists. While the two tools had a similar concept and objective, the vastly different terminology and user interface meant that users had to spend additional time acquainting themselves with each tool's unique layout before being able to annotate.
NVivo allowed participants to upload and group (code) data sources and mark up text and images with "nodes" and memos. This tool worked particularly well with written transcripts, and allowed users to see mark-up and notes in the context of a transcript. Being able to collate all documents related to a single research project proved to be a clear benefit of the tool; by contrast, one user commented that ELAN had a much more visual display and worked solely with audio and video data sources.
ELAN allowed users to create "tiers" of annotation, differentiating types of tiers and specifying "parent tiers". The ability to annotate the audio allowed users to engage with all aspects of an oral history interview from as early as the point of recording the data. Overall, the familiarity of annotation across disciplines made these tools more accessible to participants and allowed for easy cross-discipline collaboration. However, participants were reluctant to take the time to learn a new controlled vocabulary for each tool, and were unlikely to stray from the tools they already knew. While the learning curve for annotation tools isn't steep, CLARIN tools could be developed to ensure a uniformity of language and terminology for features, so the unique way of annotating within each tool becomes the focus, rather than a wholly unfamiliar terminology.
A user quote on ELAN, “I would use this for an exploratory analysis of my oral history data.”
A user quote on NVivo: “It makes such a difference to be able to analyze all of your transcripts and AV-data in one single environment.”
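The tier-and-parent-tier concept that ELAN exposes can be reduced to a very small data structure, sketched below. This is only an illustration of the idea; real ELAN files (*.eaf) are XML documents and are far richer than this.

```python
from dataclasses import dataclass, field

@dataclass
class Tier:
    """A minimal stand-in for an ELAN-style annotation tier: a named
    layer of (start, end, value) annotations, optionally attached to a
    parent tier (as in ELAN's 'parent tier' concept)."""
    name: str
    parent: "Tier | None" = None
    annotations: list = field(default_factory=list)

    def annotate(self, start, end, value):
        """Add one time-anchored annotation (times in seconds)."""
        self.annotations.append((start, end, value))

speech = Tier("speech")
speech.annotate(0.0, 2.5, "I arrived in 1958")
emotion = Tier("emotion", parent=speech)   # a dependent (child) tier
emotion.annotate(0.0, 2.5, "neutral")
```

Seen this way, the difference between the tools is largely vocabulary: NVivo's "nodes" and ELAN's "tiers" are both layers of labels anchored to spans of the source.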
On-the-fly linguistic tools (no pre-processing)
After a short introduction to different types of linguistic tools, for example lemmatizers, syntactic parsers, named entity recognizers, auto-summarizers, tools for detecting concordances/n-grams and semantic correlations, the open source online tools Voyant and Stanford CoreNLP were used to give an illustration of their possible uses within the research area of oral histories and social sciences.
Whereas the introduction was very much welcomed for the insight it gave into generic linguistic tools and their shortcomings/opportunities, the free tools were met with varied reactions. While many saw the advantages of using linguistic features, the limited functionality of such free tools was a barrier to their use; an example is a limit on the amount of text that can be analysed. If the opportunities for use of these tools by non-linguists can be better defined, then CLARIN tools can be developed to meet these more basic needs.
Sociolinguists may benefit from the use of Voyant: word frequency analysis can be rather interesting in oral history data when viewed through the lens of a sociolinguist. Although word frequency appears to be a rather controversial topic in linguistics, it is widely accepted that frequent words may influence phonetic change, and, secondly, that frequent words may act as a ‘locus of style’ for a given speaker. At the same time, it seemed that Voyant was not sophisticated enough to process uncleaned transcriptions.
A user quote on Voyant “I’d like the tool to be more transparent about how it generates a word cloud.”
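The two analyses discussed above, word frequencies and keyword-in-context concordances, are simple enough to sketch directly. This is an illustrative toy version of what tools like Voyant compute, not their actual implementation; the example transcript is invented.

```python
from collections import Counter

def frequencies(text, stopwords=()):
    """Count words, ignoring punctuation, case and optional stopwords."""
    words = [w.strip(".,?!;:\"'").lower() for w in text.split()]
    return Counter(w for w in words if w and w not in stopwords)

def concordance(text, keyword, window=3):
    """Simple keyword-in-context (KWIC): return `window` words on
    either side of every occurrence of `keyword`."""
    words = text.split()
    hits = []
    for i, w in enumerate(words):
        if w.strip(".,?!;:\"'").lower() == keyword:
            hits.append(" ".join(words[max(0, i - window):i + window + 1]))
    return hits

text = "We left the island in 1958. The island was home, but work was in London."
freq = frequencies(text)
hits = concordance(text, "island")
```

Even a toy like this makes the transparency question in the user quote concrete: a word cloud is just a rendering of exactly such a frequency table.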
Linguistic tools with pre-processing
Tools for supporting the identification and mark-up of linguistic features vary in their complexity and ease of use. The learning curve for those unfamiliar with the technique was found to be very high. TXM is an example of a ‘textometry’ tool that requires cleaned and partially processed data, necessitating some input before it can be used, much like many other tools that require structured input, such as XML. Before TXM can be used, speaker turns need to be separated and noncompliant signs and symbols taken out, so that more accurate results can be obtained. In the case of the 10 interviews about ‘Black Immigrants’ coming to the UK from the Caribbean from the 1950s-70s, the outcomes from TXM offer insights that can help with identifying specific features of the interview process, such as: the relation between words expressed by interviewer and interviewee, the difference in active and passive use of verbs between gender, age or profession, or the specificity of certain words for a respondent. However, the methodological challenge is how to translate these insights into the paradigm the oral historian usually uses: how does this person attribute meaning to his or her past? In some ways, this might require the scholar to move away from the specific individual features in the creation of meaning – what is this person trying to tell me – to observing features that point to patterns in a corpus – of all the interviews this vocabulary seems to stand out. This requires a widening of methodological perspective in data analysis.
User quote on TXM “A bit of a struggle at first, but this helps you to do a close reading of an interview, and I think it fits perfectly within my traditional hermeneutical approach”
Emotion recognition tools
One of the most surprising dimensions of analysing a dialogue between interviewer and interviewee was offered by computer scientist Khiet Truong, who demonstrated how computer scientists measure emotions.
She started off with a simple cartoon. Our immediate observation varied; are they singing, arguing, or laughing? It is easy to make assumptions, yet these can seriously colour our interpretation. In a similar way, when we read an oral history, but do not listen to it, we are missing out on emotions that may underpin the conversation.
Indeed, a social signal (or emotion) can be a complex constellation of behavioural cues. Studying social signal processing opens up the option of re-interpreting an interview by reflecting on the function of silence or tone, and on whether they occur as a generic or a specific feature of communication within a corpus/collection of interviews. Once again, the tool, Praat, has a high learning curve for those unfamiliar with speech technology and linguistics.
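One of the cues mentioned above, silence, is easy to operationalize once word-level timestamps are available. Praat itself analyses the audio signal; the sketch below instead assumes timestamps from a forced alignment (such as the T-Chain produces) and is purely illustrative, with invented data.

```python
def pauses(word_timings, min_gap=0.5):
    """Find silences between consecutive timed words.
    Each entry in `word_timings` is (word, start, end) in seconds;
    gaps longer than `min_gap` are returned as (pause_start,
    pause_end, duration) candidates for interpretation."""
    gaps = []
    for (_, _, prev_end), (_, start, _) in zip(word_timings, word_timings[1:]):
        gap = start - prev_end
        if gap >= min_gap:
            gaps.append((prev_end, start, round(gap, 2)))
    return gaps

timed = [("I", 0.0, 0.2), ("remember", 0.2, 0.8),
         ("the", 2.0, 2.1), ("camp", 2.1, 2.6)]
silences = pauses(timed)
```

Whether a 1.2-second pause before a painful memory is hesitation, reflection or mere breathing is, of course, exactly the interpretive question the tool leaves to the scholar.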
The introduction to disciplinary approaches and their analytic tools, plus the hands-on tool sessions, were welcomed by participants. While the oral historians and social scientists saw some possibilities in using linguistic features, both the limited functionality of the free easy-to-use tools and the complexity and jargon-laden nature of the dedicated downloadable (and sometimes technologically challenging) tools were seen as significant barriers to use, certainly in everyday research practice. This caused some frustration. Even in the process of selecting tools to be showcased and tested for the workshop, we, as organizers, encountered significant barriers. Many of the tools on the CLARIN site were not suitable for introduction due to their lack of information on: their state of development; the technical skills needed to download them; and even what they are for, explained in lay terms. We had to do a lot of work preparing an additional simplified ‘layer of information’ on top of the tools to make a hands-on workshop session possible.
This opens up a challenge for the CLARIN community for expanding the reach of the tools:
If we want CLARIN tools used by more disciplines, for example, those that work with oral history data, how can we dejargonise and break down some of these barriers to encourage new users? And, how can we present user-friendly tools that do not require a technologist to help install them?
If the opportunities for use of these tools by non-linguists can be better defined, then CLARIN tools can be both developed and explained to meet these more introductory needs. A new simplified ‘layer of information’ would be beneficial for tools. What does the tool do? What are its key features? What are the input requirements (XML etc.)? How does one access it, what are the technical requirements (Windows, Mac, Linux, versions of operating systems and browsers supported), where is the simple documentation, and in what state of development is it? (A software maturity approach might be useful here.) Once users become ‘converted’, they move into the realms of being regular users!
Why not offer short and inspiring use cases of oral history processed with CLARIN tools?
The final point to make concerns ‘data’. We need sources that are well-documented and have rich-enough metadata. We also need to put in place a legal framework for processing data, so that options for conditions are clearly stated, and documented, and a user will know what will happen to a source once it is uploaded (deleted and so on). We propose that the CLARIN and CESSDA legal groups could work towards a standard GDPR-compliant agreement for use of tools that work with (potentially) personal data.
We are delighted about the positive energy created during the workshop and note the value of the coming together of a multi-disciplinary team of workshop organisers, who had to step out of their own disciplinary comfort zones to design and run a workshop. This was not a quick process and it took months of meeting weekly to define and finalise this successful event. We really want to keep alive this momentum and rich dynamic for our Technology and Tools for Oral History initiative. We have a poster at the Bazaar at the forthcoming Pisa 2018 CLARIN Conference, and a number of meetings and training events will follow where a linguistic approach and language technology tools are introduced to social science, social history and oral history scholars. The workshop feedback was excellent, and we look forward to a further blog that uncovers user experiences and perceptions based on evaluation of the workshop.
Some final quotes from participants
On interdisciplinarity: “I have listened to some of the recordings, I have read your article, I know what your intention was, you have read my article, you have thought of what you might find interesting. Tell me what you would want, and then we can figure out together whether this makes sense. Let’s write an article about this interview and see how we can understand and try to embrace the legitimacy of each other’s approach, in terms of knowledge production.”
On appreciating new approaches: “I learned about tools that I didn’t know existed, that do things I didn’t know could be done, that answer questions that I hadn’t even thought about asking and that I had no awareness that I might be interested in.” Joel Morley
We would like to thank our workshop organiser and sessions lead colleagues for contributing to this blog text:
Arjan van Hessen, Norah Karrouche, Jeannine Beeken, Maureen Haaker, Max Broekhuisen and Christoph Draxler.
Moreover, we are very grateful to CLARIN ERIC for the opportunity to hold this workshop (and the previous ones).