
IPRS Meetings & Diamesic Conference of the 54th Congress

54th Intersteno Congress, July 13th – 18th, 2024

2024 IPRS group at Silesian Parliament

Live subtitling done by:

Live Captioning UK Ltd
Juli LaBadia, RDR, CRR

Diamesic 7 Conference – Monday, July 15th, 2024

Session on shorthand and research

Shorthand in journalism: a dying art?

Andrew Hill – Journalist at Financial Times

Andrew Hill is a senior business writer at the Financial Times. He has been using shorthand daily since the beginning of his journalistic career. He proposes reasons to be both optimistic and pessimistic about the future of shorthand in journalism.

The National Council for the Training of Journalists (NCTJ) considers shorthand a fundamental skill for journalists. Shorthand was a compulsory part of NCTJ exams but became optional in 2016, in recognition of the many other skills, from data journalism to podcasting, that trainee reporters might find more useful to acquire. Interest in mastering the skill is declining, given the time it takes before shorthand writing becomes useful: speedy, accurate, or even readable.

In his articles, Hill won’t put into quotation marks anything that he hasn’t heard and recorded in his shorthand notes or recordings, or he will use indirect speech, since making up quotations is journalistic malpractice. Legal requirements are helping to keep the art of shorthand alive. Judges consider steno notes evidence in libel and defamation cases. Shorthand has another important function in the courtroom. Since it is forbidden to take photographs, film, or make sound recordings in court buildings, stenography remains indispensable for reporters covering court cases.

With the help of artificial intelligence and the combination of voice recording, voice recognition, and automated transcription, technology is becoming more accurate and efficient. For Hill, identifying key quotes from a short interview is still easier from notes than from a long transcript of a recording. The level of concentration required to bring the spoken word via shorthand to the notebook page helps, according to Hill, “to cement the key points of any interview”. For Hill, shorthand still plays a vital role in the construction of his articles.

The Presidency minutes services in Turkey

Kadriye Aktay – Head of the Turkish Parliamentary Reporting Office

Official minutes have been made since the Ottoman Empire, continuing with the founding of the Republic of Türkiye in 1923 into the present. The minutes service’s history thus reaches back some 150 years, and it is a source of official information for researchers and reporters.

Currently, the minutes service employs around 72 stenographers, 27 expert stenographers, and 2 stenographers assigned to committee reports. In general, each stenographer works for 8 minutes during a session, taking their turn in the last 2 minutes. They transcribe their turns with the help of their steno notes and the audio recording. Expert stenographers check and validate these turns in chunks of 20 minutes. The guidelines used are the official Turkish spelling and punctuation rules. After a last check, the reports are sent to the different speakers and published on the internet.

In recent years, technological developments such as audio recording, computers, and artificial intelligence have been taken into account. For instance, a project was started to create a single platform, a Minutes Information System, including a Minutes Management System. This platform covers all work: producing, correcting, publishing, and printing reports. A speech recognition system created with the help of the scientific institution TÜBITAK has been in use for a year now. It also includes speaker recognition, which prevents confusion and enables stenographers to work efficiently.

A Stenographer Information System helps with the recruitment and training of new stenographers. Stenographer training can take up to 3 years. Recruits can only apply when under 30, because “the active and dynamic Turkish Parliament needs active and dynamic stenographers”.

Shorthand for the first nations and other indigenous peoples

Boris Neubauer – Professor at FH Aachen

Shorthand started in ancient Greece and Rome and was later re-invented and further developed in many countries all over the world. Often, a national shorthand system was created. In general, no independent systems were invented for the languages of remote countries, minority languages, and languages of indigenous peoples, but there are some exceptions.

Of course, indigenous peoples also developed their own, sometimes very complex, writing systems. To simplify these writing systems, the first attempt was to use a Latin-based orthography. When this didn’t work out, shorthand inventors traveling abroad adapted systems like Pitman or Gabelsberger to the language of the host country, such as Maori, Dinka, or Cherokee.

Neubauer collected examples of these shorthand-inspired writing systems, for instance the system of James Evans. Evans, a missionary, was familiar with the Pitman shorthand system and invented a syllabary for the Ojibwe language in 1840. A syllabary is a set of written characters representing syllables, serving the purpose of an alphabet. 20 years later, Evans also adapted this syllabary to the Cree language, so successfully that the whole Cree community became literate. Other missionaries later modified his system for the languages of the Blackfoot, Dene, Inuktitut, Naskapi, and other indigenous nations.

Text alternatives to video recordings in European parliaments

Eero Voutilainen and Riikka Kuronen – Reporters at the Finnish Parliament

The European Union requires public organizations to offer a text alternative to all audio and video on websites and apps (web accessibility directive (2016/2102)). Eero Voutilainen and Riikka Kuronen researched the question: how have parliaments in Europe implemented this directive?

Their conclusions:

  • European parliaments offer several types of text alternatives, the most common being the verbatim report of plenary debates. In addition, subtitles, live captioning and sometimes summaries or press notes are offered as a text alternative.
  • Various techniques are used to offer text alternatives: shorthand, typing with a normal keyboard, automatic speech recognition, respeaking, and automatic subtitling.
  • Automated captions are a rising trend. 3 out of 21 parliaments use ASR (automatic speech recognition) regularly or occasionally. Other parliaments carry out tests for different applications. Quality is a source of concern. Some parliaments solve this by showing a disclaimer.
  • There are differences in organization, schedules, and criteria. In general, parliamentary staff produces verbatim reports. Freelancers are often hired to provide other text alternatives, such as subtitles.
  • There is usually not much feedback from the public on the given text alternatives. The most common criticism concerns the lack of subtitles during a debate. Too much summarizing or editing in live subtitles can also be a source of criticism.

Visual attention during speech-to-text interpreting: an eye-tracking study

Julia Matzenberger – Graduate at the University of Vienna

Julia Matzenberger gave a presentation on her master’s thesis, an eye-tracking study of speech-to-text interpreting. In daily life, she is a certified speech-to-text interpreter herself. In her thesis, Matzenberger examines where the gaze focuses during speech-to-text interpreting with the speech recognition method. Eye tracking can help reveal the cognitive load and attention patterns involved in real-time processing. Interpreters need to continuously monitor and adjust the textual output.

Eye-tracking studies can provide valuable insights into the challenges of speech-to-text interpreting. How does gaze direction change during monitoring? What decision-making processes arise? Matzenberger used the speech recognition method because there are always areas you need to be aware of, especially when you want to say a specific word but a completely different word appears out of context, causing a stressful situation. What are the eye movements, and which decisions can be made in such a short time?

The thesis is based on one participant, who had to do a simulated on-site interpreting assignment, intralingually, into German via speech recognition, using Dragon NaturallySpeaking software. The eye-tracking model was the EyeLink Portable Duo. The participant followed a presentation of the Linguistics degree program at the University of Hamburg. She didn’t get any preparation material in advance, because the examiner wanted to cause typical stressful situations. The participant had her device with her output text and she had to follow the speaker. The third element was the PowerPoint presentation used by the speaker.

The key finding of the study is that the participant’s eyes focused on the text-output screen for about half of the time of the presentation. One-third of the time, the gaze was directed to the PowerPoint presentation. Matzenberger showed the audience a very interesting eye-tracking example of the participant trying to change the term “in prospektiven” into the word “introspektiv”. The gazing behavior consisted of 19 glances in only 5 seconds to get the correct output.

Chinese stenography training – two focuses: court reporting and coaching technology

Zhao Weike – Vice President of Jiangxi Justice Police Vocational College, and He Danqing – Associate professor at Mianyang Polytechnic

The Jiangxi Justice Police Vocational College offers 21 majors in the fields of justice, police, and law. Judicial secretaries, who function as assistants to judges and lawyers, receive their training there. They play an important auxiliary role in judicial work, and function as court clerks, litigators, legal translators, and technicians.

Court clerks are responsible for recording relevant information during the trial process and drafting court documents and rulings, to ensure the completeness and accuracy of case material. Knowledge of shorthand is essential in this process. About 38% of judges and prosecutors do not have clerks; there is a shortage of about 80,000 clerks.

The college put great effort into determining which skills judicial talents need to cultivate. The school set training objectives and specifications. The college based its renewed curriculum for stenography training on scientific insights, talent demand analysis, and the needs of institutions and enterprises in the legal field. The college constructed teaching models, developed teaching resources, and created practical teaching scenarios.

Through research and practice on improving the stenography skills of judicial secretaries, the college has achieved significant results. The stenographic module the school offers is regarded as an innovative model for colleges and universities in the Jiangxi province. In the national vocational college stenography competition, their students have won more than 16 medals. More than 200 students have obtained certificates from the National Ministry of Education.

The enactment of the Stenographer Act in Korea

Cho, Jung-koo – President of the Korean Stenographers Association

The Korean Stenographers Association (KSA) was founded in 1955 and represents Korea internationally. It joined Intersteno in 1983. The KSA works to improve the rights and interests of shorthand practitioners and to develop the shorthand field. Strong institutional support is needed for the education and training of stenographers.

Currently, the demand for stenographic services is increasing in Korea, but there are concerns that the market will shrink, because of new developments in the field of voice recognition (VR) systems equipped with AI. Although they can interpret and translate, such systems cannot record non-verbal parts. Therefore, it is indispensable that a stenographer is in charge of tasks such as correction, inspection, management, and certification.

The KSA needs a strong status to be able to represent the stenographers in the new environment, so it is necessary to enact a separate law to convert the current KSA into the Korean Stenographer Society. This Society should improve the quality of stenographic services through research and development and through expanding exchanges with other countries.

A huge financial investment is required to develop a response to the new environment of stenographic services. For this reason, the enactment of the Stenographer Act in Korea is essential. To ensure a smooth passing of this law, the Korean stenographers need the active support and encouragement of Intersteno!


Diamesic 7 Conference – Tuesday, July 16th, 2024

Session on workflows and accessibility

Taking stock of AI in 2024: suggestions and facts

Henk-Jan Eras and Deru Schelhaas, Parliamentary Reporting Office of the House of Representatives in the Netherlands

Deepfake pictures and videos have real consequences and form a real threat. However, alarming media reports of an information apocalypse of disinformation in 2024, with some 140 elections around the world, do not ring true. There is certainly misinformation and false information, but the impact of political deepfakes is not disruptive. A 2024 study found that 98% of deepfake videos online are not political but pornographic. Deepfakes also have two major problems: they are of poor quality and largely implausible.

The European Union’s 2024 AI Act classifies deepfakes as limited or low risk. Deepfakes are thus not banned. Makers, however, must mark their deepfakes as such, which, not surprisingly, they do not. The Edelman Trust Barometer shows that as AI innovation speeds up, trust in AI companies declines, with even tech employees being skeptical.

Using an official dispensation from the AI ban for civil servants in the Netherlands, Deru Schelhaas has been experimenting with ChatGPT 3.5. The goal was to establish whether AI could be of any help with the production of so-called short web reports. These are labor-intensive short summaries of sessions in newspaper style and require special training. In the experiment, ChatGPT was fed the official report with prompts like “summarize this text in 150 words” and was then asked to compare its own result with the human-made summary. ChatGPT assessed the AI summary as more detailed, complete, and formal, and the human-made summary as more brief, concise, emotional, and dramatic.

Later in 2024, the PRO will, under the aforementioned dispensation, start new experiments with the OpenAI tool Whisper. Whisper has major advantages: high recognition accuracy, removal of repeated and filler words, sentence correction, fast delivery, and a cheap subscription. Additionally, an ECPRD request shows that a quarter of the parliaments using AI for parliamentary reporting mention using Whisper.

Harnessing Whisper: a user-driven approach to AI-supported parliamentary reporting

Dan Kerr – Manager at British Columbia’s Parliament

Dan Kerr works as a manager of publishing systems at the Hansard Office of British Columbia, Canada. He gave a presentation on the support of AI in parliamentary reporting. Previously, editors received closed captioning as a typing supplement. Although it functioned quite well, reporters found captioning less suitable, because it can summarize, skip segments, or leave blank spaces.

OpenAI released Whisper, its automatic speech recognition (ASR) system, at the end of November 2022. The Hansard Office had to admit that it was significantly better than what they had been working with. Whisper gets the best results, with a word error rate of 8.81%. Only computer-assisted human reporting is better (7.61%), as Kerr showed during his presentation.
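
The word error rates Kerr cited can be made concrete: WER is the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the ASR output, divided by the number of reference words. A minimal illustrative sketch in Python, not the Hansard Office’s actual evaluation code:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance, computed over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, against the reference "the house will come to order", the hypothesis "the horse will come order" contains one substitution and one deletion, giving a WER of 2/6, roughly 33%.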

The Hansard Office wanted to figure out how to make Whisper part of their workflow. A critical point for Kerr was that Whisper was released as an open-source system. Some of the important objectives were to make a low-cost system that would provide a high-quality supplement for employees, which would reduce strain on staff, both physical and mental. Further, the system should be under the control of Kerr and his staff, combined with an ethical, user-driven implementation.

Eight months ago Kerr and his staff replaced their vendor-supplied ASR system with an internally-developed implementation of Whisper, called Parrot. The system was extremely well received by the team of reporters. Kerr: “Even our most senior editors, with a very long career with the official report, have embraced it and told me they would find it very difficult to deal without.” At the end of the presentation, Kerr demonstrated Parrot reporting a short piece of debate.

Generative AI, video reports and podcasts: how to offer debates in accessible language levels

Anneke Faaij-Nulle – Team manager at the Dutch Parliamentary Reporting Office

The Parliamentary Reporting Office of the Dutch Parliament aims at making the debates of the House of Representatives transparent. Traditionally, this department transcribes reports of the meetings, but to make the debate accessible to all citizens, other activities have been taken up. For example, the PRO makes it possible to watch videos of debates on the Debate Direct platform. Also, PRO-reporters write summary reports (which are published on the House of Representatives website) and provide (live) subtitles to the videos of debates.

The next step in accessibility could be to address the language level that is often used in debates. 70% of the Dutch population do not understand what is being said in parliamentary debates, because the language level of the debaters is too complicated for them.

How can a department that wants to make the debate transparent for all citizens deal with this? Technologically, there are already AI possibilities to display complex text at an accessible language level. However, is it legally allowed and safe to use generative AI on parliamentary data?

To tackle the issue of complex language, the PRO has done some pilots in video and audio. PRO reporters have been practicing making short video reports and podcasts with easily accessible language. Preliminary research among users of the video report shows that this product is appreciated. This is a good motivation to further explore possibilities in making debates accessible.

Access for people with intellectual disabilities to the deliberative process – the iDEM project

Carlo Eugeni – Associate professor at the University of Leeds

iDEM is a Horizon-UKRI program funded by the EU and the UK. This project aims to create access to the deliberative process in a democracy through easy language for people with reading difficulties. Easy language is simplified written or spoken information that is easy to understand. In the European Union, different language levels are distinguished: very easy language is called A1, easier language A2, and easy language B1.

It is important to understand the linguistic barriers and to implement NLP and AI tools that simplify difficult texts for people with reading difficulties, empowering them to participate in democracy. People can have reading difficulties as a result of dyslexia, intellectual disabilities, cognitive disabilities, temporary difficulties, or contextual difficulties. The latter includes, for instance, foreigners, people with low literacy, or people simply not understanding a given topic.

At the moment, iDEM is doing a pilot with an app, which converts difficult language to texts of level B1, A2, or A1. The pilot of iDEM contains the languages Catalan, Italian, and Spanish and is tested in Bologna, Barcelona, and Madrid in different deliberative processes by people with reading and/or writing difficulties, cognitive disabilities, and migrants. Of course, the program considers ethical principles and data protection restrictions, and it will also evaluate the solutions with the users.

A multi-media portal for meetings of Japanese local assemblies based on ASR technology

Tatsuya Kawahara – Professor at Kyoto University

In Japan, the National Parliament provides video streaming and an archive for all meetings. Official transcripts have been produced with the help of ASR (automatic speech recognition) since 2011. ASR performance improved significantly with the introduction of deep learning: the word error rate has dropped from 10% to 5%. However, in Parliament the video and text are not aligned, so no captions are provided with the video.

The multi-media portal for local assemblies does provide subtitles. There are about 1,000 local assemblies in Japan, such as prefectures and city councils. About 350 have their own YouTube channels, and this number is growing, since YouTube is the most economical option. Text and video can be aligned with the help of ASR: a recording of the speech is fed to the ASR system, and the output is post-edited by reporters to make an official transcript.

The ASR transcript provides a time stamp for every word. This makes it possible to align text and video, in other words: to provide captions. The system also allows for keyword search, video browsing, and socio-political analysis of issues in local regions. The ASR transcript can be used before the official transcript is available. Professor Kawahara and his team set up a portal site for local assemblies based on the aforementioned technologies. This website collects the URLs of the YouTube channels of local assemblies, to generate automated captions.
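
The step from per-word timestamps to captions can be sketched as follows: group consecutive words into cues and emit them in a subtitle format such as SRT. The `Word` record and the fixed words-per-cue grouping are illustrative assumptions, not Professor Kawahara’s actual system:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # start time in seconds, as reported by the ASR system
    end: float    # end time in seconds

def to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group word-level ASR timestamps into numbered SRT caption cues."""
    def ts(t: float) -> str:
        # SRT timestamps look like HH:MM:SS,mmm
        ms = round(t * 1000)
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    cues = []
    for n, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        cues.append(f"{n}\n{ts(chunk[0].start)} --> {ts(chunk[-1].end)}\n"
                    + " ".join(w.text for w in chunk))
    return "\n\n".join(cues)
```

A real pipeline would also break cues at sentence boundaries and cap line length, but the core idea, that word timestamps make text-video alignment mechanical, is the same.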

Transcribing dialects remains a challenge for both humans and artificial intelligence. There are many variations of standard Japanese at the lexical level. What should be done when the system doesn’t recognize dialect? The guidelines on how to handle this can differ between local assemblies and the National Parliament.

Verbatim vs. Sensatim live parliamentary subtitling and live editing

Alice Pagano – Post-doc researcher at the University of Trieste

Pagano presented her research on live-subtitled parliamentary sessions, more specifically a pilot of one session of the legislative body of the Rome City Council. In general, all sessions are made accessible to the deaf and hard of hearing through sign language interpreting and live subtitling, the latter using a platform from the supplier PerVoice/Almawave. Two modes of live subtitling are used: respeaking and ASR. Both modes allow for live editing; live editors add punctuation, correct mistakes, and so on. In the pilot, the use of both modes was quite balanced. Generally speaking, the respeaking mode is considered more sensatim and the ASR mode more verbatim.

Respoken subtitles are more edited and more condensed. The deaf and hard of hearing community sometimes considers this a form of censorship, because pieces of information are lost. For instance, orality features, self-corrections, mumbling, false starts, hesitations, and parenthetical elements can be left out. This makes the information more accessible, for instance by showing strategically condensed subtitles, i.e. one line instead of two.

ASR produces verbatim subtitles that work well but do not filter self-corrections, mumbling, et cetera. This technique also has problems recognizing overlapping voices or crosstalk. It therefore needs editing; otherwise it just produces a flow of words without punctuation, an audio transcript of the original speech, warts and all.

In general, ASR-produced texts can be characterized as verbatim: complex syntax left unvaried, repetitions, lack of punctuation, typos, and small word additions. The respoken texts are of a different character: sensatim, with shorter sentences, less hypotaxis, no useless orality features, and fewer typos.
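
One small step of the editing that turns a verbatim ASR flow toward sensatim output, removing fillers and immediate word repetitions, can be sketched as below. The filler list and rules are illustrative assumptions, not the logic of the PerVoice/Almawave platform; a real live editor applies judgment a word list cannot capture:

```python
import re

# Toy list of fillers and orality features to strip (illustrative only).
FILLERS = {"uh", "um", "er", "you know", "i mean"}

def condense(verbatim: str) -> str:
    """Remove simple fillers and immediate word repetitions from raw ASR text."""
    text = verbatim
    # Drop multi-word fillers first so "you know" is not split by later passes.
    for f in sorted(FILLERS, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(f)}\b,?\s*", "", text, flags=re.IGNORECASE)
    # Collapse immediate repetitions ("the the" -> "the").
    tokens = text.split()
    kept = [t for i, t in enumerate(tokens)
            if i == 0 or t.lower() != tokens[i - 1].lower()]
    return " ".join(kept)
```

For example, `condense("uh the the motion is is you know carried")` yields "the motion is carried", shorter and easier to read, but, as the deaf community’s criticism above notes, the hesitations themselves are information that is lost.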

The deaf and hearing impaired, the end users, sometimes prefer verbatim over sensatim subtitles. After all, a user cannot be 100% sure what is being said if some information is left out. Producing good subtitles is always a matter of balance: the message of a speaker has to be conveyed and must not change when translated into subtitles.

1887 - 2024 All Rights Reserved. Intersteno - International Federation for Information and Communication Processing