The use of innovative technologies in speech-language pathology is revolutionizing diagnostic and treatment approaches for individuals with communication disorders. This evolution has required educators to integrate the use of technologies into the clinical training pedagogy. Phonetic transcription is a foundational skill presented early in the undergraduate speech-pathology curriculum and serves as the basis for advanced course work in clinical diagnostic decision-making. Mastery requires regular practice and performance feedback. One factor that impedes the provision of more practice opportunities is the widely agreed-upon problem of grading phonetic transcription assignments by hand. The development of a computational tool that automatically grades transcription assignments served as the mechanism for an integrated learning opportunity between the departments of Communication Disorders and Computer Science and Software Engineering at Auburn University.
The use of innovative technologies for clinical practice in speech-language pathology is revolutionizing practices for diagnosis and treatment of communication disorders across the lifespan. This evolution has also required educators to integrate technologies into the clinical training pedagogy. One such area is the teaching of phonetic transcription (Abel et al., 2016; Mompeán, Ashby, and Fraser, 2011; Sullivan and Czigler, 2002; Titterington and Bates, 2018; Vaissière, 2003; Verhoeven and Davey, 2007). Phonetic transcription allows speech-language pathologists (SLPs) to (1) create a visual representation of the status of speech production skills and (2) interpret the coded speech in order to make diagnostic decisions for individuals at risk for communication disorders.
Phonetic transcription is a foundational skill presented early in the undergraduate communication disorders curriculum (Howard and Heselwood, 2002; Randolph, 2015). Students of communication disorders must become experts in phonetic transcription, which involves capturing the sounds of speech in written form in order to create a transcript that represents how words were produced by an individual speaker (Knight, 2010). This written phonetic transcript is important for continued assessment and clinical diagnostics. However, phonetic transcription requires the ability to categorize speech sounds perceptually into phonemic categories and to write what was perceived using the International Phonetic Alphabet (IPA) coding system (Howard and Heselwood, 2002; Ladefoged, 1990). The IPA contains over 100 symbols representing consonants, vowels, diacritics, accents, and suprasegmentals. This is a substantial number of symbols to become familiar with, learn to identify, and use within a single course. As in other scientific disciplines such as chemistry and computer science, a universal code allows specialists in the field to standardize the documentation, analysis, and interpretation of what they observe, and just as the periodic table or Java code may at first seem to be a foreign language to novices, the IPA presents as a new language as well (Müller and Papakyritsis, 2011). Many students find this written code challenging, as it requires a cognitive shift from the standard written alphabetic code to a perceptual system that captures the contrastive distinctions between the sounds of a language (Knight, 2011). For example, although the words ‘coat’ and ‘king’ start with different letters in the standard written alphabet, there is no phonetic distinction between them, and so the IPA character is the same (‘k’).
Similarly, a single alphabetic character, such as the ‘s’ in ‘sing’ and ‘has,’ may be represented by different IPA characters (‘s’ and ‘z’, respectively). In some cases, such as the words ‘ball’ and ‘light,’ the IPA characters must be further notated with additional symbols (the velarized [ɫ] versus the plain /l/, respectively) that capture variation in how the same sound is produced in different positions in the mouth. This challenge is compounded as phonetic transcription tasks increase in complexity from individual sounds to full words and sentences, and advanced skills are required to transcribe using diacritics.
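The mismatch between alphabetic spelling and phonemic transcription described above can be made concrete with a small lookup table; the word-to-IPA mapping below is a hand-built illustration (broad General American transcriptions), not data from any APT-GT component.

```python
# Hypothetical word-to-IPA lookup illustrating the mismatch between
# alphabetic spelling and broad phonemic transcription.
IPA = {
    "coat": "koʊt",  # initial letter 'c' -> phoneme /k/
    "king": "kɪŋ",   # initial letter 'k' -> phoneme /k/ (same sound, different letter)
    "sing": "sɪŋ",   # initial 's' -> /s/
    "has":  "hæz",   # final 's' -> /z/ (same letter, different phoneme)
}

# Different initial letters, identical initial phoneme:
assert IPA["coat"][0] == IPA["king"][0] == "k"
# The same letter 's' maps to two different phonemes:
assert IPA["sing"][0] == "s" and IPA["has"][-1] == "z"
```

The two assertions restate the ‘coat’/‘king’ and ‘sing’/‘has’ examples from the text: the transcriber's unit is the perceived sound, not the written letter.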
Students who want to become speech pathologists typically receive one semester of instruction in phonetics; however, recent attention has been drawn to whether this provides students with enough opportunities for learning (Randolph, 2015). Recent evidence supports the idea that additional opportunities for practice may positively affect student success (Hillenbrand, 2014; Hillenbrand, Gayvert, and Clark, 2015). Conversely, “the less experience students have in conducting phonetic transcriptions, the less apt they are at becoming proficient in this skill” (Randolph, 2015, p. 1). Surveyed practicing clinicians have also expressed the need for additional practice opportunities as students and for meaningful opportunities to extend their training further as practitioners (Knight, Bandali, Woodhead, and Vansadia, 2018).
The Real-World Issue
When learning methods for the transcription of disordered speech, it is beneficial for students to receive regular feedback on their progress and to have opportunities to collaborate with peers to understand the flexibility of speech perception during the transcription process. One factor that limits the provision of such experiences is the widely agreed-upon problem of grading phonetic transcription assignments (Heselwood, 2007). Traditionally, phonetic symbols are taught sequentially in a face-to-face instruction model; students complete practice exercises on paper, and the assignments are graded later by hand. Students rarely receive immediate feedback on their transcriptions, since grading by hand is time intensive. Additionally, it is often difficult for an instructor to get a clear picture of the types of mistakes students are frequently making and to use that information to inform instruction in a timely way. The teaching of phonetic transcription therefore presents a unique pedagogical opportunity for enhancing student learning with the support of online learning platforms that could automate some of these processes (Titterington and Bates, 2018). The lack of an automated grading model for phonetic transcription assignments represents an important gap in existing teaching tools. To address this gap, faculty from the Auburn University Department of Communication Disorders proposed the development of a computational tool, the Automated Phonetic Transcription Grading Tool, to automatically compare students’ phonetic transcriptions of speech samples to their instructor’s transcriptions.
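The article does not specify how APT-GT scores a student transcription against the instructor's. As a minimal sketch of what automated comparison of two IPA symbol sequences could look like, the snippet below uses sequence similarity (via Python's standard-library `difflib`); the function name and the idea of a similarity-ratio grade are illustrative assumptions, not APT-GT's actual algorithm.

```python
from difflib import SequenceMatcher

def grade_transcription(student: str, instructor: str) -> float:
    """Hypothetical grader: compare two IPA symbol sequences and
    return their similarity ratio (1.0 = identical transcriptions)."""
    return SequenceMatcher(None, student, instructor).ratio()

# A perfect match scores 1.0; one substituted symbol lowers the score.
print(round(grade_transcription("kɪŋ", "kɪŋ"), 2))    # 1.0
print(round(grade_transcription("koʊt", "ɡoʊt"), 2))  # 0.75 (k -> ɡ substitution)
```

A real grader would likely need to segment diacritics and base symbols and weight different error types; the point here is only that once transcriptions are captured digitally, symbol-by-symbol comparison and immediate feedback become straightforward to automate.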
Operationalizing and automating the phonetic transcription grading process through the implementation of such a computational tool has many benefits, including (1) decreasing instructor time and effort in grading phonetic transcription accuracy, (2) reducing scoring bias, (3) facilitating learning by providing students with immediate feedback, (4) informing the teaching process by providing data on student performance, and (5) increasing engagement and dynamic learning. Also, the ability to visualize summative class results allows students to see differences between their transcriptions and those of their peers. This visualization can promote discussion about differences in human speech production and perception and replicate real clinical cases where clinicians have differences in perception and clinical decision-making.
Interdisciplinary Learning Model
The development of the Automated Phonetic Transcription Grading Tool (APT-GT) served as a mechanism for an integrated learning opportunity between the departments of Communication Disorders (CMDS) and Computer Science and Software Engineering (CSSE) at Auburn University. Faculty in the CMDS department challenged the CSSE department to create a user-friendly, aesthetically pleasing web-based interface for practice transcription assignments (Norman, 2002) and to implement an algorithm to automatically grade the assignments. An answer to this challenge was the integration of student learning in CSSE and CMDS to inform the design and implementation. This service-learning opportunity allowed students in a User Interface Design course, an upper-level undergraduate and graduate software engineering course, to connect engineering science with the public issue of effective and efficient identification of individuals with communication disorders.
To design the APT-GT, the CSSE team first gathered requirements from the subject matter experts in the field (the CMDS team), then crafted user scenarios for the Student User, Teacher User, and Admin User of the system. The scenarios were modeled in the Unified Modeling Language (UML) to provide a pictorial description of the system, cataloging roles, actors and their relationships, system interactions, and flow (Booch, Rumbaugh, and Jacobson, 2005; Rumbaugh, Booch, and Jacobson, 1998). Operation logic was codified through simplified class diagrams that informed the design and describe the structure for the users of the system, as illustrated in Figure 1 (Sparks, 2001).
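To illustrate how class diagrams of this kind translate into code, the sketch below models the three user roles as a small class hierarchy. All class and member names are hypothetical stand-ins, not APT-GT's actual design.

```python
from dataclasses import dataclass, field

# Hypothetical classes echoing the Student/Teacher/Admin roles that the
# simplified UML class diagrams describe; names are illustrative only.
@dataclass
class User:
    username: str

@dataclass
class Student(User):
    submissions: list = field(default_factory=list)  # transcriptions submitted

    def submit(self, transcription: str) -> None:
        self.submissions.append(transcription)

@dataclass
class Teacher(User):
    assignments: list = field(default_factory=list)  # assignments authored

    def create_assignment(self, prompt: str) -> None:
        self.assignments.append(prompt)

@dataclass
class Admin(User):
    """Manages accounts and system configuration."""

# Each role inherits shared identity from User and adds its own behavior.
s = Student("jdoe")
s.submit("kɪŋ")
```

Capturing the roles this way mirrors the diagrams' purpose: a shared base for common attributes, with each actor's responsibilities attached to its own class.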
Once the system scenarios were captured, the software requirements created, and the implementation language and environment identified, the software development team began iteratively developing software to instantiate the system. The initial development began with low-fidelity drawings (i.e., paper prototypes) of our vision of the system and quick wire-frames of the envisioned system (Bailey, 1982; Shneiderman and Plaisant, 2010). In the second stage of prototyping, these images were refined to make them more detailed and to improve aesthetic appeal (Norman, 2002).
One special requirement of the system was the design of the IPA keyboard. Many of the other features developed in the APT-GT system are available in existing course management systems, but the interactive IPA keyboard is unique. Students are typically required to complete assignments by hand, download special fonts, or copy and paste symbols from websites (Small, 2005, pp. 4–5). Students who are initially learning IPA may be additionally encumbered by the need to search for symbols in texts or online. In the design process, key placement and size were considered to reduce the time spent searching for keys. Multiple versions of the keyboard were implemented to engage students in basic American English broad transcription (“Keyboard 1”), advanced narrow transcription of disordered speech using diacritics (“Keyboard 2”), and a complete set for full IPA implementation for international and multilingual use (“Keyboard 3”). Scaffolding the keyboard complexity was intended to reduce confusion for the novice user and build confidence in the task incrementally.
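The scaffolded keyboard tiers can be sketched as nested symbol inventories, where each tier strictly extends the previous one. The symbol sets below are small illustrative subsets chosen for the example; they are not APT-GT's actual layouts.

```python
# Hypothetical sketch of the three scaffolded keyboard tiers.
# Symbol inventories are illustrative subsets, not APT-GT's real layouts.
BROAD = ["p", "b", "t", "d", "k", "ɡ", "ɪ", "æ", "ʊ"]      # Keyboard 1: broad American English
DIACRITICS = ["ʰ", "\u0303", "\u0325"]                     # aspiration, nasalization, voicelessness
FULL_IPA_EXTRAS = ["ǃ", "ʕ", "ɮ"]                          # symbols for international/multilingual use

KEYBOARDS = {
    "Keyboard 1": BROAD,                                   # broad transcription
    "Keyboard 2": BROAD + DIACRITICS,                      # narrow transcription of disordered speech
    "Keyboard 3": BROAD + DIACRITICS + FULL_IPA_EXTRAS,    # complete IPA set
}

# Each tier extends the previous one, scaffolding complexity for novices.
assert set(KEYBOARDS["Keyboard 1"]) <= set(KEYBOARDS["Keyboard 2"]) <= set(KEYBOARDS["Keyboard 3"])
```

Representing the tiers as supersets of one another makes the scaffolding explicit: a novice sees only the broad-transcription keys, and each later keyboard adds symbols without rearranging what was already learned.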
Outcomes of the Integrated Learning Model
Implementation of the software tool was supported by the first and third authors’ articulation and motor speech disorders courses in CMDS. CMDS students collaborated through the participatory design process (Bailey, 1982; Shneiderman and Plaisant, 2010) to aid in the development of the first version of APT-GT. Students (n = 67) in undergraduate and graduate course work served as beta testers, providing ease-of-use feedback to the student-led design team. Student feedback was used to refine the software to meet identified instructional needs. The students were surveyed at the beginning and end of the semesters to determine whether the applied computer-supported learning environment with automated performance feedback increased confidence in their mastery of transcription when given additional practice. Students were asked the following: What is your greatest concern in transcribing disordered speech? What do you think you need to learn to be a more confident transcriber? If your level of confidence is different now compared to the beginning of the course, what aspects of the training modules do you think affected your level of confidence? What components of the transcription modules seemed helpful to you in learning phonetic transcription? The data were analyzed qualitatively to understand student sentiment following the transcription practice modules. Open-ended responses were collapsed into themes independently by two research analysts, and the themes were further collapsed into broad categories agreed upon by the two researchers.
Students’ greatest areas of concern in transcribing disordered speech were their ability to understand disordered speech (38%), to transcribe accurately (39%), to transcribe speech sounds (20%), to transcribe quickly (1%), and their general lack of experience (1%). To become more confident transcribers, students expressed the need for greater knowledge of the phonetic symbols (39%) and additional opportunities for practice (35%). Levels of confidence were reported to have increased as a result of additional practice opportunities (32%), the variety of speech samples, which included talkers with different disorders (31%), automated feedback (13%), and comparison of peer results (13%). Others commented on the ease of use of the keyboard and the frequent opportunities for practice. When asked which components of the transcription modules were most helpful, students rank-ordered the following items (one being the highest): (1) access to real clinical speech samples, (2) the ability to compare transcriptions with those of classmates, and (3) obtaining automated transcription feedback (see Figure 4). Six students indicated that they did not think the transcription modules increased their confidence, and one student did not feel that they benefited from the modules.
This User Interface Design course helped CSSE students integrate the theory of user interface design by engaging in a practical software development project built around a fully elaborated real-world case study. This course model typically gives students a solid understanding of the user interface design process (Wolf, 2012; Holtzblatt and Beyer, 2014; Caristix, 2010). The current learning episode included the following components: gathering of requirements, task analysis, development, testing, and a project presentation of findings from preliminary user evaluations of user satisfaction and system effectiveness. It also gave students real-world experience in teamwork, as they collaborated in teams of four to eight, along with additional practice in important programming skills.
Through this collaborative and multifaceted effort, we aimed to create a rich learning experience for students in both departments to increase the efficiency of CMDS and CSSE instruction. Students in both classes had opportunities that increased engagement and interaction with science-based applied methodologies for addressing current public health issues. This marriage of computer software engineering and communication disorders learning objectives met two major goals: (1) to provide increased student engagement and (2) to increase applied science by addressing real-world problems. Instructors were able to close the theory-to-practice gap in two different disciplines through interdisciplinary collaboration.
We are currently working on making the learning management system more widely available to allow for testing by faculty at other institutions, particularly within the CSD profession, but also by teachers of linguistics and foreign languages and teachers of English to speakers of other languages. We also aim for further development and refinement to improve the user interaction experience and to improve technical support for usage with other languages.
About the Authors
Dr. Marisha Speights Atkins is an assistant professor at Auburn University and Director of the Technologies for Speech-Language Research Lab. Her work focuses on the development of innovative technologies for diagnosis and treatment of speech disorders. Her research interests include child speech production and disorders, acoustic-based technologies for assessment and treatment of speech disorders, speech intelligibility, and remote assessment of speech disorders through telepractice.
Dr. Cheryl Seals is an associate professor in Auburn University’s Department of Computer Science and Software Engineering. Dr. Seals directs the Auburn University Computer Human Interaction Lab, which develops computing applications to improve the usability of products for many different populations (4-H, K-12 Teacher Education, introductory computer programming, and mathematics education and reinforcement applications). Lab efforts include development of educational applications to support advanced personalized learning tools and testing applications to determine instructional potential and design usability for a population, with the goal of universal usability.
Dr. Dallin Bailey’s clinical research efforts primarily involve using linguistic tools to enhance treatment outcomes and patient satisfaction for aphasia and apraxia of speech treatments. His research focuses on the development and testing of treatments and treatment outcome measures for aphasia and apraxia of speech, kinematic measurement of speech motor learning, abstract word processing, verb processing, and single-subject research design.
Abel, J., Bliss, H., Gick, B., Noguchi, M., Schellenberg, M., & Yamane, N. (2016). Comparing instructional reinforcements in phonetics pedagogy. In Proc. ISAPh 2016 International Symposium on Applied Phonetics (pp. 52–55).
Bailey, R. W. (1982). Human performance engineering: A guide for system designers. Englewood Cliffs, NJ: Prentice Hall.
Booch, G., Rumbaugh, J., & Jacobson, I. (2005). The unified modeling language user guide (2nd ed.). Boston: Addison-Wesley.
Caristix. (2010). 8 Stages in an HL7 Interface Lifecycle. Retrieved from http://caristix.com/blog/2010/10/8-stages-in-an-hl7-interface-lifecycle/
Heselwood, B. (2007). Teaching and assessing phonetic transcription: A Roundtable discussion. Centre for Languages Linguistics & Area Studies. Retrieved from https://www.llas.ac.uk/resources/gpg/2871.html
Hillenbrand, J. (2014). Phonetics exercises using the Alvin experiment-control software. The Journal of the Acoustical Society of America, 135(4), 2196–2196.
Hillenbrand, J. M., Gayvert, R. T., & Clark, M. J. (2015). Phonetics exercises using the Alvin experiment-control software. Journal of Speech, Language, and Hearing Research, 58(2), 171–184.
Holtzblatt, K., & Beyer, H. R. (2014). Contextual design. In The Encyclopedia of human-computer interaction (2nd ed.), (#8). The Interaction Design Foundation. Retrieved from https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/contextual-design
Howard, S. J., & Heselwood, B. C. (2002). Learning and teaching phonetic transcription for clinical purposes. Clinical Linguistics & Phonetics, 16(5), 371–401.
Knight, R. A. (2010). Sounds for study: Speech and language therapy students’ use and perception of exercise podcasts for phonetics. International Journal of Teaching and Learning in Higher Education, 22(3), 269–276.
Knight, R. A. (2011). Towards a cognitive model of phonetic transcription. Phonetics Teaching and Learning Conference.
Knight, R. A., Bandali, C., Woodhead, C., & Vansadia, P. (2018). Clinicians’ views of the training, use and maintenance of phonetic transcription in speech and language therapy. International Journal of Language & Communication Disorders, 53(4), 776–787.
Ladefoged, P. (1990). The revised international phonetic alphabet. Language, 66(3), 550–552.
Mompeán, J. A., Ashby, M., & Fraser, H. (2011). Phonetics teaching and learning: An overview of recent trends and directions. In Proceedings of the 17th International Congress of Phonetic Sciences (Vol. 1, pp. 96–99).
Müller, N., & Papakyritsis, I. (2011). Segments, letters and gestures: Thoughts on doing and teaching phonetics and transcription. Clinical Linguistics & Phonetics, 25(11–12), 949–955.
Norman, D. A. (2002). Emotion and design: Attractive things work better. Interactions Magazine, 9(4), 36–42. Retrieved from web.jnd.org/dn.mss/emotion_design_attractive_things_work_better.html
Randolph, C. (2015). The “State” of phonetic transcription in the field of communication sciences and disorders. Journal of Phonetics and Audiology, 1, e102.
Rumbaugh, J., Booch, G., & Jacobson, I. (1998). The unified modeling language user guide. Addison-Wesley. https://pdfs.semanticscholar.org/fc51/1dcebd3dae76133d5dbbda4250bebd0fb5e3.pdf
Shneiderman, B., & Plaisant, C. (2010). Designing the user interface: Strategies for effective human-computer interaction, (5th ed.). Reading, MA: Addison-Wesley Publ. Co.
Small, L. H. (2005). Fundamentals of phonetics: A practical guide for students. Boston: Pearson/Allyn and Bacon.
Sparks, G. (2001). Database modelling in UML. Methods & Tools, 9(1), 10–23.
Sullivan, K., & Czigler, P. (2002). Maximising the educational affordances of a technology supported learning environment for introductory undergraduate phonetics. British Journal of Educational Technology, 33(3), 333–343.
Titterington, J., & Bates, S. (2018). Practice makes perfect? The pedagogic value of online independent phonetic transcription practice for speech and language therapy students. Clinical Linguistics & Phonetics, 32(3), 249–266.
Vaissière, J. (2003). New tools for teaching phonetics. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona. Retrieved from https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2003/papers/p15_0309.pdf
Verhoeven, J., & Davey, R. (2007). A multimedia approach to ear training and IPA transcription. In Proceedings of the Phonetics Teaching and Learning Conference (pp. 1–4). London: University College London.
Wolf, L. (2012). 6 Tips for designing an optimal user interface for your digital event. INXPO. Retrieved from https://archive.is/20130616121623/http://web.inxpo.com/casting-calls/bid/105506/6-Tips-for-Designing-an-Optimal-User-Interface-for-Your-Digital-Event