RELIABILITY STUDY OF LANGUAGE SAMPLES USING SOFTWARE TO SUPPORT THE TRANSCRIPTION OF DIGITIZED SPEECH AAC SYSTEMS.

Pushpa Ramachandran and Katya Hill
Edinboro University of Pennsylvania

ABSTRACT
This study was part of the design and development phase for a digitized speech language activity monitor (DLAM). This task looked at the reliability of converting the WAVfile logfiles into text using software developed specifically for the DLAM to facilitate the transcription process. Twelve transcribers were assigned to correctly upload, transcribe, code, and analyze DLAM logfiles. Transcribers were required to achieve intra-rater reliability of 95% before handling the research data. Inter-rater reliability for utterance segmentation, word-by-word agreement, and the identification of pre-stored messages was determined. Reliability results ranged from 93% to 100% for the identified summary measures.

BACKGROUND
Principles of evidence-based practice and outcomes measurement emphasize the need to use instrumentation to collect data to support clinical decision-making and evaluate intervention services. With the shift toward accountability, developing tools to support data collection as well as integrating them across service delivery sectors and geographic borders would have a tremendous impact on outcomes measurement (1). The development of automated logfiles (2) and the AAC language activity monitor (LAM) (3) for clinical use has made available quantitative data on which to base intervention decisions. The LAM has made possible various analyses of data that can prove useful to clinicians as well as to people who rely on AAC. Analysis of AAC language samples collected using LAM has provided information similar to that available from samples of speaking individuals. Research using LAM logfiles has reported inter-rater reliability of the transcription process for utterance segmentation and word-by-word agreement at 96% and 100% respectively (4).

The majority of AAC devices in use today are digitized speech systems that do not have serial output capabilities. The DLAM records the speech output of these devices. The information, which includes time-stamps, can be uploaded to a computer and later analyzed for data to support AAC evidence-based practice.

RESEARCH QUESTION
This study was designed to determine the reliability of the transcription of digitized language samples using the DLAM Term PC software. Before logfiles can be analyzed and used as data to report summary measures, they must be edited and coded as transcripts. Logfiles from digitized speech AAC devices must first be converted from WAVfiles into text as part of the transcription process. The reliability of this transcription process had not previously been evaluated.

METHOD
Language samples were generated on a digitized speech AAC system. Ten utterance test lists were generated and these lists comprised the controlled sample. Some of the considerations in constructing the DLAM utterance test lists included the following: 1) previous research on language sample data collected using LAM tools and 2) vocabulary and customized utterances using the Unity 32 on the AlphaTalker. The demands on transcription were increased by varying the selection rate when the lists were generated.

Twelve transcribers were assigned to upload, transcribe, code, and analyze the control samples. These individuals consisted of six undergraduate and graduate students in speech pathology, three trained speech–language pathologists, and three professionals with computer experience. The individuals received 1:1 or 1:2 group training that lasted approximately one hour along with the transcription-training manual to follow for practice. All the transcribers were blind to the research questions.

Before the actual transcription process began, each transcriber achieved an initial point-by-point intra-rater reliability of 95% on training lists. Reliability was calculated for each rater on each sample as a percentage: agreements / (agreements + disagreements) x 100.
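The point-by-point agreement calculation can be illustrated with a minimal sketch. The function name and the counts in the usage line are hypothetical and are not taken from the study data; the sketch only restates the agreements/(agreements+disagreements) formula as code.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Point-by-point agreement as a percentage.

    Implements agreements / (agreements + disagreements) * 100.
    """
    total = agreements + disagreements
    if total == 0:
        # No comparison points: the percentage is undefined.
        raise ValueError("at least one comparison point is required")
    return 100.0 * agreements / total


# Hypothetical counts for one rater on one training list (not study data).
print(percent_agreement(57, 3))
```

Under this sketch, a rater who matched on 57 of 60 comparison points would just meet a 95% criterion.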

Each individual was required to transcribe each utterance list twice, making a total of 120 language samples used to calculate the reliability. The ten lists were randomly selected for transcription. The transcription process was twofold. The first step involved using the DLAM Term PC software to listen to a DLAM logfile and transcribe the auditory signal, thereby creating a text file that was a written representation of the digitized language sample. In the second step, the transcribers reviewed the time stamps and text to identify and report the summary measures used to determine reliability.

The transcripts were examined for reliability in the following three areas: 1) ability to transcribe the words correctly (Word recognition); 2) ability to segment the time stamps and text into utterances (Utterance segmentation); and 3) ability to identify (code) pre-stored utterances.
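As a rough illustration of how two transcripts might be compared on these three areas, the sketch below assumes a simplified transcript representation: a list of utterances, each a tuple of words plus a flag marking a pre-stored message. This format, the function names, and the example utterances are assumptions made for illustration only; they do not describe the DLAM Term PC software or the scoring procedure actually used in the study.

```python
from itertools import zip_longest

_MISSING = object()  # filler for positions present in only one transcript


def agreement(a, b):
    """Percent of positions at which two sequences match.

    Items present in only one sequence count as disagreements.
    """
    pairs = list(zip_longest(a, b, fillvalue=_MISSING))
    if not pairs:
        raise ValueError("nothing to compare")
    hits = sum(1 for x, y in pairs if x == y)
    return 100.0 * hits / len(pairs)


def summarize(transcript_a, transcript_b):
    """Compare two transcripts on the three areas named in the text."""
    def words(t):
        return [w for utterance, _ in t for w in utterance]

    def boundaries(t):
        # Cumulative word count at the end of each utterance.
        ends, count = [], 0
        for utterance, _ in t:
            count += len(utterance)
            ends.append(count)
        return ends

    def codes(t):
        return [prestored for _, prestored in t]

    return {
        "WR": agreement(words(transcript_a), words(transcript_b)),
        "U": agreement(boundaries(transcript_a), boundaries(transcript_b)),
        "PS": agreement(codes(transcript_a), codes(transcript_b)),
    }


# Hypothetical transcripts: the raters differ only on one pre-stored code.
rater_a = [(("i", "want", "more"), False), (("good", "morning"), True)]
rater_b = [(("i", "want", "more"), False), (("good", "morning"), False)]
print(summarize(rater_a, rater_b))
```

In this made-up example the two raters agree on every word and on both utterance boundaries, but code only one of the two utterances the same way, so the pre-stored message agreement is lower than the other two measures.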

RESULTS
Table 1 shows the mean reliability on the three measures for the twelve transcribers.

U=utterance; WR=word recognition; PS=pre-stored messages

The average percentage of agreement on utterance segmentation was 98.67% (range: 98% to 100%). The inter-rater reliability for word-by-word agreement was 98.83% (range: 97% to 100%). The inter-rater reliability for the identification of pre-stored messages (code agreement) was 98.0% (range: 93% to 100%).

DISCUSSION
The study shows strong inter-rater reliability for transcribing digitized speech language samples using the DLAM Term PC software. The high reliability obtained with controlled samples under laboratory conditions would not be expected to be representative of transcription performed under normal conditions. However, these reliability results indicate the feasibility of using the DLAM Term PC software as a clinical transcription tool for collecting language samples from digitized speech AAC systems. Since this feasibility study demonstrated that WAVfiles can be converted into text, a whole range of computerized language analysis tools becomes available to report a variety of clinically useful summary measures on individuals who rely on digitized speech AAC systems. Further research investigating the reliability of transcription under normal training and working conditions is suggested.

REFERENCES
1) DeRuyter, F. (1995). Evaluating outcomes in assistive technology: Do we understand the commitment? Assistive Technology, 7, 3-16.

2) Higginbotham, D. J., & Lesher, G. W. (1999). Development of a voluntary standard format for augmentative communication device logfiles. In Proceedings of the RESNA ’99 Annual Conference. Arlington, VA: RESNA Press. 25-27.

3) Romich, B. A., & Hill, K. J. (1999). A language activity monitor for AAC and writing systems: Clinical intervention, outcomes measurement, and research. In Proceedings of the RESNA ’99 Annual Conference. Arlington, VA: RESNA Press. 19-21.

4) Hill, K. (2001). The development of a model for automated performance measurement and the establishment of indices for augmented communicators under two sampling conditions (Doctoral dissertation). University of Pittsburgh, Pennsylvania.

ACKNOWLEDGEMENTS
The National Institute on Deafness and Other Communication Disorders of NIH provided funding to Prentke Romich Company to support the work on the DLAM.

Pushpa Ramachandran, M.A.
Edinboro University of Pennsylvania
CATER, 102 Compton Hall
Edinboro, Pennsylvania 16444
Tel: 814-732-2431
FAX: 814-732-1580
Email: pushpa2812@yahoo.com