Austin J. Northcutt: “Maximizing the Diagnostic Accuracy of Language Sample Analysis in Children With Language Impairment” Transcript
I’m pleased to introduce Austin Northcutt, a senior in the department of communication sciences and disorders who will be presenting his Honors Project, which is entitled “Maximizing the Diagnostic Accuracy of Language Sample Analysis in Children with Language Impairment.”
Language impairment is a communication disorder that inhibits language learning in children who have no hearing loss or intellectual disability. There is no known cause for language impairment, and it is diagnosed through a language assessment performed by a speech-language pathologist. In the field today, standardized assessment is the most common tool used to diagnose children; these standardized assessments are mass-produced as tools for speech-language pathologists. Language sample analysis, by contrast, is a non-standardized assessment tool, and it is not used nearly as much as standardized assessment because transcribing and fully analyzing a sample can be very time consuming. In some cases a speech-language pathologist may collect a sample but not have the time to fully transcribe it, which prevents them from using all of the formal measures that could be gathered from it.
While these language samples can be extremely useful when diagnosing, they can be time consuming to transcribe and analyze, and many speech-language pathologists in the field today do not have that time. There are a variety of sample types; the most commonly used are play, narrative, and conversation. Each type serves as a way to facilitate the child’s production of language, and the goal of the assessment is to understand and assess the child’s language.
A narrative language sample is one in which language is produced through the telling of a story or narrative. These samples can be elicited in a variety of ways. One common elicitation technique is to present the child with a picture book with no words on any of the pages. The clinician then tells the child to create a story for the book. Another technique is to present the child with a sequence of two or three pictures. The child is then asked to create a story related to these pictures. In some cases the child is read a story and then asked to tell the story back to the clinician. This is sometimes referred to as a narrative retell.
When it comes to diagnosing language impairment, there is no single method that is standard throughout the field, and every speech-language pathologist may do it a little differently. When deciding which assessment technique is best, a pathologist has to understand diagnostic accuracy.
Diagnostic accuracy describes how correctly a tool or measure identifies individuals both with and without a disorder. Sensitivity and specificity are the more precise terms used to describe a technique’s accuracy. Sensitivity is the percentage of individuals with the disorder who are correctly identified by the tool. Specificity describes the true negatives found by the tool: the percentage of individuals without the disorder who are correctly identified as not having it. Within speech pathology, a value of 80% is acceptable, but 90% is preferred. A value of 100% is ideal because it would mean no one is misidentified using the measure.
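As a minimal sketch of how these two percentages are computed, the Python below takes counts of true positives, false negatives, true negatives, and false positives and rates the result against the 80% acceptable and 90% preferred benchmarks just described. The function names are illustrative, not part of any published tool; the example counts come from the minority-background result reported later (4 of 10 TD children correctly identified, 6 misidentified).

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Percent of children WITH the disorder whom the tool correctly flags."""
    return 100 * true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Percent of children WITHOUT the disorder whom the tool correctly clears."""
    return 100 * true_neg / (true_neg + false_pos)


def rate(value: float) -> str:
    """Rate a sensitivity or specificity value against the 80%/90% benchmarks."""
    if value >= 90:
        return "preferred"
    if value >= 80:
        return "acceptable"
    return "below acceptable"


# 4 of 10 typically developing children correctly identified, 6 misidentified.
spec = specificity(true_neg=4, false_pos=6)
print(spec, rate(spec))  # 40.0 below acceptable
```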
A variety of measures can be gathered from a language sample analysis. One of the most common and easily obtained is the mean length of utterance, or MLU, which can be measured in either words or morphemes. MLU in words is the average number of words per utterance. In the utterance above, “they hopped in the car,” the MLU in words is five.
MLU can also be counted in morphemes. Morphemes are the smallest units of meaning in a word. The word “jump” is one morpheme because it cannot be broken down any further. However, the word “jumped” is two morphemes because the past tense “-ed” carries its own meaning. The phrase “They hopped in the car” therefore has an MLU in morphemes of six. Compared with the previous slide, where the MLU in words was five, this shows that the two measures differ, although the difference is not always large. The difference between the two can be used to show the use of bound morphemes such as “-ed” or “-ing.”
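The difference between the two MLU counts can be sketched in Python. This assumes a hypothetical representation in which each utterance is a list of morphemes and bound morphemes are written with a leading hyphen (so “hopped” becomes “hop” plus “-ed”); CLAN uses its own morphological tier format, so this only illustrates the arithmetic.

```python
from statistics import mean

# Hypothetical format: one list of morphemes per utterance; bound morphemes
# (such as "-ed" or "-ing") start with a hyphen and do not count as words.
sample = [
    ["they", "hop", "-ed", "in", "the", "car"],
]


def words_in(utterance):
    """Count free morphemes, i.e. words."""
    return sum(1 for m in utterance if not m.startswith("-"))


def mlu_words(utterances):
    """Average number of words per utterance."""
    return mean(words_in(u) for u in utterances)


def mlu_morphemes(utterances):
    """Average number of morphemes (free and bound) per utterance."""
    return mean(len(u) for u in utterances)


# Matches the slide: five words and six morphemes in one utterance.
print("MLU in words:", mlu_words(sample))
print("MLU in morphemes:", mlu_morphemes(sample))
```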
Rice et al. found that MLU is extremely useful when attempting to determine whether or not a child is following typical language acquisition norms.
The finiteness of a verb is related to the correct use of tense and agreement. A verb that is correctly marked for tense and agreement does not have a finiteness error. Tense relates to past, present, or future. One example of tense marking is “I jumped high,” with the “-ed” in red marking past tense. The verb “to be” can also mark tense, either as an auxiliary or as a main verb. For example, “I am here” shows present tense.
Agreement is related to the marking of 1st, 2nd, or 3rd person as well as singular or plural. In these examples, the “-s” on “jumps” marks third person. “He is here” also marks agreement, in this case singular; if the subject had been “they,” it would be “they are here,” which contrasts with it.
The Finite Verb Morphology Composite (FVMC) is calculated by first counting all instances in a sample in which some form of finiteness marking on a verb is obligatory. You then divide the number of correct uses of finiteness marking by the total number of obligatory contexts. If there were no “-ed” on “I jumped high” even though it was supposed to refer to past tense, that would be an error, but there would still be four obligatory contexts. So to find the composite, you would divide three by four, giving a finite verb morphology composite of 75%.
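The composite is a simple proportion, which the Python sketch below makes explicit. Each obligatory context is recorded as a pair of the utterance and whether finiteness was correctly marked. The four contexts are modeled on the examples above (with “He jumps” standing in for the “jumps” example), and the data structure is an illustrative assumption rather than CLAN output.

```python
def fvmc(contexts):
    """Finite Verb Morphology Composite as a percentage.

    contexts: list of (utterance, correctly_marked) pairs, one per obligatory
    context for finiteness marking.
    """
    if not contexts:
        raise ValueError("no obligatory contexts in the sample")
    correct = sum(1 for _, ok in contexts if ok)
    return 100 * correct / len(contexts)


# Four obligatory contexts; the past-tense "-ed" is missing in the first.
contexts = [
    ("I jump high", False),
    ("I am here", True),
    ("He jumps", True),
    ("He is here", True),
]
print(fvmc(contexts))  # 75.0
```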
As can be seen in the table, the finite verb morphology composite has acceptable diagnostic accuracy from ages 4 to 6. Above age 6, sensitivity for children with LI decreases. And while these numbers are acceptable, the preferred value would be 90% or above, so the sensitivity scores, while acceptable, are a little lower than preferred.
Percentage of Grammatical Utterances (PGU) is calculated by determining the percentage of utterances that are grammatically correct. An example of an ungrammatical utterance would be “him don’t want to get up.” The grammatically correct version would be “he doesn’t want to get up.” If the first utterance appeared in a sample and was marked as ungrammatical, and the child later produced the grammatical revision, the result for those two utterances would be a PGU of 50%. Eisenberg and Guo (2016) found that PGU has acceptable diagnostic accuracy from ages 3 to 8. Compared with the finite verb morphology composite, sensitivity for ages 3 to 5 is higher and in the preferred range, while for ages 6 to 8 it is still acceptable but not as high as preferred. Specificity values range from acceptable to ideal.
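PGU follows the same proportion pattern. This minimal Python sketch assumes each utterance has already been judged grammatical or not and reproduces the two-utterance example above; the pairing of text with a flag is an illustrative assumption.

```python
def pgu(utterances):
    """Percentage of Grammatical Utterances.

    utterances: list of (text, is_grammatical) pairs.
    """
    if not utterances:
        raise ValueError("empty sample")
    grammatical = sum(1 for _, ok in utterances if ok)
    return 100 * grammatical / len(utterances)


sample = [
    ("him don't want to get up", False),
    ("he doesn't want to get up", True),
]
print(pgu(sample))  # 50.0
```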
The research questions for this project are as follows: Which language sample measures most accurately diagnose LI in children ages 5 to 10 years old? Do these measures differ for different ages? And how can language sample analysis (LSA) measures be combined in a specific sequence to yield high diagnostic accuracy for children with LI?
The CHILDES database is a freely available database of language samples. This project used only the Gillam corpus, which was created using the Test of Narrative Language (TNL), an assessment with a large norming sample. The samples were used to norm the test, and the transcripts were taken during administration of the standardized assessment. Each sample was broken into three sections: the first is a narrative retell, the second is a picture description of six pictures, and the third is a picture description of a single picture.
Above is a demographic breakdown for the entire sample. We had a total of 342 transcripts: 152 from females and 190 from males. 171 transcripts were from children with a previous diagnosis of language impairment, and 171 were from children described as typically developing. You can see the race breakdown on the right side. The majority of the sample is European American, with transcripts from individuals of African American, Asian, and Hispanic American backgrounds. 25 were described as other, and 3 did not have a designated race or ethnicity.
You can see the demographic breakdown by age for the nonminority samples, with the LI distribution on the left side and the gender distribution on the right. In the 5-year-old group there were 13 total samples: 6 LI and 7 TD, with 7 females and 6 males. In the 6-year-old group there were 36 total samples: 14 LI and 22 TD, with 21 females and 15 males. In the 7-year-old group there were 60 total samples: 23 LI and 37 TD, with 26 females and 34 males. In the 8-year-old group there were 41 samples: 16 LI and 25 TD, with 19 females and 22 males. In the 9-year-old group there were 37 total samples: 14 LI and 23 TD, with 18 females and 19 males. And in the 10-year-old group there were 26 total samples: 14 LI and 12 TD, with 9 females and 17 males.
CLAN is a computer program used to analyze language samples. It is freely available, and it automatically analyzes a sample for a number of different measures. It segments a sample into morphemes and analyzes it. Within the program it is possible to add codes that are used to calculate additional measures.
All of the Gillam samples gathered from children with LI were used. Then a TD sample of the same age and gender was selected for each. If there was no age and gender match, a sample of the same age but a different gender was selected. If there was no same-age match, the match was made with the next closest available age. Gender matches were made whenever possible.
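One way to express this matching order in code is to rank every remaining TD candidate by age difference first and gender mismatch second, then take the best one. The Python below is a sketch under assumptions the source does not state: ages are whole years, each TD sample is used at most once, and ties go to the first candidate in the list. The `Child` type and sample IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    sample_id: str
    age: int      # age in whole years (assumption)
    gender: str


def match_controls(li_children, td_pool):
    """Pair each LI child with one TD child, preferring same age and gender.

    Candidates are ranked by (age difference, gender mismatch), so a same-age,
    same-gender match beats a same-age match of another gender, which beats
    the closest available age. Each TD child is used at most once (assumption).
    """
    available = list(td_pool)
    matches = {}
    for li in li_children:
        if not available:
            raise ValueError("ran out of TD samples to match")
        best = min(
            available,
            key=lambda td: (abs(td.age - li.age), td.gender != li.gender),
        )
        matches[li.sample_id] = best.sample_id
        available.remove(best)
    return matches


li = [Child("L1", 7, "F"), Child("L2", 9, "M")]
td = [Child("T1", 7, "M"), Child("T2", 7, "F"), Child("T3", 10, "M")]
print(match_controls(li, td))  # {'L1': 'T2', 'L2': 'T3'}
```

Because the ranking puts age before gender, a closer age always wins over a gender match, which mirrors the order described above.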
All transcripts were blinded for age, gender, and whether or not the child had LI. Each file was randomly assigned a number between 100 and 999. All transcripts were unblinded according to a key during analysis.
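A minimal Python sketch of this kind of blinding, assuming the codes must be unique: drawing without replacement from the 900 possible three-digit numbers gives each file its own code, and the returned dictionary serves as the key for unblinding. The filenames and seed are hypothetical.

```python
import random


def blind(filenames, seed=None):
    """Assign each transcript a unique random code from 100 to 999.

    Returns the key mapping code -> original filename, kept aside so the
    transcripts can be unblinded during analysis.
    """
    rng = random.Random(seed)
    # sample() draws without replacement, so no two files share a code;
    # it raises ValueError if there are more than 900 files.
    codes = rng.sample(range(100, 1000), k=len(filenames))
    return dict(zip(codes, filenames))


names = [f"transcript_{i:03d}.cha" for i in range(342)]  # hypothetical names
key = blind(names, seed=2024)
code = next(iter(key))
print(code, "->", key[code])  # unblinding looks the code up in the key
```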
All of the transcripts were coded for semantic, syntactic, and morphological errors. If an utterance contained a grammar error, the entire utterance was deemed ungrammatical. The utterance above, “and her didn’t be afraid because her liked the octopuses,” contains a number of grammar errors that made the entire utterance ungrammatical. Whenever an error code related to syntax or morphology was used, the utterance was deemed ungrammatical. Some semantic codes, such as pronoun or preposition errors, also resulted in the utterance being deemed ungrammatical, but other semantic errors were not classified as grammar errors.
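The rule for turning error codes into a grammatical or ungrammatical judgment can be sketched in Python. The code labels below are hypothetical, written as `category:subtype` strings for illustration; CLAN/CHAT has its own error-coding conventions, and the exact set of semantic codes counted as grammar errors in this project is not listed beyond pronoun and preposition errors.

```python
# Hypothetical error-code labels in "category:subtype" form; CLAN's own
# error-coding conventions differ, so these names are illustrative only.
SEMANTIC_CODES_THAT_COUNT = {"semantic:pronoun", "semantic:preposition"}


def is_grammatical(error_codes):
    """Return False if any code makes the whole utterance ungrammatical."""
    for code in error_codes:
        category = code.split(":", 1)[0]
        if category in ("syntax", "morphology"):
            return False  # syntax or morphology errors always count
        if code in SEMANTIC_CODES_THAT_COUNT:
            return False  # only some semantic errors count
    return True


# "and her didn't be afraid because her liked the octopuses"
print(is_grammatical(["semantic:pronoun", "morphology:auxiliary", "semantic:pronoun"]))  # False
# A semantic error that is not treated as a grammar error (hypothetical code).
print(is_grammatical(["semantic:word-choice"]))  # True
```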
Using CLAN, we calculated the PGU and FVMC. The program automatically calculated MLU in words and MLU in morphemes, as well as the total number of verbs and the number of verbs per utterance. We then divided the transcripts into age groups for 5-, 6-, 7-, 8-, 9-, and 10-year-olds, and qualitatively selected cutoff scores based on what would most accurately diagnose the participants.
We assigned a diagnostic label to each sample: each child was identified as having language impairment, as typically developing (TD), or as unable to be diagnosed. Those not identified by our measures would need more testing to receive an accurate diagnosis of language impairment.
The cutoffs to identify 5-year-old children from nonminority backgrounds were as follows. A PGU of < 80% led to a diagnosis of language impairment. If the PGU was > 80% but the finite verb morphology composite was < 95%, that also led to a diagnosis of LI. If the PGU was > 80% and the FVMC was > 95%, the diagnosis was typically developing. These cutoffs had a sensitivity and specificity of 100%, meaning all of the children with LI and all of the TD children were correctly identified.
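The rule order for 5-year-olds (the age the later discussion ties to the 80% PGU cutoff) can be written as a short Python function. Scores are percentages. The rules above use strict inequalities only, so values exactly at a cutoff (for example, a PGU of exactly 80%) fall through to "further testing" here; that handling is an assumption of this sketch, not part of the reported method.

```python
def classify_age5(pgu: float, fvmc: float) -> str:
    """Apply the 5-year-old cutoffs in order; scores are percentages."""
    if pgu < 80:
        return "LI"
    if pgu > 80 and fvmc < 95:
        return "LI"
    if pgu > 80 and fvmc > 95:
        return "TD"
    # Exact boundary values are not covered by the stated rules (assumption).
    return "further testing"


print(classify_age5(pgu=85, fvmc=98))  # TD
```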
If a 6-year-old has a PGU < 88%, they are diagnosed with LI. If they have a PGU > 88% but an MLU in words < 5.5, they also receive a diagnosis of LI. If their verbs per utterance is > 1.2, their PGU is > 88%, and their MLU in words is > 5.5, then they are identified as typically developing. If none of the above criteria apply to the child, more testing is needed.
These measures have a sensitivity of 100% and a specificity of 100%, with two kids needing further testing. This means that only one child was misdiagnosed using these measures.
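The 6-year-old sequence adds MLU in words and verbs per utterance to PGU, and unlike the 5-year-old rules it leaves some children unclassified. A minimal Python sketch of that order, assuming scores are already computed (PGU as a percentage); as with the other ages, exact boundary values fall through to "further testing" by assumption.

```python
def classify_age6(pgu: float, mlu_words: float, vpu: float) -> str:
    """Apply the 6-year-old cutoffs in order."""
    if pgu < 88:
        return "LI"
    if pgu > 88 and mlu_words < 5.5:
        return "LI"
    if vpu > 1.2 and pgu > 88 and mlu_words > 5.5:
        return "TD"
    # Anyone not covered above needs more testing.
    return "further testing"


print(classify_age6(pgu=90, mlu_words=6.0, vpu=1.0))  # further testing
```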
If a 7-year-old’s PGU is < 80%, they are considered LI. If they have a PGU > 80% and an MLU in words > 7.5, they are considered TD. If their PGU is > 80%, their MLU in words is less than or equal to 6.5, and their number of utterances is > 15, they are considered LI. Anyone not identified with these measures would need further testing to accurately determine whether they have an impairment. These measures had a sensitivity of 87% and a specificity of 91%, with 13 children requiring further testing.
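For 7-year-olds the order matters, because the TD rule is checked before the low-MLU rule and children with an MLU in words between 6.5 and 7.5 are never classified. A minimal Python sketch of the stated order, with exact-boundary handling again an assumption:

```python
def classify_age7(pgu: float, mlu_words: float, n_utterances: int) -> str:
    """Apply the 7-year-old cutoffs in order."""
    if pgu < 80:
        return "LI"
    if pgu > 80 and mlu_words > 7.5:
        return "TD"
    if pgu > 80 and mlu_words <= 6.5 and n_utterances > 15:
        return "LI"
    # Includes MLU in words between 6.5 and 7.5, and short samples.
    return "further testing"


print(classify_age7(pgu=85, mlu_words=7.0, n_utterances=20))  # further testing
```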
For 8-year-olds, an FVMC < 96% leads to identification as language impaired. Those with an FVMC > 96% but an MLU in words < 6.5 are also identified as language impaired. These measures had a sensitivity of 94% and a specificity of 96%.
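A minimal Python sketch of the 8-year-old rules. The text gives no explicit TD rule or count of unclassified children for this age, so this sketch assumes that children with an FVMC above 96% who pass the MLU check are TD; that assumption, and sending an FVMC of exactly 96% to further testing, are not part of the reported method.

```python
def classify_age8(fvmc: float, mlu_words: float) -> str:
    """Apply the 8-year-old cutoffs in order; FVMC is a percentage."""
    if fvmc < 96:
        return "LI"
    if fvmc > 96 and mlu_words < 6.5:
        return "LI"
    if fvmc > 96:
        return "TD"  # assumption: remaining children are treated as TD
    return "further testing"  # FVMC of exactly 96 (assumption)


print(classify_age8(fvmc=98, mlu_words=7.0))  # TD
```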
For 9-year-olds, an FVMC < 100% is considered a language impairment. If they have an FVMC equal to 100% and an MLU in words > 9, they are TD. Of those remaining, if their number of utterances is < 30, they are considered LI. These measures had a sensitivity of 82% and a specificity of 81%, with 10 unclassified children who require further testing.
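The 9-year-old rules are a strict sequence: each check applies only to children not already classified. A minimal Python sketch, assuming FVMC is a valid percentage (so anything not below 100 equals 100) and that the MLU cutoff is in words:

```python
def classify_age9(fvmc: float, mlu_words: float, n_utterances: int) -> str:
    """Apply the 9-year-old cutoffs in order."""
    if fvmc < 100:
        return "LI"
    # FVMC is a percentage, so from here on it equals 100.
    if mlu_words > 9:
        return "TD"
    if n_utterances < 30:
        return "LI"
    return "further testing"


print(classify_age9(fvmc=100, mlu_words=8.0, n_utterances=35))  # further testing
```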
Diagnosis of 10-year-olds used the following measures, in this order. If the child has an FVMC < 100%, they are language impaired. Of those remaining, if their verbs per utterance (VPU) is < 1.5, they are considered LI. Of those still remaining, if their MLU in words is > 9, they are typically developing. If none of these measures apply, further testing is needed. These measures had a sensitivity of 100% and a specificity of 100%, but 9 unclassified children require further testing.
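The 10-year-old sequence can be sketched the same way, with each check applied only to children not yet classified. This assumes FVMC is a valid percentage and that VPU and MLU in words are already computed.

```python
def classify_age10(fvmc: float, vpu: float, mlu_words: float) -> str:
    """Apply the 10-year-old cutoffs in the stated order."""
    if fvmc < 100:
        return "LI"
    if vpu < 1.5:
        return "LI"
    if mlu_words > 9:
        return "TD"
    return "further testing"


print(classify_age10(fvmc=100, vpu=1.6, mlu_words=10.0))  # TD
```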
All of the measures described above were based on children from nonminority backgrounds. When the measures for 6-year-olds from nonminority backgrounds are applied to children from minority backgrounds, the sensitivity is 100% and the specificity is 40%, with 2 unclassified children. This means that 6 of the 10 TD children from minority backgrounds were misidentified by these measures.
The sensitivity and specificity are above 90% for the 5-, 6-, 8-, and 10-year-olds. The sensitivity and specificity for the 7- and 9-year-olds are both above 80%, which is acceptable but not as high as we would like. When calculating diagnostic accuracy, we did not include unclassified children, because our measures do not diagnose them. These children would need further testing to get an accurate diagnosis.
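Excluding unclassified children changes the denominators of both percentages. The Python sketch below computes sensitivity and specificity only over children who received an LI or TD label; the five results are hypothetical and only illustrate the bookkeeping.

```python
def accuracy_excluding_unclassified(results):
    """Sensitivity and specificity (percent), ignoring unclassified children.

    results: list of (true_label, predicted_label) pairs where labels are
    "LI" or "TD" and predicted_label may also be "further testing".
    Raises ZeroDivisionError if no LI or no TD child was classified.
    """
    classified = [(t, p) for t, p in results if p != "further testing"]
    tp = sum(1 for t, p in classified if t == "LI" and p == "LI")
    fn = sum(1 for t, p in classified if t == "LI" and p == "TD")
    tn = sum(1 for t, p in classified if t == "TD" and p == "TD")
    fp = sum(1 for t, p in classified if t == "TD" and p == "LI")
    return 100 * tp / (tp + fn), 100 * tn / (tn + fp)


# Hypothetical results for five children; one needs further testing.
results = [
    ("LI", "LI"),
    ("LI", "further testing"),
    ("TD", "TD"),
    ("TD", "TD"),
    ("TD", "LI"),
]
sens, spec = accuracy_excluding_unclassified(results)
print(f"{sens:.0f}% {spec:.0f}%")  # 100% 67%
```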
Across ages, there is not one consistent measure or cutoff. The PGU cutoff starts at 80% for 5-year-olds and goes up to 88% for 6-year-olds. This makes sense because as a child ages and uses language more, their productions get closer to adult productions, and a typical adult would have a PGU of 100%. Why, then, does the PGU cutoff for 7-year-olds go back down to 80%? A possible explanation is that these children are attempting newer, more complex language, which leads to more errors than a 6-year-old attempting simpler language.
For FVMC, we do not expect a TD 5- or 8-year-old to make many errors, which is why the cutoffs are 95% and 96%. And we do not expect a TD 9- or 10-year-old to make any errors, since the typical adult does not make finiteness errors when speaking.
Verbs per utterance is a measure similar to clausal density. A higher value means the child is more likely using complex sentences or embedded clauses. We do not expect this very often in younger children, although there is a 6-year-old in the sample who averages multiple verbs per utterance. By age 10, however, we would expect most children to produce complex sentences or use embedded clauses, which is why a 10-year-old with an average of fewer than 1.5 verbs per utterance is identified as impaired. The MLU cutoffs increase with age until age 9. This makes sense because as a child ages, they gain a better command of the language. At ages 9 and 10 the cutoff is the same because this is roughly where the typical adult would be.
We attempted to apply these measures to children from a variety of ethnic backgrounds, but the diagnostic accuracy was too low. As mentioned before, the measures for 6-year-olds had a sensitivity of 100% and a specificity of 40% when applied to children from minority backgrounds. This means that typically developing children from minority backgrounds were being misidentified as impaired. This could be due to a dialectal difference, a prior misdiagnosis, or a host of other factors, but at this point we cannot say why these measures do not apply to children from minority backgrounds. Historically, this is a common problem within speech-language pathology: children from minority backgrounds who speak different dialects are misdiagnosed as impaired when they are actually TD. This is something to take into account when creating measures for children.
On the left-hand side of the slide you can see the diagnostic accuracies reported by Eisenberg and Guo (2016). They reported acceptable diagnostic accuracies for ages 3 to 8 using a single measure. By combining common diagnostic measures rather than relying on a single measure, we obtained higher diagnostic accuracy for ages 5, 6, and 8. We also had high diagnostic accuracy for age 10. The Eisenberg and Guo study didn’t report diagnostic accuracy for ages 7, 9, or 10.
On the left side of the slide you can see the diagnostic accuracies reported by Eisenberg and Guo (2016) for PGU. They reported that PGU has high diagnostic accuracy for ages 4 and 5, with acceptable diagnostic accuracy for ages 3, 6, and 8. With our combined measures, we had high diagnostic accuracy for ages 5, 6, 8, and 10. We did not have samples for ages 3 and 4, so no comparison can be made there.
This research will not be as meaningful if it cannot be used clinically. Language samples tend not to be fully transcribed and analyzed because of the time demand. However, narrative language samples tend to be much faster to transcribe because the transcriber already knows the subject material. Analyzing each of these samples took only about 5 to 10 minutes, depending on the length of the sample. The measures set in this project were shown to have high diagnostic accuracy for a variety of ages, and the software used, CLAN, which made analysis much easier, is freely available to anyone. These measures could realistically be used in a clinical setting as an alternative to a standardized assessment.
While we had a large sample, all of the transcripts came from the same corpus, which means the same stimulus was used for every sample. It would be interesting to see whether these measures remain accurate across a larger variety of stimuli. It would also be worthwhile to look more closely at the samples from minority children, or to analyze additional samples from minority children drawn from other corpora.
Another direction would be to investigate the narrative portions of other language samples, or to see whether a semantic component or measure could accurately diagnose children with language impairment.
This project was especially meaningful for me because I will be starting a master’s program this fall to become a speech-language pathologist. In this project, I worked toward creating a clinically beneficial tool for speech-language pathologists to use in the field. This project has also furthered my understanding of child language and child language impairments more than any other class in my undergraduate career, and it has brought together all of my undergraduate coursework and H-Options related to my field.
I would like to thank my mentor, Dr. Stacy Betz; Farah, Michele, and everyone in the Purdue Fort Wayne Honors Program; and Dr. Steven Cody for his help in formatting and practicing my presentation. I would also like to thank the American Speech-Language-Hearing Association for their support while doing this project.