Validity and reliability of the Intelligibility in Context Scale: European Portuguese version

ABSTRACT The purpose of the study was to evaluate the validity, reliability, sensitivity and specificity of the European Portuguese version of the Intelligibility in Context Scale (ICS-EP). Seventy-six children (age: M = 60.6 months, SD = 8.1 months), 25 with a parent or teacher concern about how they talked and 51 with no concern, were assessed with a phonetic-phonological test (TFF-ALPE) to calculate three severity measures: percentage of phonemes correct (PPC), percentage of consonants correct (PCC), and percentage of vowels correct (PVC). Parents also completed a questionnaire about their child’s development (e.g. concern about how the child talks) and the ICS, which estimates their child’s intelligibility with different communication partners. Item-level scores differed across communication partners. The mean ICS score for the whole sample was 4.5 (SD = 0.6), indicating that children were “usually” to “always” intelligible. The ICS had excellent internal consistency (α = 0.96). Children whose parents were concerned about their speech had significantly lower mean scores (M = 3.91, SD = 0.59) than children without parental concern (M = 4.78, SD = 0.36). ICS scores were positively correlated with PPC (r = .655), PCC (r = .654), and PVC (r = .588), and simple linear regression models showed a linear relationship between the ICS mean score and each severity measure. Sensitivity (0.80) and specificity (0.84) were high at a cut-off score of 4.36. We conclude that the ICS-EP has good psychometric properties, suggesting that it is a valid tool for estimating children’s intelligibility with different communication partners. This version of the ICS can therefore be used as a screening measure for children’s speech intelligibility.

Intelligibility is influenced not only by the speech signal but also by the listener's familiarity with the speaker, the presence of speech cues, knowledge of the context, and the number of speech samples presented (Flipsen, 1995; Ertmer, 2010; Pascoe et al., 2006). Speech-language pathologists should consider intelligibility not only when establishing a diagnosis and deciding whether intervention is needed, but also as an outcome measure of intervention efficacy (Lousada et al., 2014; Williams, McLeod, & McCauley, 2010).
There are two assessment methods for measuring intelligibility in children with speech sound disorders: word identification tasks and rating scales (Ertmer, 2010; Miller, 2013; Pascoe et al., 2006; Whitehill, 2002). In word identification tasks, listeners write down the words they understood or select words from multiple-choice alternatives. Rating scales are potentially quick and easy to use in a clinical setting (Lousada et al., 2014). Typically, the listener (e.g. a speech-language therapist or a communication partner) rates speech samples along a continuum of intelligibility, for example on a numeric scale where 1 means totally unintelligible and 5 means totally intelligible (Ertmer, 2010; Lousada et al., 2014). Although different methods are available, few scales have been studied for their psychometric properties.
The Intelligibility in Context Scale (ICS; McLeod, Harrison, & McCormack, 2012b) is a validated scale (McLeod, Crowe, & Shahaeian, 2015; McLeod et al., 2012a). The ICS asks parents to estimate how well their child's speech is understood in a range of environmental contexts and by different listeners. These listeners are the parents themselves, immediate family, extended family, friends, acquaintances, teachers and strangers/unfamiliar people. Each is rated on a five-point scale (1 = never, 2 = rarely, 3 = sometimes, 4 = usually, 5 = always). The seven items were developed from the environmental factors listed in the International Classification of Functioning, Disability, and Health: Children and Youth (ICF-CY; World Health Organization, 2007). The total score can be compared with normative data (where available) or used as an outcome measure (Phạm, McLeod, & Harrison, 2017).
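As an illustration of how a single ICS score is obtained, the sketch below averages the seven item ratings on the 1-5 scale. This is a minimal sketch. The field names in `ICS_PARTNERS` are hypothetical labels for the seven listeners, not the official item wording. It also assumes that all seven items are answered.

```python
from statistics import mean

# Hypothetical field names for the seven ICS listeners (the parents
# themselves plus the six listeners named in the text); not the official
# item wording.
ICS_PARTNERS = (
    "parents",
    "immediate_family",
    "extended_family",
    "friends",
    "acquaintances",
    "teachers",
    "strangers",
)

# Five-point response scale described for the ICS.
SCALE = {1: "never", 2: "rarely", 3: "sometimes", 4: "usually", 5: "always"}


def ics_mean_score(ratings: dict[str, int]) -> float:
    """Average the seven item ratings (each 1-5) into one ICS score."""
    missing = [p for p in ICS_PARTNERS if p not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for partner in ICS_PARTNERS:
        if ratings[partner] not in SCALE:
            raise ValueError(f"{partner}: rating must be 1-5, got {ratings[partner]!r}")
    return float(mean(ratings[p] for p in ICS_PARTNERS))


if __name__ == "__main__":
    child = {"parents": 5, "immediate_family": 5, "extended_family": 4,
             "friends": 5, "acquaintances": 4, "teachers": 5, "strangers": 4}
    print(round(ics_mean_score(child), 2))  # 32 / 7, printed as 4.57
```

Because every item is a whole number from 1 to 5, a complete ICS mean score is always a multiple of 1/7.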
The psychometric properties of the ICS were first analyzed in 120 Australian English-speaking children (McLeod et al., 2012a). The results showed good internal consistency (α = .93) and construct validity. Criterion validity was supported by moderate correlations between the ICS score and three severity measures: percentage of phonemes correct (PPC, r = .54), percentage of consonants correct (PCC, r = .54) and percentage of vowels correct (PVC, r = .36). More recently, a study of 803 Australian English-speaking children provided normative data and further validation of the psychometric properties of the ICS (McLeod et al., 2015). Internal consistency was high (α = .94), and sensitivity and specificity were 0.82 and 0.58, respectively. For criterion validity, significant but low correlations were found between the ICS and PPC (r = 0.30), PCC (r = 0.24) and PVC (r = 0.30). These significant correlations, although low, suggest that speech severity measures are linked with caregivers' estimates of intelligibility.
In ICS studies, participants were usually divided into two groups according to whether a parent or teacher had a concern about how the children talked and made speech sounds. This allowed researchers to determine whether the scale could distinguish between the groups (McLeod et al., 2012a, 2015).
For Australian English-speaking children (McLeod et al., 2015), mean ICS scores were lower for children whom caregivers identified as having difficulty talking (M = 3.9) than for children not so identified (M = 4.6). The scale can therefore be a useful tool for screening children's speech intelligibility. Because caregivers can complete it easily, it is suitable when a relatively quick measure of intelligibility is needed. Despite the potential of this screening tool, some studies (Ng et al., 2014; Phạm et al., 2017) reported that variables such as parents' educational level and the duration of daily caregiver-child interaction could limit the accuracy of the ICS. This is because its estimate of intelligibility is based on parents' opinion.
This study aims to analyze the psychometric properties of the European Portuguese version of the ICS (ICS-EP). Specifically, it examines internal consistency, criterion validity, sensitivity and specificity.

Participants
Seventy-six children were included in this study: 25 with a parent or teacher concern about how they talked and 51 with no identified concern. None of the children had a biomedical condition (e.g. neurological impairment or intellectual disability). None had been identified as having a persistent hearing impairment, although the caregivers of 11 children (14.5%) reported a history of ear infections. All children showed normal-range nonverbal intelligence (>25th percentile) on the Portuguese version of Raven's Coloured Progressive Matrices (Raven, Raven, & Court, 2009). European Portuguese was the native language of all participants. Socioeconomic level (see Table 1) was determined by crossing two indicators: the occupational group and the educational level of the person who contributes most to the family income (Reif, Marbeau, Quatresooz, & Vancraeynest, 1991). All procedures were approved by the Ethics Committee of the Research Unit in Health Sciences (reference number 482_02_2018). Informed consent was obtained from all caregivers before any data were collected.

Sociodemographic and sample characterization
Of the 76 participants, 44 (57.9%) were male and 32 (42.1%) were female. Ages ranged from 47 to 74 months (M = 60.6, SD = 8.1). Most children (72.4%) had a high or medium-high socioeconomic status. The caregivers of 11 children (14.5%) reported a history of ear infections (see Table 1).
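Ages in this article appear both in months and in the years;months notation, and the analysis uses three age bands (≤53, 54-64 and ≥65 months). The helpers below are a minimal sketch of both conversions. They assume ages are recorded in whole months, and the function names are hypothetical.

```python
def months_to_ym(months: int) -> str:
    """Format a whole-month age in the years;months notation (e.g. 47 -> '3;11')."""
    years, rest = divmod(months, 12)
    return f"{years};{rest}"


def age_group(months: int) -> str:
    """Map an age in months to the three bands used in the analysis."""
    if months <= 53:
        return "≤53"
    if months <= 64:
        return "54-64"
    return "≥65"


if __name__ == "__main__":
    # The youngest and oldest participants: 47 and 74 months.
    for m in (47, 53, 54, 64, 65, 74):
        print(m, months_to_ym(m), age_group(m))
```

With this conversion, the reported age range of 47 to 74 months corresponds to 3;11 to 6;2.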

Intelligibility in Context Scale (ICS)
The ICS is a seven-item, parent-rated measure of children's intelligibility when communicating with people of different levels of familiarity and authority. Items are rated on a five-point Likert scale. The ICS-EP was used in this study. It was translated by two SLPs and researchers who work with children with speech sound disorders and are native speakers of European Portuguese. An accredited translator then back-translated the synthesis of the translations as a check (Beaton, Bombardier, Guillemin, & Ferraz, 1998; Guillemin, Bombardier, & Beaton, 1993). Finally, a review committee checked the resulting translation and produced the final version.

Questionnaire for parents
Caregivers completed a questionnaire designed to characterize the children (absence of a biomedical condition, native language, history of ear infections) and their family background (occupational group and educational level). The questionnaire also included a specific question: "Do you have any concerns about how your child talks and makes speech sounds?". It had three response options (yes, a little, or no). Following McLeod et al. (2015), children were assigned to the group with no parental concern about speech and language if their parents answered "no".
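The grouping rule above can be written as a small function. This is a sketch under one assumption: because only "no" places a child in the no-concern group, the answers "yes" and "a little" are both treated here as indicating concern. The function name is hypothetical.

```python
from collections import Counter

# Response options of the parent questionnaire item on speech concern.
ANSWERS = ("yes", "a little", "no")


def concern_group(answer: str) -> str:
    """Assign a child to a group from the parent's answer to the concern question.

    Only 'no' maps to the no-concern group; treating 'yes' and 'a little'
    alike as 'concern' is the assumption of this sketch.
    """
    normalized = answer.strip().lower()
    if normalized not in ANSWERS:
        raise ValueError(f"unexpected answer: {answer!r}")
    return "no concern" if normalized == "no" else "concern"


if __name__ == "__main__":
    answers = ["no", "a little", "yes", "No", "no"]
    print(Counter(concern_group(a) for a in answers))
```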

Procedure
Recruitment
First, children from three kindergartens and two child-care centers were screened through parent and teacher reports to identify those who had difficulty talking and making speech sounds. Then 80 children were assessed by a speech-language therapist and a psychologist: 27 whom parents and teachers had identified as having problems talking, and 53 who had not been identified. The ICS is available at http://www.csu.edu.au/research/multilingual-speech/ics. The final sample comprised 76 children whose parents returned both the parent questionnaire and the ICS. Of these, 25 had caregivers who expressed concern about how they talked and made speech sounds, and 51 had caregivers who did not.

Assessment
Children were assessed by three experienced pediatric speech-language therapists and two trained final-year undergraduate speech therapy students. Each one-hour session took place in a quiet room at the child's kindergarten or child-care center. With the consent of the children and their guardians, sessions were audio-recorded with Audacity on a laptop with a built-in microphone. The TFF-ALPE was used to assess all children's phonological skills. Examiners made phonetic transcriptions online (i.e. during the session). The audio files were reviewed two days after each session to check the transcriptions. Parents completed the questionnaire and the ICS after their child's assessment.
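The severity measures used in this study (PPC, PCC and PVC) are percentages of target segments produced correctly. The sketch below shows the arithmetic from a transcription aligned segment by segment. It rests on three assumptions:

- Each target segment is paired with the produced segment, or with `None` for an omission.
- The `VOWELS` inventory is hypothetical and limited to oral vowels.
- The TFF-ALPE scoring rules may differ, for example in how nasal vowels, glides or distortions are treated.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vowel inventory (European Portuguese oral vowels in IPA);
# a real scoring protocol would define its own segment classes.
VOWELS = frozenset("ieɛaɐɔouɨ")

# (target segment, produced segment or None if the target was omitted)
Pair = tuple[str, Optional[str]]


@dataclass
class Severity:
    ppc: float  # percentage of phonemes correct
    pcc: float  # percentage of consonants correct
    pvc: float  # percentage of vowels correct


def percent_correct(pairs: list[Pair]) -> float:
    """Percentage of target segments whose production matches the target."""
    if not pairs:
        return float("nan")  # no targets of this class in the sample
    correct = sum(1 for target, produced in pairs if produced == target)
    return 100.0 * correct / len(pairs)


def severity_measures(aligned: list[Pair], vowels=VOWELS) -> Severity:
    """Split aligned targets into consonants and vowels and score each set."""
    consonants = [p for p in aligned if p[0] not in vowels]
    vowel_pairs = [p for p in aligned if p[0] in vowels]
    return Severity(
        ppc=percent_correct(aligned),
        pcc=percent_correct(consonants),
        pvc=percent_correct(vowel_pairs),
    )


if __name__ == "__main__":
    # 'gato' /gatu/ produced as [katu], 'sapo' /sapu/ produced as [apu]
    aligned = [("g", "k"), ("a", "a"), ("t", "t"), ("u", "u"),
               ("s", None), ("a", "a"), ("p", "p"), ("u", "u")]
    print(severity_measures(aligned))
```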

Reliability of transcriptions
Point-to-point agreement on all consonants and vowels was calculated for broad phonetic transcriptions of each word of the TFF-ALPE. Interrater reliability between the two undergraduate students was calculated for 13% of the sample (10 children). Agreement across 3,580 phonemes was high (98.13%). This value is comparable with agreement levels reported in other studies of disordered child phonology (Shriberg, Tomblin, & McSweeny, 1999) and is considered adequate for the aim of our study.
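Point-to-point agreement is the percentage of aligned segments that two transcribers recorded identically. A minimal sketch follows. It assumes the two transcriptions have already been aligned segment by segment.

```python
def point_to_point_agreement(a: list[str], b: list[str]) -> float:
    """Percentage of aligned segments transcribed identically by two raters."""
    if len(a) != len(b):
        raise ValueError("transcriptions must be aligned segment by segment")
    if not a:
        raise ValueError("no segments to compare")
    agreements = sum(x == y for x, y in zip(a, b))
    return 100.0 * agreements / len(a)


if __name__ == "__main__":
    rater_1 = list("katu") + list("apu")
    rater_2 = list("gatu") + list("apu")
    print(round(point_to_point_agreement(rater_1, rater_2), 2))
    # Arithmetic check on the reported figure: 3,513 agreements out of
    # 3,580 phonemes round to 98.13%.
    print(round(100 * 3513 / 3580, 2))
```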

Data analysis
Descriptive statistics are reported as means (M), standard deviations (SD), medians (Med) and 25th and 75th percentiles (P25, P75) for continuous variables, and as counts and percentages for categorical variables. The Kruskal-Wallis test was used to compare two or more independent groups, with epsilon squared (ε²) as the effect size (Tomczak & Tomczak, 2014). Post hoc comparisons used the Mann-Whitney U test with Bonferroni correction. All correlations were calculated with Spearman's rank correlation and interpreted using Cohen's (1988) guidelines: small (0.10-0.29), medium (0.30-0.49) and large (0.50-1.0). Non-parametric tests were chosen because the normality assumption was rejected. Pearson correlations gave results similar to those of the Spearman rank test.
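To make the group comparison concrete, the sketch below computes the Kruskal-Wallis H statistic with the standard tie correction. It then derives epsilon squared as H / ((n² - 1) / (n + 1)), which simplifies to H / (n - 1). This is a plain-Python illustration, not the SPSS procedure used in the study. It returns the test statistic only; a p-value would need the chi-squared distribution with k - 1 degrees of freedom.

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they span."""
    ordered = sorted(values)
    first, last = {}, {}
    for position, value in enumerate(ordered, start=1):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 for v in values]


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H for two or more independent groups, corrected for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for group in groups:
        group_ranks = ranks[start:start + len(group)]
        rank_term += sum(group_ranks) ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over sets of tied values.
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied; H is undefined")
    return h / correction


def epsilon_squared(h, n):
    """Epsilon-squared effect size: H / ((n^2 - 1) / (n + 1)), i.e. H / (n - 1)."""
    return h / ((n ** 2 - 1) / (n + 1))


if __name__ == "__main__":
    h = kruskal_wallis_h([1, 2, 3], [4, 5, 6])
    print(h, epsilon_squared(h, 6))
```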
Simple linear regression models were fitted to predict each severity measure (PVC, PCC, PPC) from the ICS score. The significance of each slope was tested with a regression ANOVA, and normality of the residuals was confirmed by visual inspection of the P-P plots.
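A simple linear regression with one predictor can be sketched with ordinary least squares. The sketch reports the slope, intercept, R² and the regression ANOVA F statistic, which has (1, n - 2) degrees of freedom. The `LinearFit` type is a hypothetical container for this illustration. The p-value for F is not computed here.

```python
from statistics import fmean
from typing import NamedTuple


class LinearFit(NamedTuple):
    slope: float
    intercept: float
    r_squared: float
    f_statistic: float  # regression ANOVA F with (1, n - 2) degrees of freedom


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_total == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_regression = ss_total - ss_residual
    f_statistic = (float("inf") if ss_residual == 0
                   else ss_regression / (ss_residual / (n - 2)))
    return LinearFit(slope, intercept, 1 - ss_residual / ss_total, f_statistic)


if __name__ == "__main__":
    # x could be ICS scores and y a severity measure such as PCC.
    print(simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

With a single predictor, R² equals the square of Pearson's r between the two variables.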
To evaluate the internal consistency of the ICS, Cronbach's alpha was calculated. Sensitivity and specificity were evaluated using a Receiver Operating Characteristic (ROC) curve based on the ICS score and the parents' opinion (concern vs. no concern). The area under the curve (AUC) and its 95% confidence interval (CI) were calculated.
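Cronbach's alpha compares the sum of the item variances with the variance of the total score: α = k / (k - 1) × (1 - Σ item variances / total-score variance), where k is the number of items (seven for the ICS). The sketch below assumes a complete matrix with one row per child and one column per item.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items score matrix."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Sample variances are used throughout. The choice of denominator cancels out in the ratio as long as it is applied consistently.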
All statistical analyses were performed using SPSS® Software, version 24.0 (SPSS Inc., Chicago, IL) and p-values under 0.05 were considered significant.

Descriptive and inferential statistics
The effects of the demographic variables gender, age and socioeconomic status on ICS scores and severity measures were analyzed (see Table 2). ICS scores did not differ significantly by gender (p > 0.05) or socioeconomic status (p > 0.05). However, they differed significantly across age groups (p < 0.01) and according to parental concern about speech sound production (p < 0.001). Post hoc analysis of the age groups revealed two subsets. The youngest group (≤53 months) had lower ICS scores. The intermediate and oldest groups (54-64 and ≥65 months) did not differ from each other. The parents' evaluation was supported by the ICS: children with an identified concern had lower ICS scores than children without.
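The post hoc step can be illustrated with the Mann-Whitney U test followed by a Bonferroni adjustment for the three pairwise age-group comparisons. This is a simplified sketch. The p-value uses the normal approximation without tie or continuity correction. Statistical packages may apply these corrections or use exact p-values for small samples, so results can differ slightly.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they span."""
    ordered = sorted(values)
    first, last = {}, {}
    for position, value in enumerate(ordered, start=1):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 for v in values]


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided normal-approximation p-value.

    No tie or continuity correction is applied in this sketch.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, erfc(abs(z) / sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
    print(u, p, bonferroni([p, 0.2, 0.5]))
```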
For the severity measures, PCC differed significantly by gender (p < 0.05) and age group (p < 0.01), whereas PPC differed only by age group (p < 0.05). Parental concern had a significant effect on all severity measures (p < 0.001). Post hoc analysis by age group showed the same pattern for PCC and PPC as for the ICS. The youngest group (≤53 months) had lower PCC and PPC scores, whereas the intermediate and oldest groups (54-64 and ≥65 months) had similar scores. As with the ICS, PVC, PCC and PPC were consistently lower for children with an identified concern.
The proportion of variance explained by the significant variables, as measured by the effect size, ranged from 12% to 14% for age and from 31% to 53% for parental concern, and was only 6% for gender. These values are considered low to moderate, suggesting high variability in responses on the scales.
The ICS results on the five-point Likert scale (1 = never, 2 = rarely, 3 = sometimes, 4 = usually, 5 = always) are presented in Table 3. The mean total score for the whole sample was 4.49 (SD = 0.60). Mean scores for the seven items showed that parents' ratings differed by communication partner. They were highest for the parents themselves (M = 4.75), similar for teachers, immediate family and friends (M = 4.61, 4.55 and 4.50, respectively), and lowest for acquaintances (M = 4.41), extended family (M = 4.37) and strangers (M = 4.33).

Validity and reliability of the Intelligibility in Context Scale (ICS)
Internal consistency and correlation between items
Bivariate nonparametric correlation analysis (Spearman's rho) showed large correlations between the seven ICS items, ranging from rho = .62 to rho = .94 (p = .001). The lowest correlation was between the items for parents and extended family members (rho = 0.62). Internal reliability of the ICS, calculated with Cronbach's alpha, was high (α = 0.96; see Table 4).
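Spearman's rho can be computed as Pearson's r on the ranks of the data, with tied values given average ranks. The sketch below does this and labels the result using the Cohen thresholds listed in the data analysis. The "below small" label for |r| < 0.10 is an addition of this sketch, since the guidelines quoted in the text start at 0.10.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they span."""
    ordered = sorted(values)
    first, last = {}, {}
    for position, value in enumerate(ordered, start=1):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 for v in values]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on average ranks (handles ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two values")
    return pearson_r(average_ranks(x), average_ranks(y))


def cohen_label(r):
    """Classify |r| with Cohen's (1988) small/medium/large thresholds."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"


if __name__ == "__main__":
    rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
    print(rho, cohen_label(rho))
```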

Criterion validity
Criterion validity of the ICS was analyzed for the 76 children assessed with the TFF-ALPE (25 whose caregivers expressed concern about how they talked and 51 whose caregivers did not). Criterion validity reflects the degree of overlap between two tools that measure similar skills (Gay, 1985). In the present study, the ICS was compared with the participants' PPC, PCC and PVC obtained from the TFF-ALPE. Bivariate correlation analysis (Pearson's r) showed that the ICS mean score was positively correlated with PPC (r = .655), PCC (r = .654) and PVC (r = .588). Simple linear regression models (see Figures 1-3) indicated a linear relationship between the ICS mean score and each severity measure (PPC, PCC and PVC). The proportion of variability explained by these models ranged from 49.8% to 61.6%.
The optimal cut-off score for sensitivity and specificity was 4.36. This value best discriminated between children with speech sound disorders and typically developing children when compared with the results of a detailed assessment.
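The article does not state how the optimal cut-off was chosen. The sketch below assumes Youden's J (sensitivity + specificity - 1), a common criterion. The other assumptions are:

- Candidate cut-offs are midpoints between consecutive observed scores.
- A score below the cut-off counts as a positive screen.
- Parental concern is the reference classification.

The AUC is computed as the probability that a child with concern scores lower than a child without concern, with ties counted as one half. All function names are hypothetical.

```python
def sensitivity_specificity(scores, concern, cutoff):
    """Treat an ICS score below `cutoff` as a positive (at-risk) screen.

    `concern` holds True for children whose parents reported a concern.
    """
    tp = sum(1 for s, c in zip(scores, concern) if c and s < cutoff)
    fn = sum(1 for s, c in zip(scores, concern) if c and s >= cutoff)
    tn = sum(1 for s, c in zip(scores, concern) if not c and s >= cutoff)
    fp = sum(1 for s, c in zip(scores, concern) if not c and s < cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both groups must be non-empty")
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, concern):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1.

    Candidates are midpoints between consecutive distinct observed scores;
    when several candidates tie, the lowest one is returned.
    """
    distinct = sorted(set(scores))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct scores")
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates,
               key=lambda c: sum(sensitivity_specificity(scores, concern, c)) - 1)


def auc(scores, concern):
    """P(score of a child with concern < score of a child without); ties count 1/2."""
    positives = [s for s, c in zip(scores, concern) if c]
    negatives = [s for s, c in zip(scores, concern) if not c]
    if not positives or not negatives:
        raise ValueError("both groups must be non-empty")
    wins = sum(1.0 if p < q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    scores = [3.0, 3.5, 4.0, 4.5, 5.0, 5.0]
    concern = [True, True, True, False, False, False]
    print(youden_cutoff(scores, concern), auc(scores, concern))
```

One observation applies if every child answered all seven items with whole numbers. In that case ICS mean scores are multiples of 1/7. A cut-off of 4.36 then lies midway between 30/7 (about 4.29) and 31/7 (about 4.43), which is consistent with the midpoint convention assumed here.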

Discussion
The aim of the present study was to analyze the reliability and validity of the ICS-EP. The ICS was completed by the parents of 76 European Portuguese-speaking children aged between 3;11 and 6;2. We analyzed the possible influence of sociodemographic variables (gender, age and socioeconomic status) on ICS mean scores. There were no significant differences in ICS scores by gender, which is consistent with previous ICS studies (Hopf, McLeod, & McDonagh, 2017; Phạm et al., 2017). In contrast, McLeod et al. (2015) found that female children had significantly higher ICS scores than male children. Although the difference was not significant, girls in our study also received higher ICS scores than boys. The lack of a significant difference might be related to our smaller sample size. Our results also indicate that ICS scores increase significantly with age: the 54-64 and ≥65 month groups had higher ICS scores than the ≤53 month group. This finding is similar to results obtained by Tomić and Mildner (2014), Phạm et al. (2017), and Neumann, Rietz, and Stenneken (2017). However, McLeod et al. (2015) found that children aged 5;0-5;5 received lower ICS scores than younger children. This unexpected result could be because these children had not yet attended school. Regarding socioeconomic status, our study showed no socioeconomic influence on ICS scores, which is consistent with the results of McLeod et al. (2015) and Hopf et al. (2017).
A mean ICS score of 4.5 (SD = 0.6) was obtained for this group of Portuguese children. This value is similar to mean ICS scores in international studies (e.g. M = 4.4 for Australian, Vietnamese and German children). Our slightly higher value could be related to the age range of our sample (3;11-6;2) compared with other studies (4;0-5;5 for Australian children; 2;0-5;11 for Vietnamese children; 3;0-5;11 for German children).
Portuguese children in this sample were thus, on average, "usually" to "always" intelligible to a variety of listeners. Intelligibility was highest with parents, followed by teachers, immediate family, friends, acquaintances and extended family; children were least intelligible to strangers. This finding is generally consistent with other studies (McLeod et al., 2012a, 2015; Neumann et al., 2017; Phạm et al., 2017), indicating that environmental context influences intelligibility. However, some small differences between studies can be observed, which may be due to different sample sizes. For example, parents in McLeod et al. (2015) reported that immediate family understood the children more easily than teachers did.
The ICS-EP showed high internal consistency across the seven items (α = 0.96) and moderate to high correlations between items. These results are very similar to previous findings with the original ICS (McLeod et al., 2012a, 2015).
Criterion validity was analyzed by correlating total average ICS scores for the whole sample with PPC, PCC and PVC. The significant moderate correlations suggest that speech severity measures are correlated with parents' responses on the ICS, meaning that parents can generally characterize their children's intelligibility accurately. The correlation between PCC and the ICS (r = .65) is higher than the one obtained in a previous study (r = .42; Phạm et al., 2017). The simple linear regressions showed that each severity measure (dependent variable) can be predicted from the ICS score (independent variable). Sensitivity and specificity were both above 0.80, suggesting that the ICS can be used as a screening tool to identify children who may need an in-depth assessment of their speech abilities.
This study has some limitations. The sample was relatively small and was not randomly selected. Participants were recruited from only three of Portugal's 18 districts. Consequently, the sample is not representative of the overall population. Future studies should include a larger, randomly selected sample that also includes younger children. Other psychometric properties, such as test-retest and interrater reliability of the ICS, should also be examined.

Conclusions
The ICS-EP showed strong internal consistency, criterion validity, sensitivity and specificity, indicating that it is a valid and reliable measure of intelligibility for preschool-aged children. Its overall good psychometric properties indicate that the ICS-EP has clinical utility for speech-language pathologists.

Figure 4. The Receiver Operating Characteristic (ROC) curve. The adjusted cut-off of 4.36 corresponded to a sensitivity of 0.804 and a specificity of 0.840. The AUC was 0.978 and the corresponding 95% CI was [0.792; 0.965].

Table 1. Sociodemographic and sample characterization.

Table 2. ICS and severity scales.