INTERNAL CONSISTENCY OF THE EDUCATIONAL VALUE SCALE FOR GREEN OUTDOOR SETTINGS – THE CASE OF EDUPARK APP

The purpose of this paper is to analyse the internal consistency of an Educational Value Scale (EVS) to be used to assess individuals’ subjective perception of an educational app’s ability to support relevant learning in green outdoor settings. In this work, the EVS is presented as a scale with 12 items, and data is aggregated and analysed to contribute to an empirical validation of the educational value construct. The analysis focuses on a desirable psychometric property of a scale, reliability, which is analysed through a robust Cronbach’s α estimation. Data for analysis is collected from a total of 924 responses to a questionnaire from students (of all school levels) and their teachers, after using the EduPARK app during a one-year face-to-face survey. Results reveal that the scale has internal consistency for teachers and older students (2nd and 3rd Cycles of Basic Education and Secondary Teaching students). This study is an initial effort to validate an EVS, which can be helpful to researchers developing and assessing educational apps and to educators selecting educational apps to integrate into their teaching practices.


Introduction
The ubiquity of mobile phones and their use for educational purposes has been a growing field of research with positive results in learning. Mobile learning literature has frequently reported student gains in cognitive and skill-based outcomes, as well as increased motivation and engagement (Zydney & Warner, 2016).
There is an increasing offer of educational mobile apps; however, their ability to promote learning is seldom demonstrated, and little trustworthy information on their quality is available. For example, popularity may not be the best criterion for app selection, as the same app can yield different educational gains in different educational contexts. Marques, M. M., & Pombo, L.  Hence, for teachers and parents, the task of deciding which educational app(s) to use with their students or children is not simple. Thus, they could benefit greatly from the wide application of a scale to evaluate this type of software.
In academia and industry, the development of educational apps for mobile learning can benefit from the target audience's feedback, as users' perceptions, for example about usability, acceptance and improvement, can contribute to refining software prototypes (Pombo, Marques, Afonso, Dias, & Madeira, 2019). The use of scales to evaluate apps' educational value is also relevant. However, evaluation tools are frequently extensive and difficult to complete, such as the evaluation framework developed by de Freitas & Olivier (2006), especially if younger students' feedback is included in the study. Moreover, the same app can promote learning in a given age group or in an audience with certain characteristics, but not in another.
Internal consistency is associated with a desirable psychometric property, reliability, which is usually analysed through Cronbach's α (Hair, Black, Babin, Anderson, & Tatham, 2010). The purpose of this study is to analyse the internal consistency of an Educational Value Scale (EVS) to be used to assess users' subjective perception of an educational app's ability to support relevant learning in green outdoor settings. For app consumers, the contribution of this scale lies in supporting educational practitioners in designing instruction strategies and materials for learners, taking advantage of mobile devices' affordances. Furthermore, the EVS can be useful for researchers and educational software developers to assess their products and decide if further improvements are needed.
The paper is structured in the following sections: (a) Methods, with a description of the procedure for developing the EVS and analysing its internal consistency through the robust Cronbach's α, considering data from 924 questionnaires filled in by students and their teachers after using the EduPARK app in a green outdoor setting; (b) Results and their discussion in the light of the literature, implications for research and practice, research limitations and proposals for future research.

Methods
This section describes how the EVS was created and how its internal consistency was analysed. First, the EduPARK app is presented, as data was collected in reference to this particular mobile app. A brief description follows of how the scale items, presented in Table 2, were produced. Finally, the procedure used to analyse the internal consistency is presented.

The case of EduPARK app
The EduPARK app was developed by the EduPARK Project (http://edupark.web.ua.pt/?lang=en) multidisciplinary team, involving researchers from the University of Aveiro (Portugal). The project aimed at creating attractive and effective strategies for interdisciplinary learning, relying on the development of an interactive mobile augmented reality app that supports geocaching activities in a green outdoor setting, the Infante D. Pedro Park, in Aveiro. The City Council allowed the installation of plant identification plaques in the city park, with augmented reality information in the form of images, audio, videos, schemes, and 3D plant leaves.
The app can be used autonomously, and at any time, through the game mode or the free exploration mode, promoting authentic learning so that visitors can enjoy a healthy walk while learning. The game includes several learning guides for different target groups: teachers and students from Basic to Higher Education, and also visitors and the general public, in a lifelong learning perspective, as the tourist guide is also offered in English. The guides integrate multidisciplinary issues under the Portuguese National Education Curriculum and propose interdisciplinary questions linked to educational challenges along the park. The goal is to accumulate points by answering the questions correctly, visualizing augmented reality markers that help to answer questions, and finding virtual caches, in the manner of a treasure hunt. Further information about the game and app can be found in .
The EduPARK app development follows a design-based research methodology, with successive refinement cycles based on users' feedback. The project organised activities for students, teachers and visitors to gather a convenience sample and collect systematic data. The activities occurred in the Aveiro Park and comprised: (a) a short introduction to the activity and some instructions on how to use the EduPARK app to play; (b) the actual game playing with the EduPARK app by groups of (usually) three or four students accompanied by an adult (a teacher, other school staff or an EduPARK team member); and (c) the filling in of a paper questionnaire (with the EVS, as described in ) and the leaderboard construction and announcement to participants, with the distribution of small prizes. The average response time to the questionnaire was 10-15 minutes.
The data collection occurred from March 2018 to April 2019.

Developing EVS
The EVS was developed under the EduPARK project (as described above), which aimed to analyse the impact of the EduPARK app on different dimensions: (a) learning value; (b) intrinsic motivation; (c) engagement; (d) authentic learning; (e) lifelong learning; and (f) conservation and sustainability habits. For that purpose, two educational researchers, with expertise in mobile learning in green outdoor settings, analysed literature associated with the assessment of the above-mentioned dimensions (Crick & Yu, 2008; Erdogan, Ok, & Marcinkowski, 2012; Martínez, Aracón, & Hita, 2014; Simões & Alarcão, 2011; Walker & Fraser, 2005) to propose a set of items for the EVS. The set of items was revised in several rounds to remove, add and/or adapt items so that they could measure the intended construct: the educational value based on the six dimensions mentioned above.
The educational researchers had experience of using, under the same project, the System Usability Scale (SUS). This scale was developed by Brooke (1996) to quickly and easily collect a user's subjective rating of a product's usability. According to Bangor et al. (2008), who aggregated and analysed data from 2,324 questionnaires collected over 10 years, SUS is a widely used instrument in usability studies, with a high reliability: 0.91. In a similar way, the EduPARK educational researchers decided to attempt to develop a scale to quickly and easily collect users' subjective rating of the educational value of an app for outdoor green settings. Taking this into account, it was decided that two items from each of the six above-mentioned dimensions of educational value would be included in the scale. In the iterative process of item generation and negotiation, 12 items earned both experts' agreement and were selected for inclusion in the EVS (see Table 2). As in SUS, positive and negative items were alternated, to make sure the respondent reads and understands each statement before deciding whether or not to agree with it. The items were scaled in Likert format, anchored from 1 (strongly disagree) to 5 (strongly agree), for the analysis.
Table 2. EVS items.
2. This app shows information in a confusing way.
3. I feel motivated to learn when I use this app.
4. I do not feel like using this app to learn.
5. Even in the difficult quiz-questions, I try to find the right answers.
6. Sometimes I respond randomly (without thinking).
7. This app shows real-world information that helps you learn.
8. I will quickly forget what I have learnt from this app.
9. Park visitors can learn from this app.
10. This app promotes learning only in a school context.
11. This app makes me feel like talking to others about nature protection.
12. This app does not help to realize that it is important to protect nature.
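Because positive and negative items alternate, responses to negatively worded items are normally reverse-coded before items are combined or a reliability coefficient is computed. The following Python sketch illustrates that step. It assumes that the even-numbered items (2, 4, ..., 12) are the negatively worded ones, as the item wording suggests, and that a response x on the 1-5 format is reversed as 6 - x. The function and constant names are hypothetical and are not part of the EduPARK materials.

```python
# Hypothetical helper: reverse-code the negatively worded EVS items.
# Assumption: even-numbered items (2, 4, ..., 12) are negatively worded
# and responses use the 1-5 Likert format described above.

NEGATIVE_ITEMS = {2, 4, 6, 8, 10, 12}  # 1-based item numbers (assumed)
SCALE_MIN, SCALE_MAX = 1, 5


def reverse_code(responses):
    """Return a copy of one respondent's 12 answers with negative items reversed."""
    if len(responses) != 12:
        raise ValueError("expected 12 item responses")
    recoded = []
    for item_number, score in enumerate(responses, start=1):
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"item {item_number}: score {score} outside 1-5")
        if item_number in NEGATIVE_ITEMS:
            score = SCALE_MIN + SCALE_MAX - score  # 1<->5, 2<->4, 3 stays 3
        recoded.append(score)
    return recoded


# Illustrative respondent (not study data): agrees with positive items,
# disagrees with negative ones.
answers = [5, 1, 4, 2, 5, 1, 4, 2, 5, 1, 4, 2]
print(reverse_code(answers))
```

After reverse-coding, a higher score on every item points in the same direction (a more favourable perception of educational value), which is what a reliability coefficient such as Cronbach's α presupposes.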

Internal consistency
Internal consistency of the EVS was assessed as a measure of scale reliability. Cronbach's coefficient α is one of the most widely adopted measures of the lower bound of reliability (Hair et al., 2010). However, as this α has received some criticism in the literature, Zhang and Yuan's (2016) robust Cronbach's α estimation procedure was followed to control the influence of outlying observations and leverage observations. In this procedure, three different plots are analysed to determine the adequate downweighting rate (ϕ) to compute Cronbach's α for a dataset.
Although Cronbach's α is usually computed for data collected on a particular occasion (Taber, 2018), aggregation of data collected through standardized scales in multiple studies has also been conducted before (Bangor et al., 2008). Hence, a robust Cronbach's α was estimated for all 924 cases, originating from 42 data collection events and aggregated in one dataset, and for four subpopulations within it. An α value which exceeds 0.7 can be considered acceptable (Hair et al., 2010). However, other authors consider the value 0.6 as the lower bound of reliability acceptance, particularly in early stages of research (Griethuijsen et al., 2015; Nunnally, 1967).
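For orientation, the classical (non-robust) coefficient for k items is α = k/(k − 1) · (1 − Σ s_i² / s_T²), where s_i² is the sample variance of item i and s_T² is the sample variance of the respondents' total scores. The Python sketch below computes this classical estimate only. It does not implement the downweighting of outlying observations in Zhang and Yuan's (2016) robust procedure, and the function name and toy data are illustrative assumptions rather than material from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Classical Cronbach's alpha for a list of respondents' item scores.

    rows: list of equal-length lists, one per respondent, with negatively
    worded items already reverse-coded.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(r) != k for r in rows):
        raise ValueError("all respondents must answer the same number of items")
    # Sample variance of each item (columns) and of the total scores (row sums)
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (not from the study): 4 respondents, 3 items
toy = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
print(round(cronbach_alpha(toy), 3))
```

With ϕ = 0 no observation is downweighted, so a robust estimate in that limiting case would be expected to agree with the classical one; larger ϕ values reduce the weight of extreme cases.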
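The two acceptance criteria cited above can be made explicit in a small decision rule. The Python sketch below is only an illustration: the function name, the returned labels and the example values are assumptions, and the boundary handling (strictly above 0.7, at or above 0.6) is one reading of the cited thresholds.

```python
def interpret_alpha(alpha, strict=0.7, lenient=0.6):
    """Classify a Cronbach's alpha value against the two cited thresholds."""
    if alpha > strict:
        return "acceptable"
    if alpha >= lenient:
        return "acceptable only under the lenient early-stage criterion"
    return "below both thresholds"


for value in (0.75, 0.65, 0.55):  # illustrative values, not study results
    print(value, "->", interpret_alpha(value))
```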

Results and discussion
This section presents and discusses, in the light of the literature, the internal consistency of the EVS. Research limitations are discussed and future directions are proposed. Table 3 presents the estimated Cronbach's α for the entire dataset (924 cases) and for the four considered subpopulations within it. Complementary data, such as standard deviation (SD), confidence intervals (CI) and the downweighting rates (ϕ) used, are also presented.
Considering 0.7 as the α value for an acceptable reliability (Hair et al., 2010), the α for 2nd/3rd CBE (0.738), Secondary Teaching (0.716), accompanying teachers (0.818) and the total dataset (0.732) can be considered acceptable to good. This indicates that the EVS has achieved an acceptable or good internal consistency for these groups. On the other hand, the α value for 1st CBE (0.653) falls just below the common acceptability threshold according to Hair and colleagues (2010), although it lies within the lower bound of acceptance proposed by other authors (Griethuijsen et al., 2015; Nunnally, 1967). This result might be explained by the age of respondents, as younger students (age range from 6 to 11) may lack the maturity level required to truly understand and correctly answer the scale items. For example, 43.7% of these students attended school year 2, so their reading skills were still at an initial phase of development, with notable difficulties in understanding the scale items. Furthermore, the alternation between positive and negative wording of items can lead to misinterpretation and to difficulty in reversing responses from negative to positive ones. Users can forget to reverse their score, accidentally agreeing with a negative item when they meant to disagree. These possible explanations are consistent with some other studies (Kortum, Acemyan, & Oswald, 2020; Ribeiro, 2020). Based on this result, it is highly recommended to revise the EVS items and the graphic presentation of the Likert scale to make them more suitable for this school level.
The highest α value was achieved in the teachers' subpopulation, as they are adults from the educational context who are frequently used to answering questionnaires, for example in mandatory continuous professional training.
As implications for research and practice, this study intends to initiate an effort of scale validation. The EVS can become a useful instrument to support researchers in developing and assessing educational apps, as it can be used as a standardized educational value questionnaire. Standardized questionnaires allow the collection of systematic data, supporting higher objectivity, replicability and quantification of results, among other advantages (Sauro & Lewis, 2012). Likewise, the EVS can support educators in selecting educational apps suitable for their teaching practices: if the EVS becomes widely used, apps that achieve higher EVS scores with students of their education level are expected to be more suitable for their classes. Research limitations are related to the fact that the empirical analysis was made on the Portuguese version of the EVS. Although the translation of the items into English was made by experienced researchers, the participation of a bilingual translator and the use of a back-translation method would improve the English version of the EVS. Likewise, possible cultural differences were not taken into consideration.
For future research, it is recommended to adapt the EVS for 1st CBE students and to empirically analyse its internal consistency, making it more suitable for this school level in terms of language, graphic presentation and the use of exclusively positively worded items. Moreover, the relevance of the scale will increase as educational app developers use this tool as a standardized questionnaire for educational value evaluation.