TEACHING-LEARNING PROCESS UNDER A MULTIDIRECTIONAL EFFICIENCY AND CANONICAL CORRELATION ANALYSIS

A detailed study of the agents and factors involved in the teaching-learning process is vital for proposing strategies that improve it in all aspects (methodology, evaluation, resource management, among others). Many theoretical studies on the quality of education already exist, which makes this a broad topic of discussion. However, education is in constant motion, which calls for new systematic studies from different and relevant perspectives. This work aims to contribute to the study of the quality of higher education by evaluating the teaching-learning process under two approaches: first, examining the technical efficiency of higher education institutions (approach 1); second, evaluating the performance of the students of these institutions in relation to the training of the teachers assigned to them (approach 2). Specifically, the case study covers 335 data units representing Colombian higher education institutions: 165 institutions in 2016 (47 public and 118 private) and 170 institutions in 2017 (48 public and 122 private). The analysis considers training/research variables for teachers and graduation/performance variables for students. Efficiency (approach 1) is estimated with a non-parametric method based on Multidirectional Efficiency Analysis (MEA). The study focuses on the efficiency score, efficiency ratios and the output inefficiency index. The relationship between academic qualification and performance (approach 2) is explored with a Canonical Correlation Analysis (CCA). Box-Cox transformations and a filtering step to deal with outliers are applied. The statistical significance of the canonical correlations found is assessed with different statistical tests.
The greatest contribution of this work is the appropriate combination of the MEA and CCA methodologies, which allows a broader view of the scheme institution efficiency/student performance/teacher training. Our results allow us to characterize the institutions in terms of efficiency, separating them into seven groups according to the training offered (doctoral programs, master's programs, specializations, etc.), and to establish comparatively the most relevant relationships between teacher training/research and student performance.


INTRODUCTION
Understanding the role that each active agent (student or teacher) plays in the teaching-learning process, according to its own characteristics, is vital for establishing measures that improve educational quality in higher education. Many criteria must be taken into account and make a difference in the results obtained, such as the pedagogy used by the teacher and the responsibility with which students approach learning. Other factors also matter, such as the existence of a study plan defined according to the knowledge previously acquired by the students (prerequisites) or the physical resources that the institution makes available for applying different teaching methodologies. In other words, the efficiency of the institution in the educational process must also be considered: student results are unlikely to be the same in an institution that lacks a solid structure (for example, poor management of resources or overloaded physical capacity) to facilitate the learning process or take it to a higher level.
In this study, we focus on two perspectives: examining the technical efficiency of the higher education institution (approach 1) and analysing to what extent teacher training and research influence student results (approach 2); see Figure 1. The literature contains interesting research on the teaching-learning process, and we highlight here some results related to the two approaches. Regarding approach 1, Hanushek claims that the key to success lies in good teacher training and not in aspects such as a large investment in education or the proportion of students [1]. For Konsik, in contrast, research knowledge, knowledge of pedagogies, teaching methods and processes (reading, writing, and teaching), and knowledge of government initiatives and educational institutions are the basis for planning teaching that goes beyond the classroom [1]. In this sense, Thanassoulis et al. proposed an integrated approach to higher education teaching evaluation that combines the analytical hierarchy process and data analysis [2]. Regarding approach 2, according to Villa and Villar, established personal interrelationships (between students and teachers within the educational context) are the main tool for building learning [3]. For Buendia and Olmedo, the quality of learning is related to the quality of teaching [4]. According to Maquilon, teachers who know their teaching approaches and use them together with learning approaches perform better [5]. Murillo et al. used mathematical techniques to show that there are multivariable correlations between teacher education and student outcomes for a precise set of factors [6]. For Beck, program planning, professional identity, evaluation and student monitoring are among the keys to building a quality education [1].
Specifically, this study evaluates the teaching-learning process of 335 Colombian higher education institutions under two very different approaches. Approach 1 examines the technical efficiency of the institutions, distinguishing them by the type of programs offered (doctoral studies, master's studies, specialization programs, etc.). Approach 2 evaluates the performance of the students of these institutions in relation to the training of the teachers assigned to them; here the institutions are analysed together, regardless of the type of programs offered.
Each approach is analysed with an appropriate methodology. Approach 1 uses a non-parametric method based on Multidirectional Efficiency Analysis (MEA), focusing on the efficiency score, efficiency ratios and the output inefficiency index. Approach 2 uses a Canonical Correlation Analysis (CCA). An important contribution of this study is its use of rarely applied exploratory statistical techniques, which allow us to address important questions for improving higher education, such as: How strong is the correlation between teachers' education and student performance? Are these correlations preserved from year to year? To what extent do the correlations between the active agents affect the efficiency of education? How are the three parts (teacher training, student results and institution efficiency) structurally related?
The remainder of the paper is laid out as follows. The next section gives a brief overview of the methodology used in each approach. Section 3 characterizes the data of the study and presents the main results. Finally, Section 4 formulates some concluding remarks.

METHODOLOGY
The analysis proposed in this study applies two main techniques: a non-parametric method based on Multidirectional Efficiency Analysis (MEA) to estimate efficiency (approach 1), and a Canonical Correlation Analysis (CCA) to explore the relationship between academic qualification and performance (approach 2).

Multidirectional Efficiency Analysis (MEA)
The MEA model, also called potential improvement data envelopment analysis, is a non-parametric model appropriate for measuring levels of efficiency; see [7]. The general idea of the MEA model is the following. Let $n = (s, c, t) \in N$ be a tuple identifying the sector $s \in S$, dimension $c \in C$ and year $t \in T$. Consider that any given tuple $n \in N$ produces $J \in \mathbb{N}$ outputs $y_j(n)$, $j \in [J]$, using $I \in \mathbb{N}$ inputs $x_i(n)$, $i \in [I]$, where $[m]$ denotes the set $\{1, \dots, m\}$ for some $m \in \mathbb{N}$. Therefore, $x(n) \in \mathbb{R}^I$ is the vector of all the inputs and $y(n) \in \mathbb{R}^J$ is the vector of all the outputs. The discretionary inputs (those whose values can be changed) are represented by the first $d$ indices, $1 < d < I$.
For a given data set $Z = \{z(n)\}_{n \in N}$ with $z(n) = (x(n), y(n))$, the technical efficiency of each institution is measured by the MEA score of each $\bar{n} \in N$, defined in equation (1), where $\alpha_i^*(\bar{n})$, $\beta_j^*(\bar{n})$ and $\gamma^*(\bar{n})$ represent the optimal solutions to the linear optimization problems $P_i^{\alpha}(z, \bar{n})$, $P_j^{\beta}(z, \bar{n})$ and $P^{\gamma}(z, \bar{n}, \alpha^*, \beta^*)$, respectively. The problem $P_i^{\alpha}(z, \bar{n})$ refers to the minimization of inputs, $P_j^{\beta}(z, \bar{n})$ to the maximization of outputs, and $P^{\gamma}(z, \bar{n}, \alpha^*, \beta^*)$ to both simultaneously (minimization of inputs and maximization of outputs). The value in equation (1) varies between 0 and 1, with fully efficient institutions having efficiency scores equal to 1. In addition to the MEA score, equation (2), based on the ideas in [8], is used to count the number of times each output was used inefficiently.
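For orientation, the MEA problems and score are commonly written in the literature in the form sketched below. This is an assumption: it supposes that equation (1) and the three problems follow this standard form, with no convexity constraint on $\lambda$ (a constraint $\sum_n \lambda_n = 1$ would be added under variable returns to scale) and with the score averaging over the $d$ discretionary inputs.

```latex
% Ideal target for discretionary input i (assumed standard form of P_i^alpha)
\begin{align*}
P_i^{\alpha}(z,\bar{n}):\quad & \alpha_i^*(\bar{n}) = \min_{\alpha_i,\,\lambda}\ \alpha_i \\
\text{s.t.}\quad & \textstyle\sum_{n \in N} \lambda_n x_i(n) \le \alpha_i, \\
& \textstyle\sum_{n \in N} \lambda_n x_l(n) \le x_l(\bar{n}), \quad l \in [I],\ l \ne i, \\
& \textstyle\sum_{n \in N} \lambda_n y_j(n) \ge y_j(\bar{n}), \quad j \in [J], \\
& \lambda_n \ge 0, \quad n \in N.
\end{align*}

% Ideal target for output j (assumed standard form of P_j^beta)
\begin{align*}
P_j^{\beta}(z,\bar{n}):\quad & \beta_j^*(\bar{n}) = \max_{\beta_j,\,\lambda}\ \beta_j \\
\text{s.t.}\quad & \textstyle\sum_{n \in N} \lambda_n x_i(n) \le x_i(\bar{n}), \quad i \in [I], \\
& \textstyle\sum_{n \in N} \lambda_n y_j(n) \ge \beta_j, \\
& \textstyle\sum_{n \in N} \lambda_n y_l(n) \ge y_l(\bar{n}), \quad l \in [J],\ l \ne j, \\
& \lambda_n \ge 0, \quad n \in N.
\end{align*}

% Simultaneous step towards the ideal point (assumed standard form of P^gamma)
\begin{align*}
P^{\gamma}(z,\bar{n},\alpha^*,\beta^*):\quad & \gamma^*(\bar{n}) = \max_{\gamma,\,\lambda}\ \gamma \\
\text{s.t.}\quad & \textstyle\sum_{n \in N} \lambda_n x_i(n) \le x_i(\bar{n}) - \gamma\,\bigl(x_i(\bar{n}) - \alpha_i^*(\bar{n})\bigr), \quad i \in [d], \\
& \textstyle\sum_{n \in N} \lambda_n x_i(n) \le x_i(\bar{n}), \quad i = d+1, \dots, I, \\
& \textstyle\sum_{n \in N} \lambda_n y_j(n) \ge y_j(\bar{n}) + \gamma\,\bigl(\beta_j^*(\bar{n}) - y_j(\bar{n})\bigr), \quad j \in [J], \\
& \lambda_n \ge 0, \quad n \in N.
\end{align*}

% MEA score (assumed form of equation (1))
\[
\mathrm{MEA}_Z(\bar{n}) =
\frac{\dfrac{1}{\gamma^*(\bar{n})} - \dfrac{1}{d}\displaystyle\sum_{i=1}^{d}
      \dfrac{x_i(\bar{n}) - \alpha_i^*(\bar{n})}{x_i(\bar{n})}}
     {\dfrac{1}{\gamma^*(\bar{n})} + \dfrac{1}{J}\displaystyle\sum_{j=1}^{J}
      \dfrac{\beta_j^*(\bar{n}) - y_j(\bar{n})}{y_j(\bar{n})}}
\]
```

Under this form, a unit with $\alpha_i^* = x_i$ for every discretionary input and $\beta_j^* = y_j$ for every output has both sums equal to zero, so its score is 1, which agrees with the statement that fully efficient institutions score 1.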

Definition 2
The inefficiency index for each output is given by equation (2), for $j \in [J]$ and tuple $\bar{n} \in N$.
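A minimal sketch of one way to count how often each output is used inefficiently. It assumes that an output counts as inefficiently used in a unit whenever the observed value $y_j$ falls short of its ideal target $\beta_j^*$, and that the index is the percentage of such units. This reading, the function name and the data are illustrative assumptions, not the definition in equation (2).

```python
def output_inefficiency_share(outputs, targets, tol=1e-9):
    """Percentage of units whose output j lies below its ideal target.

    outputs[k][j]: observed output j of unit k (y_j).
    targets[k][j]: ideal target for that output (beta_j^*), hypothetical here.
    """
    n_units = len(outputs)
    n_outputs = len(outputs[0])
    shares = []
    for j in range(n_outputs):
        # Count the units that still have room to improve in output j.
        count = sum(
            1 for k in range(n_units) if targets[k][j] - outputs[k][j] > tol
        )
        shares.append(100.0 * count / n_units)
    return shares


# Hypothetical data: four units, three outputs, common ideal targets.
observed = [
    [50.0, 30.0, 25.0],
    [60.0, 30.0, 25.0],
    [55.0, 30.0, 25.0],
    [60.0, 40.0, 10.0],
]
ideal = [[60.0, 40.0, 25.0] for _ in observed]

shares = output_inefficiency_share(observed, ideal)
print(shares)  # [50.0, 75.0, 25.0]
```

In practice the targets would come from the output-maximization problems solved for each unit, and the shares would be computed separately per sector and dimension.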

Canonical Correlation Analysis (CCA)
CCA is a multivariate correlation analysis used to measure the associations between two sets of variables. The coefficient obtained measures the strength of association between two canonical variates; see [9]. The general idea of CCA is the following. Let $X$ be a set of $p$ variables and $Y$ a set of $q$ variables. Let $U$ be the set of linear combinations of $X$, $U_i = a_{i1}X_1 + a_{i2}X_2 + \dots + a_{ip}X_p$ with $i = 1, \dots, p$, and $V$ the set of linear combinations of $Y$, $V_j = b_{j1}Y_1 + b_{j2}Y_2 + \dots + b_{jq}Y_q$ with $j = 1, \dots, q$. Suppose $p \le q$ and define $(U_i, V_i)$ as the $i$th canonical variate pair, where each member of $U$ is paired with a member of $V$. The canonical correlation for the $i$th canonical variate pair, between the linear combinations $U_i$ and $V_i$, is given by
$$\rho_i = \frac{\mathrm{cov}(U_i, V_i)}{\sqrt{\mathrm{var}(U_i)\,\mathrm{var}(V_i)}} \qquad (3)$$
where $\mathrm{cov}(U_i, V_i)$ denotes the covariance between $U_i$ and $V_i$ and $\mathrm{var}(U_i)$ denotes the variance of $U_i$ (similarly for $V_i$). CCA consists of finding linear combinations of the $X$'s and linear combinations of the $Y$'s that maximize the correlation in (3). That is, it finds the coefficients $a_{ik}$ and $b_{ik}$ that maximize the canonical correlation $\rho_i$ subject to the constraints $\mathrm{var}(U_i) = \mathrm{var}(V_i) = 1$, with all the remaining correlations equal to zero.
In this approach, Box-Cox transformations are also applied to modify the distributional shape of the data; see [10]. To test whether the $i$th canonical correlation and all those that follow it are zero, statistical tests based on Wilks' Lambda, the Hotelling-Lawley Trace, the Pillai-Bartlett Trace and Roy's Largest Root are used; see [11].
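The maximization described above can be sketched numerically as follows. This is an illustrative implementation using NumPy and the QR/SVD route, which is one standard way to obtain canonical correlations; it is not claimed to be the software used in the study. The function name `cca` and the synthetic data are hypothetical, and both data matrices are assumed to have full column rank.

```python
import numpy as np


def cca(X, Y):
    """Canonical correlations and weights for data matrices X (n x p), Y (n x q).

    Returns (rho, A, B): rho holds the min(p, q) canonical correlations in
    decreasing order; the columns of A and B are the weights of U_i and V_i,
    scaled so that each canonical variate has unit sample variance.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # centre each variable
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the two column spaces.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx' Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    scale = np.sqrt(n - 1)  # gives unit sample variance (ddof=1)
    A = np.linalg.solve(Rx, U) * scale
    B = np.linalg.solve(Ry, Vt.T) * scale
    return np.clip(s, 0.0, 1.0), A, B


# Synthetic example: only the first Y variable depends on X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
noise = rng.normal(size=(200, 4))
Y = noise.copy()
Y[:, 0] = X[:, 0] + 0.5 * noise[:, 0]

rho, A, B = cca(X, Y)
print(rho)
```

The first canonical correlation is large because one variable in Y is built from X, while the remaining pairs only reflect sampling noise. The variates themselves are obtained as the centred data times the weight matrices.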
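As an illustration of the sequential tests, the sketch below computes Wilks' Lambda for the hypothesis that the $k$th canonical correlation and all that follow are zero, together with Bartlett's chi-square approximation. Bartlett's approximation is one common choice and is used here as an assumption; the text does not state which approximation was applied. The function name and the input values are hypothetical.

```python
import math


def wilks_lambda_tests(rhos, n, p, q):
    """Sequential Wilks' Lambda tests for canonical correlations.

    rhos: canonical correlations in decreasing order.
    n: sample size; p, q: number of variables in each set.
    Returns a list of (wilks_lambda, chi2_statistic, degrees_of_freedom),
    where entry k (0-based) tests that correlations k, k+1, ... are all zero.
    """
    results = []
    for k in range(len(rhos)):
        # Wilks' Lambda: product of (1 - rho^2) over the remaining pairs.
        lam = math.prod(1.0 - r * r for r in rhos[k:])
        # Bartlett's chi-square approximation.
        chi2 = -(n - 1 - (p + q + 1) / 2.0) * math.log(lam)
        df = (p - k) * (q - k)
        results.append((lam, chi2, df))
    return results


# Hypothetical canonical correlations for n = 100, p = 3, q = 4.
results = wilks_lambda_tests([0.9, 0.5, 0.1], n=100, p=3, q=4)
for lam, chi2, df in results:
    print(f"Lambda={lam:.4f}  chi2={chi2:.2f}  df={df}")
```

Each statistic would then be compared with a chi-square distribution with the stated degrees of freedom to obtain a p-value.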

Data characterization
The proposed analysis is applied to 335 Colombian HEIs (higher education institutions), 165 (2016) and 170 (2017), which are divided into seven academic sectors (see Table 1). For the purposes of this study, indicators of the quality of public and private HEIs are considered, determined by the Model of Educational Performance Indicators (MIDE), see [12]. The Colombian model includes three main dimensions (students, teaching, environment) associated with educational quality. We studied the variables considered by the multidimensional MIDE model in 2016 and 2017, following the structure presented for each year (see Table 2). The data set includes information supplied by the Colombian ministry.

Approach 1 based on MEA
For a more detailed analysis, in this approach the student results are divided into two dimensions: Performance and Graduates (see Table 2). We calculated the MEA score for each institution in each year, taking the teacher education and research variables as the inputs and the student results as the outputs.
We define EFF as the subset of sector/dimension/year triples n = (s, c, t) such that 0.6 ≤ MEA(s, c, t) ≤ 1.0. The total mean efficiency of each sector is provided in Table 3, and the percentage of institutions with an MEA efficiency score equal to 1 (Full-EFF) is presented in Figure 2. Table 3 and Figure 2 show the behaviour of the sectors in each dimension throughout the study period. The performance dimension is less efficient than graduates in both years. S5 is a sector with good performance. In 2016 are the sectors with the highest average total efficiency in graduates (2017), performance and graduates (2016).
The values (percentages) in Figure 3 represent the number of times each output was used inefficiently in each sector (see equation 2). The figure shows the first, second and third outputs of each dimension, in the order established in Table 2. Remarkably, although some variables were modified (see Table 2) and the number and distribution of institutions are not the same (see Table 1), we found many similarities between the results of the two years. The three outputs in the performance dimension were equally misused, and in the graduates dimension the order of output inefficiency is the same in both years: second, third, and first output.
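The summaries behind a table of mean efficiencies and a chart of fully efficient units can be sketched as follows. This is a minimal illustration that assumes per-institution MEA scores are available as (sector, score) pairs; the function name, the dictionary keys and the data are hypothetical, and the 0.6 threshold is applied here at institution level for illustration only.

```python
from collections import defaultdict


def summarize_by_sector(scores, eff_threshold=0.6, tol=1e-9):
    """Mean score, share of scores >= threshold and share of scores equal to 1.

    scores: iterable of (sector, mea_score) pairs with mea_score in [0, 1].
    """
    grouped = defaultdict(list)
    for sector, score in scores:
        grouped[sector].append(score)

    summary = {}
    for sector, values in grouped.items():
        n = len(values)
        summary[sector] = {
            "mean": sum(values) / n,
            "eff_pct": 100.0 * sum(v >= eff_threshold for v in values) / n,
            "full_eff_pct": 100.0 * sum(abs(v - 1.0) <= tol for v in values) / n,
        }
    return summary


# Hypothetical scores for two sectors.
demo = [
    ("S1", 1.0), ("S1", 0.5), ("S1", 0.75), ("S1", 0.75),
    ("S5", 1.0), ("S5", 1.0),
]
summary = summarize_by_sector(demo)
print(summary["S1"])  # {'mean': 0.75, 'eff_pct': 75.0, 'full_eff_pct': 25.0}
```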

Approach 2: CCA
We used a CCA to establish the correlation between the academic qualification of teachers of HEIs (group of variables Y) and the results of their students (group of variables X) in Colombia.
We normalized the variables by dividing by the number of students and then applied a filter to remove the outliers. Figure 4 shows the normality and correlations of the data for 2016 before and after applying these two techniques. In this case, the filter used was V04>0.9 and V06>1.0, which reduced the sample for this year to 162 institutions. Since multivariate normality of the variables is required, we apply the Box-Cox transformation. For this, the values must be positive, which is ensured using the calculation in equation (4). Equation (4) uses the average and the standard deviation sd(·) of each variable, and the same form is applied to both groups of variables.
The correlation heat maps are shown in Figure 5 and Figure 6, where the colour intensity is proportional to the Pearson correlation values. The correlations between the variables can be read from the colour code, which runs from negative correlation (blue) through null correlation (green) to positive correlation (red). In this way we find similarities and differences in the correlations from year to year (2016: X correlation (between 6 variables), Y correlation (between 6 variables) and 2017: X correlation (between 5 variables), Y correlation (between 6 variables), see Table 2). The use of colour makes it easy to perceive the great differences between the initial data and the processed data in each year and, at the same time, from year to year. Note that the graphs reflect a higher negative correlation in the processed data.
The canonical variate pairs were analysed for each year, and the results of the five statistical tests applied are provided in Table 4. In both years, at the 5% significance level, only the last two canonical variate pairs are not significantly correlated.
To finish the analysis, two important steps are needed: identifying the contribution of each individual variable to the corresponding canonical variables Xc and Yc (i.e. the magnitudes of the coefficients), and estimating the correlation coefficients between the canonical variables and the initial variables. The results of the two steps for 2017 can be seen in Tables 6 and 7.
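A minimal sketch of the Box-Cox transformation itself. The shift used here to obtain positive values (subtracting the minimum and adding a constant) is an illustrative assumption and is not the calculation in equation (4); the parameter `lam` is supplied by the caller rather than estimated, and all names and data are hypothetical.

```python
import math


def shift_to_positive(values, eps=1.0):
    """Shift a sample so that its minimum equals eps > 0 (illustrative choice)."""
    lowest = min(values)
    return [v - lowest + eps for v in values]


def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam for lam != 0, log(x) for lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Hypothetical sample containing non-positive values.
raw = [-3.0, 0.0, 5.0]
positive = shift_to_positive(raw)      # [1.0, 4.0, 9.0]
transformed = box_cox(positive, 0.5)   # square-root case: 0, 2, 4
print(positive, transformed)
```

In an actual analysis the value of `lam` is usually chosen by maximum likelihood so that the transformed variable is as close to normal as possible.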
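The second of these steps, correlating a canonical variable with the initial variables, can be sketched as follows. This is an illustrative NumPy example; the weight vector is hypothetical and would in practice be the coefficient vector of a canonical variate obtained from the CCA.

```python
import numpy as np


def structure_correlations(data, weights):
    """Correlation between a canonical variate and each original variable.

    data: (n x p) matrix of one group of variables.
    weights: length-p coefficient vector defining the canonical variate.
    """
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    centered = data - data.mean(axis=0)
    variate = centered @ weights  # scores of the canonical variate
    return np.array(
        [np.corrcoef(variate, centered[:, k])[0, 1] for k in range(data.shape[1])]
    )


# Hypothetical data and weights: the variate is simply the first variable.
rng = np.random.default_rng(1)
students = rng.normal(size=(50, 3))
weights = np.array([1.0, 0.0, 0.0])

loadings = structure_correlations(students, weights)
print(loadings)
```

Coefficients and these correlations answer different questions: the coefficients give each variable's weight in the linear combination, while the correlations show how strongly each original variable moves together with the canonical variate.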

CONCLUSIONS
The present work uses the MEA and CCA techniques with the aim of contributing to the improvement of educational quality at the higher level, under two approaches: examining the technical efficiency of institutions (approach 1) and establishing the relationships between student performance and teacher training (approach 2). The analyses applied to Colombian higher education institutions for the years 2016 and 2017 showed interesting results in each approach. More specialized universities, with 2 to 4 areas of knowledge, obtained higher levels of efficiency than other universities. According to the teacher training, the results in quantitative reasoning, critical reading, and written communication can be improved by almost 60%. In general terms, the "performance" dimension performs better throughout the study period than the "graduates" dimension. Regarding approach 2, the results are maintained from year to year (2016-2017), with little variation in the two types of correlations found: direct and by groups.
Among the most outstanding direct correlations (within the same group of variables) are those between critical reading and written communication; between quantitative reasoning and critical reading; and between teachers with a doctorate and the number of citations. The correlations by groups of factors relate teachers with doctoral training and high research experience to students' cognitive abilities (quantitative reasoning, critical reading, and written communication), and teachers with master's degrees to students' ability to read critically, the internal scores established by the MIDE model (V11, V23) and written communication.
All of the above shows the complexity of the teaching-learning process and the importance of using suitable tools to examine and evaluate adequately each of the different factors involved. Let us not forget that in educational institutions it is important to keep up with the needs and resources available.