Effective Schools with Big Data

"Identification of good educational practices in schools with high added value using Big Data techniques" is a project funded by the BBVA Foundation under its Leonardo Grants for Researchers and Cultural Creators (2017 call). It runs for 15 months and has the following general objective:

Identification of factors associated with performance in schools with high added value for the production of a catalogue of good educational practices and its dissemination to the community.

This general objective is broken down into 3 stages carried out in chronological order.

1. Application of hierarchical linear models for identifying schools with high and low effectiveness based on large-scale assessment.

The goal of this stage is to select schools with high and low effectiveness based on the results of the Spanish sample of schools and pupils from the PISA 2015 large-scale assessment programme.

For the purposes of this project, effectiveness is understood as the specific contribution of a school to its pupils’ education once the initial contextual conditions have been controlled for at both school level (resources, size of school and classrooms, socio-economic level of the educational community, multiculturalism, etc.) and pupil level (gender, family socio-economic level, attendance at both stages of nursery education, etc.). To quantify this effectiveness, the project starts from pupils’ gross performance in the 3 major areas assessed by PISA (Reading, Mathematics and Science) and isolates the part of each school’s average performance that is not attributable to these contextual variables, yielding an average residual per school.

The statistical techniques used for these calculations are generally referred to as hierarchical linear models or multilevel models. Applying them removes the influence of contextual factors on the performance of schools and pupils and yields a measure of the part of each school’s average performance that these factors do not explain. This residual average performance obtained from the hierarchical linear models is generally known as Added Value in Education.
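The idea of an average residual per school can be illustrated with a deliberately simplified sketch. It is not a hierarchical linear model: it swaps in a single ordinary least-squares regression on one contextual covariate and then averages the pupil-level residuals within each school. The records, the covariate and the function name `school_residuals` are hypothetical and are not taken from the project or from PISA data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pupil records: (school id, socio-economic index, test score).
pupils = [
    ("A", -1.0, 470.0), ("A", 0.0, 500.0), ("A", 1.0, 530.0),
    ("B", -1.0, 450.0), ("B", 0.0, 480.0), ("B", 1.0, 510.0),
]

def school_residuals(records):
    """Average residual per school after a one-covariate OLS fit.

    Simplification: a real multilevel model would estimate school-level
    effects jointly with the covariate effects instead of in two steps.
    """
    xs = [x for _, x, _ in records]
    ys = [y for _, _, y in records]
    x_bar, y_bar = mean(xs), mean(ys)

    # Closed-form OLS slope and intercept for a single predictor.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar

    # Pupil residual = observed score minus the score expected from context.
    by_school = defaultdict(list)
    for school, x, y in records:
        by_school[school].append(y - (intercept + slope * x))

    return {school: mean(res) for school, res in by_school.items()}

residuals = school_residuals(pupils)
print(residuals)
```

In this toy data both schools serve pupils with the same socio-economic profile, so the positive residual for school A and the negative one for school B reflect differences that the covariate does not explain.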

Thus, this stage ends with identification, selection and characterisation of schools whose residual average performance is systematically and clearly maintained at higher levels (schools with high effectiveness) and lower levels (schools with low effectiveness) across the 3 areas assessed by PISA.
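A minimal sketch of this selection rule, assuming each school already has one residual per PISA area. The cutoff of 5 points, the sample values and the function name `classify_schools` are hypothetical; the project does not state the thresholds it used.

```python
DOMAINS = ("reading", "mathematics", "science")

# Hypothetical average residuals per school and area.
residuals_by_school = {
    "A": {"reading": 12.0, "mathematics": 9.5, "science": 11.0},
    "B": {"reading": -10.0, "mathematics": -8.0, "science": -12.5},
    "C": {"reading": 9.0, "mathematics": -2.0, "science": 6.0},
}

def classify_schools(residuals, cutoff):
    """Label a school 'high' or 'low' only if it is consistent in all areas."""
    labels = {}
    for school, by_domain in residuals.items():
        values = [by_domain[d] for d in DOMAINS]
        if all(v >= cutoff for v in values):
            labels[school] = "high"
        elif all(v <= -cutoff for v in values):
            labels[school] = "low"
        else:
            # Mixed or weak residuals: not selected for the next stage.
            labels[school] = "unclassified"
    return labels

print(classify_schools(residuals_by_school, cutoff=5.0))
# {'A': 'high', 'B': 'low', 'C': 'unclassified'}
```

Requiring agreement across all three areas is one way to read "systematically and clearly"; a looser rule, such as two areas out of three, would select more schools.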

2. Application of Big Data techniques for identifying factors associated with performance in schools with high effectiveness.

Having selected schools whose contribution is clearly higher than expected given their contextual conditions (high effectiveness) and schools whose contribution is lower (low effectiveness), the next step is to examine the elements, actions and specific educational dynamics that differentiate them. The ultimate goal of this stage is therefore to identify and analyse the main non-contextual factors (also known as process factors) attributable to schools that are clearly associated with high effectiveness, in contrast to schools with low effectiveness.

This stage relies on so-called Big Data techniques. Specifically, Data Mining techniques, in particular decision trees, are applied to identify the non-contextual factors associated with high or low effectiveness and the way these factors interact. The advantage of these techniques over traditional ones is that they make it possible to identify and select valuable information in all types of data sets, even large ones. Such is the case with large-scale educational assessments, such as the PISA 2015 tests addressed in this project.

Decision trees classify or group subjects according to their scores on a set of explanatory variables (in this case, non-contextual factors), taking a given criterion as the reference (in this case, whether the subjects and/or schools were classified as having high or low effectiveness). The result is a tree whose branched, hierarchical structure indicates the set of non-contextual factors that best explains why the subjects/schools were classified as having high or low effectiveness in the previous stage.
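How a decision tree arrives at such a structure can be sketched with a small CART-style classifier that splits on the threshold giving the lowest weighted Gini impurity. This is an illustration only: the project does not say which tree algorithm or software it used, and the two process factors and all values below are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for feature in rows[0]:
        for threshold in sorted({r[feature] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Grow the tree recursively; leaves hold the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feature, threshold = split
    left = [i for i, r in enumerate(rows) if r[feature] <= threshold]
    right = [i for i, r in enumerate(rows) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the branches until a leaf label is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical process-factor scores for six schools and their stage-1 labels.
schools = [
    {"teacher_collaboration": 0.8, "disciplinary_climate": 0.2},
    {"teacher_collaboration": 0.9, "disciplinary_climate": 0.7},
    {"teacher_collaboration": 0.7, "disciplinary_climate": 0.5},
    {"teacher_collaboration": 0.2, "disciplinary_climate": 0.6},
    {"teacher_collaboration": 0.3, "disciplinary_climate": 0.3},
    {"teacher_collaboration": 0.1, "disciplinary_climate": 0.8},
]
effectiveness = ["high", "high", "high", "low", "low", "low"]

tree = build_tree(schools, effectiveness)
print(tree)
```

In this toy data the tree needs only one split, on the hypothetical `teacher_collaboration` factor, to separate the two groups; the factor chosen at the root is the one the tree treats as most discriminating.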

Therefore, as the main contribution of this stage, the information obtained from these decision trees will be analysed to characterise and differentiate schools with high and low added value according to their levels in the non-contextual factors studied.

3. Design of a catalogue of good educational practices and dissemination of the results to the educational and scientific communities.

Once the empirical data have been obtained and analysed in detail, the last stage of the study aims to disseminate the results to two groups that may be interested in the project, the educational community and the scientific community:

Educational community:

First, a broad catalogue of good practices is to be disseminated, aimed at teachers, management teams and educational institutions and including thorough information on the key factors of school effectiveness identified in the project. In addition, a summary catalogue of information sheets, aimed mainly at families and other non-professional educational stakeholders, is to be published and disseminated. Lastly, this website is intended as a channel for multidirectional communication, both disseminating the results of the project and inviting feedback in order to contribute to educational improvement.

Scientific community:

The results, especially those from the first 2 stages of the project, are to be disseminated through scientific events and through high-impact national and international publications.