16 July 2020


Spotlight: Identifying Students at Risk of Dropping Out

Spotlight posts are an occasional series where we deep dive on a successful use case developed and built by a RapidMiner user. 

Keeping college students involved and engaged has long been a struggle for institutions of higher learning. In addition to the usual distractions for young people, there are the challenges of being away from home, difficult courses, and sometimes aloof professors.

Add to that the fact that college is expensive, and there’s a lot of pressure on students and families, as well as the institutions, to maintain enrollment. Many students do not finish the programs they start; according to some studies, student drop-out rates are as high as 40%.

One South American university decided to use advanced data analytics to predict which students were at risk of dropping out.

Starting the project

Prompted by the 2019 book The College Dropout Scandal, the IT department at Pontificia Universidad Javeriana (Pontifical Xavierian University), a highly regarded Jesuit institution in Colombia, embarked on a machine learning project to identify college students who are at risk of dropping out of school.

The development of the dropout identification algorithm was led by Jaime Reinoso Castillo, the director of the Centro de Servicios Informáticos (CSI) at Pontificia Universidad Javeriana.

To develop their algorithm, Jaime and his team collected a wide array of data from across the university, including attendance records, financial data, students’ vocational intentions, and more.

Getting buy-in

Analytics projects like this often require executive-level sponsorship and buy-in. In this case, the biggest support came from the university’s Chief Financial Officer, as one might expect given the monetary implications of keeping students enrolled.

As the Chief Academic Officer, the university provost was very interested as well, given that some of the data inputs came from academic programs. As it turned out, some academic programs were more predictable than others.

“Our hypothesis is that those programs with strong strategies for avoiding or reducing academic dropout are those that generate the less predictive values,” Reinoso said. “This is because the set of students that really drop out in those cases are those with stranger or more atypical causes. For example, our psychology program has this behavior because they do extensive interviews with each student during the semester. So, they have a very good knowledge of what is happening to each student. In that case, those students that drop out were those with the most extreme cases.”
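One way to see which programs are "more predictable" is to score the model separately for each program. The sketch below is only an illustration, not the team's actual method (the project was built in RapidMiner): it assumes each student record is reduced to a hypothetical (program, actually dropped out, predicted to drop out) tuple and computes recall per program, that is, the share of real dropouts the model flagged. The records are made up for the example.

```python
from collections import defaultdict


def recall_by_program(records):
    """Return {program: recall}, where recall = flagged dropouts / actual dropouts.

    Each record is a (program, actually_dropped_out, predicted_dropout) tuple.
    Programs with no actual dropouts are omitted, because recall is undefined there.
    """
    caught = defaultdict(int)   # actual dropouts the model flagged, per program
    actual = defaultdict(int)   # all actual dropouts, per program
    for program, dropped_out, predicted in records:
        if dropped_out:
            actual[program] += 1
            if predicted:
                caught[program] += 1
    return {program: caught[program] / actual[program] for program in actual}


# Made-up illustrative records, not data from the university.
example = [
    ("psychology", True, False),
    ("psychology", True, False),
    ("psychology", True, True),
    ("psychology", False, False),
    ("engineering", True, True),
    ("engineering", True, True),
    ("engineering", True, False),
    ("engineering", False, False),
]

for program, recall in sorted(recall_by_program(example).items()):
    print(f"{program}: {recall:.2f}")
```

Under Reinoso's hypothesis, a program with strong retention strategies would show lower recall, because the students who still leave do so for atypical reasons the data does not capture.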

Because academic programs influence drop-out rates in this way, data from the academic programs and students’ academic records proved to be the most powerful predictors.

Data difficulties

Reinoso also described some of the data collection and management issues his team dealt with during the course of this project.

“We discovered that we required 5 years of historic data… [B]ecause the principal source of the information is the Academic Register System (PeopleSoft in our case), [the] information is pretty stable and high quality. Other additional information required much more work. For example, information about extra-academic activities came from other sources.” Reinoso said that data analysis “happened pretty fast, and was completed in a couple of weeks” once the data were in hand.
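Combining a stable primary source with patchier secondary ones usually comes down to joining records on a student identifier. The following is a minimal sketch under stated assumptions: the field names (`student_id`, `gpa`, `activities`) and the function are hypothetical, and nothing here reflects the university's actual schema or tooling. It keeps every row from the academic register and attaches an activity count where one exists, leaving `None` where the secondary source has no entry so that missing data is not mistaken for zero.

```python
def merge_student_sources(register_rows, activity_rows):
    """Left-join activity counts onto academic register rows by student_id.

    register_rows: list of dicts from the primary source (one per student).
    activity_rows: list of dicts from a secondary source; a student may appear
                   zero, one, or several times.
    """
    # Sum activity counts per student, since a student may have several rows.
    totals = {}
    for row in activity_rows:
        sid = row["student_id"]
        totals[sid] = totals.get(sid, 0) + row["activities"]

    merged = []
    for row in register_rows:
        combined = dict(row)  # copy so the input rows are not mutated
        # None marks "no record in the secondary source", which is not the same as 0.
        combined["activities"] = totals.get(row["student_id"])
        merged.append(combined)
    return merged


# Made-up rows for illustration only.
register = [
    {"student_id": "A1", "gpa": 3.4},
    {"student_id": "B2", "gpa": 2.1},
]
activities = [
    {"student_id": "A1", "activities": 2},
    {"student_id": "A1", "activities": 1},
]

for row in merge_student_sources(register, activities):
    print(row)
```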

The project was not shared with any students, and was geared toward academic program managers who make decisions about strategies to reduce or avoid losing students. “Nevertheless,” Reinoso said, “some staff had adverse reactions because they do not understand the technology and they felt that they were dealing with a black box.”

From a budget perspective, this project initially took two engineers two months. Reinoso thought that subsequent runs of the project would be less expensive because programs to extract and collate the data wouldn’t need to be built again. From the perspective of ROI, Reinoso believes “that at minimum a 10% reduction in academic dropout can be reached easily.”

University enrollments during a pandemic

Given the current global pandemic, higher education is struggling to determine which students will attend in the first place, never mind which will drop out. We asked Reinoso if he thought that the machine learning algorithm he developed might help administrators gain additional insights into student behaviors.

“We believe that academic dropout behavior is going to change. So we are waiting to complete data for this year (2020) in order to apply the algorithms we developed again. We do have some new data available now: for example, we have data about Panopto’s and Zoom statistics that should be added to the original data set.”

Concluding thoughts

Lowering the college student dropout rate is an ongoing worldwide challenge, and the insights provided by Pontificia Universidad Javeriana Cali’s machine learning project might help universities understand and manage the challenges, especially in the current environment.

If you’d like to see what kind of impact machine learning can have on your business, sign up for a free AI assessment. We’ll walk you through possible use cases and see where you can have the biggest impact with AI.
