It’s easy enough for students to sign up for online university courses, but getting them to finish is much harder. Dropout rates for online courses can be as high as 80 percent. Researchers have tried to help by developing early warning systems that predict which students are more likely to drop out. Administrators could then use these predictions to target at-risk students with extra retention efforts. And as these early warning systems become more sophisticated, they also reveal which variables are most closely correlated with dropout risk.
In a paper published 16 April in IEEE Transactions on Learning Technologies, a team of researchers in Spain describes an early warning system that uses machine learning to provide tailored predictions for both new and recurrent students. The system, called SPA (a Spanish acronym for Dropout Prevention System), was developed using data from more than 11,000 students who were enrolled in online programs at Madrid Open University (UDIMA) over the course of five years.
Juan J. Alcolea, director of analytics at the company Dimetrical, saw an opportunity with such a large data set and sought to partner with researchers at the online university, as well as those at Universidad Autónoma de Madrid, to develop SPA using machine-learning techniques. “Dropout has always been a key concern in higher education institutions, so it was a clear problem to benefit from these new techniques,” he says. “Especially in distance education, where the dropout problem is traditionally bigger—but the amount of potentially useful information available [is also bigger].”
SPA includes not only personal data (e.g., age or gender), economic data (e.g., fee payment type), administrative data, academic results, and early/late enrollment information. Critically, it also incorporates behavioral data from the university’s online learning management system. This includes data capturing the time of day and duration of student activity, for example. SPA may consider as many as 120 variables when creating a risk profile for each student, which is stated as an overall percentage (for example, student A has a 60 percent chance of dropping out).
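The paper does not publish SPA's model or weights, but the idea of compressing many heterogeneous variables into a single risk percentage can be sketched with a simple logistic-style score. The feature names, weights, and bias below are invented for illustration and are not SPA's real model:

```python
import math

def dropout_risk(features, weights, bias):
    """Map a student's feature values to a risk percentage in [0, 100].

    A trained classifier would learn the weights from historical data;
    here they are hand-picked purely to illustrate the mechanics.
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    return round(100 / (1 + math.exp(-z)))  # logistic function, as a percentage

# Illustrative (hypothetical) features and weights:
weights = {
    "age_under_20": 1.2,          # younger students: higher risk
    "lunch_activity_share": 0.8,  # lunchtime activity: higher risk
    "night_activity_share": -0.9, # nighttime activity: lower risk
}
student_a = {"age_under_20": 1.0, "lunch_activity_share": 0.4, "night_activity_share": 0.1}

risk = dropout_risk(student_a, weights, bias=-1.0)
print(f"Student A: {risk}% estimated chance of dropping out")
```

In a real system such as SPA, the weights (or a more flexible model) would be fit to the historical records of thousands of students rather than set by hand.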
For new students, less data is available. But as the machine-learning algorithms account for additional variables for recurrent students, interesting patterns emerge. “Features not available for new students, such as performance rate in previous years or percent of degree completion…are consistently chosen by models for recurrent students, revealing their superior predictive power over many of the common features available for both types of students,” explains Alcolea. In other words, some of the factors that best predict a student’s likelihood of sticking with their classes can be measured only after students have stuck with the program for at least a year.
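This two-track setup can be sketched as a simple feature-selection step: new students are scored with only the features available at enrollment, while recurrent students get history-based features on top. The feature names below are assumptions based on the article's description, not SPA's actual schema:

```python
# Features available for every student at enrollment (hypothetical names):
COMMON_FEATURES = ["age", "fee_payment_type", "enrollment_timing"]

# History-based features, only measurable after at least one year enrolled:
RECURRENT_ONLY = ["prior_year_performance_rate", "degree_completion_pct"]

def features_for(student):
    """Choose the feature set for a student record (a dict).

    A recurrent student (one or more completed years) gets the richer
    feature set that the article reports has superior predictive power.
    """
    if student.get("years_enrolled", 0) >= 1:
        return COMMON_FEATURES + RECURRENT_ONLY
    return COMMON_FEATURES

new_student = {"years_enrolled": 0}
recurrent_student = {"years_enrolled": 2}
print(features_for(new_student))        # common features only
print(features_for(recurrent_student))  # common plus history features
```

In practice each track would feed a separately trained model, since the two feature spaces differ.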
Still, the variables that SPA focuses on hint at some intriguing patterns for dropout risk. For instance, age was an important factor in predicting dropout risk for new students: students under the age of 20 were more likely to drop out than older students. How students distribute their online activity throughout the day—specifically, at lunchtime or at night—is also related to risk; more activity during lunch hours was associated with higher risk scores, while more activity at night was associated with lower risk scores. Notably, SPA hinted at a gender gap, with women at a higher risk of dropping out compared with men.
“Metrics like the amount, length, and timing of messages among students and their teachers, or various features based on activity trends—like activity that remains constant, increases, or decreases—seem to have low or no predictive power at all, contrary to what we initially thought,” Alcolea says.
Susan Therriault, a managing researcher at the American Institutes for Research, specializes in developing early warning systems for K-12 schools in the United States. Therriault says that early warning systems like SPA, which are built on data from online programs, can incorporate far more data than is traditionally available to early warning systems for K-12 schools, where data may be limited to just attendance or grades, for example. But she’s cautious about jumping to conclusions on patterns revealed by predictive modeling tools. “One of the things that’s pretty clear is that predictive analytics demonstrates symptoms and not the problems, and you can’t necessarily diagnose [those problems] with the symptom information. You usually have to dig deeper,” she says.
While SPA reveals some intriguing factors for predicting dropout risk, the tool was ultimately designed for more practical applications—retaining students. University administrators can generate monthly reports that assign students risk scores, both in absolute terms and relative to their peers. When administrators identify students at risk of dropping out, officials can reach out by email or phone. Moving forward, Alcolea says his team plans to analyze the effectiveness of various retention measures.
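A monthly report of the kind the article describes pairs each student's absolute risk percentage with a rank relative to peers. The sketch below invents the report layout, the percentile formula, and the student scores purely for illustration:

```python
def monthly_report(scores):
    """Build report rows from a dict of student -> absolute risk (0-100).

    Each row carries the absolute risk percentage and a peer percentile
    (the share of peers with a lower risk score), sorted riskiest first.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    n = len(ordered)
    return [
        {"student": name, "risk_pct": risk,
         "peer_percentile": round(100 * (n - rank) / n)}
        for rank, (name, risk) in enumerate(ordered, start=1)
    ]

# Invented example scores:
report = monthly_report({"A": 60, "B": 25, "C": 80})
for row in report:
    print(row)
```

Administrators could then work down such a list from the top, contacting the highest-risk students by email or phone, as the article describes.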