Early Warning System Predicts Risk of Online Students Dropping Out

With the new system, every student is scored based on how likely they are to finish their courses


It’s easy enough for students to sign up for online university courses, but getting them to finish is much harder. Dropout rates for online courses can be as high as 80 percent. Researchers have tried to help by developing early warning systems that predict which students are more likely to drop out. Administrators could then use these predictions to target at-risk students with extra retention efforts. And as these early warning systems become more sophisticated, they also reveal which variables are most closely correlated with dropout risk.

In a paper published 16 April in IEEE Transactions on Learning Technologies, a team of researchers in Spain describe an early warning system that uses machine learning to provide tailored predictions for both new and recurrent students. The system, called SPA (a Spanish acronym for Dropout Prevention System), was developed using data from more than 11,000 students who were enrolled in online programs at Madrid Open University (UDIMA) over the course of five years.

Juan J. Alcolea, director of analytics at the company Dimetrical, saw an opportunity with such a large data set and sought to partner with researchers at the online university, as well as those at Universidad Autónoma de Madrid, to develop SPA using machine-learning techniques. “Dropout has always been a key concern in higher education institutions, so it was a clear problem to benefit from these new techniques,” he says. “Especially in distance education, where the dropout problem is traditionally bigger—but the amount of potentially useful information available [is also bigger].”

SPA draws on personal data (such as age or gender), economic data (such as fee payment type), administrative data, academic results, and early/late enrollment information. Critically, it also incorporates behavioral data from the university’s online learning management system, including the time of day and duration of student activity. SPA may consider as many as 120 variables when creating a risk profile for each student, which is stated as an overall percentage (for example, student A has a 60 percent chance of dropping out).
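To make the idea of a percentage risk score concrete, here is a minimal sketch in Python. It is not SPA's actual model, which the article does not describe; it assumes a simple logistic score over three made-up features, and every feature name, weight, and the bias value is hypothetical.

```python
import math

# Hypothetical weights for illustration only; none of these come from SPA.
WEIGHTS = {
    "late_enrollment": 0.8,        # 1.0 if the student enrolled late, else 0.0
    "lunch_activity_share": 0.9,   # fraction of online activity at lunchtime
    "night_activity_share": -1.2,  # fraction of online activity at night
}
BIAS = -0.5  # hypothetical baseline term


def dropout_risk(features):
    """Map a dict of feature values to a dropout probability between 0 and 1."""
    # Weighted sum of the known features; missing features count as 0.
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    # The logistic function squashes the sum into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


student_a = {
    "late_enrollment": 1.0,
    "lunch_activity_share": 0.4,
    "night_activity_share": 0.1,
}
print(f"Student A: {dropout_risk(student_a):.0%} chance of dropping out")
```

A real system with up to 120 variables would learn its weights from historical enrollment data rather than have them set by hand.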

For new students, less data is available. But as the machine-learning algorithms account for additional variables for recurring students, interesting patterns emerge. “Features not available for new students, such as performance rate in previous years or percent of degree completion…are consistently chosen by models for recurrent students, revealing their superior predictive power over many of the common features available for both types of students,” explains Alcolea. In other words, some of the factors that best predict a student’s likelihood of sticking with their classes can be measured only after students have stuck with the program for at least a year.

Still, the variables that SPA focuses on hint at some intriguing patterns for dropout risk. For instance, age was an important factor in predicting dropout risk for new students: those under the age of 20 were more likely to drop out than older students. How students distribute their online activity throughout the day, specifically at lunchtime or at night, is also related to risk. More activity during lunch hours was associated with higher risk scores, while more activity at night was associated with lower risk scores. Notably, SPA hinted at a gender gap, with women at a higher risk of dropping out compared with men.

“Metrics like the amount, length, and timing of messages among students and their teachers, or various features based on activity trends—like activity that remains constant, increases, or decreases—seem to have low or no predictive power at all, contrary to what we initially thought,” Alcolea says.

Susan Therriault, a managing researcher at the American Institutes for Research, specializes in developing early warning systems for K-12 schools in the United States. Therriault says that early warning systems like SPA, which are built on data from online programs, can incorporate far more data than is traditionally included in early warning systems for K-12, where data may be limited to just attendance or grades, for example. But she’s cautious about jumping to conclusions on patterns revealed by predictive modeling tools. “One of the things that’s pretty clear is that predictive analytics demonstrates symptoms and not the problems, and you can’t necessarily diagnose [those problems] with the symptom information. You usually have to dig deeper,” she says.

While SPA reveals some intriguing factors for predicting dropout risk, the tool was ultimately designed for more practical applications—retaining students. University administrators can generate monthly reports that assign students risk scores, both in absolute terms and relative to their peers. When administrators identify students at risk of dropping out, officials can reach out by email or phone. Moving forward, Alcolea says his team plans to analyze the effectiveness of various retention measures.
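A monthly report of this kind, with risk in absolute terms and relative to peers, could be sketched as follows. This is an illustrative assumption rather than SPA's reporting code: the function name, the percentile definition, and the 50 percent contact threshold are all hypothetical.

```python
def monthly_report(risks, threshold=0.5):
    """Build report rows from a dict of student id -> dropout probability.

    Each row holds the absolute risk, the student's percentile among peers
    (share of other students with strictly lower risk), and a flag marking
    students at or above the hypothetical contact threshold.
    """
    n = len(risks)
    rows = []
    for student, risk in risks.items():
        lower = sum(1 for other in risks.values() if other < risk)
        percentile = lower / (n - 1) if n > 1 else 0.0
        rows.append({
            "student": student,
            "risk": risk,
            "percentile": percentile,
            "contact": risk >= threshold,
        })
    # Highest-risk students first, so staff see them at the top of the report.
    rows.sort(key=lambda row: row["risk"], reverse=True)
    return rows


# Made-up scores for four students.
scores = {"A": 0.60, "B": 0.15, "C": 0.82, "D": 0.40}
for row in monthly_report(scores):
    print(row)
```

Listing both numbers lets staff separate a student who is risky in absolute terms from one who merely stands out against a low-risk cohort.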
