Whether or not massive open online courses (MOOCs) are the future of education, one thing is certain: They are massive. In 2011, one computer-science MOOC at Stanford University attracted 160 000 students.
How do you grade that many students? If it’s a multiple-choice exam, it’s easy. But what if you want to measure the quality of an answer?
Grading student-produced software poses exactly that sort of problem. You might think that computer software could easily measure not only whether another program works but also how efficiently it runs and how well it’s written. After all, the computer is grading something in its own language. But such automated grading hasn’t been as easy as you’d expect.
According to Varun Aggarwal, the chief technology officer for Aspiring Minds, a company based in Gurgaon, India, that assesses education and training techniques for institutions, automated program grading remains a problem because there are just too many correct—or nearly correct—ways to write a program that performs a desired function.
Counting the test cases a program passes is currently the most widely used method for autograding computer programs. The problem with this method is that a student’s program may pass the tests yet arrive at its result inefficiently or rely on bad programming practices. Conversely, code may fail most of the test cases simply because of small, inadvertent errors, even though it is logically quite close to a correct solution and deserves a high grade.
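A minimal sketch of test-case-based autograding makes both failure modes concrete. The task (summing 1 through n), the three submissions, and all function names are hypothetical; the score is simply the fraction of test cases passed.

```python
from typing import Callable, List, Tuple

# A test case pairs an input with the expected output.
TestCase = Tuple[int, int]


def score_by_tests(program: Callable[[int], int], tests: List[TestCase]) -> float:
    """Return the fraction of test cases the program passes (0.0 to 1.0)."""
    passed = 0
    for arg, expected in tests:
        try:
            if program(arg) == expected:
                passed += 1
        except Exception:
            # A crash on a test case simply counts as a failure.
            pass
    return passed / len(tests)


# Hypothetical task: return 1 + 2 + ... + n.
def correct_sum(n: int) -> int:
    return n * (n + 1) // 2


def slow_sum(n: int) -> int:
    # Correct, but needlessly slow: an inner loop counts up to each i.
    total = 0
    for i in range(1, n + 1):
        for _ in range(i):
            total += 1
    return total


def off_by_one_sum(n: int) -> int:
    # Logically almost right, but the range stops one step early.
    return sum(range(1, n))


TESTS = [(1, 1), (2, 3), (5, 15), (10, 55)]

for fn in (correct_sum, slow_sum, off_by_one_sum):
    print(fn.__name__, score_by_tests(fn, TESTS))
# correct_sum 1.0
# slow_sum 1.0
# off_by_one_sum 0.0
```

The inefficient submission earns full marks, while the nearly correct one with a single wrong loop bound earns nothing, which is exactly the mismatch with human judgment described above.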
Aggarwal and his colleagues at Aspiring Minds thought: What if we just abandon the test-cases metric and develop an automated grading system that approximates how a human would grade? This set them off on a quest that ultimately led them to an artificial-intelligence (AI) approach involving machine learning. AI has only recently become a possible solution for improving the automated grading of programs because MOOCs are making an avalanche of human-graded data points available.
“Artificial-intelligence approaches, like machine-learning methods, are extremely data hungry,” explains Aggarwal. “And we think these are the best set of tools to use to solve this particular problem of the digital age, and MOOCs in particular promise an abundance of such digitized samples.”
To set the AI in the right direction, Aggarwal and his colleagues looked at a set of programs graded by experts to identify what factors earn programs a particular grade. From the human-graded programs, the AI was able to automatically derive those features that a human would weigh while grading a program.
Then Aggarwal and his colleagues applied machine-learning techniques, such as regression analysis, to find which features of a particular programming problem actually distinguish the good programs from the not-so-good ones. For instance, an important feature could be the number of nested loops a program uses.
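To make the idea of a program “feature” concrete, here is a minimal sketch that extracts a few structural features, including loop nesting depth, from a submission. It assumes submissions are written in Python and uses the standard `ast` module; the feature names and the choice of features are hypothetical, not those in the Aspiring Minds paper.

```python
import ast

# Loop node types in Python's abstract syntax tree.
LOOP_TYPES = (ast.For, ast.AsyncFor, ast.While)


def max_loop_depth(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest level of loop nesting at or below `node`."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        # Entering a loop node increases the nesting depth by one.
        child_depth = depth + 1 if isinstance(child, LOOP_TYPES) else depth
        deepest = max(deepest, max_loop_depth(child, child_depth))
    return deepest


def extract_features(source: str) -> dict:
    """Map a submission's source code to a few hypothetical features."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    return {
        "num_loops": sum(isinstance(n, LOOP_TYPES) for n in nodes),
        "max_loop_depth": max_loop_depth(tree),
        "num_branches": sum(isinstance(n, ast.If) for n in nodes),
    }


# Two hypothetical ways to count the pairs (i, j) with j < i < n.
NESTED = """
total = 0
for i in range(n):
    for j in range(i):
        total += 1
"""

FLAT = "total = n * (n - 1) // 2"

print(extract_features(NESTED))  # {'num_loops': 2, 'max_loop_depth': 2, 'num_branches': 0}
print(extract_features(FLAT))    # {'num_loops': 0, 'max_loop_depth': 0, 'num_branches': 0}
```

Both snippets compute the same value, but their feature vectors differ; whether a deeper loop nest should raise or lower the grade is something the model would have to learn from human-graded examples.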
“Through regression, we test whether programs having a particular feature are graded higher than others,” explains Shashank Srikant, an R&D engineer at Aspiring Minds and coauthor of the paper describing the AI, which was presented at the International Conference on Machine Learning and Applications (ICMLA’13) last month. If the answer is yes, according to Srikant, it becomes a useful feature. Then the AI derives a library of features from the programs. A model based on the useful features is then used to assign the grade to any new program the AI needs to evaluate.
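The two steps Srikant describes, screening features and then building a grading model from the useful ones, can be sketched in plain Python. Everything here is an illustrative assumption: the three binary features and the eight grades are invented toy data, not figures from the study; the 0.5 screening threshold is arbitrary; and the model is ordinary least-squares linear regression, which may differ from what the authors actually used. For a single yes/no feature, the least-squares slope equals the difference in mean grades between programs with and without it, so the screening step compares those means directly.

```python
from statistics import mean

# Invented toy data (NOT from the Aspiring Minds study): each program is
# described by three hypothetical binary features and paired with a human grade.
FEATURES = ["handles_edge_cases", "uses_nested_loops", "has_debug_prints"]
PROGRAMS = [
    ([1, 0, 0], 5.0), ([1, 1, 1], 4.0), ([0, 0, 1], 2.0), ([0, 1, 0], 1.0),
    ([1, 0, 1], 5.0), ([0, 1, 1], 1.0), ([1, 1, 0], 4.0), ([0, 0, 0], 2.0),
]


def grade_gap(col: int) -> float:
    """Mean grade of programs with the feature minus those without it."""
    with_f = [g for x, g in PROGRAMS if x[col]]
    without_f = [g for x, g in PROGRAMS if not x[col]]
    return mean(with_f) - mean(without_f)


def select_features(threshold: float = 0.5) -> list:
    """Keep features whose presence shifts the average grade noticeably."""
    return [c for c in range(len(FEATURES)) if abs(grade_gap(c)) >= threshold]


def solve(a: list, b: list) -> list:
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(cols: list) -> list:
    """Least squares via the normal equations: grade ~ w0 + sum(w_k * x_k)."""
    xs = [[1.0] + [x[c] for c in cols] for x, _ in PROGRAMS]
    ys = [g for _, g in PROGRAMS]
    k = len(xs[0])
    xtx = [[sum(r[i] * r[j] for r in xs) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: list, cols: list, features: list) -> float:
    """Grade a new program from its full feature vector."""
    return weights[0] + sum(w * features[c] for w, c in zip(weights[1:], cols))


cols = select_features()
weights = fit(cols)
print([FEATURES[c] for c in cols])                   # ['handles_edge_cases', 'uses_nested_loops']
print([round(w, 3) for w in weights])                # [2.0, 3.0, -1.0]
print(round(predict(weights, cols, [1, 1, 0]), 3))   # 4.0
```

In this toy set, the debug-print feature makes no difference to the average grade and is dropped, while the other two survive and form the model used to grade an unseen program.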
The machine-learning technique compared quite favorably with other automated-grading methods. For example, grades given by test-case-based autograding have a correlation of 0.6 to 0.75 with those given by human graders, where correlation measures how closely two sets of scores rise and fall together and a value of 1 means they move in perfect lockstep. By comparison, the machine-learning approach achieved a markedly higher correlation of 0.8 to 0.9.
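To make those correlation figures concrete, here is a minimal sketch of the Pearson correlation coefficient, assuming that is the measure used (the article does not say which correlation was computed). The grade lists are invented for illustration.

```python
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Unnormalized covariance and spreads; the 1/n factors would cancel anyway.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented toy grades for five programs (not data from the study).
human = [1, 2, 3, 4, 5]
auto = [2, 1, 4, 3, 5]             # swaps two pairs of neighboring grades
shifted = [g + 1 for g in human]   # always one point too generous

print(round(pearson(human, auto), 3))     # 0.8
print(round(pearson(human, shifted), 3))  # 1.0
```

Adding a constant to every score leaves the correlation at 1, so a high correlation means the autograder’s scores track the human grader’s closely, not necessarily that it awards identical marks.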
Aspiring Minds is already in discussions with MOOC providers and the universities behind them about collaborating and drawing on the universities’ datasets of human-graded programs. In the meantime, the company will offer this automated grading in its computer programming assessment product, Automata, in the first quarter of 2014. To beat out today’s methods, “we need to show the validity of our approach on more problems and data sets,” says Aggarwal. “As people see the approach works well on their class data from a previous year, they will become confident to use it the next year. The more data we have, the better models we can make.”
While the AI technique promises to bring improved automated grading to MOOCs, it may also prove beneficial to the students learning programming.
“I think the major disruption of this work will be in building automatic programming teaching systems, providing feedback, hints, and guidance in real time to students based on their submitted code,” says Aggarwal. “This can superscale programming education across the world, which is right now possible only with very good teaching assistants.”
About the Author
Dexter Johnson, based in Madrid, has been writing the Nanoclast blog for IEEE Spectrum since 2009.