New Zealand Startup Seeks to Automate (Most) Code Review

CodeLingo has developed a tool that it says can help developers with code reviews, refactoring, and documentation

[Photo: Focused black female programmer coding on a computer in an office. Credit: Getty Images]

Software developers are force multipliers. Yet instead of spending their time building new products or services, they waste too much of it maintaining existing code.

CodeLingo, a New Zealand-based startup founded in 2016, aims to change that. The company has developed an automated code review tool that catches new and existing issues in code. CodeLingo's search engine and query language find patterns across the code base and use those patterns to automate code reviews, code refactoring (restructuring existing code to optimize it), and contributor documentation.

According to a 2018 study by Stripe, developers could increase the global GDP by US $3 trillion over the next 10 years. But developers spend almost half of their working hours—that’s 17 hours in an average 41-hour work week—on code maintenance. This includes finding and repairing bugs, fixing bad code, and refactoring. This equates to an estimated $300 billion loss in productivity each year.

CodeLingo hopes to recapture some of that lost time so developers can spend it on building new products. “CodeLingo is, in essence, an analysis platform,” says founder Jesse Meek. “It treats the whole software stack as data, then looks for patterns in that data and ways to automate common development workflows, such as finding and fixing bugs, automatically refactoring the code base, automating reviews of pull requests as they come into a repository, and automating the generation of contributor documentation.”

[Image: The CodeLingo tool catches an issue specific to this project in a batch of old code. Credit: CodeLingo]

To represent the software stack as data, CodeLingo extracts and transforms it into vertices and edges, which are then loaded into a graph. The platform uses custom-built query patterns, which the company calls tenets, to scan the graph for matches, then performs actions that have been associated with those patterns in the past, such as adding code review comments and lines of documentation, or rewriting code. “CodeLingo is basically a query engine for a graph,” Meek says.
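The extract, load, and query steps described above can be illustrated with a rough sketch. This is not CodeLingo's implementation. It assumes Python's standard `ast` module as the extractor, and the names `build_graph`, `find_calls`, and `review_comments` are hypothetical helpers made up for this illustration.

```python
import ast

SOURCE = '''
def load(path):
    f = open(path)
    return f.read()

def show(path):
    print(load(path))
'''

def build_graph(source):
    """Extract a syntax tree and flatten it into vertices and labeled edges."""
    nodes, index = [], {}
    for node in ast.walk(ast.parse(source)):
        if id(node) not in index:          # some context nodes are shared
            index[id(node)] = len(nodes)
            nodes.append(node)
    vertices = [
        {
            "kind": type(n).__name__,
            "name": getattr(n, "id", None) or getattr(n, "name", None),
            "line": getattr(n, "lineno", None),
        }
        for n in nodes
    ]
    edges = []  # (parent vertex, field label, child vertex)
    for n in nodes:
        for field, value in ast.iter_fields(n):
            for child in (value if isinstance(value, list) else [value]):
                if isinstance(child, ast.AST):
                    edges.append((index[id(n)], field, index[id(child)]))
    return vertices, edges

def find_calls(vertices, edges, func_name):
    """Query the graph: Call vertex --func--> Name vertex with the given name."""
    hits = []
    for src, label, dst in edges:
        if (vertices[src]["kind"] == "Call" and label == "func"
                and vertices[dst]["kind"] == "Name"
                and vertices[dst]["name"] == func_name):
            hits.append(vertices[src]["line"])
    return sorted(hits)

def review_comments(source, func_name, message):
    """Action attached to the pattern: emit one review comment per match."""
    vertices, edges = build_graph(source)
    return [f"line {line}: {message}"
            for line in find_calls(vertices, edges, func_name)]

if __name__ == "__main__":
    for comment in review_comments(SOURCE, "open", "use a with-block"):
        print(comment)
```

Labeling each edge with the field it came from lets the query tell a function being called apart from a function merely passed as an argument, which a plain parent-child edge could not do.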

[Image: This tenet corresponds to the automated workflow shown in the above example. Credit: CodeLingo]

A tenet, the company says, is an “underlying principle guiding a workflow” in CodeLingo. It defines a team’s coding standards and best practices. It’s essentially a query pattern written in CodeLingo’s custom query language, CLQL (CodeLingo Query Language), and is attached to certain workflows to automate them. CodeLingo offers ready-made tenet bundles that can be applied to existing code, but developers can also write their own.
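To make the idea of one pattern feeding several workflows concrete, here is a minimal sketch in Python. It does not use CLQL, whose syntax is not shown in this article. The `Tenet` class, the `BUNDLE` list, and the two example rules are all assumptions invented for illustration.

```python
import ast
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tenet:
    """Hypothetical stand-in for a tenet: a pattern plus the guidance it carries."""
    name: str
    matches: Callable[[ast.AST], bool]
    comment: str

def calls(func_name):
    """Build a pattern that matches direct calls to func_name."""
    def matches(node):
        return (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func_name)
    return matches

def compares_to_none(node):
    """Match `x == None` or `x != None` (None on the right-hand side only)."""
    return (isinstance(node, ast.Compare)
            and any(isinstance(op, (ast.Eq, ast.NotEq))
                    and isinstance(right, ast.Constant)
                    and right.value is None
                    for op, right in zip(node.ops, node.comparators)))

# A small "bundle" of tenets; a team could append its own rules here.
BUNDLE = [
    Tenet("no-eval", calls("eval"), "Avoid eval(); parse the input explicitly."),
    Tenet("none-identity", compares_to_none,
          "Compare with None using 'is' or 'is not'."),
]

def review(source, bundle):
    """Review workflow: one (line, tenet name, comment) finding per match."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        for tenet in bundle:
            if tenet.matches(node):
                findings.append((node.lineno, tenet.name, tenet.comment))
    return sorted(findings)

def contributor_docs(bundle):
    """Documentation workflow: render the same tenets as guideline bullets."""
    return "\n".join(f"- {t.name}: {t.comment}" for t in bundle)

if __name__ == "__main__":
    patch = "x = eval(data)\nif x == None:\n    x = 0\n"
    for line, name, comment in review(patch, BUNDLE):
        print(f"line {line} [{name}] {comment}")
    print(contributor_docs(BUNDLE))
```

In this sketch the rule is written once and reused: the review workflow turns it into comments on a patch, and the documentation workflow turns it into a line of contributor guidelines.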

Automating software development workflows could speed up the coding process. “Avoiding manual code reviews is time-saving. Code will be generated faster if programmers are able to focus just on code development,” says Diana de la Iglesia, a computer scientist at the Spanish National Cancer Research Center.

For Lana Sinapayen, a researcher at Sony Computer Science Laboratories in Japan, the strength of an automated tool is twofold. “It forces teams to make implicit assumptions explicit (‘We use camel case’ or ‘This function should be used rather than that other one’) and enforces those shared assumptions automatically.”

But the challenge, she adds, is that you may wind up spending more time using the tool than you would doing the task at hand. “You have to write and maintain the rules telling the tool what to do, and that constitutes overhead time,” she says.
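Both sides of Sinapayen's point can be seen in a small sketch of her camel case example. The code below is illustrative only and is unrelated to CodeLingo's own rule format. The regular expression, the `ALLOWED` exception list, and the `naming_violations` helper are assumptions made for this example.

```python
import ast
import re

# The implicit assumption "we use camel case", made explicit as a pattern.
CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")

# Exceptions the team has agreed on. Keeping this list current is part of
# the overhead of maintaining the rule.
ALLOWED = {"__init__"}

def naming_violations(source, allowed=ALLOWED):
    """Return (line, name) for every function whose name breaks the rule."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name not in allowed and not CAMEL_CASE.match(node.name):
                violations.append((node.lineno, node.name))
    return sorted(violations)

if __name__ == "__main__":
    sample = (
        "class Reader:\n"
        "    def __init__(self): pass\n"
        "    def loadFile(self): pass\n"
        "    def save_file(self): pass\n"
        "def ParseAll(): pass\n"
    )
    for line, name in naming_violations(sample):
        print(f"line {line}: '{name}' is not camel case")
```

The check itself is a few lines, but every new exception (special method names, generated code, legacy modules) has to be added to the allowlist by hand, which is the maintenance cost she describes.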

What makes CodeLingo different from other code quality methods, according to Meek, is its balance between the generic and the specific. “There’s a sweet spot in the middle—codifiable knowledge and practices general enough that they warrant automation but specific enough that you won’t find them in off-the-shelf solutions,” he says.

That’s where CodeLingo sits, combining the generic functionality of linters—tools that analyze code in a particular programming language for errors—and the specificity of code reviews to allow software development teams to scale without sacrificing code quality.

But teams still shouldn’t rely on tools alone. “A hybrid approach where developers commit to producing clean code, follow standard coding style guides, and stick to good practices while getting additional support from tools like CodeLingo would be ideal,” says de la Iglesia.

Though CodeLingo is still in beta, it has already helped repair errors in the code bases of companies and projects including Baidu, Dropbox, HelloFresh, Kubernetes, and Sky UK. In March 2019, the team launched an awareness campaign in which they looked at past issues within a handful of GitHub repositories and coded up tenets for them, free of charge. The company is in the final stages of its campaign and will be gathering metrics to drive a seed funding round.

Meek says that for every five repositories the team reaches out to, four accept CodeLingo’s changes and express interest in what else the tool can do. The platform has been experiencing 50 percent week-on-week growth, with up to 350 users and 3,000 repositories installing the tool.

For now, CodeLingo is focused on improving the stability and performance of its platform, but the team hopes to further expand the system. “There are two vectors of growth. One is the query engine itself—the expressibility of the language in the query engine and the tenets we store. The other is the domains of data we can query,” says Meek. This means extending the capabilities of the system to not only query code, but also crunch version control and runtime information. “As the capacity of our platform matures, it could give you insight not just on how one unit of the infrastructure is working, but on how the full ecosystem comes together,” he says.

Meek envisions CodeLingo as a tool that could continuously maintain high standards. “No matter if you're a two-person weekend team or a 500-person-strong development team, you can always do better,” he says. “That’s the kind of teams we hope to work with—those who are always looking to raise their own quality bar, with CodeLingo supporting those higher standards.”
