
Top Programming Languages Trends: The Rise of Big Data

Languages like Go, Julia, R, Scala, and even Python are riding the number-crunching wave

Illustration: Grzegorz Knec/Alamy

Now that IEEE Spectrum is into the third year of annually ranking languages, we can start looking at some trends over time. What languages are ascendant? Which are losing traction? And which of the data sources that we use to create our rankings are contributing the most to these shifts?

In this article I’m going to focus on so-called big-data languages, such as Julia, Python, R, and Scala. Most of these are purpose-built for handling large amounts of numeric data, with stables of packages that can be tapped for quick big-data analytic prototyping. These languages are increasingly important, as they facilitate the mining of the huge data sets that are now routinely collected across practically all sectors of government, science, and commerce.

The biggest mover in this category was Go, an open source language created by Google in 2007 to help solve the company’s problems with scaling systems and concurrent programming. In the default Spectrum ranking, it has moved up 10 positions since 2014 to settle into 10th place this year. R and Scala also climbed in the Spectrum ranking since 2014: R rose 4 spots and Scala 2, although Scala has slipped since 2015, when it stood 4 places above its 2014 position. Julia was added to the list of languages we track in 2015, and in the past year it has moved from rank 40 to 33, still a marginal player but clearly gaining momentum.

The chief reason for Go’s quick rise in our ranking is the large increase in related activity on the GitHub source code archive. Since 2014, the total number of repositories on GitHub that list Go as the primary language went up by a factor of more than four. If we look at just active GitHub repositories, then there are almost five times as many. There’s also a fair bit more chatter about the language on Reddit, with our data showing a threefold increase in the number of posts on that site mentioning the language.

Another language that has continued to move up the rankings since 2014 is R, now in fifth place. R has been lifted in our rankings by racking up more questions on Stack Overflow—about 46 percent more since 2014. But even more important to R’s rise is that it is increasingly mentioned in scholarly research papers. The Spectrum default ranking is heavily weighted toward data from IEEE Xplore, which indexes millions of scholarly articles, standards, and books in the IEEE database. In our 2015 ranking there were a mere 39 papers talking about the language, whereas this year we logged 244 papers.
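Why a ranking weighted toward one data source can lift a language like R is easiest to see with a weighted sum. The sketch below is a minimal illustration under stated assumptions: the three source names echo data sources mentioned in this article, but every weight and score is hypothetical and is not Spectrum’s actual methodology or data. It shows how the same per-source scores produce a different ordering when the weighting shifts from scholarly papers to job listings.

```python
"""Toy weighted-sum language ranking, for illustration only.

Source names echo data sources mentioned in the article, but every
weight and score below is hypothetical, not Spectrum's real inputs.
"""

from typing import Dict, List

Scores = Dict[str, float]

# Hypothetical normalized (0-1) scores for each language per data source.
SCORES: Dict[str, Scores] = {
    "R": {"ieee_xplore": 0.9, "github": 0.3, "jobs": 0.2},
    "Go": {"ieee_xplore": 0.2, "github": 0.8, "jobs": 0.5},
}

# Hypothetical weightings: one leaning on scholarly papers, one on job listings.
DEFAULT_WEIGHTS: Scores = {"ieee_xplore": 0.5, "github": 0.3, "jobs": 0.2}
JOBS_WEIGHTS: Scores = {"ieee_xplore": 0.1, "github": 0.3, "jobs": 0.6}


def composite(scores: Scores, weights: Scores) -> float:
    """Weighted sum of per-source scores; a missing source contributes zero."""
    return sum(w * scores.get(source, 0.0) for source, w in weights.items())


def rank(all_scores: Dict[str, Scores], weights: Scores) -> List[str]:
    """Return language names ordered from highest to lowest composite score."""
    return sorted(
        all_scores,
        key=lambda lang: composite(all_scores[lang], weights),
        reverse=True,
    )


print(rank(SCORES, DEFAULT_WEIGHTS))  # ['R', 'Go']
print(rank(SCORES, JOBS_WEIGHTS))     # ['Go', 'R']
```

With these made-up numbers, R scores 0.58 against Go’s 0.44 under the paper-heavy weighting, but only 0.30 against Go’s 0.56 when job listings dominate, so the order flips without any change in the underlying data.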

In contrast to the substantial gains seen by open source languages such as Go, Julia, R, and Scala, proprietary data-analysis languages such as Matlab and SAS have dropped: Matlab has fallen four places in the rankings since 2014 and SAS has fallen seven. However, both of those languages are still growing; they’re just not growing as fast as the languages displacing them.

When we weight the rankings toward jobs, we continue to see heavily used languages like Java and Python dominate. But recruiters are much more interested in R and Scala in 2016 than they were in 2014. When we collected data in 2014, there were only 136 jobs listed for Scala on CareerBuilder and Dice. By 2016 that number had more than quadrupled, to 631 jobs.

This growth raises the question of whether R can ever unseat Python or Java as the top languages for big data. But while R has seen huge gains over the last few years, Python and Java really are 800-pound gorillas. For instance, we found roughly 15 times as many job listings for Pythonistas as for R developers. And while we measured about 63,000 new GitHub repositories for R in the last year, there were close to 458,000 for Python. R may be great for visualization and exploratory analysis, and it is clearly popular with academics writing research papers, but Python has significant advantages in production environments: It’s more easily integrated into production data pipelines, and as a general-purpose language it simply has a broader range of uses.
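The growth and scale comparisons quoted in this article are simple ratios, and it can help to see them side by side. The sketch below only divides counts that appear in the text (the repository figures are the article’s approximate values); the `ratio` helper is a hypothetical name introduced for illustration.

```python
"""Ratios computed from counts quoted in the article (repo counts approximate)."""


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rejecting a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Growth over time (later count / earlier count).
scala_jobs_growth = ratio(631, 136)   # Scala listings on CareerBuilder and Dice, 2016 vs. 2014
r_paper_growth = ratio(244, 39)       # IEEE Xplore papers mentioning R, this year vs. 2015

# Relative scale in a single year (Python count / R count).
new_repo_ratio = ratio(458_000, 63_000)  # new GitHub repositories in the last year

print(f"Scala jobs: {scala_jobs_growth:.2f}x")        # Scala jobs: 4.64x
print(f"R papers: {r_paper_growth:.2f}x")             # R papers: 6.26x
print(f"Python/R new repos: {new_repo_ratio:.1f}x")   # Python/R new repos: 7.3x
```

Read this way, Python’s lead over R in new repositories (about 7x) is noticeably smaller than its lead in job listings (roughly 15x), which fits the point that Python’s advantage is strongest in production and hiring.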

These data illustrate that despite the desire of some coders to evaluate languages on purely internal merits—the elegance of their syntax, or the degree and nature of the abstractions used—a big driver for a language’s popularity will always be the domains that it targets, either by design or through the availability of supporting libraries.


Intel’s Take on the Next Wave of Moore’s Law

Ann B. Kelleher explains what’s new 75 years after the transistor’s invention


Intel's Ponte Vecchio processor

Intel

The next wave of Moore’s Law will rely on a developing concept called system technology co-optimization, said Ann B. Kelleher, general manager of technology development at Intel, in an interview with IEEE Spectrum ahead of her plenary talk at the 2022 IEEE International Electron Devices Meeting (IEDM).

“Moore’s Law is about increasing the integration of functions,” says Kelleher. “As we look forward into the next 10 to 20 years, there’s a pipeline full of innovation” that will continue the cadence of improved products every two years. That path includes the usual continued improvements in semiconductor processes and design, but system technology co-optimization (STCO) will make the biggest difference.


The EV Transition Explained: Charger Infrastructure

How many, where, and who pays?


Electric vehicle charging stations in Monterey Park, Calif.

FREDERIC J. BROWN/AFP/Getty Images

The ability to conveniently charge an EV away from home is a top concern for many EV owners. A 2022 Forbes survey of EV owners indicates that 62 percent of respondents are so anxious about their EV’s range that it has affected their travel plans. While “range anxiety” may be overblown, the need for an extensive and reliable external charging infrastructure is not.


Learn How Global Configuration Management and IBM CLM Work Together

In this presentation we will build the case for component-based requirements management


This is a sponsored article brought to you by 321 Gang.

To fully support Requirements Management (RM) best practices, a tool needs to support traceability, versioning, reuse, and Product Line Engineering (PLE). This is especially true when designing large, complex systems or systems that must comply with standards and regulations. Most modern requirements tools do a decent job of capturing requirements and related metadata. Some tools also offer rudimentary baselining and traceability (“linking” requirements). Earlier versions of IBM DOORS Next supported rich, configurable traceability and even a rudimentary form of reuse. DOORS Next became a complete solution for managing requirements a few years ago, when IBM invented and implemented Global Configuration Management (GCM) as part of its Engineering Lifecycle Management (ELM, formerly known as Collaborative Lifecycle Management or simply CLM) suite of integrated tools. On the surface, GCM seems to provide only versioning, but it does much more than that. GCM gives product and system development organizations advanced requirements reuse, traceability that supports versioning, release management, and variant management. It also makes it possible to manage collections of related Application Lifecycle Management (ALM) and Systems Engineering artifacts in a single configuration.
