IEEE Top Programming Languages: Design, Methods, and Data Sources

Here are the metrics we use to build an overall ranking of programming language popularity


The IEEE Spectrum Top Programming Languages app synthesizes 11 metrics from eight sources to arrive at an overall ranking of language popularity. The sources cover contexts that include social chatter, open-source code production, and job postings. Below, you'll find information about how we choose which languages to track and the data sources we use to do it.
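The combination step is not detailed here. One common way to merge metrics with very different scales is to rescale each metric so the top language scores 100 and then take a weighted average. The Python sketch below assumes exactly that scheme. The function names, metric keys, raw values, and equal weights are all hypothetical, so treat it as an illustration rather than the app's actual formula.

```python
from typing import Dict, List, Tuple

# metric name -> {language: raw value}; all numbers below are hypothetical.
Metrics = Dict[str, Dict[str, float]]

def normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale one metric so its top language scores 100 (assumed scheme)."""
    top = max(values.values())
    if top == 0:
        return {lang: 0.0 for lang in values}
    return {lang: 100.0 * v / top for lang, v in values.items()}

def overall_ranking(metrics: Metrics, weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Weighted average of normalized metrics, highest score first."""
    weight_sum = sum(weights[name] for name in metrics)
    totals: Dict[str, float] = {}
    for name, values in metrics.items():
        for lang, score in normalize(values).items():
            # A language absent from a metric simply contributes 0 for it.
            totals[lang] = totals.get(lang, 0.0) + weights[name] * score
    ranked = [(lang, total / weight_sum) for lang, total in totals.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)

metrics = {
    "google_search": {"Python": 900, "Java": 600, "Haskell": 30},
    "github_new_repos": {"Python": 400, "Java": 500, "Haskell": 20},
}
weights = {"google_search": 1.0, "github_new_repos": 1.0}  # equal weights, assumed
for lang, score in overall_ranking(metrics, weights):
    print(f"{lang}: {score:.1f}")
# Python: 90.0
# Java: 83.3
# Haskell: 3.7
```

In this toy data Java leads on GitHub but Python leads overall. Raising the GitHub weight to 3 puts Java first, which is why the choice of weights matters as much as the raw data.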

What We Track

Starting from a list of over 300 programming languages gathered from GitHub, we looked at the volume of results found on Google when we searched for each one using the template "X programming" where "X" is the name of the language. We filtered out languages that had a very low number of search results and then went through the remaining entries by hand to narrow them down to the most interesting. We labeled each language according to whether or not it finds significant use in one or more of the following categories: Web, mobile, enterprise/desktop, or embedded environments.
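The first-pass filter amounts to a threshold on search-hit counts followed by a list for hand review. The sketch below illustrates that step only. The hit counts, the placeholder name `ObscureLang`, and the `min_hits` cutoff are hypothetical, because the text does not say what counted as a "very low" number of results.

```python
from typing import Dict, List

def search_query(language: str) -> str:
    """Fill the template "X programming" for one language."""
    return f"{language} programming"

def shortlist(hit_counts: Dict[str, int], min_hits: int) -> List[str]:
    """Keep languages at or above a hit threshold, most-searched first.

    The survivors still go to manual review; this only automates the cutoff.
    """
    kept = [lang for lang, hits in hit_counts.items() if hits >= min_hits]
    return sorted(kept, key=lambda lang: hit_counts[lang], reverse=True)

# Hypothetical hit counts and threshold.
hits = {"Cobol": 300_000, "Python": 5_000_000, "ObscureLang": 120}
print(search_query("Cobol"))            # Cobol programming
print(shortlist(hits, min_hits=1_000))  # ['Python', 'Cobol']
```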

Our final set of 55 languages includes names familiar to most computer users, such as Java, stalwarts like Cobol and Fortran, and languages that thrive in niches, like Haskell. We gauged the popularity of each using 11 metrics across eight sources in the following ways:
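The counts of eight sources and 11 metrics reconcile once Google Search and Google Trends are grouped under Google, the IEEE Job Site and IEEE Xplore under IEEE, and GitHub is credited with two metrics. That grouping is our reading of the sections below, not an official list; the short tally just makes the arithmetic explicit.

```python
# Each metric described below, keyed by the organization that supplies it.
# Grouping Google Search/Trends under "Google" and the two IEEE services under
# "IEEE" is an interpretation; it is what makes the totals come out to 8 and 11.
METRICS_BY_SOURCE = {
    "Google": ["search hits", "Trends index"],
    "Twitter": ["tweets"],
    "GitHub": ["new repositories", "active repositories"],
    "Stack Overflow": ["questions"],
    "Reddit": ["posts"],
    "Hacker News": ["posts"],
    "CareerBuilder": ["job listings"],
    "IEEE": ["Job Site postings", "Xplore articles"],
}

n_sources = len(METRICS_BY_SOURCE)
n_metrics = sum(len(names) for names in METRICS_BY_SOURCE.values())
print(n_sources, n_metrics)  # 8 11
```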

Google Search

We measured the number of hits for each language by using Google's API to search for the template "X programming." This number indicates the volume of online information resources about each programming language. We took the measurement in June 2021, so it represents a snapshot of the Web at that particular moment in time. This measurement technique is also used by the oft-cited TIOBE rankings.

Google Trends

We measured the index of each language as reported by Google Trends using the template "X programming" in June 2021. This number indicates the demand for information about the particular language, because Google Trends measures how often people search for the given term. As it measures searching activity rather than information availability, Google Trends can be an early cue to up-and-coming languages. Our methodology here is similar to that of the Popularity of Programming Language (PYPL) ranking.

Twitter

We measured the number of hits on Twitter for the template "X programming" for the 12 months ending June 2021 using the Twitter Search API. This number indicates the amount of chatter on social media for the language and reflects the sharing of online resources like news articles or books, as well as in-person social activities such as hackathons.

GitHub

GitHub is a site where programmers can collaboratively store repositories of code. Using the GitHub API and GitHub tags, we measured two things for the 12 months ending June 2021: (1) the number of new repositories created for each language, and (2) the number of active repositories for each language, where "active" means that someone edited the code in the repository during that period. The number of new repositories measures fresh activity around the language, whereas the number of active repositories measures ongoing development work in each language.
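The difference between the two GitHub metrics is easiest to see on a few toy records. This sketch assumes a repository counts as active if it received at least one push inside the window, and that the window runs from July 1, 2020, to June 30, 2021. The `Repo` fields are hypothetical and are not the GitHub API's schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple

@dataclass
class Repo:
    language: str        # language label assigned to the repository
    created: date        # creation date
    pushes: List[date]   # dates on which code was pushed (hypothetical field)

def github_counts(repos: List[Repo], start: date, end: date) -> Dict[str, Tuple[int, int]]:
    """Return {language: (new, active)} for the window [start, end]."""
    counts: Dict[str, Tuple[int, int]] = {}
    for repo in repos:
        new, active = counts.get(repo.language, (0, 0))
        if start <= repo.created <= end:
            new += 1                       # created inside the window
        if any(start <= d <= end for d in repo.pushes):
            active += 1                    # code edited inside the window
        counts[repo.language] = (new, active)
    return counts

start, end = date(2020, 7, 1), date(2021, 6, 30)
repos = [
    Repo("Rust", date(2021, 1, 10), [date(2021, 2, 1)]),    # new and active
    Repo("Rust", date(2018, 5, 5), [date(2020, 9, 9)]),     # active only
    Repo("Fortran", date(2015, 3, 3), [date(2016, 1, 1)]),  # neither
]
print(github_counts(repos, start, end))
# {'Rust': (1, 2), 'Fortran': (0, 0)}
```

Under this assumption, a repository created in the window usually also has a push in it, so the two metrics overlap rather than split repositories into disjoint groups.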

Stack Overflow

Stack Overflow is a popular site where programmers can ask questions about coding. We measured the number of questions posted about each language for the 12 months ending June 2021. Each question is tagged with the languages under discussion, and we tabulated these tags using the Stack Exchange API.
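Because this measurement relies on tags rather than free text, a question tagged with two tracked languages counts once for each. The sketch below illustrates that tallying rule on hypothetical question records; the tag names follow Stack Overflow's lowercase style, but the records and the July 2020 to June 2021 window are made up for illustration.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Set, Tuple

def tag_counts(questions: Iterable[Tuple[date, List[str]]], tracked: Set[str],
               start: date, end: date) -> Counter:
    """Count questions per tracked language tag, asked within [start, end]."""
    counts: Counter = Counter()
    for asked, tags in questions:
        if start <= asked <= end:
            # A question tagged with several tracked languages counts for each.
            for tag in set(tags) & tracked:
                counts[tag] += 1
    return counts

questions = [
    (date(2021, 3, 1), ["python", "pandas"]),   # "pandas" is not tracked
    (date(2021, 4, 2), ["c", "c++"]),           # counts for both c and c++
    (date(2019, 1, 1), ["python"]),             # outside the window
]
window = (date(2020, 7, 1), date(2021, 6, 30))
print(tag_counts(questions, {"python", "c", "c++"}, *window))
```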

Reddit

Reddit is a news and information site where users post links and comments. On Reddit we measured the number of posts mentioning each of the languages, using the template "X programming" from June 2020 to June 2021 across any subreddit on the site. We collected data using the Reddit API.

Hacker News

Hacker News is a news and information site where users post comments and links to news about technology. We measured the number of posts that mentioned each of the languages using the template "X programming" for the 12 months ending June 2021. Like the Twitter, Stack Overflow, and Reddit metrics, this metric captures social activity and information sharing around the various languages. We collected data using the Algolia Search API.
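The Twitter, Reddit, and Hacker News metrics share one pattern: count the posts in a roughly 12-month window that contain the phrase "X programming". The platforms' own search APIs did that matching, so the local regex below only illustrates what phrase matching involves. It includes a guard, our own assumption rather than part of the stated method, that keeps "C programming" from matching inside "Objective-C programming". The posts and window dates are hypothetical.

```python
import re
from datetime import date
from typing import Iterable, Tuple

def phrase_pattern(language: str) -> re.Pattern:
    """Case-insensitive pattern for the template "X programming"."""
    # The lookbehind rejects a name glued to a preceding word character or to
    # "+", "#", "." or "-", so "C" does not match inside "Objective-C programming".
    return re.compile(
        r"(?<![\w+#.-])" + re.escape(language) + r"\s+programming(?!\w)",
        re.IGNORECASE,
    )

def count_mentions(posts: Iterable[Tuple[date, str]], language: str,
                   start: date, end: date) -> int:
    """Count posts dated within [start, end] whose text contains the phrase."""
    pattern = phrase_pattern(language)
    return sum(1 for posted, text in posts
               if start <= posted <= end and pattern.search(text))

posts = [
    (date(2021, 2, 1), "Tips for C programming on microcontrollers"),
    (date(2021, 3, 1), "Objective-C programming is not dead"),
    (date(2021, 4, 1), "Learning c++ programming this summer"),
    (date(2019, 5, 1), "C programming in 2019"),  # outside the window
]
start, end = date(2020, 7, 1), date(2021, 6, 30)
print(count_mentions(posts, "C", start, end))    # 1
print(count_mentions(posts, "C++", start, end))  # 1
```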

CareerBuilder

We measured the demand for different programming languages on the CareerBuilder job site. Because there is no publicly available API, we manually searched for listings including each language. Some of the languages we track, such as Ada, Go, J, Processing, and R, can be ambiguous in plain text, so we manually inspected listings to remove false positives (for example, listings looking for experience with the Americans with Disabilities Act rather than the Ada programming language).
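The false-positive screening was done by hand. As a hypothetical aid, a short list of context cues could flag listings for a person to double-check. The cue phrases and listings below are invented for illustration, and the helper does not replace the manual review.

```python
import re
from typing import Dict, List

# Hypothetical cue phrases suggesting a listing uses an ambiguous name in a
# non-programming sense. The screening described above was manual.
NON_LANGUAGE_CUES: Dict[str, List[str]] = {
    "Ada": [r"americans with disabilities"],
    "Go": [r"\bgo-getter\b"],
}

def needs_review(language: str, listing: str) -> bool:
    """Return True if a listing contains a cue for a non-language sense."""
    cues = NON_LANGUAGE_CUES.get(language, [])
    return any(re.search(cue, listing, re.IGNORECASE) for cue in cues)

listings = [
    "Embedded engineer: Ada and C++ experience required",
    "Office must comply with the Americans with Disabilities Act (ADA)",
]
flagged = [text for text in listings if needs_review("Ada", text)]
print(flagged)
# ['Office must comply with the Americans with Disabilities Act (ADA)']
```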

IEEE Job Site

We measured the demand for different programming languages in job postings on the IEEE Job Site. Because some of the languages we track could be ambiguous in plain text—such as D, Go, J, Processing, and R—we use strict matching of the form "X programming" for these languages. For other languages we use a search string composed of "X AND programming," which allows us to capture a broader range of relevant postings. Because no externally exposed API exists for the IEEE Job Site, we extracted data using an internal custom-query tool in July 2021.
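The two query forms behave differently on short names. This sketch assumes a search syntax in which quotation marks denote an exact phrase and AND requires both terms anywhere in a posting; the internal tool's actual syntax is not documented here. The `matches` helper is a hypothetical local stand-in showing why strict matching keeps a posting about an "R&D lab" out of the R count.

```python
import re
from typing import Set

# Names treated as ambiguous on the IEEE Job Site, per the text above.
AMBIGUOUS: Set[str] = {"D", "Go", "J", "Processing", "R"}

def job_query(language: str) -> str:
    """Query string: exact phrase for ambiguous names, boolean AND otherwise."""
    if language in AMBIGUOUS:
        return f'"{language} programming"'
    return f"{language} AND programming"

def _word(term: str) -> str:
    # Standalone-word pattern that also works for names like "C++".
    return rf"(?<!\w){re.escape(term)}(?!\w)"

def matches(language: str, text: str) -> bool:
    """Local stand-in for how each query form would match a posting."""
    if language in AMBIGUOUS:
        pattern = _word(language) + r"\s+programming(?!\w)"
        return re.search(pattern, text, re.IGNORECASE) is not None
    return all(re.search(_word(t), text, re.IGNORECASE)
               for t in (language, "programming"))

print(job_query("R"))       # "R programming"
print(job_query("Python"))  # Python AND programming
print(matches("R", "R&D lab seeks programming lead"))  # False
```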

IEEE Xplore Digital Library

IEEE maintains a digital library with over 3.6 million conference and journal articles covering a range of scientific and engineering disciplines. We measured the number of articles that mention each of the languages using the template "X programming" for the years 2020 and 2021. This metric captures the prevalence of the different programming languages as used and referenced in scholarship. We collected data using the IEEE Xplore API.

Updated 24 Aug 2021
