IEEE Top Programming Languages: Design, Methods, and Data Sources

The IEEE Spectrum Top Programming Languages app synthesizes 12 metrics from 10 sources to arrive at an overall ranking of language popularity. The sources cover contexts that include social chatter, open-source code production, and job postings. Below, you’ll find information about how we choose which languages to track and the data sources we use to do it.

What We Track

Starting from a list of over 300 programming languages gathered from GitHub, we looked at the volume of results found on Google when we searched for each one using the template “X programming,” where “X” is the name of the language. We filtered out languages that had a very low number of search results and then went through the remaining entries by hand to narrow them down to the most interesting. We labeled each language according to whether or not it finds significant use in one or more of the following categories: Web, mobile, enterprise/desktop, or embedded environments.
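A minimal Python sketch of the template-and-filter step described above. The function names, the threshold, and the hit counts are hypothetical and serve only as illustration; the cutoff actually used is not stated here, and the hand-curation step is not modeled.

```python
def search_query(language: str) -> str:
    """Build the search phrase from the "X programming" template."""
    return f"{language} programming"


def filter_by_hits(hit_counts: dict[str, int], min_hits: int) -> list[str]:
    """Keep languages whose result count meets the threshold, most hits first."""
    kept = [lang for lang, hits in hit_counts.items() if hits >= min_hits]
    return sorted(kept, key=lambda lang: hit_counts[lang], reverse=True)


# Made-up counts for illustration only; these are not measured data.
example_counts = {"Java": 5_000_000, "Haskell": 400_000, "Obscurelang": 120}

# The threshold of 10,000 is an assumption, not a figure from the article.
shortlist = filter_by_hits(example_counts, min_hits=10_000)
```

With these invented numbers, the low-volume entry drops out and the remaining languages would go on to manual review.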

Our final set of 48 languages includes names familiar to most computer users, such as Java; stalwarts like Cobol and Fortran; and languages that thrive in niches, like Haskell. We gauged the popularity of each using 12 metrics across 10 sources in the following ways:

Google Search

We measured the number of hits for each language by using Google’s API to search for the template “X programming.” This number indicates the volume of online information resources about each programming language. We took the measurement in June 2016, so it represents a snapshot of the Web at that particular moment in time. This measurement technique is also used by the oft-cited TIOBE rankings.

Google Trends

We measured the index of each language as reported by Google Trends using the template “X programming” in June 2016. This number indicates the demand for information about the particular language, because Google Trends measures how often people search for the given term. As it measures searching activity rather than information availability, Google Trends can be an early cue to up-and-coming languages. Our methodology here is similar to that of the Popularity of Programming Language (PYPL) ranking.

Twitter

We measured the number of hits on Twitter for the template “X programming” for the 12 months ending June 2016. For the 2016 data we used the Twitter Search API, whereas in 2014 and 2015 we used the Topsy API, which is now defunct. This number indicates the amount of chatter on social media for the language and reflects the sharing of online resources like news articles or books, as well as physical social activities such as hackathons.

GitHub

GitHub is a site where programmers can collaboratively store repositories of code. Using the GitHub API and GitHub tags, we measured two things for the 12 months ending June 2016: (1) the number of new repositories created for each language, and (2) the number of active repositories for each language, where “active” means that someone has edited the code in a particular repository. The number of new repositories measures fresh activity around the language, whereas the number of active repositories measures the ongoing interest in developing each language.
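The distinction between the two GitHub metrics can be sketched in Python as follows. The `Repo` record, its field names, and the sample data are hypothetical; they do not reflect the shape of GitHub API responses, and the exact window boundaries are an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Repo:
    # Hypothetical record; not the shape of a GitHub API response.
    language: str
    created: date
    last_edit: date


def repo_metrics(repos: list[Repo], language: str, start: date, end: date) -> tuple[int, int]:
    """Return (new, active) repository counts for one language in [start, end]."""
    mine = [r for r in repos if r.language == language]
    new = sum(1 for r in mine if start <= r.created <= end)
    active = sum(1 for r in mine if start <= r.last_edit <= end)
    return new, active


# Invented sample data for illustration only.
sample = [
    Repo("Rust", date(2016, 1, 10), date(2016, 5, 1)),   # new and active
    Repo("Rust", date(2013, 3, 1), date(2016, 2, 1)),    # active only
    Repo("Rust", date(2012, 6, 1), date(2014, 9, 1)),    # neither
    Repo("Go", date(2016, 2, 2), date(2016, 2, 3)),      # different language
]
rust_new, rust_active = repo_metrics(sample, "Rust", date(2015, 7, 1), date(2016, 6, 30))
```

In this sketch a repository created inside the window counts toward both metrics if it was also edited inside the window, while an older repository that is still being edited counts only as active.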

Stack Overflow

Stack Overflow is a popular site where programmers can ask questions about coding. We measured two things on Stack Overflow for the 12 months ending June 2016: (1) the number of questions posted mentioning each language, and (2) the amount of attention paid to those questions. Each metric measures the demand for information about the language in a different way. Each question is tagged with the languages under discussion, and these tags are used to tabulate our measurements using the Stack Exchange API.

Reddit

Reddit is a news and information site where users post links and comments and can “upvote” or “downvote” the contributions of others to help identify the most important or interesting links. On Reddit we measured the number of posts mentioning each of the languages using the template “X programming” from June 2015 to June 2016 across any subreddit on the site. This metric captures the social activity and information sharing around each of the languages. Data was collected using the Reddit API.

Hacker News

Hacker News is a news and information site where users post comments and links to news about technology. We measured the number of posts that mentioned each of the languages using the template “X programming” for the 12 months ending June 2016. Like Twitter, Stack Overflow, and Reddit, this metric also captures social activity and information sharing around the various languages. Data was collected using the Algolia Search API.

CareerBuilder

We measured the demand for different programming languages on the CareerBuilder job site by counting the fresh job openings (those that are less than 30 days old) on the U.S. site that mention each language. Because some of the languages we track (such as D, Go, J, Processing, and R) could be ambiguous in plain text, we used strict matching of the form “X programming” for these languages. For other languages we used a search string composed of “X AND programming,” which allowed us to capture a broader range of relevant postings. Data was collected in mid-June 2016 using the CareerBuilder API.
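The two query forms described above can be sketched in Python. The function name and the `AMBIGUOUS` set are illustrative assumptions: the set holds only the five examples named in the text, and the full list of strictly matched languages may be longer.

```python
# Only the examples named in the text; the full set is an assumption.
AMBIGUOUS = {"D", "Go", "J", "Processing", "R"}


def job_query(language: str) -> str:
    """Build a job-search string for one language.

    Ambiguous names get a strict quoted phrase; all others get the
    broader "X AND programming" form.
    """
    if language in AMBIGUOUS:
        return f'"{language} programming"'
    return f"{language} AND programming"


queries = {lang: job_query(lang) for lang in ["Go", "Java", "R"]}
```

The strict form trades recall for precision: a posting that mentions "R" only in passing would not match the quoted phrase, whereas the looser form would match any posting containing both words.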

Dice

We also measured demand for different programming languages using the U.S. Dice job site, which lists technology-oriented jobs. Dice has a skills section on each job posting that lists the desired languages for that position. We thus measured the number of fresh job postings (those that are less than 30 days old) on the site that mention each of the languages in the skills section. Data was collected in mid-June 2016 using the Dice API.
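A minimal Python sketch of the freshness filter described above, assuming each posting is reduced to a date and a set of listed skills. The `Posting` record and its field names are hypothetical and do not reflect the Dice API.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Posting:
    # Hypothetical record; not the shape of a Dice API response.
    posted: date
    skills: frozenset


def count_fresh(postings: list[Posting], language: str, today: date, max_age_days: int = 30) -> int:
    """Count postings less than max_age_days old that list the language as a skill."""
    cutoff = today - timedelta(days=max_age_days)
    # "Less than 30 days old" means posted strictly after the cutoff date.
    return sum(1 for p in postings if p.posted > cutoff and language in p.skills)


# Invented sample data for illustration only.
today = date(2016, 6, 15)
postings = [
    Posting(date(2016, 6, 1), frozenset({"Python", "R"})),  # 14 days old
    Posting(date(2016, 5, 1), frozenset({"Python"})),       # 45 days old, too stale
    Posting(date(2016, 5, 16), frozenset({"Python"})),      # exactly 30 days old, excluded
    Posting(date(2016, 6, 10), frozenset({"Java"})),        # fresh, different language
]
fresh_python = count_fresh(postings, "Python", today)
```

Matching against a structured skills field avoids the plain-text ambiguity that affects short language names in free-text job descriptions.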

IEEE Xplore Digital Library

IEEE maintains a digital library with over 3.6 million conference and journal articles covering a range of scientific and engineering disciplines. We measured the number of articles published in 2015 and 2016 that mention each of the languages using the template “X programming.” This metric captures the prevalence of the different programming languages as used and referenced in scholarship. Data was collected using the IEEE Xplore API.