IEEE Top Programming Languages: Design, Methods, and Data Sources
The IEEE Spectrum Top Programming Languages app synthesizes 11 metrics from 9 sources to arrive at an overall ranking of language popularity. The sources cover contexts that include social chatter, open-source code production, and job postings. Below, you’ll find information about how we choose which languages to track and the data sources we use to do it.
What We Track
Starting from a list of over 300 programming languages gathered from GitHub, we looked at the volume of results Google returned when we searched for each one using the template “X programming,” where “X” is the name of the language. We filtered out languages that had a very low number of search results and then went through the remaining entries by hand to narrow them down to the most interesting. We labeled each language according to whether or not it finds significant use in one or more of the following categories: Web, mobile, enterprise/desktop, or embedded environments.
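As a rough illustration of the filtering step described above, the sketch below builds the “X programming” search phrase and drops languages whose result count falls under a cutoff. It is not the code used for the rankings: the function names, the cutoff value, and the counts are all hypothetical, and the hand-curation pass that follows the filter is not modeled.

```python
def search_query(language: str) -> str:
    """Build the search phrase from the "X programming" template."""
    return f"{language} programming"


def filter_by_hits(hit_counts: dict[str, int], min_hits: int) -> list[str]:
    """Keep languages whose result count meets the cutoff, most hits first."""
    kept = [lang for lang, hits in hit_counts.items() if hits >= min_hits]
    return sorted(kept, key=lambda lang: hit_counts[lang], reverse=True)


# Made-up counts for illustration only; these are not measured data.
example_counts = {"Java": 900, "Haskell": 120, "Obscurelang": 3}

# The cutoff of 100 is an arbitrary assumption; no threshold is published here.
shortlist = filter_by_hits(example_counts, min_hits=100)
```

The surviving names would then go through the manual review and category labeling described above.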
Our final set of 47 languages includes names familiar to most computer users, such as Java; stalwarts like Cobol and Fortran; and languages that thrive in niches, like Haskell. We gauged the popularity of each using 11 metrics across 9 sources in the following ways:
We measured the number of hits for each language by using Google’s API to search for the template “X programming.” This number indicates the volume of online information resources about each programming language. We took the measurement in June 2018, so it represents a snapshot of the Web at that particular moment in time. This measurement technique is also used by the oft-cited TIOBE rankings.
We measured the index of each language as reported by Google Trends using the template “X programming” in July 2018. This number indicates the demand for information about the particular language, because Google Trends measures how often people search for the given term. As it measures searching activity rather than information availability, Google Trends can be an early cue to up-and-coming languages. Our methodology here is similar to that of the Popularity of Programming Language (PYPL) ranking.
We measured the number of hits on Twitter for the template “X programming” for the 12 months ending June 2018. For 2016 and later data we used the Twitter Search API, whereas in 2014 and 2015 we used the Topsy API, which is now defunct. This number indicates the amount of chatter on social media for the language and reflects the sharing of online resources like news articles or books, as well as physical social activities such as hackathons.
GitHub is a site where programmers can collaboratively store repositories of code. Using the GitHub API and GitHub tags, we measured two things for the 12 months ending June 2018: (1) the number of new repositories created for each language, and (2) the number of active repositories for each language, where “active” means that someone has edited the code in a particular repository. The number of new repositories measures fresh activity around the language, whereas the number of active repositories measures the ongoing interest in developing in each language.
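The distinction between new and active repositories can be made concrete with a small sketch. It assumes each repository record carries a language tag, a creation date, and the date of its most recent edit, and it treats “active” as meaning edited inside the measurement window. The `Repo` class, its field names, and the sample records are hypothetical and do not mirror the GitHub API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Repo:
    language: str
    created: date      # when the repository was created
    last_edited: date  # most recent edit to the code


def count_new_and_active(repos, start, end):
    """Return {language: (new_count, active_count)} for the window [start, end]."""
    counts = {}
    for repo in repos:
        new, active = counts.get(repo.language, (0, 0))
        if start <= repo.created <= end:
            new += 1      # created inside the window
        if start <= repo.last_edited <= end:
            active += 1   # edited inside the window (assumed meaning of "active")
        counts[repo.language] = (new, active)
    return counts


# Hypothetical records for illustration only.
window_start, window_end = date(2017, 7, 1), date(2018, 6, 30)
sample = [
    Repo("Python", date(2018, 1, 10), date(2018, 3, 1)),
    Repo("Python", date(2015, 5, 1), date(2018, 2, 2)),
    Repo("Cobol", date(2010, 4, 4), date(2012, 9, 9)),
]
totals = count_new_and_active(sample, window_start, window_end)
```

In this sketch a repository created during the window also counts as active if it was edited during the window, so the two numbers overlap rather than partition the repositories.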
Stack Overflow is a popular site where programmers can ask questions about coding. We measured two things on Stack Overflow for the 12 months ending June 2018: (1) the number of questions posted mentioning each language and (2) the amount of attention paid to those questions. Each metric measures the demand for information about the language in a different way. Each question is tagged with the languages under discussion, and these tags are used to tabulate our measurements using the Stack Exchange API.
Reddit is a news and information site where users post links and comments and can “upvote” or “downvote” the contributions of others to help identify the most important or interesting links. On Reddit we measured the number of posts mentioning each of the languages, using the template “X programming” from June 2017 to June 2018 across any subreddit on the site. This metric captures the social activity and information sharing around each of the languages. We collected data using the Reddit API.
Hacker News is a news and information site where users post comments and links to news about technology. We measured the number of posts that mentioned each of the languages using the template “X programming” for the 12 months ending June 2018. Like the Twitter, Stack Overflow, and Reddit metrics, this one captures social activity and information sharing around the various languages. We collected data using the Algolia Search API.
We measured the demand for different programming languages on the CareerBuilder job site, counting the fresh job openings (those less than 30 days old) on the U.S. site that mention each language. Because some of the languages we track could be ambiguous in plain text (such as D, Go, J, Processing, and R), we used strict matching of the form “X programming” for these languages. For other languages we used a search string composed of “X AND programming,” which allowed us to capture a broader range of relevant postings. We collected data in late June 2018 using the CareerBuilder API.
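The two matching rules and the 30-day freshness test can be sketched as follows. This is an illustration under stated assumptions, not the code used for the rankings: the function names are hypothetical, the set of ambiguous names contains only the examples given above (the full list may be longer), and the exact query syntax accepted by the CareerBuilder API is not shown here.

```python
from datetime import date

# Only the examples named in the text; the real list may include more.
AMBIGUOUS = {"D", "Go", "J", "Processing", "R"}


def job_query(language: str) -> str:
    """Build a search string: a strict phrase for ambiguous names, a looser AND otherwise."""
    if language in AMBIGUOUS:
        return f'"{language} programming"'   # exact phrase match
    return f"{language} AND programming"     # both terms anywhere in the posting


def is_fresh(posted: date, today: date, max_age_days: int = 30) -> bool:
    """True when the posting is less than max_age_days old."""
    return (today - posted).days < max_age_days


strict = job_query("Go")
broad = job_query("Java")
```

The strict form trades recall for precision: a posting that says “Go developer” is missed, but a posting that merely tells applicants to “go to our website” is not miscounted.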
Dice is a job site whose postings had a skills section that listed the desired languages for each position. In previous years we measured the number of fresh job postings (those that were less than 30 days old) on the site. This metric is now obsolete because Dice shut down its API. However, we are retaining the previously gathered data for consistency when making comparisons between earlier years. (For 2018, the data is a null set.) In the future, we expect to replace this metric with data from another job site that allows a similar analysis.
IEEE Xplore Digital Library
IEEE maintains a digital library with over 3.6 million conference and journal articles covering a range of scientific and engineering disciplines. We measured the number of articles that mention each of the languages, using the template “X programming,” for the years 2016 and 2017. This metric captures the prevalence of the different programming languages as used and referenced in scholarship. We collected data using the IEEE Xplore API.