IEEE Top Programming Languages: Design, Methods, and Data Sources

The IEEE Spectrum Top Programming Languages app synthesizes 12 metrics from 10 sources to arrive at an overall ranking of language popularity. The sources cover contexts that include social chatter, open-source code production, and job postings. Read on to learn more about the languages we track and each of the data sources we used, what it measures, and how we measured it.

What We Track

Starting from a list of over 150 programming languages gathered from GitHub, we looked at the volume of results found on Google when we searched for each one in the pattern “X programming” where “X” is the name of the language. We filtered out languages if they had a very low number of search results and then went through the list by hand to identify the most interesting languages. We labeled each language according to its use in Web, mobile, enterprise/desktop, or embedded environments.
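The search template and the low-volume filter described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the sample hit counts, the 1,000-result cutoff, and the choice to quote the phrase are all hypothetical, since the actual threshold and query syntax are not given here.

```python
from __future__ import annotations


def search_query(language: str) -> str:
    """Build the "X programming" search phrase for one language.

    Wrapping the phrase in double quotes is an assumption for illustration.
    """
    return f'"{language} programming"'


def filter_by_hits(hit_counts: dict[str, int], min_hits: int) -> list[str]:
    """Keep languages whose result count meets the cutoff, sorted by name."""
    return sorted(lang for lang, hits in hit_counts.items() if hits >= min_hits)


# Hypothetical hit counts, for illustration only (not measured data).
example_hits = {"Java": 5_000_000, "Erlang": 120_000, "ObscureLang": 40}

# The cutoff of 1,000 is an assumed value, not one taken from the methodology.
kept = filter_by_hits(example_hits, min_hits=1_000)
print(kept)
```

The hand review that follows the automatic filter is a judgment step and is not modeled here.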

Our final set of 49 languages includes names familiar to most computer users, such as Java; stalwarts like Cobol and Fortran; and more recent but niche languages like Erlang. For each language, we gauged popularity using 12 metrics across 10 sources:

Google Search

We measured the number of hits for each language by using Google’s API to search for the template “X programming.” This indicates the volume of online information resources about each programming language. We took the measurement in January 2014, so it represents a snapshot of the Web at that moment. This measurement technique is also used by the oft-cited TIOBE rankings.

Google Trends

We measured the index of each language as reported by Google Trends using the template “X programming” in January 2014. This indicates the demand for information about the particular language, because Google Trends measures how often people search for the given term. As it measures searching activity rather than information availability, Google Trends can be an early cue to up-and-coming languages. Our methodology here is similar to that of the PopularitY of Programming Language, or PYPL, ranking.

Twitter

We measured the number of hits on Twitter for the template “X programming” for 2013 using the Topsy API. This number indicates the amount of chatter on social media for the language and reflects the sharing of online resources like news articles or books, as well as physical social activities such as hackathons.

GitHub

GitHub is a site where programmers can collaboratively store repositories of code. Using the GitHub API and GitHub tags, we measured two things: (1) the number of new repositories created in 2013 for each language, and (2) the number of active repositories in 2013 for each language, where “active” means that someone had edited the code in a particular repository. The number of new repositories measures fresh activity around the language, whereas the number of active repositories measures the ongoing interest in developing in each language.
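The distinction between new and active repositories can be made concrete with a short sketch. It assumes each repository has already been reduced to a language, a creation date, and a list of edit dates; the `Repo` record and `count_new_and_active` function are hypothetical names for illustration and do not reflect the GitHub API's response format.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Repo:
    """Hypothetical repository record (not the GitHub API's schema)."""
    language: str
    created: date
    edit_dates: list[date] = field(default_factory=list)


def count_new_and_active(repos: list[Repo], year: int) -> tuple[Counter, Counter]:
    """Count, per language, repositories created in `year` and edited in `year`."""
    new_repos: Counter = Counter()
    active_repos: Counter = Counter()
    for repo in repos:
        if repo.created.year == year:
            new_repos[repo.language] += 1
        # "Active" here means at least one edit fell inside the year.
        if any(edit.year == year for edit in repo.edit_dates):
            active_repos[repo.language] += 1
    return new_repos, active_repos


# Made-up sample records, for illustration only.
repos = [
    Repo("Python", date(2013, 3, 1), [date(2013, 3, 2), date(2014, 1, 5)]),
    Repo("Python", date(2011, 6, 9), [date(2013, 11, 20)]),
    Repo("Cobol", date(2010, 1, 4), [date(2012, 8, 15)]),
]
new_counts, active_counts = count_new_and_active(repos, 2013)
print(new_counts["Python"], active_counts["Python"], active_counts["Cobol"])
```

An older repository can count as active without counting as new, which is why the two numbers capture different signals.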

Stack Overflow

Stack Overflow is a popular site where programmers can ask questions about coding. We measured two things on Stack Overflow: (1) the number of questions posted mentioning each language in 2013, and (2) the amount of attention paid to those questions. Each metric measures the demand for information about the language in a different way. Each question is tagged with the languages under discussion, and these tags are used to tabulate our measurements using the Stack Exchange API.

Reddit

Reddit is a news and information site where users post links and comments and can “upvote” or “downvote” the contributions of others to help identify the most important or interesting links. On Reddit we measured the number of posts mentioning each of the languages using the template “X programming,” posted between July 2012 and January 2014 in any subreddit on the site. We went a bit further back into history with Reddit than with other sources in order to ensure robust statistics for each of the languages. This metric captures the social activity and information sharing around each of the languages. Data was collected using the Reddit API.

Hacker News

Hacker News is a news and information site where users post comments and links to news about technology. We measured the number of posts that mentioned each of the languages using the template “X programming” for 2013. Like the Twitter, Stack Overflow, and Reddit metrics, this one captures social activity and information sharing around the various languages. Because Hacker News does not have its own data API, we used a third-party index called the HNSearch API.

Career Builder

We measured the demand for different programming languages on the Career Builder job site by counting the fresh (that is, less than 30 days old) job openings on the U.S. site that mention each language. Because some of the languages we track could be ambiguous in plain text (such as D, Go, J, Processing, and R), we used strict matching of the form “X programming” for these languages. For other languages we used a search string composed of “X AND programming AND developer,” which allowed us to capture a broader range of relevant postings. Data was collected in mid-January 2014 using the Career Builder API.
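The two query forms can be expressed as a small helper. This is a sketch only: the function name is hypothetical, the set of ambiguous names contains just the five examples listed above (the full list may be longer), and quoting the strict phrase is an assumption about how strict matching would be written.

```python
# The five ambiguous names given as examples; the complete list may differ.
AMBIGUOUS_NAMES = {"D", "Go", "J", "Processing", "R"}


def job_search_query(language: str) -> str:
    """Return the search string for one language.

    Ambiguous names get the strict phrase "X programming"; every other
    language gets the broader boolean query.
    """
    if language in AMBIGUOUS_NAMES:
        # Quoting the phrase is an assumed way of requesting a strict match.
        return f'"{language} programming"'
    return f"{language} AND programming AND developer"


for name in ("Go", "Java"):
    print(name, "->", job_search_query(name))
```

The strict form trades recall for precision: it misses postings that name the language without the word “programming” next to it, but it avoids counting every posting that contains the word “go” or the letter “R.”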

Dice

We also measured demand for different programming languages using the U.S. Dice job site, which lists technology-oriented jobs. Each Dice job posting includes a skills section that lists the languages desired for that position. We thus measured the number of fresh (that is, less than 30 days old) job postings on the site that mention each of the languages in the skills section. Data was collected in mid-January 2014 using the Dice API.
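Counting fresh postings by skills section amounts to a date filter followed by a tally. The sketch below assumes each posting has been reduced to a posting date and a list of skills; the dictionary keys, the function name, and the sample postings are hypothetical and are not the Dice API's schema.

```python
from __future__ import annotations

from collections import Counter
from datetime import date, timedelta


def count_fresh_postings(
    postings: list[dict],
    languages: list[str],
    today: date,
    max_age_days: int = 30,
) -> Counter:
    """Count postings less than `max_age_days` old that list each language as a skill."""
    cutoff = today - timedelta(days=max_age_days)
    counts: Counter = Counter()
    for posting in postings:
        # "Less than 30 days old" means posted strictly after the cutoff date.
        if posting["posted"] <= cutoff:
            continue
        skills = set(posting["skills"])
        for language in languages:
            if language in skills:
                counts[language] += 1
    return counts


# Made-up postings, for illustration only.
sample_postings = [
    {"posted": date(2014, 1, 10), "skills": ["Java", "SQL"]},
    {"posted": date(2014, 1, 2), "skills": ["Python", "Java"]},
    {"posted": date(2013, 12, 16), "skills": ["Java"]},  # exactly 30 days old
    {"posted": date(2013, 12, 1), "skills": ["Python"]},  # stale
]
fresh = count_fresh_postings(sample_postings, ["Java", "Python", "R"], date(2014, 1, 15))
print(fresh["Java"], fresh["Python"], fresh["R"])
```

Matching against a structured skills list sidesteps the ambiguity problem that plain-text search has with short names such as D or R.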

IEEE Xplore Digital Library

IEEE maintains a digital library with over 3.6 million conference and journal articles covering a range of scientific and engineering disciplines. We measured the number of articles that mention each of the languages in the template “X programming” for the year of 2013. This metric captures the prevalence of the different programming languages as used and referenced in scholarship. Data was collected using the IEEE Xplore API.