Humans of Electrical Engineering, Mathematics and Computer Science

Georgios Gousios

Georgios Gousios’ open source GHTorrent changed the domain in a foundational way

Assistant professor of Software Engineering Georgios Gousios received the Foundational Contribution Award at the MSR 2018 conference. The award recognises the scientific contribution of his GHTorrent research project on GitHub.

GitHub is a global software collaboration platform that supports a large community to learn, share, and work together to build software. Its success has led to over 28 million users and more than 68 million repositories, and in 2017 alone users added a total of 2.8 billion lines of code. With the help of Gousios’ GHTorrent, all this historical data can be easily mined from the GitHub repositories, after which it is analysed and given back as a service to the wider research community on the GitHub platform. “GHTorrent is an easy-to-use source for accessing the wealth of data that GitHub has,” according to Gousios.

Scientific breakthroughs

GHTorrent has enabled researchers to perform large-scale, data-driven studies on one of today's busiest online collaboration hubs. These studies uncovered patterns of collaboration that were previously unknown. Researchers from UC Davis found that adding diversity to software teams increases their productivity [1]. Researchers from NC State University used GHTorrent to discover that women’s contributions are discriminated against in OSS projects [2]. Researchers at TU Delft used GHTorrent to study how learning transfer works in MOOCs [3]. Gousios’ own research defined the characteristics of the pull-based software development model and uncovered the pain points that developers face when working with it, which led to hundreds of derivative works on how to mitigate them [4].

Industrial uptake

The tool facilitates exploration and analysis of everything that is happening on GitHub and has thus been embraced by leading corporations such as Google and Microsoft, which also sponsors the running of GHTorrent on its Azure cloud. Both companies use it to track the progress of their open source projects. Professional services firm Deloitte used GHTorrent to research the evolution of blockchain.

Open science

The GHTorrent research project can be seen as a textbook example of open science. “The project covered a need and it was open from the beginning, people could download it from the very beginning,” Gousios explains. It is a work philosophy that he strongly believes in. “Open access is an absolute must for the adoption and wide spreading of research results. This should happen as early as possible. I hope this project will inspire others to do the same.”

[1] Vasilescu, Bogdan, et al. "Gender and tenure diversity in GitHub teams." Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2015.
[2] Terrell, Josh, et al. "Gender differences and bias in open source: Pull request acceptance of women versus men." PeerJ Computer Science 3 (2017): e111.
[3] Chen, Guanliang, et al. "Learning Transfer: Does It Take Place in MOOCs? An Investigation into the Uptake of Functional Programming in Practice." Proceedings of the Third (2016) ACM Conference on Learning@ Scale. ACM, 2016.
[4] Gousios, Georgios, Martin Pinzger, and Arie van Deursen. "An exploratory study of the pull-based software development model." Proceedings of the 36th International Conference on Software Engineering. ACM, 2014.