GitHub Experiment: Mapping the civic tech community


In November 2015, Stefan Baack created a GitHub Scraper to explore the network dynamics of civic hackers on the platform. Recognizing that GitHub is just one of many community platforms used by civic tech organizations, he also drew on data from the Poplus Community and his own work with the British civic hacking NGO mySociety. Baack's ensuing analysis revealed a follower network dominated by one key player, maxogden, as well as regional hubs; a list of the most popular repositories; and a map illustrating the locations of civic tech organizations across the globe.

Image: Baack's original Follower Network Graph.

Following the publication of his analysis, the civic tech community pitched in and helped Baack develop a more encompassing picture of the global community. Significantly, the second analysis showed that a number of the original findings were incomplete. For example, the updated network analysis showed that, although maxogden still held a central position, it was also possible to identify other key players within regions.

Image: Baack's updated Follower Network Graph.

We spoke to Baack about his experiences working on this experiment, and his resulting impressions of civic tech organizations and activists.

EJC: What drew you to map the global civic tech community?

Baack: I’ve been researching the practices and values of civic tech with a case study on the British NGO mySociety, one of the oldest civic tech organizations, which pioneered several civic tech applications that have been copied in many other countries around the world (e.g. TheyWorkForYou or WhatDoTheyKnow). Even though I primarily relied on interviews with mySociety members, I realized that some sort of network analysis would be extremely interesting to get a glimpse of the civic tech community as a whole. This was partly because mySociety is one of the founding organizations of Poplus, an international federation of civic tech groups interested in developing and maintaining reusable civic tech components that work across countries. How can we describe mySociety’s role in this federation, and more generally, how are its members connected? Who are the main contributors to this federation? Moreover, I was interested in expanding my qualitative research with some quantitative analysis, e.g. by checking which mySociety projects are most popular, or what tools or libraries mySociety developers are using to build their own applications.

You state that your project drew heavily on Liliana Bounegru, Jonathan Gray, and Stefania Milan's work that used GitHub to study data journalism. In what specific ways did their research inform your project? And now, having completed it, how does your understanding of their insights compare?

This project asked how GitHub serves as a transparency device for ‘journalism in the making’ and how it mediates, challenges or transforms journalism practices, products and values. The main task was developing and experimenting with new methods to scrape and analyze data from GitHub to answer these kinds of questions. While the questions I addressed in my project are different, I drew heavily on the methodological experiments we conducted. During the Summer School I learned a lot about writing scrapers, what is possible with the GitHub API, how the data gathered from GitHub can be analyzed, and the limitations that need to be taken into account. My GitHub scraper reproduced some of the experiments we did at the Summer School and customized them to best suit my needs. My project largely confirmed the findings from the Summer School (in terms of methodology), but it also showed, I believe, how the limitations of GitHub data can be mitigated by involving actors from the community we’re looking at in the data collection and analysis.

Your project uses GitHub as an “inaccurate proxy” to describe this community. Can you explain what you mean by this and how you mediated the challenges it posed?

First, GitHub’s social features like following or starring are poor indicators of how active or well-connected a user really is because these features are not necessarily needed to use GitHub productively; in many cases, not using them simply means that other channels are used to stay connected. Second, the civic tech community doesn’t solely consist of developers. Non-developers are underrepresented on a platform like GitHub. Third, the locations of civic technologists are inaccurate because users can specify their locations in whatever way they want, if they specify them at all. And fourth, civic tech is very much a field in flux: what exactly ‘civic tech’ is (or isn’t) is disputed. Accordingly, it is also hard to tell who belongs to the civic tech community. I sidestepped this problem by simply including every organization that claims to be part of the civic tech community and has a GitHub account (more details on how I compiled that list below). The list of organizations I used to scrape GitHub is by no means complete and it’s not always clear how the organizations in this list form a ‘community’. In other words, my data doesn’t represent the ‘civic tech community’ as a whole, only a fraction of it, albeit an important one.

Being aware of and transparent about these limitations is important. At the same time, it is also important to understand what the data gathered from GitHub does tell us despite these limitations. After all, GitHub is the most popular platform among civic technologists to share and develop their tools. Every civic tech organization I’m aware of does have a GitHub account, or at least some of its members are registered. However distorted, the data therefore shows some general tendencies which can be examined in more detail using other methods or additional data sources. This is where I see the main value of this project: not in how accurately it represents the civic tech community, but in how it shows us patterns that raise new questions and guide further research. For example, the data I gathered raised interesting questions about the political economy behind civic tech: mapping the locations of civic technologists shows a strong Western bias, and comparing the follower network with the contributor network raised questions about the role of African civic tech groups. The network analysis also indicated that European groups have stronger ties to Latin America than to the US, but is this really the case or just due to a bias in the data or my analysis?

Another way to mediate the inaccuracy of data collected from GitHub is expert knowledge about the subject, for example from previous research or by involving members of the community in the collection and the analysis of the data. The insights I got from my study about mySociety as well as my earlier research about the Open Knowledge Foundation in Germany made it easier to judge the accuracy of the results and helped in the analysis. I also got some very helpful comments from the community after I published the first results on my blog.

After sharing your initial project, your updated version incorporated community inputs and ideas, and it is clear that some of your conclusions changed as a result. For example, in your initial follower network visualization maxogden has a very prominent position. Whilst maxogden is still central in the updated version, it also reveals a number of other prominent actors. Can you please explain how the community helped you to create a more encompassing methodology?

Before I published the first article on my blog I shared it on mySociety’s mailing list. I got some helpful suggestions (for example to map contributors) and comments from the community (including maxogden). Most importantly, I was contacted by some of the organizers of the Poplus federation, including Steven Clift from E-Democracy.org who already curated a list of groups in this field. He shared with me an internal, crowdsourced list of most of the members of the Poplus mailing list. This helped me to add a lot of civic tech organizations I wasn’t aware of, but beyond that this list also helped me to judge the (in)accuracy of my data collection as it gave me a sense of the scope and diversity of the civic tech community.

Can you explain how you developed the project’s Scraper and how it could be used in other data-driven projects?

Originally, this whole project started as a programming exercise. I was learning Python and got inspired by the Digital Methods Summer School. GitHub provides an easy-to-use API with very good documentation, so it was a great platform to learn how to write scrapers. I started very small: my first goal was to get a CSV with all the repositories of mySociety’s GitHub account. Then I added more details such as the number of contributors, created new CSVs to get the starred repositories of these contributors and so forth. It was an iterative and playful process that only later turned into a more serious research project, after I realized that the little tool I had created was powerful enough to answer some interesting questions.
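
To illustrate the first step Baack describes, here is a minimal sketch of fetching an organization's repositories from the GitHub REST API and rendering them as CSV. It is not Baack's scraper: the function names, the chosen columns, and the use of Python's standard library are assumptions made for illustration, and unauthenticated requests are subject to GitHub's rate limits.

```python
import csv
import io
import json
import urllib.request

API_ROOT = "https://api.github.com"
# Columns chosen for illustration; the original scraper's columns are not documented here.
FIELDS = ["name", "html_url", "language", "stargazers_count", "forks_count"]


def fetch_org_repos(org, per_page=100):
    """Return all public repositories of an organization as a list of dicts.

    Pages through GET /orgs/{org}/repos until an empty page is returned.
    """
    repos, page = [], 1
    while True:
        url = f"{API_ROOT}/orgs/{org}/repos?per_page={per_page}&page={page}"
        request = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github+json"}
        )
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:
            return repos
        repos.extend(batch)
        page += 1


def repos_to_csv(repos, fields=FIELDS):
    """Render the selected repository fields as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for repo in repos:
        # Missing or null values (e.g. no detected language) become empty cells.
        writer.writerow({field: repo.get(field, "") for field in fields})
    return buffer.getvalue()


# Example usage (performs network requests, so it is left commented out):
# with open("mysociety_repos.csv", "w", newline="") as handle:
#     handle.write(repos_to_csv(fetch_org_repos("mysociety")))
```

Keeping the download and the CSV rendering in separate functions makes it easy to add further columns or data sources later, in the iterative way described above.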

The scraper can be used in any data-driven project interested in the relationships between different organizations on GitHub: you can get CSV spreadsheets with detailed information about the projects and members of the organizations, and GEXF files for network analysis in tools like Gephi.
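
As a rough illustration of the GEXF output mentioned above, the following sketch turns a list of (follower, followed) username pairs into a minimal directed GEXF 1.2 document that Gephi can open. The function name and the example usernames are hypothetical, and how the pairs are collected (for instance from the GitHub API's followers endpoint) is left out; the actual scraper may build its files differently.

```python
import xml.etree.ElementTree as ET


def follower_edges_to_gexf(edges):
    """Build a minimal directed GEXF 1.2 document from (follower, followed) pairs."""
    root = ET.Element(
        "gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"}
    )
    graph = ET.SubElement(
        root, "graph", {"mode": "static", "defaultedgetype": "directed"}
    )
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")

    # Drop duplicate pairs and sort so the output is deterministic.
    unique_edges = sorted(set(edges))
    users = sorted({user for edge in unique_edges for user in edge})

    for user in users:
        ET.SubElement(nodes_el, "node", {"id": user, "label": user})
    for index, (follower, followed) in enumerate(unique_edges):
        ET.SubElement(
            edges_el,
            "edge",
            {"id": str(index), "source": follower, "target": followed},
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical usernames, for illustration only.
gexf_text = follower_edges_to_gexf([("alice", "bob"), ("carol", "bob")])
```

Writing `gexf_text` to a file with a `.gexf` extension gives a graph in which each edge points from a follower to the account being followed, so in-degree in Gephi corresponds to follower count within the sampled group.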

Following your project, what insights have you personally drawn about the global civic tech community?

If there is a global civic tech community, it is unclear what this community wants beyond open data, reusable tools and a vague sense of improving ‘civic life’. The list I used and the internal Poplus list are a mix of for-profit companies, non-profit NGOs and all kinds of government institutions. These types of actors might align because they all profit from increased transparency and reusability, but eventually their interests might clash. For example, wanting public institutions to copy civic tech applications in order to change the relationship between citizens and these institutions (which is what European non-profits tend to favor) has very different implications for civic life than wanting public institutions to ‘step back’ and foster ecosystems of for-profit services (which is often implied in the ‘Government as a platform’ metaphor). While there seems to be much interest among practitioners in pointing out similarities and articulating a common cause, I think it is actually more important to emphasize the differences between actors who claim to do civic tech.

Explore the original version of the project here and the updated version here.