20/7/2017

Tweets of Congress: Output from 1000+ accounts for any given day


Tweets of Congress is a project collating the daily Twitter output of both houses of the United States Congress, encompassing the accounts of members, political parties, committees and caucuses (around 1,070 accounts in total). The project has two components: a backend app for data collection and serialization, and a frontend Github-hosted site offering JSON datasets for given days.

The App

The backend app, the Congressional Tweet Automator, is a light NodeJS program backed by a Redis data store for tracking tweets and users. The app uses the Twit and Github modules to interface with the Twitter and Github APIs, respectively. There are also some utility functions to track time and the like.
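In rough terms, the wiring looks something like the sketch below. The environment variable names are illustrative, and the exact authentication call depends on the module versions in use, so treat this as an outline rather than the app's actual code:

```javascript
// Hypothetical wiring of the three clients the app relies on.
const Twit = require('twit');
const GitHubApi = require('github');
const redis = require('redis');

// Twitter credentials for the single-purpose account (env var names assumed).
const twitter = new Twit({
  consumer_key: process.env.TWITTER_CONSUMER_KEY,
  consumer_secret: process.env.TWITTER_CONSUMER_SECRET,
  access_token: process.env.TWITTER_ACCESS_TOKEN,
  access_token_secret: process.env.TWITTER_ACCESS_TOKEN_SECRET
});

// Github client authenticated as the secondary automation account.
const github = new GitHubApi();
github.authenticate({ type: 'token', token: process.env.GITHUB_TOKEN });

// Redis store that accumulates the day's tweets between runs.
const store = redis.createClient(process.env.REDIS_URL);
```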

To maximize portability, the app is set up to scrape from a Twitter list containing the pertinent accounts. The app contains functionality that translates and simplifies the tweet data received from the Twitter API; it also unfurls natively shortened t.co links, adds media URLs, and includes the full text of quoted tweets or retweets. To reduce the risk of blocking and make the implementation as anonymous as possible, I set up a single-purpose private Twitter account with a private Twitter list for tracking accounts. Twitter doesn't let users see the private lists they have been added to, so essentially no one on my list will ever figure out they are on it.
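As a rough illustration, a simplification step along those lines might look like the following. The field names on the input side come from the standard Twitter REST API tweet object; the shape of the output is my own assumption, not the project's actual schema:

```javascript
// Hypothetical simplification of a raw tweet object from the Twitter API.
// Uses standard v1.1 fields: entities.urls, extended_entities.media, and
// quoted_status / retweeted_status (requested with tweet_mode=extended).
function simplifyTweet(tweet) {
  const source = tweet.retweeted_status || tweet;
  let text = source.full_text || source.text;

  // Unfurl natively shortened t.co links back to their expanded URLs.
  (source.entities.urls || []).forEach((url) => {
    text = text.replace(url.url, url.expanded_url);
  });

  // Collect direct media URLs, if any.
  const media = ((source.extended_entities || {}).media || [])
    .map((item) => item.media_url_https);

  return {
    id: tweet.id_str,
    screen_name: tweet.user.screen_name,
    time: tweet.created_at,
    text: text,
    // Include the full text of a quoted tweet when present.
    quoted: tweet.quoted_status ? tweet.quoted_status.full_text : null,
    media: media
  };
}
```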

Powering the list is a customized user dataset that organizes the accounts by the entity they are associated with and whether they are campaign or office accounts. The user dataset combines metadata sourced from the unitedstates/congress-legislators project with accounts I had followed through single-purpose Twitter accounts, which I set up to create TweetDeck streams for tracking Democrats' and Republicans' activity on Twitter. Collating and categorizing all the accounts, particularly campaign accounts, which do not follow any standardized naming convention, was a time-consuming, manual process.
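To give a feel for what that organization means in practice, a single record in such a dataset could plausibly look like this. The field names here are purely illustrative and not the dataset's actual schema:

```javascript
// Illustrative only: one hypothetical entry in the user dataset,
// grouping a member's office and campaign accounts under one entity.
const exampleUser = {
  name: 'Jane Doe',        // legislator, party, committee or caucus
  chamber: 'house',        // which body the entity belongs to
  party: 'D',
  accounts: [
    { screen_name: 'RepJaneDoe', account_type: 'office' },
    { screen_name: 'JaneDoeForCongress', account_type: 'campaign' }
  ]
};
```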

At around midnight Eastern Standard Time, the app serializes the tweets collected the day prior and empties the tweets hash key of the Redis store. A secondary Github account I use for automated tasks commits the JSON-serialized data to the repo along with a small Markdown file that enables some functionality on the frontend site. Initially, I was generating a full Jekyll post containing a sorted list of all the tweets, but I dropped that after one day when I found out how unwieldy and overwhelming 3,000 tweets were.
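A rough sketch of that nightly rollover is below. It assumes the day's tweets live in a Redis hash named `tweets` with each field holding a simplified tweet as a JSON string, and it uses the old `github` Node module's createFile call; the repo details and file paths are placeholders, not the project's real layout:

```javascript
// Hypothetical nightly rollover: serialize yesterday's tweets, commit the
// JSON dataset via the Github API, then clear the Redis hash.
function serializeDay(store, github, dateString, callback) {
  store.hgetall('tweets', (err, tweets) => {
    if (err) return callback(err);

    const dataset = Object.keys(tweets || {}).map((id) => JSON.parse(tweets[id]));
    const json = JSON.stringify(dataset, null, 2);

    github.repos.createFile({
      owner: 'example-owner',                    // repo details assumed
      repo: 'example-repo',
      path: 'data/' + dateString + '.json',
      message: 'Add dataset for ' + dateString,
      content: Buffer.from(json).toString('base64')
    }, (err) => {
      if (err) return callback(err);
      // Empty the hash so the next day starts fresh.
      store.del('tweets', callback);
    });
  });
}
```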

In the near future, I hope to also add the serialized datasets to the Internet Archive. Doing so will solve the issue that arises from eventually having to drop old datasets due to the size constraints on Github.

The app is deployed on Heroku, and tweets are collected/serialized using a script that runs on an hourly interval via Heroku Scheduler.
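That hourly run might be invoked by Heroku Scheduler as something like `node collect.js`. A minimal sketch of such a collection pass, reusing the hypothetical simplifyTweet helper from earlier and assuming the same `tweets` hash plus a `since_id` key for tracking progress, could look like this:

```javascript
// Hypothetical hourly collection pass against the private tracking list.
function collectTweets(twitter, store, callback) {
  // Read the newest tweet id seen so far to avoid re-fetching old tweets.
  store.get('since_id', (err, sinceId) => {
    if (err) return callback(err);

    const params = {
      list_id: process.env.TWITTER_LIST_ID,  // the private tracking list
      count: 200,
      tweet_mode: 'extended'
    };
    if (sinceId) params.since_id = sinceId;

    twitter.get('lists/statuses', params, (err, tweets) => {
      if (err) return callback(err);
      if (!tweets.length) return callback(null, 0);

      const multi = store.multi();
      tweets.forEach((tweet) => {
        multi.hset('tweets', tweet.id_str, JSON.stringify(simplifyTweet(tweet)));
      });
      multi.set('since_id', tweets[0].id_str); // responses are newest-first
      multi.exec((err) => callback(err, tweets.length));
    });
  });
}
```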

The titular Tweets of Congress site is a bit less complex: it's basically just a Github Pages Jekyll blog using a preexisting theme with a handful of stylistic and functional customizations. I modified the theme to leverage the Markdown files generated by the backend, so that the post links on the homepage and in the Jekyll-generated RSS feed point to the JSON datasets created on the respective days.

How can journalists use the project?

The most obvious use of the project is the daily JSON tweet datasets themselves. To make the most of that data, you probably want to use it in concert with the aforementioned metadata in the backend app. The possibilities are infinite.

For instance, Philip Bump of the Washington Post has used the combination of tweet and user datasets to create a visualization of what pictures members of Congress tweet, broken down by party, and an analysis of how frequently Democrats and Republicans tweeted about healthcare.
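If you want to start from the raw data, a minimal sketch of pulling down one day's dataset and tallying tweets per account might look like the following. The URL pattern and the per-tweet field names are assumptions on my part, so check the site for the actual paths and schema:

```javascript
// Hypothetical example: count a day's tweets by account.
const https = require('https');

function tallyByUser(dateString, callback) {
  // Dataset URL pattern assumed; see the Tweets of Congress site for real paths.
  const url = 'https://example.github.io/congresstweets/data/' + dateString + '.json';
  https.get(url, (res) => {
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      const tweets = JSON.parse(body);
      const counts = {};
      tweets.forEach((tweet) => {
        counts[tweet.screen_name] = (counts[tweet.screen_name] || 0) + 1;
      });
      callback(null, counts);
    });
  }).on('error', callback);
}

tallyByUser('2017-07-19', (err, counts) => {
  if (err) throw err;
  console.log(counts);
});
```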

You could also use parts of the backend app to create something new, like an archive of corporate tweets, or apply the idea of the app to another governmental body. You can use the user metadata to create a list of Congress-related Twitter accounts and break down accounts by various properties a Twitter API query returns (background color, like counts, et cetera).

Explore Tweets of Congress here.

About the author:

Alex Litel is a developer based in Los Angeles.

Image: Esther Vargas.
