1/6/2018

Data Done Right: How Mozilla visualized users’ opinions of Facebook

 

By Vojtech Sedlak, Mozilla Foundation

When news broke about Facebook, Cambridge Analytica and their covert data sharing practices, it felt like a pivotal moment in the pro-privacy movement. Pundits, politicians and activists around the world scolded Facebook for its leniency with people’s personal information. Amidst all of the prominent voices sharing their opinion on the matter, we wanted to make sure users got their say, so the Mozilla Foundation launched a survey to find out how people felt about Facebook.

Around 47,000 people took the survey and the results were really interesting. One of the core tenets of our work at Mozilla is working in the open. For this reason, we decided to make the survey data open and accessible to everyone, all the while protecting the privacy of the survey takers. Whereas in the past we had shared survey data in a raw format, we decided to take it a step further this time by building an interactive tool that allowed people to explore and parse the data themselves.

Listening and sharing

The Mozilla Foundation works to promote the health of the Internet, so listening to our community is an important part of our work. In the past, we’ve asked how people feel about privacy and security and what their views are on connected devices. When reporting the survey results, we always try to open the data up as much as possible, so that anyone can explore and visualize it. For example, when we posted the nearly 200,000 responses to our connected devices survey on our website in CSV format, a data visualization class at the University of Colorado, Boulder took the data and created dozens of fascinating visualizations that unlocked new insights. For us, building an interactive tool that lets people parse the data themselves was the natural next step for our Facebook survey data.

Before any data is shared, whether in CSV format or in an online tool, it is important to ensure that the privacy of survey takers is sufficiently protected. Aside from removing any personally identifiable information such as postal codes or IP addresses, we also made sure only results with a sufficient sample size were presented. For instance, all countries with fewer than 700 respondents were bundled into an ‘Other’ category. This became particularly important when we added the ability to cross-tabulate the data and filter by specific demographic attributes, as that enabled users to zoom in on very small segments of respondents (e.g. fewer than 5).
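
To make the bundling step concrete, here is a minimal sketch in Python. It is not the code behind the tool (which was written in R and later JavaScript). The field names (`country`, `age`) and the cut-off of 5 for small groups are assumptions chosen for illustration; only the 700-respondent country threshold comes from the text above.

```python
from collections import Counter

MIN_COUNTRY_RESPONDENTS = 700  # threshold stated in the article
MIN_GROUP_SIZE = 5             # assumed cut-off for small segments, illustration only


def bundle_small_countries(rows, min_respondents=MIN_COUNTRY_RESPONDENTS):
    """Return copies of the rows with rare countries relabelled as 'Other'."""
    counts = Counter(row["country"] for row in rows)
    return [
        {
            **row,
            "country": row["country"]
            if counts[row["country"]] >= min_respondents
            else "Other",
        }
        for row in rows
    ]


def drop_small_groups(rows, group_key, min_group_size=MIN_GROUP_SIZE):
    """Keep only rows whose group (e.g. an age band) has enough respondents."""
    group_sizes = Counter(row[group_key] for row in rows)
    return [row for row in rows if group_sizes[row[group_key]] >= min_group_size]
```

Running the second check after every filter, rather than once up front, is what keeps a narrow combination of filters from exposing a handful of individual respondents.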

Building a scalable tool

The first iteration of the Facebook survey tool was built in R’s Shiny framework, using shinydashboard for the UI, tidyverse for parsing the survey data and ggplot2 for visualizations (source code is available on GitHub). Initially, we only included the ability to compare results by a specific demographic attribute such as age, but we ended up adding the ability to apply specific filters to the survey data, thus unlocking a more granular view of the data.
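
The two features described above, filtering and comparing by a demographic attribute, can be sketched roughly as follows. This is an illustrative Python example, not the Shiny implementation; the function names and the row fields (`age`, `country`, `concern`) are hypothetical.

```python
from collections import Counter, defaultdict


def apply_filters(rows, filters):
    """Keep rows whose value for each filtered attribute is in the allowed set."""
    return [
        row
        for row in rows
        if all(row[attr] in allowed for attr, allowed in filters.items())
    ]


def compare_by(rows, attribute, question):
    """Return {group: {answer: percentage}} for one question, split by attribute."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[question]] += 1
    return {
        group: {
            answer: round(100 * n / sum(answers.values()), 1)
            for answer, n in answers.items()
        }
        for group, answers in counts.items()
    }
```

For example, `compare_by(apply_filters(rows, {"country": {"US"}}), "age", "concern")` would give the share of each answer per age band among respondents from one country.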

Mozilla is fortunate to have a strong community of supporters, which means that any publicly available tool needs to meet strict load capacity criteria. It turns out that Shiny is not the ideal framework when scalability is the name of the game, as it severely limits the number of concurrent users. Even with a proxy server and substantial resources at hand, Shiny did not scale well beyond 100 concurrent users due to its dependence on WebSockets. And so, we had to pivot and rebuild the tool from the ground up in JavaScript.

Whereas the Shiny version of the tool made it easy to load and parse the data, in JavaScript all data had to be pre-parsed and split into multiple files to minimize load times. We used D3.js for data visualizations, which made the visualizations lightning fast. Instead of barely handling 100 concurrent users, the tool was now ready to withstand thousands of concurrent visitors. Due to time constraints we weren’t able to improve on the design of the tool (i.e. the JavaScript version is an exact replica of the Shiny one), or to apply some of the newer approaches to building interactive tools, such as using React or Vue.js. You can find the source code for the JavaScript version of the tool here.
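
One way to picture the pre-parsing step is a small build script that aggregates the raw responses once and writes a separate file per demographic attribute, so the browser only downloads the slice it needs. The sketch below is in Python and is an assumption about how such a step could look, not the project's actual build code; the file naming scheme and field names are hypothetical.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path


def build_summaries(rows, question, attributes):
    """Return {attribute: {group: {answer: count}}} for one survey question."""
    summaries = {}
    for attribute in attributes:
        groups = defaultdict(Counter)
        for row in rows:
            groups[row[attribute]][row[question]] += 1
        summaries[attribute] = {group: dict(c) for group, c in groups.items()}
    return summaries


def write_summaries(summaries, question, out_dir):
    """Write one small JSON file per attribute and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for attribute, table in summaries.items():
        # Hypothetical naming scheme, e.g. "concern_by_age.json".
        path = out_dir / f"{question}_by_{attribute}.json"
        path.write_text(json.dumps(table, sort_keys=True), encoding="utf-8")
        paths.append(path)
    return paths
```

Because the files hold only aggregated counts, they are small and can be served as static assets, which is the kind of setup that copes with many concurrent visitors.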

Promoting data literacy

Data literacy is arguably one of the most important technical skills in today’s world. While a lot of progress has been made around general digital literacy and even coding, the ability to understand data and to create or interpret data visualizations requires a distinct mixture of technical skills, design, and critical thinking. These skills are ever more important as today’s tools and the growing open data movement, which Mozilla is a proud supporter of, make it easy to take any dataset, find insights and create interactive visualizations.

As we look towards the next opportunity to listen to our community, we are eager to innovate on the design and utility of our interactive visualizations. Our mission is to empower all the citizen data scientists and hackers out there.

About the author

Vojtech Sedlak is a Digital Analyst at the Mozilla Foundation. Originally from the Czech Republic, Vojtech works at the intersection of programming, analytics and storytelling. He enjoys searching for innovative ways of analyzing data and communicating insights and is a passionate advocate of the open data and open analytics movements.

Explore the project here.

Image: Book Catalog.
