Mirko Lorenz looks at how keshif.me lets you explore facets of datasets quickly
A key driver behind any work with data is finding new insights, patterns, or evidence for an assumption. But getting there requires a careful, step-by-step approach to avoid drawing the wrong conclusions. One must find the data, clean it, and then go through its potential facets to find the story.
There is also a challenge specific to newsrooms. Unlike business organizations, journalists deal with very diverse datasets: one day we might need to understand a pattern in a local context, and the next we might need to know which companies are performing well internationally.
The best practice would therefore be to collect data on a variety of topics continuously, so as to avoid repeating the process of finding, cleaning, and analyzing data on a deadline. However, since newsroom data spans so many areas of interest, it would be difficult to set up even a simple server or database for each case. Moreover, data from previous investigations may not have been collected in one place, which leads to duplicated work.
Enter keshif.me, an open source tool that lets its users connect to a range of tabular datasets. Once the data is loaded, the tool offers a very quick way to explore its facets, be it the Titanic passenger list or the list of the fastest-growing companies in the US. Users can quickly explore, compare, and filter the data in search of interesting angles.
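To make the idea of exploring facets concrete, here is a minimal, generic sketch in Python. It is not keshif's code or API; the rows are made up for illustration, and the `facet_counts` helper is hypothetical. It counts how often each value of one column (a facet) occurs among the rows that match the currently selected filters, which is the kind of summary a facet browser recomputes whenever you click on a value.

```python
from collections import Counter

# Made-up example rows, loosely shaped like a passenger list (not real data).
ROWS = [
    {"class": "1st", "sex": "female", "survived": "yes"},
    {"class": "1st", "sex": "male", "survived": "no"},
    {"class": "3rd", "sex": "female", "survived": "yes"},
    {"class": "3rd", "sex": "male", "survived": "no"},
    {"class": "3rd", "sex": "male", "survived": "yes"},
]


def facet_counts(rows, facet, filters=None):
    """Hypothetical helper: count each value of `facet` among rows matching all filters."""
    filters = filters or {}
    # Keep only the rows whose values equal every selected filter value.
    matching = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    # Tally the facet values of the remaining rows.
    return Counter(r[facet] for r in matching)


if __name__ == "__main__":
    # Unfiltered: how many rows per "survived" value.
    print(facet_counts(ROWS, "survived"))
    # Filtered: how the "class" facet breaks down among survivors only.
    print(facet_counts(ROWS, "class", {"survived": "yes"}))
```

Selecting a value in one facet (here `survived = yes`) and watching the counts in another facet change is the basic interaction pattern; a browser-based tool adds the visual summaries on top.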
On keshif.me you will already find close to a hundred different datasets, an excellent showcase of the many areas the tool can be applied to.
To learn more about the background and motivation behind keshif.me, we interviewed its creator, Adil Yalcin.
Q: First, Adil, can you briefly introduce yourself? What is your background?
I think about data visually, and I believe that's where most of my work starts. Currently, I am a computer science Ph.D. student at the University of Maryland, College Park, and proud to be a member of its Human-Computer Interaction Lab, which celebrates more than 30 years of contributions to research and to our daily lives. I am advised by professors Niklas Elmqvist and Ben Bederson, who guide my work in many ways. Before that, I worked on computer graphics, specifically rendering. It has always been about generating images. Now, my focus is on visualizing structured datasets.
Q: Which users and what kinds of use cases do you have in mind?
If you have a structured, tabular dataset, you can use keshif to understand your data better by expressing your questions and observing relations. This is a very broad use case, but it sets the larger context of keshif and its future development. I generally try to avoid the word "user", for reasons already discussed here. I think the persona of keshif is more of an explorer. Your exploration may start from curiosity, or from a focused drive to understand certain relations better. Yet keshif is not about catering to everyone and every need. Keshif is a shared design basis that is useful in many common cases, currently focusing on open-ended exploration of tabular data.
Q: Why did you develop Keshif.me?
From the first day, keshif has been about making information accessible to a broader audience by lowering the complexity barriers in data analytics. A ton of work has been done already (the VisTools page gives an overview), but our tools are still mostly too complex to use effectively, or too limited in power. I believe most of that complexity comes from visualization tools being "designed for designers", to support creative ways of exploring data. They do not necessarily focus on helping us make smarter decisions easily. Just look at the features of any tool and note how many "customizations" it allows, with so many options to choose from. I believe that good design anticipates its end use precisely and keeps only the essentials, the strongest bare minimum. While developing keshif, I learned that this is much easier said than done.
In addition, we like to celebrate tools that can do many things at once, and how different perspectives can teach us something new and unexpected. We forget about the failed attempts, the bad results, and the time lost making mistakes and learning our tools. Without the necessary, wide-ranging skills, we cannot make good choices on an empty canvas. Most design alternatives fail to communicate the data in an effectively perceivable way. This is why lists of good practices are shared every day on social media and in data visualization courses and workshops. I believe this is partly because of how we envisioned our tools, celebrating an "anything goes" mentality. We sometimes stop to say "just because you can, doesn't mean you should". If we are to criticize ourselves, I think we sometimes do not practice what we preach. We commonly work with complex data and our needs become complex too, so that's understandable.
I also want to make it clear that tools that support creativity are crucial to broadening our horizons. A blank canvas is how you start painting, and silence is how you start shaping your music. You also start with an intent, choose your brush and instrument accordingly, and add your strokes and notes based on your skill and imagination. Tools are an integral part of the creative process, and I believe there's more work to be done on that end as well. In the meantime, we can also build on established practices for data analytics, think about the user experience, and focus on the most important and common goals. Our digital tools are becoming simpler and more effective across the spectrum. I see a similar, positive trend in data visualization too, and keshif is my contribution to it. There's also a fundamental need to teach good practices and the power of data visualization, and I appreciate any work that leads to a critically thinking, data-aware community.
Q: In your opinion, how could a newsroom make use of this tool?
The more I look at various data sources and how they differ, the more I appreciate the hard work of the new generation of data journalists. I understand there is usually time pressure to find data, clean it, understand it (and its domain), find insights, and then decide what to communicate to your audience in the simplest yet most intriguing way. To be most useful, a new tool in the newsroom should lower these barriers. It should be flexible enough to suit your environment, and reduce your effort by bringing an efficient structure.
I want Keshif to be the tool of choice from the moment you get your data, to help you build a mental model of what it contains and identify points of interest through interactive exploration. One of the recent additions to keshif is a visual authoring interface, so you don't need to write code to create a data browser. The YourData page is where you can quickly play with your data to create a basic browser. However, visual authoring is still an early prototype and does not have the full set of features enabled by the API. The API offers more functionality to transform your data and style your browser. Once you use the API, you can embed the browser in your public story and let your audience explore the data themselves. You can think of Keshif as a generalized interface to your tabular data, replacing old-fashioned tables and limited visualizations. Keshif is based on d3.js, the leading low-level library for web-based data visualization, and it also uses some jQuery. So keshif can easily co-exist with existing web pages.
Looking at the datasets on keshif.me tells a lot about which kinds of data are a good fit for keshif. The good news is that journalism is the top dataset keyword, with 15 datasets! For journalists with programming experience, a good way to start is to check the source code of the example datasets, pick one, and adapt it to fit your data. The US Presidents demo is a great starting point; its browser configuration is only 23 lines of simple code. Your data source can be a Google Sheet or CSV/JSON/XML files, hosted locally or in the cloud. There is more information in the wiki, which I'm updating whenever I have the chance.
Q: Is there a plan to add additional features? Can others contribute?
Yes & yes! There is a lot more to improve in the design, to extend it to other data types, larger data scales, and more use cases, and to make it a more extensive platform for lowering barriers to data exploration. Keshif has been tested over many years on different browsers, yet it probably still feels like a prototype at first look. The front page is not as pretty or informative, and the documentation not as extensive and detailed, as they could be. Keshif is a research project of a PhD student testing out some ideas in public. I hope that keshif will be around for longer with increasing impact and capabilities, so my focus is on improving it in the right direction for the future.
I also welcome contributions from others. The best way to start contributing would be to open a GitHub issue and talk about your plans. If you want to experiment with your own ideas, just fork keshif on GitHub, do your awesome thing, and let me know.
Q: What is the license policy? Can this be used freely based on the Open Source license?
You can use keshif as long as you give attribution to it and follow the 3-Clause BSD license. You can break it, integrate it with your web pages, add new features, and change its design. Hopefully, you will not modify it to have a purple background with blinking text and 3D pie charts, and add lots of menus and configuration options everywhere. That would hurt me personally, but you are free to do so. If you do something nice with it and visualize some data that you care about, let me know. My contact info is at adilyalcin.me.
Head over to keshif.me to try it for yourself!