21/9/2015

import.io

 

Instantly convert webpages into tables of data and create tailored APIs to suit your story

Although the web contains an enormous wealth of information, any data journalist will know that much of it is only human-readable and must be scraped in order to tell a data-driven story. Yet, if you don't know how to code and the datasets aren't readily available, this can pose quite a challenge.

In response, the team at import.io have developed a free desktop app that allows journalists to quickly extract information from any webpage and turn it into either a custom API that can be queried over time or a static dataset ready for immediate analysis.

To extract data from a page, import.io uses a 'point and click' interface: you simply highlight and select the fields you wish to scrape. With the 'Crawler' feature, the same selection can be repeated across multiple pages on the same website.

Once you have used import.io to scrape data, the app offers integrations with Tableau and Plot.ly to visualize your insights. Alternatively, the data can be downloaded in CSV, Excel, Google Sheets or JSON format and imported into third-party software of your choice.
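For readers who do want to take a downloaded file further in code, here is a minimal sketch of reading a CSV or JSON export into the same row structure using only Python's standard library. The column names (`name`, `price`) and the assumption that the JSON export is a top-level list of objects are illustrative and not taken from import.io's documentation; check the layout of your own download before relying on it.

```python
import csv
import io
import json


def rows_from_csv(text):
    """Parse CSV text with a header row into a list of dicts (all values are strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_json(text):
    """Parse JSON text into a list of dicts with string values.

    Assumes the export is a top-level array of flat objects (hypothetical layout).
    """
    return [{key: str(value) for key, value in row.items()} for row in json.loads(text)]


# Hypothetical sample exports of the same two records.
csv_text = "name,price\nFlat A,120\nFlat B,95\n"
json_text = '[{"name": "Flat A", "price": 120}, {"name": "Flat B", "price": 95}]'

csv_rows = rows_from_csv(csv_text)
json_rows = rows_from_json(json_text)
```

Converting every value to a string keeps the two formats comparable; numeric columns can then be cast explicitly once you know which fields should be numbers.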

import.io's API feature is particularly beneficial for journalists who want to tell a live story. After you define the schema for your dataset on the webpage, it can be turned into a custom API and queried at intervals of your choosing. Not only does this allow changes to be tracked over time, but if you power visualizations with your custom import.io API, those changes will be reflected seamlessly in the end product. In addition, import.io automatically tests data sources twice a day and will let you know if URLs are broken, so you don't need to worry about the API failing without anyone noticing.
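The idea of querying a data source at intervals and tracking what changed can be sketched in a few lines of Python. This is an illustration only, not import.io's API: `fetch` is a hypothetical function you would supply to return the current rows from your own custom API, and `key` is whichever field uniquely identifies a record in your dataset.

```python
import time


def diff_snapshots(old, new, key):
    """Compare two lists of row dicts; return (added, removed, changed)."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}
    added = [row for k, row in new_by_key.items() if k not in old_by_key]
    removed = [row for k, row in old_by_key.items() if k not in new_by_key]
    changed = [
        (old_by_key[k], row)
        for k, row in new_by_key.items()
        if k in old_by_key and old_by_key[k] != row
    ]
    return added, removed, changed


def poll_for_changes(fetch, key, rounds, interval_seconds):
    """Call the hypothetical `fetch` repeatedly and record the diff after each round."""
    previous = fetch()
    history = []
    for _ in range(rounds):
        time.sleep(interval_seconds)  # wait before querying again
        current = fetch()
        history.append(diff_snapshots(previous, current, key))
        previous = current
    return history
```

In practice `fetch` might download and parse the JSON returned by your API, and each entry in `history` could feed a chart or a changelog for the story.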

Comparing the number of custom APIs created by import.io users with the number of APIs already publicly available shows the tool's value for filling data gaps. As of March 2014, import.io users had created 20,000 custom APIs, whereas the number of free, publicly available APIs only reached 14,000 in September 2015. With import.io, non-coding journalists no longer have to rely on pre-existing APIs that may not exactly fit their investigation's needs, nor do they need to commission purpose-built APIs to fill the gap.

Examples of import.io driven projects

  • Carolyn Said's article 'The Airbnb effect' in the San Francisco Chronicle (12 July 2015) used import.io to scrape data from Flipkey and Homeaway in order to investigate the impact that they have had on the city's real estate market.
  • Oxfam's study 'A Tale of Two Britains: Inequality in the UK' (17 March 2014) used data extracted with import.io from Forbes’ list of The World's Billionaires to reveal that Britain’s five richest families are worth more than the poorest 20% of the population.
  • The British Red Cross' First Aid App uses an API derived from data that import.io scraped from the NHS website to provide information about hospitals across Britain.

Learn more about using import.io for data journalism
