16/6/2017

Dedupe.io

De-duplicate and find matches in your messy data.

Journalists working with handwritten data, or other messy spreadsheets, have long been plagued by typos, inconsistent formatting, and contradictory fields. 

But all these Excel woes might soon be over. 

Dedupe.io is a tool that quickly and automatically finds similar rows in a spreadsheet or database.

Using machine learning techniques, the tool can:

  • identify all exact and similar records within a spreadsheet
  • link spreadsheets to find records that appear in more than one of them
  • automatically match new data as it comes in based on existing training
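As a rough illustration of what "similar records" means, the sketch below scores every pair of rows with Python's standard-library `difflib.SequenceMatcher` and keeps the pairs that reach a cut-off. This is plain string similarity, not Dedupe.io's machine-learning method. The function names, the sample rows and the 0.85 threshold are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(row):
    """Join a row's fields into one lower-case string with whitespace collapsed."""
    return " ".join(" ".join(str(field).lower().split()) for field in row)


def similarity(a, b):
    """Score two rows from 0.0 to 1.0; 1.0 means identical after normalising."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def find_similar_rows(rows, threshold=0.85):
    """Compare every pair of rows and return (i, j, score) for pairs at or above threshold.

    The 0.85 default is an illustrative cut-off, not a Dedupe.io setting.
    """
    matches = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        score = similarity(a, b)
        if score >= threshold:
            matches.append((i, j, round(score, 2)))
    return matches


# Hypothetical sample rows, not real data.
rows = [
    ("John Smith", "12 High Street"),
    ("Jon Smith", "12 High St."),
    ("JOHN  SMITH", "12 High Street"),
    ("Mary Jones", "4 Mill Lane"),
]
print(find_similar_rows(rows))
```

Rows 0 and 2 differ only in case and spacing, so they score 1.0; the abbreviated row 1 scores just above the cut-off against both. Comparing every pair grows quadratically with the number of rows, so larger datasets typically need some way to limit which rows get compared.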

All you have to do is upload your spreadsheet and train the tool by showing it how to identify similar records in your data. It then finds matches automatically for you to review and download.
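To give a feel for what "training" can mean, the sketch below takes a few hypothetical pairs that a person has labelled as matching or not, and learns a single similarity cut-off that classifies the most labels correctly. New pairs are then judged against that cut-off. This is a much-simplified stand-in, not Dedupe.io's own model; `learn_threshold`, `looks_like_match` and the sample pairs are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score two records from 0.0 to 1.0 after lower-casing and collapsing whitespace."""
    def clean(text):
        return " ".join(text.lower().split())
    return SequenceMatcher(None, clean(a), clean(b)).ratio()


def learn_threshold(labelled_pairs):
    """Pick the cut-off that classifies the most human-labelled pairs correctly.

    labelled_pairs is a list of (record_a, record_b, label) tuples,
    where label is True for "same entity" and False otherwise.
    """
    scored = [(similarity(a, b), label) for a, b, label in labelled_pairs]
    best_threshold, best_correct = 1.0, -1
    # Try each observed score as a candidate: pairs scoring at or above it count as matches.
    for candidate, _ in scored:
        correct = sum((score >= candidate) == label for score, label in scored)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


# Hypothetical training labels, not real data.
training = [
    ("John Smith, 12 High Street", "Jon Smith, 12 High St.", True),
    ("John Smith, 12 High Street", "JOHN SMITH, 12 High Street", True),
    ("John Smith, 12 High Street", "Mary Jones, 4 Mill Lane", False),
    ("Mary Jones, 4 Mill Lane", "Mary Johnson, 40 Mill Road", False),
]
threshold = learn_threshold(training)


def looks_like_match(a, b):
    """Apply the learned cut-off to a new, unlabelled pair of records."""
    return similarity(a, b) >= threshold


print(threshold, looks_like_match("Mary Jones, 4 Mill Lane", "Mary Jones, 4 Mill Ln"))
```

With these sample labels the learned cut-off separates the two matching pairs from the two non-matching ones, and the new "Mill Lane" / "Mill Ln" pair clears it. A real tool would weigh each field separately and ask for more labels, but the loop of labelling examples and then scoring new pairs is the same basic idea.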

Dedupe.io was created by DataMade after years of encountering messy data. The organisation builds technology and uses data to empower journalists and other civic actors.

"Before we can map how much money we’re spending on incarcerating people per city block, before we can tell you how much money Ed Burke’s political campaign has in its coffers and before we can show residents what lots are for sale in their neighborhood, we have to spend a lot of time just cleaning up and preparing data. It’s a lot of work," explains DataMade's Derek Eder.

At the moment, Eder and his team have released Dedupe.io only in private beta. However, interested journalists can sign up by emailing the team.

