17/11/2015

DocumentCloud

 

Turn documents into data.

Reporting with primary documents? DocumentCloud provides a cool mechanism for turning them into searchable data and finding stories within large collections of documents.

Every document uploaded to the site is run through Thomson Reuters OpenCalais - a service that uses natural language processing alongside machine learning algorithms to convert unstructured text into extractable data. Not only does this mean that reporters can explore their documents via keyword searches, but it also gives them the opportunity to extract common entities, such as dates, and identify critical information. For example, say you are trying to pinpoint a timeline of events - DocumentCloud allows you to select all the dates in a series of documents and plot them on a timeline for more streamlined analysis.

docdate.png
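
To give a rough feel for the idea of pulling dates out of documents and ordering them into a timeline, here is a minimal Python sketch. It is not how DocumentCloud or OpenCalais work internally: it swaps in a plain regular expression that only recognizes dates written as YYYY-MM-DD, and the sample documents, file names and function names are all invented for illustration.

```python
import re
from datetime import date

# Hypothetical sample documents; the names and text are invented for illustration.
DOCUMENTS = {
    "memo.txt": "The contract was signed on 2014-03-02 and amended on 2014-06-15.",
    "letter.txt": "We first raised the issue on 2013-11-20.",
}

# Assumption: dates appear in ISO form (YYYY-MM-DD). A real entity extractor
# recognizes far more formats than this simple pattern does.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_dates(text):
    """Return every valid YYYY-MM-DD date found in the text, in order of appearance."""
    found = []
    for year, month, day in ISO_DATE.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            # Skip strings that look like dates but are not (e.g. 2015-02-30).
            continue
    return found


def build_timeline(documents):
    """Return (date, document name) pairs sorted chronologically."""
    timeline = [
        (found, name)
        for name, text in documents.items()
        for found in extract_dates(text)
    ]
    return sorted(timeline)


for when, source in build_timeline(DOCUMENTS):
    print(when.isoformat(), source)
```

Each printed line pairs a date with the document it came from, which is the same information a timeline view needs in order to place a document on the axis.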

Once you have completed your story, DocumentCloud can be used to integrate the primary documents into the report itself and help your reader follow the paper trail behind the story. Some examples of how DocumentCloud has been integrated to report on primary documents include the Wall Street Journal's catalog of surveillance technologies and WRAL's coverage of the University of North Carolina at Chapel Hill academic fraud scandal, in which readers can search through 200,000 pages of documents collected throughout their investigation.

Surveillance.png

New developments

This week, the team at DocumentCloud made it even easier to integrate documents in your work with the launch of a mobile-responsive Page Embed type. The new Page Embed displays a single page from your document, along with your annotations, in a fluid viewer that resizes to fit the device it is being viewed on.

DocumentCloud's new Page Embed feature has already been employed by the Boston Globe to report on a civil rights battle between a Harvard teaching hospital and a patient's spouse, Dr. Hooman Noorchashm. In its coverage, the Globe used the Page Embed feature to insert a letter sent to Noorchashm by the hospital - a piece of evidence that adds weight to the report and helps verify its claims.

Check out more examples of the new Page Embed feature on its GitHub page.

Learn more about how you can use DocumentCloud by visiting its website.
