Search, analyze and cull huge volumes of text or documents.

Sorting through a large number of documents? Overview is a free, open-source tool that helps you sift through material and identify stories.

Originally designed to assist investigative journalists, Overview can hold up to 200,000 documents for free on its public server, and you can also download a local version for your computer.

Overview is integrated with DocumentCloud, and files can also be imported in PDF or CSV formats.

Once you've imported your materials, Overview offers four sections for exploring them: the folder tree, search field, tag list, and document viewer.


All your documents are held in the folder tree, sorted via best-fit keywords. These documents can be further queried by conducting a search in the top field, or by using the tag function to label and filter specific materials. As its name suggests, the document viewer section allows you to zoom in on one or a selection of documents for further analysis.

One particularly useful feature of Overview's search function is its advanced syntax. Users aren't limited to Boolean operators: they can also run fuzzy searches, which reveal documents containing keywords with up to two characters added, deleted, or changed; run wildcard searches to match variations of a term; screen documents against a preselected list of important or unimportant words; and more.
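To make the fuzzy-search idea concrete, here is a minimal Python sketch of matching a keyword against words that differ by at most two single-character edits. It is an illustration only, not Overview's actual implementation; the function names `edit_distance` and `fuzzy_match` and the `max_edits` parameter are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_match(keyword: str, text: str, max_edits: int = 2) -> bool:
    """Return True if any whitespace-separated word in text is within
    max_edits edits of keyword (case-insensitive). Hypothetical helper."""
    target = keyword.lower()
    return any(edit_distance(target, word) <= max_edits
               for word in text.lower().split())
```

With this sketch, a search for "bribery" would also catch a document containing the misspelling "bribary", since the two words differ by a single changed character.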

Social media analysis

Like document dumps, social media conversations contain large volumes of text that can be burdensome to mine and analyze.

To combat this, Overview's mining features can also be applied to social materials.

For the time being, Overview can only analyze social media data that has already been scraped and organized into CSV format. From there, conversations can be explored via the folder tree, search, or tag functions, as you would any other document.
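As a rough illustration of that preparation step, the following Python sketch writes already-scraped posts to CSV with the standard `csv` module. The column names `id`, `text`, and `url`, the `posts_to_csv` function, and the sample posts are all assumptions made for this example; check Overview's import documentation for the columns it actually expects.

```python
import csv
import io


def posts_to_csv(posts):
    """Serialize a list of post dicts to a CSV string, one row per post.

    The column names are assumed for illustration, not taken from Overview.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "text", "url"])
    writer.writeheader()
    for post in posts:
        writer.writerow({
            "id": post["id"],
            "text": post["text"],  # commas and line breaks are quoted by csv
            "url": post["url"],
        })
    return buffer.getvalue()


# Hypothetical scraped posts, used only to show the shape of the data.
sample_posts = [
    {"id": "1", "text": "Budget vote delayed, again.", "url": "https://example.com/1"},
    {"id": "2", "text": "Line one\nline two", "url": "https://example.com/2"},
]
csv_output = posts_to_csv(sample_posts)
```

Using the `csv` module rather than joining strings by hand matters here because social media text often contains commas, quotes, and line breaks, which need to be escaped for the file to parse correctly.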

Case studies

Earlier this year, researchers from the Organized Crime and Corruption Reporting Project (OCCRP) used Overview to investigate documents that implicated the daughter of the President of Uzbekistan in a bribery scandal worth over 1 billion USD. Their analysis of the documents, which can be viewed here, made use of company-related keyword searches and tagging.


Image: Documents filtered by the Merkony Investment Group Limited tag

Similarly, Jack Gillum of the Associated Press used a combination of DocumentCloud and Overview to investigate 9000 pages of documents related to then-vice-presidential candidate Paul Ryan's record against federal handouts.

“I used Overview to take these 9000 pages of documents, and knowing there was probably going to be a lot of garbage or extra attachments, to separate the chaff from the wheat,” said Gillum.

“I could figure out where are the letters from voters, and to put these documents in groups. So if someone’s complaining about the FCC, and there are 200 pages about that, we can put that aside.”

Through this investigation, Gillum was able to show that Ryan had supported several of the same programs he publicly campaigned against.

Find out more on the Overview website.