4/7/2017

Sqoop

 

Just because there’s a duty to disclose doesn’t mean there’s a duty to make it easy. This seems to be universally true when it comes to public records, regardless of the country or government making them available.

The consequences for journalists can be profound: hours spent digging through messy data, stories that go untold, and the opportunity costs that come with both, just to name a few.

This is a problem we set out to improve a couple of years ago in the US with the introduction of Sqoop, a free data journalism site intended to make it easier for reporters to find and track public records, starting with the Securities and Exchange Commission (SEC), the Patent Office, and the federal court system, whose records are available through PACER (Public Access to Court Electronic Records).

Think of it as a search box across all of these public records sites (and we’re working to add others) as well as a rapid alerting service. If a journalist has saved searches for “Facebook”, “Jeffrey P. Bezos”, or “Internet of Things”, she will receive email alerts every time these search terms show up in new public filings.

Journalists can refine search results based on data source, form type, and geographic factors, and then save those searches as alerts.

Image: A Sqoop search for “Facebook” shows at-a-glance results from the SEC, the Patent Office, and federal courts.

The Securities and Exchange Commission

The first problem Sqoop tries to help solve is finding the right filing. The second is helping journalists determine the news value.

Much like Google, Sqoop offers a preview of what a disclosure contains. It identifies the exhibits associated with a filing, like “offer letter”, and in the case of Form 4s for stock compensation, it translates the codes and does the math to show how much the transaction is worth. This saves time journalists would otherwise spend sifting through numerous files and then using spreadsheets to calculate their findings.
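For readers curious what “translating the codes and doing the math” on a Form 4 might involve, here is a minimal Python sketch. It is not Sqoop’s actual code: the `Form4Transaction` class and the `describe` function are hypothetical names made up for illustration, the sample figures are invented, and the code table covers only a few of the transaction codes the SEC defines for Form 4.

```python
from dataclasses import dataclass

# A few of the SEC's Form 4 transaction codes (not a complete list).
TRANSACTION_CODES = {
    "P": "Open market or private purchase",
    "S": "Open market or private sale",
    "A": "Grant, award or other acquisition",
    "M": "Exercise or conversion of derivative security",
    "G": "Bona fide gift",
}


@dataclass
class Form4Transaction:
    """Hypothetical container for one reported transaction."""
    code: str
    shares: float
    price_per_share: float


def describe(tx: Form4Transaction) -> str:
    """Translate the code and compute the dollar value of the transaction."""
    label = TRANSACTION_CODES.get(tx.code, f"Other (code {tx.code})")
    value = tx.shares * tx.price_per_share
    return (f"{label}: {tx.shares:,.0f} shares "
            f"at ${tx.price_per_share:,.2f} = ${value:,.2f}")


# Invented example: an executive sells 10,000 shares at $25.50 each.
print(describe(Form4Transaction("S", 10000, 25.50)))
```

The point of the sketch is simply that a one-letter code and two numbers on the form become a sentence a reporter can read at a glance.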

In addition to searching by company and executive names or other keywords, journalists can also search based on geography, including US state abbreviations (for example, NY for New York or CA for California), or within a defined proximity to their current location. Alternatively, journalists can query the 381 Metropolitan Statistical Areas in the country, from Abilene, Tex., to Yuma, Ariz., and everything in between.

This can be useful if a reporter’s beat is restricted to a geographic area, such as New York City or the State of Oregon. By using these location filters in conjunction with SEC form type filters, a journalist covering startups in Silicon Valley could receive alerts for all new Form Ds (disclosures about private company financings) within that defined area.
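Combining a form type filter with a location filter can be pictured with a short Python sketch. Everything in it is an assumption for illustration: the `Filing` class, the `filter_filings` function, the sample companies, and the list of cities standing in for “Silicon Valley” are all hypothetical and do not reflect how Sqoop stores or queries its data.

```python
from dataclasses import dataclass

# Hypothetical stand-in for "Silicon Valley"; a real definition would differ.
SILICON_VALLEY_CITIES = {
    "Palo Alto", "Mountain View", "Sunnyvale", "San Jose", "Menlo Park",
}


@dataclass
class Filing:
    """Hypothetical record for one SEC filing."""
    company: str
    form_type: str
    state: str
    city: str


def filter_filings(filings, form_type, state, cities):
    """Keep filings that match the form type, the state, and one of the cities."""
    return [
        f for f in filings
        if f.form_type == form_type and f.state == state and f.city in cities
    ]


# Invented sample data.
sample = [
    Filing("Example Robotics Inc.", "D", "CA", "Palo Alto"),
    Filing("Example Foods Corp.", "D", "NY", "New York"),
    Filing("Example Chips Inc.", "8-K", "CA", "San Jose"),
]

matches = filter_filings(sample, "D", "CA", SILICON_VALLEY_CITIES)
for filing in matches:
    print(filing.company)
```

Only the Palo Alto Form D survives both filters. Saving a search like this as an alert would amount to running the same test on each new filing as it arrives.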

Image: Using filters to find all Form Ds issued from Silicon Valley.

If a journalist’s beat is industry specific, like airlines or packaged software (yes, that’s still what it’s called), Sqoop provides the capability to search by the SEC’s Standard Industrial Classification (SIC) codes.

Finally, the SEC defines a variety of events that require a company to file Form 8-K with an associated item, such as “Entry into a Material Definitive Agreement” or “Completion of Acquisition or Disposition of Assets”, and journalists can use Sqoop to search for these 8-K events.
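Each 8-K event corresponds to a numbered item on the form, so searching by event comes down to a lookup. The Python sketch below shows the idea with a handful of the SEC’s item numbers and titles. The `EIGHT_K_ITEMS` table is deliberately partial, and the `find_items` function is a hypothetical helper, not part of Sqoop.

```python
# A partial table of Form 8-K item numbers and their SEC titles.
EIGHT_K_ITEMS = {
    "1.01": "Entry into a Material Definitive Agreement",
    "1.03": "Bankruptcy or Receivership",
    "2.01": "Completion of Acquisition or Disposition of Assets",
    "2.02": "Results of Operations and Financial Condition",
}


def find_items(phrase):
    """Return the item numbers whose title contains the phrase, ignoring case."""
    needle = phrase.lower()
    return sorted(
        number for number, title in EIGHT_K_ITEMS.items()
        if needle in title.lower()
    )


# Look up which item number covers acquisitions.
for number in find_items("acquisition"):
    print(number, EIGHT_K_ITEMS[number])
```

A reporter tracking deals could then ask for every new 8-K that reports the matching item rather than reading each filing to see what it covers.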

The U.S. Patent Office

The United States Patent and Trademark Office (USPTO) has two patent databases, one for applications and one for grants, and it publishes updates to them twice a week. Once a patent application is submitted to the USPTO, it can take several months for the patent office to publish the application.

Once these applications and grants are published, they can provide insight into what a company is researching, as in the case of “Google wants to inject cyborg lenses into your eyeballs”, or “Boeing files patent for 3D-printed aircraft parts”.

Image: Patents may never turn into business initiatives, but many times they do.

Journalists can search Sqoop’s patent database for utility (most commonly known as inventions) applications and grants, as well as plant (think agriculture) and design patents. As with other search results, Sqoop provides a summary of what the patent document contains, as well as the inventors and assignees.

Federal Courts (PACER)

Perhaps the most daunting of the public records sites that Sqoop tries to help with is PACER, because its paywall charges users both search fees and a per-page fee. Consequently, Sqoop is only able to gather metadata about new dockets: the title of the case, the docket number, the presiding court, and the date.

This is not a perfect system, but in most cases it’s an improvement, if for no other reason than that PACER offers no alerting system. So, if a journalist covered Microsoft as a beat, he could get alerts every time the company was involved in a federal lawsuit, either as a plaintiff or defendant, similar to other Sqoop alerts.

Court cases are unlike SEC filings and patents, however, in that they often go on for months or years. A feature journalists requested early on ended up being developed as “Docket Watch”, which lets reporters follow specific court cases and get alerts every time there’s a new update to the case, like a “temporary restraining order” or “summary judgment”.

Image: By clicking the eyeball icon, reporters can set Docket Watch alerts to get notified with new case updates.

Some reporters may cover a specific court, for example, the Middle District of Pennsylvania or the Southern District of New York, so we’ve created a handy list of shortcuts so that you can search and then get alerts for all new cases from these courts.

What’s Next?

Journalists have asked for a long list of other data sources to be added, both from the US and other countries. Currently, we’re working on a prototype for the US Department of Justice, based on how our registered journalists ranked it relative to other choices.

Although most of Sqoop’s journalists are US-based, dozens of others from a handful of European and Asian publications use the service as well. If you would like to try out the service, you can register for a free journalist account here.

About the author

Bill Hankes is a longtime public relations executive who much earlier in his career was a journalist. Before founding Sqoop, Bill was Director of Bing Public Relations at Microsoft and prior to that Vice President of Corporate Communications for RealNetworks (NASDAQ: RNWK). Hankes has spoken at the SABEW Government Data Immersion Workshop in Washington, D.C., for two years running, at last year’s NICAR conference in Denver, and at the Investigative Reporters and Editors conference in Philadelphia, where Sqoop debuted. He is a member of SPJ, SABEW and Hacks and Hackers. Last summer in Vienna, Austria, Sqoop was honored as one of the eight most innovative startups in news worldwide by the Global Editors Network, and as overall winner in the data journalism category.
