5/8/2017

Face-O-Matic: Experimental Slack alert system tracking Trump and congressional leaders on TV news

 

Face-O-Matic is an experimental public service that alerts users via a Slack app whenever the faces of President Donald Trump and congressional leaders appear on major cable TV news channels: CNN, Fox News, MSNBC, and the BBC. The alerts include hyperlinks to the actual TV news footage on the TV News Archive website, where the viewer can see the appearances in the context of the entire broadcast. The app was built by the Internet Archive’s TV News Archive in partnership with the California-based start-up firm Matroid, which specializes in identifying people and objects in images and video.

The app

The TV News Archive, our collection of 1.3 million+ TV news broadcasts dating back to 2009, is already searchable through closed captions.

But captions don’t always get you everything you want. If you search for the words “Donald Trump,” for example, you get back a hodgepodge of clips: some in which Trump is speaking and others in which reporters are talking about Trump. His image may not appear on the screen at all. The same is true for “Barack Obama,” “Mitch McConnell,” “Chuck Schumer,” or any other name.

Searching the TV News Archive by the faces of public officials, rather than by caption text, requires applying algorithms such as those developed by Matroid. In the future we hope to work with a variety of firms and researchers; for example, Dan Schultz, senior creative technologist at the Internet Archive’s TV News Archive, is also working on a separate facial detection experiment with the firm Datmo.

Facial detection requires a number of related steps: first, training the system to recognize where a face appears on a TV screen; second, extracting that image so it can be analyzed; and third, comparing that face to a set known to be a particular person to discover matches.

In general, facial recognition algorithms tend to rely on the work of FaceNet, presented in this 2015 paper, in which researchers describe creating a way of “mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity”. In other words, it’s a way of turning a face into a pattern of data, and it’s sophisticated enough to describe faces from various vantage points – straight ahead, three-quarter view, side view, et cetera. To develop Face-O-Matic, TV News Archive staff collected public images of elected officials from different vantage points to use as training sets for the algorithm.
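To make the idea of a “compact Euclidean space” concrete, here is a minimal Python sketch of the comparison step only. It is not Matroid’s or FaceNet’s actual code: the three-number “embeddings”, the names official_a and official_b, and the 0.3 threshold are all made up for illustration. Real systems use much longer vectors produced by a trained neural network, and the cutoff is tuned on real data.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(query, gallery, threshold):
    """Return the name whose reference embedding is closest to the query,
    or None if even the closest one is farther away than the threshold."""
    best_name, best_dist = None, float("inf")
    for name, references in gallery.items():
        for ref in references:
            dist = euclidean_distance(query, ref)
            if dist < best_dist:
                best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

# Hypothetical reference set: several embeddings per person, standing in
# for training images taken from different vantage points.
gallery = {
    "official_a": [[0.10, 0.90, 0.20], [0.15, 0.85, 0.25]],
    "official_b": [[0.80, 0.10, 0.60]],
}
THRESHOLD = 0.3  # assumed cutoff, chosen only for this toy example

match_near = best_match([0.12, 0.88, 0.22], gallery, THRESHOLD)  # close to official_a
match_far = best_match([0.50, 0.50, 0.50], gallery, THRESHOLD)   # close to nobody
```

The smaller the distance, the more alike two faces are judged to be; a face that is not near any reference image is simply left unlabeled rather than forced onto the nearest name.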

The Face-O-Matic Slack app is meant to be a demonstration project that allows the TV News Archive to experiment in two ways: first, by creating pipelines that run the TV News Archive video streams through artificial intelligence models to explore whether the resulting information is useful; second, by using a new way to distribute TV News Archive information through the popular Slack service, used widely in journalistic and academic settings. For now, the Slack app must be installed by a Slack team administrator. The link is here.

Face-O-Matic on GitHub

You can follow TV News Archive’s progress recognizing faces on TV via the following GitHub pages:

Why detect faces of public officials?

First, our concentration on public officials is purposeful; in experimenting with this technology, we strive to respect individual privacy and harvest only information for which there is a compelling public interest, such as the role of elected officials in public life. The TV News Archive is committed to these principles, developed by leading artificial intelligence researchers, ethicists, and others at a January 2017 conference organized by the Future of Life Institute.

Second, developing the technology to recognize faces of public officials contained within the TV News Archive and turning it into data opens a whole new dimension for journalists and researchers to explore for patterns and trends in how news is reported.  

For example, it will eventually be possible to trace the origin of specific video clips found online; to determine how often the president’s face appears on TV networks and programs compared to other public officials; to see how often certain video clips are repeated over time; to determine the gender ratio of people appearing on TV news; and more. It will become useful not just in explaining how media messages travel, but also as a way to counter misinformation, by providing a path to verify source material that appears on TV news.

This capability adds to the toolbox we’ve already begun with the Duplitron, the open source audio fingerprinting tool developed by Schultz that the TV News Archive used to track political ads and debate coverage in the 2016 elections for the Political TV Ad Archive. The Duplitron is also the basis for The Glorious ContextuBot, which was recently awarded a Knight Prototype Fund grant.

All of these lines of exploration should help journalists and researchers, who currently can conduct such analyses only by watching thousands of hours of television and hand-coding it, or by using an expensive private service. Because we are a public library, we make such information available free of charge.

What’s next?

The TV News Archive will continue to work with partners such as Matroid to develop methods of extracting metadata from the TV News Archive and make it available to the public. We will develop ways to deliver such experimental data in structured formats (such as JSON, CSV, and more) to augment Face-O-Matic’s Slack alert stream. Such data could help researchers conduct analyses of the different amounts of “face-time” public officials enjoy on TV news.
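As a rough illustration of the kind of “face-time” analysis structured data would allow, the Python sketch below totals on-screen seconds per person from a small JSON sample. The record layout (name, channel, start, end) and the sample values are hypothetical; the actual format of any future TV News Archive data feed has not been defined here.

```python
import json
from collections import defaultdict

# Hypothetical detection records; start and end are seconds into a broadcast.
SAMPLE = """
[
  {"name": "Official A", "channel": "CNN",   "start": 0.0,  "end": 12.5},
  {"name": "Official A", "channel": "MSNBC", "start": 30.0, "end": 40.0},
  {"name": "Official B", "channel": "CNN",   "start": 5.0,  "end": 9.0}
]
"""

def face_time_by_person(records):
    """Sum the on-screen duration, in seconds, for each named person."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["name"]] += rec["end"] - rec["start"]
    return dict(totals)

records = json.loads(SAMPLE)
totals = face_time_by_person(records)

for name, seconds in sorted(totals.items()):
    print(f"{name}: {seconds:.1f} seconds on screen")
```

Grouping by the channel field instead of name would give the per-network comparison in the same way.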

Schultz also hopes to develop ways to augment the facial detection data with closed captioning, using, for example, OpenedCaptions, another open source tool he created that provides a constant stream of data from TV for any service set up to listen. This will make it simpler to search such data sets for the particular moment a researcher is looking for. (Accurate captioning presents its own technological challenges: see this post on Hyper.Audio’s work.)

The TV News Archive is also experimenting with chyron detection to extract the statements that appear on the bottom of TV news screens.

Beyond this experimentation, we have big plans for the future. We are planning to make more than a million hours of TV news available to researchers from both private and public institutions via digital public library branches of the Internet Archive’s TV News Archive. These branches would be housed in computing environments, where networked computers provide the processing power needed to analyze large amounts of data.

Researchers will be able to conduct their own experiments using machine learning to extract metadata from TV news. Such metadata could include, for example, speaker identification: a way to identify not just when a speaker appears on a screen, but when she or he is talking. Researchers could also create ways to do complex topic analysis, making it possible to trace how certain themes and talking points travel across the TV news universe and perhaps beyond. Metadata generated through these experiments would then be used to enrich the TV News Archive, so that any member of the public could do increasingly sophisticated searches.

We want feedback!

Nancy Watzman is the managing editor of the Internet Archive’s TV News Archive. Sign up for the TV News Archive’s weekly newsletter here.

Image: Jeff Evans.
