All the lonely people: Where do they all come from?


A closer look at the data behind Craigslist Missed Connections.

Every day thousands of people log onto Craigslist's Missed Connections section to see if that lingering glance with a passerby, or smile exchanged on public transport, was met with equal infatuation. But who are these people, and what do their posts say about them?

To answer these questions, Ilia Blinderman extracted data from 10,000 Craigslist Missed Connections posts across the United States in January, focusing specifically on messages from New York, LA, Chicago, Houston, Philadelphia, Phoenix, San Antonio, San Diego, and Dallas.

The project is a particularly poignant example of data-driven narrative journalism: although the piece rests on an analysis of the Craigslist data, Blinderman humanizes these trends through personal accounts and reflections on the "missed connections" phenomenon.

For example, in his analysis of the messages' language choices, Blinderman tests Nick Paumgarten's assertion that "demonstrating the ability, and the inclination, to write well is a rough equivalent to showing up in a black Mercedes". To do so, he uses a zoom circle visualization to compare the most common phrases used by men and women.

Image: Zoom circle visualization of the most commonly used phrases in Missed Connections posts by men seeking women

The result is a strikingly banal picture of message content; in Blinderman's own words, "Star-crossed lovers, these are not." While Paumgarten might have us believe that online dating is a game of words, Blinderman's visualization casts doubt on this. As the snapshot above illustrates, most of the phrases posters use are clichéd.

To reveal further trends in the Craigslist data, the project also uses several other interactive chart types alongside narrative-driven in-text analysis.

Throughout the project, it is clear that Blinderman has made carefully considered editorial decisions to represent the trends in his data as accurately as possible. For instance, he explains why he chose an interactive scatterplot to compare post lengths and poster ages across cities:

"Scatterplots are, perhaps, the best way to represent large numbers of characteristics for data points simultaneously. In this case, I used the axes to depict the mean ages of the posters (no real outliers here, so mean seemed appropriate) and the median post lengths (some posts were exceedingly long, and used the median because I didn't want outliers to skew the figures), of their posts; I'd used colour to depict the various groups of posters and whom they sought; circle size to demonstrate the absolute number of posts; and finally, on-hover highlights to allow readers to compare different groups in the same city. The fact that this was an interactive graphic also allowed me to both communicate the broader point that most posts were clustered around certain intervals, and the noteworthy differences between them, avoiding the dreaded 'non-zero axis' issue."


Image: Blinderman's scatterplot of post lengths and poster ages
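Blinderman's choice of the mean for ages but the median for post lengths comes down to how each statistic reacts to outliers. The minimal Python sketch below illustrates the idea. The numbers are invented for illustration and are not drawn from his dataset, and the `summarize` helper is hypothetical.

```python
from statistics import mean, median

def summarize(values):
    """Return the mean and median of a list of numbers."""
    return {"mean": mean(values), "median": median(values)}

# Hypothetical numbers, invented for illustration only (NOT Blinderman's data).
# Ages cluster tightly with no outliers, so mean and median nearly agree.
ages = [24, 26, 27, 29, 30, 31, 33]
# Post lengths in words: one exceedingly long post acts as an outlier.
post_lengths = [38, 42, 47, 50, 55, 61, 1200]

for label, values in [("ages", ages), ("post lengths", post_lengths)]:
    s = summarize(values)
    print(f"{label}: mean={s['mean']:.1f}, median={s['median']}")
```

With these made-up values, the ages give a mean of about 28.6 and a median of 29, while the single 1,200-word post drags the mean post length up to about 213 words even though the median stays at 50. That gap is the kind of skew Blinderman says he wanted to avoid.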

Although the project covers only a small cross-section of Missed Connections posts in the United States, Blinderman's considered visualizations and personal long-form writing style leave the audience with an insightful, and surprisingly emotive, taste of loneliness and love on the internet.

Want to find out more about the posters behind Craigslist's Missed Connections? Check out the full piece here.