<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:admin="http://webns.net/mvcb/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:content="http://purl.org/rss/1.0/modules/content/">

    <channel>
    
    <title>DDJ &#45; Resources</title>
    <link>http://datadrivenjournalism.net/resources</link>
    <description>DDJ &#45; Resources</description>
    <dc:language>en</dc:language>
    <dc:creator>support@ejc.net</dc:creator>
    <dc:rights>Copyright 2013</dc:rights>
    <dc:date>2013-05-14T23:45:00+00:00</dc:date>
    <admin:generatorAgent rdf:resource="http://expressionengine.com/" />
    

    <item>
      <title>Getting to Know your Dataset with the OpenRefine Facets</title>
      <link>http://datadrivenjournalism.net/resources/Getting_to_Know_your_Dataset_with_the_OpenRefine_Facets</link>
      <guid>http://datadrivenjournalism.net/resources/Getting_to_Know_your_Dataset_with_the_OpenRefine_Facets#When:23:45:00Z</guid>
      <description><![CDATA[<p>
	<em>Originally published by <a href="https://twitter.com/psychemedia" target="_blank">Tony Hirst</a>, Open University lecturer and data storyteller at the Open Knowledge Foundation,&nbsp;on <a href="http://blog.ouseful.info/" target="_blank">blog.ouseful.info</a> under a <a href="http://creativecommons.org/licenses/by/3.0/" target="_blank">Creative Commons Attribution licence</a>.</em></p>
<p>
	&nbsp;</p>
<p>
	One of the many ways of using OpenRefine is as a toolkit for getting a feel for the range of variation contained within a dataset using the various <em>faceting</em> options. In the sense of analysis being a conversation with data, this is a bit like an idle chit-chat/&#39;getting to know you&#39; phase, as a precursor to a full blown conversation.</p>
<p>
	<em>Faceted search</em> or <em>faceted browsing/navigation</em> typically provides a set of search filters to a list of search results that limits or restricts the displayed results to ones that fulfill certain conditions. In a library catalogue, the facets might refer to metadata fields such as publication date, thus allowing a user to search within a given date range, or publisher:</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/faceted-search-ou-library.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_11.56.37_AM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_11.56.37_AM.png" style="height: 386px; width: 600px;" /></a></p>
<p style="text-align: center;">
	<em>Facets in a library catalogue</em></p>
<p>
	Where the facet relates to a <em>categorical</em> variable &ndash; that is, where there is a set of unique values that the facet can take (such as the names of different publishers) &ndash; a view of the facet values will show the names of the different publishers extracted from the original search results. Selecting a particular publisher, for example, will then limit the displayed results to just those results associated with that publisher. For <em>numerical</em> facets, where the quantities associated with the facet relate to a number or date (that is, a set of things that have a numerical <em>range</em>), the facet view will show the full range of values contained within that particular facet. The user can then select a subset of results that fall within a specified part of that range.</p>
<p>
	In the case of OpenRefine, facets can be defined on a per column basis. For categorical facets, Refine will identify the set of unique values associated with a particular faceted view that are contained within a column, along with a count of how many times each facet value occurs throughout the column. The user can then choose to view only those rows with a particular (facet selected) value in the faceted column. For columns that contain numbers, Refine will generate a numerical facet that spans the range of values contained within the column, along with a histogram that provides a count of occurrences of numbers within small ranges across the full range.</p>
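<p>
	As a rough illustration of what a categorical facet computes, the following Python sketch counts the unique values in a column and selects the rows for one chosen value. The sample values and the helper names <code>text_facet</code> and <code>select_rows</code> are assumptions made for this example; this is not how Refine itself is implemented.</p>

```python
from collections import Counter

# Hypothetical sample column; these values are not from the article's dataset.
publishers = ["Penguin", "Faber", "Penguin", "Virago", "Faber", "Penguin"]


def text_facet(column):
    """Return each unique value with the number of rows it occurs in."""
    return Counter(column)


def select_rows(column, chosen):
    """Return the indices of rows whose value equals the chosen facet value."""
    return [i for i, value in enumerate(column) if value == chosen]


facet = text_facet(publishers)
# Sort by count, most frequent first, as a facet panel sorted by count would.
by_count = facet.most_common()
penguin_rows = select_rows(publishers, "Penguin")
```

<p>
	Here <code>by_count</code> plays the role of the facet panel, and <code>penguin_rows</code> the filtered view you get after clicking a facet value.</p>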
<p>
	So what faceting options does OpenRefine provide?</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-facets.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_11.59.22_AM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_11.59.22_AM.png" /></a></p>
<p>
	Here&rsquo;s how they work (data used for the examples comes from <a href="http://analyticsmadeskeezy.com/2012/10/11/graphing-clustering-community-detection-wholesale-drug-deals-using-excel-and-gephi/" target="_blank">Even Wholesale Drug Dealers Can Use a Little Retargeting: Graphing, Clustering &amp; Community Detection in Excel and Gephi</a> and <a href="http://blog.ouseful.info/2012/10/02/grabbing-twitter-search-results-into-google-refine-and-exporting-conversations-into-gephi/" target="_blank">JSON import from the Twitter search API</a>):</p>
<p>
	Exploring the set of categories described within a column using the <em>text</em> facet:</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-text-facet.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_12.01.46_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_12.01.46_PM.png" style="height: 243px; width: 500px;" /></a></p>
<p>
	Faceted views also allow you to view the facet values by occurrence count, so it&rsquo;s easy to see which the most popular facet values are:</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-facet-sort-by-count.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_12.03.51_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_12.03.51_PM.png" /></a></p>
<p>
	You can also get a tab separated list of facet values:</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-facet-values-tab-separated.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_12.05.07_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_12.05.07_PM.png" /></a></p>
<p>
	<br />
	Sometimes it can be useful to view rows associated with particular facet values that occur a particular number of times, particularly at the limits (for example, very popular facet values, or uniquely occurring facet values):</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-facet-count.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_12.06.48_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_12.06.48_PM.png" /></a></p>
<p>
	You can look at the range of numerical values contained in a column using the <em>numeric</em> facet:</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-numeric-facet.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_12.08.28_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_12.08.28_PM.png" /></a></p>
<p>
	You can look at the distribution over time of column contents using the <em>timeline</em> facet:</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-date-facet.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_12.10.43_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_12.10.43_PM.png" /></a></p>
<p>
	Faceting by time requires time-related strings to be parsed as such; sometimes, Refine needs a little bit of help in interpreting an imported string as a time string. So for example, given a &ldquo;time&rdquo; string such as<em> Mon, 29 Oct 2012 10:56:52 +0000</em> from the Twitter search API, we can use the GREL function <font face="courier new">toDate(value,&quot;EEE, dd MMM y H:m:s&quot;)</font> to create a new column with time-cast elements.</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-datetime-conversion.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_12.15.55_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_12.15.55_PM.png" /></a></p>
<p>
	(See <a href="http://code.google.com/p/google-refine/wiki/GRELDateFunctions" target="_blank">GRELDateFunctions</a> and the <a href="http://docs.oracle.com/javase/1.4.2/docs/api/java/text/SimpleDateFormat.html" target="_blank">Java SimpleDateFormat class documentation</a> for more details.)</p>
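<p>
	For comparison, the same Twitter-style timestamp can be parsed outside Refine. This is a minimal Python sketch, not a GREL equivalent: it uses <code>strptime</code> format codes rather than SimpleDateFormat patterns, and it assumes English day and month abbreviations (the default C locale).</p>

```python
from datetime import datetime

# The example timestamp quoted above from the Twitter search API.
raw = "Mon, 29 Oct 2012 10:56:52 +0000"

# %a and %b match English day and month abbreviations under the default C locale;
# %z consumes the +0000 UTC offset at the end of the string.
parsed = datetime.strptime(raw, "%a, %d %b %Y %H:%M:%S %z")

# An ISO 8601 rendering of the parsed, timezone-aware value.
iso = parsed.isoformat()
```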
<p>
	You can get a feel for the correlation of values across numerical columns, and explore those correlations further, using the <em>scatterplot</em> facet.</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-scatterplot0.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_12.18.29_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_12.18.29_PM.png" /></a></p>
<p>
	This generates a view that creates a set of scatterplots relating to pairwise combinations of all the numerical columns in the dataset:</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-scatterplot.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_12.19.50_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_12.19.50_PM.png" /></a></p>
<p>
	Clicking on one of these panels allows you to filter points within a particular area of the corresponding scatter chart (click and drag a rectangular area over the points you want to view), effectively allowing you to filter the data across related ranges of two numerical columns at the same time:</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-scatterplot-range.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_2.14.14_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_2.14.14_PM.png" /></a></p>
<p>
	A range of customisable faceting options is also provided, allowing you to define your own faceting functions:</p>
<ul>
	<li>
		the <em>Custom text&hellip;</em> facet;</li>
	<li>
		the <em>Custom Numeric&hellip;</em> facet.</li>
</ul>
<p>
	More conveniently, a range of predefined custom facets provides shortcuts to &ldquo;bespoke&rdquo; faceting functions:</p>
<p style="text-align: center;">
	<a href="http://ouseful.files.wordpress.com/2012/11/refine-custom-facets.png" target="_blank"><img alt="Screen_shot_2013-03-14_at_2.19.19_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-03-14_at_2.19.19_PM.png" /></a></p>
<p>
	So for example:</p>
<ul>
	<li>
		The <em>word</em> facet splits strings contained in cells into single words, counts their occurrences throughout the column, and then lists unique words and their occurrence count in the facet panel. This faceting option thus provides a way of selecting rows where the contents of a particular column contain one or more specified words. (The user defined GREL custom text facet <font face="courier new">ngram(value,1)</font> provides a similar (though not identical) result &ndash; duplicated words in a cell are identified as unique by the single word ngram function; see also <font face="courier new">split(value,&quot; &quot;)</font>, which does seem to replicate the behaviour of the word facet function.)</li>
	<li>
		The <em>duplicates facet</em> returns boolean values of <font face="courier new">true</font> and <font face="courier new">false</font>; filtering on true values returns all the rows that have duplicated values within a particular column; filtering on <font face="courier new">false</font> displays all unique rows.</li>
	<li>
		The <em>text length</em> facet produces a facet based on the character count of strings in cells within the faceted column; the custom numeric facet <font face="courier new">length(value)</font> achieves something similar; the related measure, word count, can be achieved using the custom numeric facet <font face="courier new">length(split(value,&quot; &quot;))</font></li>
</ul>
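<p>
	The three facets above can be approximated in a few lines of Python. This is only a sketch on hypothetical cell values: it counts every word occurrence across the column, which may not match exactly how Refine tallies words within a single cell.</p>

```python
from collections import Counter

# Hypothetical cell values for one column.
cells = ["open data", "open refine", "data", "open data"]

# Word facet: split each cell on spaces and count words across the column.
word_counts = Counter(word for cell in cells for word in cell.split(" "))

# Duplicates facet: True where the cell value occurs more than once in the column.
value_counts = Counter(cells)
is_duplicate = [value_counts[cell] > 1 for cell in cells]

# Text length and word count, mirroring length(value) and length(split(value, " ")).
text_lengths = [len(cell) for cell in cells]
word_lengths = [len(cell.split(" ")) for cell in cells]
```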
<p>
	Note that facet views can be combined. Selecting multiple values within a particular facet panel provides a Boolean &#39;OR&#39; over the selected values (that is, if <em>any</em> of the selected values appear in the column, the corresponding rows will be displayed). To apply &#39;AND&#39; conditions, even on the same column, create a separate facet panel for each &#39;AND&#39;-ed condition.</p>
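<p>
	The combination logic can be sketched in Python as follows: any selected value within one facet matches (OR), while every facet must match (AND). The rows, column names and the helper <code>apply_facets</code> are hypothetical, chosen only to illustrate the idea.</p>

```python
# Hypothetical rows; the column names are illustrative only.
rows = [
    {"city": "Perugia", "lang": "it"},
    {"city": "London", "lang": "en"},
    {"city": "Paris", "lang": "fr"},
    {"city": "Leeds", "lang": "en"},
]


def apply_facets(rows, selections):
    """Keep rows matching every facet; within a facet any selected value matches."""
    return [
        row for row in rows
        if all(row[column] in chosen for column, chosen in selections.items())
    ]


# OR within the city facet, AND-ed with the lang facet.
result = apply_facets(rows, {"city": {"London", "Paris"}, "lang": {"en"}})
```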
]]></description> 
      <dc:date>2013-05-14T23:45:00+00:00</dc:date>
    </item>

    <item>
      <title>Using the OpenCorporates API to Map Company Information</title>
      <link>http://datadrivenjournalism.net/resources/Using_the_OpenCorporates_API_to_Map_Company_Information</link>
      <guid>http://datadrivenjournalism.net/resources/Using_the_OpenCorporates_API_to_Map_Company_Information#When:23:24:50Z</guid>
      <description><![CDATA[<p>
	<em>Originally published by <a href="https://twitter.com/psychemedia" target="_blank">Tony Hirst</a> on <a href="http://blog.opencorporates.com/" target="_blank">OpenCorporates News</a>. Republished with permission.</em></p>
<p>
	<em>Note: OpenCorporates is a database which makes available data about companies under an open license.</em></p>
<p>
	Looking back over my datajunkie notes, I may only have been using the <a href="http://api.opencorporates.com/" target="_blank">OpenCorporates API</a> since March 2012 but it&rsquo;s become one of the richest data playgrounds for me, in part because of the far-ranging linking it affords both internally and externally, to other data sources.</p>
<p>
	Diving into the data for the first time, not even a year ago now, my first thought was to look for something structured that I could use as a warm-up exercise to familiarise myself with the API. Focusing primarily on UK companies, and looking through some of the results for some of the larger UK registered companies, I noticed that (recently registered?) trademark ownership information was available. (Specifically, the company data points to other <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;data records which include records for trademark registrations).</p>
<p>
	Included in the <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;recorded data was a unique WIPO (World Intellectual Property Organization) generated identifier for each trademark. A quick websearch revealed that WIPO publishes information about trademarks from URLs that include the trademark identifier, so I could use the <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;data &ndash; trademarks registered to a particular corporate entity &ndash; to draw down additional data from WIPO about particular trademarks. For trademarks that are registered images, this included an image file, so it was a relatively simple exercise to generate a quick sketch of (at least some of) the graphical trademarks registered to a particular company.</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-03_at_12.08.08_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-03_at_12.08.08_PM.png" style="height: 376px; width: 500px;" /></p>
<p>
	One thing I noticed in searching for companies on <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;is that, for the bigger companies at least, there are lots of corporate entities associated with a particular company name. <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;currently provides an in-part crowd-sourced &ldquo;community groupings&rdquo; feature that tries to bundle together different companies that are part of a corporate group, but as I poked around the data I noticed that director filings might provide one way of automatically grouping companies. And so I went graph hunting&hellip;</p>
<p>
	The new release of the <a href="http://api.opencorporates.com/" target="_blank">OpenCorporates API</a>&nbsp;makes it trivial to look up directors, but 6 months or so ago, all we had to hand was partially structured director filings. It was enough, though, to be able to pull out the directors associated with a particular corporate entity. And having got a list of directors by company, we could do a search around a company with many corporate entities &ndash; <a href="http://opencorporates.com/companies?utf8=%E2%9C%93&amp;q=tesco&amp;commit=Search" target="_blank">Tesco</a>, for example &ndash; and <a href="http://blog.ouseful.info/2012/04/12/mapping-the-tesco-corporate-organisational-sprawl-an-initial-sketch/" target="_blank">map out</a> which entities were connected to which by virtue of common director names. Directors&rsquo; data is starting to appear as such on <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>, which makes this sort of mapping easier, although now we are faced with the problem of deciding whether two director records sharing the same name are part of the same &ldquo;director grouping&rdquo;!</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-03_at_12.21.22_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-03_at_12.21.22_PM.png" style="height: 482px; width: 600px;" /></p>
<p>
	Using network visualisation tools such as Gephi, it&rsquo;s possible to easily decompose graphs such as these that show connections between companies and directors to a form that just shows co-director links (directors joined by a common company) or potential corporate groupings (companies connected by N or more common directors).</p>
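<p>
	The &ldquo;companies connected by N or more common directors&rdquo; idea can be sketched without a graph tool. In this minimal Python example the company and director names are invented, and <code>company_links</code> is a hypothetical helper; it is not the Gephi workflow described above.</p>

```python
from itertools import combinations

# Hypothetical company-to-directors mapping; all names are invented for illustration.
directors_by_company = {
    "Alpha Ltd": {"A. Smith", "B. Jones", "C. Patel"},
    "Alpha Stores Ltd": {"A. Smith", "B. Jones"},
    "Beta Holdings": {"C. Patel", "D. Lee"},
}


def company_links(directors_by_company, min_shared=1):
    """Return company pairs that share at least min_shared directors."""
    links = {}
    for left, right in combinations(sorted(directors_by_company), 2):
        shared = directors_by_company[left].intersection(directors_by_company[right])
        if len(shared) >= min_shared:
            links[(left, right)] = shared
    return links


loose = company_links(directors_by_company, min_shared=1)
strict = company_links(directors_by_company, min_shared=2)
```

<p>
	Raising <code>min_shared</code> prunes weak links, which is one way of turning a dense company-director graph into candidate corporate groupings. Matching on names alone carries the same caveat noted above: two records with the same name may not be the same person.</p>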
<p>
	Another possible link between companies was their registered address, so we could also start to <a href="http://blog.ouseful.info/2012/04/13/initial-sketch-of-registered-addresses-of-tesco-companies/" target="_blank">explore</a> which similarly named companies might be sharing a physical office. It&rsquo;s not hard to imagine a time when <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;will associate geolocation based data with corporate entities, which makes this route to identifying pattern and structure in the data from a geographical, location based perspective a ready possibility.</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-03_at_12.23.21_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-03_at_12.23.21_PM.png" style="height: 467px; width: 600px;" /></p>
<p>
	Revealing the implied structures that are hidden away inside the <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;database by virtue of common links between corporate entities, directors, and/or locations represents one significant form of value. But there is also much to be gained through linking the <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;data to other data sources as part of investigations that span datasets. A trivial example is a transparency supporting service that lets us quickly look up (fuzzily, it has to be said!) the directorships of local councillors. Using data from <a href="http://openlylocal.com/" target="_blank">OpenlyLocal</a>, we can pull down a list of councillor names for a particular council, and then look up those names as directors on <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>. Using open spending data, a further step might be to look up the companies that have received payments over &pound;500 from the same council; and then look to see whether there are any matches.</p>
<p>
	Whilst preparing for a recent presentation about open data, it struck me that <a href="http://blog.ouseful.info/2012/11/22/online12-reflections-can-open-public-data-be-disruptive-to-information-vendors/" target="_blank">OpenCorporates has the potential to be disruptive</a> in the sense of Clayton Christensen&rsquo;s &ldquo;Innovator&rsquo;s Dilemma&rdquo;: whilst the data quality may still be lacking in certain respects, <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;is <em>good enough</em> to use at least as a starting point for certain company related data searches. As the corporate mapping tools evolve, curating corporate groupings (both automatically/heuristically, and via human curators) will become ever easier and ever more accurate. As the director database evolves, I&rsquo;m sure techniques will emerge for &ldquo;de-duping&rdquo; director entities.</p>
<p>
	The library world may have tools and ideas to help in this respect, for example via the notion of &ldquo;Virtual International Authority Files&rdquo; (<a href="http://www.oclc.org/viaf/default.htm" target="_blank">VIAF</a>), which provide comprehensive, authoritative identifiers for known entities, or some of the competing(!?) personal identifier schemes, e.g. <a href="http://about.orcid.org/" target="_blank">Open Researcher and Contributor ID (ORCID)</a> and <a href="http://www.isni.org/" target="_blank">International Standard Name Identifier (ISNI)</a>, both <a href="http://outgoing.typepad.com/outgoing/2011/07/viaf-and-other-ids.html" target="_blank">discussed here</a>. (To a certain extent, the aim of <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;appears to be the creation of such authority files for corporate entities globally, whatever territory they are registered in.)</p>
<p>
	An approach that I believe holds much promise is the <a href="http://api.opencorporates.com/documentation/Google-Refine-Reconciliation-API" target="_blank">OpenCorporates Reconciliation API</a>. This provides a clean and efficient way of integrating look-ups to <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a> with data cleansing tools such as <a href="https://github.com/OpenRefine/OpenRefine/wiki" target="_blank">OpenRefine</a>. The reconciliation API provides a fuzzy match on a corporate name that returns a set of ranked &ldquo;possible matches&rdquo; in the <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;database and that makes it relatively easy to <a href="http://blog.ouseful.info/2012/04/05/tinkering-with-scraperwiki-the-bottom-line-opencorporates-reconciliation-and-the-google-viz-api/" target="_blank">annotate third party datasets containing company names with OpenCorporates identifiers</a>. This sort of tool may prove invaluable when trying to <a href="http://blog.ouseful.info/2012/05/26/visualising-spending-flows-to-serco-using-openlylocal-aggregated-spending-data/" target="_blank">reconcile council spending data against corporate groupings</a>.</p>
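<p>
	To give a feel for what &ldquo;ranked possible matches&rdquo; means, here is a purely local Python sketch using the standard library&rsquo;s <code>difflib</code>. It does not call the OpenCorporates Reconciliation API and makes no claim about how that service scores names; the candidate list and the helper <code>ranked_matches</code> are assumptions for illustration.</p>

```python
from difflib import SequenceMatcher

# Hypothetical candidate names; a real service would search a full company register.
candidates = ["TESCO PLC", "TESCO STORES LIMITED", "TESLA MOTORS LTD"]


def ranked_matches(query, candidates, limit=3):
    """Rank candidates by a simple string similarity ratio, best first."""
    scored = [
        (SequenceMatcher(None, query.lower(), name.lower()).ratio(), name)
        for name in candidates
    ]
    scored.sort(reverse=True)
    return scored[:limit]


matches = ranked_matches("Tesco plc", candidates)
```

<p>
	Each entry in <code>matches</code> is a (score, name) pair, so a human or a threshold can decide which candidate, if any, to accept.</p>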
<p>
	I&rsquo;m also hopeful for an appearance of a directors reconciliation service&hellip;;-)</p>
<p>
	By continuing to take an open approach to its data, providing robust linking strategies <em>out</em> to other identifier namespaces, <em>in</em> to the <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;namespace, and <em>within</em>&nbsp;<a href="http://opencorporates.com/" target="_blank">OpenCorporates</a> itself through corporate and director groupings, <a href="http://opencorporates.com/" target="_blank">OpenCorporates</a>&nbsp;can both add value to other services as well as gain value from external enrichment.</p>
]]></description> 
      <dc:date>2013-05-07T23:24:50+00:00</dc:date>
    </item>

    <item>
      <title>DDJSchool Tutorial: How to Create Maps with QGIS</title>
      <link>http://datadrivenjournalism.net/resources/DDJSchool_Tutorial_How_to_Create_Maps_with_QGIS</link>
      <guid>http://datadrivenjournalism.net/resources/DDJSchool_Tutorial_How_to_Create_Maps_with_QGIS#When:10:42:29Z</guid>
      <description><![CDATA[<p dir="ltr">
	<em>By Gregor Aisch, visualization architect and interactive news developer, based on his workshop, Data visualisation, Maps and Timelines on a Shoestring. The workshop is part of the School of Data Journalism 2013 at the International Journalism Festival.</em></p>
<p dir="ltr">
	In this tutorial we will create a simple map of the Tour de France stations of the last 100 years.</p>
<h3>
	Prerequisites</h3>
<ol>
	<li>
		<ol>
			<li dir="ltr">
				<p dir="ltr">
					Install and configure QGIS. Install from <a href="http://qgis.org/">http://qgis.org</a>. On most systems there should be a one-click installer that guides you through the process.</p>
			</li>
			<li dir="ltr">
				<p dir="ltr">
					We need to install the following handy plugins:</p>
			</li>
		</ol>
	</li>
</ol>
<ul>
	<li dir="ltr">
		<p dir="ltr">
			<strong>Add Delimited Text Layer</strong> allows us to read and plot points from a CSV file.</p>
	</li>
	<li dir="ltr">
		<p dir="ltr">
			<strong>Edit Any Layer</strong> allows us to easily edit CSV layers</p>
	</li>
</ul>
<ol style="margin-left: 40px;">
	<li dir="ltr">
		<p dir="ltr">
			In the menu, click Plugins &gt; Fetch Python Plugins. In the dialog that appears, type <code>edit any</code> in the filter box to narrow down the list:<img alt="" height="434px;" src="http://farm9.staticflickr.com/8396/8686063038_f3d8ea6cc7_z.jpg" width="627px;" /></p>
	</li>
	<li dir="ltr">
		<p dir="ltr">
			Select the plugin and click Install/upgrade plugin. Repeat the same for Add Delimited Text Layer.</p>
	</li>
</ol>
<ol start="3" style="">
	<li dir="ltr">
		<p dir="ltr">
			Download country shapefile from naturalearthdata.com. We are looking for <strong>ne_50m_admin_0_countries</strong>.</p>
	</li>
	<li dir="ltr">
		<p dir="ltr">
			Download our sample dataset from <a href="http://vis4.net/perugia13/tour-de-france.csv">http://vis4.net/perugia13/tour-de-france.csv</a>.&nbsp;</p>
	</li>
</ol>
<h3>
	Creating the Base Map Layer</h3>
<ol style="">
	<li>
		<p dir="ltr">
			<em>Click Layer &gt; Add Vector Layer &gt; Browse </em>and select the file <strong>ne_50m_admin_0_countries.shp</strong>. That&#39;s the shapefile containing the borders of all countries. Click Open to finally add it to the map.<img alt="" height="450px;" src="http://farm9.staticflickr.com/8538/8684945357_fff9c75d61_z.jpg" width="623px;" /></p>
	</li>
	<li>
		<p dir="ltr" style="">
			Filter for countries with the ISO code of France. Right-click on the layer and select Query from its context menu. In the text box SQL <em>where clause</em> enter the text: <code>ISO_A3 = &#39;FRA&#39;</code>. Make sure to use single quotes, not double quotes.<img alt="" height="501px;" src="http://farm9.staticflickr.com/8120/8684944763_22b7d51fc5_z.jpg" width="561px;" /></p>
	</li>
	<li>
		<p dir="ltr">
			Zoom to Metropolitan France. You can simply use the Zoom In tool</p>
		<p>
			<img alt="" height="25px;" src="http://farm9.staticflickr.com/8399/8684944903_db63d62797_t.jpg" width="26px;" /></p>
		and draw a rectangle around France.<img alt="" height="467px;" src="http://farm9.staticflickr.com/8401/8684944999_168cde88bc_z.jpg" width="596px;" />
		<p>
			&nbsp;</p>
	</li>
	<li>
		<p dir="ltr">
			You might have noticed by now that France looks rather compressed. That is because by default QGIS is using the Plate Carree projection (nerdily referred to by its EPSG code <code>EPSG:4326</code>). You can change the projection by clicking the following icon in the lower right of the window:<img alt="" height="27px;" src="http://farm9.staticflickr.com/8542/8684944515_fc509a51d1_m.jpg" width="105px;" /></p>
	</li>
	<li>
		<p dir="ltr">
			In the dialog that opens, activate the checkbox next to <em>&quot;Enable &#39;on the fly&#39; CRS transformation&quot;</em>. Then enter France in the filter text field to search for a map projection specialized for France. For instance, you can pick <strong>ED50 / France EuroLambert</strong>. Click OK to activate the projection.<img alt="" height="542px;" src="http://farm9.staticflickr.com/8258/8686063640_6a60b5413a_z.jpg" width="641px;" /></p>
	</li>
	<li>
		<p dir="ltr">
			Let&#39;s change the default styling. Again, right-click the layer and select Properties. The next dialog should be opened with the Style tab selected by default. Click the button Change&hellip; to change the layer style.</p>
	</li>
	<li>
		<p dir="ltr">
			Now we are going to disable the filling by selecting No Brush in <strong>Fill style </strong>drop-down. Change the <strong>border color</strong> to red and increase the border width to 1. Click OK to apply the styling.<img alt="" height="423px;" src="http://farm9.staticflickr.com/8255/8686064000_571773ff5e_z.jpg" width="573px;" /></p>
	</li>
	<li>
		<p dir="ltr">
			By now the resulting map should look like this:</p>
	</li>
</ol>
<p>
	<img alt="" height="256px;" src="http://farm9.staticflickr.com/8544/8686063462_8a98fe57bc_z.jpg" width="271px;" /></p>
<p>
	&nbsp;</p>
<br />
<h3>
	Adding the Tour de France Stations</h3>
<p>
	<img alt="" src="http://farm9.staticflickr.com/8398/8684944663_389066cb89_z.jpg" /></p>
<ul>
	<li dir="ltr">
		<p dir="ltr">
			Now our map contains all the locations of Tour de France stations:</p>
	</li>
</ul>
<p>
	<img alt="" src="http://farm9.staticflickr.com/8540/8686063582_4c69036632_z.jpg" /></p>
<ul>
	<li dir="ltr">
		<p dir="ltr">
			Now we are going to size the stations according to how often they have been part of the tour. Right click the layer <strong>tour-de-france</strong> and select Properties in the context menu.</p>
	</li>
	<li dir="ltr">
		<p dir="ltr">
			Change the value in the <strong>Size</strong> field to a lower value such as 0.5.</p>
	</li>
</ul>
<p>
	<img alt="" src="http://farm9.staticflickr.com/8395/8686063936_b523bb7d03_m.jpg" /></p>
<ul>
	<li dir="ltr">
		<p dir="ltr">
			Now click <em>Advanced &gt; Size scale field &gt; <strong>count</strong></em> to let QGIS use the values in the column count as radius for the symbols.</p>
	</li>
	<li dir="ltr">
		<img alt="" src="http://farm9.staticflickr.com/8260/8684944889_881a0eb5c4.jpg" />
		<p dir="ltr">
			You might also want to make the symbols more transparent by moving the <strong>Transparency slider</strong> to 50%. Your map should now look like this:</p>
	</li>
</ul>
<p>
	<img alt="" src="http://farm9.staticflickr.com/8123/8684944583_a1f9ea2693_z.jpg" /></p>
<ul>
	<li dir="ltr">
		<p dir="ltr">
			Since we must always <a href="http://slides.vis4.net/perugia13/tools/%E2%80%A6">size symbols by area</a>, and not radius, we now need to correct our map. As the area of circles grows proportionally with the square of the radius, we need to compute the square roots of the counts to get proper radii.</p>
	</li>
	<li dir="ltr">
		<p dir="ltr">
			Usually you could have done this already during the data preparation phase and could have simply stored another column in the CSV file. You can also just load the CSV into a spreadsheet tool like Excel and add a new column with the square roots of the counts. However, you can also do this in QGIS using the <strong>Edit Any Layer plugin</strong>.</p>
	</li>
</ul>
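<p>
	If you prefer to prepare the data outside QGIS, the square-root correction could be sketched in Python as below. The station names and counts are hypothetical placeholders, not values from the sample CSV; only the column name <code>count</code> comes from the tutorial.</p>

```python
import math

# Hypothetical station counts; the real values live in the count column of the CSV.
counts = {"Paris": 100, "Bordeaux": 25, "Pau": 4}

# Radius proportional to the square root of the count, so circle area tracks the count.
radii = {name: math.sqrt(count) for name, count in counts.items()}

# Area of each circle, up to the constant factor pi, recovers the original count.
areas = {name: radius ** 2 for name, radius in radii.items()}
```

<p>
	With these radii, a station visited four times as often gets a circle with four times the area, rather than sixteen times as it would if the raw count were used as the radius.</p>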
<p>
	<img alt="null" src="http://farm9.staticflickr.com/8405/8684945025_675e392b32_m.jpg" style="height: 229px; width: 300px;" /></p>
<ul>
	<li dir="ltr">
		<p dir="ltr">
			In the menu Plugins select <strong><em>Edit Any Layer</em></strong><em> &gt; <strong>Create Editable Layer</strong></em>. Select <strong>tour-de-france</strong> as input layer and choose a name for the output layer. I will simply use <strong>tour-de-france-2</strong> here. Click <strong>OK</strong> to proceed.</p>
	</li>
	<li dir="ltr">
		<p dir="ltr">
			You will be asked for the coordinate system again. WGS84 should be selected by default, so simply clicking OK should work.</p>
	</li>
	<li dir="ltr">
		<p dir="ltr">
			Now open the <strong>attribute table</strong> by right-clicking the new layer and selecting Open Attribute Table in the context menu. You will now see all the data stored in the CSV. Activate <strong>editing mode</strong> by clicking on the little blue pencil icon (see screenshot). Then open the <strong>field calculator</strong> by clicking on the little calculator icon.</p>
	</li>
</ul>
<p>
	<img alt="" src="http://farm9.staticflickr.com/8125/8684945161_aca95b45ff_z.jpg" /></p>
<ul>
	<li dir="ltr">
		<p dir="ltr">
			Make sure that Create a new field is checked and enter a meaningful name for the new column, e.g. <code>radius</code>. As the square roots are going to be decimal numbers, select <strong>Decimal number</strong> (real) as Output field type. Finally enter the following formula into the Expression text field: <code>sqrt(count)</code>. The dialog should now look like the one shown in the following screenshot. Click OK to proceed.</p>
	</li>
</ul>
<p>
	<img alt="" src="http://farm9.staticflickr.com/8266/8686063388_f2b6128437_z.jpg" /></p>
<ul>
	<li dir="ltr">
		<p dir="ltr">
			Back in the <strong>attribute table</strong> you can take a look at the new column (you may have to scroll the table to the right). Now deactivate <strong>editing mode</strong> by clicking on the blue pencil icon again. QGIS will ask you if you agree to save the changes. Click Save, and Close the attribute table.</p>
	</li>
	<li dir="ltr">
		<p dir="ltr">
			Now hide the layer <strong>tour-de-france</strong> that we created in step 2 by deactivating its checkbox in the layer window on the left. Now we repeat the second step with the new layer (tour-de-france-2), but instead of <strong>count</strong> we will pick the <strong>column radius</strong> for sizing the symbols.</p>
	</li>
	<li dir="ltr">
		<p dir="ltr">
			If you like, change the color to blue and set the transparency to 50%. Finally the map should look like this:</p>
	</li>
</ul>
<p>
	<img alt="" src="http://farm9.staticflickr.com/8542/8686064062_9b6768a4b6_z.jpg" /></p>
<h3>
	Exporting to PDF</h3>
<p dir="ltr">
	In the last section we are going to export our map to PDF.</p>
<p>
	<img alt="" src="http://farm9.staticflickr.com/8401/8686063394_0d49d31b10_z.jpg" /></p>
<ul>
	<li dir="ltr">
		<p dir="ltr">
			Optionally you can disable the black frame by disabling the checkbox <em>General options &gt; Show frame</em> in the panel on the right.</p>
	</li>
	<li dir="ltr">
		<p dir="ltr">
			Now in the menu click on <em>File &gt; Export as PDF</em>&hellip; to finally save the map as PDF. You can now open the map in other graphic tools such as Illustrator to do some fine tuning (adding title, labels etc).</p>
	</li>
</ul>
]]></description> 
      <dc:date>2013-05-02T10:42:29+00:00</dc:date>
    </item>

    <item>
      <title>DDJSchool Tutorial: How to Analyse Datasets with Tableau Public</title>
      <link>http://datadrivenjournalism.net/resources/DDJSchool_Tutorial_Analysing_Datasets_with_Tableau_Public</link>
      <guid>http://datadrivenjournalism.net/resources/DDJSchool_Tutorial_Analysing_Datasets_with_Tableau_Public#When:14:15:02Z</guid>
      <description><![CDATA[<p>
	<em>By <a href="http://driven-by-data.net/">Gregor Aisch</a>, visualization architect and interactive news developer, based on his workshop, Data visualisation, Maps and Timelines on a Shoestring. The workshop is part of the <a href="http://datadrivenjournalism.net/news_and_analysis/announcing_the_school_of_data_journalism_2013_in_perugia">School of Data Journalism 2013</a> at the International Journalism Festival.</em></p>
<h3>
	Prerequisites</h3>
<ol>
	<li>
		Download and install Tableau Public. So far there is only a Windows version available.</li>
	<li>
		Download the dataset <a href="http://www.google.com/url?q=http%3A%2F%2Fslides.vis4.net%2Fperugia13%2Feurostat-youth.csv&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNE7Qz2oC9z8Lgj7J4a8uOeEONbwlA">eurostat-youth.csv</a>.</li>
</ol>
<h3>
	Loading a CSV File</h3>
<ol>
	<li>
		Click <em>Open data</em> to open the data import window. From the list on the left pick <em>Text File</em> and select <strong>eurostat-youth.csv</strong>. Make sure that <em>Field separator</em> is set to &quot;Comma&quot;. Click <em>OK</em> to proceed.</li>
</ol>
<p style="text-align: center;">
	<img alt="tableau-csv.png" src="http://datadrivenjournalism.net/uploads/tableau-csv.png" style="height: 414px; width: 600px;" /></p>
<ol start="2" style="">
	<li>
		<em>Note</em>: If this step fails with an error message, try changing your system region to English in the Windows control panel (see <a href="http://www.google.com/url?q=http%3A%2F%2Fslides.vis4.net%2Fperugia13%2Ftools%2Fimg%2Ftableau-locale-settings.png&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNGHKa9GBcCl7Unsamx75dtI2szyfQ">screenshot</a>). It seems that Tableau cannot read comma-separated values if the comma is set as the decimal separator for numbers in the system settings.</li>
	<li>
		Tableau now lists all the columns of the table in the <em>Data</em> panel on the left. The columns are classified into <em>Dimensions</em> and <em>Measures</em>.</li>
</ol>
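<p>
	The note about decimal separators can be illustrated outside Tableau with a short Python sketch. It does not reproduce Tableau's behaviour; it only shows why a decimal comma clashes with a comma used as field separator. The sample line, the region name and the function name <code>to_decimal_point</code> are assumptions made for the example.</p>
<pre>
```python
import csv
import io

# Hypothetical two-line file written with a decimal comma.
# The field separator is a semicolon so the number stays in one field.
DECIMAL_COMMA_CSV = "region;youth_unemployed\nExample region;12,5\n"

def to_decimal_point(csv_text):
    """Read a semicolon-separated file and parse decimal-comma numbers."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    rows = []
    for row in reader:
        row["youth_unemployed"] = float(row["youth_unemployed"].replace(",", "."))
        rows.append(row)
    return rows

# Splitting the same value on commas breaks the number into two fields.
broken = "Example region,12,5".split(",")
fixed = to_decimal_point(DECIMAL_COMMA_CSV)
print(len(broken), fixed[0]["youth_unemployed"])
```
</pre>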
<p style="text-align: center;">
	<img alt="tableau-initial-view.png" src="http://datadrivenjournalism.net/uploads/tableau-initial-view.png" style="width: 600px; height: 416px;" /></p>
<ol start="4">
	<li>
		The dataset contains the following columns (all data is 2011 and aggregated on NUTS-2 level):</li>
</ol>
<ul style="margin-left: 40px;">
	<li>
		<strong>secondary_edu</strong>: percentage of population with secondary education.</li>
	<li>
		<strong>youth_unemployed</strong>: percentage of people aged between 18 and 24 who are unemployed and do not participate in education or training.</li>
	<li>
		<strong>unemployed_15_24M</strong>: percentage of unemployed males between 15 and 24.</li>
	<li>
		<strong>unemployed_15_24F</strong>: percentage of unemployed females between 15 and 24.</li>
</ul>
<h3>
	Analysing a Dataset</h3>
<p>
	Now we are going to analyze the dataset using Tableau.</p>
<ol>
	<li>
		Now drag the field <strong>youth_unemployed</strong> from <em>Measures</em> to <em>Columns</em>. Then drag <strong>secondary_edu</strong> to <em>Rows</em>.</li>
</ol>
<p style="text-align: center;">
	<img alt="tableau-plot-1.png" src="http://datadrivenjournalism.net/uploads/tableau-plot-1.png" style="width: 600px; height: 423px;" /></p>
<ol start="2" style="">
	<li>
		As you see Tableau computes the sums of the columns instead of plotting the individual values. To fix this we need to right-click the green fields and select <em>Dimension</em>.</li>
</ol>
<p style="text-align: center;">
	<img alt="tableau-plot-2.png" src="http://datadrivenjournalism.net/uploads/tableau-plot-2.png" style="width: 300px; height: 472px;" /></p>
<ol start="3" style="">
	<li>
		If both fields are set to be treated as dimensions you should see a scatterplot like the one shown in the screenshot below. You can see that there is a negative correlation between education and youth unemployment.</li>
</ol>
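<p>
	If you want a number to go with the visual impression, the strength of such a relationship can be estimated with the Pearson correlation coefficient. The following Python sketch is only an illustration: the value pairs are invented and are not taken from eurostat-youth.csv, and the function name <code>pearson</code> is our own.</p>
<pre>
```python
import math

# Made-up (secondary_edu, youth_unemployed) pairs for illustration only;
# they are not taken from eurostat-youth.csv.
pairs = [(40.0, 30.0), (55.0, 22.0), (70.0, 15.0), (85.0, 9.0)]

def pearson(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / math.sqrt(var_x * var_y)

r = pearson(pairs)
print(round(r, 3))
```
</pre>
<p>
	A value close to -1 indicates a strong negative relationship, a value close to 0 indicates none.</p>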
<p style="text-align: center;">
	<img alt="tableau-plot-3.png" src="http://datadrivenjournalism.net/uploads/tableau-plot-3.png" style="height: 410px; width: 500px;" /></p>
<ol start="4" style="">
	<li>
		Now drag the field <strong>country</strong> from <em>Dimensions</em> to <em>Color</em> to color the plot symbols by country. You can also drag the country to <em>Shape</em> to change the icon.</li>
</ol>
<p style="text-align: center;">
	<img alt="tableau-plot-4.png" src="http://datadrivenjournalism.net/uploads/tableau-plot-4.png" style="width: 500px; height: 396px;" /></p>
<ol start="5" style="">
	<li>
		Add the fields <strong>country</strong> and <strong>geo_name</strong> to the <em>Detail</em> mark to include that piece of information in the tooltips.</li>
	<li>
		Now you can use the color legend and <strong>quick filters</strong> to highlight and hide certain countries.</li>
	<li>
		Focus on Turkey.</li>
	<li>
		Plotting unemployment by gender.</li>
</ol>
<h3>
	Bonus: Creating a Map with Tableau</h3>
<ol>
	<li>
		Now we can create a map easily: select the dimensions <strong>lat</strong> and <strong>lon</strong> together with the measure <strong>count</strong> (while holding the <em>Ctrl</em> key) and click on <em>Show Me</em> to expand the list of suggested visualizations. Then click on the icon of the map with the blue circles. Click on <em>Show Me</em> again to hide the panel.</li>
</ol>
<p style="text-align: center;">
	<img alt="tableau-select-vis.png" src="http://datadrivenjournalism.net/uploads/tableau-select-vis.png" style="height: 413px; width: 600px;" /></p>
<ol start="2">
	<li>
		Now you should already see the complete table. Tableau is smart enough to use the square roots of the counts for the circles&#39; radii automatically, so we don&#39;t have to worry about this.</li>
	<li>
		You can make the circles transparent by clicking on <em>Color</em> in the panel <em>Marks</em> and moving the transparency slider. To change the size of the circles, click on <em>Size</em> and adjust the slider.</li>
	<li>
		Now drag the field name to the <em>Mark Label </em>to add the city names as labels to the map.</li>
</ol>
<p>
	&nbsp;</p>
<p>
	<em>Note</em>: This is not a complete tutorial for creating a map with Tableau. The final steps of this tutorial are going to be added in the coming days.&nbsp;</p>
]]></description> 
      <dc:date>2013-04-27T14:15:02+00:00</dc:date>
    </item>

    <item>
      <title>DDJSchool Tutorial: How to Create Charts with Datawrapper</title>
      <link>http://datadrivenjournalism.net/resources/DDJSchool_Tutorial_Creating_Charts_with_Datawrapper</link>
      <guid>http://datadrivenjournalism.net/resources/DDJSchool_Tutorial_Creating_Charts_with_Datawrapper#When:14:01:16Z</guid>
      <description><![CDATA[<p>
	<em>By <a href="http://driven-by-data.net/">Gregor Aisch</a>, visualization architect and interactive news developer, based on his workshop, Data visualisation, Maps and Timelines on a Shoestring. The workshop is part of the <a href="http://datadrivenjournalism.net/news_and_analysis/announcing_the_school_of_data_journalism_2013_in_perugia">School of Data Journalism 2013</a> at the International Journalism Festival.</em></p>
<p>
	This tutorial goes through the basic process of creating simple, embeddable charts using <a href="http://datawrapper.de/">Datawrapper</a>.</p>
<h3>
	Preparing the Dataset</h3>
<ol>
	<li>
		Go to the Eurostat website and download the dataset <a href="http://www.google.com/url?q=http%3A%2F%2Fappsso.eurostat.ec.europa.eu%2Fnui%2Fshow.do%3Fquery%3DBOOKMARK_DS-055624_QID_91D6DBE_UID_-3F171EB0%26layout%3DTIME%2CC%2CX%2C0%3BGEO%2CL%2CY%2C0%3BS_ADJ%2CL%2CZ%2C0%3BAGE%2CL%2CZ%2C1%3BSEX%2CL%2CZ%2C2%3BINDICATORS%2CC%2CZ%2C3%3B%26zSelection%3DDS-055624AGE%2CY_LT25%3BDS-055624SEX%2CT%3BDS-055624S_ADJ%2CSA%3BDS-055624INDICATORS%2COBS_FLAG%3B%26rankName1%3DSEX_1_2_-1_2%26rankName2%3DAGE_1_2_-1_2%26rankName3%3DTIME_1_0_0_0%26rankName4%3DS-ADJ_1_2_-1_2%26rankName5%3DINDICATORS_1_2_-1_2%26rankName6%3DGEO_1_2_0_1%26sortR%3DASC_361_FIRST%26pprRK%3DFIRST%26pprSO%3DPROTOCOL%26ppcRK%3DFIRST%26ppcSO%3DASC%26sortC%3DASC_-1_FIRST%26rStp%3D%26cStp%3D%26rDCh%3D%26cDCh%3D%26rDM%3Dtrue%26cDM%3Dtrue%26footnes%3Dfalse%26empty%3Dfalse%26wai%3Dfalse%26time_mode%3DNONE%26lang%3DEN%26cfo%3D%2523%2523%2523%252C%2523%2523%2523.%2523%2523%2523&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNF5cWjrdNchVGMPILOobMjw_PpPNA">Unemployment rate by sex and age groups - monthly average</a> as Excel spreadsheet. You can also directly download the file&nbsp;<a href="http://www.google.com/url?q=http%3A%2F%2Fslides.vis4.net%2Fperugia13%2Fune_rt_m.xls&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNHrEDBPLuDeFwluMb7sMfLm9Ng5XQ">here</a>.</li>
	<li>
		We now need to clean the spreadsheet. Make a copy of the active sheet to keep the original sheet for reference. Now remove the header and footer rows so that GEO/TIME is stored in the first cell (A1).</li>
	<li>
		It&#39;s a good idea to limit the number of shown entries to something around 10 or 15, since otherwise the chart would be too cluttered. Our story will be about how Europe is divided according to the unemployment rate, so I decided to remove anything but the top 3 and bottom 3 countries plus some reference countries of interest in between. The final dataset contains the countries: Greece, Spain, Croatia, Portugal, Italy, Cyprus, France, United Kingdom, Norway, Austria, Germany.</li>
	<li>
		Let&#39;s also try to keep the labels short. For Germany we can remove the appendix <em>&quot;(until 1990 former territory of the FRG)&quot;</em>, since it wouldn&#39;t fit in our chart.</li>
	<li>
		This is what the final dataset looks like in OpenOffice Calc.</li>
</ol>
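<p>
	The same clean-up can also be scripted. The Python sketch below is one possible approach, not the method used in the workshop: it shortens the Germany label and keeps only the countries listed above. The numbers in the sample rows are invented placeholders, not Eurostat figures, and the function names are our own.</p>
<pre>
```python
import re

# Countries kept in the tutorial's final dataset.
KEEP = {
    "Greece", "Spain", "Croatia", "Portugal", "Italy", "Cyprus",
    "France", "United Kingdom", "Norway", "Austria", "Germany",
}

def short_label(label):
    """Drop a trailing parenthetical such as '(until 1990 ...)'."""
    return re.sub(r"\s*\(.*\)\s*$", "", label)

def prepare(rows):
    """Shorten the label in column 0 and keep only the selected countries."""
    cleaned = [[short_label(row[0])] + row[1:] for row in rows]
    return [row for row in cleaned if row[0] in KEEP]

# Placeholder rows: the numbers are invented, not Eurostat figures.
rows = [
    ["Germany (until 1990 former territory of the FRG)", 7.7, 7.6],
    ["Belgium", 22.0, 22.4],
    ["Spain", 55.1, 55.7],
]
print(prepare(rows))
```
</pre>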
<p style="text-align: center;">
	<img alt="dw-prepared-dataset.png" src="http://datadrivenjournalism.net/uploads/dw-prepared-dataset.png" style="height: 383px; width: 500px;" /></p>
<h3>
	Loading the Data into Datawrapper</h3>
<ol>
	<li>
		Now, to load the dataset into Datawrapper you can simply copy and paste it. In your spreadsheet software look for the <em>Select All</em> function (e.g. <em>Edit &gt; Select All</em> in OpenOffice).</li>
	<li>
		Copy the data into the clipboard by either selecting <em>Edit &gt; Copy</em> from the menu or pressing <font face="courier new">Ctrl + C</font> (for Copy) on your keyboard.</li>
	<li>
		Go to datawrapper.de and click the link <a href="http://datawrapper.de/chart/4WJd8/upload">Create A New Chart</a>. You can do this either being logged in or as guest. If you create the chart as guest, you can add it to your collection later by signing up for free.</li>
	<li>
		Now paste the data into the big text area in Datawrapper. Click <em>Upload and continue</em> to proceed to the next step.</li>
</ol>
<p style="text-align: center;">
	<img alt="dw-paste.png" src="http://datadrivenjournalism.net/uploads/dw-paste.png" style="height: 348px; width: 500px;" /></p>
<h3>
	Check and Describe the Data</h3>
<ol>
	<li>
		Check if the data has been recognized correctly. Things to check for are the number format (in our example the decimal separator &quot;,&quot; has been replaced with &quot;.&quot;). Also check whether the row and column headers have been recognized.</li>
	<li>
		Change number format to one decimal after &quot;.&quot; to ensure the data is formatted according to your selected language (e.g. decimal comma for France).</li>
	<li>
		Now provide information about the data source. The data has been published by <font face="courier new">Eurostat</font>. Provide the <a href="http://www.google.com/url?q=http%3A%2F%2Fappsso.eurostat.ec.europa.eu%2Fnui%2Fshow.do%3Fquery%3DBOOKMARK_DS-055624_QID_91D6DBE_UID_-3F171EB0%26layout%3DTIME%2CC%2CX%2C0%3BGEO%2CL%2CY%2C0%3BS_ADJ%2CL%2CZ%2C0%3BAGE%2CL%2CZ%2C1%3BSEX%2CL%2CZ%2C2%3BINDICATORS%2CC%2CZ%2C3%3B%26zSelection%3DDS-055624AGE%2CY_LT25%3BDS-055624SEX%2CT%3BDS-055624S_ADJ%2CSA%3BDS-055624INDICATORS%2COBS_FLAG%3B%26rankName1%3DSEX_1_2_-1_2%26rankName2%3DAGE_1_2_-1_2%26rankName3%3DTIME_1_0_0_0%26rankName4%3DS-ADJ_1_2_-1_2%26rankName5%3DINDICATORS_1_2_-1_2%26rankName6%3DGEO_1_2_0_1%26sortR%3DASC_361_FIRST%26pprRK%3DFIRST%26pprSO%3DPROTOCOL%26ppcRK%3DFIRST%26ppcSO%3DASC%26sortC%3DASC_-1_FIRST%26rStp%3D%26cStp%3D%26rDCh%3D%26cDCh%3D%26rDM%3Dtrue%26cDM%3Dtrue%26footnes%3Dfalse%26empty%3Dfalse%26wai%3Dfalse%26time_mode%3DNONE%26lang%3DEN%26cfo%3D%2523%2523%2523%252C%2523%2523%2523.%2523%2523%2523&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNF5cWjrdNchVGMPILOobMjw_PpPNA">link to the dataset</a> as well. This information will be displayed along with the published charts, so readers can trace back the path to the source themselves.</li>
</ol>
<p style="text-align: center;">
	<img alt="dw-source3.png" src="http://datadrivenjournalism.net/uploads/dw-source3.png" /></p>
<ol start="4" style="">
	<li>
		Click <em>Visualize</em> to proceed to the next step.</li>
</ol>
<p style="text-align: center;">
	&nbsp;</p>
<h3>
	Selecting a Visualization</h3>
<ol>
	<li>
		Time series are best represented using line charts, so click on the icon for <strong>line chart</strong> to select this visualization.</li>
	<li>
		Give the chart a <strong>title</strong> that explains both <strong>what</strong> the readers are seeing in the chart and <strong>why</strong> they should care about it. A title like <em>&quot;Youth unemployment rates in Europe&quot;</em> only answers half of the question. A better title would be <em>&quot;Youth unemployment divides Europe&quot;</em> or <em>&quot;Youth unemployment on record high in Greece and Spain&quot;</em>.</li>
	<li>
		In the <strong>introduction line</strong> we should clarify what exactly is shown in the chart. Click <em>Introduction</em> and type <em>&quot;Seasonally adjusted unemployment rates of under 25 aged&quot;</em>. Of course you can also provide more details about the story.</li>
	<li>
		Now <strong>highlight</strong> the data series that are most important for telling the story. The idea is to let one or two countries really pop out of the chart and attract the readers&#39; attention immediately. Click <em>Highlight</em> and select Greece and Spain from the list. You might also want to include your own country for reference.</li>
	<li>
		Activate <strong>direct labeling</strong> to make it easier to read the chart. Also, since our data is already widely distributed, we can force the extension of the vertical axis to the <strong>zero-baseline</strong>.</li>
	<li>
		We can let the colors support the story by choosing appropriate colors. First, click on the orange field to select it as <strong>base color</strong>. Then click on <em>define custom colors</em> and pick <em>red</em> for high unemployment countries Greece and Spain. For countries with low youth unemployment such as Germany, Norway and Austria we can pick a <em>green</em>, or even better, a <em>blue</em> tone (to respect the color blind). Now the resulting chart should look like this:</li>
</ol>
<p style="text-align: center;">
	<img alt="dw-result1.png" src="http://datadrivenjournalism.net/uploads/dw-result1.png" style="height: 359px; width: 500px;" /></p>
<ol start="7">
	<li>
		Click <em>Publish</em> to proceed to the last step.</li>
</ol>
<h3>
	Publishing the Visualization</h3>
<ol>
	<li>
		Now a copy of the chart is being pushed to the content delivery network Amazon S3, which ensures that it loads fast under high traffic.</li>
	<li>
		Meanwhile you can already <strong>copy the embed code</strong> and paste it into your newsroom&#39;s CMS to include it in the related news article &ndash; just like you would do with a YouTube video.</li>
</ol>
<p>
	&nbsp;</p>
<p>
	Further tutorials can be found on the Datawrapper <a href="http://docs.datawrapper.de/en/tutorial/">website</a>.&nbsp;</p>
]]></description> 
      <dc:date>2013-04-27T14:01:16+00:00</dc:date>
    </item>

    <item>
      <title>A Survival Guide for Data Visualisation</title>
      <link>http://datadrivenjournalism.net/resources/A_Survival_Guide_for_Data_Visualisation</link>
      <guid>http://datadrivenjournalism.net/resources/A_Survival_Guide_for_Data_Visualisation#When:12:20:33Z</guid>
      <description><![CDATA[<p>
	<em>The following tips are from&nbsp;<a href="http://vis4.net/blog/">Gregor Aisch</a>, visualization architect and interactive news developer,&nbsp;</em><em>based on his workshop,&nbsp;</em><em>Making Data Visualisations: A Survival Guide</em><em>. The workshop is part of the <a href="http://datadrivenjournalism.net/news_and_analysis/announcing_the_school_of_data_journalism_2013_in_perugia">School of Data Journalism 2013</a> at the International Journalism Festival.</em></p>
<h3>
	Charts</h3>
<ul>
	<li>
		Avoid 3D-charts at all costs. The perspective distorts the data: what is displayed &#39;in front&#39; is perceived as more important than what is shown in the background.</li>
	<li>
		Use pie charts with care, and only to show part-to-whole relationships. Two is the ideal number of slices, and you should never show more than five. Don&#39;t use pie charts if you want to compare values (use bar charts instead).</li>
	<li>
		Always extend bar charts to the zero baseline. Order bars by value to make comparison easier.</li>
	<li>
		Use line charts to show time series data. That&#39;s simply the best way to show how a variable changes over time.</li>
	<li>
		Avoid stacked area charts because they are <a href="http://www.leancrew.com/all-this/2011/11/i-hate-stacked-area-charts/">easily misinterpreted</a>.</li>
	<li>
		Prefer direct labeling wherever possible. You can save your readers a lot of time by placing labels directly onto the visual elements instead of collecting them in a separate legend. Also remember that we cannot differentiate that many colors.</li>
	<li>
		Label your axes! You might think that&#39;s kind of obvious, but still it happens quite often that designers and journalists simply forget to label the axes.</li>
	<li>
		Tell readers why they should care about your graphic. Don&#39;t waste the title line to simply say what data is shown.</li>
</ul>
<h3>
	Color</h3>
<p>
	Colors are difficult. They might make a boring graphic look pretty, but they really need to be handled with care.</p>
<ul>
	<li>
		Use colors sparingly. If possible, use only one or two colors in your visualization.</li>
	<li>
		Double-check your colors for the color blind. You can use tools such as <a href="http://colororacle.org/">ColorOracle</a> to simulate the effect of different types of color blindness.</li>
	<li>
		Say goodbye to red-green color scales. A significant fraction of the male population is color blind and has problems differentiating between red and green tones. Red-blue or purple-green are common alternatives.</li>
	<li>
		When in doubt, use color scales from <a href="http://colorbrewer2.com/">colorbrewer2.com</a>.</li>
</ul>
<h3>
	Maps</h3>
<ul>
	<li>
		Don&#39;t use the <a href="https://en.wikipedia.org/wiki/Mercator_projection">Mercator projection</a> for world maps. The distortion of area is not acceptable. Use <a href="https://en.wikipedia.org/wiki/Map_projection#Equal-area">equal-area projections</a> instead.</li>
	<li>
		Size symbols by area, not diameter. A common mistake is to map data values to the radius of circles. However, our visual system compares symbols by area. Use square root to compute radii from data.</li>
</ul>
<h3>
	Recommended reading</h3>
<ul>
	<li>
		<a href="http://www.amazon.com/Street-Journal-Guide-Information-Graphics/dp/0393072959">Dona Wong: The Wall Street Journal Guide to Information Graphics</a></li>
	<li>
		<a href="https://github.com/propublica/guides/blob/master/news-apps.md">News Apps Style Guide by ProPublica</a></li>
	<li>
		<a href="http://www.treesmapsandtheorems.com/">Jean-luc Doumont: Trees, maps, and theorems &ndash; effective communication for rational minds</a></li>
	<li>
		<a href="http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728">Darrell Huff: How to Lie with Statistics</a></li>
</ul>
<p>
	The slides that accompanied this workshop can be found <a href="http://de.slideshare.net/vis4/making-data-visualizations-a-survival-guide/">here</a>.&nbsp;</p>
]]></description> 
      <dc:date>2013-04-26T12:20:33+00:00</dc:date>
    </item>

    <item>
      <title>Social Network Analysis for Journalists Using the Twitter API</title>
      <link>http://datadrivenjournalism.net/resources/social_network_analysis_for_journalists_using_the_twitter_api</link>
      <guid>http://datadrivenjournalism.net/resources/social_network_analysis_for_journalists_using_the_twitter_api#When:13:29:50Z</guid>
      <description><![CDATA[<p>
	<em>A tutorial by <a href="https://twitter.com/mihi_tr">Michael Bauer</a>, instructor at the <a href="http://schoolofdata.org/">School of Data</a>, based on his workshop, Social Network Analysis for Journalists Using the Twitter API. The workshop is part of the <a href="http://www.journalismfestival.com/programme/2013/category/data-journalism-school">School of Data Journalism 2013</a> at the International Journalism Festival.</em></p>
<p>
	Social Network Analysis (SNA) allows us to identify players in a social network and how they are related to each other. For example: I want to identify people who are involved in a certain topic - either to interview them or to understand which groups are engaging in the debate.</p>
<p class="c1">
	<span>What you&rsquo;ll need:</span></p>
<ol class="c7" start="1">
	<li class="c0">
		<span>Gephi (</span><span class="c5"><a class="c6" href="http://gephi.org">http://gephi.org</a></span><span>)</span></li>
	<li class="c0">
		<span>OpenRefine (</span><span class="c5"><a class="c6" href="http://openrefine.org">http://openrefine.org</a></span><span>)</span></li>
	<li class="c0">
		<span><a href="https://docs.google.com/a/okfn.org/spreadsheet/ccc?key=0Aq9agjil66PydDlORHRQQlFEckRtYkNVbS15bjd2Vmc#gid=0">The Sample Spreadsheet</a></span></li>
	<li class="c0">
		<span><a href="http://datahub.io/dataset/ddj-2013-04-5-2013-04-18/resource/3163ceb8-63f4-4901-9387-dab3f2b86157">Another sample Dataset</a></span></li>
	<li class="c0">
		<span>Bonus: <a href="https://github.com/mihi-tr/twsearch/raw/master/dist/twitter-search/twsearch.jar">The twitter search to graph tool</a></span></li>
</ol>
<h3>
	Step 1: Basic Social Networks</h3>
<p class="c1">
	<span>Throughout this exercise we will use Gephi for graph analysis and visualization. Let&rsquo;s start by getting a small graph into Gephi.</span></p>
<p class="c1">
	<span>Take a look at the sample spreadsheet - this is data from a fictional case you are investigating. </span></p>
<p class="c1" style="text-align: center;">
	<img height="202" src="http://farm9.staticflickr.com/8121/8680797610_7e0fa831c3_o.png" width="318" /></p>
<p class="c1">
	<span>In your country the minister of health (Mark Illinger) recently bought 500,000 respiration masks from a company (Clearsky-Health) during a flu scare that turned out to be unsubstantiated. The masks were never used and are rotting away in the basement of the ministry. During your investigation you found that during the period of this deal Clearsky-Health was advised by Flowingwater Consulting and paid them a large sum for their services. Flowingwater Consulting is owned by Adele Meral-Poisson, a well-known lobbyist and the wife of Mark Illinger.</span></p>
<p class="c1">
	<span>While we don&rsquo;t need to apply network analysis to understand this fictional case, it helps to understand the sample spreadsheet. Gephi is able to import spreadsheets like this through its &ldquo;Import CSV&rdquo; function. Let&rsquo;s do this.</span></p>
<h3>
	Walkthrough: Importing CSV into Gephi</h3>
<ol class="c4" start="1">
	<li class="c0">
		<span>Save the Sample Spreadsheet as .csv (or click download as &rarr; comma separated values if using Google Spreadsheets)</span></li>
	<li class="c0">
		<span>Start Gephi</span></li>
	<li class="c0">
		<span>Select File &rarr; Open</span></li>
	<li class="c0">
		<span>Select the CSV file saved from the sample spreadsheet</span></li>
	<li class="c0">
		<span>You will get an import report - check whether the number of nodes and edges seems correct and that no errors are reported</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="382" src="http://farm9.staticflickr.com/8401/8679685841_3e534d2a0a.jpg" width="585" /></p>
<ol class="c4" start="6">
	<li class="c0">
		<span>The default values are OK for many graphs of this type. If the links between the objects in your spreadsheet are not unilateral but rather bilateral: e.g. lists of friendship, relationships etc., select &quot;undirected&quot; instead of &quot;directed&quot;.</span></li>
	<li class="c0">
		<span>For now we&rsquo;ll go with directed - so click &ldquo;OK&rdquo; to import the graph.</span></li>
</ol>
<p class="c1">
	<span>Now we have imported our simple graph and already see some things on the screen. Let&rsquo;s make it a little nicer by playing around with Gephi a bit.</span></p>
<h3>
	Walkthrough: Basic Layout in Gephi</h3>
<p class="c1">
	<span>See the grey nodes there? Let&rsquo;s make this graph a little easier to read.</span></p>
<ol class="c4" start="1">
	<li class="c0">
		<span>Click on the big fat &ldquo;T&rdquo; on the bottom of the graph screen to activate labels</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="68" src="http://farm9.staticflickr.com/8385/8679685829_0446abb8c6_m.jpg" width="122" /></p>
<ol class="c4" start="2">
	<li class="c0">
		<span>Let&rsquo;s zoom a bit, click on the button on the lower right of the graph window to open the larger menu </span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="66" src="http://farm9.staticflickr.com/8405/8680797206_fd275b3af9_m.jpg" width="135" /></p>
<ol class="c4" start="3">
	<li class="c0">
		<span>You should see a zoom slider now, slide it around to make your graph a little bigger:</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="114" src="http://farm9.staticflickr.com/8403/8680797176_12f84aec21.jpg" width="496" /></p>
<ol class="c4" start="4">
	<li class="c0">
		<span>You can click on individual nodes and drag them around to arrange them more neatly.</span></li>
</ol>
<h3>
	Step 2: Getting Data Out of Twitter</h3>
<p class="c1">
	<span>Now that we have this, let&rsquo;s get some data out of Twitter. We&rsquo;ll be using the Twitter search for a particular hashtag to find out who talks about it, with whom, and what they talk about. Twitter offers loads of information on their API for search. It&rsquo;s here: </span><span class="c5"><a class="c6" href="https://dev.twitter.com/docs/api/1/get/search">https://dev.twitter.com/docs/api/1/get/search</a>.</span></p>
<p class="c1">
	<span>It basically all boils down to using </span><span class="c5"><a class="c6" href="https://search.twitter.com/search.json?q=%23ijf">https://search.twitter.com/search.json?q=%23</a></span><span> tag (the %23 is the &quot;#&quot; character encoded - so %23ijf corresponds to #ijf). If you open the link in the browser you will get the data in JSON format - a format that is ideal for computers to read - but rather hard for you. Luckily Refine can help with this and turn the information into a table. (If you&rsquo;ve never worked with Refine before, consider having a quick look at the <a href="http://schoolofdata.org/handbook/recipes/cleaning-data-with-refine/">Cleaning Data with Refine</a> recipe at the School of Data.)</span></p>
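<p class="c1">
	<span>The encoding of the &quot;#&quot; character can be reproduced with Python&rsquo;s standard library. The sketch below only builds the URL string and sends no request; the function name search_url is our own, and the base URL is the one given in the text.</span></p>
<pre>
```python
from urllib.parse import quote, urlencode

# Base URL as given in the tutorial text.
SEARCH_URL = "https://search.twitter.com/search.json"

def search_url(hashtag):
    """Build the search URL for a hashtag, percent-encoding the '#'."""
    return SEARCH_URL + "?" + urlencode({"q": hashtag})

print(quote("#ijf"))       # the '#' becomes %23
print(search_url("#ijf"))
```
</pre>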
<h3>
	Walkthrough: Get JSON Data from Web APIs into Refine</h3>
<ol class="c4" start="1">
	<li class="c0">
		<span>Open Refine</span></li>
	<li class="c0">
		<span>Click Create Project </span></li>
	<li class="c0">
		<span>Select &ldquo;Web Addresses&rdquo; </span></li>
	<li class="c0">
		<span>Enter the following URL: </span><span class="c5"><a class="c6" href="https://search.twitter.com/search.json?q=%23ijf">https://search.twitter.com/search.json?q=%23ijf</a></span><span>&nbsp;- this searches for the #ijf hashtag on Twitter.</span></li>
	<li class="c0">
		<span>Click on &ldquo;Next&rdquo;</span></li>
	<li class="c0">
		<span>You will get a preview window showing you nicely formatted JSON:</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="255" src="http://farm9.staticflickr.com/8401/8680798254_3fb637e6fe_z.jpg" width="628" /></p>
<ol class="c4" start="7">
	<li class="c0">
		<span>Hover over the curly bracket inside results and click. This selects the results as the data to import into a table.</span></li>
	<li class="c0">
		<span>Now name your project and click &ldquo;Create project&rdquo; to get the final table</span></li>
</ol>
<p class="c1">
	<span>By now we have all the tweets in a table. You can see there is a ton of information attached to each tweet. We&rsquo;re interested in who communicates with whom, and about what, so the columns we care about are the &ldquo;text&rdquo; column and the &ldquo;from_user&rdquo; column. Let&rsquo;s delete all the others. (To do so use &ldquo;All &rarr; Edit Columns &rarr; remove/reorder Columns&rdquo;.)</span></p>
<p class="c1">
	<span>The &quot;from_user&quot; value is stripped of the characteristic @ in front of the username that is used in tweets. Since we want to extract the usernames from tweets later, let&rsquo;s add a new column that holds the &quot;from_user&quot; value with an @ in front. This will involve a tiny bit of programming - don&rsquo;t be afraid, it&rsquo;s not rocket science.</span></p>
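<p class="c1">
	<span>What Refine does in this walkthrough can be sketched in a few lines of Python. This is only an illustration of the idea, not Refine&rsquo;s implementation: the sample response is hand-written, only the names &quot;results&quot;, &quot;from_user&quot; and &quot;text&quot; come from the tutorial, and the function name tweets_table is an assumption.</span></p>
<pre>
```python
import json

# Hand-written stand-in for a search response. Only the names "results",
# "from_user" and "text" come from the tutorial; the values are invented,
# and "id" is a hypothetical extra field.
SAMPLE_RESPONSE = """
{"results": [
  {"from_user": "alice", "text": "Looking forward to #ijf", "id": 1},
  {"from_user": "bob", "text": "@alice see you at #ijf", "id": 2}
]}
"""

def tweets_table(response_text, columns=("from_user", "text")):
    """Turn the 'results' array into rows holding only the chosen columns."""
    results = json.loads(response_text)["results"]
    return [[tweet[name] for name in columns] for tweet in results]

table = tweets_table(SAMPLE_RESPONSE)
for row in table:
    print(row)
```
</pre>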
<h3>
	Walkthrough: Adding a New Column in Refine</h3>
<ol class="c4" start="1">
	<li class="c0">
		<span>On your &quot;from_user&quot; column select &ldquo;Edit column &rarr; Add column based on this column...&rdquo;</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="474" src="http://farm9.staticflickr.com/8528/8679689157_3b6a6e1849.jpg" width="621" /></p>
<ol class="c4" start="2">
	<li class="c0">
		<span>Whoa - Refine wants us to write a little code to tell it what the new column looks like</span></li>
	<li class="c0">
		<span>Let&rsquo;s program then. Later on we&rsquo;ll do something the built-in programming language doesn&rsquo;t let us do. Luckily Refine offers two alternatives: Jython (basically Python) and Clojure. We&rsquo;ll go for Clojure as we&rsquo;ll need it later. </span></li>
	<li class="c0">
		<span>Select Clojure as your language</span></li>
	<li class="c0">
		<span>We want to prepend &ldquo;@&rdquo; to each name (here &ldquo;value&rdquo; refers to the value in each row)</span></li>
	<li class="c0">
		<span>Enter (str &quot;@&quot; value) into the expression field (use plain straight quotes, not curly ones)</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="433" src="http://farm9.staticflickr.com/8121/8679688623_0ef4b9bffd.jpg" width="605" /></p>
<ol class="c4" start="7">
	<li class="c0">
		<span>See how the value has been changed from &quot;peppemanzo&quot; to &quot;@peppemanzo&quot; - what happened? In Clojure &ldquo;str&rdquo; can be used to combine multiple strings: (str &quot;@&quot; value) therefore combines the string &quot;@&quot; with the string in value - which is what we wanted to do.</span></li>
	<li class="c0">
		<span>Now simply name your column (e.g. &ldquo;From&rdquo;) and click on OK. You will have a new column.</span></li>
</ol>
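<p class="c1">
	<span>For comparison, here is the same transformation as a small Python sketch. The function name add_at_sign is made up for illustration; if you had picked Jython instead of Clojure, the expression would look much like the body of this function.</span></p>
<pre>
<code>def add_at_sign(value):
    """Python counterpart of the Clojure expression (str "@" value)."""
    return "@" + value

from_users = ["peppemanzo"]
print([add_at_sign(name) for name in from_users])</code></pre>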
<p class="c1">
	<span>OK, we have the first part of our graph: the &quot;from&quot; user. Now let&rsquo;s see what the users talk about. This will get a lot more complicated, but don&rsquo;t worry, we&rsquo;ll walk you through....</span></p>
<h3 class="c1 c2">
	Walkthrough: Extracting Users and Hashtags from Tweets</h3>
<ol class="c4" start="1">
	<li class="c0">
		<span>Let&rsquo;s start with adding a new column based on the text column.</span></li>
	<li class="c0">
		<span>The first thing we want to do is to split the tweet into words - we can do so by entering (.split value &quot; &quot;) into the expression field (make sure your language is still Clojure).</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="339" src="http://farm9.staticflickr.com/8259/8679687785_fc174b244c_z.jpg" width="643" /></p>
<ol class="c4" start="3">
	<li class="c0">
		<span>Our tweet now looks very different - it has been turned into an &ldquo;array&rdquo; of words (an array is simply a collection, you can recognize it by the square brackets).</span></li>
	<li class="c0">
		<span>We don&rsquo;t actually want all words, do we? We only want those starting with &quot;@&quot; or &quot;#&quot; - users and hashtags (so we can see who&rsquo;s talking with whom about what) - so we need to filter our array.</span></li>
	<li class="c0">
		<span>Filtering in Clojure works with the &ldquo;filter&rdquo; function: it takes a filter-function and an array. The filter-function simply determines whether a value should be kept or not. In our case the filter-function looks like #(contains? #{\# \@} (first %)) - looks like comic-book characters swearing? Don&rsquo;t worry: &quot;contains?&quot; basically checks if something is in something else, here whether the first character of the word (&quot;(first %)&quot;) is in the set #{\# \@}, that is, either &quot;#&quot; or &quot;@&quot; - exactly what we want. Let&rsquo;s extend our expression:</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="322" src="http://farm9.staticflickr.com/8253/8679688081_2863f49eb6_z.jpg" width="662" /></p>
<ol class="c4" start="6">
	<li class="c0">
		<span>Woohoo, that seems to have worked! Now the only thing we need to do is to create a single value out of it. Remember, we can do so by using &ldquo;str&rdquo; as above.</span></li>
	<li class="c0">
		<span>If we do this straight away we run into a problem: before, we used &ldquo;str&rdquo; as (str &quot;1st&quot; &quot;2nd&quot;); now we would be calling (str [&quot;1st&quot; &quot;2nd&quot;]) because we have an array. Clojure helps us here with the &quot;apply&quot; function: (apply str [&quot;1st&quot; &quot;2nd&quot;]) turns (str [&quot;1st&quot; &quot;2nd&quot;]) into (str &quot;1st&quot; &quot;2nd&quot;). Let&rsquo;s do so...</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="317" src="http://farm9.staticflickr.com/8258/8679691553_7c2370bb85_z.jpg" width="667" /></p>
<ol class="c4" start="8">
	<li class="c0">
		<span>Seems to have worked. Do you spot the problem though?</span></li>
	<li class="c0">
		<span>Exactly: the words are joined without a clear separator, so let&rsquo;s add one. The easiest way is to interpose a character (e.g. a comma) between all the elements of the array - Clojure does this with the interpose function. (interpose &quot;,&quot; [1 2 3]) will turn out to be [1 &quot;,&quot; 2 &quot;,&quot; 3]. Let&rsquo;s extend our formula: </span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="257" src="http://farm9.staticflickr.com/8124/8680797578_ca636b9dbf_z.jpg" width="639" /></p>
<ol class="c4" start="10">
	<li class="c0">
		<span>So our final expression is: </span></li>
</ol>
<p class="c1 c3">
	<span class="c8">(apply str (interpose &quot;,&quot; (filter #(contains? #{\# \@} (first %)) (.split value &quot; &quot;))))</span></p>
<p class="c1 c3">
	<span>Looks complicated but remember, we built this from the ground up.</span></p>
<ol class="c4" start="11">
	<li class="c0">
		<span>Great - we can now extract who talks to whom! Name your column and click &ldquo;OK&rdquo; &nbsp;to continue.</span></li>
</ol>
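<p class="c1">
	<span>To make the three building blocks of the final expression easier to follow, here is a hedged Python sketch that does the same thing step by step. It is only a comparison, not something you enter into Refine, and the function name extract_mentions_and_tags is made up.</span></p>
<pre>
<code>def extract_mentions_and_tags(tweet):
    """Keep only the words that start with "@" or "#", joined by commas."""
    words = tweet.split(" ")                             # (.split value " ")
    wanted = [w for w in words if w[:1] in ("@", "#")]   # (filter ... words)
    return ",".join(wanted)                              # (apply str (interpose "," ...))

print(extract_mentions_and_tags("RT @alice: see you at #ijf #ddj"))</code></pre>
<p class="c1">
	<span>Note that punctuation glued to a word (such as the colon after a username) survives this step.</span></p>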
<p class="c1">
	<span>Now we have extracted who talks with whom, but the format is still different from what we need in Gephi. So let&rsquo;s clean up to have the data in the right format for Gephi.</span></p>
<h3>
	Walkthrough: Cleaning Up</h3>
<ol class="c4" start="1">
	<li class="c0">
		<span>First, let&rsquo;s remove the two columns we don&rsquo;t need anymore: the text and the original from_user column - do this with &ldquo;All &rarr; Edit columns &rarr; Remove and reorder columns&rdquo;</span></li>
	<li class="c0">
		<span>Make sure your &ldquo;from&rdquo; column is the first column</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="151" src="http://farm9.staticflickr.com/8546/8680794438_35a59302ce_z.jpg" width="579" /></p>
<ol class="c4" start="3">
	<li class="c0">
		<span>Now, let&rsquo;s split up the &ldquo;to&rdquo; column so we have one entry in each row: use &ldquo;to &rarr; Edit cells &rarr; Split multi-valued cells&rdquo; and enter &ldquo;,&rdquo; as separator</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="344" src="http://farm9.staticflickr.com/8535/8679684939_6b796beaa7.jpg" width="354" /></p>
<ol class="c4" start="4">
	<li class="c0">
		<span>Make sure to switch back to &ldquo;rows&rdquo; mode. </span></li>
	<li class="c0">
		<span>Now let&rsquo;s fill the empty rows: Select &ldquo;from &rarr; edit cells &rarr; fill down&rdquo;</span></li>
	<li class="c0">
		<span>Notice that there are some characters in there that don&rsquo;t belong to names (e.g. &ldquo;:&rdquo; ?) Let&rsquo;s remove them.</span></li>
	<li class="c0">
		<span>Select &ldquo;to &rarr; edit cells &rarr; transform...&rdquo;</span></li>
	<li class="c0">
		<span>To remove the colons, our transformation is going to be (.replace value &quot;:&quot; &quot;&quot;)</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="236" src="http://farm9.staticflickr.com/8546/8679688143_08e98038c5_z.jpg" width="611" /></p>
<p class="c1">
	<span>You&rsquo;ve now cleaned your data and prepared it enough for Gephi, so let&rsquo;s make some graphs! Export the file as CSV and open it in Gephi as above.</span></p>
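<p class="c1">
	<span>The clean-up steps (split multi-valued cells, fill down, remove colons, export) can be sketched in Python as follows. The sample table is made up, and the helper names to_edge_rows and edges_to_csv are hypothetical; the point is only to show the shape of the resulting from/to file.</span></p>
<pre>
<code>import csv
import io

def to_edge_rows(table):
    """Expand (from, "a,b,c") rows into one (from, to) row per mention or hashtag."""
    edges = []
    for sender, targets in table:
        for target in targets.split(","):
            target = target.replace(":", "")  # same idea as (.replace value ":" "")
            if target:                        # skip tweets with no mentions or hashtags
                edges.append((sender, target))
    return edges

def edges_to_csv(edges):
    """Write the edge rows as CSV text with a from,to header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["from", "to"])
    writer.writerows(edges)
    return buffer.getvalue()

table = [("@alice", "@bob:,#ijf"), ("@bob", "")]
print(edges_to_csv(to_edge_rows(table)))</code></pre>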
<h3 class="c1 c2">
	A Small Network from a Twitter Search</h3>
<p class="c1">
	<span>Let&rsquo;s play with the network we got through Open Refine:</span></p>
<ol class="c4" start="1">
	<li class="c0">
		<span>Open the csv file from Open Refine in Gephi</span></li>
	<li class="c0">
		<span>Look around the graph - you&rsquo;ll see pretty soon that there are several nodes that don&rsquo;t really make sense: &ldquo;from&rdquo; and &ldquo;to&rdquo; for example. Let&rsquo;s remove them</span></li>
	<li class="c0">
		<span>Switch Gephi to the &ldquo;data laboratory&rdquo; view </span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="101" src="http://farm9.staticflickr.com/8386/8679687767_97622cf279_z.jpg" width="540" /></p>
<ol class="c4" start="4">
	<li class="c0">
		<span>This view will show you the nodes and edges that were found</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="155" src="http://farm9.staticflickr.com/8396/8679688099_3f67db2ac9_z.jpg" width="646" /></p>
<ol class="c4" start="5">
	<li class="c0">
		<span>You can delete nodes by right clicking on them (you could also add new nodes)</span></li>
	<li class="c0">
		<span>Delete &ldquo;from&rdquo;, &ldquo;to&rdquo; and &ldquo;#ijf&rdquo; - since this was the term we searched for, it is going to be mentioned everywhere</span></li>
	<li class="c0">
		<span>Activate the labels: it&rsquo;s pretty messy right now, so let&rsquo;s apply a layout. To do so, simply select the algorithm in the Layout panel and click &ldquo;play&rdquo; - see how the graph changes.</span></li>
	<li class="c0">
		<span>Generally combining &ldquo;Force Atlas&rdquo; with &ldquo;Fruchterman Reingold&rdquo; gives nice results. Add &ldquo;Label Adjust&rdquo; to make sure text does not overlap.</span></li>
	<li class="c0">
		<span>Now let&rsquo;s make some more adjustments - let&rsquo;s scale the label by how often things are mentioned. Select &quot;label size&quot; in the ranking menu.</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="134" src="http://farm9.staticflickr.com/8385/8679687711_2c5ba3c4ec.jpg" width="312" /></p>
<ol class="c4" start="10">
	<li class="c0">
		<span>Select &ldquo;Degree&rdquo; as rank parameter</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="244" src="http://farm9.staticflickr.com/8385/8679687711_2c5ba3c4ec.jpg" width="301" /></p>
<ol class="c4" start="11">
	<li class="c0">
		<span>Click on &ldquo;Apply&rdquo; - you might need to run the &ldquo;Label Adjust&rdquo; layout again to avoid overlapping labels</span></li>
	<li class="c0">
		<span>With this simple trick, we see what kind of topics and persons are frequently mentioned</span></li>
</ol>
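<p class="c1">
	<span>What the &ldquo;Degree&rdquo; ranking measures can be sketched in a few lines of Python: for each node, count the edges that touch it. This is a simplified illustration with made-up edges, not Gephi&rsquo;s own code, and the ignore parameter is a hypothetical stand-in for deleting nodes in the data laboratory.</span></p>
<pre>
<code>from collections import Counter

def degrees(edges, ignore=()):
    """Count how many edges touch each node, skipping edges with an ignored endpoint."""
    counts = Counter()
    for source, target in edges:
        if source in ignore or target in ignore:
            continue
        counts[source] += 1
        counts[target] += 1
    return counts

edges = [("@alice", "@bob"), ("@alice", "#ijf"), ("@carol", "@bob"), ("@carol", "#ddj")]
for node, degree in degrees(edges, ignore={"#ijf"}).most_common():
    print(node, degree)</code></pre>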
<p class="c1 c2">
	Great - but it has one downside - the data we&rsquo;re able to get via Open Refine is very limited - so let&rsquo;s explore another route.</p>
<h3>
	A Larger Network from a Twitter Search</h3>
<p class="c1">
	<span>Now that we have analysed a small network from a search, let&rsquo;s deal with a bigger one. This one is from a week of searching for the Twitter hashtag &quot;#ddj&quot; (you can <a href="http://datahub.io/dataset/ddj-2013-04-5-2013-04-18/resource/3163ceb8-63f4-4901-9387-dab3f2b86157">download it here</a>).</span></p>
<p class="c1">
	<span>The file is in .gexf format - a format for exchanging graph data.</span></p>
<h3>
	Walkthrough: Network Analysis Using Gephi</h3>
<ol class="c4" start="1">
	<li class="c0">
		<span>Open the sample graph file in Gephi</span></li>
	<li class="c0">
		<span>Go to the &quot;Data Laboratory&quot; view and remove the &quot;#ddj&quot; node</span></li>
	<li class="c0">
		<span>Enable Node labels</span></li>
	<li class="c0">
		<span>Scale labels by Degree (number of edges from this node)</span></li>
	<li class="c0">
		<span>Apply &ldquo;Force Atlas&rdquo;, &ldquo;Fruchterman Reingold&rdquo; and &ldquo;Label Adjust&rdquo; (remember to stop the first two after a while).</span></li>
	<li class="c0">
		<span>Now you should have a clear view of the network</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="563" src="http://farm9.staticflickr.com/8389/8680795558_f90d868bbe_z.jpg" width="637" /></p>
<ol class="c4" start="7">
	<li class="c0">
		<span>Now let&rsquo;s perform some analysis. One thing we are interested in is who is central and who is not. In other words: who is talking and who is talked to?</span></li>
	<li class="c0">
		<span>For this we will run statistics (found in the statistics tab on the right). We will use the &ldquo;Network diameter&rdquo; statistics first - they tell us about &quot;eccentricity&quot;, &quot;betweenness centrality&quot; and &quot;closeness centrality&quot;. Betweenness centrality tells us which nodes connect other nodes. In our terms, nodes with high betweenness centrality are communication leaders, while nodes with low betweenness centrality are topics.</span></li>
	<li class="c0">
		<span>Now that we have run our test, we can color the labels according to the result. Select the label color ranking and &ldquo;betweenness centrality&rdquo;.</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="276" src="http://farm9.staticflickr.com/8541/8679687403_8d57f9c18f.jpg" width="303" /></p>
<ol class="c4" start="10">
	<li class="c0">
		<span>Pick colors as you like them - I prefer light colors and a dark background.</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="445" src="http://farm9.staticflickr.com/8395/8679696051_8a85c1dbf1_z.jpg" width="510" /></p>
<ol class="c4" start="11">
	<li class="c0">
		<span>Now let&rsquo;s do something different. Let&rsquo;s try to detect the different groups of people who are involved in the discussion. This is done with the &ldquo;Modularity&rdquo; statistic.</span></li>
	<li class="c0">
		<span>Color your labels using the &ldquo;Modularity Class&rdquo; - now you see different clusters of people who are involved in the discussion</span></li>
</ol>
<p class="c1 c3" style="text-align: center;">
	<img height="568" src="http://farm9.staticflickr.com/8523/8680797330_50cef6f74f_z.jpg" width="634" /></p>
<p class="c1 c2">
	&nbsp;</p>
<p class="c1">
	<span>Now we have analysed a bigger network - found the important players and the different groups active in the discussions - all by searching Twitter and storing the result.</span></p>
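<p class="c1">
	<span>If you are curious what betweenness centrality actually counts, the following Python sketch computes it for a tiny made-up graph: for every pair of nodes, it credits the nodes lying on the shortest paths between them. It assumes an undirected, unweighted graph given as an adjacency dictionary, and follows the well-known approach by Brandes; Gephi&rsquo;s own implementation may differ in details such as normalisation or edge direction.</span></p>
<pre>
<code>from collections import deque

def betweenness(adjacency):
    """Unnormalised betweenness centrality for an undirected, unweighted graph."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths to every node.
        order = []
        parents = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    parents[w].append(v)
        # Walk back from the farthest nodes, sharing out credit along shortest paths.
        credit = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in parents[w]:
                credit[v] += paths[v] / paths[w] * (1 + credit[w])
            if w != source:
                score[w] += credit[w]
    # Each undirected pair was counted once from each end.
    return {node: value / 2 for node, value in score.items()}

# A made-up star: one user mentioned together with three hashtags.
star = {
    "@hub": ["#a", "#b", "#c"],
    "#a": ["@hub"],
    "#b": ["@hub"],
    "#c": ["@hub"],
}
print(betweenness(star))</code></pre>
<p class="c1">
	<span>In the star example the hub sits on every shortest path between the other nodes, which matches the reading above: connectors score high, while the topics at the edge score zero.</span></p>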
<h3>
	Bonus: Scraping the Twitter Search with a Small Java Utility</h3>
<p class="c1">
	<span>The .jar file mentioned above is a scraper that extracts persons and hashtags from Twitter: think of what we did previously, but automated. If you have downloaded it, run it with:</span></p>
<pre>
<code>java -jar twsearch.jar &quot;#ijf&quot; 0 ijf.gexf </code></pre>
<p class="c1">
	<span>This will search for #ijf on Twitter every 20 seconds and write the results to the file ijf.gexf - the .gexf format is a graph format understood by Gephi. If you want to end data collection, press &quot;ctrl-c&quot; - simple, isn&rsquo;t it? The utility only needs Java to run, although it is written entirely in Clojure (the language we used to work with the tweets above).</span></p>
]]></description> 
      <dc:date>2013-04-25T13:29:50+00:00</dc:date>
    </item>

    <item>
      <title>Using Excel to Do Precision Journalism</title>
      <link>http://datadrivenjournalism.net/resources/using_excel_to_do_precision_journalism</link>
      <guid>http://datadrivenjournalism.net/resources/using_excel_to_do_precision_journalism#When:09:45:20Z</guid>
      <description><![CDATA[<p>
	<em>A tutorial by <a href="https://twitter.com/sdoig" target="_blank">Steve Doig</a>, journalism professor at ASU&#39;s Cronkite School and Pulitzer-winning data journalist, based on his workshop, <a href="http://www.journalismfestival.com/programme/2013/excel-for-journalists" target="_blank">Excel for Journalists</a>. The workshop is part of the <a href="http://www.pbs.org/idealab/2013/03/the-school-of-data-journalism-europes-biggest-data-journalism-event080.html">School of Data Journalism 2013</a>&nbsp;at the <a href="http://www.journalismfestival.com/">International Journalism Festival</a>.</em></p>
<p style="text-align: center;">
	&nbsp;</p>
<p style="text-align: center;">
	<a href="http://www.flickr.com/photos/okfn/8678211770/" title="School of Data Journalism Perugia by okfn, on Flickr"><img alt="School of Data Journalism Perugia" src="http://farm9.staticflickr.com/8258/8678211770_8592f5c1de.jpg" style="width: 350px; height: 350px;" /></a></p>
<p style="text-align: center;">
	<em>Steve Doig at the School of Data Journalism in Perugia. Image by Lucy Chambers</em></p>
<p>
	Microsoft Excel is a powerful tool that will handle most tasks that are useful for a journalist who needs to analyze data to discover interesting patterns. These tasks include:</p>
<ul>
	<li>
		Sorting</li>
	<li>
		Filtering</li>
	<li>
		Using math and text functions</li>
	<li>
		Pivot tables</li>
</ul>
<h3>
	Introduction to Excel</h3>
<p>
	Excel will handle large amounts of data that is organized in table form, with rows and columns. The columns (which are labeled A, B, C&hellip;) list the variables (like Name, Age, Number of Crimes, etc.) Typically, the first row holds the names of the variables. The rest of the rows are for the individual records or cases being analyzed. Each cell (like A1) holds a piece of data.</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.11.02_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.11.02_PM.png" /></p>
<p>
	Modern versions of Excel will hold as many as 1,048,576 records with as many as 16,384 variables! An Excel spreadsheet also will hold multiple tables on separate sheets, which are tabbed on the bottom of the page.</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.12.18_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.12.18_PM.png" /></p>
<h3>
	Sorting</h3>
<p>
	One of the most useful abilities of Excel is to sort the data into a more revealing order. Too often, we are given lists that are in alphabetical order, which is useful only for finding a particular record in a long list. In journalism, we usually are more interested in extremes: The most, the least, the biggest, the smallest, the best, the worst.&nbsp;</p>
<p>
	Consider the data used in this workshop, a list of the provinces of Italy showing the number of various kinds of crimes reported during a recent year.&nbsp; Here is how it looks sorted in alphabetical order of province name:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.14.03_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.14.03_PM.png" /></p>
<p>
	Far more interesting would be to sort it in descending order of the total number of crimes, with the most crime-ridden city at the top of the list:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.17.02_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.17.02_PM.png" /></p>
<p>
	There are two methods of sorting. The first method is quick and can be used for sorting by a single variable. Put the cursor in the column you wish to sort by (&ldquo;Delitti in totale&rdquo; in this case) and then click the Z-A button:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.18.42_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.18.42_PM.png" /></p>
<p>
	But beware! Put the cursor in the column, but DO NOT select the column letter (C, in this case) and then sort. Consider the example below:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.20.32_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.20.32_PM.png" /></p>
<p>
	Doing that will sort ONLY the data in that column, thereby disordering your data! Notice well how this can happen!</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.21.51_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.21.51_PM.png" /></p>
<p>
	The other method of sorting is for when you want to sort by more than one variable. For instance, suppose we wish to sort the crime data first by Territorio in alphabetical order, but then by &ldquo;Delitti in totale&rdquo; in descending order within each Territorio. To do that, go to the toolbar, click on &ldquo;Data&rdquo; and then &ldquo;Sort&hellip;&rdquo;, and then choose the variables by which you wish to sort. Then click &ldquo;OK&rdquo;.</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.23.45_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.23.45_PM.png" /></p>
<p>
	The result will be this:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.24.28_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.24.28_PM.png" /></p>
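<p>
	For readers who also script their analysis, here is a small Python sketch of the same two-level sort. The rows and the &ldquo;Provincia&rdquo; column name are made up for illustration; they are not the workshop figures.</p>
<pre>
<code># Made-up rows for illustration only; they are not the workshop figures.
rows = [
    {"Provincia": "P1", "Territorio": "Lazio", "Delitti in totale": 300},
    {"Provincia": "P2", "Territorio": "Abruzzo", "Delitti in totale": 100},
    {"Provincia": "P3", "Territorio": "Lazio", "Delitti in totale": 900},
]

def sort_by_region_then_crimes(records):
    """Sort by Territorio A-Z, then by total crimes from largest to smallest."""
    return sorted(records, key=lambda r: (r["Territorio"], -r["Delitti in totale"]))

for row in sort_by_region_then_crimes(rows):
    print(row["Territorio"], row["Provincia"], row["Delitti in totale"])</code></pre>
<p>
	Because whole records are sorted together, the problem described earlier (sorting one column on its own and disordering the data) cannot occur here.</p>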
<h3>
	Filtering</h3>
<p>
	Sometimes you want to examine only particular records from a large collection of data. For that, you can use Excel&rsquo;s Filter tool. On the toolbar, go to &ldquo;Data&hellip;Filter&hellip;Autofilter&rdquo;. Small buttons will appear at the top of each column:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.25.47_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.25.47_PM.png" /></p>
<p>
	Suppose we wish to see only the records from the Territorio of Lazio. Click on the button on the Territorio column and choose Lazio from the list. This is the result:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.26.46_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.26.46_PM.png" /></p>
<p>
	Notice that you now are seeing only rows 36, 44, 78, 80 and 104.</p>
<p>
	More complicated filters are possible. For instance, suppose you wish to see only records in which &ldquo;Delitti in totale&rdquo; is greater than or equal to 50,000. Click on the button and choose &ldquo;Custom Filter&hellip;&rdquo;:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.29.08_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.29.08_PM.png" /></p>
<p>
	You could also, for instance, choose records in which &ldquo;Delitti in totale&rdquo; is greater than 50,000 and &ldquo;Omicidi&rdquo; is less than or equal to 25.</p>
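<p>
	A combined filter like that one can be sketched in Python as a single condition applied to every record. The rows below are made up, and the function name custom_filter and its parameters are hypothetical.</p>
<pre>
<code># Made-up rows for illustration only; they are not the workshop figures.
rows = [
    {"Provincia": "P1", "Delitti in totale": 60000, "Omicidi": 10},
    {"Provincia": "P2", "Delitti in totale": 60000, "Omicidi": 40},
    {"Provincia": "P3", "Delitti in totale": 20000, "Omicidi": 5},
]

def custom_filter(records, min_crimes=50000, max_murders=25):
    """Keep records with more than min_crimes crimes and at most max_murders murders."""
    return [
        r for r in records
        if r["Delitti in totale"] > min_crimes and max_murders >= r["Omicidi"]
    ]

for row in custom_filter(rows):
    print(row["Provincia"])</code></pre>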
<h3>
	Functions</h3>
<p>
	Excel has many built-in functions useful for performing math calculations and working with dates and text. For instance, assume that we wish to calculate the total number of crimes in all the provinces. To do this, we would go to the bottom of Column C, skip a row, and then enter this formula in Cell C106: =SUM(C2:C104). The equals sign (=) is necessary for all functions. The colon (:) means &ldquo;all the numbers from Cell C2 to Cell C104&rdquo;. The result is this:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_1.39.47_PM.png" src="http://datadrivenjournalism.net/uploads/teasers/Screen_shot_2013-04-24_at_1.39.47_PM.png" /></p>
<p>
	(The reason for skipping a row is to separate the sum from the main table so that the table can be sorted without pulling the sum into the table during the sorting operation. This way the sum will stay at the bottom of the column.)</p>
<p>
	Often you will want to do a calculation on each row of your data table. For instance, you might want to calculate the crime rate (the number of crimes per 100,000 population), which would let you compare the crime problem in cities of different sizes. To do this, we would create a new variable called &ldquo;Crime Rate&rdquo; in Column L, the first empty column. Then, in Cell L2, we would enter this formula:<br />
	=(C2/J2)*100000.&nbsp; This divides the total crimes by the population, then multiplies the result by 100,000.&nbsp; (Notice that there are no spaces and no thousands separators used in the formula.) Here is the result:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_1.41.55_PM.png" src="http://datadrivenjournalism.net/uploads/teasers/Screen_shot_2013-04-24_at_1.41.55_PM.png" /></p>
<p>
	It would be very tedious to repeat writing that calculation in each of 103 rows of data. Happily, Excel has a way to rapidly copy a formula down a column of cells. To do that, you carefully move the cursor (normally a big fat white cross) to the bottom right corner of the cell containing the formula. When it is in the right spot, the cursor will change to a small black cross. At that point, you can double-click and the formula will copy down the column until it reaches a blank cell in the column to the left. This would be the result:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_3.14.16_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_3.14.16_PM.png" /></p>
<p>
	Notice that the formula changes for each row, so that Row 6 is =(C6/J6)*100000.</p>
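<p>
	The same two calculations, a column total and a per-row rate, look like this as a minimal Python sketch. The numbers are made up and simply stand in for Columns C and J; the function name crime_rate is hypothetical.</p>
<pre>
<code>def crime_rate(total_crimes, population):
    """Crimes per 100,000 residents, like =(C2/J2)*100000."""
    return (total_crimes / population) * 100000

# Made-up (total crimes, population) pairs standing in for Columns C and J.
rows = [(5000, 250000), (1200, 40000)]

# Like =SUM(C2:C104): add up the whole crimes column.
print(sum(crimes for crimes, _ in rows))

# Like copying the formula down the column: one rate per row.
print([crime_rate(crimes, population) for crimes, population in rows])</code></pre>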
<p>
	Now, if we sort by Crime Rate in descending order, we see the cities with the worst crime problems:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.34.26_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.34.26_PM.png" /></p>
<p>
	and sorting in ascending order, the least crime:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.35.09_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.35.09_PM.png" /></p>
<p>
	Here are some other useful Excel functions that can be used in similar ways:</p>
<p>
	(You can add, subtract, multiply or divide by using the symbols + - * and /)<br />
	=AVERAGE &ndash; calculates the arithmetic mean of a column or row of numbers<br />
	=MEDIAN &ndash; finds the middle value of a column or row of numbers<br />
	=COUNT &ndash; tells you how many items there are in a column or row<br />
	=MAX &ndash; tells you the largest value in a column or row<br />
	=MIN &ndash; tells you the smallest value in a column or row</p>
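<p>
	If you ever need the same summaries outside Excel, Python&rsquo;s standard library has close counterparts. The values below are made up. One difference to keep in mind is that Excel&rsquo;s COUNT only counts cells containing numbers, whereas len counts every item in the list.</p>
<pre>
<code>from statistics import mean, median

values = [4, 8, 15, 16, 23, 42]  # made-up numbers standing in for a column

summary = {
    "AVERAGE": mean(values),
    "MEDIAN": median(values),
    "COUNT": len(values),
    "MAX": max(values),
    "MIN": min(values),
}
print(summary)</code></pre>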
<p>
	There are also a variety of text functions that can join and cut apart text strings. For instance:</p>
<p>
	If &ldquo;Steve&rdquo; is in Cell B2 and &ldquo;Doig&rdquo; is in Cell C2, then =B2&amp;&quot; &quot;&amp;C2 will produce &ldquo;Steve Doig&rdquo;. And =C2&amp;&quot;, &quot;&amp;B2 will produce &ldquo;Doig, Steve&rdquo;. Other text functions include:</p>
<p>
	=SEARCH &ndash; this will find the start of a desired string of text in a larger string.<br />
	=LEN &ndash; this will tell you how many characters are in a text string.<br />
	=LEFT &ndash; this will extract however many characters you specify starting from the left.<br />
	=RIGHT -- this will extract characters starting from the right.<br />
	<br />
	You can also do date arithmetic, such as calculating the number of days or years between two dates, or hours, minutes and/or seconds between two times. For instance, to calculate on April 24, 2010, the age in years of someone whose birth date is in cell B2, you could use this formula: =(DATE(2010,4,24)-B2)/365.25. The first part of the formula calculates the number of days between the two dates, then that is divided by 365.25 (the .25 accounts for leap years) to produce the years. Another useful date function is =WEEKDAY, which will tell you on which day of the week a chosen date falls. For instance =WEEKDAY(DATE(1948,4,21)) returns a 4, which means I was born on a Wednesday.</p>
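<p>
	As a cross-check of the text and date examples, here is a hedged Python sketch of the same operations. The function names age_in_years and excel_weekday are made up; excel_weekday assumes WEEKDAY&rsquo;s default numbering, in which 1 is Sunday and 7 is Saturday.</p>
<pre>
<code>from datetime import date

first, last = "Steve", "Doig"
full_name = first + " " + last     # joins B2, a space and C2
index_name = last + ", " + first   # joins C2, a comma and space, and B2

print(full_name, "|", index_name)
print(len(full_name))                      # LEN
print(full_name[:5], full_name[-4:])       # LEFT(text, 5) and RIGHT(text, 4)
print(full_name.lower().find("doig") + 1)  # SEARCH counts from 1 and ignores case

def age_in_years(born, on_day):
    """Days between the two dates divided by 365.25, as in the formula above."""
    return (on_day - born).days / 365.25

def excel_weekday(day):
    """Number the weekday as WEEKDAY does by default: 1 = Sunday ... 7 = Saturday."""
    return day.isoweekday() % 7 + 1

print(age_in_years(date(1948, 4, 21), date(2010, 4, 24)))
print(excel_weekday(date(1948, 4, 21)))</code></pre>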
<p>
	Excel offers well over 200 functions in a variety of categories beyond just math, dates and text: Financial, engineering, database, logical, statistical, etc. But it is unlikely that you will need to be familiar with more than a dozen or so functions, unless you are a journalist with a very specialized beat such as economics.</p>
<h3>
	Pivot Tables</h3>
<p>
	One of Excel&rsquo;s best tricks is the ability to summarize data that is in categories. The tool that does this is called a pivot table, which creates an interactive cross-tabulation of the data by category.</p>
<p>
	To create a pivot table, every column of your data must have a variable label; in fact, it is always good practice to put in a variable label any time you insert or add a new column. First, you make sure your cursor is on some cell in the table. Then go to the tool bar and click on &ldquo;Data&hellip;Pivot Table Report&rdquo;. A window will pop up called the &ldquo;Pivot Table Wizard&rdquo;. Just hit &ldquo;Next&hellip;Next&hellip;Finish&rdquo; on the three steps of the wizard.</p>
<p>
	This will open a new sheet that looks like this:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.37.26_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.37.26_PM.png" /></p>
<p>
	To build a pivot table, you should visualize the piece of paper that would answer your question. Our example data shows 103 provinces in the 20 Territorios of Italy. Imagine that you wanted to know the total number of crimes in each Territorio. The piece of paper that would answer that question would list each Territorio, with the total number of crimes next to each name.</p>
<p>
	To build this pivot table, we would use the mouse to pick up &ldquo;Territorio&rdquo; from the list of variables in the floating box to the right, and place it in the &ldquo;Drop Row Fields Here&rdquo; box. We would then take the &ldquo;Delitti in totale&rdquo; variable and put it in the &ldquo;Drop Data Items Here&rdquo; box. This would be the result:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.38.25_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.38.25_PM.png" /></p>
<p>
	If you click the cursor into the &ldquo;Total&rdquo; Column and hit the Z-A button to sort, you will get this:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.39.24_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.39.24_PM.png" /></p>
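<p>
	Conceptually, this pivot table groups the rows by one variable and sums another. A minimal Python sketch of that idea follows; the rows are made up and the function name pivot_sum is hypothetical.</p>
<pre>
<code>from collections import defaultdict

def pivot_sum(records, row_field, value_field):
    """Total value_field for each distinct row_field, largest total first."""
    totals = defaultdict(int)
    for record in records:
        totals[record[row_field]] += record[value_field]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Made-up rows for illustration only; they are not the workshop figures.
rows = [
    {"Territorio": "Lazio", "Delitti in totale": 300},
    {"Territorio": "Abruzzo", "Delitti in totale": 100},
    {"Territorio": "Lazio", "Delitti in totale": 900},
    {"Territorio": "Abruzzo", "Delitti in totale": 50},
]

for territorio, total in pivot_sum(rows, "Territorio", "Delitti in totale"):
    print(territorio, total)</code></pre>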
<p>
	It is possible to make very complicated pivot tables, with multiple subtotals. But I recommend making a new pivot table for each question you want to answer; several simple tables are easier to understand than one very complicated table that tries to answer many questions at once.</p>
<p>
	The&nbsp;<img alt="Screen_shot_2013-04-24_at_12.40.31_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.40.31_PM.png" />&nbsp;button on the variable list opens up a box that will let you make a variety of other choices about how to summarize and display the result:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.41.00_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.41.00_PM.png" /></p>
<h3>
	Other Excel Tips</h3>
<p>
	Excel will import data that comes in a variety of formats other than the native *.xls that Excel uses. For instance, Excel can readily import text files in which the data columns are separated by commas, tabs, or other characters, like this:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.42.01_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.42.01_PM.png" /></p>
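<p>
	What such an import does can be sketched with Python&rsquo;s csv module: split each line on the separator and use the first line as the variable names. The sample text and the function name read_delimited are made up for illustration.</p>
<pre>
<code>import csv
import io

# Made-up comma-separated text; the first line holds the variable names.
RAW_TEXT = "Name,Age\nAnna,34\nLuca,29\n"

def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of records keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

for record in read_delimited(RAW_TEXT):
    print(record)</code></pre>
<p>
	Note that every value arrives as text, so numbers still need converting before you calculate with them.</p>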
<p>
	If you find a web page with data in table format (rows and columns), Excel can open it as a spreadsheet.</p>
<p>
	Excel also will let you format your data to make it more readable. For instance, &ldquo;Format&hellip;Cells&hellip;Number&rdquo; will allow you to put thousands separators in your numbers, like this:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-24_at_12.42.46_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-24_at_12.42.46_PM.png" /></p>
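<p>
	The equivalent formatting step in Python, shown here only as a small sketch, uses the comma format specifier. The function name with_thousands is made up.</p>
<pre>
<code>def with_thousands(number):
    """Format an integer with comma thousands separators."""
    return format(number, ",")

for value in (16384, 1048576):
    print(with_thousands(value))</code></pre>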
<h3>
	Finding Data</h3>
<p>
	Government agencies are starting to make some of their data available in Excel or other formats. For instance, ISTAT.IT has very comprehensive data about Italian demographics, economy, crime, etc. Many of their tables can be downloaded directly as Excel files.</p>
<p>
	One trick to find interesting data would be to use Google and add these search terms: site:.it filetype:xls.</p>
<p>
	&nbsp;</p>
<p>
	<strong>Note:</strong> The slides from this workshop are available on <a href="http://www.slideshare.net/lilianabounegru/steve-doig-excel-for-journalists">SlideShare</a>.</p>
<p>
	<em><strong>Need Help?</strong><br />
	Feel free to send me an email at steve.doig@asu.edu. I will be glad to give you advice if I can.&nbsp;</em></p>
]]></description> 
      <dc:date>2013-04-24T09:45:20+00:00</dc:date>
    </item>

    <item>
      <title>Setting the Record(s) Straight: Dealing with Bad Data</title>
      <link>http://datadrivenjournalism.net/resources/setting_the_records_straight_dealing_with_bad_data</link>
      <guid>http://datadrivenjournalism.net/resources/setting_the_records_straight_dealing_with_bad_data#When:23:02:26Z</guid>
      <description><![CDATA[<p>
	<em>By Abbott Katz, London-based Excel instructor and freelance writer, author of <a href="http://www.apress.com/9781430235453" target="_blank">Excel 2010 Made Simple</a>&nbsp;and <a href="http://spreadsheetjournalism.com/" target="_blank">spreadsheetjournalism.com</a>.</em></p>
<p>
	If you&#39;ve ever received two enveloped pleas from one charity with an eye on your checkbook &ndash; and you probably have &ndash; you&#39;ve taken a hit from a database that&rsquo;s misfired, a loose cannon that&rsquo;s been spraying surplus names all over your postal code. You know what&#39;s gone wrong, too: the database has listed you twice, either by sanctioning identical entries of your name and address, or by registering two variants of your name and/or place of residence, gulling the database into believing that you&#39;re in fact two different persons.</p>
<p>
	Either way, the point is made. Spreadsheet data redundancies and discrepancies need to be ruthlessly shown the door, lest they infiltrate the data and lay all your lovely, crafty formulas to waste. The need for data integrity is integral, a primordial piece of spreadsheet infrastructure without which your analytical engine is doomed to stall; and while infrastructure is boring, it&rsquo;s necessary, too. So let&rsquo;s consider three data integrity issues, all of which I&rsquo;ve encountered in real-world spreadsheets, and see how they might be resolved.</p>
<p>
	Start with this very simple batch of data, representing say, serial contributions to a political campaign:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-10_at_10.29.55_AM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-10_at_10.29.55_AM.png" /></p>
<p>
	Feed these into a pivot table, and you&rsquo;ll get something like this:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-10_at_10.31.08_AM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-10_at_10.31.08_AM.png" /></p>
<p>
	<br />
	Now that doesn&rsquo;t look right, does it? You&rsquo;ll find among the first principles of pivot tables the axiom that items gathered into the Row (or Column) Labels area appear once each &ndash; and Mr. Bowie&rsquo;s gotten himself invited twice. What went wrong?</p>
<p>
	The problem &ndash; and again, I&rsquo;ve bumped up against it more than once &ndash; is that the two Bowie entries were in fact respectively keyed:</p>
<p>
	David Bowie</p>
<p>
	David Bowie[space]</p>
<p>
	And those entries are simply incomparable. Typing David Bowie[space] is, relative to David Bowie, tantamount to having typed Barack Obama &ndash; they&rsquo;re two, irretrievably separate items. The obvious workaround? Delete the space. That will work, but this remedy isn&rsquo;t overwhelmingly practicable if you&rsquo;re faced with 20,000 rows of names, all of which need to be vetted for precisely the same potential anomaly. The real remedy, rather, lies in the workings of a low-profile and concise Excel function called TRIM, which pares superfluous spaces from cell entries.</p>
<p>
	For example, say David Bowie[space] finds itself in cell A3. If so,</p>
<p style="text-align: center;">
	=TRIM(A3)</p>
<p>
	will return David Bowie, minus the following space. If you work proactively, then, and post TRIM in an uninhabited column, copy it down the parallel column&rsquo;s worth of entries (in this case A) and finish off the process with a Copy&gt;Paste Special Values sequence back onto the A column, your gratuitous space problem should be cleared away &ndash; and David Bowie will haunt your pivot table just once.</p>
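<p>
	Outside Excel, the same clean-up can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the <em>donors</em> list is made-up sample data, and <em>excelTrim</em> is a hypothetical helper that approximates what TRIM does (strip leading and trailing spaces, collapse internal runs of spaces to one) rather than a real Excel API.</p>

```javascript
// Hypothetical sample data: the second entry carries a trailing space.
const donors = ["David Bowie", "David Bowie ", "Barack Obama"];

// Approximates Excel's TRIM: strip leading/trailing whitespace and
// collapse runs of internal spaces to a single space.
function excelTrim(text) {
  return text.trim().replace(/ +/g, " ");
}

// Count distinct names before and after trimming.
const before = new Set(donors).size;
const after = new Set(donors.map(excelTrim)).size;

console.log(before, after);
```

<p>
	Counting distinct values before and after is a quick way to see whether stray spaces were inflating the list, much as the pivot table revealed the second Mr. Bowie.</p>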
<p>
	Next case. This one&#39;s the classic predicament, the one with which we opened this post. Here the data suffer from record duplication, a particular impediment to lists built atop unique entries, e.g., a membership roster or roll of university students. Here&#39;s another elementary example, which should nevertheless embody the problem:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-10_at_10.38.32_AM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-10_at_10.38.32_AM.png" /></p>
<p>
	(Again, extrapolate to a list 20,000 names long.)</p>
<p>
	You&#39;ve doubtless observed a complication here. Some records exhibit a two-field duplication, and others only one; and before you proceed you need to ask yourself exactly which duplicates you want ousted from the larger list. In this case it seems clear that you&#39;ll want to squelch only those records in which both fields are matched across two or more records &ndash; here, only Mary Walters. Again, that judgment won&#39;t come as easily with 20,000 names worth of students, and that&#39;s where Excel&#39;s Remove Duplicates feature comes in. Click anywhere among the records, then click the Data tab &gt; Remove Duplicates button in the Data Tools button group (again, I&rsquo;m using Excel 2010).&nbsp; You&#39;ll see:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-10_at_10.39.24_AM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-10_at_10.39.24_AM.png" /></p>
<p>
	Tick the My data has headers box (I&#39;d have written data have, but Microsoft didn&#39;t ask me), thus overriding the Column A and B entries above with First Name and Surname. Leave First Name and Surname ticked, because again we want to search for record duplication across both fields, an important point. Click OK and you&#39;ll see:</p>
<p style="text-align: center;">
	<img alt="Screen_shot_2013-04-10_at_10.51.23_AM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-10_at_10.51.23_AM.png" /></p>
<p>
	No grammar check there! But no matter; Remove Duplicates has done its job, having extirpated the redundant entry for Mary Walters.</p>
<p>
	Just keep in mind that a cogent implementation of Remove Duplicates requires a bit of a think-through. After all, a real-world university enrollment list might very well divulge any number of different students owning the same first name and surname, and so you&#39;d need to stipulate additional duplicate search parameters in order to snag truly redundant entries, e.g. by ticking First Name, Surname, and Address, assuming these fields feature in the data. Indeed, large organizations assign their members unique IDs precisely in order to impose one, irreducibly unambiguous identifier upon them; think of the UK National Insurance or American Social Security Numbers, for example (and you&#39;re looking at someone who has both). On the other hand, of course, there&#39;s no definitive way for Excel to know if the two instances of Mary Walters in my primitive list signify the same or two different individuals; again, one assumes additional clarifications would avail in a real-world worksheet.</p>
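<p>
	The choice of which fields to tick can be made concrete with a short JavaScript sketch. Everything here is an assumption for illustration: the <em>roster</em> records, the field names and the <em>removeDuplicates</em> helper are hypothetical, and the comparison is a plain exact match on the chosen fields, which may not mirror Excel&rsquo;s own matching rules in every detail.</p>

```javascript
// Hypothetical roster; the field names are assumptions for illustration.
const roster = [
  { firstName: "Mary", surname: "Walters" },
  { firstName: "Mary", surname: "Jones" },
  { firstName: "Mary", surname: "Walters" },
  { firstName: "John", surname: "Walters" },
];

// Keep the first record seen for each distinct combination of the chosen fields.
function removeDuplicates(records, fields) {
  const seen = new Set();
  return records.filter((record) => {
    const key = JSON.stringify(fields.map((field) => record[field]));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Matching on both fields removes only the repeated Mary Walters.
const byBothFields = removeDuplicates(roster, ["firstName", "surname"]);

// Matching on surname alone would also throw away John Walters.
const bySurnameOnly = removeDuplicates(roster, ["surname"]);

console.log(byBothFields.length, bySurnameOnly.length);
```

<p>
	Comparing the two results shows why the ticked fields matter: the fewer fields you match on, the more legitimately distinct records get swept out along with the true duplicates.</p>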
<p>
	The third integrity issue is perhaps the most nettling, because it asks the most of the user/investigator &ndash; a good old eyeballing of the data, banded with equally retro data entry and re-entry chores. I&rsquo;ll exemplify the issue by recalling a request I received from the Times Higher Education (UK) weekly to review some of their spreadsheet data in connection with their World University Rankings. I looked at ranking data for both 2011 and 2012, and copied and pasted the data from the two years into one megasheet; and in the course of that perusal I began to notice that some universities reported different spellings across the two years, e.g.</p>
<p>
	University of California, Berkeley<br />
	University of California Berkeley</p>
<p>
	University of Illinois at Chicago<br />
	 University of Illinois &ndash; Chicago</p>
<p>
	And yes, a lot of this:</p>
<p>
	University of Chicago<br />
	 University of Chicago[space]</p>
<p>
	Now of course Excel is the paradigmatic idiot savant; it can sum 500,000 numbers at the tap of an Enter key, even as it lacks the human, inferential savvy to declare the above pairs equivalent.</p>
<p>
	What to do? Here the problem is the absence of redundancy among the pairs, and in order to achieve the necessary twinnings you could first apply the TRIM stratagem I recommended earlier. Next, you could sort the university names, (closely) observe how neighboring names are spelled, and decide which variant should prevail, in the event a university presents itself with disparate spellings. You could also run the names through a pivot table and assign the institution Name field to the Row Labels area, where again each uniquely-spelled name will appear but once. And here, unique appearances mean you could have a problem. You don&rsquo;t want to see Harvard&rsquo;s name twice in the pivot results &ndash; you want to see it once, even as it appears twice in the data &ndash; once for 2011, once for 2012. But whatever the means for trapping errant spellings, you&rsquo;re going to have to do some quaint 20th century editing and retyping.</p>
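<p>
	A script can at least narrow down where the eyeballing is needed. The JavaScript sketch below is a rough aid, not a substitute for the editing: the <em>names</em> list is sample data, and <em>comparisonKey</em> is a hypothetical helper that ignores case, punctuation and extra spaces. It would catch the Berkeley and Chicago[space] pairs, but not a pair such as &ldquo;at Chicago&rdquo; versus &ldquo;&ndash; Chicago&rdquo;, where the words themselves differ.</p>

```javascript
// Sample data: each university should appear once per year with one spelling.
const names = [
  "University of California, Berkeley",
  "University of California Berkeley",
  "University of Chicago",
  "University of Chicago ",
  "Harvard University",
  "Harvard University",
];

// Crude comparison key: lower-case, drop punctuation, collapse spaces.
function comparisonKey(name) {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, " ")
    .replace(/ +/g, " ")
    .trim();
}

// Group the distinct spellings that share a comparison key.
const spellings = new Map();
for (const name of names) {
  const key = comparisonKey(name);
  if (!spellings.has(key)) spellings.set(key, new Set());
  spellings.get(key).add(name);
}

// Only keys with more than one spelling need a human decision.
const suspects = [...spellings.values()]
  .filter((variants) => variants.size > 1)
  .map((variants) => [...variants]);

console.log(suspects);
```

<p>
	Harvard, spelled identically in both years, is not flagged; the two flagged groups are the ones where someone still has to decide which variant should prevail.</p>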
<p>
	Something that charity didn&rsquo;t do. And that&rsquo;s why they mailed you twice.</p>
]]></description> 
      <dc:date>2013-04-14T23:02:26+00:00</dc:date>
    </item>

    <item>
      <title>Tech Tips: Making Sense of JSON Strings – Follow the Structure</title>
      <link>http://datadrivenjournalism.net/resources/Tech_Tips_Making_Sense_of_JSON_Strings_Follow_the_Structure</link>
      <guid>http://datadrivenjournalism.net/resources/Tech_Tips_Making_Sense_of_JSON_Strings_Follow_the_Structure#When:23:50:11Z</guid>
      <description><![CDATA[<p>
	<em>Originally published by <a href="https://twitter.com/psychemedia" target="_blank">Tony Hirst</a> on <a href="http://blog.ouseful.info/" target="_blank">blog.ouseful.info</a> under a <a href="http://creativecommons.org/licenses/by/3.0/" target="_blank">Creative Commons Attribution</a> licence.</em></p>
<p>
	&nbsp;</p>
<p>
	Reading through the Online Journalism blog post on <a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/" target="_blank">Getting full addresses for data from an FOI response (using APIs)</a>, the following phrase &ndash; relating to the composition of some Google Refine code to parse a <a href="http://en.wikipedia.org/wiki/Json">JSON</a> string from the Google geocoding API &ndash; jumped out at me: &ldquo;This took a bit of trial and error&hellip;&rdquo;</p>
<p style="text-align: center;">
	<a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/" target="_blank"><img alt="Screen_shot_2013-04-02_at_3.10.18_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-02_at_3.10.18_PM.png" style="height: 313px; width: 600px;" /></a></p>
<p>
	Why? Two reasons&hellip; Firstly, because it demonstrates a &ldquo;have a go&rdquo; attitude which you absolutely need to have if you&rsquo;re going to appropriate technology and turn it to your own purposes. Secondly, because it maybe (or maybe not&hellip;) hints at a missed trick or two&hellip;</p>
<p>
	So what trick is missing?</p>
<p>
	Here is an <a href="http://maps.googleapis.com/maps/api/geocode/json?sensor=false&amp;address=mk7%206aa,uk" target="_blank">example</a> of the sort of thing you get back from the Google Geocoder:</p>
<p style="margin-left: 40px;">
	<img alt="Screen_shot_2013-04-02_at_3.13.43_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-02_at_3.13.43_PM.png" style="font-size: 12px;" /><br />
	<em>{ &quot;status&quot;: &quot;OK&quot;, &quot;results&quot;: [ { &quot;types&quot;: [ &quot;postal_code&quot; ], &quot;formatted_address&quot;: &quot;Milton Keynes, Buckinghamshire MK7 6AA, UK&quot;, &quot;address_components&quot;: [ { &quot;long_name&quot;: &quot;MK7 6AA&quot;, &quot;short_name&quot;: &quot;MK7 6AA&quot;, &quot;types&quot;: [ &quot;postal_code&quot; ] }, { &quot;long_name&quot;: &quot;Milton Keynes&quot;, &quot;short_name&quot;: &quot;Milton Keynes&quot;, &quot;types&quot;: [ &quot;locality&quot;, &quot;political&quot; ] }, { &quot;long_name&quot;: &quot;Buckinghamshire&quot;, &quot;short_name&quot;: &quot;Buckinghamshire&quot;, &quot;types&quot;: [ &quot;administrative_area_level_2&quot;, &quot;political&quot; ] }, { &quot;long_name&quot;: &quot;Milton Keynes&quot;, &quot;short_name&quot;: &quot;Milton Keynes&quot;, &quot;types&quot;: [ &quot;administrative_area_level_2&quot;, &quot;political&quot; ] }, { &quot;long_name&quot;: &quot;United Kingdom&quot;, &quot;short_name&quot;: &quot;GB&quot;, &quot;types&quot;: [ &quot;country&quot;, &quot;political&quot; ] }, { &quot;long_name&quot;: &quot;MK7&quot;, &quot;short_name&quot;: &quot;MK7&quot;, &quot;types&quot;: [ &quot;postal_code_prefix&quot;, &quot;postal_code&quot; ] } ], &quot;geometry&quot;: { &quot;location&quot;: { &quot;lat&quot;: 52.0249136, &quot;lng&quot;: -0.7097474 }, &quot;location_type&quot;: &quot;APPROXIMATE&quot;, &quot;viewport&quot;: { &quot;southwest&quot;: { &quot;lat&quot;: 52.0193722, &quot;lng&quot;: -0.7161451 }, &quot;northeast&quot;: { &quot;lat&quot;: 52.0300728, &quot;lng&quot;: -0.6977000 } }, &quot;bounds&quot;: { &quot;southwest&quot;: { &quot;lat&quot;: 52.0193722, &quot;lng&quot;: -0.7161451 }, &quot;northeast&quot;: { &quot;lat&quot;: 52.0300728, &quot;lng&quot;: -0.6977000 } } } } ] }</em></p>
<p>
	The data represents a JavaScript object (JSON = JavaScript Object Notation) and as such has a standard, hierarchical form.</p>
<p>
	Here&rsquo;s another way of writing the same object code, only this time laid out in a way that reveals the structure of the object:</p>
<ol>
	<li>
		<font face="courier new">&nbsp;{</font></li>
	<li>
		<font face="courier new">&nbsp;<span style="color:#0000ff;">&quot;status&quot;: &quot;OK&quot;,</span></font></li>
	<li>
		<font face="courier new">&nbsp;&nbsp;<span style="color:#0000ff;">&quot;results&quot;:</span> [ {</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp;&nbsp;<span style="color:#0000ff;">&quot;types&quot;:</span> [ <span style="color:#0000ff;">&quot;postal_code&quot;</span> ]<span style="color:#0000ff;">,</span></font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp;&nbsp;<span style="color:#0000ff;">&quot;formatted_address&quot;: &quot;Milton Keynes, Buckinghamshire MK7 6AA, UK&quot;,</span></font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp;&nbsp;<span style="color:#0000ff;">&quot;address_components&quot;:</span> [ {</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp;&nbsp;<span style="color:#0000ff;">&quot;long_name&quot;: &quot;MK7 6AA&quot;,</span></font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp;&nbsp;<span style="color:#0000ff;">&quot;short_name&quot;: &quot;MK7 6AA&quot;,</span></font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp; <span style="color:#0000ff;">&quot;types&quot;:</span> [ <span style="color:#0000ff;">&quot;postal_code&quot;</span> ]</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; }<span style="color:#0000ff;">,</span> {</font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &quot;long_name&quot;: &quot;Milton Keynes&quot;,</font></span></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &quot;short_name&quot;: &quot;Milton Keynes&quot;,</font></span></li>
	<li>
		<font face="courier new"><span style="color:#0000ff;">&nbsp; &nbsp; &nbsp; &quot;types&quot;: </span>[ <span style="color:#0000ff;">&quot;locality&quot;, &quot;political&quot;</span> ]</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; }<span style="color:#0000ff;">,</span> {</font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp;&quot;long_name&quot;: &quot;Buckinghamshire&quot;,</font></span></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &quot;short_name&quot;: &quot;Buckinghamshire&quot;,</font></span></li>
	<li>
		<font face="courier new"><span style="color:#0000ff;">&nbsp; &nbsp; &nbsp;&quot;types&quot;:</span> [ <span style="color:#0000ff;">&quot;administrative_area_level_2&quot;, &quot;political&quot; </span>]</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; }, {</font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp;&quot;long_name&quot;: &quot;Milton Keynes&quot;,</font></span></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &quot;short_name&quot;: &quot;Milton Keynes&quot;,</font></span></li>
	<li>
		<font face="courier new"><span style="color:#0000ff;">&nbsp; &nbsp; &nbsp; &quot;types&quot;:</span> [<span style="color:#0000ff;"> &quot;administrative_area_level_2&quot;, &quot;political&quot; </span>]</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; }, {</font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &quot;long_name&quot;: &quot;United Kingdom&quot;,</font></span></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &quot;short_name&quot;: &quot;GB&quot;,</font></span></li>
	<li>
		<font face="courier new"><span style="color:#0000ff;">&nbsp; &nbsp; &nbsp; &quot;types&quot;:</span> [ <span style="color:#0000ff;">&quot;country&quot;, &quot;political&quot;</span> ]</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; }, {</font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &quot;long_name&quot;: &quot;MK7&quot;,</font></span></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &quot;short_name&quot;: &quot;MK7&quot;,</font></span></li>
	<li>
		<font face="courier new"><span style="color:#0000ff;">&nbsp; &nbsp; &nbsp; &quot;types&quot;:</span> [ <span style="color:#0000ff;">&quot;postal_code_prefix&quot;, &quot;postal_code&quot;</span> ]</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; } ],</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; <span style="color:#0000ff;">&quot;geometry&quot;:</span> {</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp;<span style="color:#0000ff;"> &quot;location&quot;:</span> {</font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; &quot;lat&quot;: </font></span><font face="courier new">52.0249136,</font></li>
	<li>
		<font face="courier new"><span style="color:#0000ff;">&nbsp; &nbsp; &nbsp; &nbsp; &quot;lng&quot;: </span>-0.7097474</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp; }<span style="color:#0000ff;">,</span></font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &quot;location_type&quot;: &quot;APPROXIMATE&quot;,</font></span></li>
	<li>
		<font face="courier new"><span style="color:#0000ff;">&nbsp; &nbsp; &nbsp; &quot;viewport&quot;:</span> {</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp;<span style="color:#0000ff;"> &quot;southwest&quot;:</span> {</font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &quot;lat&quot;: </font></span><font face="courier new">52.0193722,</font></li>
	<li>
		<font face="courier new"><span style="color:#0000ff;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &quot;lng&quot;: </span>-0.7161451</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; }<span style="color:#0000ff;">,</span></font></li>
	<li>
		<font face="courier new"><span style="color:#0000ff;">&nbsp; &nbsp; &nbsp; &nbsp; &quot;northeast&quot;: </span>{</font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &quot;lat&quot;: </font></span><font face="courier new">52.0300728</font><span style="color:#0000ff;"><font face="courier new">,</font></span></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &quot;lng&quot;: </font></span><font face="courier new">-0.6977000</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; }</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp; }<span style="color:#0000ff;">,</span></font></li>
	<li>
		<font face="courier new"><span style="color:#0000ff;">&nbsp; &nbsp; &nbsp;&quot;bounds&quot;: </span>{</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; <span style="color:#0000ff;">&quot;southwest&quot;:</span> {</font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &quot;lat&quot;: </font></span><font face="courier new">52.0193722</font><span style="color:#0000ff;"><font face="courier new">,</font></span></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &quot;lng&quot;: </font></span><font face="courier new">-0.7161451</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; }<span style="color:#0000ff;">,</span></font></li>
	<li>
		<font face="courier new"><span style="color:#0000ff;">&nbsp; &nbsp; &nbsp; &nbsp; &quot;northeast&quot;:</span> {</font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &quot;lat&quot;: </font></span><font face="courier new">52.0300728,</font></li>
	<li>
		<span style="color:#0000ff;"><font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &quot;lng&quot;: </font></span><font face="courier new">-0.6977000</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp; &nbsp; }</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; &nbsp; }</font></li>
	<li>
		<font face="courier new">&nbsp; &nbsp; }</font></li>
	<li>
		<font face="courier new">&nbsp; } ]</font></li>
	<li>
		<span style="font-family: 'courier new'; font-size: 12px;">&nbsp;}</span></li>
</ol>
<p>
	&nbsp;</p>
<h3>
	Making Sense of the Notation</h3>
<p>
	At its simplest, the structure has the form: {&quot;attribute&quot;:&quot;value&quot;}</p>
<p>
	If we parse this object into the <em>jsonObject</em>, we can access the value of the attribute as <em>jsonObject.attribute</em> or <em>jsonObject[&quot;attribute&quot;]</em>. The first style of notation is called a <em>dot notation</em>.</p>
<p>
	We can add more &quot;attribute&quot;:&quot;value&quot; pairs into the object by separating them with commas: <em>a={&quot;attr&quot;:&quot;val&quot;,&quot;attr2&quot;:&quot;val2&quot;} </em>and address them (that is, refer to them) uniquely: <em>a.attr</em>, for example, or <em>a[&quot;attr2&quot;]</em>.</p>
<p>
	Try it out for yourself&hellip; Copy and paste the following into your browser address bar (where the URL goes) and hit return (i.e. &ldquo;go to&rdquo; that &ldquo;location&rdquo;):</p>
<p>
	<font face="courier new">javascript:a={&quot;attr&quot;:&quot;val&quot;,&quot;attr2&quot;:&quot;val2&quot;}; alert(a.attr);alert(a[&quot;attr2&quot;])</font></p>
<p>
	(As an aside, what might you learn from this? Firstly, you can &ldquo;run&rdquo; JavaScript in the browser via the location bar. Secondly, the JavaScript command <em>alert()</em> pops up an alert box:-)</p>
<p>
	Note that the value of an attribute might be another object.</p>
<p>
	<em>obj={ attrWithObjectValue: { &quot;childObjAttr&quot;:&quot;foo&quot; } }</em></p>
<p>
	Another thing we can see in the Google geocoder JSON code is the use of square brackets. These define an <em>array</em> (one might also think of it as an ordered list). Items in the list are addressed numerically. So for example, given:</p>
<p>
	<em>arr=[ &quot;item1&quot;, &quot;item2&quot;, &quot;item3&quot; ]</em></p>
<p>
	we can locate &quot;item1&quot; as <em>arr[0]</em> and &quot;item3&quot; as <em>arr[2]</em>. (Note: the index count in the square brackets starts at 0.) Try it in the browser&hellip; (for example, <font face="courier new">javascript:list=[&quot;apples&quot;,&quot;bananas&quot;,&quot;pears&quot;]; alert( list[1] );</font>).</p>
<p>
	Arrays can contain objects too:</p>
<p>
	<em>list=[ &quot;item1&quot;, {&quot;innerObjectAttr&quot;:&quot;innerObjVal&quot; } ]</em></p>
<p>
	Can you guess how to get to the <em>innerObjVal</em>? Try this in the browser location bar:</p>
<p>
	<font face="courier new">javascript: list=[ &quot;item1&quot;, { &quot;innerObjectAttr&quot;:&quot;innerObjVal&quot; } ]; alert( list[1].innerObjectAttr )</font></p>
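<p>
	The same rules are all that is needed to reach into the geocoder response. As a sketch, the snippet below parses a trimmed excerpt of the response shown earlier (only a few of its attributes are kept, to save space) and walks down to the formatted address and the coordinates; the variable names are just for illustration.</p>

```javascript
// Trimmed excerpt of the geocoder response shown earlier.
const text = '{ "status": "OK", "results": [ { "formatted_address": "Milton Keynes, Buckinghamshire MK7 6AA, UK", "geometry": { "location": { "lat": 52.0249136, "lng": -0.7097474 } } } ] }';

// Turn the JSON string into a JavaScript object.
const response = JSON.parse(text);

// "results" is an array, so take its first item by index...
const first = response.results[0];

// ...then follow the attribute names down the hierarchy.
const address = first.formatted_address;
const lat = first.geometry.location.lat;

// The bracket notation reaches the same place as the dot notation.
const lng = first["geometry"]["location"]["lng"];

console.log(address, lat, lng);
```

<p>
	Each step in the path corresponds to one level of indentation in the laid-out version of the object: an index for every square bracket, an attribute name for every curly bracket.</p>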
<h3>
	Making Life Easier</h3>
<p>
	Hopefully, you&rsquo;ll now have a sense that there is structure in a JSON object, and that that (sic) structure is what we rely on if we want to cut down on the &ldquo;trial and error&rdquo; when parsing such things. To make life easier, we can also use &ldquo;tree widgets&rdquo; to display the hierarchical JSON object in a way that makes it far easier to see how to construct the dotted path that leads to the data value we want.</p>
<p>
	A tool I have appropriated for previewing JSON objects is <a href="http://pipes.yahoo.com/" target="_blank">Yahoo Pipes</a>. Rather than necessarily using Pipes to build anything, I simply make use of it as a JSON viewer, loading JSON into the pipe from a URL via the Fetch Data block, and then previewing the result:</p>
<p style="text-align: center;">
	<a href="http://pipes.yahoo.com/" target="_blank"><img alt="Screen_shot_2013-04-02_at_4.05.47_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-02_at_4.05.47_PM.png" /></a></p>
<p>
	Another tool (and one I&rsquo;ve just discovered) is an Adobe AIR application called <a href="http://code.google.com/p/json-pad/" target="_blank">JSON-Pad</a>. You can paste in JSON code, or pull it in from a URL, and then preview it again via a tree widget:</p>
<p style="text-align: center;">
	<a href="http://code.google.com/p/json-pad/" target="_blank"><img alt="Screen_shot_2013-04-02_at_4.07.38_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-02_at_4.07.38_PM.png" /></a></p>
<p>
	Clicking on one of the results in the tree widget provides a crib to the path&hellip;</p>
<h3>
	Summary</h3>
<p>
	Getting to grips with writing addresses into JSON objects is much easier if you have some idea of the structure of a JSON object. Tree viewers make the structure of an object explicit. By walking down the tree to the part of it you want, and &ldquo;dotting&rdquo; together* the nodes/attributes you select as you do so, you can quickly and easily construct the path you need.</p>
<p>
	* If the JSON attributes have spaces or non-alphanumeric characters in them, use the <em>obj[&quot;attr&quot;]</em> notation rather than the dotted <em>obj.attr</em> notation&hellip;</p>
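<p>
	If no tree viewer is to hand, a few lines of JavaScript can do the &ldquo;dotting together&rdquo; for you. The following is a minimal sketch, not a polished tool: <em>listPaths</em> is a hypothetical helper, the <em>sample</em> object is invented for the example (including its &quot;postal code&quot; attribute, which is there only to show the bracket notation), and the test for which names are safe to dot is deliberately simple.</p>

```javascript
// Build the path to every leaf value in a parsed JSON object.
function listPaths(value, prefix = "obj") {
  // Strings, numbers, booleans and null are leaves: the path ends here.
  if (value === null || typeof value !== "object") {
    return [prefix];
  }
  return Object.entries(value).flatMap(([key, child]) => {
    let step;
    if (Array.isArray(value)) {
      step = "[" + key + "]"; // array items are addressed by index
    } else if (/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(key)) {
      step = "." + key; // simple names can be dotted
    } else {
      step = "[" + JSON.stringify(key) + "]"; // anything else needs brackets
    }
    return listPaths(child, prefix + step);
  });
}

// Invented sample object; "postal code" contains a space on purpose.
const sample = {
  results: [
    { geometry: { location: { lat: 52.02 } }, "postal code": "MK7 6AA" },
  ],
};

const paths = listPaths(sample);
console.log(paths.join("\n"));
```

<p>
	Each line printed is a path that can be pasted after the name of your own parsed object, with the attribute containing a space written in the bracketed form.</p>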
<p>
	P.S.: Via my feeds, though something I had bookmarked already, this <a href="http://www.shancarter.com/data_converter/index.html" target="_blank">Data Converter</a> tool may be helpful in going the other way&hellip; (Disclaimer: I haven&rsquo;t tried using it&hellip;)</p>
<p style="text-align: center;">
	<a href="http://www.shancarter.com/data_converter/index.html" target="_blank"><img alt="Screen_shot_2013-04-02_at_4.09.55_PM.png" src="http://datadrivenjournalism.net/uploads/Screen_shot_2013-04-02_at_4.09.55_PM.png" /></a></p>
<p>
	If you know of any other related tools, please feel free to post a link to them in the comments:-)</p>
<p>
	&nbsp;</p>
]]></description> 
      <dc:date>2013-04-07T23:50:11+00:00</dc:date>
    </item>

    
    </channel>
</rss>