Baby steps: A slow start but data journalism in India is gathering pace


Data journalism is at a nascent stage in India. Very few websites currently operate in this area, and news organisations still have a long way to go to reach the global benchmark. The good news is that it has picked up and caught the attention of the Indian government, news organisations, analytics companies, and data visualisation companies. Data can be used to bring about accountability and transparency, and awareness of that is growing. It is also fast becoming an important part of the armoury of newsrooms in India and plays an important role in multimedia and digital reporting. To find out more about data journalism efforts in India, I spoke with leading data enthusiasts from five start-up initiatives.

1. How India Lives, with John Samuel Raja

John Samuel Raja is co-founder of How India Lives, a data search engine for public data. Prior to this, he worked as a financial journalist for 11 years at The Economic Times, Mint, Business Standard and Outlook Business. In 2012, John was selected as a Tow Knight Fellow in the City University of New York’s entrepreneurial journalism program. He holds a Master’s in Econometrics. Follow on Twitter: @johnraja.

There are two types of data journalism: the first looks at data as a unit of information, processes the data and gets interesting analysis out of raw data; and the second uses published data reports like national crime statistics or government survey reports to write stories around them. The second type of data journalism has picked up in India quite rapidly, with many media houses - both traditional and new ones - doing such stories on a regular basis. This is welcome because it will bring insights into reports and surveys that were not previously reported extensively. However, data journalism that involves working with raw data and converting that data into interesting analysis hasn’t picked up. That’s because newsrooms don’t have the resources or the required skills and training to execute this kind of journalism. One would need a person with programming skills who can help scrape data and arrange it in a database, and then write code to visualise the data. Such work needs a newsroom where a programmer who understands the data works alongside journalists. With the government increasingly moving to make data available, there is immense scope to do this type of data journalism. Such moves increase transparency, as the data is made available to the public so that anyone can make sense of it.
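To make the "scrape data and arrange it in a database" step concrete, here is a minimal sketch in Python using only the standard library. It is an illustration rather than any newsroom's actual pipeline: the HTML snippet, the district names, the numbers and the `thefts` table are all made up, and a real project would fetch pages over the network and handle far messier markup.

```python
import sqlite3
from html.parser import HTMLParser

# Made-up snippet standing in for a scraped public web page (assumption).
SAMPLE_HTML = """
<table>
  <tr><th>District</th><th>Thefts</th></tr>
  <tr><td>District A</td><td>120</td></tr>
  <tr><td>District B</td><td>95</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def load_table(html, conn):
    """Parse the HTML table and store its rows in a hypothetical 'thefts' table."""
    parser = TableParser()
    parser.feed(html)
    _header, *records = parser.rows  # first row holds the column names
    conn.execute("CREATE TABLE thefts (district TEXT, count INTEGER)")
    conn.executemany(
        "INSERT INTO thefts VALUES (?, ?)",
        [(name, int(count)) for name, count in records],
    )
    return conn


conn = load_table(SAMPLE_HTML, sqlite3.connect(":memory:"))
total = conn.execute("SELECT SUM(count) FROM thefts").fetchone()[0]
print(total)
```

Once the rows sit in a database, the same query results can feed whatever charting code the newsroom prefers.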

How India Lives started

How India Lives is a Delhi-based start-up. The internet-based application aims to organise a massive amount of public data on India, and make it available in a searchable, comparable and visual format. The idea is to offer something of value to everyone who uses public data, be it for decision-making – like the company executive, the government official, the researcher – or for information-seeking. It was founded by a team of journalists with more than 60 years of collective experience and has a technology team that has executed large data projects for Mint, Shine.com, The Caravan magazine and others.

What ‘How India Lives’ does

How India Lives’ efforts are two-fold. The first is to make data easy to search, compare and visualise. We feel there is a huge gap between people who know how to access data, and those who don’t. We would like to bridge this gap. We do this in a number of ways, namely:

  • bringing data from different sources into a single database and making different datasets compatible so they can be cross-referenced
  • enabling tag-based search for user friendliness
  • visualising data for easy understanding
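The first two points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not How India Lives’ actual code: the two datasets, their column names, the numbers and the tag index are all invented for the example.

```python
# Two made-up datasets from different hypothetical sources (assumption).
# Note that they label and spell the district column differently.
census = [
    {"district": "District A", "population": 1000},
    {"district": "District B", "population": 2000},
]
crime = [
    {"District Name": "district a ", "vehicle_thefts": 10},
    {"District Name": "DISTRICT B", "vehicle_thefts": 30},
]


def normalise(name):
    """Make district names comparable across sources."""
    return " ".join(name.split()).lower()


def merge_by_district(census_rows, crime_rows):
    """Cross-reference the two datasets on a normalised district name."""
    merged = {}
    for row in census_rows:
        key = normalise(row["district"])
        merged[key] = {"district": row["district"], "population": row["population"]}
    for row in crime_rows:
        key = normalise(row["District Name"])
        if key in merged:
            merged[key]["vehicle_thefts"] = row["vehicle_thefts"]
    return merged


# Hypothetical tag index: dataset name -> set of descriptive tags.
DATASET_TAGS = {
    "census": {"population", "district"},
    "crime": {"crime", "vehicle theft", "district"},
}


def search_by_tags(tags, index=DATASET_TAGS):
    """Return the datasets that carry every requested tag."""
    wanted = {tag.lower() for tag in tags}
    return sorted(name for name, ds_tags in index.items() if wanted <= ds_tags)


merged = merge_by_district(census, crime)
print(merged["district a"])
print(search_by_tags(["crime", "district"]))
```

The key design choice is the shared, normalised key: once both sources agree on how a district is named, any pair of datasets can be compared side by side.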

By making data available from different sources in an easily searchable manner, we believe more journalists will start using data. By making public datasets available through data interactives, greater numbers of ordinary citizens can also access stories they are interested in, and be informed. For example, we used vehicle theft data for Delhi and visualised it as a dashboard. We believe the way forward is to build communities and use them to gather data.


Image: One way to search on How India Lives.

Challenges we face

The biggest challenge is data cleaning. Data comes in difficult formats, often as scanned PDFs, so most of our time goes into cleaning it. Second, a significant majority of people in India only have access to the internet via mobile devices, and most smartphones have limited storage capacity. This means there is no space to download more apps, so a mobile-friendly website is vital. On top of that, presenting data interactives and visualisations in mobile-friendly formats is difficult.

2. Health Analytics, with Syed Nazakat

Syed Nazakat is an award-winning journalist and editor-in-chief of the Centre for Investigative Journalism, a non-profit organisation he founded to promote the cause of watchdog journalism in India. In 2015 he also set up a data journalism initiative focussed on Indian healthcare, called Health Analytics India. Follow on Twitter: @SyedNazakat.

Health Analytics India is a data journalism initiative dedicated to providing the most illuminating reporting on healthcare. We’re not going to blindly accept the data, but we’re not going to be blind to it either. We crunch the numbers, investigate the issues behind the numbers and turn them into fact- and figure-based stories that matter to people. Our aim is to make Health Analytics India a single-point source of healthcare data and information in India.

Our initial challenge was to set up the infrastructure and to build the team. Designing and curating a website for data visualisation and analytics was also quite a challenge. Health, as a subject, was chosen because it touches us all, and there is so much scope to improve health reporting in India. Millions of people in this country are dying from diseases that are entirely preventable. The lack of health facilities is a shameful story, yet these stories hardly make the news. Our challenge remains to build engagement with audiences and to present data to them in a way that will help them to understand complex stories. We're conscious of data overload, so our daily challenge is to handle and interpret large volumes of data.

One of the biggest challenges is to find valid and up-to-date data. We collect data on the healthcare system by searching through the studies and reports of the union health ministry, the 29 state health ministries, international organisations such as WHO, and research institutes. Sometimes we’re surprised by how much we are able to find. But then there are vast data gaps, which leave you with the impression that nobody really knows anything concrete about healthcare data in India.

Recently, while doing a story about deaths from rabies, we were told that the union health ministry does not collect data on its own. The data comes from 29 states and seven union territories. But in many states there were no cohort studies or community-based studies, and until 2014 Punjab, one of India’s biggest states, did not even have a department to collect rabies data. Elsewhere, in many states, there are proper rules and guidelines about data collection, but there is no data.

Data journalism is one way in which the media can help the cause of information, transparency and watchdog reporting, and quite legitimately hold the government to account. However, the lack of data and loopholes in data collection are problematic.

3. Factly, with Rakesh Dubbudu

Rakesh Dubbudu is an engineer by education and an activist by passion. He is the founder of Factly. He has been working on issues related to Right to Information (RTI) for a decade. He has authored many reports on the status of the RTI legislation and has been a Government of India Fellow on RTI. He is also associated with various organisations working in the area of transparency and accountability in governance and has ground-level experience of these issues. Follow on Twitter: @rakeshdubbudu.

Factly is a platform that covers various aspects of life that directly or indirectly affect the common man, but with one major difference: each news story on Factly is backed by factual evidence or data that is either available in the public domain or gathered using tools such as the Right to Information (RTI) Act.

Motivation behind launching Factly

In my 10 years of experience in this space, I came across a lot of government (public) data. A lot of that data is extremely important to the public. But this data was not easily accessible, and even in cases where it was, it was not easily understood. Hence public data was anything but ‘public’. Misinformation and rumour also frequently trend on social media, but little of it is substantiated. Wrong data or information, or a morphed picture that might be sensational, has a greater chance of being shared by many people on social media than genuine information. This was the motivation behind Factly: to make public data more meaningful to the public and encourage them to look for facts or genuine data.

All our data comes from government sources such as government websites, answers to questions asked in parliament and government reports; for some stories, we use RTI. We zero in on a topic and then find relevant data. We also did not want to stop at making data alone meaningful; we wanted to make government information in general more meaningful to the public. We plan to explain policies and laws that are relevant to people in simple language that can be understood by everyone. We are already doing that to a certain extent.

How is it going to be helpful to people?

We have three aims when producing stories based on official data. The first is to make people more knowledgeable about issues with relevant data and information. That is what Factly is doing right now by packaging data in easily understandable stories, well-designed infographics and easy-to-understand visualisations. The second is to mobilise people using this knowledge; and the third is to engage with them. We are currently at the first stage and are working towards introducing tools and other features that can take us to the second and third stages. There are many cases where our data stories have been very useful to people. By mobilise, we essentially mean inspiring people to take action or engage with the system based on the data or information. It could be a local issue to do with how their funds were spent, or a state or national issue. We do not want to lead movements ourselves, but to be a force that can inspire activism and subsequent engagement with the system.

For example, we explained how petrol and LPG are priced, which was a big mystery to many. We also explained the issues surrounding planning and land purchases so that the public could easily understand them. Examples of these stories can be found here and here.


Image: A Factly infographic on petrol prices.

Future plans?

We are working on building some data tools and information products. We hope to be ready with the first ones by the end of this year. With all these, we want to expand our base and encourage more people to engage and mobilise for causes based on this data. We are also looking at options to take this offline for those who do not have access to the internet by partnering with a few non-profits that work on the ground. For example, we would like to make templates available for stories such as the performance of MPs (see this link) or local spending on village budgets, so that anyone can select their MP or village and download copies of the same. These can be printed and used for offline work. We wish to do similar work in other areas such as governments and schools and tie up with NGOs that work in those local areas.

4. IndiaSpend, with Govindraj Ethiraj

Govindraj Ethiraj is founder of IndiaSpend and was previously the founding editor-in-chief of Bloomberg TV India, a 24-hour business news service launched out of Mumbai in 2008 and a partnership between Bloomberg LLP and the UTV Group in India. Prior to setting up Bloomberg TV India, he worked with Business Standard as editor (new media). He also worked with CNBC-TV18 and The Economic Times. Follow on Twitter: @govindethiraj.

IndiaSpend is the country’s first data journalism initiative whose vision is to improve the quality of public discourse by using data to write stories in areas of public interest. In March 2014, the same team also launched www.factchecker.in, a dedicated fact checking initiative that examines statements and assertions made by individuals and organisations in public life for both accuracy and context. Both initiatives have a strong social media presence. IndiaSpend’s articles are now distributed to India’s leading newspapers, magazines, television stations, online dailies and wire services and are usually cited in at least a dozen major media platforms daily. IndiaSpend/FactChecker are registered as non-profit organisations. They analyse government policy on issues such as the economy, education, health, agriculture and security. The team has reported on a number of important stories, including:

  • Why child rapes have soared 151 per cent in five years: This article looks at the sharp increase in registered child rapes in India and the states that have had the highest number of cases. In about nine out of ten cases, victims knew their attackers. The article is available here.
  • How 46 million Indians are being slowly poisoned: This article explores why millions of Indians are exposed to contaminated water, which could lead to serious health issues such as crippling skeletal damage, kidney degeneration, cirrhosis of the liver and cardiac arrest. The article is available here.

5. DataMeet India, with Nisha Thompson

Nisha Thompson is co-founder of DataMeet India. She has a background in online community organising and has worked for the Sunlight Foundation in Washington DC, working with online communities to use US government data to hold elected officials accountable. She moved to Bangalore in October 2010, where she contributed to a research report on open government data in India for the Centre for Internet and Society. Follow on Twitter: @fakenisha.

DataMeet is a community that started in 2011 on a Google group, and now has more than 1,000 people throughout India coming together to discuss data issues and civic-minded topics. DataMeet has six chapters - Bangalore, New Delhi, Mumbai, Ahmedabad, Pune, and Hyderabad - which meet offline every month to discuss, learn, share experiences and skills, and organise events.

DataMeet hosted the first Open Data Camp in Bangalore in 2012, and since then the yearly Open Data Camp has been a great way to bring together people in the civic data space. Hyderabad and New Delhi have since held Open Data Camps to organise communities in their cities. These camps are great venues to discuss large ideas and problems encountered in different sectors. At the 2014 Bangalore camp, which focused on elections, the community came together and shared Assembly and Parliamentary boundary shapefiles. The 2015 camp focused on education, and it brought civil society and government together to try to work out issues around education data in India.

DataMeet has also worked with the government on how to open more data in India. After the passage of the National Data Sharing and Accessibility Policy in 2012, we gave them feedback on standards and implementation issues. When Data.Gov.In, India’s official open data portal, launched, we worked with the officials to make sure they were aware of which datasets were high priority. For example, we requested that Census data be made available, and it is now available on the portal.

Part of the open data movement is to raise data literacy. DataMeet has hosted and supported data training events for journalists, with partners like Oorvani Foundation and The Hoot. We also got involved in data expeditions on urban data along with Hyderabad Urban Labs, and other events with Field of View and IIIT Bangalore. These events help introduce data concepts and skills to people who want to learn more and use data more often.


Clearly ‘better late than never’ is an apt description for India with respect to data journalism. Many of the initiatives that have started are making their presence felt in their various areas of expertise. However, this is just the start, and as people start recognising the importance of data, it will be a sought-after area for decades to come.

This article is an edited extract from Data Journalism: Inside the global future.

Image: NASA.