Introduction: one doc’s issues with data

I thought I might struggle with data journalism. And not just because of time constraints, or my own IT limitations. I guess it’s because I didn’t know what to expect. Having learned the fundamentals of using data and gained a reasonable grasp of basic stats (during research at school, university and work), I enter the field of data journalism with an open mind but a little skepticism.


Data Recovery.  Picture by Sean MacEntee, Flickr Creative Commons

I’ve often been irritated by stories in the press that don’t do justice to the actual figures. “Data” seems to be thrown around in the media to add authority, a buzzword to proclaim truth. I believe that the majority of journalists, like scientists, do a decent job when it comes to the numbers behind the stories. However, from time to time there appears to be a woeful lack of insight in the interpretation of these numbers.

Ben Goldacre’s Bad Science column / blog / book and Paul Bradshaw’s Online Journalism Blog are excellent resources for debunking scientific or medical myths. They highlight how journalists, politicians and scientists can mislead the public at almost every stage of research, from the methodology to the interpretation of results.

So, where to start? Taking the advice of my journalism tutor and Dr Goldacre, I just started writing – if only to vent some of my frustrations. These mainly stem from several data stories that appear (to me, on closer reading) incomplete, misrepresented or overestimated in value.

Even the word itself (data, singular datum) causes contention. Let’s be clear: as a mass noun signifying information, it is perfectly acceptable to use data in the singular. The (more pedantic?) academic types often prefer to acknowledge the Latin roots of the word and would say “these data show”, as each piece of information is a datum.

I want people to see things as they truly are, through the objectivity that data offers. This requires reliable sources, careful recording of data, and accurate interpretation. Data journalists will have their own styles and opinions, but robust data analysis should yield clear and consistent meanings.

So, here are a few things I’d like to cover: sources of data, presentation and basic analysis. The analysis will not involve much in the way of statistics. I also aim to critique some data stories as well as try out software and online tools for my own data stories.

Lastly, for my blog posts I’d like to invoke my own extension of the “KISS” acronym, now “KISSASS”:

keep it short, sweet, and simple, stupid

Short: I see that around 500 words is recommended for blog posts, although I can’t say with any conviction what the ideal word count is. “Sweet” really means selective and stimulating: one post for one (interesting) idea. And simple: where possible it should be understandable to almost everyone.

I hope it works out and I’m keen to hear your comments.


Picture by bixentro, Flickr Creative Commons