Drowning by numbers. Picture by Jorge Franganillo, Flickr Creative Commons
Data is basically information – a set of quantitative or qualitative values.
As I said in my introduction, the term is used here as a mass noun, as in “the data shows…” (although the plural “the data show…” is also correct).
An individual data point or value represents a piece of information.
Data is usually collected by measurement and visualised in charts or graphs.
Raw data refers to unprocessed information in the form in which it was originally collected. This can be from a scientific experiment (based on observation under laboratory conditions) or simply from the field.
However it is collected and in whatever form, it is first necessary to recognise exactly what type of data you are dealing with. The diagram below should give the reader a general idea of the different data types. It is just one way to look at data, and I hope it is clear.
Initially you can make one broad distinction: whether the data is continuous or discrete.
Continuous data is always quantitative or numerical. It is measured on an interval or ratio scale and can take any value within a range, with all the decimal values in between (a height of 1.72 m, for example). Counts restricted to whole numbers (1, 2, 3… etc.) are numerical too, but strictly speaking they are discrete rather than continuous.
Discrete data, in the scheme used here, mainly means categorical data: data arranged in categories. Categorical data can be ordered or ranked, such as first, second, third etc. or mild, moderate, severe; this is ordinal data.
Alternatively, categorical data may (often) be unranked, such as colours of cars. This is nominal data. Neither nominal nor ordinal data has a true numerical value, so they are usually analysed with non-parametric methods. This is important when it comes to statistics, the subject that gives the data meaning and value.
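To make the distinction concrete, here is a minimal Python sketch of how each type might be summarised. All the values and variable names are invented for illustration; the point is only that the type of data decides which summary makes sense.

```python
from statistics import mean, mode

# Hypothetical values, invented purely for illustration.
heights_m = [1.62, 1.75, 1.80, 1.68]                           # continuous
severity = ["mild", "severe", "moderate", "mild", "moderate"]  # ordinal
car_colours = ["red", "blue", "red", "white"]                  # nominal

# Continuous data supports arithmetic, so a mean is meaningful.
mean_height = mean(heights_m)

# Ordinal data has an order but no true numerical value, so rank the
# categories explicitly and pick the middle one instead of averaging.
severity_order = ["mild", "moderate", "severe"]
ranks = sorted(severity_order.index(s) for s in severity)
median_severity = severity_order[ranks[len(ranks) // 2]]

# Nominal data can only be counted, so the most common category
# (the mode) is the natural summary.
most_common_colour = mode(car_colours)

print(mean_height, median_severity, most_common_colour)
```

Trying to take a mean of the car colours would simply fail, which is a fair reflection of why recognising the data type comes first.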
There you have it – the main types of data. Although journalists probably don’t need to get bogged down in the details, it will always be handy to recognise exactly what you’re dealing with. This is especially true, I feel, for science journalists.
In case I don’t manage an easy-to-understand statistics post, it is worth mentioning how we can handle data (as journalists or scientists). There are three broad stages…
- Collection: may be from surveys or (scientific) studies; for journalists, collection is usually from a source.
- Presentation: usually in graphic format, with measurement of certain markers, e.g. maximums, minimums, averages.
- Interpretation: using statistics is a major part of results analysis, although journalists perhaps rightly look to the expert discussion of the results too.
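The first two stages can be sketched in a few lines of Python. The temperature readings below are hypothetical, standing in for whatever a source might supply, and the markers are the ones named above.

```python
from statistics import mean

# Collection: hypothetical daily temperatures (°C), invented for
# illustration, as they might arrive from a source.
raw_temperatures = [14.2, 15.1, 13.8, 16.4, 15.0]

# Presentation: the simple markers a chart or table might highlight.
summary = {
    "minimum": min(raw_temperatures),
    "maximum": max(raw_temperatures),
    "average": round(mean(raw_temperatures), 2),
}

for marker, value in summary.items():
    print(f"{marker}: {value}")
```

Interpretation is deliberately left out: deciding what those numbers mean is where statistics, and the experts, come in.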
My aim is to understand the process better. I hope yours is too.