5 November 2014


Data are or data is?

The Wall Street Journal has just published this blog post, in which it finally decides to move away from data "are", saying:
Most style guides and dictionaries have come to accept the use of the noun data with either singular or plural verbs, and we hereby join the majority.
As usage has evolved from the word's origin as the Latin plural of datum, singular verbs now are often used to refer to collections of information: Little data is available to support the conclusions.

Otherwise, generally continue to use the plural: Data are still being collected.

Here's the root of the matter: strictly speaking, data is a plural term. That is, if we're following the rules of grammar, we shouldn't write "the data is" or "the data shows" but instead "the data are" or "the data show".
In Latin, data is the plural of datum and, historically and in specialized scientific fields, it is also treated as a plural in English, taking a plural verb, as in the data were collected and classified. In modern non-scientific use, however, despite the complaints of traditionalists, it is often not treated as a plural. Instead, it is treated as a mass noun, similar to a word like information, which cannot normally have a plural and which takes a singular verb. Sentences such as data was (as well as data were) collected over a number of years are now widely accepted in standard English.
The official view from the Office for National Statistics takes the traditional approach. The ONS style guide for those writing official statistics says:
The word data is a plural noun so write "data are". Datum is the singular.
Andrew Garratt of the Royal Statistical Society says the debate goes back to the 1920s - and reared its head recently with some heated discussion in the Society's newsletter. "We don't have an official view," he says. "Statisticians of a certain age and status refer to them as plural but people like me use it in the singular." National Geographic magazine has debated it too.
So, over to Guardian style guru David Marsh, who makes the rules in these parts about language use. He says:
It's like agenda, a Latin plural that is now almost universally used as a singular. Technically the singular is datum/agendum, but we feel it sounds increasingly hyper-correct, old-fashioned and pompous to say "the data are".
Data takes a singular verb (like agenda), though strictly a plural; no one ever uses "agendum" or "datum".
According to Professor Michael Swan:
Ok, well I say data and um I say both I say "Data is" or "Data are" but probably mostly "is". Originally, data was a plural noun, it comes from a Latin word that means things which are given, and that’s plural. The singular of that is datum or datum (different pronunciation). But English speaking people mostly don’t know Latin and so not everybody recognised the word was supposed to be plural. 
It looks singular to an English speaker, so more and more people came to use it as a singular and now that’s quite normal. At the beginning "The data is" was definitely a mistake but it’s so widely used now that it’s no longer possible to say that it’s a mistake. It’s become part of the language. This is actually quite a common reason for language change. People make mistakes and the mistakes are repeated by other people, and finally they no longer count as mistakes. It happens a lot with vocabulary.
