The Privacy Dilemma for Official Statistics in a Big Data World


Short Communication
To use or not to use: that is the question
Over recent years the potential of big data for government, for business and for society has excited much comment, debate and even evangelism. Described as the 'new science' with all the answers and as a paradigm-destroying phenomenon of enormous potential, big data are all the rage [1,2]. Official statisticians, who already have a long history of using non-survey data that are often very large in volume, must decide whether big data are really something new and useful or just hype. Some argue that big data need to be seen as an entirely new ecosystem comprising new data, new tools and new methods, whereas others argue to the contrary that big data are just hype and that big data are just data [3,4]. In deciding whether big data can be useful for official statistics, National Statistics Offices (NSOs) must keep the safeguarding of confidential data at the top of their decision-making tree.

The importance of confidentiality
For official statistics, safeguarding the confidentiality of individual data is sacrosanct and is enshrined in Principle 6 of the United Nations Fundamental Principles of Official Statistics [5], which states: 'Individual data collected by statistical agencies for statistical compilation, whether they refer to natural or legal persons, are to be strictly confidential and used exclusively for statistical purposes.' The UN Handbook of Statistical Organization [6], too, 'underscores repeatedly the requirement that the information that statistical agencies collect should remain confidential and inviolate.' The Scheveningen Memorandum [7], prepared by the Directors General of NSOs in the European Union, identified the need to adapt statistical legislation in order to use big data, both to secure access and to protect privacy (in the Memorandum's words: 'Recognize that the implications of big data for legislation especially with regard to data protection and personal rights (e.g. access to big data sources held by third parties) should be properly addressed as a matter of priority in a coordinated manner'). The failure to treat individual information as a trust would prevent the statistical agency from functioning effectively. For an NSO to function, the confidentiality of the persons and entities for which it holds individual data must be protected. That is, the NSO must guarantee that it will protect the identities and information supplied by all persons, enterprises or other entities, and that their data will be used for statistical purposes only. In short, everyone who supplies data for statistical purposes does so with the reasonable presumption that their confidentiality will be respected and protected. In most countries, safeguarding confidentiality is enshrined in national statistical legislation. But with the increased volumes of big data being generated, and the potential to match those data, greater attention must be paid to data suppression techniques to ensure that confidentiality can be safeguarded.
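As a purely illustrative sketch of one such suppression technique, the Python fragment below withholds any table cell whose count falls below a minimum threshold. The function name `suppress_small_cells`, the threshold of 3 and the sample records are all assumptions made for this illustration; they are not drawn from any NSO's actual disclosure-control rules.

```python
from collections import Counter

def suppress_small_cells(records, keys, threshold=3):
    """Tabulate records by `keys`; replace counts below `threshold` with None.

    The threshold of 3 is an arbitrary illustrative value, not an official rule.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}

# Hypothetical microdata: each record is one enterprise.
records = [
    {"region": "North", "sector": "Retail"},
    {"region": "North", "sector": "Retail"},
    {"region": "North", "sector": "Retail"},
    {"region": "South", "sector": "Mining"},
]

table = suppress_small_cells(records, keys=("region", "sector"), threshold=3)
for cell, count in table.items():
    print(cell, "suppressed" if count is None else count)
```

Suppressing small cells in this way is only a first step: if row or column totals are also published, a suppressed value can sometimes be recovered by subtraction, so further cells may need to be withheld as well.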
of Pandora's Box, what he terms Pandora 5.0. The introduction in Europe of the new General Data Protection Regulation, which comes into effect in 2018 and reinforces citizens' data-protection rights, including among other things the right 'to be forgotten', suggests that privacy is still a real concern [13], at least in some regions of the world. By contrast, in the United States, users who provide information under the 'third-party doctrine', i.e. to utilities, banks, social networks etc., should have 'no reasonable expectation of privacy.'

The dilemma for official statistics
This introduces two new challenges for official statisticians: one technical and one of perception. The technical challenge arises from the availability of large, linkable datasets, which reopen a problem thought to have been solved in traditional statistics: anonymisation. With big data, combined with the enormous computing power available today, it is clear that simply removing personal identifiers and aggregating individual data is not a sufficient safeguard. A paper by Ohm [14], outlining the consequences of failing to anonymise data adequately, graphically illustrates why there is no room for complacency. Thus, a problem that had been solved in the context of traditional official statistics must now be re-solved in the context of a richer and more varied data ecosystem. The changing nature of perception is arguably a trickier problem. What if Zuckerberg and McNealy are correct and future generations are less concerned about privacy? There appears to be some evidence to suggest that they may be correct. It seems there are clear inter-generational differences in opinion vis-a-vis privacy and confidentiality, where those 'born digital' (roughly those born since 1990) are less concerned about disclosing personal information than older generations [15]. Taplin [16] ponders this, musing 'It very well may be that privacy is a hopelessly outdated notion and that Mark Zuckerberg's belief that privacy is no longer a social norm has won the day.' If this is so, what are the implications for official statistics and anonymisation? If other statistical providers, not governed by the UN Fundamental Principles, take a looser approach to confidentiality and privacy, official statistics may be left in a relatively anachronistic and disadvantaged position vis-a-vis those providers. But moving away from or discarding Principle 6 of the UN Fundamental Principles of Official Statistics would seem to be a very risky move, given the importance of public trust for NSOs.
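The technical challenge described above can be made concrete with a small, hypothetical sketch in Python. It assumes a released file from which names have been removed but which still carries three quasi-identifiers (postcode, birth year and sex), and a separate external list that carries names alongside the same three fields. Every field name, value and function here is invented for illustration and does not come from Ohm [14] or any real dataset.

```python
from collections import Counter

# Assumed quasi-identifiers: fields that are not names but can still single people out.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")

def qi_key(record):
    """Return the record's combination of quasi-identifier values."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)

def smallest_group(records):
    """Size of the smallest group sharing one quasi-identifier combination (the 'k' of k-anonymity)."""
    return min(Counter(qi_key(r) for r in records).values())

def link(released, external):
    """Re-attach names to released records whose quasi-identifier combination is unique in both files."""
    released_counts = Counter(qi_key(r) for r in released)
    names_by_key = {}
    for person in external:
        names_by_key.setdefault(qi_key(person), []).append(person["name"])
    matches = []
    for r in released:
        names = names_by_key.get(qi_key(r), [])
        if released_counts[qi_key(r)] == 1 and len(names) == 1:
            matches.append((names[0], r["diagnosis"]))
    return matches

# Hypothetical 'anonymised' release: no names, but quasi-identifiers retained.
released = [
    {"postcode": "A1", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "A1", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "B2", "birth_year": 1975, "sex": "M", "diagnosis": "hypertension"},
]
# Hypothetical external source that carries names with the same fields.
external = [
    {"name": "Person X", "postcode": "B2", "birth_year": 1975, "sex": "M"},
    {"name": "Person Y", "postcode": "A1", "birth_year": 1980, "sex": "F"},
]

print(smallest_group(released))
print(link(released, external))
```

In this toy example the record with postcode B2 is unique on its quasi-identifiers, so a simple match against the external list re-attaches a name to it even though no name was ever released. The two A1 records share the same combination and cannot be told apart by this method.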

A worthwhile trade-off?
Taplin [16] argues that we trade our privacy with corporations in return for innovation or benefits, 'but it is one thing to forfeit our privacy as individuals to a company that we believe is delivering a needed service and another to open our personal lives to the federal government.' MacFeely [17] has warned that if the benefits of privacy are insufficiently clear to the public or policy makers, then official statistics are left vulnerable, possibly facing a precarious and bleak future. Rudder [18] highlights this challenge too, noting that 'the fundamental question in any discussion of privacy is the trade-off - what you get for losing it.' Like Taplin, Rudder also argues that the benefit of the trade-off with the private sector is clear: better targeted ads! He argues that 'what we get in return for the government's intrusion is less straightforward.' McNealy too, who seems unconcerned about the lack of privacy in the private sector, takes a very different attitude when it comes to government, saying 'It scares me to death when the NSA (National Security Agency, an intelligence agency of the United States Department of Defense) or the IRS (Internal Revenue Service, the tax authority in the United States) know things about my personal life and how I vote. Every American ought to be very afraid of big government' [9]. Curiously, while there is a real fear of a government Big Brother, there appear to be few concerns regarding the emergence of a corporate Big Brother. A challenge for official statistics is how to put clear blue water between the NSO and the other institutions of government from the perspective of data sharing, while highlighting the common benefits of official statistics as a public good.
To some extent there is ideology at play here, where a neo-liberal agenda is pushing to minimize the role of the public sector, but it also illustrates the challenge facing national governments and their agencies where their contribution to the wellbeing of economies and societies is poorly understood.

Concluding Thoughts
Big data, if they can be harnessed properly, would appear to offer some tantalizing opportunities, not least improved timeliness and the chance to better align public and official statistics with policy needs. The possibility of matching different digital data sets may allow us to dramatically improve our understanding of complex, cross-cutting issues, such as the impacts of lifestyle on health. Advances such as the Internet of Things and biometrics will all surely present opportunities to compile new and useful statistics. (In 2006 there were some 2 billion 'smart devices' connected to each other. By 2020 it is projected that this 'internet of things' will comprise somewhere between 30 and 50 billion devices. Goodman notes the result will be 2.5 sextillion potential networked object-to-object interactions.) As yet, the implications of this 'big data bang' for statistics are not immediately clear, but one can envisage a whole host of new ways to measure and understand the human condition. In relative terms, big data are still new. At the turn of the century, Scott Cook, the CEO of Intuit, mused 'we're still in the first minutes of the first day of the Internet revolution' [19]. Almost two decades later we are probably only in the first hours. Many norms and standards are yet to evolve. But it does not take a huge leap of imagination to foresee that in the not too distant future, the misuse of big data will be at the heart of a serious human rights abuse scandal. Official statistics must take the ethical dimension seriously. Just because something can be measured doesn't mean it should be. Norms and cultural values regarding privacy may be changing, but in assessing whether and how to use big data, NSOs and international organizations must carefully consider the human rights of citizens in this digital age.