The Privacy Dilemma for Official Statistics in A Big Data World
Steve MacFeely*
Adjunct Professor, Centre for Policy Studies, University College Cork, Ireland
Submission: April 20, 2018; Published: July 10, 2018
*Corresponding author: Steve MacFeely, Adjunct Professor, Centre for Policy Studies, University College Cork, Cork, Ireland; Email: steve.macfeely@unctad.org
How to cite this article: Steve M. The Privacy Dilemma for Official Statistics in A Big Data World. Biostat Biometrics Open Acc J. 2018; 7(5): 555722. DOI: 10.19080/BBOAJ.2018.07.555722
Keywords: Big data; Potential; Evangelism; Paradigm destroying; Just hype; Ecosystem; Confidential data; Decision making tree; Legislation; Protect privacy; Statistical agency; Privacy; Official statistics; Anonymisation; Born digital; Forfeit; Straightforward; Tantalizing opportunities; Cross-cutting issues
Abbreviations: NSOs: National Statistics Offices; NSA: National Security Agency; IRS: Internal Revenue Service
Short Communication
To use or not to use - that is the question
Over recent years the potential of big data for government, for business and for society has excited much comment, debate and even evangelism. Described as the ‘new science’ with all the answers and as a paradigm-destroying phenomenon of enormous potential, big data are all the rage [1,2]. Official statisticians, who already have a long history of using non-survey data, often very large in volume, must decide whether big data is really something new and useful or just hype. Some argue that big data needs to be seen as an entirely new ecosystem comprising new data, new tools and methods, whereas others argue to the contrary that big data is just hype and that big data are just data [3,4]. In deciding whether big data can be useful for official statistics, National Statistics Offices (NSOs) must keep the safeguarding of confidential data at the top of their decision-making tree.
The importance of confidentiality
For official statistics, safeguarding the confidentiality of individual data is sacrosanct and is enshrined in Principle 6 of the United Nations Fundamental Principles of Official Statistics [5], which states ‘Individual data collected by statistical agencies for statistical compilation, whether they refer to natural or legal persons, are to be strictly confidential and used exclusively for statistical purposes.’ The UN Handbook of Statistical Organization [6], too, ‘underscores repeatedly the requirement that the information that statistical agencies collect should remain confidential and inviolate.’ The Scheveningen Memorandum [7], prepared by the Directors General of NSOs in the European Union, identified the need to adapt statistical legislation in order to use big data, both to secure access and to protect privacy. It calls on signatories to recognize that the implications of big data for legislation, especially with regard to data protection and personal rights (e.g. access to big data sources held by third parties), should be properly addressed as a matter of priority in a coordinated manner. The failure to treat individual information as a trust would prevent the statistical agency from functioning effectively. For an NSO to function, the confidentiality of the persons and entities for which it holds individual data must be protected, i.e. it must guarantee to protect the identities and information supplied by all persons, enterprises or other entities, and guarantee that their data are used for statistical purposes only. In short, everyone who supplies data for statistical purposes does so with the reasonable presumption that their confidentiality will be respected and protected. In most countries, safeguarding confidentiality is enshrined in national statistical legislation. But with the increased volumes of big data being generated, and the potential to match those data, greater attention must be paid to data suppression techniques to ensure confidentiality can be safeguarded.
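As a minimal sketch of what a data suppression technique can look like, the Python snippet below applies a simple threshold rule to a frequency table: any cell with a count below a chosen minimum is withheld from publication. The threshold of 3, the function name and the example counts are illustrative assumptions and are not taken from any NSO's practice. Primary suppression of this kind does not, on its own, prevent disclosure by differencing against published totals.

```python
from typing import Dict, Optional


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = 3
) -> Dict[str, Optional[int]]:
    """Return a copy of `counts` in which every cell whose count is below
    `threshold` is replaced by None (i.e. withheld from publication).

    The default threshold of 3 is an illustrative assumption.
    """
    return {
        cell: (n if n >= threshold else None)
        for cell, n in counts.items()
    }


# Hypothetical frequency table: number of respondents per region.
table = {"Region A": 120, "Region B": 2, "Region C": 45}

published = suppress_small_cells(table)
print(published)
```

In this hypothetical table only "Region B" falls below the threshold, so its count is withheld while the other two cells are published unchanged.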
Is privacy really dead?
The emergence of big data is forcing many challenging questions to be asked, not least with regard to privacy and confidentiality. Mark Zuckerberg, the founder of Facebook, famously claimed that the age of privacy is over [8]. Scott McNealy, CEO of Sun Microsystems, also famously asserted that concerns over privacy are a ‘red herring’ as we ‘have zero privacy’ [9]. Many disagree and have voiced concerns over the loss of privacy [10,11]. Fry [12] has likened developments with regard to big data and the loss of privacy to the opening of Pandora’s Box, or what he terms Pandora 5.0. The introduction in Europe of the new General Data Protection Regulation, which comes into effect in 2018 and reinforces citizens’ data-protection rights, including among other things the right ‘to be forgotten’, suggests that privacy is still a real concern [13], at least in some regions of the world. By contrast, in the United States, users who provide information under the ‘third-party doctrine’, i.e. to utilities, banks, social networks etc., should have ‘no reasonable expectation of privacy.’
The dilemma for official statistics
This introduces two new challenges for official statisticians: one technical and one of perception. The technical challenge arises from the availability of large, linkable datasets, which reopen a problem thought to have been solved in traditional statistics: anonymisation. With big data, combined with the enormous computing power available today, it is clear that simply removing personal identifiers and aggregating individual data is not a sufficient safeguard. A paper by Ohm [14] outlining the consequences of failing to adequately anonymize data graphically illustrates why there is no room for complacency. Thus, a problem that had been solved in the context of traditional official statistics must now be re-solved in the context of a richer and more varied data ecosystem. The changing nature of perception is arguably a trickier problem. What if Zuckerberg and McNealy are correct and future generations are less concerned about privacy? There appears to be some evidence to suggest that they may be correct. It seems there are clear inter-generational differences in opinion vis-a-vis privacy and confidentiality, where those ‘born digital’ (roughly those born since 1990) are less concerned about disclosing personal information than older generations [15]. Taplin [16] ponders this, musing ‘It very well may be that privacy is a hopelessly outdated notion and that Mark Zuckerberg’s belief that privacy is no longer a social norm has won the day.’ If this is so, what are the implications for official statistics and anonymisation? If other statistical providers, not governed by the UN Fundamental Principles, take a looser approach to confidentiality and privacy, it may leave official statistics in a relatively anachronistic and disadvantaged position vis-a-vis other data providers. But moving away from or discarding Principle 6 of the UN Fundamental Principles of Official Statistics would seem to be a very risky move, given the importance of public trust for NSOs.
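The technical challenge can be illustrated with a short sketch. Assuming a dataset from which names have already been removed, the Python snippet below counts how many records share each combination of the remaining attributes (often called quasi-identifiers). A combination held by only one record could in principle be re-identified by linking it to another data source. The records, field names and function name are hypothetical, and this k-anonymity style check is only one illustrative measure, not the analysis presented by Ohm [14].

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


def smallest_group_size(
    records: List[Dict[str, Any]], quasi_identifiers: Sequence[str]
) -> int:
    """Return the size of the smallest group of records sharing the same
    values on all `quasi_identifiers` (the 'k' in k-anonymity).

    Assumes `records` is non-empty and every record has every field.
    """
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Hypothetical records with direct identifiers (names) already removed.
records = [
    {"postcode": "T12", "birth_year": 1980, "sex": "F"},
    {"postcode": "T12", "birth_year": 1980, "sex": "F"},
    {"postcode": "T12", "birth_year": 1975, "sex": "M"},
]

k = smallest_group_size(records, ["postcode", "birth_year", "sex"])
print(k)
```

Here the third record is unique on the three attributes, so the smallest group has size 1 even though no name appears in the data. Grouping on postcode alone would put all three records in one group, which shows how each additional linkable attribute increases the risk.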
A worthwhile trade-off?
Taplin [16] argues that we trade our privacy with corporations in return for innovation or benefits, ‘but it is one thing to forfeit our privacy as individuals to a company that we believe is delivering a needed service and another to open our personal lives to the federal government.’ MacFeely [17] has warned that if the benefits of privacy are insufficiently clear to the public or policy makers, then it leaves official statistics vulnerable, and possibly facing a precarious and bleak future. Rudder [18] highlights this challenge too, noting that ‘the fundamental question in any discussion of privacy is the trade-off - what you get for losing it.’ Like Taplin, Rudder also argues that the trade-off benefit with the private sector is clear: better targeted ads! He argues that ‘what we get in return for the government’s intrusion is less straightforward.’ McNealy too, who seems unconcerned about the lack of privacy in the private sector, takes a very different attitude when it comes to government, saying ‘It scares me to death when the NSA [National Security Agency, an intelligence agency of the United States Department of Defense] or the IRS [Internal Revenue Service, the tax authority in the United States] know things about my personal life and how I vote. Every American ought to be very afraid of big government’ [9]. Curiously, while there is a real fear of a government Big Brother, there appear to be few concerns regarding the emergence of a corporate Big Brother. A challenge for official statistics is how to put clear blue water between the NSO and the other institutions of government from the perspective of data sharing, while highlighting the common benefits of official statistics as a public good.
To some extent there is ideology at play here, where a neo-liberal agenda is pushing to minimize the role of the public sector, but it also illustrates the challenge facing national governments and their agencies where their contribution to the wellbeing of economies and societies is poorly understood.
Concluding Thoughts
Big data, if they can be harnessed properly, would appear to offer some tantalizing opportunities, not least improved timeliness and the chance to better align public and official statistics with policy needs. The possibilities of matching different digital data sets may allow us to dramatically improve our understanding of complex, cross-cutting issues, such as the impacts of lifestyle on health. Advances such as the Internet of Things (In 2006 there were some 2 billion ‘smart devices’ connected to each other. By 2020 it is projected that this ‘internet of things’ will comprise somewhere between 30 and 50 billion devices. Goodman notes the result will be 2.5 sextillion potential networked object-to-object interactions) and biometrics will all surely present opportunities to compile new and useful statistics. As yet, the implications of this ‘big data bang’ for statistics are not immediately clear, but one can envisage a whole host of new ways to measure and understand the human condition. In relative terms, big data are still new. At the turn of the century, Scott Cook, the CEO of Intuit, mused ‘we’re still in the first minutes of the first day of the Internet revolution’ [19]. Almost two decades later we are probably only in the first hours. Many norms and standards are yet to evolve. But it does not take a huge leap of imagination to foresee that in the not too distant future, the misuse of big data will be at the heart of a serious human rights abuse scandal. Official statistics must take the ethical dimension seriously. Just because something can be measured doesn’t mean it should be. Norms and cultural values regarding privacy may be changing, but in assessing whether and how to use big data, NSOs and international organizations must carefully consider the human rights of citizens in this digital age.
References
- https://whatsthebigdata.com/2012/06/29/big-data-quotes-of-the-week-11/
- Stephens-Davidowitz S (2017) Everybody lies-What the internet can tell us about who we really are. Bloomsbury, London, UK.
- Letouzé E, Jütting J (2015) Official Statistics, Big Data and Human Development. Data Pop Alliance, White Paper Series, USA.
- Thamm A (2017) Big Data is dead. LinkedIn.
- United Nations (2014) Resolution adopted by the General Assembly on 29 January 2014 - Fundamental Principles of Official Statistics. General Assembly, A/RES/68/261.
- United Nations (2003) Handbook of Statistical Organization-The Operation and Organization of a Statistical Agency. Department of Economic and Social Affairs Statistics Division Studies in Methods Series F No. 88. United Nations, New York, USA.
- European Commission (2013) Scheveningen Memorandum on Big Data and Official Statistics. Adopted by the European Statistical System Committee on 27 September 2013.
- Kirkpatrick M (2010) Facebook’s Zuckerberg Says the Age of Privacy is Over.
- Noyes K (2015) Scott McNealy on privacy: You still don’t have any. PC World, IDG News Service.
- Pearson E (2013) Growing Up Digital. Presentation to the OSS Statistics System Seminar Big Data and Statistics New Zealand: A seminar for Statistics NZ staff, Wellington.
- Payton T, Claypoole T (2015) Privacy in the Age of Big Data-Recognizing the Threats Defending Your Rights and Protecting Your Family. Rowman & Littlefield, Lanham, MD, USA.
- Fry S (2017) The Way Ahead. Lecture delivered on the 28th May 2017, Hay Festival, Hay-on-Wye.
- European Parliament (2016) Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).
- Ohm P (2010) Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review 57: 1701-1777.
- European Commission (2011) Attitudes on Data Protection and Electronic Identity in the European Union. Special Eurobarometer No. 359, Wave 74.3-TNS Opinion and Social.
- Taplin J (2017) Move Fast and Break Things-How Facebook, Google and Amazon cornered culture and undermined democracy. Little, Brown and Company, New York, USA.
- MacFeely S (2016) The Continuing Evolution of Official Statistics: Some Challenges and Opportunities. Journal of Official Statistics 32(4): 789-810.
- Rudder C (2014) Dataclysm: What our online lives tell us about our offline selves. London, UK.
- Levington S (2000) Internet Entrepreneurs Are Upbeat Despite Market’s Rough Ride. The New York Times.