Today, access to crucial services such as loans, health care or housing often depends on the results of the processing of large data sets based on machine learning (hereafter: Big Data analytics). The accuracy of the generated results is therefore highly important, a fact that the General Data Protection Regulation (GDPR) itself recognises. While it is widely acknowledged that the data sets used as a basis for processing (the input) have to be accurate, the results generated by Big Data analytics (the output) are supposedly unsuitable for a test of accuracy. According to a popular argument, the category of accuracy can only be applied to objective data, whereas data generated in Big Data analytics is by nature predictive and uncertain. Consequently, data protection rights linked to the category of accuracy should not apply to such subjective data. In this article, I argue the opposite: data protection law should require information inferred in Big Data analytics to be accurate as well. Rather than disputing whether the rights of the GDPR apply where they are most needed, research should concentrate on finding adequate ways to apply them.
An article by Niklas Eder, LL.M. (King's College London), Maîtr. en Droit (Paris II, Panthéon-Assas)
In Big Data analytics, machine learning algorithms are used to cluster individuals into groups. On the basis of this clustering, predictions about the individuals are generated. These predictions, in turn, are translated into evaluations.
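To make the three steps concrete, here is a deliberately simplified sketch in Python. All names, figures and thresholds are invented for illustration; real systems use learned clustering (e.g. k-means over many features) rather than a hand-written rule:

```python
from statistics import mean

# Step 1: clustering. A real system would learn groups from many features;
# here a single, invented income rule stands in for the clustering step.
def assign_group(income: float) -> str:
    return "high_income" if income >= 50_000 else "low_income"

# Invented historical outcomes per group: 1 = loan repaid, 0 = default.
history = {
    "high_income": [1, 1, 1, 0, 1],
    "low_income":  [1, 0, 0, 1, 0],
}

# Step 2: prediction. The individual inherits the group's repayment rate.
def predict_repayment(income: float) -> float:
    return mean(history[assign_group(income)])

# Step 3: evaluation. The prediction is translated into a judgement.
def evaluate(income: float) -> str:
    return "creditworthy" if predict_repayment(income) >= 0.5 else "not creditworthy"

print(evaluate(60_000))  # "creditworthy"
print(evaluate(30_000))  # "not creditworthy"
```

The point of the sketch is only to show why the output, the evaluation, is itself data about the individual: it is produced, stored and acted upon, even though it rests on a group statistic rather than an observed fact.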
The evaluations cover a great variety of fields. Characteristics such as trustworthiness and creditworthiness, the capacity to withstand stress, the suitability for a job vacancy or even the dangerousness of a person are judged on the basis of the outcomes of the data processing. These evaluations have a great effect on a person's life. They determine the conditions under which a person has access to important services, such as bank loans; they determine the success of job applications; and they can even influence whether a sentence is suspended or not. These evaluations constitute data, more precisely inferred information or generated knowledge. In spite of the great effects they have on people's lives, the legal status of this inferred information and generated knowledge under the GDPR is contested.
The Case Law of the ECJ: Inferred Information is Personal Data
The ECJ has not yet explicitly decided on the qualification of data generated in Big Data analytics. However, as Wachter and Mittelstadt have shown, the court has rendered judgments in other contexts that are highly relevant for the legal qualification of inferred data: in two recent cases, the ECJ decided on the legal quality of subjective information.
In the first judgment, YS and M and S from 2014, the ECJ ruled on a case in which individuals invoked rights under Directive 95/46, the predecessor of the GDPR, in order to access internal legal analyses (in other words: legal evaluations) that an administration had filed on their cases. The ECJ ruled that the legal analyses did not constitute personal data under the definition of Directive 95/46 and that the directive was thus not applicable.
In the second judgment, Nowak from 2017, the ECJ departed from its 2014 position. It had to decide on a case in which an individual invoked his rights under Directive 95/46 in order to challenge the evaluation he had received in an examination. Here, the ECJ took the position that the examiner's comments and the evaluation contain information relating to an individual and thus qualify as personal data. The notion of personal data, the ECJ ruled, "is not restricted to information that is sensitive or private, but potentially encompasses all kinds of information, not only objective but also subjective, in the form of opinions and assessments, provided that it 'relates' to the data subject."
These two decisions are relevant to Big Data analytics because the results produced in Big Data analytics constitute evaluations, too. Following the ECJ's most recent decision on the qualification of evaluations, Nowak, one can assume that the output data of Big Data analytics (individualised evaluations) constitutes personal data as well. This view is in line with a mantra the ECJ keeps repeating in its definition of personal data, according to which "the scope of Directive 95/46 is very wide and the personal data covered by that directive is varied".
Qualifying Big Data evaluations as personal data entails the applicability of the general principles of the GDPR as well as of the rights the GDPR grants to data subjects in order to control their data.
The Problem of the Accuracy of Subjective Data
This conclusion is a step forward in guaranteeing effective legal protection against Big Data analytics. However, obstacles arise on closer inspection of the content of important principles and rights of the GDPR. The notion of accuracy is relevant both to the legality of data processing in general and to the capacity of individuals to control their data. Yet the application of the category of accuracy to inferred information and evaluations is generally denied.
The requirement for data to be accurate is laid down in the GDPR's principles relating to the processing of personal data: according to Art 5 (1) d, personal data shall be accurate. If data processing is based on inaccurate data, individuals can invoke a number of rights under the GDPR, including the rights to erasure (Art 17 (1) d) and rectification (Art 16).
The principle of accuracy and the corresponding rights constitute central guarantees of the GDPR. Nonetheless, and despite the qualification of evaluations as personal data, the possibility to challenge Big Data analytics on the basis of inaccuracy is contested. The category of accuracy, it is widely argued in the major commentaries on the GDPR, applies only to facts and not to evaluations. Only objective data can be verified or falsified, the argument runs, while the accuracy of subjective data cannot be tested. The principles and rights which presuppose a test of the accuracy of data must therefore be interpreted in a way that excludes unverifiable data, such as evaluations.
This position, it is claimed, is in line with the jurisprudence of the ECJ in Nowak, which held that "(o)f course, the right of rectification (…) cannot enable a candidate to 'correct', a posteriori, answers that are 'incorrect'". On this view, the applicability of the GDPR does not entail the possibility to invoke rights and apply principles which presuppose a test of accuracy.
In Favour of Reviewing the Accuracy of Inferred Information
The position that the rights related to the accuracy of data do not apply to evaluations produced in Big Data analytics relies on an incorrect reading of the ECJ's latest jurisprudence.
In fact, in Nowak, the ECJ decided that the right to rectification does apply to the evaluations of the examination, even if in a limited manner. The ECJ argued that "it is clear that the rights of access and rectification (…) may also be asserted in relation to the written answers submitted by a candidate at a professional examination and to any comments made by an examiner with respect to those answers." That the ECJ recognises that evaluations can be subjected to a test of accuracy is another step forward in establishing standards of protection against evaluations. In its subsequent explanations, however, the ECJ limited the scope of the review to a formal test. The court named a number of situations in which evaluations may be inaccurate, and in all of them, formal mistakes lay at the basis of the inaccuracy: evaluations may be inaccurate "due to the fact that, by mistake, the examination scripts were mixed up in such a way that the answers of another candidate were ascribed to the candidate concerned, or that some of the cover sheets containing the answers of that candidate are lost, so that those answers are incomplete, or that any comments made by an examiner do not accurately record the examiner's evaluation of the answers of the candidate concerned." In Nowak, the ECJ is thus not reviewing the accuracy of the substance of the evaluations, but only their formal correctness. To sum up: in Nowak, the ECJ decided that evaluations are reviewable, but that the review is only a partial, formal one.
This jurisprudence can be transferred to the evaluations produced in Big Data analytics, but again only partially. While it is convincing to review the accuracy of inferred information, it is not convincing to limit that review to formalities.
Instead, inferred data should be fully reviewable for two reasons: first, because of its implications for fundamental rights; secondly, because of the opinion of the Article 29 Working Party.
Firstly, as already established, Big Data analytics has enormous implications for fundamental rights. The European fundamental rights regimes, which protect the rights to data protection and privacy, are higher-ranking law and thus have to guide and, if necessary, determine the interpretation of the lower-ranking GDPR. It is rather obvious that the notion of accuracy has to be interpreted in a way that guarantees the protection of fundamental rights in Big Data analytics.
Secondly, one should take into account the convincing opinion of the EU's former advisory body, the Article 29 Working Party. Contrary to the ECJ, it has already taken an explicit stance on the question whether Art 16 GDPR, the right to rectification, applies to Big Data analytics. In its 'Guidelines on Automated individual decision-making and Profiling', the Working Party argued that Art 16 shall apply where "an individual is placed into a category that says something about their ability to perform a task." It argues in favour of an encompassing control of the accuracy of the output data of Big Data analytics rather than restricting it to a partial control: "Individuals may wish to challenge the accuracy of the data used and any grouping or category that has been applied to them." The Working Party underlines its claim to apply the right to rectification to scores and profiles with the blunt but convincing argument that "(p)rofiling can involve an element of prediction, which increases the risk of inaccuracy." Evaluations generated in Big Data analytics should therefore be not partially, but fully reviewable. The principle of Art 5 (1) d GDPR, that data shall be accurate, should apply not only to the data used as a basis for processing, but also to the outcome of the data processing: the inferred information. Not only the input data, but also the output data should satisfy standards of accuracy.
Ways to Test the Accuracy of Subjective Data
It is certainly not an obvious or easy task to test the accuracy of Big Data evaluations. But it is not impossible. In a first step, one can differentiate between reasonable and unreasonable evaluations: it is possible to tell coherent and convincing data processing techniques apart from arbitrary and random ones. In a second step, standards of reason can then be applied to subjective information as well.
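As a hedged illustration of what such a test could look like for predictive output data, one can compare predicted probabilities with outcomes that later materialised: a scorer whose predictions track reality is, in this narrow statistical sense, more "reasonable" than an arbitrary one. All figures below are invented, and a legal test of reasonableness would of course go far beyond this:

```python
# Minimal sketch of one conceivable accuracy check for predictive scores.
# It captures only the statistical core of the argument: unlike a pure
# opinion, a prediction can be checked against realised outcomes.
def mean_absolute_error(predictions, outcomes):
    return sum(abs(p - o) for p, o in zip(predictions, outcomes)) / len(predictions)

# Invented repayment scores for four individuals, and what actually happened.
reasoned_scores  = [0.9, 0.8, 0.2, 0.1]  # tracks the outcomes
arbitrary_scores = [0.5, 0.5, 0.5, 0.5]  # says nothing about anyone
outcomes = [1, 1, 0, 0]                  # 1 = repaid, 0 = defaulted

print(round(mean_absolute_error(reasoned_scores, outcomes), 2))   # 0.15
print(round(mean_absolute_error(arbitrary_scores, outcomes), 2))  # 0.5
```

A full test of reasonableness in the sense discussed here would additionally scrutinise the choice of features, the grouping itself and the inference drawn from it, not merely the error rate.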
In the meantime, the first persuasive models for testing the accuracy of Big Data evaluations have already been proposed. With their "Right to Reasonable Inferences", Wachter and Mittelstadt have presented a detailed proposal for the shape such a test may take. In the future, both legal and technical research should focus on further elaborating reasonable ways to apply the principles and rights of the GDPR to Big Data analytics, rather than denying the very application of these principles where protection is most needed, and where it will be needed even more in the future.
About the author: Niklas Eder studied law in Heidelberg, Berlin, Paris and London. He is currently a PhD student and legal researcher at the chair of Prof. Dr. Mattias Wendel at the University of Bielefeld, Germany. Alongside his academic work on Big Data, AI and privacy, he writes for the German newspaper Frankfurter Allgemeine Zeitung.
Published under licence CC BY-NC-ND.