Privacy-Enhancing Technologies1
Simone Fischer-Hübner
Professor, Karlstad University, Karlstad/Sweden
Stefan Berthold
Karlstad University
Privacy is considered a core value and is recognized either explicitly or implicitly as a fundamental human right by most constitutions of democratic societies. In Europe, the foundations for the right to privacy of individuals were embedded in the European Convention on Human Rights and Fundamental Freedoms of 1950 (Art. 8) and the Charter of Fundamental Rights of the European Union in 2009 (Art. 7 & 8). In 1980, the OECD recognized the importance of privacy protection with the publication of the OECD Privacy Guidelines [1], which served as the foundation for many national privacy laws.

1. THE CONCEPT OF PRIVACY

Privacy as a social and legal issue has long been a concern of social scientists, philosophers, and lawyers. The first definition of privacy by legal researchers was given by the two American lawyers Samuel D. Warren and Louis D. Brandeis in their famous Harvard Law Review article "The Right to Privacy" [2], in which they defined privacy as "the right to be let alone." At that time, the risks of modern technology in the form of photography used by the yellow press to infringe privacy rights were the motivation for Warren and Brandeis's discussion of the individual's right to privacy.

In the age of modern computing, an early and often referred-to definition of privacy was given by Alan Westin: "Privacy is the claim of individuals, groups and institutions to determine for themselves, when, how and to what extent information about them is communicated to others" [3]. Even though, according to Westin's definition, natural persons (humans) as well as legal persons (groups and institutions) have a right to privacy, in most legal systems privacy is defined as a basic human right that only applies to natural persons.

In general, the concept of personal privacy has several dimensions. This chapter mainly addresses the dimension of informational privacy, which can be defined, similarly as by Westin and by the German Constitutional Court in its Census decision,2 as the right to informational self-determination (the right of individuals to determine for themselves when, how, and to what extent information about them is communicated to others). Furthermore, so-called spatial privacy can be defined as another dimension of the concept of privacy, which also covers the "right to be let alone," where spatial privacy is defined as the right of individuals to control what is presented to their senses [4]. Further dimensions of privacy, which will, however, not be the subject of this chapter, are territorial privacy, which concerns the setting of limits on intrusion into the domestic, workplace, and other environments (public spaces), and bodily privacy, which concerns protecting a person against undue interference, such as physical searches, drug testing, or information violating his or her moral sense (see [5,6]).

Data protection concerns protecting personal data in order to guarantee privacy and is only part of the concept of privacy. Privacy, however, is not an unlimited or absolute right, because it can be in conflict with other rights or legal values and because individuals cannot participate fully in society without revealing personal data. Nevertheless, even in cases where privacy needs to be restricted, the very core of privacy still needs to be protected. For this reason, the objective of privacy and data protection laws is to define fundamental privacy principles that need to be enforced if personal data is collected, stored, or processed.

1. Parts of this work were conducted within the scope of the PetWeb II project funded by the Norwegian Research Council (NFR) and the U-PrIM project funded by the Swedish Knowledge (KK) Foundation.
2. German Constitutional Court, Census decision, 1983 (BVerfGE 65,1).
2. LEGAL PRIVACY PRINCIPLES

In this section, we present an overview of internationally well-accepted, basic legal privacy principles, for which PETs implementing these principles have been developed. These principles are also part of the general EU Data Protection Directive 95/46/EC [7], which is an important legal instrument for the protection of privacy in Europe, as it codifies general privacy principles that have been implemented in the national privacy laws of all EU member states and of many other states. The principles also correspond to principles of the OECD Privacy Guidelines, to which we will also refer.

The principle of purpose specification and purpose binding is of key importance for privacy protection, as the sensitivity of personal data does not only depend on how "intimate" the details are that the personal data describe, but is also mainly influenced by the purposes of data processing and the context of use. For this reason, the data processing purposes need to be specified in advance by the lawmaker or by the data processor before obtaining the individual's consent, and personal data may later not be (mis)used for any other purposes (cf. the Purpose Specification and Use Limitation Principles of the OECD Guidelines). The objective of privacy policy languages and tools is to enforce this principle.
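The purpose specification and use limitation principles lend themselves to a simple mechanical check. The following sketch is our own illustration (the policy format and all names are assumptions, not part of any law or standard) of how a data processor could reject uses of personal data for purposes that were not specified and consented to in advance:

```python
# Toy illustration of purpose specification and use limitation:
# personal data may only be processed for purposes that were
# specified before the individual's consent was obtained.
# (Policy format and names are illustrative, not a standard.)

CONSENTED_PURPOSES = {
    "email": {"order_confirmation", "delivery_notice"},
    "address": {"delivery_notice"},
}

def may_process(attribute: str, purpose: str) -> bool:
    """Return True only if `purpose` was specified in advance
    for this attribute (use limitation)."""
    return purpose in CONSENTED_PURPOSES.get(attribute, set())

assert may_process("email", "order_confirmation")
assert not may_process("email", "marketing")        # purpose never specified
assert not may_process("phone", "delivery_notice")  # attribute never collected
```

Real privacy policy languages express richer conditions (retention periods, recipients, obligations), but the core check, "is this purpose among the specified ones?", has this shape.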
… negotiation tools, privacy authorization languages for enforcing negotiated policies, and tools allowing users to "track" and access the data that they released to remote services' sides ensure the technical enforcement of all privacy requirements mentioned in the section above.

4. TRADITIONAL PRIVACY GOALS OF PETS

In this section, privacy goals for achieving data minimization are defined, which we call traditional privacy goals because early PETs that were developed already in the 1980s followed these goals. Data minimization as an abstract strategy describes the avoidance of unnecessary or unwanted data disclosures. The most fundamental information that can be disclosed about an individual is who he is, that is, an identifier, or which observable events he is related to. If this information can be kept secret, the individual remains anonymous. Pfitzmann and Hansen, who pioneered the technical privacy research terminology, define anonymity as follows:

Anonymity of a subject means that the subject is not identifiable within a set of subjects, the anonymity set [14].

By choosing the term subject, Pfitzmann and Hansen aim to define the term anonymity as generally as possible. The subject can be any entity defined by facts (names or identifiers) or causing observable events (such as by sending messages). If an adversary cannot narrow down the sender of a specific message to less than two possible senders, the actual sender of the message remains anonymous. The two or more possible senders in question form the anonymity set. The anonymity set, and particularly its size, will be the first privacy metric discussed in Section 5, Privacy Metrics.

An adversary that discovers the relation between a fact or an event and a subject identifies the subject. Relations cannot only exist between facts or events and subjects, but may exist between facts, actions, and subjects. An adversary may, for instance, discover that two messages have been sent by the same subject, without knowing this subject. The two messages would be part of the equivalence relation [15] that is formed by all messages that have been sent by the same subject. Knowing this equivalence relation (and maybe even others) helps the adversary to identify the subject. Pfitzmann and Hansen define the inability of the adversary to discover these equivalence relations as unlinkability:

Unlinkability of two or more items of interest (IOIs (subjects, messages, actions, . . . )) from an attacker's perspective means that within the system (comprising these and possibly other items), the attacker cannot sufficiently distinguish whether these IOIs are related or not [14].

A special type of unlinkability is the unlinkability of a sender and recipient of a message (so-called relationship anonymity), which means that the relation of who is communicating with whom is kept secret. Data minimization can also be implemented through obfuscating the presence of facts and events. The idea is that adversaries who are unable to detect the presence of facts or events cannot link them to subjects. Pfitzmann and Hansen define this privacy goal as undetectability:

Undetectability of an item of interest (IOI) from an attacker's perspective means that the attacker cannot sufficiently distinguish whether it exists or not [14].

The strongest privacy goal in data minimization is unobservability, which combines undetectability and anonymity. Unobservability of an item of interest (IOI) means:

• undetectability of the IOI against all subjects uninvolved in it and
• anonymity of the subject(s) involved in the IOI even against the other subject(s) involved in that IOI [14].

The third and last way to implement data minimization, apart from obfuscating the facts, the events (undetectability, unobservability), or the relation between them and the subjects (unlinkability), is the use of pseudonyms in the place of subjects. Pseudonyms may be random numbers, email addresses, or (cryptographic) certificates. In order to minimize the disclosed information, pseudonyms must not be linkable to the subject. The corresponding privacy goal is pseudonymity: Pseudonymity is the use of pseudonyms as identifiers [14].

Pseudonymity is related to anonymity, as both concepts aim at protecting the real identity of a subject. The use of pseudonyms, however, allows maintaining a reference to the subject's real identity (for accountability purposes [16]). A trusted third party could, for instance, reveal the real identities of misbehaving pseudonymous users. Pseudonymity also enables a user to link certain actions under one pseudonym. For example, a user could reuse the same pseudonym in an online auction system (such as eBay) for building up a reputation.

The degree of anonymity protection provided by pseudonyms depends on the amount of personal data of the pseudonym holder that can be linked to the pseudonym, and on how often the pseudonym is used in various contexts/for various transactions. The best privacy protection can be achieved if for each transaction a new so-called transaction pseudonym is used that is unlinkable to any other transaction pseudonyms and at least initially unlinkable to any other personal data items of its holder (see also [14]).

Chapter | 43 Privacy-Enhancing Technologies 759

5. PRIVACY METRICS

Privacy metrics aim to quantify the effectiveness of schemes or technologies with regard to the privacy goals defined in the previous section.
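As a first illustration of such a metric, the anonymity set introduced in the previous section can be computed directly from an adversary's observation. The following sketch is our own toy example (all names and the observation predicate are assumptions):

```python
def anonymity_set(candidates, consistent_with_observation):
    # All subjects that may have caused the observed event
    # cover up for each other against the adversary.
    return {s for s in candidates if consistent_with_observation(s)}

# Toy observation: the adversary saw a message leave subnet "10.0.0."
subjects = {"alice": "10.0.0.1", "bob": "10.0.0.2", "carol": "192.168.0.9"}
observed = lambda s: subjects[s].startswith("10.0.0.")

aset = anonymity_set(subjects, observed)
assert aset == {"alice", "bob"}
# The sender stays anonymous as long as the set size exceeds one:
assert len(aset) > 1
```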
A simple metric for measuring anonymity is the anonymity set [14]. The anonymity set comprises all subjects that may have caused an event that is observed by the adversary. The subjects in the set cover up for each other against the adversary. The adversary can thus not hold a single subject responsible for the observed event as long as the set size of the anonymity set is greater than one. Greater set sizes are useful for protecting against (stronger) adversaries that would accept false positives up to a certain threshold among the subjects that are held responsible. In this case, the set size has to exceed such a threshold.

The anonymity set size is also related to the metric in k-anonymity [17]. K-anonymity is defined as a property or a requirement for databases that must not leak sensitive private information. This concept was also applied as an anonymity metric in location-based services [18] and in VoIP [19,20]. The underlying assumption is that database tables store two kinds of attributes. The first kind is identifying information, and the second is sensitive information. The claim is that the database table is anonymous if every search for identifying information results in a group of at least k candidate records. The k in k-anonymity is thus the privacy parameter that determines the minimum group size. Groups of candidate records form anonymity sets of identifiers in the database table.

The k-anonymity as a privacy property and k as a metric are not undisputed. In particular, the fact that k-anonymity only depends on the identifying information in the database table (that is, it is independent of the sensitive information) leads to remaining privacy risks [21]. A simple attack building on the k-anonymity's blindness for sensitive information is described in [22]. The trick is to search for candidate groups where the sensitive attribute has a constant value for all candidates. The sensitive attribute value is then immediately disclosed for all records in the candidate group; thus, privacy is breached. A solution to these risks is a new privacy property, l-diversity, with the privacy parameter l. The claim of l-diversity is that a database table is anonymous if the diversity of sensitive attribute values is at least l (> 1) in every candidate group. In some cases, l-diversification would not sufficiently protect from attribute disclosure or would be too difficult to establish. A third property, t-closeness [23], solves this problem. T-closeness restricts the distribution of sensitive information within a candidate group. The claim is that a database table is anonymous if the distribution of sensitive information within each candidate group differs from the table's distribution at most up to a threshold t.

All these privacy metrics, with the exception of the anonymity set, are tailored to static database tables. Changes in the data set are possible, but the privacy properties have to be reestablished afterward. T-closeness, as one of the latest developed properties in this category, is a close relative of information-theoretic privacy metrics. Information-theoretic metrics measure the information that an adversary learns by observing an event or a system, for example, all observable events in a communication system. The information always depends on the knowledge the adversary had before his observation, the a-priori knowledge. Knowledge is expressed as a probability distribution over the events. For a discrete set of events X = {x_1, x_2, ..., x_n} and the probability mass function Pr: X → [0, 1], Shannon [24] defines the self-information of x_i, 1 ≤ i ≤ n, as

I(x_i) = −log₂ Pr(x_i)

The self-information I(x_i) is what an adversary learns when he observes the event x_i with his a-priori knowledge about all possible events encoded in the probability mass function Pr. The self-information takes the minimal value zero for Pr(x_i) = 1; that is, the adversary will learn minimal information from observing x_i if he is a-priori certain that x_i will be observed. The self-information approaches infinity for Pr(x_i) → 0; that is, the more information the adversary learns, the less likely the observed events are. The expected self-information of a system with the events X is the entropy of the system. Shannon defines the entropy as

H(X) = Σ_{x_i ∈ X} Pr(x_i) · I(x_i)

The entropy is maximal when the distribution of all events is uniform; that is, Pr(x_i) = 1/n for all 1 ≤ i ≤ n. The maximal entropy is thus

H_max = −log₂(1/n) = log₂ n

The entropy is minimal when one event x_i is perfectly certain; that is, Pr(x_i) = 1 and Pr(x_j) = 0 for all x_j ∈ X with x_j ≠ x_i.

The "degree of anonymity" is the entropy of the communication system in question [25,26]. This degree of anonymity measures the anonymity of the message's sender within the communication system. When the adversary learns the actual sender of the message, he will learn the self-information I(x_i), where each x_i ∈ X encodes the fact that one specific subject is the sender of the message. The probability distribution Pr encodes the a-priori knowledge of the adversary about the sender. A uniform distribution (maximal entropy) indicates minimal a-priori knowledge (maximal sender anonymity). Absolute certainty, on the other hand (minimal entropy), indicates perfect a-priori knowledge (minimal sender anonymity). The degree of anonymity can be used to compare communication systems with different features (numbers of subjects) when normalized [25] with the maximal entropy H_max, and otherwise without normalization [26].
760 PART | V Privacy and Access Management
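The self-information, entropy, and degree-of-anonymity formulas translate directly into code. A minimal sketch (function names are ours; the normalized degree of anonymity follows [25]):

```python
import math

def self_information(p: float) -> float:
    # I(x_i) = -log2 Pr(x_i)
    return -math.log2(p)

def entropy(dist) -> float:
    # H(X) = sum over x_i of Pr(x_i) * I(x_i); terms with Pr = 0 contribute 0
    return sum(p * self_information(p) for p in dist if p > 0)

def degree_of_anonymity(dist) -> float:
    # normalized degree of anonymity: H(X) / H_max, with H_max = log2 n
    return entropy(dist) / math.log2(len(dist))

uniform = [1/4] * 4             # minimal a-priori knowledge about 4 senders
skewed  = [0.7, 0.1, 0.1, 0.1]  # adversary already suspects one sender

assert entropy(uniform) == 2.0              # H_max = log2 4
assert degree_of_anonymity(uniform) == 1.0  # maximal sender anonymity
assert entropy(skewed) < entropy(uniform)   # a-priori knowledge lowers H(X)
# The expected anonymity set size 2**H(X) shrinks below the set size n = 4
# as soon as some subjects are more linkable than others:
assert 2 ** entropy(skewed) < 4
```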
The same entropy-based metrics can be applied to measure the anonymity provided by facts about a subject [27]. The adversary's a-priori knowledge, encoded in the probability mass function Pr, comprises how well a feature vector with facts about a subject fits each subject x_i ∈ X. The adversary learns the self-information I(x_i) when learning that the feature vector applies to the subject x_i. The entropy H(X) can be seen as the confusion of the adversary before he learns the subject that the feature vector applies to.

The entropy can be used to calculate the expected anonymity set size, which is 2^H(X). The expected anonymity set size is equal to the anonymity set size if Pr corresponds to the uniform distribution (if the message [25,26] or feature vector [27] is not more or less linkable to one subject than to any other subject in X). The anonymity set size can be seen as an overestimation of the expected anonymity set size if Pr does not correspond to the uniform distribution; that is, some subjects are more linkable than others in the anonymity set. In this case, the expected anonymity set size or the degree of anonymity is the more accurate metric.

Metrics that are based on entropy have the disadvantage that the a-priori knowledge Pr of the adversary has to be known when evaluating the metrics. When this a-priori knowledge can be derived from publicly available observations (message routing data in a network [25,26]), the metrics are easy to apply. If Pr depends on personal information which is not available to nonadversaries [27], the metrics are hard to evaluate without additional tools being effective (legal transparency tools).

6. DATA MINIMIZATION TECHNOLOGIES

In this section, we will present the most relevant PETs for minimizing data on both the communication and application levels. It is important to note that applications (such as eCommerce applications) can only be designed and used anonymously if their users cannot be identified on a communication level (via their IP addresses). Hence anonymous communication is a prerequisite for achieving anonymity, or more generally data minimization, on the application level.

Anonymous Communication

Already in 1981, David Chaum presented the Mix net protocol for anonymous communication, which has become the fundamental concept for many practical anonymous communication technologies that have been broadly used for many years. In this section, we will provide an overview of anonymous communication technologies that we think are most relevant from a practical and scientific point of view.

DC Network

David Chaum's DC (Dining Cryptographers) network protocol [28] is an anonymous communication protocol which, even though it cannot be easily used in practice, is still very interesting from a scientific perspective. It provides unconditional sender anonymity, recipient anonymity, and unobservability, even if we assume a global adversary who can observe all communication in the network. Hence, it can guarantee the strongest anonymity properties of all known anonymous communication protocols. DC nets are based on binary superposed sending. Before any message can be sent, each participant in the network (user station) has to exchange via a secure channel a random bit stream with at least one other user station. These random bit streams serve as secret keys and are at least as long as the messages to be sent. For each single sending step (round), every user station adds modulo 2 (superposes) all the key bits it shares and its message bit, if there is one. Stations that do not wish to transmit messages send zeros by outputting the sums of their key bits (without any inversions). The sums are sent over the net and added up modulo 2. The result, which is broadcast to all user stations, is the sum of all sent message bits, because every key bit was added twice (see Figure 43.1). If exactly one participant transmits a message, the message is successfully broadcast as the result of the global sum to each participant.

FIGURE 43.1 A DC network with three users. Sender a sends a message, but from observing the communication in the network alone, the adversary cannot tell whether it was sent by a, b, or c (and not even if a meaningful message was sent at all). In the figure, sender a superposes its message 01001000 with the keys shared with b (10101010) and c (11100011) and transmits 00000001; the silent stations b and c superpose only their keys and transmit 10010000 and 11011001, respectively; the broadcast sum of all transmissions is 01001000, a's message.
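Superposed sending as shown in Figure 43.1 is just bitwise addition modulo 2 (XOR). The following sketch (helper names are ours) reproduces the figure's values:

```python
def xor_bits(*words):
    # bitwise addition modulo 2 (XOR) of equal-length bit strings
    return "".join(str(sum(int(w[i]) for w in words) % 2)
                   for i in range(len(words[0])))

# Pairwise one-time keys, as in Figure 43.1
key_ab, key_ac, key_bc = "10101010", "11100011", "00111010"

# Each station superposes its message (all-zero if silent) with its keys
out_a = xor_bits("01001000", key_ab, key_ac)  # a sends a message
out_b = xor_bits("00000000", key_ab, key_bc)  # b is silent
out_c = xor_bits("00000000", key_ac, key_bc)  # c is silent

assert out_a == "00000001"
assert out_b == "10010000"
assert out_c == "11011001"
# Every key is XORed in exactly twice, so the broadcast sum of all
# transmissions is a's message:
assert xor_bits(out_a, out_b, out_c) == "01001000"
```

Note that each transmission on its own is uniformly random to an observer who does not know the keys, which is where the unconditional anonymity comes from.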
Collisions are easily detected (as the message sent by a user and the one that is broadcast back to him will be different) and have to be resolved, for example, by retransmitting the message after a random number of rounds.

In theory, superposed sending provides, in the information-theoretic sense, perfect (unconditional) sender anonymity and unobservability, as the fact that someone is sending a meaningful message (and not only zeros) is hidden by a one-time pad encryption. From a metrics perspective, all senders in the DC network form the anonymity set. No user is more likely to be the sender of the message than any other user for an outside adversary. Perfect recipient anonymity can be achieved by reliable broadcasting. However, the DC network and the one-time pad share the same practical shortcomings, which have prevented them from being broadly used: the security of DC networks depends on the perfect randomness of keys and the secure distribution of keys. Moreover, each key is to be used only once, and the keys need to be perfectly unavailable to the adversary (the adversary may not get hold of the keys before or after the message has been sent).

Mix Nets

Mix nets [12] are more practical than DC networks, but do not provide security against adversaries with unlimited resources. Nevertheless, most anonymity networks (Mixmaster, Tor, Onion Routing, and AN.ON) build on the mix net concept. A mix is a relay or proxy server that performs four steps to hide the relation between incoming and outgoing messages (see Figure 43.2):

1. Duplicates (replayed messages) are discarded. Without this functionality, an adversary could launch a replay attack by sending two identical messages to be forwarded as two identical output messages by the mix. The adversary could thus link these messages and therefore "bridge over" the mix.
2. All messages are (randomly) delayed (by temporarily storing them in a buffer). Without this functionality (if messages were immediately forwarded), the anonymity set for one message would be reduced to one sender (no anonymity at all).
3. All messages are recoded. This is usually done by cryptography. Without this functionality, the adversary could link the input with the output messages by comparing the contents of the messages.
4. The sending sequence of delayed messages is determined independently of the receiving sequence. Without the delay and reordering of messages, an adversary could link the input to the output messages by a time correlation attack (he could be sure that the first message in is the first message out).
5. Mixes can be used to achieve unlinkability of sender and recipient, sender anonymity, as well as recipient anonymity. For achieving the latter two properties, different recoding functions are used. For providing sender anonymity, asymmetric cryptography is used. The mix user (or more precisely his machine) encrypts the message m with the public key eR of the recipient and obtains enc_eR(m). He then encrypts enc_eR(m) together with the address of the recipient and a nonce with the public key e1 of the mix and sends the resulting message enc_e1(r1, AR, enc_eR(m)) to the mix. Adding the nonce is necessary to achieve nondeterministic encryption, which prevents an adversary from monitoring the output message enc_eR(m) and the address AR of the recipient, then simply encrypting both values with the public key of the mix and comparing the result with the messages sent to the mix. Moreover, it prevents the mix from discarding one of two messages when identical contents are intended to be sent. The mix decrypts the message with its private key, discards the nonce, and sends enc_eR(m) to the address AR of the recipient.
6. Using a single mix can only provide anonymity if it is fully trustworthy and cannot be compromised. For improving security, several mixes can be used in a chain or a "cascade." Let us assume that the sender (or more precisely his machine) chooses a chain of n mixes with addresses Ai and public keys ei, i = 1...n. The sender will first add layers of encryption using the public keys of the mixes in the path in reverse order. Each layer includes the message to be forwarded by the mix, the address to which the message should be sent (the next mix in the chain or the final recipient), plus a nonce to be discarded. The resulting message enc_e1(r1, A2, enc_e2(...enc_en(rn, AR, enc_eR(m))...)) is sent to the first mix (with the address A1). Each mix on the path decrypts the message with its private key and thereby obtains a nonce, which is discarded, as well as an encrypted message and the address to which it sends this message. The last mix in the path finally sends enc_eR(m) to the recipient. Unlinkability of sender and recipient can in principle also be provided in the presence of an adversary who monitors all communication lines, as long as the crypto operations cannot be broken and one mix in the path is trustworthy, that is, one …

FIGURE 43.2 Processing steps within a mix: (1) filter duplicates, (2) delay messages, (3) recode messages, (4) reorder messages.
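The layered mix-cascade encryption described above can be sketched as follows. The `enc`/`dec` functions below are illustrative stand-ins for real public-key operations, and all names are ours; the sketch only shows the nesting and peeling of layers, not actual cryptography:

```python
import os

def enc(key, payload):
    # stand-in for public-key encryption under `key`
    return ("enc", key, payload)

def dec(key, ciphertext):
    # stand-in for decryption with the matching private key
    tag, k, payload = ciphertext
    assert tag == "enc" and k == key
    return payload

def build_onion(m, path, recipient_addr, recipient_key):
    """Sender side: wrap m in layers, innermost first (reverse path
    order). Each layer holds a fresh nonce, the next-hop address,
    and the inner ciphertext: enc_e1(r1, A2, enc_e2(...))."""
    onion, next_addr = enc(recipient_key, m), recipient_addr
    for addr, key in reversed(path):
        nonce = os.urandom(8)
        onion, next_addr = enc(key, (nonce, next_addr, onion)), addr
    return next_addr, onion  # address of the first mix, message to send

def mix_process(key, onion):
    """Mix side: decrypt one layer, discard the nonce, and return
    the next-hop address together with the inner message."""
    nonce, next_addr, inner = dec(key, onion)
    return next_addr, inner

# A chain of two mixes (A1, A2) in front of recipient AR:
path = [("A1", "k1"), ("A2", "k2")]
first, onion = build_onion("hello", path, "AR", "kR")
assert first == "A1"
addr, onion = mix_process("k1", onion)   # first mix peels its layer
assert addr == "A2"
addr, onion = mix_process("k2", onion)   # second mix peels its layer
assert addr == "AR"
assert dec("kR", onion) == "hello"       # only the recipient reads m
```

Each mix learns only its predecessor and successor; the nesting guarantees that no single hop sees both the sender and enc_eR(m)'s final destination in the clear.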
FIGURE 43.4 The Tor key negotiation and a simple Web site request [31]. The message sequence shown is:

User → OR1: Create c1, enc_OR1(g^x1)
OR1 → User: Created c1, g^y1, H(k1)
User → OR1: Relay c1, ENC_k1{Extend, enc_OR2(g^x2)}; OR1 → OR2: Create c2, enc_OR2(g^x2)
OR2 → OR1: Created c2, g^y2, H(k2)
OR1 → User: Relay c1, ENC_k1{Extend, g^y2, H(k2)}

Legend: enc(·) – public-key encryption; ENC{·} – symmetric encryption; H(·) – one-way hash; c – channel ("circuit") id; g, x, y – Diffie-Hellman generator and exponents.
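The Diffie-Hellman negotiation of Figure 43.4 can be sketched as follows. Toy group parameters and SHA-256 stand in for Tor's actual primitives, the public-key encryption of the first handshake half is noted but not implemented, and all helper names are ours:

```python
import hashlib
import secrets

# Toy DH group (illustrative only; Tor uses fixed, vetted parameters).
p = 2**127 - 1   # a Mersenne prime
g = 3

def dh_keypair():
    x = secrets.randbelow(p - 2) + 1   # ephemeral private exponent
    return x, pow(g, x, p)             # (x, g^x mod p)

def key_hash(k: int) -> str:
    return hashlib.sha256(str(k).encode()).hexdigest()

# Create: the user sends g^x1 (in Tor, encrypted to OR1's public key
# to prevent man-in-the-middle attacks; that layer is omitted here).
x1, gx1 = dh_keypair()

# Created: OR1 picks y1, derives k1 = (g^x1)^y1, replies g^y1, H(k1).
y1, gy1 = dh_keypair()
k1_or = pow(gx1, y1, p)
reply = (gy1, key_hash(k1_or))

# The user derives the same session key and checks the hash.
k1_user = pow(reply[0], x1, p)
assert key_hash(k1_user) == reply[1]   # both sides share k1 = g^(x1*y1)

# Forward secrecy: once x1, y1, and k1 are deleted, a wiretap of
# gx1 and gy1 alone does not let anyone recompute k1.
```

Extending the circuit to OR2 repeats the same handshake, tunneled through the already-negotiated k1, as in the Extend cells of the figure.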
2. Setting up and operating mixes is expensive due to the considerable organizational overhead for establishing mix cascades and due to the high performance requirements.

Onion Routing/Tor

Onion Routing [30] is a low-latency, mix-based routing protocol developed in the 1990s at the Naval Research Laboratory. It provides anonymous socket connections by means of proxy servers. Onion Routing uses the mix net concept of layers of public key encryption (the so-called onion) to build up an anonymous bidirectional virtual circuit between communication partners. The initiator's proxy (for the service being requested) constructs a "forward onion," which encapsulates a series of routing nodes ("mixes") forming a path to the responder, and sends it with a create command to the first node. Each layer of the onion is encrypted with the public key of each node on the path and contains symmetric crypto function/key pairs as a payload. After sending the onion, the anonymous path is established and the initiator's proxy sends data through this anonymous connection. The symmetric function/key pairs are applied by each node on the path to crypt data that will be sent along the virtual circuit. All information (onions, data, and network control) is sent through the Onion Routing network in uniform-sized cells. All cells arriving at an onion router within a fixed time interval are mixed together to reduce correlation by network insiders. Reply onions, which correspond to untraceable return addresses, allow a responder to send back an anonymous reply after its original circuit is broken.

Since individual routing nodes in each circuit only know the identities of adjacent nodes, and since the nodes further encrypt multiplexed virtual circuits, traffic analysis is made difficult. However, if the first node behind the initiator's proxy and the last node of the circuit cooperate, they will be able to determine the source and recipient of communication through the number of cells sent over this circuit or through the duration for which the virtual circuit was used.

Tor [31], the second generation of Onion Routing, has added several improvements. In particular, it provides forward secrecy: once the session keys are deleted, they cannot be obtained any longer, even if all communication has been wiretapped and the long-term secret keys of the onion routers ("mixes") become compromised. Therefore, instead of using hybrid encryption for distributing symmetric session keys, the Diffie-Hellman key negotiation protocol is used, which provides forward secrecy. The key negotiation and a simple communication via two onion routers are outlined in Figure 43.4. The symmetric session key shared by the user with the first onion router OR1 is negotiated by means of a Diffie-Hellman handshake.
The first half of the handshake, g^x1, is encrypted with the public key of OR1 (to prevent man-in-the-middle attacks), and enc_OR1(g^x1) is then sent to OR1. The onion router replies with the second half of the handshake, that is, g^y1, and a hash over the negotiated session key k1 = g^(x1·y1). Any further communication between the sender and OR1 will be crypted with this negotiated symmetric session key. In order to extend the path from the sender over OR1 to a second onion router OR2, the sender transmits ENC_k1{OR2, enc_OR2(g^x2)} to OR1. The first onion router decrypts the symmetric encryption and forwards the encrypted first half of a new Diffie-Hellman handshake, enc_OR2(g^x2), to OR2. The second onion router replies with the second half of the handshake, g^y2, and the hash over the session key k2 = g^(x2·y2). The reply ENC_k1{g^y2, hash(k2)} is forwarded by OR1 back to the sender. Only the sender and OR1 are now in possession of k1, and only the sender and OR2 are now in possession of k2. The communication between sender and OR2 can now be crypted with k2. Once a circuit has been established, the symmetric encryption with the negotiated session keys is applied by each node on the path to crypt data that will be sent along the circuit. Advantages of Tor over AN.ON are as follows:

1. Tor provides forward secrecy.
2. It is easy to set up new onion routers ("mixes"), which are run by many volunteers all over the world.
3. There are lower performance requirements for each "mix."
4. Each mix is a possible bottleneck; however, in Tor, "mixes" that do not perform can be excluded from the dynamic routing.

Disadvantages include:

1. Anyone can set up "mixes" independent of their performance, that is, bandwidth, latency, security.
2. There is no audit or certification, thus a lack of reliable data about legislation and operator.
3. Bridging the "mixes" and thus breaching anonymity by controlling the entry node and the exit node is easier for adversaries, since they can easily set up their own new nodes. In particular, an adversary can easily try to attract user traffic by establishing a few well-performing exit nodes and a lot of stable intermediate …

… are also needed on the application level. Many such data minimization techniques are based on cryptographic protocols (see also [33] for an overview). The classical types of privacy-protecting cryptography that have already been applied for decades are of course encryption schemes themselves. However, there are a number of more recent crypto schemes for protecting data and authenticating information, which are variations or extensions of basic crypto schemes; they have "surprising properties" that in many cases can offer better data minimization properties [33]. In this chapter, some of the most relevant examples of mainly cryptographic mechanisms for protecting privacy at the application level will be given.

Blind Signatures and Anonymous eCash

Blind signatures are an extension of digital signatures and provide privacy by allowing someone to obtain a signature from a signer on a document without the signer seeing the actual content of the "blinded" document he is signing. Hence, if the signer is later presented with the signed "unblinded" document, he cannot relate it to the signing session and to the person on behalf of whom he has signed the document. Blind signatures were invented by David Chaum as a basic building block for anonymous eCash (see below). They can also be used to achieve anonymity in other applications, such as eVoting, and also serve as a basic building block for other privacy crypto protocols, such as anonymous credentials (see below). David Chaum et al. have invented protocols based on blind signatures [10,34,35], which allow electronic money to flow perfectly tracelessly from the bank through consumer and merchant before returning to the bank. Chaum's cryptographic "online" payment protocol based on blind signatures can be summarized as follows (see also Figure 43.5).

Let (d, n) be the bank's private key, indicating a certain value of a signature under this key (in this example: one dollar), and (e, n) the bank's public key.6 f is a suitable one-way function. Electronic money has the form (x, f(x)^d (mod n)), where the one-way function is needed to prevent forgery of electronic money (see also [36] for more explanations).

1. The customer Alice (her computer) first generates a bank note number x (of at least 100 digits) at random and (in …
mix nodes that eventually become entry nodes [32].
essence) multiplies it with a blinding factor r, which he
has also chosen at random: B 5 r e Uf ðxÞ ðmod nÞ. He
then signs the blinded bank note number with his private
Data Minimization at Application Level
key and sends it to the bank.
Even if the communication channel is anonymized, users 2. The bank verifies and removes Alice’s signature.
can still reveal personal and identifying data on the appli- Then, it signs the blinded note (and thereby creates
cation level; often users have to reveal more personal
data than needed. Hence, data minimization techniques 6. Using the RSA encryption scheme.
Chapter | 43 Privacy-Enhancing Technologies 765
the blinded signature) with its "worth one dollar" signature: B^d (mod n) = (r^e · f(x))^d (mod n) = r · f(x)^d (mod n). The bank then withdraws one dollar from her account and returns the note with the blind signature.
3. Alice divides out the blinding factor and thereby extracts C = B/r (mod n) = f(x)^d (mod n) from B. For paying the online merchant Bob one dollar, Alice sends him the pair (x, f(x)^d (mod n)).
4. Bob verifies the bank's signature and immediately contacts the bank to verify that the note has not already been spent.
5. The bank verifies its signature, checks the note against a list of those notes already spent, and credits Bob's account by one dollar.

The blind signature scheme provides (unconditional) anonymity of the electronic money: Even if the bank and the merchant cooperate, they cannot determine who spent the notes. Since the bank does not know the blinding factors, it cannot correlate the note it was signing blindly for Alice with the note that was spent. (However, Alice's identity is only protected if she also uses an anonymized communication channel and if she does not personally reveal identifying information, such as a personal delivery address.)

In addition to the online eCash protocol version, where the bank needs to be constantly online for checking whether notes have already been spent, in [28] a protocol for offline electronic money is presented by Chaum et al. With the offline protocol, a user remains unconditionally anonymous as long as he spends each bank note only once. If, however, a note is spent twice, the bank will get enough information to identify the spender's account.

Disappointingly, there has been a lack of adoption of anonymous eCash (attempts at commercial deployment of Chaum's schemes failed in the late 1990s), and today there are still no widely deployed anonymous electronic payment services.

Zero-Knowledge Proofs

A zero-knowledge proof is defined as an interactive proof in which a prover can prove to a verifier that a statement is true without revealing anything other than the veracity of the statement. Zero-knowledge proofs were first presented in 1985 by Goldwasser et al. [37]. A zero-knowledge proof must fulfill the following three properties:

● Completeness: If the statement is true, the honest verifier will be convinced of this fact by an honest prover.
● Soundness: If the statement is false, no cheating prover can convince the honest verifier that it is true, except with some very small probability.
● Zero-knowledge: If the statement is true, no cheating verifier learns anything other than this fact.

Zero-knowledge proofs are building blocks for data-minimizing technologies, such as anonymous credential systems. The anonymous credential protocol IdeMix is, for example, based on proofs of knowledge, in which a prover proves that he knows a secret value or that he is
766 PART | V Privacy and Access Management
able to solve some number-theoretic problem, which would contradict the assumption that the problem cannot be solved by a polynomially bounded Turing machine.

Anonymous Credentials

A traditional credential (often also called certificate or attribute certificate) is a set of personal attributes, such as birth date, name, or personal number, signed (and thereby certified) by the certifying party (the so-called issuer) and bound to its owner by cryptographic means (by requiring the user's secret key to use the credential). In terms of privacy, the use of (traditional or anonymous) credentials is better than the direct request to the certifying party, as this prevents the certifying party from profiling the user. Traditional credentials require, however, that all attributes are disclosed together if the user wants to prove certain properties, so that the verifier can check the issuer's signature. This makes different uses of the same credential linkable to each other. Moreover, the verifier and issuer can link the different uses of the user's credential to the issuing of the credential.

Anonymous credentials (also called private certificates) were first introduced by Chaum [10] and later enhanced by Brands [38] and by Camenisch and Lysyanskaya [39], and have stronger privacy properties than traditional credentials. Microsoft's U-Prove technology based on Brands's protocols and IBM's IdeMix technology based on the credential protocols by Camenisch et al. are currently the practically most relevant anonymous credential technologies.

Anonymous credentials allow the user to essentially "transform" the certificate into a new one that contains only a subset of attributes of the original certificate; that is, they allow proving only a subset of its attributes to a verifier (selective disclosure property). Instead of revealing the exact value of an attribute, anonymous credential systems also enable the user in the transformation to apply any mathematical function to the (original) attribute value, allowing him to prove only attribute properties without revealing the attribute itself. Besides, with the IdeMix protocol by Camenisch et al., the issuer's signature is also transformed in such a way that the signature in the new certificate cannot be linked to the original signature of the issuer [33]. Hence, different credential uses cannot be linked by the verifier and/or issuer (unlinkability property). Cryptographically speaking, with IdeMix, the user is basically using a zero-knowledge proof to convince the verifier of possessing a signature generated by the issuer on a statement containing the subset of attributes.

Figure 43.6 provides a scenario illustrating how data minimization can be achieved in an identity management online transaction: First, user Alice obtains an anonymous driving license credential issued by the Swedish Road authority (the so-called identity provider), with personal attributes typically stored in the license, including her birth date. Later, she would like to purchase a video from an online shop (the so-called relying party, which is also the verifier in this scenario), which is only permitted for adults. After sending a service request, the online shop will answer her with a data request for a proof that she is older than 18. Alice can now take advantage of the selective disclosure feature of the anonymous credential protocol to prove with her credential just the fact that she is older than 18 without revealing her birth date or any other attributes of her credential. If Alice later wants to purchase another video that is only permitted for adults at the same video online shop, she can use the same anonymous credential for a proof that she is over 18. If the IdeMix protocol is used, the video shop is unable to recognize that the two proofs are based on the same credential. Hence, the two rental transactions cannot be linked to the same person.
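The proof-of-knowledge idea underlying such credential systems can be illustrated with the classic Schnorr protocol for proving knowledge of a discrete logarithm. This is a minimal sketch with toy parameters chosen for illustration; it is a standard textbook protocol, not the actual IdeMix construction:

```python
# Illustrative Schnorr proof of knowledge of a discrete logarithm.
# NOTE: toy parameters for illustration only -- not the IdeMix protocol.
import secrets

# Public group parameters: p prime, q prime with q | p-1, g of order q.
p, q = 2267, 103              # 2267 = 22*103 + 1
g = pow(2, (p - 1) // q, p)   # an element of order q (here g != 1)

x = 42                        # prover's secret: "I know x with y = g^x mod p"
y = pow(g, x, p)              # public statement

# Commit: prover picks random r and sends t = g^r mod p.
r = secrets.randbelow(q)
t = pow(g, r, p)

# Challenge: verifier picks a random c.
c = secrets.randbelow(q)

# Response: s = r + c*x mod q; reveals nothing about x since r is random.
s = (r + c * x) % q

# Verify: g^s == t * y^c (mod p) holds exactly when the prover knows x.
assert pow(g, s, p) == (t * pow(y, c, p)) % p
print("verifier accepts; x itself was never sent")
```

Repeating the protocol with fresh randomness convinces the verifier that the prover knows x (soundness), while the transcript (t, c, s) can be simulated without x (zero-knowledge).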
FIGURE 43.6 Data minimization with anonymous credentials: (1) the identity provider issues credentials; the user proves "Age > 18" to the relying party. Advantages: selective disclosure; unlinkability of transactions (IdeMix); no profiling by IdPs or relying parties.
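Returning to Chaum's eCash protocol described earlier, the blinding and unblinding arithmetic can be exercised with a toy RSA instance. This is a minimal sketch: the key size is far too small for real use, and the one-way function f is assumed here to be a hash reduced modulo n:

```python
# Toy RSA blind signature, following the eCash protocol steps above.
# NOTE: tiny parameters and a simplistic f -- for illustration only.
import hashlib
import secrets

# Bank's RSA key: modulus n, public key (e, n), private key (d, n).
p_, q_ = 1009, 1013
n = p_ * q_
e = 17
d = pow(e, -1, (p_ - 1) * (q_ - 1))   # private exponent

def f(x: int) -> int:
    """One-way function applied to the bank note number (assumed: a hash)."""
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big") % n

# 1. Alice picks a note number x and a blinding factor r,
#    and blinds: B = r^e * f(x) mod n.
x = secrets.randbelow(n)
r = 3  # in practice: random and invertible mod n
B = (pow(r, e, n) * f(x)) % n

# 2. The bank signs the blinded note: B^d = r * f(x)^d mod n.
signed_blinded = pow(B, d, n)

# 3. Alice divides out r: C = B^d / r = f(x)^d mod n.
C = (signed_blinded * pow(r, -1, n)) % n

# 4./5. Anyone can verify the bank's signature: C^e == f(x) mod n,
#       yet the bank never saw f(x)^d being created for this x.
assert pow(C, e, n) == f(x)
print("valid one-dollar note:", (x, C))
```

The key identity is B^d = (r^e · f(x))^d = r · f(x)^d (mod n): the blinding factor passes through the signing operation unchanged, so Alice can strip it off afterwards.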
label design more accurately and quickly (see [46]). A more advanced policy language, the PrimeLife Policy Language (PPL), was developed in the EU FP7 project PrimeLife [47]. PPL is a language used to specify not only the privacy policies of data controllers but also those of third parties (so-called downstream controllers) to whom data are further forwarded, as well as privacy preferences of users. It is based on two widespread industry standards, XACML (eXtensible Access Control Markup Language) and SAML (Security Assertion Markup Language). The data controller and downstream data controller have policies basically specifying which data are requested from the user and for which purposes and obligations (e.g., under the obligation that the data will be deleted after a certain time period). PPL allows specifying both uncertified data requests and certified data requests based on proofs of the possession of (anonymous IdeMix or traditional X.509) credentials that fulfill certain properties. The user's preferences allow expressing, for each data item, to which data controllers and downstream data controllers the data can be released and how the user expects his data to be treated. The PPL engine conducts an automated matching of the data controller's policy and the user's preferences. The result can be a mutual agreement concerning the usage of data in the form of a so-called sticky policy, which should be enforced by the access control systems at the backend sides and which will "travel" with the data transferred to downstream controllers.

As PPL has many features that P3P does not provide (downstream data controllers, credential selection for certified data, and obligations), the design of usable PPL user interfaces provides many challenges. The "Send Data?" user interfaces for letting the PPL engine interact with the user, for displaying the result of policy matches and identity/credential selection, and for obtaining informed consent to disclose selected certified and uncertified data items were developed and presented in [48]. Figure 43.7 depicts an example PPL "Send Data?" dialogue.

The PPL user interfaces follow the Art. 29 Data Protection Working Party recommendation of providing policy information in a multilayered format [49] for making policies more understandable and usable. According to this recommendation, a short privacy notice on the top layer offers individuals the core information required under Art. 10 EU Directive 95/46/EC, including at least the identity of the service provider and the purpose of data processing. In addition, a clear indication (in the form of URLs, in our example the "privacy policy" URLs of the two data controllers) must be given as to how the individuals can access the other layers presenting the additional information required by Art. 10 and national privacy laws.
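The policy/preference matching performed by the PPL engine can be sketched in a few lines. This is a deliberate simplification with invented field names; the real PPL is an XML language built on XACML and SAML, and real obligations are far richer than a single retention period:

```python
# Simplified sketch of policy/preference matching in the spirit of PPL.
# Data structures and field names are invented for illustration.

# Data controller's policy: what is requested, why, and under which obligations.
policy = {
    "data": {"email", "birth_date"},
    "purposes": {"order-processing"},
    "obligations": {"delete-after-days": 365},
}

# User's preferences per data item.
preferences = {
    "email": {"purposes": {"order-processing", "newsletter"},
              "max-retention-days": 400},
    "birth_date": {"purposes": {"order-processing"},
                   "max-retention-days": 180},
}

def match(policy, preferences):
    """Return (agreed 'sticky policy', mismatches) for the requested items."""
    sticky, mismatches = {}, []
    for item in sorted(policy["data"]):
        pref = preferences.get(item)
        if pref is None or not policy["purposes"] <= pref["purposes"]:
            mismatches.append((item, "purpose not allowed"))
            continue
        if policy["obligations"]["delete-after-days"] > pref["max-retention-days"]:
            mismatches.append((item, "retention too long"))
            continue
        # Agreement: the sticky policy travels with the released data item.
        sticky[item] = {"purposes": policy["purposes"],
                        "obligations": policy["obligations"]}
    return sticky, mismatches

sticky, mismatches = match(policy, preferences)
print(sticky)       # email can be released under the agreed sticky policy
print(mismatches)   # birth_date fails: 365 days exceeds the 180-day preference
```

In the real engine, a mismatch would be surfaced to the user in the "Send Data?" dialogue rather than silently rejected.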
Ex-Post Transparency-Enhancing Tools8

An important transparency-enhancing tool that falls into categories 2 and 3 of our classification is the data track that has been developed in the PRIME9 and PrimeLife10 projects [51]. The data track is a user-side transparency tool, which includes both a history function and online access functions. The history function keeps, for each transaction in which a user discloses personal data to a communication partner, a record for the user on which personal data are disclosed to whom (the identity of the controller), for which purposes, and which credentials and/or pseudonyms have been used in this context, as well as the details of the negotiated or given privacy policy. These transaction records are either stored at the user side or centrally (in the Cloud, see [52]) in a secure and privacy-friendly manner. User-friendly search functionalities, which allow the user to easily get an overview about who has received what data about him or her, are also included. The online access functions allow end users to exercise their rights to access their data at the remote services sides online. In this way, they can compare what data they have disclosed to a services side with what data are still stored by the services side. This allows them to check whether data have been changed, processed, or deleted (in accordance with data retention periods of the negotiated or given privacy policy). Online access is granted to a user if he can provide a unique transaction ID (currently implemented as a 16-byte random number), which is shared between the user (stored in his data track) and the services side for each transaction of personal data disclosure. This in principle also allows anonymous or pseudonymous users to access their data.

While the data track is still in the research prototype stage, the Google Dashboard is already available in practice and grants its users access to a summary of the data stored with a Google account, including account data and the users' search query history, which are, however, only a part of the personal data that Google processes. It does not provide any insight into how these data provided by the users (the users' search queries) have subsequently been processed by Google. Besides, access is provided only to authenticated Google users.

Further examples of transparency tools that allow users to view and control how their personal data have been processed and to check whether this is in compliance with a negotiated or given privacy policy are based on secure logging systems that usually extend the Kelsey-Schneier log [53] and protect the integrity and the confidentiality of the log data. Such a secure logging system and an automated privacy audit facility are key components of a privacy evidence approach proposed by [54]. This privacy evidence system allows a user to inspect all log entries that record actions of that user with a special view tool and permits sending the log view created by that tool to the automated privacy audit component, which compares the log view with the privacy policy to construct privacy evidence. This privacy evidence indicates to the user whether the privacy policy has been violated.

The unlinkability of log entries, which means that they should not be stored in the sequence of their creation, is needed to prevent an adversary from correlating the log with other information sources, such as other external logs, which could allow him to identify data subjects to whom the entries refer (cf. [55]). Also, anonymous access is needed to prevent an adversary from observing who views which log entries and in this way concluding to whom the log entries refer. Wouters et al. [56] have presented such a secure and privacy-friendly logging for eGovernment services. However, it addresses the unlinkability of logs between logging systems in eGovernment rather than the unlinkability of log entries within a log. Moreover, it does not address insider attacks, nor does it allow anonymous access to log entries.

Within the PrimeLife project, a secure logging system has been developed that addresses these aspects of the unlinkability of log entries and anonymous user access. In particular, it fulfills the following requirements (see [55]):

● Only the data subject can decrypt log entries after they have been committed to the log.
● A user can check the integrity of his log entries. A service provider can check the integrity of the whole log file.
● It is not possible for an attacker to secretly modify log entries that have been committed to the log before the attacker took over the system (forward integrity).
● It is practically impossible to link log entries that refer to the same user.
● For efficiency reasons, it should be possible for a data subject to read his log entries without the need to download and/or fully traverse the whole log database.

The need for and requirements of tools that can anticipate profiles (category 4 of our definition above) have been analyzed within studies of the FIDIS project (see, for instance, [57]). To the best of our knowledge, no practical transparency-enhancing tools fulfill the requirements. In academia, promising approaches [58,59] have been formulated and are the subject of research.

8. This section corresponds to most parts of Section 2.4.2 that the leading author has contributed to ENISA's study on "Privacy, Accountability and Trust - Challenges and Opportunities," published in 2011 [50].
9. EU FP6 project PRIME (Privacy and Identity Management for Europe), www.prime-project.eu.
10. EU FP7 project PrimeLife (Privacy and Identity Management for Life), www.primelife.eu.
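The forward-integrity requirement from the list above can be sketched with an evolving MAC key in the spirit of the Schneier-Kelsey log [53]: each entry is authenticated with a key that is immediately passed through a one-way function, so compromising the current key does not allow forging or silently modifying earlier committed entries. This toy omits the encryption, unlinkability, and anonymous-access properties that schemes such as [55] add:

```python
# Toy forward-integrity log with an evolving MAC key (Schneier-Kelsey style).
import hashlib
import hmac

def evolve(key: bytes) -> bytes:
    """One-way key update: earlier keys cannot be recovered from later ones."""
    return hashlib.sha256(b"evolve" + key).digest()

def append(log, key, entry: bytes):
    """MAC the entry with the current key, then discard it by evolving."""
    tag = hmac.new(key, entry, hashlib.sha256).digest()
    log.append((entry, tag))
    return evolve(key)

def verify(log, initial_key) -> bool:
    """An auditor holding the initial key can check the whole log's integrity."""
    key = initial_key
    for entry, tag in log:
        expected = hmac.new(key, entry, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False
        key = evolve(key)
    return True

log = []
k0 = b"initial secret shared with the auditor"
k = k0
for event in [b"disclosed email to shop", b"policy agreed", b"data deleted"]:
    k = append(log, k, event)

assert verify(log, k0)
log[0] = (b"tampered entry", log[0][1])   # attacker rewrites an old entry...
assert not verify(log, k0)                # ...and the audit detects it
```

An attacker who seizes the machine learns only the current key k, which authenticates future entries but, by the one-way key update, none of the entries already committed.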
[31] R. Dingledine, N. Mathewson, P. Syverson, Tor: The Second-Generation Onion Router, Naval Research Lab, Washington, DC, 2004.
[32] R. Böhme, G. Danezis, C. Díaz, S. Köpsell, A. Pfitzmann, On the PET Workshop Panel "Mix Cascades Versus Peer-to-Peer: Is One Concept Superior?", Workshop on Privacy-Enhancing Technologies (PET) 2004, 2005.
[33] J. Camenisch, M. Dubovitskaya, M. Kohlweiss, J. Lapon, G. Neven, Cryptographic mechanisms for privacy, Privacy and Identity Management for Life, Springer, Heidelberg, 2011, pp. 117-134.
[34] D. Chaum, A. Fiat, M. Naor, Untraceable electronic cash, Adv. Cryptol. Crypto'88 (1988).
[35] D. Chaum, Achieving electronic privacy, Sci. Am. (1992) 76-81.
[36] S. Fischer-Hübner, IT-Security and Privacy - Design and Use of Privacy-Enhancing Security Mechanisms, LNCS, Springer, Heidelberg, 2001.
[37] S. Goldwasser, S. Micali, C. Rackoff, The knowledge complexity of interactive proof systems, Proceedings of the 17th ACM Symposium on Theory of Computing, 1985, pp. 291-304.
[38] S. Brands, Rethinking Public Key Infrastructure and Digital Certificates - Building in Privacy, PhD thesis, Eindhoven Institute of Technology, 1999.
[39] J. Camenisch, A. Lysyanskaya, Efficient non-transferable anonymous multi-show credential system with optional anonymity revocation, Adv. Cryptol. Eurocrypt 2045 (2001) 93-118.
[40] D. Cooper, K. Birman, Preserving privacy in a network of mobile computers, Proceedings of the 1995 IEEE Symposium on Security and Privacy, Oakland, May 1995.
[41] R. Ostrovsky, W. Skeith, A survey of single-database private information retrieval: techniques and applications, Public Key Cryptography - PKC 2007, Springer, 2007.
[42] H. Lipmaa, Oblivious Transfer or Private Information Retrieval, [Online]. Available: <http://www.cs.ut.ee/~lipmaa/crypto/link/protocols/oblivious.php>.
[43] R. Leenes, M. Lips, R. Poels, M. Hoogwout, User aspects of privacy and identity management in online environments: towards a theoretical model of social factors, PRIME Framework V1 (Chapter 9), project deliverable, 2005.
[44] H. Hedbom, A survey on transparency tools for privacy purposes, Proceedings of the 4th FIDIS/IFIP Summer School, Brno, September 2008, Springer, 2009.
[45] W3C, P3P - The Platform for Privacy Preferences 1.1 (P3P1.1) Specification, 2006. [Online]. Available: <http://www.w3.org/P3P/>.
[46] P. Kelley, L. Cesca, J. Bresee, L. Cranor, Standardizing privacy notices: an online study of the nutrition label approach, Proceedings of the 28th International Conference on Human Factors in Computing Systems, ACM, 2010, p. 1573.
[47] PrimeLife, Privacy and Identity Management in Europe for Life - Policy Languages, [Online]. Available: <http://primelife.ercim.eu/results/primer/133-policy-languages>.
[48] J. Angulo, S. Fischer-Hübner, E. Wästlund, T. Pulls, Towards usable privacy policy display & management for PrimeLife, Inf. Manag. Comput. Secur. (Emerald) 20 (1) (2012) 4-17.
[49] Opinion on More Harmonised Information Provisions, 11987/04/EN WP 100, Article 29 Data Protection Working Party, November 25, 2004. [Online]. Available: <http://ec.europa.eu/justice_home/fsj/privacy/docs/wpdocs/2004/wp100_en.pdf>.
[50] ENISA, Privacy, Accountability and Trust - Challenges and Opportunities, 2011. [Online]. Available: <http://www.enisa.europa.eu/activities/identity-and-trust/privacy-and-trust/library/deliverables/pat-study>.
[51] E. Wästlund, S. Fischer-Hübner, End User Transparency Tools: UI Prototypes, PrimeLife Deliverable D4.2.2, <www.primelife.eu>, June 2010.
[52] T. Pulls, Privacy-friendly cloud storage for the data track: an educational transparency tool, NordSec 2012 - 17th Nordic Conference on Secure IT Systems, Blekinge Institute of Technology, Karlskrona, October 2012.
[53] B. Schneier, J. Kelsey, Cryptographic support for secure logs on untrusted machines, The Seventh USENIX Security Symposium Proceedings, USENIX Press, 1998, pp. 53-62.
[54] S. Sackmann, J. Strüker, R. Accorsi, Personalization in privacy-aware highly dynamic systems, Commun. ACM 49 (9) (September 2006).
[55] H. Hedbom, T. Pulls, P. Hjärtquist, A. Lavén, Adding secure transparency logging to the PRIME core, in: 5th IFIP WG 9.2, 9.6/11.7, 11.4, 11.6/PrimeLife International Summer School, Nice, France, 2009, revised selected papers, Springer, 2010.
[56] K. Wouters, K. Simoens, D. Lathouwers, B. Preneel, Secure and privacy-friendly logging for eGovernment services, 3rd International Conference on Availability, Reliability and Security (ARES 2008), IEEE, 2008, pp. 1091-1096.
[57] M. Hildebrandt, Biometric Behavioral Profiling and Transparency Enhancing Tools, FIDIS Deliverable D7.12, <www.fidis.net>, 2009.
[58] S. Berthold, R. Böhme, Valuating privacy with option pricing theory, in: T. Moore, D.J. Pym, C. Ioannidis (Eds.), Economics of Information Security and Privacy, Springer, 2010, pp. 187-209.
[59] S. Berthold, Towards a formal language for privacy options, Privacy and Identity Management for Life, 6th IFIP WG 9.2, 9.6/11.7, 11.4, 11.6/PrimeLife International Summer School 2010, Revised Selected Papers, 2011.
[60] S. Fischer-Hübner, C.J. Hoofnagle, I. Krontiris, K. Rannenberg, M. Waidner, Online privacy: towards informational self-determination on the internet (Dagstuhl Perspectives Workshop 11061), Dagstuhl Manifestos 1 (1) (2011) 1-20.