Decentralizing Privacy: Using Blockchain to Protect
Personal Data
Guy Zyskind, MIT Media Lab, Cambridge, Massachusetts
Oz Nathan, Tel-Aviv University, Tel-Aviv, Israel
Alex 'Sandy' Pentland, MIT Media Lab, Cambridge, Massachusetts
Abstract: The recent increase in reported incidents of surveillance and security breaches compromising users' privacy calls into question the current model, in which third-parties collect and control massive amounts of personal data. Bitcoin has demonstrated in the financial space that trusted, auditable computing is possible using a decentralized network of peers accompanied by a public ledger. In this paper, we describe a decentralized personal data management system that ensures users own and control their data. We implement a protocol that turns a blockchain into an automated access-control manager that does not require trust in a third party. Unlike Bitcoin, transactions in our system are not strictly financial; they are used to carry instructions, such as storing, querying and sharing data. Finally, we discuss possible future extensions to blockchains that could harness them into a well-rounded solution for trusted computing problems in society.

Keywords: blockchain; privacy; bitcoin; personal data

I. INTRODUCTION

The amount of data in our world is rapidly increasing. According to a recent report [22], it is estimated that 20% of the world's data has been collected in the past couple of years. Facebook, the largest online social-network, collected 300 petabytes of personal data since its inception [1], a hundred times the amount the Library of Congress has collected in over 200 years [13]. In the Big Data era, data is constantly being collected and analyzed, leading to innovation and economic growth. Companies and organizations use the data they collect to personalize services, optimize the corporate decision-making process, predict future trends and more. Today, data is a valuable asset in our economy [21].

While we all reap the benefits of a data-driven society, there is a growing public concern about user privacy. Centralized organizations, both public and private, amass large quantities of personal and sensitive information. Individuals have little or no control over the data that is stored about them and how it is used. In recent years, public media has repeatedly covered controversial incidents related to privacy. Among the better known examples are the story about government surveillance [2], and Facebook's large-scale scientific experiment that was apparently conducted without explicitly informing participants [10].

Related Work. There have been various attempts to address these privacy issues, both from a legislative perspective ([4], [20]), as well as from a technological standpoint. OpenPDS, a recently developed framework, presents a model for autonomous deployment of a PDS which includes a mechanism for returning computations on the data, thus returning answers instead of the raw data itself [6]. Across the industry, leading companies chose to implement their own proprietary authentication software based on the OAuth protocol [19], in which they serve as centralized trusted authorities.

From a security perspective, researchers developed various techniques targeting privacy concerns focused on personal data. Data anonymization methods attempt to protect personally identifiable information. k-anonymity, a common property of anonymized datasets, requires that sensitive information of each record is indistinguishable from at least k-1 other records [24]. Related extensions to k-anonymity include l-diversity, which ensures the sensitive data is represented by a diverse enough set of possible values [15]; and t-closeness, which looks at the distribution of sensitive data [14]. Recent research has demonstrated how anonymized datasets employing these techniques can be de-anonymized [18], [5], given even a small amount of data points or high dimensionality data. Other privacy-preserving methods include differential privacy, a technique that perturbs data or adds noise to the computational process prior to sharing the data [7], and encryption schemes that allow running computations and queries over encrypted data. Specifically, fully homomorphic encryption (FHE) [9] schemes allow any computation to run over encrypted data, but are currently too inefficient to be widely used in practice.

In recent years, a new class of accountable systems emerged. The first such system was Bitcoin, which allows users to transfer currency (bitcoins) securely without a centralized regulator, using a publicly verifiable open ledger (or blockchain). Since then, other projects (collectively referred to as Bitcoin 2.0 [8]) demonstrated how these blockchains can serve other functions requiring trusted computing and auditability.

Our Contribution. 1) We combine blockchain and off-blockchain storage to construct a personal data management platform focused on privacy. 2) We illustrate, through our platform and a discussion of future improvements to the technology, how blockchains could become a vital resource in trusted-computing.

Organization. Section II discusses the privacy problem we solve in this paper, section III provides an overview of the platform, whereas section IV describes in detail the technical implementation; section V discusses future extensions to blockchains, and concluding remarks are found in section VI.

The first two authors contributed equally to this work.
II. THE PRIVACY PROBLEM

Throughout this paper, we address the privacy concerns users face when using third-party services. We focus specifically on mobile platforms, where services deploy applications for users to install. These applications constantly collect high-resolution personal data of which the user has no specific knowledge or control. In our analysis, we assume that the services are honest-but-curious (i.e., they follow the protocol). Note that the same system could be used for other data-privacy concerns, such as patients sharing their medical data for scientific research, while having the means to monitor how it is used and the ability to instantly opt-out. In light of this, our system protects against the following common privacy issues:

Data Ownership. Our framework focuses on ensuring that users own and control their personal data. As such, the system recognizes the users as the owners of the data and the services as guests with delegated permissions.

Data Transparency and Auditability. Each user has complete transparency over what data is being collected about her and how it is accessed.

Fine-grained Access Control. One major concern with mobile applications is that users are required to grant a set of permissions upon sign-up. These permissions are granted indefinitely and the only way to alter the agreement is by opting-out. Instead, in our framework, at any given time the user may alter the set of permissions and revoke access to previously collected data. One application of this mechanism would be to improve the existing permissions dialog in mobile applications. While the user-interface is likely to remain the same, the access-control policies would be securely stored on a blockchain, where only the user is allowed to change them.

III. PROPOSED SOLUTION

We begin with an overview of our system. As illustrated in Figure 1, the three entities comprising our system are mobile phone users, interested in downloading and using applications; services, the providers of such applications who require processing personal data for operational and business-related reasons (e.g., targeted ads, personalized service); and nodes, entities entrusted with maintaining the blockchain and a distributed private key-value data store in return for incentives. Note that while users in the system normally remain (pseudo) anonymous, we could store service profiles on the blockchain and verify their identity.

The system itself is designed as follows. The blockchain accepts two new types of transactions: $T_{access}$, used for access control management; and $T_{data}$, for data storage and retrieval. These network operations could be easily integrated into a mobile software development kit (SDK) that services can use in their development process.

To illustrate, consider the following example: a user installs an application that uses our platform for preserving her privacy. As the user signs up for the first time, a new shared (user, service) identity is generated and sent, along with the associated permissions, to the blockchain in a $T_{access}$ transaction. Data collected on the phone (e.g., sensor data such as location) is encrypted using a shared encryption key and sent to the blockchain in a $T_{data}$ transaction, which subsequently routes it to an off-blockchain key-value store, while retaining only a pointer to the data on the public ledger (the pointer is the SHA-256 hash of the data).

Both the service and the user can now query the data using a $T_{data}$ transaction with the pointer (key) associated to it. The blockchain then verifies that the digital signature belongs to either the user or the service. For the service, its permissions to access the data are checked as well. Finally, the user can change the permissions granted to a service at any time by issuing a $T_{access}$ transaction with a new set of permissions, including revoking access to previously stored data. Developing a web-based (or mobile) dashboard that allows an overview of one's data and the ability to change permissions is fairly trivial and is similar to developing centralized-wallets, such as Coinbase for Bitcoin.¹

The off-blockchain key-value store is an implementation of Kademlia [16], a distributed hashtable (or DHT), with added persistence using LevelDB² and an interface to the blockchain. The DHT is maintained by a network of nodes (possibly disjoint from the blockchain network), who fulfill approved read/write transactions. Data are sufficiently randomized across the nodes and replicated to ensure high availability. It is instructive to note that alternative off-blockchain solutions could be considered for storage. For example, a centralized cloud might be used to store the data. While this requires some amount of trust in a third-party, it has some advantages in terms of scalability and ease of deployment.

¹ Coinbase bitcoin wallet. http://www.coinbase.com
² LevelDB. http://github.com/google/leveldb

Fig. 1. Overview of the decentralized platform.

IV. THE NETWORK PROTOCOL

We now describe in detail the underlying protocol used in the system. We utilize standard cryptographic building blocks in our platform: a symmetric encryption scheme defined by the 3-tuple $(\mathcal{G}_{enc}, \mathcal{E}_{enc}, \mathcal{D}_{enc})$, the generator, encryption and decryption algorithms respectively; a digital signature scheme (DSS) described by the 3-tuple $(\mathcal{G}_{sig}, \mathcal{S}_{sig}, \mathcal{V}_{sig})$, the generator, signature and verification algorithms respectively,
implemented using ECDSA with secp256k1 curve [12]; and a cryptographic hash function $\mathcal{H}$, instantiated by a SHA-256 [11] implementation.

A. Building Blocks

We now briefly introduce relevant building blocks that are used throughout the rest of this paper. We assume familiarity with Bitcoin [17] and blockchains.

1) Identities: Blockchains utilize a pseudo-identity mechanism. Essentially a public-key, every user can generate as many such pseudo-identities as she desires in order to increase privacy. We now introduce compound identities, an extension of this model used in our system. A compound identity is a shared identity for two or more parties, where some parties (at least one) own the identity (owners), and the rest have restricted access to it (guests). Protocol 1 illustrates the implementation for a single owner (the user) and a single guest (the service). As illustrated, the identity is comprised of signing key-pairs for the owner and guest, as well as a symmetric key used to encrypt (and decrypt) the data, so that the data is protected from all other players in the system. Formally, a compound identity is externally (as seen by the network) observed by the 2-tuple:

$$Compound_{u,s}^{(public)} = (pk_{sig}^{u,s},\ pk_{sig}^{s,u}) \tag{1}$$

Similarly, the entire identity (including the private keys) is the following 5-tuple:

$$Compound_{u,s} = (pk_{sig}^{u,s},\ sk_{sig}^{u,s},\ pk_{sig}^{s,u},\ sk_{sig}^{s,u},\ sk_{enc}^{u,s}) \tag{2}$$

Protocol 1 Generating a compound identity
 1: procedure CompoundIdentity(u, s)
 2:   u and s form a secure channel
 3:   u executes:
 4:     $(pk_{sig}^{u,s}, sk_{sig}^{u,s}) \leftarrow \mathcal{G}_{sig}()$
 5:     $sk_{enc}^{u,s} \leftarrow \mathcal{G}_{enc}()$
 6:     u shares $sk_{enc}^{u,s}, pk_{sig}^{u,s}$ with s
 7:   s executes:
 8:     $(pk_{sig}^{s,u}, sk_{sig}^{s,u}) \leftarrow \mathcal{G}_{sig}()$
 9:     s shares $pk_{sig}^{s,u}$ with u
10:   // Both u and s have $sk_{enc}^{u,s}, pk_{sig}^{u,s}, pk_{sig}^{s,u}$
11:   return $pk_{sig}^{u,s}, pk_{sig}^{s,u}, sk_{enc}^{u,s}$
12: end procedure

2) Blockchain Memory: We let L be the blockchain memory space, represented as the hashtable $L : \{0,1\}^{256} \rightarrow \{0,1\}^{N}$, where $N \gg 256$ and can store sufficiently-large documents. We assume this memory to be tamper-proof under the same adversarial model used in Bitcoin and other blockchains. To intuitively explain why such a trusted data-store can be implemented on any blockchain (including Bitcoin), consider the following simplified, albeit inefficient, implementation: A blockchain is a sequence of timestamped transactions, where each transaction includes a variable number of output addresses (each address is a 160-bit number). L could then be implemented as follows: the first two outputs in a transaction encode the 256-bit memory address pointer, as well as some auxiliary meta-data. The rest of the outputs construct the serialized document. When looking up L[k], only the most recent transaction is returned, which allows update and delete operations in addition to inserts.

3) Policy: A set of permissions a user u grants service s, denoted by $POLICY_{u,s}$. For example, if u installs a mobile application requiring access to the user's location and contacts, then $POLICY_{u,s} = \{location, contacts\}$. It is instructive to note that any type of data could be stored safely this way, assuming the service will not subvert the protocol and label the data incorrectly. Safeguards to partially prevent this could be introduced to the mobile SDK, but in any case, the user could easily detect a service that cheats, as all changes are visible to her.

4) Auxiliary Functions: Parse(x) de-serializes the message sent to a transaction, which contains the arguments; CheckPolicy($pk_{sig}^{k}, x_p$), illustrated in Protocol 2, verifies that the originator has the appropriate permissions.

Protocol 2 Permissions check against the blockchain
 1: procedure CheckPolicy($pk_{sig}^{k}, x_p$)
 2:   $s \leftarrow 0$
 3:   $a_{policy} = \mathcal{H}(pk_{sig}^{k})$
 4:   if $L[a_{policy}] \neq \emptyset$ then
 5:     $pk_{sig}^{u,s}, pk_{sig}^{s,u}, POLICY_{u,s} \leftarrow$ Parse($L[a_{policy}]$)
 6:     if $pk_{sig}^{k} = pk_{sig}^{u,s}$ or
 7:        ($pk_{sig}^{k} = pk_{sig}^{s,u}$ and $x_p \in POLICY_{u,s}$) then
 8:       $s \leftarrow 1$
 9:     end if
10:   end if
11:   return s
12: end procedure

B. Blockchain Protocols

Here we provide a detailed description of the core protocols executed on the blockchain. Protocol 3 is executed by nodes in the network when a $T_{access}$ transaction is received, and similarly, Protocol 4 is executed for $T_{data}$ transactions.

As mentioned earlier in the paper, $T_{access}$ transactions allow users to change the set of permissions granted to a service, by sending a $POLICY_{u,s}$ set. Sending the empty set revokes all access-rights previously granted. Sending a $T_{access}$ transaction with a new compound identity for the first time is interpreted as a user signing up to a service.

Similarly, $T_{data}$ transactions govern read/write operations. With the help of CheckPolicy, only the user (always) or the service (if allowed) can access the data. Note that in lines 9 and 12 of Protocol 4 we used shorthand notation for accessing the DHT like a normal hashtable. In practice, these instructions result in an off-blockchain network message (either read or write) that is sent to the DHT.

C. Privacy and Security Analysis

We rely on the blockchain being tamper-free, an assumption that requires a sufficiently large network of untrusted peers. In addition, we assume that the user manages her keys in a secure manner, for example using a secure-centralized wallet service. We now show how our system protects against adversaries compromising nodes in the system. Currently, we
are less concerned about malicious services that change the protocol or record previously read data, as they are likely to be reputable, but we provide a possible solution for such behavior in section V-A.

Given this model, only the user has control over her data. The decentralized nature of the blockchain combined with digitally-signed transactions ensures that an adversary cannot pose as the user, or corrupt the network, as that would imply the adversary forged a digital-signature, or gained control over the majority of the network's resources. Similarly, an adversary cannot learn anything from the public ledger, as only hashed pointers are stored in it.

An adversary controlling one or more DHT nodes cannot learn anything about the raw data, as it is encrypted with keys that none of the nodes possess. Note that while data integrity is not ensured in each node, since a single node can tamper with its local copy or act in a byzantine way, we can still in practice minimize the risk with sufficient distribution and replication of the data.

Finally, generating a new compound identity for each user-service pair guarantees that only a small fraction of the data is compromised in the event of an adversary obtaining both the signing and encryption keys. If the adversary obtains only one of the keys, then the data is still safe. Note that in practice we could further split the identities to limit the exposure of a single compromised compound identity. For example, we can generate new keys for every hundred records stored.

Protocol 3 Access Control Protocol
 1: procedure HandleAccessTX($pk_{sig}^{k}, m$)
 2:   $s \leftarrow 0$
 3:   $pk_{sig}^{u,s}, pk_{sig}^{s,u}, POLICY_{u,s} =$ Parse(m)
 4:   if $pk_{sig}^{k} = pk_{sig}^{u,s}$ then
 5:     $L[\mathcal{H}(pk_{sig}^{k})] = m$
 6:     $s \leftarrow 1$
 7:   end if
 8:   return s
 9: end procedure

Protocol 4 Storing or Loading Data
 1: procedure HandleDataTX($pk_{sig}^{k}, m$)
 2:   $c, x_p, rw =$ Parse(m)
 3:   if CheckPolicy($pk_{sig}^{k}, x_p$) = True then
 4:     $pk_{sig}^{u,s}, pk_{sig}^{s,u}, POLICY_{u,s} \leftarrow$ Parse($L[\mathcal{H}(pk_{sig}^{u,s})]$)
 5:     $a_{x_p} = \mathcal{H}(pk_{sig}^{u,s} \,\|\, x_p)$
 6:     if rw = 0 then    // rw = 0 for write, 1 for read
 7:       $h_c = \mathcal{H}(c)$
 8:       $L[a_{x_p}] \leftarrow L[a_{x_p}] \cup h_c$
 9:       (DHT) $ds[h_c] \leftarrow c$
10:       return $h_c$
11:     else if $c \in L[a_{x_p}]$ then
12:       (DHT) return $ds[h_c]$
13:     end if
14:   end if
15:   return $\emptyset$
16: end procedure

V. DISCUSSION OF FUTURE EXTENSIONS

In this section, we slightly digress to present possible future extensions to blockchains. These could play a significant role in shaping more mature distributed trusted computing platforms, compared to current state-of-the-art systems. More specifically, they would greatly increase the usefulness of the platform presented earlier.

A. From Storage to Processing

One of the major contributions of this paper is demonstrating how to overcome the public nature of the blockchain. So far, our analysis focused on storing pointers to encrypted data. While this approach is suitable for storage and random queries, it is not very efficient for processing data. More importantly, once a service queries a piece of raw data, it could store it for future analysis.

A better approach might be to never let a service observe the raw data, but instead, to allow it to run computations directly on the network and obtain the final results. If we split data into shares (e.g., using Shamir's Secret Sharing [23]), rather than encrypting them, we could then use secure Multi-party Computation (MPC) to securely evaluate any function [3].

In Figure 2, we illustrate how MPC might work with blockchains and specifically in our framework. Consider a simple example in which a city holds an election and wishes to allow online secret voting. It develops a mobile application for voting which makes use of our system, now augmented with the proposed MPC capabilities. After the online elections take place, the city subsequently submits their back-end code to aggregate the results. The network selects a subset of nodes at random and an interpreter transforms the code into a secure MPC protocol. Finally, the results are stored on the public ledger, where they are safe against tampering. As a result, no one learns what the individual votes were, but everyone can see the results of the elections.

procedure EVote($\langle * \rangle v_1, \ldots, \langle * \rangle v_n$)
  $s \leftarrow \sum_{i=1}^{n} v_i$
  if s < 0 then
    $L[a_{election}] \leftarrow u_1$
  else if s > 0 then
    $L[a_{election}] \leftarrow u_2$
  end if
end procedure

NET computes:
  broadcast: $[s]_p \rightarrow$ MPC
  if s < 0 then
    $L[a_{election}] \leftarrow u_1$
  else if s > 0 then
    $L[a_{election}] \leftarrow u_2$
  end if

MPC computes:
  $s \leftarrow$ reconstruct($[s]_p$)
  broadcast: $s \rightarrow$ NET

Fig. 2. Example of a flow of secure computation in a blockchain network. The top left block (EVote procedure) is the unsecure code, where the arguments marked in $\langle * \rangle$ are private and stored as shares on the DHT. The network selects a subset of nodes at random to compute a secure version of EVote and broadcasts the results back to the entire network, which stores them on the ledger.
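As a minimal sketch of the share-splitting idea discussed in Section V-A, the following Python code implements a textbook (t, n) Shamir scheme [23] over a prime field and shows that two shared values can be added share-by-share before reconstruction, which is the property a tally such as the sum in EVote would rely on. The field modulus, the function names (split_secret, reconstruct, add_shares) and all parameters are illustrative assumptions and are not part of the platform described in this paper; a complete MPC protocol would need considerably more machinery (for example, to multiply shared values or to tolerate malicious parties).

```python
import secrets

# Assumed field modulus for this sketch: the Mersenne prime 2**127 - 1.
PRIME = 2**127 - 1


def split_secret(secret, n_shares, threshold, prime=PRIME):
    """Split `secret` into n_shares points; any `threshold` of them recover it."""
    if not 0 <= secret < prime:
        raise ValueError("secret must lie in [0, prime)")
    if not 1 <= threshold <= n_shares:
        raise ValueError("need 1 <= threshold <= n_shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo the prime
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def reconstruct(shares, prime=PRIME):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


def add_shares(a, b, prime=PRIME):
    """Add two sharings point-wise; the result is a sharing of the sum."""
    out = []
    for (xa, ya), (xb, yb) in zip(a, b):
        if xa != xb:
            raise ValueError("shares must use the same x-coordinates")
        out.append((xa, (ya + yb) % prime))
    return out
```

For instance, calling split_secret(42, n_shares=5, threshold=3) yields five shares, any three of which can be passed to reconstruct to recover 42, while fewer than three reveal nothing about the value. Because addition works directly on shares, each node holding one share of every vote could sum its shares locally, and only the total would ever be reconstructed.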
B. Trust and Decision-Making in Blockchains

Bitcoin, or blockchains in general, assumes all nodes are equally untrusted and that their proportion in the collective decision-making process is solely based on their computational resources (known as the Proof-of-work algorithm) [17]. In other words, for every node n, $trust_n \propto resources(n)$ (probabilistically) decides the node's weight in votes. This leads to adverse effects, most notably vulnerability to sybil attacks, excessive energy consumption and high-latency.

Intuitively, Proof-of-Work reasons that nodes which pour significant resources into the system are less likely to cheat. Using similar reasoning we could define a new dynamic measure of trust that is based on node behavior, such that good actors that follow the protocol are rewarded. Specifically, we could set the trust of each node as the expected value of it behaving well in the future. Equivalently, since we are dealing with a binary random variable, the expected value is simply the probability p. A simple way to approximate this probability is by counting the number of good and bad actions a node takes, then using the sigmoid function to squash it into a probability. In practice, every block i we should re-evaluate the trust score of every node as

$$trust_n^{(i)} = \frac{1}{1 + e^{-\alpha(\#good - \#bad)}} \tag{3}$$

where $\alpha$ is simply the step size.

With this measure, the network could give more weight to trusted nodes and compute blocks more efficiently. Since it takes time to earn trust in the system, it should be resistant to sybil attacks. This mechanism could potentially attract other types of attacks, such as nodes increasing their reputation just to act maliciously at a later time. This might be mitigated by randomly selecting several nodes, weighted by their trust, to vote on each block, then taking the equally-weighted majority vote. This should prevent single actors from having too much influence, regardless of their trust-level.

VI. CONCLUSION

Personal data, and sensitive data in general, should not be trusted in the hands of third-parties, where they are susceptible to attacks and misuse. Instead, users should own and control their data without compromising security or limiting companies' and authorities' ability to provide personalized services. Our platform enables this by combining a blockchain, re-purposed as an access-control moderator, with an off-blockchain storage solution. Users are not required to trust any third-party and are always aware of the data that is being collected about them and how it is used. In addition, the blockchain recognizes the users as the owners of their personal data. Companies, in turn, can focus on utilizing data without being overly concerned about properly securing and compartmentalizing them.

Furthermore, with a decentralized platform, making legal and regulatory decisions about collecting, storing and sharing sensitive data should be simpler. Moreover, laws and regulations could be programmed into the blockchain itself, so that they are enforced automatically. In other situations, the ledger can act as legal evidence for accessing (or storing) data, since it is (computationally) tamper-proof.

Finally, we discussed several possible future extensions for blockchains that could harness them into a well-rounded solution for trusted computing problems in society.

REFERENCES

[1] Scaling the facebook data warehouse to 300 pb, 2014.
[2] James Ball. Nsa's prism surveillance program: how it works and what it can do. The Guardian, 2013.
[3] Michael Ben-Or, Shafi Goldwasser, and Avi Wigderson. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Proceedings of the twentieth annual ACM symposium on Theory of computing, pages 1-10. ACM, 1988.
[4] European Commission. Commission proposes a comprehensive reform of data protection rules to increase users' control of their data and to cut costs for businesses, 2012.
[5] Yves-Alexandre de Montjoye, Cesar A Hidalgo, Michel Verleysen, and Vincent D Blondel. Unique in the crowd: The privacy bounds of human mobility. Scientific reports, 3, 2013.
[6] Yves-Alexandre de Montjoye, Erez Shmueli, Samuel S Wang, and Alex Sandy Pentland. openpds: Protecting the privacy of metadata through safeanswers. PloS one, 9(7):e98790, 2014.
[7] Cynthia Dwork. Differential privacy. In Automata, languages and programming, pages 1-12. Springer, 2006.
[8] Jon Evans. Bitcoin 2.0: Sidechains and ethereum and zerocash, oh my!, 2014.
[9] Craig Gentry. Fully homomorphic encryption using ideal lattices. In STOC, volume 9, pages 169-178, 2009.
[10] Vindu Goel. Facebook tinkers with users' emotions in news feed experiment, stirring outcry. The New York Times, 2014.
[11] Federal Information and Processing Standards. FIPS PUB 180-4 Secure Hash Standard (SHS). (March), 2012.
[12] Don Johnson, Alfred Menezes, and Scott Vanstone. The elliptic curve digital signature algorithm (ecdsa). International Journal of Information Security, 1(1):36-63, 2001.
[13] Michael Lesk. How much information is there in the world?
[14] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In ICDE, volume 7, pages 106-115, 2007.
[15] Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):3, 2007.
[16] Petar Maymounkov and David Mazieres. Kademlia: A peer-to-peer information system based on the xor metric. In Peer-to-Peer Systems, pages 53-65. Springer, 2002.
[17] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. Consulted, 1(2012):28, 2008.
[18] Arvind Narayanan and Vitaly Shmatikov. How to break anonymity of the netflix prize dataset. arXiv preprint cs/0610105, 2006.
[19] Juan Perez. Facebook, google launch data portability programs to all, 2008.
[20] Rt.com. Obama announces legislation protecting personal data, student digital privacy, 2015.
[21] K Schwab, A Marcus, JO Oyola, W Hoffman, and M Luzi. Personal data: The emergence of a new asset class. In An Initiative of the World Economic Forum, 2011.
[22] ScienceDaily. Big data, for better or worse: 90% of world's data generated over last two years, 2013.
[23] Adi Shamir. How to share a secret. Communications of the ACM, 22(11):612-613, 1979.
[24] Latanya Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557-570, 2002.
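To make the trust measure of Section V-B concrete, the sketch below evaluates Eq. (3) from per-node counts of good and bad actions, draws a trust-weighted random committee, and takes an equally weighted majority vote. It is only an illustration under stated assumptions: the function names (trust_score, select_voters, majority_vote), the default step size alpha = 0.1, and the representation of action counts as a dictionary are hypothetical and do not come from the paper.

```python
import math
import random


def trust_score(good, bad, alpha=0.1):
    """Eq. (3): sigmoid of alpha * (#good - #bad), written to avoid overflow."""
    z = alpha * (good - bad)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def select_voters(action_counts, k, alpha=0.1, rng=None):
    """Draw up to k distinct nodes at random, weighted by their trust score.

    action_counts maps a node id to a (good, bad) tuple (assumed layout).
    """
    rng = rng or random.Random()
    remaining = {n: trust_score(g, b, alpha) for n, (g, b) in action_counts.items()}
    chosen = []
    for _ in range(min(k, len(remaining))):
        nodes = list(remaining)
        weights = [remaining[n] for n in nodes]
        pick = rng.choices(nodes, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen


def majority_vote(votes):
    """Equally weighted majority over boolean votes; ties count as rejection."""
    return 2 * sum(votes) > len(votes)
```

A node with no history gets trust_score(0, 0) = 0.5, and the score moves toward 1 or 0 as good or bad actions accumulate. Trust only affects who is likely to be selected by select_voters; once selected, every voter counts the same in majority_vote, which reflects the mitigation described in Section V-B against nodes that build reputation in order to misbehave later.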