EFTA01612898.pdf

DataSet-10 5 pages 4,295 words document
Decentralizing Privacy: Using Blockchain to Protect Personal Data

Guy Zyskind, MIT Media Lab, Cambridge, Massachusetts. Email: [email protected]
Oz Nathan, Tel-Aviv University, Tel-Aviv, Israel. Email: [email protected]
Alex 'Sandy' Pentland, MIT Media Lab, Cambridge, Massachusetts. Email: [email protected]

The first two authors contributed equally to this work.

Abstract: The recent increase in reported incidents of surveillance and security breaches compromising users' privacy calls into question the current model, in which third parties collect and control massive amounts of personal data. Bitcoin has demonstrated in the financial space that trusted, auditable computing is possible using a decentralized network of peers accompanied by a public ledger. In this paper, we describe a decentralized personal data management system that ensures users own and control their data. We implement a protocol that turns a blockchain into an automated access-control manager that does not require trust in a third party. Unlike Bitcoin, transactions in our system are not strictly financial: they are used to carry instructions, such as storing, querying and sharing data. Finally, we discuss possible future extensions to blockchains that could harness them into a well-rounded solution for trusted computing problems in society.

Keywords: blockchain; privacy; bitcoin; personal data

I. INTRODUCTION

The amount of data in our world is rapidly increasing. According to a recent report [22], it is estimated that 90% of the world's data has been generated in the past couple of years. Facebook, the largest online social network, has collected 300 petabytes of personal data since its inception [1], a hundred times the amount the Library of Congress has collected in over 200 years [13]. In the Big Data era, data is constantly being collected and analyzed, leading to innovation and economic growth. Companies and organizations use the data they collect to personalize services, optimize the corporate decision-making process, predict future trends and more. Today, data is a valuable asset in our economy [21].

While we all reap the benefits of a data-driven society, there is growing public concern about user privacy. Centralized organizations, both public and private, amass large quantities of personal and sensitive information. Individuals have little or no control over the data that is stored about them and how it is used. In recent years, the media has repeatedly covered controversial incidents related to privacy. Among the better-known examples are the story about government surveillance [2] and Facebook's large-scale scientific experiment that was apparently conducted without explicitly informing participants [10].

Related Work. There have been various attempts to address these privacy issues, both from a legislative perspective ([4], [20]) and from a technological standpoint. OpenPDS, a recently developed framework, presents a model for autonomous deployment of a PDS that includes a mechanism for returning computations on the data, thus returning answers instead of the raw data itself [6]. Across the industry, leading companies chose to implement their own proprietary authentication software based on the OAuth protocol [19], in which they serve as centralized trusted authorities.

From a security perspective, researchers have developed various techniques targeting privacy concerns focused on personal data. Data anonymization methods attempt to protect personally identifiable information. k-anonymity, a common property of anonymized datasets, requires that the sensitive information of each record is indistinguishable from at least $k-1$ other records [24]. Related extensions to k-anonymity include l-diversity, which ensures the sensitive data is represented by a diverse enough set of possible values [15], and t-closeness, which looks at the distribution of sensitive data [14]. Recent research has demonstrated how anonymized datasets employing these techniques can be de-anonymized [18], [5], given even a small number of data points or high-dimensional data. Other privacy-preserving methods include differential privacy, a technique that perturbs data or adds noise to the computational process prior to sharing the data [7], and encryption schemes that allow running computations and queries over encrypted data. Specifically, fully homomorphic encryption (FHE) [9] schemes allow any computation to run over encrypted data, but are currently too inefficient to be widely used in practice.

In recent years, a new class of accountable systems has emerged. The first such system was Bitcoin, which allows users to transfer currency (bitcoins) securely without a centralized regulator, using a publicly verifiable open ledger (or blockchain). Since then, other projects (collectively referred to as Bitcoin 2.0 [8]) have demonstrated how these blockchains can serve other functions requiring trusted computing and auditability.

Our Contribution. 1) We combine blockchain and off-blockchain storage to construct a personal data management platform focused on privacy. 2) We illustrate, through our platform and a discussion of future improvements to the technology, how blockchains could become a vital resource in trusted computing.

Organization. Section II discusses the privacy problem we address in this paper. Section III provides an overview of the platform, and Section IV describes the technical implementation in detail. Section V discusses future extensions to blockchains, and concluding remarks are found in Section VI.

II. THE PRIVACY PROBLEM

Throughout this paper, we address the privacy concerns users face when using third-party services. We focus specifically on mobile platforms, where services deploy applications for users to install. These applications constantly collect high-resolution personal data of which the user has no specific knowledge or control. In our analysis, we assume that the services are honest-but-curious (i.e., they follow the protocol). Note that the same system could be used for other data-privacy concerns. One example is patients sharing their medical data for scientific research while retaining the means to monitor how it is used and the ability to opt out instantly. In light of this, our system protects against the following common privacy issues:

Data Ownership. Our framework focuses on ensuring that users own and control their personal data. As such, the system recognizes the users as the owners of the data and the services as guests with delegated permissions.

Data Transparency and Auditability. Each user has complete transparency over what data is being collected about her and how it is accessed.

Fine-grained Access Control. One major concern with mobile applications is that users are required to grant a set of permissions upon sign-up. These permissions are granted indefinitely, and the only way to alter the agreement is by opting out. In our framework, by contrast, the user may alter the set of permissions at any time and revoke access to previously collected data. One application of this mechanism would be to improve the existing permissions dialog in mobile applications. While the user interface is likely to remain the same, the access-control policies would be securely stored on a blockchain, where only the user is allowed to change them.

III. PROPOSED SOLUTION

We begin with an overview of our system. As illustrated in Figure 1, the system comprises three entities:
- Mobile phone users, who are interested in downloading and using applications.
- Services, the providers of such applications, who require processing personal data for operational and business-related reasons (e.g., targeted ads, personalized service).
- Nodes, entities entrusted with maintaining the blockchain and a distributed private key-value data store in return for incentives.

Note that while users in the system normally remain (pseudo) anonymous, we could store service profiles on the blockchain and verify their identity.

Fig. 1. Overview of the decentralized platform.

The system itself is designed as follows. The blockchain accepts two new types of transactions: $T_{access}$, used for access control management, and $T_{data}$, for data storage and retrieval. These network operations could be easily integrated into a mobile software development kit (SDK) that services can use in their development process.

To illustrate, consider the following example. A user installs an application that uses our platform to preserve her privacy. When the user signs up for the first time, a new shared (user, service) identity is generated and sent, along with the associated permissions, to the blockchain in a $T_{access}$ transaction. Data collected on the phone (e.g., sensor data such as location) is encrypted using a shared encryption key and sent to the blockchain in a $T_{data}$ transaction. The blockchain routes the data to an off-blockchain key-value store and retains only a pointer to it on the public ledger; the pointer is the SHA-256 hash of the data.

Both the service and the user can now query the data using a $T_{data}$ transaction with the pointer (key) associated with it. The blockchain then verifies that the digital signature belongs to either the user or the service. For the service, its permissions to access the data are checked as well. Finally, the user can change the permissions granted to a service at any time by issuing a $T_{access}$ transaction with a new set of permissions, including revoking access to previously stored data. Developing a web-based (or mobile) dashboard that gives an overview of one's data and the ability to change permissions is fairly straightforward. It is similar to developing centralized wallets, such as Coinbase for Bitcoin¹.

The off-blockchain key-value store is an implementation of Kademlia [16], a distributed hashtable (DHT). It adds persistence using LevelDB² and an interface to the blockchain. The DHT is maintained by a network of nodes (possibly disjoint from the blockchain network), which fulfill approved read/write transactions. Data are sufficiently randomized across the nodes and replicated to ensure high availability. Alternative off-blockchain solutions could also be considered for storage. For example, a centralized cloud might be used to store the data. While this requires some amount of trust in a third party, it has some advantages in terms of scalability and ease of deployment.

¹ Coinbase bitcoin wallet. http://www.coinbase.com
² LevelDB. http://github.com/google/leveldb

IV. THE NETWORK PROTOCOL

We now describe in detail the underlying protocol used in the system. We utilize the following standard cryptographic building blocks in our platform:
- A symmetric encryption scheme defined by the 3-tuple $(\mathcal{G}_{enc}, \mathcal{E}_{enc}, \mathcal{D}_{enc})$: the generator, encryption and decryption algorithms, respectively.
- A digital signature scheme (DSS) described by the 3-tuple $(\mathcal{G}_{sig}, \mathcal{S}_{sig}, \mathcal{V}_{sig})$: the generator, signature and verification algorithms, respectively. It is implemented using ECDSA with the secp256k1 curve [12].
- A cryptographic hash function $\mathcal{H}$, instantiated by a SHA-256 [11] implementation.

A. Building Blocks

We now briefly introduce relevant building blocks that are used throughout the rest of this paper. We assume familiarity with Bitcoin [17] and blockchains.

1) Identities: Blockchains utilize a pseudo-identity mechanism. A pseudo-identity is essentially a public key, and every user can generate as many such pseudo-identities as she desires in order to increase privacy. We now introduce compound identities, an extension of this model used in our system. A compound identity is a shared identity for two or more parties. Some parties (at least one) own the identity (owners), and the rest have restricted access to it (guests). Protocol 1 illustrates the implementation for a single owner (the user) and a single guest (the service). The identity consists of signing key-pairs for the owner and the guest, as well as a symmetric key used to encrypt (and decrypt) the data, so that the data is protected from all other players in the system. Formally, a compound identity is externally observed (as seen by the network) as the 2-tuple:

$$\mathrm{Compound}^{public}_{u,s} = (pk^{u,s}_{sig},\ pk^{s,u}_{sig}) \tag{1}$$

Similarly, the entire identity (including the private keys) is the following 5-tuple:

$$\mathrm{Compound}_{u,s} = (pk^{u,s}_{sig},\ sk^{u,s}_{sig},\ pk^{s,u}_{sig},\ sk^{s,u}_{sig},\ sk^{u,s}_{enc}) \tag{2}$$

Protocol 1 Generating a compound identity
1: procedure CompoundIdentity($u$, $s$)
2:   $u$ and $s$ form a secure channel
3:   $u$ executes:
4:     $(pk^{u,s}_{sig}, sk^{u,s}_{sig}) \leftarrow \mathcal{G}_{sig}()$
5:     $sk^{u,s}_{enc} \leftarrow \mathcal{G}_{enc}()$
6:     $u$ shares $sk^{u,s}_{enc}, pk^{u,s}_{sig}$ with $s$
7:   $s$ executes:
8:     $(pk^{s,u}_{sig}, sk^{s,u}_{sig}) \leftarrow \mathcal{G}_{sig}()$
9:     $s$ shares $pk^{s,u}_{sig}$ with $u$
10:  // Both $u$ and $s$ have $sk^{u,s}_{enc}, pk^{u,s}_{sig}, pk^{s,u}_{sig}$
11:  return $pk^{u,s}_{sig}, pk^{s,u}_{sig}, sk^{u,s}_{enc}$
12: end procedure

2) Blockchain Memory: We let $L$ be the blockchain memory space, represented as the hashtable $L : \{0,1\}^{256} \rightarrow \{0,1\}^{N}$, where $N \gg 256$, so that $L$ can store sufficiently large documents. We assume this memory to be tamper-proof under the same adversarial model used in Bitcoin and other blockchains. Consider the following simplified, albeit inefficient, implementation; it shows intuitively why such a trusted data store can be built on any blockchain, including Bitcoin. A blockchain is a sequence of timestamped transactions, where each transaction includes a variable number of output addresses (each address is a 160-bit number). $L$ could then be implemented as follows. The first two outputs in a transaction encode the 256-bit memory address pointer, as well as some auxiliary metadata. The rest of the outputs construct the serialized document. When looking up $L[k]$, only the most recent transaction is returned, which allows update and delete operations in addition to inserts.

3) Policy: A set of permissions a user $u$ grants service $s$, denoted by $POLICY_{u,s}$. For example, if $u$ installs a mobile application requiring access to the user's location and contacts, then $POLICY_{u,s} = \{location, contacts\}$. Any type of data could be stored safely this way, assuming the service does not subvert the protocol by labeling the data incorrectly. Safeguards that partially prevent this could be introduced in the mobile SDK. In any case, the user could easily detect a service that cheats, as all changes are visible to her.

4) Auxiliary Functions: $Parse(x)$ de-serializes the message sent to a transaction, which contains the arguments. $CheckPolicy(pk^k_{sig}, x_p)$, illustrated in Protocol 2, verifies that the originator has the appropriate permissions.

Protocol 2 Permissions check against the blockchain
1: procedure CheckPolicy($pk^k_{sig}$, $x_p$)
2:   $s \leftarrow 0$
3:   $a_{policy} = \mathcal{H}(pk^k_{sig})$
4:   if $L[a_{policy}] \neq \emptyset$ then
5:     $pk^{u,s}_{sig}, pk^{s,u}_{sig}, POLICY_{u,s} \leftarrow Parse(L[a_{policy}])$
6:     if $pk^k_{sig} = pk^{u,s}_{sig}$ or
7:        ($pk^k_{sig} = pk^{s,u}_{sig}$ and $x_p \in POLICY_{u,s}$) then
8:       $s \leftarrow 1$
9:     end if
10:  end if
11:  return $s$
12: end procedure

B. Blockchain Protocols

Here we provide a detailed description of the core protocols executed on the blockchain. Protocol 3 is executed by nodes in the network when a $T_{access}$ transaction is received, and similarly, Protocol 4 is executed for $T_{data}$ transactions. As mentioned earlier, $T_{access}$ transactions allow users to change the set of permissions granted to a service by sending a $POLICY_{u,s}$ set. Sending the empty set revokes all access rights previously granted. Sending a $T_{access}$ transaction with a new compound identity for the first time is interpreted as a user signing up to a service. Similarly, $T_{data}$ transactions govern read/write operations. With the help of CheckPolicy, only the user (always) or the service (if allowed) can access the data. Note that in lines 9 and 12 of Protocol 4 we use shorthand notation for accessing the DHT like a normal hashtable. In practice, these instructions result in an off-blockchain network message (either read or write) that is sent to the DHT.

Protocol 3 Access Control Protocol
1: procedure HandleAccessTX($pk^k_{sig}$, $m$)
2:   $s \leftarrow 0$
3:   $pk^{u,s}_{sig}, pk^{s,u}_{sig}, POLICY_{u,s} = Parse(m)$
4:   if $pk^k_{sig} = pk^{u,s}_{sig}$ then
5:     $L[\mathcal{H}(pk^k_{sig})] = m$
6:     $s \leftarrow 1$
7:   end if
8:   return $s$
9: end procedure

Protocol 4 Storing or Loading Data
1: procedure HandleDataTX($pk^k_{sig}$, $m$)
2:   $c, x_p, rw = Parse(m)$
3:   if $CheckPolicy(pk^k_{sig}, x_p) = True$ then
4:     $pk^{u,s}_{sig}, pk^{s,u}_{sig}, POLICY_{u,s} \leftarrow Parse(L[\mathcal{H}(pk^k_{sig})])$
5:     $a_{x_p} = \mathcal{H}(pk^{u,s}_{sig} \,\|\, x_p)$
6:     if $rw = 0$ then   ▷ $rw = 0$ for write, 1 for read
7:       $h_c = \mathcal{H}(c)$
8:       $L[a_{x_p}] \leftarrow L[a_{x_p}] \cup \{h_c\}$
9:       (DHT) $ds[h_c] \leftarrow c$
10:      return $h_c$
11:    else if $c \in L[a_{x_p}]$ then
12:      (DHT) return $ds[c]$
13:    end if
14:  end if
15:  return $\emptyset$
16: end procedure

C. Privacy and Security Analysis

We rely on the blockchain being tamper-free, an assumption that requires a sufficiently large network of untrusted peers. In addition, we assume that the user manages her keys securely, for example using a secure centralized wallet service. We now show how our system protects against adversaries compromising nodes in the system. Currently, we are less concerned about malicious services that change the protocol or record previously read data, as they are likely to be reputable; we discuss a possible solution for such behavior in Section V-A.

Given this model, only the user has control over her data. The decentralized nature of the blockchain combined with digitally signed transactions ensures that an adversary cannot pose as the user or corrupt the network. Doing so would imply that the adversary forged a digital signature or gained control over the majority of the network's resources. Similarly, an adversary cannot learn anything from the public ledger, as only hashed pointers are stored in it.

An adversary controlling one or more DHT nodes cannot learn anything about the raw data, as it is encrypted with keys that none of the nodes possess. Data integrity is not ensured at each individual node, since a single node can tamper with its local copy or act in a Byzantine way. In practice, however, we can still minimize this risk through sufficient distribution and replication of the data.

Finally, generating a new compound identity for each user-service pair guarantees that only a small fraction of the data is compromised in the event of an adversary obtaining both the signing and encryption keys. If the adversary obtains only one of the keys, the data is still safe. In practice, we could further split the identities to limit the exposure of a single compromised compound identity. For example, we can generate new keys for every hundred records stored.

V. DISCUSSION OF FUTURE EXTENSIONS

In this section, we slightly digress to present possible future extensions to blockchains. Compared with current state-of-the-art systems, these could play a significant role in shaping more mature distributed trusted computing platforms. More specifically, they would greatly increase the usefulness of the platform presented earlier.

A. From Storage to Processing

One of the major contributions of this paper is demonstrating how to overcome the public nature of the blockchain. So far, our analysis has focused on storing pointers to encrypted data. While this approach is suitable for storage and random queries, it is not very efficient for processing data. More importantly, once a service queries a piece of raw data, it could store it for future analysis.

A better approach might be to never let a service observe the raw data. Instead, the service would run computations directly on the network and obtain only the final results. If we split data into shares (e.g., using Shamir's Secret Sharing [23]) rather than encrypting them, we could then use secure Multi-party Computation (MPC) to securely evaluate any function.

In Figure 2, we illustrate how MPC might work with blockchains, and specifically in our framework. Consider a simple example in which a city holds an election and wishes to allow online secret voting. It develops a mobile application for voting that makes use of our system, now augmented with the proposed MPC capabilities. After the online election takes place, the city submits its back-end code to aggregate the results. The network selects a subset of nodes at random, and an interpreter transforms the code into a secure MPC protocol. Finally, the results are stored on the public ledger, where they are safe against tampering. As a result, no one learns what the individual votes were, but everyone can see the results of the election.

[Figure 2 panels: "procedure EVote(⟨v_1⟩, …, ⟨v_n⟩)", "NET Computes", "MPC Computes"]

Fig. 2. Example of a flow of secure computation in a blockchain network. The top-left block (EVote procedure) is the insecure code, where the arguments marked in ⟨·⟩ are private and stored as shares on the DHT. The network selects a subset of nodes at random to compute a secure version of EVote and broadcasts the results back to the entire network, which stores them on the ledger.

B. Trust and Decision-Making in Blockchains

Bitcoin, and blockchains in general, assume that all nodes are equally untrusted. Each node's share in the collective decision-making process is based solely on its computational resources, a scheme known as the Proof-of-Work algorithm [17]. In other words, for every node $n$, $trust_n \propto resources(n)$ (probabilistically) decides the node's weight in votes. This leads to adverse effects, most notably vulnerability to Sybil attacks, excessive energy consumption and high latency.

Intuitively, Proof-of-Work reasons that nodes which pour significant resources into the system are less likely to cheat. Using similar reasoning, we could define a new dynamic measure of trust based on node behavior, such that good actors that follow the protocol are rewarded. Specifically, we could set the trust of each node as the expected value of it behaving well in the future. Since we are dealing with a binary random variable, this expected value is simply the probability $p$. A simple way to approximate this probability is to count the number of good and bad actions a node takes, then use the sigmoid function to squash the difference into a probability. In practice, at every block $i$ we should re-evaluate the trust score of every node $n$ as

$$trust^{(i)}_{n} = \frac{1}{1 + e^{-\alpha(\#good - \#bad)}} \tag{3}$$

where $\alpha$ is simply the step size.

With this measure, the network could give more weight to trusted nodes and compute blocks more efficiently. Since it takes time to earn trust in the system, it should be resistant to Sybil attacks. This mechanism could potentially attract other types of attacks, such as nodes increasing their reputation only to act maliciously at a later time. This might be mitigated by randomly selecting several nodes, weighted by their trust, to vote on each block, and then taking the equally weighted majority vote. This should prevent single actors from having too much influence, regardless of their trust level.

VI. CONCLUSION

Personal data, and sensitive data in general, should not be entrusted to third parties, where they are susceptible to attacks and misuse. Instead, users should own and control their data without compromising security or limiting companies' and authorities' ability to provide personalized services. Our platform enables this by combining a blockchain, re-purposed as an access-control moderator, with an off-blockchain storage solution. Users are not required to trust any third party and are always aware of the data that is being collected about them and how it is used. In addition, the blockchain recognizes the users as the owners of their personal data. Companies, in turn, can focus on utilizing data without being overly concerned about properly securing and compartmentalizing it.

Furthermore, with a decentralized platform, making legal and regulatory decisions about collecting, storing and sharing sensitive data should be simpler. Moreover, laws and regulations could be programmed into the blockchain itself, so that they are enforced automatically. In other situations, the ledger can act as legal evidence for accessing (or storing) data, since it is (computationally) tamper-proof.

Finally, we discussed several possible future extensions for blockchains that could harness them into a well-rounded solution for trusted computing problems in society.

REFERENCES

[1] Scaling the Facebook data warehouse to 300 PB. 2014.
[2] James Ball. NSA's Prism surveillance program: how it works and what it can do. The Guardian, 2013.
[3] Michael Ben-Or, Shafi Goldwasser, and Avi Wigderson. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, pages 1–10. ACM, 1988.
[4] European Commission. Commission proposes a comprehensive reform of data protection rules to increase users' control of their data and to cut costs for businesses. 2012.
[5] Yves-Alexandre de Montjoye, Cesar A. Hidalgo, Michel Verleysen, and Vincent D. Blondel. Unique in the crowd: The privacy bounds of human mobility. Scientific Reports, 3, 2013.
[6] Yves-Alexandre de Montjoye, Erez Shmueli, Samuel S. Wang, and Alex Sandy Pentland. openPDS: Protecting the privacy of metadata through SafeAnswers. PLoS ONE, 9(7):e98790, 2014.
[7] Cynthia Dwork. Differential privacy. In Automata, Languages and Programming, pages 1–12. Springer, 2006.
[8] Jon Evans. Bitcoin 2.0: Sidechains and Ethereum and Zerocash, oh my! 2014.
[9] Craig Gentry. Fully homomorphic encryption using ideal lattices. In STOC, volume 9, pages 169–178, 2009.
[10] Vindu Goel. Facebook tinkers with users' emotions in news feed experiment, stirring outcry. The New York Times, 2014.
[11] Federal Information Processing Standards. FIPS PUB 180-4: Secure Hash Standard (SHS). March 2012.
[12] Don Johnson, Alfred Menezes, and Scott Vanstone. The elliptic curve digital signature algorithm (ECDSA). International Journal of Information Security, 1(1):36–63, 2001.
[13] Michael Lesk. How much information is there in the world?
[14] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In ICDE, volume 7, pages 106–115, 2007.
[15] Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):3, 2007.
[16] Petar Maymounkov and David Mazières. Kademlia: A peer-to-peer information system based on the XOR metric. In Peer-to-Peer Systems, pages 53–65. Springer, 2002.
[17] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. Consulted, 1(2012):28, 2008.
[18] Arvind Narayanan and Vitaly Shmatikov. How to break anonymity of the Netflix prize dataset. arXiv preprint cs/0610105, 2006.
[19] Juan Perez. Facebook, Google launch data portability programs to all. 2008.
[20] RT.com. Obama announces legislation protecting personal data, student digital privacy. 2015.
[21] K. Schwab, A. Marcus, J.O. Oyola, W. Hoffman, and M. Luzi. Personal data: The emergence of a new asset class. In An Initiative of the World Economic Forum, 2011.
[22] ScienceDaily. Big data, for better or worse: 90% of world's data generated over last two years. 2013.
[23] Adi Shamir. How to share a secret. Communications of the ACM, 22(11):612–613, 1979.
[24] Latanya Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557–570, 2002.
ℹ️ Document Details
SHA-256
3692a1b9d90831455b488a14a104e0fb1875624b21eca12163f9aa8d463f946b
Bates Number
EFTA01612898
Dataset
DataSet-10
Type
document
Pages
5