I have not written a new entry in this blog in some time, as I have been very busy at work and my spare cycles have been consumed with watching and tweeting (@tglocer) about the slow-motion car crash that is the Trump presidency. At times it feels like The Donald is sucking all the oxygen out of the public square – which is undoubtedly one of the objectives of his barrage of daily outrages. Nonetheless, I have been thinking a lot lately about data – who owns them, the economic power of the large platforms that de facto possess them, and what, if anything, we should do about it.

As we approach the 25th anniversary of the World Wide Web, it should require little proof that vast repositories of our personal data are held by large corporations such as Facebook, Netflix and Equifax, as well as by national governments, primarily the US and China. These vast databases were built up with little fanfare by organizations that tracked and stored our every choice, like, retweet and movement online. Each individual data point is generally harmless; taken as a whole and analyzed by computer algorithms (especially those deploying machine learning techniques), however, these points can seriously compromise our privacy. For example, the license plate on my car is visible for all to see, and all are free to jot down my plate number if I am involved in an accident; however, commercial data vendors will also sell you access to their databases of billions of license plate numbers, time- and geo-stamped for analysis. With a little simple co-occurrence analysis on this seemingly innocuous dataset, I can discover whether your car happens to be parked at 9pm in a motel parking lot next to the car of a married co-worker more often than random chance would suggest. Getting worried?
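
For the technically curious, here is a minimal sketch of how such an analysis might work, assuming only that each vendor record contains a plate number, a timestamp and a location. The field layout, the one-hour time bucket and the roughly 100-meter location grid are my own illustrative choices, not a description of any real vendor's product.

```python
from collections import defaultdict
from itertools import combinations

# Each record: (plate, unix_timestamp, lat, lon). The layout is assumed for
# illustration, not taken from any real vendor feed.
sightings = [
    ("ABC123", 1_660_000_000, 40.7411, -73.9897),
    ("XYZ789", 1_660_000_900, 40.7412, -73.9895),
    ("ABC123", 1_660_090_000, 40.7411, -73.9897),
    ("XYZ789", 1_660_090_600, 40.7413, -73.9896),
]

def bucket(ts, lat, lon, hours=1, grid=0.001):
    """Coarsen time to the hour and location to roughly 100 m so nearby sightings collide."""
    return (ts // (3600 * hours), round(lat / grid), round(lon / grid))

# Group the plates seen in the same time/place bucket...
seen_in_bucket = defaultdict(set)
for plate, ts, lat, lon in sightings:
    seen_in_bucket[bucket(ts, lat, lon)].add(plate)

# ...then count how often each pair of plates turns up together.
together = defaultdict(int)
for plates in seen_in_bucket.values():
    for pair in combinations(sorted(plates), 2):
        together[pair] += 1

# Pairs that keep co-occurring are the "more often than chance" signal.
print(sorted(together.items(), key=lambda kv: -kv[1]))
```

Nothing in this sketch is clever; the threat comes entirely from the scale and completeness of the underlying dataset.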

The European Union has historically championed the privacy rights of individuals over the desires of private companies to store information. In May 2014 the European Court of Justice formalized a “right to be forgotten” in a case brought against Google to suppress search results (see Google Spain SL, Google Inc. v Agencia Española de Protección de Datos, Mario Costeja González (2014)). This right, along with a much broader set of rules governing consent to data collection, privacy and data protection, has now been codified into EU law via the General Data Protection Regulation (GDPR). The short-term effect of the new regulation has been an avalanche of browser pop-ups seeking user consent, but enactment of the GDPR will likely mark the end of an era of unconstrained database-building by the Facebooks and Amazons of this world.

While the State of California has followed the EU lead (see California Consumer Privacy Act of 2018), the US has in general been slow to regulate the balance of power between corporate interests building deep and broad databases and the rights of individuals to use online services while limiting data collection about themselves. There have, however, been increasing calls to use novel interpretations of competition law to seek the break-up of large platforms that are viewed as having monopoly-scale power. See, e.g., The Antitrust Case Against Facebook, Google and Amazon. While such challenges go well beyond data privacy concerns, I believe that the asymmetrical power of these platforms to collect and exploit our data may present the greatest threat to competition. The data that Facebook collects about my likes and dislikes, or that Amazon amasses about my purchase habits, already provide a strong competitive advantage. However, as these and other companies introduce ever more powerful machine learning algorithms, their massive stores of user data become an insurmountable barrier to entry. Just ask yourself whose autonomous vehicle you would feel safer riding in: one trained on the driving experience of every user of Google Maps and Waze, or one trained on data supplied by every driver of an Alfa Romeo in the US?

I would like to suggest a third way to address the imbalance of power between the data-platform goliaths and all of us Davids – a technology solution, but one that will likely require a bit of collective action, social or political, to develop. What if we could flip the data model and provide each citizen with a secure, encrypted digital “box” that would hold all of our browsing and search history, all of our driving and geolocation data, all of our health records and Fitbit data, all of our media consumption history and photo libraries, and so on? Each of us would then be free to decide whether we wished to license our search history to Google for a fixed period in exchange for a small payment, or separately license our media-watching preferences to Netflix.
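
To make the idea a little more concrete, here is a minimal sketch of what such a personal data “box” might look like in code. Everything in it is illustrative: the PersonalDataVault and LicenseGrant names, the category labels and the payment field are stand-ins of my own invention, not a reference to any existing product or standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class LicenseGrant:
    licensee: str          # e.g. "google.com"
    category: str          # e.g. "search_history"
    expires_at: datetime   # access ends when the term runs out
    payment_cents: int     # what the licensee paid for the term

@dataclass
class PersonalDataVault:
    """One citizen's encrypted 'box' of personal data, under his or her sole control."""
    owner: str
    data: dict = field(default_factory=dict)    # category -> encrypted blob
    grants: list = field(default_factory=list)  # outstanding licenses

    def grant_license(self, licensee, category, days, payment_cents):
        """License one category of data to one company for a fixed period."""
        grant = LicenseGrant(
            licensee=licensee,
            category=category,
            expires_at=datetime.now(timezone.utc) + timedelta(days=days),
            payment_cents=payment_cents,
        )
        self.grants.append(grant)
        return grant

# Example: license 90 days of search history to a search provider for $5.00.
vault = PersonalDataVault(owner="alice")
vault.data["search_history"] = b"<encrypted blob>"
grant = vault.grant_license("google.com", "search_history", days=90, payment_cents=500)
```

The essential inversion is that the grant lives with the citizen rather than with the platform: Google would hold a time-limited license to my search history, not the history itself.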

This distributed data model is now technologically feasible. There are several initiatives underway, including the Solid (Social Linked Data) project at MIT led by Tim Berners-Lee, the scientist credited with developing the World Wide Web in 1989. I see greater promise in the early efforts to build a more decentralized solution based on the inherently distributed trust model of the blockchain. So-called dApps, or decentralized applications, could provide one such model, especially if combined with individual digital wallets that could collect the micropayments associated with the licensing of our personal data. A dApp is similar to the existing internet applications with which we are familiar, but rather than taking a hub-and-spoke approach to data and processing, dApps take the form of a mesh of peer-to-peer autonomous endpoints.
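
Continuing the hypothetical vault sketch above, a single node in such a mesh might behave roughly as follows. The HMAC token is a stand-in for the real public-key signatures and on-chain payment rails a production dApp would use, and the wallet is just a counter rather than a blockchain account.

```python
import hashlib
import hmac
from datetime import datetime, timezone

class PeerNode:
    """One autonomous endpoint in the mesh, holding a single user's vault."""

    def __init__(self, vault, signing_key: bytes):
        self.vault = vault               # the PersonalDataVault sketched above
        self.signing_key = signing_key   # known only to the data owner
        self.wallet_cents = 0            # collects licensing micropayments

    def _token_for(self, grant) -> str:
        msg = f"{grant.licensee}|{grant.category}|{grant.expires_at.isoformat()}"
        return hmac.new(self.signing_key, msg.encode(), hashlib.sha256).hexdigest()

    def issue_token(self, grant) -> str:
        """Owner side: collect the licensing payment and hand the licensee a token."""
        self.wallet_cents += grant.payment_cents
        return self._token_for(grant)

    def handle_request(self, requester, category, grant, token):
        """Peer side: serve data only against a valid, unexpired, owner-issued token."""
        if grant.licensee != requester or grant.category != category:
            return None
        if datetime.now(timezone.utc) > grant.expires_at:
            return None                  # the term has lapsed, so access simply ends
        if not hmac.compare_digest(token, self._token_for(grant)):
            return None                  # the token was not issued by this owner
        return self.vault.data.get(category)

# Example: Google presents its token to Alice's node and receives the licensed data.
node = PeerNode(vault, signing_key=b"alice-secret")
token = node.issue_token(grant)
assert node.handle_request("google.com", "search_history", grant, token) is not None
```

In this toy model one Python class both holds the data and collects the fee; in a real dApp those roles would be enforced by smart contracts and cryptographic identity rather than by convention.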

In addition to restoring balance to our relationships with the corporate exploiters of our data, such a distributed model would enhance our security against cyberattack by eliminating the inviting centralized targets that today's giant databases present, and would allow us more granular control over who gets to use our data, for what, and for how long. It would also finally provide a robust technical framework for implementing the long-sought European ideal of the right to be forgotten. Individual data licenses could be granted for limited terms, with all access removed after expiry.

It is not too late to reclaim our right not only to be forgotten but also to profit from the data that we rightfully own.