A key aspect of consumer privacy protection is transparency. Yet despite a wealth of privacy policies and other texts pertinent to organizations' privacy practices, the difficulty of analyzing those documents has been an obstacle to understanding digital privacy at scale. Efforts to leverage the information in privacy policies, terms of service agreements, cookie notices, privacy laws, and other privacy-related documents suffer from a lack of resources with sufficient breadth and depth to cover the privacy landscape.
In response, this project is creating infrastructure to serve as a repository for the large-scale collection of privacy-related documents found online. A core focus of the project is the dissemination of resources for privacy researchers, practitioners, and policymakers, including search engines, corpora, pre-trained language models, APIs, and analysis tools and results. Collecting these documents has three benefits: it (1) enables surveys of the privacy landscape with previously unattainable coverage and accuracy, (2) supports legal and public policy analyses, and (3) enables researchers to build technologies that bridge the gap between internet users' privacy expectations and the contents of the documents that influence or describe organizations' privacy practices.
The research team is building a large-scale, longitudinal, annotated, and searchable resource of privacy-related documents: privacy policies, terms of service agreements, cookie notices, privacy laws in the U.S. and around the world, regulatory guidelines, and other related texts on the internet. The team is advancing natural language processing (NLP) techniques for large-scale interpretation of these documents and analyzing the state of privacy at an unprecedented scale. It is also removing barriers to creating tools that use large amounts of data about privacy practices and regulations to provide insights and recommendations. Research topics include advancing NLP for legal text, identifying privacy norms and outliers for sectors of commerce, and finding effective ways to communicate changes in privacy-related documents to consumers, researchers, and policymakers. This research is helping to realize long-standing goals of making privacy manageable for consumers, regulators, and others invested in the world's evolving information society.