Faculty Scholarship

The Great Scrape: The Clash Between Scraping and Privacy

Document Type

Article

Publication Date

2025

ISSN

0008-1221

Publisher

University of California Berkeley School of Law

Language

en-US

Abstract

Artificial intelligence (AI) systems depend on massive quantities of data, often gathered by “scraping”—the automated extraction of large amounts of data from the internet. A great deal of scraped data contains people’s personal information. This personal data provides the grist for AI tools such as facial recognition, deep fakes, and generative AI. Although scraping enables web searching, archiving of records, and meaningful scientific research, scraping for AI can also be objectionable and even harmful to individuals and society.

Organizations are scraping at an escalating pace and scale, even though many privacy laws are seemingly incongruous with the practice. In this Article, we contend that scraping must undergo a serious reckoning with privacy law. Scraping violates nearly all of the key principles of privacy laws, including fairness, individual rights and control, transparency, consent, purpose specification and secondary use restrictions, data minimization, onward transfer, and data security. Scraping ignores the data protection laws built around these requirements.

Scraping has evaded a reckoning with privacy law largely because scrapers act as if all publicly available data were free for the taking. But the public availability of scraped data shouldn’t give scrapers a free pass. Privacy law regularly protects publicly available data, and privacy principles are implicated even when personal data is accessible to others.

Comments

Issue release forthcoming
Updated with published version of article on 10/14/2025
Draft available as additional file download

Recommended Citation

Daniel J. Solove & Woodrow Hartzog, The Great Scrape: The Clash Between Scraping and Privacy, 113 California Law Review 1521 (2025).
Available at: https://scholarship.law.bu.edu/faculty_scholarship/3917

Included in

Computer Sciences Commons, Internet Law Commons, Privacy Law Commons, Science and Technology Law Commons
