A crucial question of Fourth Amendment law has recently divided courts: When government agents conduct a digital scan through a massive database, how much of a “search” occurs? The issue pops up in contexts ranging from geofence warrants and reverse keyword searches to the installation of Internet pen registers. When a government agent runs a filter through a massive database, resulting in a list of hits, is the scale of the search determined by the size of the database, the filter setting, or the filter output? Fourth Amendment law is closely attuned to the scale of a search. No search means no Fourth Amendment oversight, small searches ordinarily require warrants, and limitless searches are categorically unconstitutional. But how broad is a data scan?
This essay argues that the Fourth Amendment implications of data scans should be measured primarily by filter settings. Whether a search occurs, and how far it extends, should be based on what information is exposed to human observation. This standard demands a contextual analysis of what the output reveals about the dataset based on the filter setting. Data that passes through a filter is searched or not searched depending on whether the filter is set to expose that specific information. The proper question is what information is expressly or implicitly exposed, not what raw data passes through the filter or the raw data output. The implications of this approach are then evaluated for a range of important applications, among them geofence warrants, reverse keyword searches, and Internet pen registers.
The idea for this article started with my blog posts here reacting to the Fifth Circuit’s geofence warrant ruling in United States v. Smith, but I think the issue is one that applies more broadly. Indeed, the more that lower courts construe the Fourth Amendment broadly on what data is protected, the more Fourth Amendment protection depends on how you answer the scanning question.
This is a first draft, and comments are very welcome. I especially welcome comments on the technology discussions (mostly in Section I), including about whether I get the basics correct, whether the examples and analogies work, and whether the terminology is on or off. Thanks.