Does collecting big data make us safer?

The idea of big data, the bane of privacy and civil liberties activists, now conjures up Edward Snowden, the NSA, and mass surveillance. It’s also regularly presented as a critical tool for national security. So does it really keep us safe?

Intelligence organisations argue that collecting big data helps keep us safe by providing the information needed to thwart terrorists and other nefarious actors. When we talk about big data, we are also talking about its collection and analysis. For organisations like the NSA, charged with the directive to ‘process, analyze, produce, and disseminate signals intelligence information and data,’ it is no surprise that big data represents a holy grail of sorts. But data alone isn’t an intelligence product; it is the organisation, interpretation, and analysis of data that matter.

The argument is that big data is worth what it costs. As one former U.S. intelligence official described Gen. Keith Alexander’s approach, ‘rather than look for a single needle in the haystack…let’s collect the whole haystack.’ After all, why would an intelligence organisation not want all the data it can get?

Although the official line of ‘54 thwarted terrorist plots’ has received its fair share of scrutiny, there is little doubt that the vast stores of data gathered by the NSA have produced at least some actionable intelligence. The important question, however, is whether this method produces results that warrant the costs, including to privacy, compared to more targeted investigations.

The primary challenge for any big data operation is resources. The procurement, storage, and processing of such a great volume of data are resource- and labor-intensive. The NSA’s newest data center in Utah is slated to cost taxpayers $1.4 billion, and that’s not including the supercomputers that’ll reside there. Operating costs are also significant. The leaked Black Budget points to $10.8 billion in funding and upwards of 35,000 employees slated for NSA operations, which places the agency second only to the CIA in intelligence spending.

This kind of data collection also impacts the resulting analysis. Gathering a database of records as large as the Library of Congress every six hours is no simple task, but sifting through it is much more challenging.

Transforming the ‘haystack’ into intelligence is the trick. Analysis has to rely on identifiers, keywords, and pattern recognition. And, true to the haystack metaphor, most of the data collected is irrelevant. Some 72.1 percent of email communications are spam, and only a minuscule proportion of intercepted communications are mission relevant. Automation helps, but programmed searches often lack the refinement and depth needed for detailed investigatory work.

Despite these vast troves of data, events ranging from the Arab Spring to the Boston bombing still fall through the cracks. Even big data gathering won’t always identify and thwart the “lone wolves and small terrorist cells embracing violent rightwing extremist ideology” that the Department of Homeland Security has long deemed to be “the most dangerous domestic terrorism threat in the United States.” And there are ways to bypass the type of wide-net intelligence gathering practices revealed by Edward Snowden. In fact, the United States government is in many ways a major provider of such bypasses, from the State Department’s support of Tor development even to the United States Postal Service. The sweeping surveillance of conventional communications channels is too broad to consistently identify outlier threats, while more organized or systemic threats can take advantage of the overreliance on this data.

The appeal of big data for intelligence agencies is undeniable. But data lacks meaning until it has been analysed, and getting to the pertinent pieces of information in a big data set is no easy task. While there is clearly value in this approach, there appears to be an obsession with big data in the NSA and other intelligence organisations around the world. An overreliance on this method at the expense of others would be detrimental to their respective national security functions.

Many have highlighted the challenge of balancing liberties and security, and this is a vital debate. But surprisingly little attention has been paid to whether mass surveillance is actually effective, even as the approach is defended in the name of counterterrorism. The answers are, of course, hidden behind layers of security clearance. But to judge whether big data warrants its cost to privacy, we need to develop a sense of whether it’s really more effective than more targeted and warranted investigations.

Klee Aiken will be joining ASPI as an analyst in the International Cyber Policy Centre.