Well, this is certainly not great: An unprotected database of more than a billion users’ records from across the internet — including “social media accounts, email addresses, and phone numbers” — was discovered on an unidentified Elasticsearch server that could be accessed by anyone with the server’s web address.
What’s even weirder, according to Bloomberg, is that no one is exactly sure how it got there.
The discovery was made in October by cybersecurity experts Bob Diachenko and Vinny Troia; the 4 terabytes of data they found also included Facebook, Twitter, and LinkedIn profile information. All told, the server contained information on four billion user accounts and 650 million unique email addresses, affecting 1.2 billion people.
As WIRED points out, though, it’s important to keep in mind what the data does not include: things like passwords and credit card numbers. So at least there’s that! Troia also told WIRED that the server is no longer online and that he reported its presence to the FBI.
While it’s unknown how the data got to be on this server, there are a few things Troia was able to uncover. First, it seems like the data came from multiple datasets, some of it from data broker People Data Labs (PDL), which provides “data enrichment.” (TL;DR: It provides data points on internet users so brands can create more specific content with which to target these users.)
Second, the server the information was found on did not belong to PDL. Troia reports that PDL “appears to use Amazon Web Services” for its servers, while the mystery data-laden server was residing (again, unprotected) on Google’s Cloud Services. Neither the server nor the data was controlled by Google.
Troia and Sean Thorne, co-founder of PDL, both indicated to WIRED that the data probably wasn’t obtained via a breach of PDL, but may instead have been obtained legitimately by a customer who bought it for data enrichment purposes and then left it unprotected.
Said Thorne, “The owner of this server likely used one of our enrichment products, along with a number of other data enrichment or licensing services. Once a customer receives data from us, or any other data providers, the data is on their servers and the security is their responsibility.”
To compare the data he found with what PDL had, Troia created a free PDL account, which includes 1,000 searches per month, and cross-checked the PDL search results for dozens of people against the data from the unprotected server. He found a nearly complete match, supporting his theory that PDL was the source of much of the data. Only users’ education information was left out of the found data.
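For the curious, the kind of cross-check Troia describes can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the field names, the sample records, and the `field_match_rates` helper are all hypothetical, and nothing here reflects PDL’s actual API or whatever tooling Troia used.

```python
from typing import Dict, List

Record = Dict[str, str]


def field_match_rates(
    reference: List[Record], found: List[Record], key: str = "email"
) -> Dict[str, float]:
    """For each non-key field in the reference records, return the fraction
    of reference records whose value appears unchanged in the found record
    that shares the same key."""
    # Index the found records by the shared key (assumed here: email address).
    found_by_key = {r[key]: r for r in found if key in r}
    totals: Dict[str, int] = {}
    matches: Dict[str, int] = {}
    for ref in reference:
        other = found_by_key.get(ref.get(key, ""), {})
        for field, value in ref.items():
            if field == key:
                continue
            totals[field] = totals.get(field, 0) + 1
            if other.get(field) == value:
                matches[field] = matches.get(field, 0) + 1
    return {f: matches.get(f, 0) / totals[f] for f in totals}


# Entirely made-up sample data: "reference" stands in for broker search
# results, "found" for records pulled from the exposed server.
reference = [
    {"email": "a@example.com", "phone": "555-0100",
     "linkedin": "in/a", "education": "State U"},
    {"email": "b@example.com", "phone": "555-0101",
     "linkedin": "in/b", "education": "Tech College"},
]
found = [
    {"email": "a@example.com", "phone": "555-0100", "linkedin": "in/a"},
    {"email": "b@example.com", "phone": "555-0101", "linkedin": "in/b"},
]

rates = field_match_rates(reference, found)
print(rates)
```

In this toy sample, every field lines up except education, which is missing from the found records. That mirrors the pattern Troia reported, though a real comparison would also have to deal with formatting differences and partial records.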
Troia also told WIRED it’s possible that some of the data came from another data broker, Oxydata, which denied that any sort of breach of its data had occurred. That means this data, too, could have been obtained completely legitimately.
In one more act of public service, Troia supplied the data to breach clearinghouse HaveIBeenPwned, which allows users to see if their accounts have been compromised.
The scariest thing, as Troia points out, is that if this really is just gross mismanagement of legitimately obtained data, there’s little to be done in terms of holding anyone accountable for the breach.
“Because of obvious privacy concerns, cloud providers will not share any information on their customers, making this a dead end,” Troia writes. “Agencies like the FBI can request this information through legal process (a type of official Government request), but they have no authority to force the identified organization to disclose the breach.”
We’ve reached out to Google for comment, but it’s doubtful they can say anything that’ll make us feel better about this whole thing.