The AI Supply Chain Runs on Ignorance




Tech companies often fail to tell users how their data will be employed. Sometimes, the firms can’t even anticipate it themselves.

The users posting photos to Ever, a mobile and desktop app similar to Flickr and Photobucket, had a choice. If they opted into facial recognition, the app’s software could analyze photo subjects’ faces, which meant it could group photos, let users search photos by the people in them, suggest tags, and make it easier to find friends and family using the app.

For users, this is tidy and convenient. For Ever, it’s lucrative: NBC News reported last week that Ever licenses its facial-recognition system, trained on user photos, to law-enforcement agencies and the U.S. military. As more people opt in to facial recognition, the system grows more advanced. Ever did not respond to requests for comment from The Atlantic, but privacy advocates are outraged.

Users are “effectively being conscripted to help build military and law enforcement weapons and surveillance systems,” says Jake Laperruque, the senior counsel at the Project on Government Oversight. Had users been explicitly informed about the military connection, he says, they might have chosen not to enable facial recognition.

Many AI products, from facial recognition to Amazon’s Alexa speakers, follow the same conceit as Ever. Humans generate data as they use the products, while a worldwide network of contract workers in places like India and Romania labels and refines the data, making the software smarter and more reliable. Silicon Valley’s model for improving AI rests on obscuring how many humans are involved in that process and keeping them all in the dark about how those data are used down the line.

Experts who study AI’s supply chain, particularly how automation hides human labor, note that each vector of human involvement comes with a way to keep those humans from knowing what’s going on. Long, opaque terms-of-service agreements conceal from users how their data are used. The contract workers who process those data are also kept out of the loop.

Because the raw data furnished by users and refined by workers are so mutable, both parties are kept in the dark about what they’re doing, says Mary Gray, a senior researcher at Microsoft Research and a fellow at Harvard University’s Berkman Klein Center for Internet and Society. The first step to ethical AI, according to Gray, is to expose how obfuscation is built into the supply chain. “Think about it like food,” she says. “When you know the conditions of the people who are growing and picking food, you also know the conditions of the food you’re eating.”

When the people making or refining the data aren’t informed about how those data are being used, they can’t act to stop third parties from employing them for ends they may consider immoral. Last year, for example, Gizmodo reported the existence of Project Maven, a contract between Google and the military to improve the vision systems that drones use. A later investigation by The Intercept found that neither Google employees nor the contract ghost workers doing basic labeling were aware of what their work was used for. After Gizmodo exposed the project, Google workers called for its termination. Although Ever users opted in to face recognition, the users contacted by NBC said they never would have consented had they known about the military connection.

“For the most part, companies are collecting this data and trying to bundle it up as something they can sell to somebody who might be interested in it,” Gray said. “Which means they don’t know what it’s going to be used for either.”

Companies can mine data to be scraped and used later, with the original user base having no clue what the ultimate purpose down the line is. In fact, companies themselves may not actually know who’ll buy the data later, and for what purpose, so they bake vague permissions into their terms of service. Faced with thousands of words of text, users hit “I Agree,” but neither they nor the company actually know what the risks are. All this makes obtaining informed consent extremely difficult, and terms-of-service agreements patently absurd. Take, for example, the U.K. technology company that included a “community service clause” in its terms of service, binding users to provide janitorial services to the company.

In early 2018, the users and makers of a fitness-tracking app called Strava learned that lesson firsthand when the app revealed the locations of secret military bases in Afghanistan and Somalia. Strava connects to smartphones and Fitbits, not just measuring exercise goals, but also using GPS data to create “heatmaps” of where users run. These heatmaps revealed undisclosed military bases. Some 27 million people were using Strava, all of whom, presumably, consented to its terms of service. But what they agreed to was fitness tracking, not international espionage. Everyone involved, including the app’s makers, was stunned to see what the data could be used to do.

Sidney Fussell is a staff writer at The Atlantic, where he covers technology.
