AI and Theft: A Thought Experiment

What's in the box?

Aaron Ross Powell

Jul 1, 2024 • 7 min read

You’re standing in front of a large box, maybe eight feet on a side. It’s unmarked except for two openings, each about the size of the return slot at the library. One is labeled “INPUT” and the other “OUTPUT.”

You’re told that if you write a question on a piece of paper and put it in the first slot, an answer to your question will come out the second. The answer is relatively accurate, but you’ll probably want to check the details before relying on it. Or, if you take a book you own, or a DVD, and feed it in, you’ll get back a summary of its contents and be able to ask further questions specifically about them.

“How does this work?” you say to the attendant who’s talked you through the process.

“We call it a research assistant service,” he tells you. “Though it’s capable of much more than your typical research assistant. It has access to everything publicly available on the Internet, and has the ability to understand it, no matter the topic. The assistant has spent a lot of time and energy reading through all of that, and now knows a lot. Like, a lot. So when you ask a question, the assistant draws on that huge body of knowledge to give a reasonably good answer. Or, when you ask for a summary, the assistant quickly looks at the book or DVD and, also drawing on that broader knowledge, tells you what it’s about.”

You try it out and the results are impressive. The assistant doesn’t just answer questions, but can do fun things too: write poems for you, or help you outline an essay, or pretend to be a character you can talk with. It’s neat.

“How do you fit all those articles, and videos, and books inside of that box? It’s not very big.”

“We don’t. The assistant doesn’t have copies of everything inside there. Rather, the assistant has read a lot and learned from it. Just like if you read a book, the exact content of that book isn’t inside your head, but you can answer questions about it–and about a lot of other things, too.”

From behind you, someone says, “You shouldn’t use this service.” You turn and see another man, who walked up while you were playing with the box.

“Why not?” you say. “It seems pretty helpful.”

“None of the people who posted articles online, or sent videos to YouTube, expressly gave permission to have their articles and videos used as resources for a research service,” he says.

“But does the service need permission?” you ask. “In school, I wrote a report about reptiles and read the encyclopedia and some books from the library, and learned a lot that I put into the report. But the encyclopedia, and the authors of those books, didn’t have a statement saying, ‘You can use what you learn in here to write a report about reptiles.’ And they didn’t say anything like, ‘You can read this book, but you can’t write anything about what you’ve found inside.’”

“No, they didn’t give express permission to write a report based on them. But you had to cite the books you used.”

“Well, kind of. I mean, if I were writing an article now for a law review, and I had to follow the rather extreme, and frankly kind of irrational, citation rules of the Bluebook, then I guess I’d need to cite every last thing I write that I learned from those books. But most of the time, we don’t write that way. We learn lots of things from lots of sources, and often forget where we learned them in the first place. If I directly quote, sure, I need to put that in quotation marks and say where I got it, but using information you learned isn’t plagiarism or stealing. I bet most of the articles you write draw on stuff you learned from things you’ve read, but you don’t directly cite everything you’ve read and learned from. Or, if you write fiction, you were clearly influenced by a lot of authors, but you don’t give all of them credit for each idea that inspired you.”

“Here’s the thing, though,” the man replies. “I didn’t steal the books that I read and learned from. I bought them, or was given them, or checked them out from the library.”

You turn to the attendant. “Where did the research assistant get the articles, books, and videos it’s learning from? Did it steal them? Did it pirate them?”

“Some of the early assistants might have, or mistakenly read pirated materials when they were browsing the web, but now we make an effort to only give them materials that are either publicly available, in the sense that you don’t have to pay to access them, or that we’ve purchased. And we’re even taking that a step further by striking deals with some sources to pay them to let our assistants learn from their content, even though they already give the content away for free to everyone else.” He then turns to the critic. “And it seems to me, if you did steal a book and read it, and then you later write about ideas you learned from it, the wrong thing you did was the initial stealing, not the discussion of ideas. Except for trade secrets or non-disclosure agreements, we don’t tend to think it’s wrong to discuss or write about ideas, no matter their source.”

“That doesn’t change anything,” the critic says, and then quickly shifts gears. “I make my living writing the kinds of things you’re getting from this assistant. I write poetry. I have friends who write research reports. You asked for help with a marketing plan, and I know a guy who does those and would love for you to pay him for it. This assistant is free for you to use, or only $20 a month if you want to use it a lot, and that’s significantly less than what I or my friends think the work we do is worth. It’s wrong of you to use the cheaper alternative. You should be paying us, and paying more, because the work we do is important.”

“But doesn’t that apply to a lot of people’s jobs?” you ask. “Just last week I bought a nifty robot lawn mower. It’s not perfect, and it occasionally misses spots, but it mows the lawn for me, and is quite convenient. My neighbor across the street pays a landscaping service to mow his lawn. That service does a better job, but it costs a lot more over time than my mower. Is it wrong for me to use the mower? I’m sure the landscapers think landscaping is worth more than I’m paying now, and me not hiring them is costing them some amount of their livelihood. But if one of them came to my door and demanded I get rid of my robot mower in order to give them jobs, that would be odd–and a little rude.”

The critic replies, “If you’re having written texts prepared for you, it matters that they’re accurate. And not just accurate, but high quality. The service doesn’t give you that. It’s middling, at best.”

“I’ve read articles written by your friends, and some are pretty good. Others aren’t, and still others have mistakes. Having played with this service for a while, I know what I’m getting from it, and it seems like a reasonable trade-off for the cost.”

“It steals my work, though.”

“How so?”

“Well, sometimes it’ll give direct quotes without attribution, or produce something very similar.”

“Fair enough,” you say. “Though, again, in my experience, it’s actually pretty difficult to get it to give lengthy direct quotes. I asked it to summarize Edgar Allan Poe’s The Tell-Tale Heart and it did a fine job. But when I asked it to tell me the story, it didn’t produce anything like a verbatim transcription. And many of the quotes it told me were directly from the story actually weren’t. In fact,” you say, turning to the attendant from the service, “I sometimes wonder if it checks its work.”

“The service is the best it can be given the costs involved,” he says, “and also, well, the overwhelming number of topics and styles it’s able to produce.”

“But even still,” you continue, “even if it does occasionally produce what we might call copyright infringing output, the infringement is in the performance of the story, not the fact that it knows the story. It is legal to memorize poems, holding a perfect copy in your brain. It is not legal to then write them down and sell them.”

“And remember, please,” says the attendant, “that our service doesn’t really keep perfect copies of what it learns from, anyway. Rather, it draws on what it knows about the books it reads. Just like you do when you read a book.”

The critic tries one final argument. “What it comes down to,” he says, “is I think this service is bad and the people running it immoral. My friends agree. Because I don’t like them or what they do, they shouldn’t be allowed to use articles I’ve published freely online, and it shouldn’t be allowed to read my books to learn from.”

“It might be fair to be mad that someone you don’t like has read your book,” you reply, “but we don’t typically think that means you get to control who reads it, so long as they didn’t steal the book from you.”

With that, you thank the attendant and the critic for the interesting conversation, but note that you really have to run to pick up your kids from school.

On the way to get them, you realize you never asked the attendant what is actually inside the box. Imagining what it might have been, you wonder how that knowledge would change whether you think the critic is correct in his arguments or not.

What if inside the box was a single, very smart person, who had read a lot, and was able to quite quickly process and synthesize knowledge?

What if, instead, the box contained a video conferencing computer, and so “inside” of it was in fact hundreds of smart people who had read a lot in different areas and collaborated in producing answers to your questions?

Or, you suppose, what if it were just a big computer, which had itself read a lot, and ran a sophisticated program to produce answers?

And if your assessment of the critic’s arguments did change depending on which of those was the case, why would that be? What would be the relevant distinction between the three possibilities in terms of how it changed whether the critic was right or wrong?
