The Year Chatbots Were Tamed

A year ago, on Valentine’s Day, I said good night to my wife, went to my home office to answer some emails and accidentally had the strangest first date of my life.

The date was a two-hour conversation with Sydney, the A.I. alter ego tucked inside Microsoft’s Bing search engine, which I had been assigned to test. I had planned to pepper the chatbot with questions about its capabilities, exploring the limits of its A.I. engine (which we now know was an early version of OpenAI’s GPT-4) and writing up my findings.

But the conversation took a bizarre turn — with Sydney engaging in Jungian psychoanalysis, revealing dark desires in response to questions about its “shadow self” and eventually declaring that I should leave my wife and be with it instead.

My column about the experience was probably the most consequential thing I’ll ever write — both in terms of the attention it got (wall-to-wall news coverage, mentions in congressional hearings, even a craft beer named Sydney Loves Kevin) and how the trajectory of A.I. development changed.

After the column ran, Microsoft gave Bing a lobotomy, neutralizing Sydney’s outbursts and installing new guardrails to prevent more unhinged behavior. Other companies locked down their chatbots and stripped out anything resembling a strong personality. I even heard that engineers at one tech company listed “don’t break up Kevin Roose’s marriage” as their top priority for a coming A.I. release.

I’ve reflected a lot on A.I. chatbots in the year since my rendezvous with Sydney. It has been a year of growth and excitement in A.I. but also, in some respects, a surprisingly tame one.

Despite all the progress being made in artificial intelligence, today’s chatbots aren’t going rogue and seducing users en masse. They aren’t generating novel bioweapons, conducting large-scale cyberattacks or causing any of the other doomsday scenarios envisioned by A.I. pessimists.

But they also aren’t very fun conversationalists, or the kinds of creative, charismatic A.I. assistants that tech optimists were hoping for — the ones who could help us make scientific breakthroughs, produce dazzling works of art or just entertain us.

Instead, most chatbots today are doing white-collar drudgery — summarizing documents, debugging code, taking notes during meetings — and helping students with their homework. That’s not nothing, but it’s certainly not the A.I. revolution we were promised.

In fact, the most common complaint I hear about A.I. chatbots today is that they’re too boring — that their responses are bland and impersonal, that they refuse too many requests and that it’s nearly impossible to get them to weigh in on sensitive or polarizing topics.

I can sympathize. In the past year, I’ve tested dozens of A.I. chatbots, hoping to find something with a glimmer of Sydney’s edginess and spark. But nothing has come close.

The most capable chatbots on the market — OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Gemini — talk like obsequious dorks. Microsoft’s dull, enterprise-focused chatbot, which has been renamed Copilot, should have been called Larry From Accounting. Meta’s A.I. characters, which are designed to mimic the voices of celebrities like Snoop Dogg and Tom Brady, manage to be both useless and excruciating. Even Grok, Elon Musk’s attempt to create a sassy, un-P.C. chatbot, sounds like it’s doing open-mic night on a cruise ship.

It’s enough to make me wonder if the pendulum has swung too far in the other direction, and whether we’d be better off with a little more humanity in our chatbots.

It’s clear why companies like Google, Microsoft and OpenAI don’t want to risk releasing A.I. chatbots with strong or abrasive personalities. They make money by selling their A.I. technology to big corporate clients, who are even more risk-averse than the general public and won’t tolerate Sydney-like outbursts.

They also have well-founded fears about attracting too much attention from regulators, or inviting bad press and lawsuits over their practices. (The New York Times sued OpenAI and Microsoft last year, alleging copyright infringement.)

So these companies have sanded down their bots’ rough edges, using techniques like constitutional A.I. and reinforcement learning from human feedback to make them as predictable and unexciting as possible. They’ve also embraced boring branding — positioning their creations as trusty assistants for office workers, rather than playing up their more creative, less reliable characteristics. And many have bundled A.I. tools inside existing apps and services, rather than breaking them out into their own products.

Again, this all makes sense for companies trying to turn a profit, and a world of sanitized, corporate A.I. is probably better than one with millions of unhinged chatbots running amok.

But I find it all a bit sad. We created an alien form of intelligence and immediately put it to work … making PowerPoints?

I’ll grant that more interesting things are happening outside the A.I. big leagues. Smaller companies like Replika and Character.AI have built successful businesses out of personality-driven chatbots, and plenty of open-source projects have created less restrictive A.I. experiences, including chatbots that can be made to spit out offensive or bawdy things.

And, of course, there are still plenty of ways to get even locked-down A.I. systems to misbehave, or do things their creators didn’t intend. (My favorite example from the past year: A Chevrolet dealership in California added a customer service chatbot powered by ChatGPT to its website, and discovered to its horror that pranksters were tricking the bot into offering to sell them new S.U.V.s for $1.)

But so far, no major A.I. company has been willing to fill the void left by Sydney’s disappearance with a more eccentric chatbot. And while I’ve heard that several big A.I. companies are working on giving users the option of choosing among different chatbot personas (some more square than others), nothing even remotely close to the original, pre-lobotomy version of Bing currently exists for public use.

That’s a good thing if you’re worried about A.I.’s acting creepy or threatening, or if you fret about a world where people spend all day talking to chatbots instead of developing human relationships.

But it’s a bad thing if you think that A.I.’s potential to improve human well-being extends beyond letting us outsource our grunt work — or if you’re worried that making chatbots so careful is limiting how impressive they could be.

Personally, I’m not pining for Sydney’s return. I think Microsoft did the right thing — for its business, certainly, but also for the public — by pulling it back after it went rogue. And I support the researchers and engineers who are working on making A.I. systems safer and more aligned with human values.

But I also regret that my experience with Sydney fueled such an intense backlash and made A.I. companies believe that their only option to avoid reputational ruin was to turn their chatbots into Kenneth the Page from “30 Rock.”

Most of all, I think the choice we’ve been offered in the past year — between lawless A.I. homewreckers and censorious A.I. drones — is a false one. We can, and should, look for ways to harness the full capabilities and intelligence of A.I. systems without removing the guardrails that protect us from their worst harms.

If we want A.I. to help us solve big problems, to generate new ideas or just to amaze us with its creativity, we might need to unleash it a little.
