The New ChatGPT Offers a Lesson in AI Hype

When OpenAI unveiled the latest version of its immensely popular ChatGPT chatbot this month, it had a new voice possessing humanlike inflections and emotions. The online demonstration also featured the bot tutoring a child on solving a geometry problem.

To my chagrin, the demo turned out to be essentially a bait and switch. The new ChatGPT was released without most of its new features, including the improved voice (which the company told me it postponed to make fixes). The ability to use a phone’s video camera to get real-time analysis of something like a math problem isn’t available yet, either.

Amid the delay, the company also deactivated the ChatGPT voice that some said sounded like the actress Scarlett Johansson, after she threatened legal action, replacing it with a different female voice.

For now, what has actually been rolled out in the new ChatGPT is the ability to upload photos for the bot to analyze. Users can generally expect quicker, more lucid responses. The bot can also do real-time language translations, but ChatGPT will respond in its older, machine-like voice.

Nonetheless, this is the leading chatbot that upended the tech industry, so it was worth reviewing. After trying the sped-up chatbot for two weeks, I had mixed feelings. It excelled at language translations, but it struggled with math and physics. All told, I didn’t see a meaningful improvement from the last version, ChatGPT-4. I definitely wouldn’t let it tutor my child.

This tactic, in which A.I. companies promise wild new features and deliver a half-baked product, is becoming a trend that is bound to confuse and frustrate people. The $700 Ai Pin, a talking lapel pin from the start-up Humane, which is funded by OpenAI’s chief executive, Sam Altman, was universally panned because it overheated and spat out nonsense. Meta also recently added to its apps an A.I. chatbot that did a poor job at most of its advertised tasks, like web searches for plane tickets.

Companies are releasing A.I. products in a premature state partly because they want people to use the technology to help them learn how to improve it. In the past, when companies unveiled new tech products like phones, what we were shown — features like new cameras and brighter screens — was what we were getting. With artificial intelligence, companies are giving a preview of a potential future, demonstrating technologies that are being developed and working only in limited, controlled conditions. A mature, reliable product might arrive — or might not.

The lesson to learn from all this is that we, as consumers, should resist the hype and take a slow, cautious approach to A.I. We shouldn’t be spending much cash on any underbaked tech until we see proof that the tools work as advertised.

The new version of ChatGPT, called GPT-4o (“o” as in “omni”), is now free to try on OpenAI’s website and app. Nonpaying users can make a few requests before hitting a timeout, and those who have a $20 monthly subscription can ask the bot a larger number of questions.

OpenAI said its iterative approach to updating ChatGPT allowed it to gather feedback to make improvements.

“We believe it’s important to preview our advanced models to give people a glimpse of their capabilities and to help us understand their real-world applications,” the company said in a statement.

(The New York Times sued OpenAI and its partner, Microsoft, last year for using copyrighted news articles without permission to train chatbots.)

Here’s what to know about the latest version of ChatGPT.

Geometry and Physics

To show off ChatGPT-4o’s new tricks, OpenAI published a video featuring Sal Khan, the chief executive of the Khan Academy, the education nonprofit, and his son, Imran. With a video camera pointed at a geometry problem, ChatGPT was able to talk Imran through solving it step by step.

Even though ChatGPT’s video-analysis feature has yet to be released, I was able to upload photos of geometry problems. ChatGPT solved some of the easier ones correctly, but it tripped up on more challenging problems.

For one problem involving intersecting triangles, which I dug up on an SAT preparation website, the bot understood the question but gave the wrong answer.

Taylor Nguyen, a high school physics teacher in Orange County, Calif., uploaded a physics problem involving a man on a swing that is commonly included on Advanced Placement Calculus tests. ChatGPT made several logical mistakes and arrived at the wrong answer, but it was able to correct itself with feedback from Mr. Nguyen.

“I was able to coach it, but I’m a teacher,” he said. “How is a student supposed to pick out those mistakes? They’re making this assumption that the chatbot is right.”

I did notice that ChatGPT-4o succeeded at some division calculations that its predecessors got wrong, so there are signs of slow improvement. But it also failed at a basic math task that past versions and other chatbots, including Meta AI and Google’s Gemini, have flunked: counting. When I asked ChatGPT-4o for a four-syllable word starting with the letter “W,” it responded, “Wonderful,” which has three syllables.

OpenAI said it was constantly working to improve its systems’ responses to complex math problems.

Mr. Khan, whose company uses OpenAI’s technology in its tutoring software Khanmigo, did not respond to a request for comment on whether he would leave ChatGPT the tutor alone with his son.

Reasoning

OpenAI also highlighted that the new ChatGPT was better at reasoning, or using logic to come up with responses. So I ran it through one of my favorite tests: I asked it to generate a Where’s Waldo? puzzle. When it showed an image of a giant Waldo standing in a crowd, I said that the point is that he’s supposed to be hard to find.

The bot then generated an even larger Waldo.

Subbarao Kambhampati, a professor and researcher of artificial intelligence at Arizona State University, also put the chatbot through some tests and said he saw no noticeable improvement in reasoning compared with the last version.

He presented ChatGPT a puzzle involving blocks:

If block C is on top of block A, and block B is separately on the table, can you tell me how I can make a stack of blocks with block A on top of block B and block B on top of block C, but without moving block C?

The answer is that it’s impossible to arrange the blocks under these conditions, but, just as with past versions, ChatGPT-4o consistently came up with a solution that involved moving block C. With this and other reasoning tests, ChatGPT was occasionally able to take feedback to get the correct answer, which is antithetical to how artificial intelligence is supposed to work, Mr. Kambhampati said.

“You can correct it, but when you do that you’re using your own intelligence,” he said.
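
For readers who want to check the block puzzle themselves, here is a minimal sketch in Python that searches every arrangement reachable from the starting position. It is an illustration only, not Mr. Kambhampati’s test or anything OpenAI uses; the function names and the rule that only an uncovered block can be moved are assumptions chosen for this example.

```python
from collections import deque

BLOCKS = ("A", "B", "C")
TABLE = "table"

def is_clear(state, block):
    # A block is clear when no other block rests on it.
    return all(support != block for support in state.values())

def moves(state, frozen):
    # Yield every state reachable by moving one clear, non-frozen block
    # onto the table or onto another clear block.
    for block in BLOCKS:
        if block in frozen or not is_clear(state, block):
            continue
        for dest in (TABLE,) + BLOCKS:
            if dest == block or dest == state[block]:
                continue
            if dest != TABLE and not is_clear(state, dest):
                continue
            new_state = dict(state)
            new_state[block] = dest
            yield new_state

def reachable(start, goal, frozen=frozenset()):
    # Breadth-first search over all arrangements; True if the goal is found.
    seen = {tuple(sorted(start.items()))}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if all(state[b] == support for b, support in goal.items()):
            return True
        for nxt in moves(state, frozen):
            key = tuple(sorted(nxt.items()))
            if key not in seen:
                seen.add(key)
                queue.append(nxt)
    return False

# Puzzle setup: C sits on A, while A and B sit on the table.
start = {"A": TABLE, "B": TABLE, "C": "A"}
# Target stack: A on B, and B on C.
goal = {"A": "B", "B": "C"}

print(reachable(start, goal, frozen={"C"}))  # False: impossible if C stays put
print(reachable(start, goal))                # True: possible once C may move
```

With block C held in place, the only legal move is shuffling block B between the table and the top of C, so block A can never be freed. Allowing C to move makes the stack reachable, which is the workaround the chatbot kept proposing.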

OpenAI pointed to test results that showed GPT-4o scored about two percentage points higher at answering general knowledge questions than previous versions of ChatGPT, illustrating that its reasoning skills had slightly improved.

Language

OpenAI also said the new ChatGPT could do real-time language translation, which could help you converse with someone speaking a foreign language.

I tested ChatGPT with Mandarin and Cantonese and confirmed that it was OK at translating phrases, such as “I’d like to book a hotel room for next Thursday” and “I want a king-size bed.” But the accents were slightly off. (To be fair, my broken Chinese is not much better.) OpenAI said it was still working to improve accents.

ChatGPT-4o also excelled as an editor. When I fed it paragraphs that I wrote, it was fast and effective at removing excessive words and jargon. ChatGPT’s decent performance with language translation gives me confidence that this will soon become a more useful feature.

Bottom Line

A major thing OpenAI got right with ChatGPT-4o is making the technology free for people to try. Free is the right price: Since our data is helping to train and improve these A.I. systems, we shouldn’t be paying for them.

The best of A.I. has yet to come, and it might one day be a good math tutor that we want to talk to. But we should believe it when we see it — and hear it.
