After months of hype, Google and Microsoft announced the imminent arrival of Bard and a ChatGPT-integrated Bing search engine within 24 hours of one another. At first glance, both tech giants’ public demonstrations appeared to show potentially revolutionary products that could upend multiple industries. But it wasn’t long before even cursory reviews highlighted egregious flaws in Google’s Bard demo responses. Now it’s Microsoft’s turn for some scrutiny, and the results are as bad as Bard’s, if not worse.
Independent AI researcher Dmitri Brereton published a blog post Monday detailing numerous glaring issues in Microsoft’s demo of a ChatGPT-powered Bing. The demo repeatedly served up shoddy information: inaccurate recommended product details, omitted or misstated travel stop details, and even misrepresentations of seemingly straightforward financial reports. In the latter instance, Bing’s AI summary of basic financial data, something that should be “trivial” for AI, per Brereton, conjured completely false statistics out of nowhere.
[Related: Just because an AI can hold a conversation does not make it smart.]
But even when its information is correct, Bing may have grossly sidestepped simple ethical guardrails. According to a report from PCWorld’s Mark Hachman, the AI provided Hachman’s children with a litany of ethnic slurs when asked for cultural nicknames. Although Bing prefaced its examples by cautioning that certain nicknames are “neutral or positive, while others are derogatory or offensive,” the chatbot didn’t appear to bother categorizing its results. Instead, it simply created a laundry list of good, bad, and extremely ugly offerings.
Microsoft’s director of communications, Caitlin Roulston, told The Verge that the company “expect[ed] that the system may make mistakes during this preview period, and the feedback is critical to help identify where things aren’t working well so we can learn and help the models get better.”
As companies inevitably rush to integrate “smart” chatbot capabilities into their ecosystems, critics argue it’s vital that these issues be tackled and resolved before widespread adoption. For Chinmay Hegde, an associate professor at NYU Tandon School of Engineering, the missteps were wholly unsurprising: in his view, Microsoft debuted its technology far too early.
[Related: Google’s own upcoming AI chatbot draws from the power of its search engine.]
“At a high level, the reason why these errors are happening is that the technology underlying ChatGPT is a probabilistic [emphasis Hegde] large language model, so there is inherent uncertainty in its output,” he writes in an email to PopSci. “We can never be absolutely certain what it’s going to say next.” As such, programs like ChatGPT and Bard may be good for tasks where there is no unique answer—like making jokes or recipe ideas—but not so much when precision is required, such as historical facts or constructing logical arguments, says Hegde.
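To make Hegde’s point concrete, here is a minimal, purely illustrative Python sketch of weighted token sampling, the basic mechanism that makes a probabilistic language model’s output uncertain. The vocabulary, probabilities, and prompt below are invented for demonstration and are not drawn from Bing, ChatGPT, or Brereton’s examples.

```python
import random

# Toy illustration only: a real model scores tens of thousands of possible
# next tokens at every step; these four tokens and their probabilities are
# made up for demonstration.
next_token_probs = {
    "rose": 0.40,   # plausible continuation
    "fell": 0.35,   # also plausible, but contradicts the first
    "was":  0.20,
    "43%":  0.05,   # unlikely, yet still occasionally sampled
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Pick one token at random, weighted by the model's assigned probabilities."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

prompt = "Revenue this quarter"
for run in range(3):
    print(f"Run {run + 1}: {prompt} {sample_next_token(next_token_probs)} ...")
```

Because each continuation is drawn at random according to those weights, repeated runs of the same prompt can produce different, and occasionally flat-out wrong, completions, which is why precise figures and facts are exactly where such systems stumble.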
“I am shocked that the Bing team created this pre-recorded demo filled with inaccurate information, and confidently presented it to the world as if it were good,” Brereton writes in their blog post before admonishing, “I am even more shocked that this trick worked, and everyone jumped on the Bing AI hype train without doing an ounce of due diligence.”