Google’s AI chatbot isn’t the only one to make factual errors during its first demo. Independent AI researcher Dmitri Brereton has discovered that Microsoft’s first Bing AI demos were full of financial data mistakes.
Microsoft confidently demonstrated its Bing AI capabilities a week ago, with the search engine taking on tasks like providing pros and cons for top-selling pet vacuums, planning a 5-day trip to Mexico City, and comparing data in financial reports. But Bing failed to differentiate between a corded and a cordless vacuum, missed relevant details for the bars it referenced in Mexico City, and mangled financial data, which was by far the biggest mistake.
In one of the demos, Microsoft’s Bing AI attempts to summarize a Q3 2022 financial report for Gap clothing and gets a lot wrong. The Gap report (PDF) mentions that gross margin was 37.4 percent, with adjusted gross margin at 38.7 percent excluding an impairment charge. Bing inaccurately reports the gross margin as 37.4 percent including the adjustment and impairment charges.
Bing then goes on to state Gap had a reported operating margin of 5.9 percent, which doesn’t appear in the financial results. The operating margin was 4.6 percent, or 3.9 percent adjusted and including the impairment charge.
During Microsoft’s demo, Bing AI then goes on to compare Gap’s financial data to Lululemon’s results for Q3 2022. Bing makes more mistakes with the Lululemon data, and the result is a comparison riddled with inaccuracies.
Brereton also highlights an apparent mistake with a query related to the pros and cons of top-selling pet vacuums. Bing cites the “Bissell Pet Hair Eraser Handheld Vacuum,” and lists the con of it having a short cord length of 16 feet. “It doesn’t have a cord,” says Brereton. “It’s a portable handheld vacuum.”
However, a quick Google search (or Bing!) will show there’s clearly a version of this vacuum with a 16-foot cord in both a written review and a video. There’s also a cordless version, which is linked in the HGTV article that Bing sources. Without knowing the exact URL Bing sourced in Microsoft’s demo, it looks like Bing is using multiple data sources here without listing those sources fully, conflating two versions of a vacuum. The fact that Brereton himself made a small mistake in fact-checking Bing shows the difficulty in assessing the quality of these AI-generated answers.
Bing’s AI mistakes aren’t limited to just its onstage demos, though. Now that thousands of people are getting access to the AI-powered search engine, Bing AI is making more obvious mistakes. In an exchange posted to Reddit, Bing AI gets super confused and argues that we’re in 2022. “I’m sorry, but today is not 2023. Today is 2022,” says Bing AI. When the Bing user says it’s 2023 on their phone, Bing suggests checking it has the correct settings and ensuring the phone doesn’t have “a virus or a bug that is messing with the date.”
Microsoft is aware of this particular mistake. “We’re expecting that the system may make mistakes during this preview period, and the feedback is critical to help identify where things aren’t working well so we can learn and help the models get better,” says Caitlin Roulston, director of communications at Microsoft, in a statement to The Verge.
Other Reddit users have found similar mistakes. Bing AI confidently and incorrectly states “Croatia left the EU in 2022,” sourcing itself twice for the data. PCWorld also found that Microsoft’s new Bing AI is teaching people ethnic slurs. Microsoft has now corrected the query that led to racial slurs being listed in Bing’s chat search results.
“We have put guardrails in place to prevent the promotion of harmful or discriminatory content in accordance to our AI principles,” explains Roulston. “We are currently looking at additional improvements we can make as we continue to learn from the early phases of our launch. We are committed to improving the quality of this experience over time and to making it a helpful and inclusive tool for everyone.”
Other Bing AI users have also found that the chatbot often refers to itself as Sydney, particularly when users are using prompt injections to try and surface the chatbot’s internal rules. “Sydney refers to an internal code name for a chat experience we were exploring previously,” says Roulston. “We are phasing out the name in preview, but it may still occasionally pop up.”
Personally, I’ve been using the Bing AI chatbot for a week now and have been impressed with some results and frustrated with other inaccurate answers. Over the weekend I asked it for the latest cinema listings in London’s Leicester Square, and despite using sources for Cineworld and Odeon, it persisted in claiming that Spider-Man: No Way Home and The Matrix Resurrections, both films from 2021, were still being shown. Microsoft has now corrected this mistake, as I see correct listings when I run the same query today, but the mistake made no sense when Bing was sourcing data with the correct listings.
Microsoft clearly has a long way to go until this new Bing AI can confidently and accurately respond to all queries with factual data. We’ve seen similar mistakes from ChatGPT in the past, but Microsoft has integrated this functionality directly into its search engine as a live product that also relies on live data. Microsoft will need to make a lot of adjustments to ensure Bing AI stops confidently making mistakes using this data.