Siri Chatbot prototype nears ChatGPT quality, but hallucinates more than Apple wants

Posted in iOS · edited June 1

A new report claims that Apple has internally been testing large language models for Siri that are vastly more powerful than the shipping Apple Intelligence, but that executives disagree about when to release them.

Internally, Siri is close to ChatGPT-level accuracy.

Backing up AppleInsider's position that Apple is not behind on AI, the company has regularly been publishing research showing its progress in moving the field forward. Now, according to Bloomberg, Apple has already been working with AI systems considerably more powerful than the on-device Apple Intelligence it has shipped so far.

Specifically, the report says that internally, Apple is using multiple models of increasing complexity. Apple is said to be testing models with 3 billion, 7 billion, 33 billion, and 150 billion parameters.

For comparison, Apple said in 2024 that Apple Intelligence's foundation language models were on the order of 3 billion parameters.

That version of Apple Intelligence is intentionally small so that it can run on-device, instead of requiring all prompts and requests to be sent to the cloud. The larger versions are cloud-based, and the 150 billion parameter model is now said to approach the quality of ChatGPT's most recent releases.
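As a rough back-of-the-envelope illustration of why that split exists, consider the memory needed just to hold the weights. Only the parameter counts below come from the report; the precision formats and the resulting arithmetic are illustrative assumptions, not anything Apple has disclosed:

```swift
// Approximate RAM needed just to store model weights, at two common
// numeric precisions. Parameter counts are from the report; the
// precision choices are assumptions for illustration only.
let models: [(name: String, params: Double)] = [
    ("on-device (shipping)", 3e9),
    ("mid-size test model", 33e9),
    ("cloud flagship", 150e9),
]
let precisions: [(format: String, bytesPerParam: Double)] = [
    ("FP16", 2.0),  // 16-bit floating point
    ("INT4", 0.5),  // 4-bit quantized weights
]

for model in models {
    for precision in precisions {
        let gigabytes = model.params * precision.bytesPerParam / 1e9
        print("\(model.name) @ \(precision.format): ~\(gigabytes) GB of weights")
    }
}
// A 3 billion parameter model at 4-bit precision is roughly 1.5 GB,
// which plausibly fits alongside everything else in a phone's RAM.
// A 150 billion parameter model at FP16 is roughly 300 GB of weights,
// which is firmly server territory.
```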

However, there reportedly remain concerns over AI hallucinations. Apple is said to have held off releasing this Apple Intelligence model in part because of this, implying that the level of hallucinations is too high.

There is said to be another reason for not yet shipping this much-improved, cloud-based Siri chatbot, though. It is claimed that there are philosophical differences between Apple's senior executives over the release.

It's conceivable that these differences solely concern each executive's tolerance for hallucinations in Apple Intelligence. However, there is no further detail.

It has previously been reported that former Siri chief John Giannandrea is against releasing it yet, while others on the executive staff are keener to launch a Siri chatbot.

Perhaps because of this internal disagreement, it is now also claimed that there will be fewer Apple Intelligence announcements at WWDC than expected. The whole WWDC keynote is said to be smaller in scope than in previous years, but it is still expected to feature a dramatic redesign of the Mac, iPhone, and iPad operating systems, exploiting the user interface lessons Apple learned on the Apple Vision Pro.

Rumor Score: Likely


Comments

  • Reply 1 of 12
    The "power" comes from the database that is used to train the LLM. The LLM itself is worthless without it. 
  • Reply 2 of 12
AppleZulu · Posts: 2,465 · member
    In a nutshell, this explains why Apple is “behind” with AI, but actually isn’t. 

    It’s remarkable the consistency with which this pattern repeats, yet even people who consider themselves Apple enthusiasts don’t see it. Tech competitors “race ahead” with an iteration of some technology, while Apple seemingly languishes. Apple is doomed. Then Apple comes out “late” with their version of it, and the initial peanut gallery reception pronounces it too little, too late.

Then within a couple of years, Apple's version is the gold standard and the others (those cutting-edge innovators) race to catch up, because "first" is often also "half-baked."

In the news this week, it was exposed that RFK Jr's "Make America Healthy Again" report was evidently an AI-produced document, replete with hallucinations, most notably in the bibliography. Of course it was. This is what happens when the current cohort of AI models is used uncritically to produce a desired result, without any understanding of how profoundly bad these models are.

When I read about this in the news, I decided to experiment with it myself. Using MS Copilot (in commercial release as part of MS Word), I picked a subject and asked for a report taking a specific, dubious position on it, with citations and a bibliography. After it dutifully produced the report, I started checking the bibliography and, one after another, failed to find the research papers that Copilot used to back the position taken. I didn't check all the references, so it's possible some citations were real, but finding several that weren't was sufficient to bin the whole thing. It's bad enough when humans intentionally produce false and misleading information, but when a stock office product will do it for you with no disclaimers or warnings, should that product really be on the market?

I also once asked ChatGPT to write a story about green eggs and ham, in the style of Dr. Seuss. It plagiarized the actual Seuss story, almost verbatim, in a clear abuse of copyright law. This is the stuff that Apple is supposedly trailing behind.
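Incidentally, that kind of spot-check can be automated. A minimal sketch against the public Crossref API, with placeholder titles standing in for the generated bibliography; no match is a signal, not proof, that a citation was invented:

```swift
import Foundation

// Minimal citation spot-checker: ask the public Crossref API whether a
// bibliographic search on a citation's title returns anything at all.
// A real checker would also compare the returned title and authors
// against the citation, since fuzzy search can match loosely.
func citationLikelyExists(_ title: String) async throws -> Bool {
    var components = URLComponents(string: "https://api.crossref.org/works")!
    components.queryItems = [
        URLQueryItem(name: "query.bibliographic", value: title),
        URLQueryItem(name: "rows", value: "1"),
    ]
    let (data, _) = try await URLSession.shared.data(from: components.url!)
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let items = ((json?["message"] as? [String: Any])?["items"] as? [[String: Any]]) ?? []
    return !items.isEmpty
}

// Placeholder titles standing in for a generated bibliography.
let generatedBibliography = [
    "A placeholder article title from the generated report",
    "Another placeholder citation to verify",
]
for title in generatedBibliography {
    let found = (try? await citationLikelyExists(title)) ?? false
    print(found ? "possible match" : "no match found", "->", title)
}
```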

    So the report here that Apple is developing AI but, unlike their “cutting edge” competitors, not releasing something that produces unreliable garbage, suggests that no, they’re not behind. They’re just repeating the same pattern again of carefully producing something of high quality and reliability, and in a form that is intuitively useful, rather than a gimmicky demonstration that they can do a thing, whether it’s useful or not. Eventually they’ll release something that consistently produces reliable information, and likely does so while respecting copyright and other intellectual property rights. The test will be that not only will it be unlikely to hallucinate in ways that mislead or embarrass its honest users, it will actually disappoint those with more nefarious intent. When asked to produce a report with dubious or false conclusions, it won’t comply like a sociopathic sycophant. It will respond by telling the user that the reliable data not only doesn’t support the requested position, but actually refutes it. Hopefully this will be a feature that Apple uses to market their AI when it’s released.

    P.S. As a corollary, the other thing that Apple is likely concerned with (perhaps uniquely so) is AI model collapse. This is the feedback loop where AI training data is scooped up from sources that include AI-produced hallucinations, not only increasing the likelihood that the bad data will be repeated, but reducing any ability for the AI model to discern good data from bad. Collapse occurs when the model is so poisoned with bad data that even superficial users find the model to be consistently wrong and useless. Effectively every query becomes an unamusing version of that game where you playfully ask for “wrong answers only.” Presumably the best way to combat that is to train the AI as you would a human student: start by giving it information sources known to be reliable, and eventually train it to discern those sources on its own. That takes more time. You can’t just dump the entire internet into it and tell it that the patterns repeated the most are most likely correct. 
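A toy numerical sketch of that feedback loop, with made-up numbers: fit a Gaussian "model" to data, sample new "training data" from the fit, refit, and repeat. The fitted spread drifts toward zero, a crude statistical analogue of a model losing the diversity of real data:

```swift
import Foundation

// Each "generation" trains (fits a Gaussian) on samples drawn from the
// previous generation's model. Finite-sample error compounds, and the
// fitted spread tends to collapse toward zero over many generations.
func gaussianSample(mean: Double, stdDev: Double) -> Double {
    // Box-Muller transform for a normally distributed sample.
    let u1 = Double.random(in: Double.ulpOfOne..<1)
    let u2 = Double.random(in: 0..<1)
    return mean + stdDev * sqrt(-2 * log(u1)) * cos(2 * Double.pi * u2)
}

var mean = 0.0          // "ground truth" starts as a standard normal
var stdDev = 1.0
let samplesPerGeneration = 50

for generation in 1...200 {
    // Draw training data from the current model, then refit the model.
    let data = (0..<samplesPerGeneration).map { _ in
        gaussianSample(mean: mean, stdDev: stdDev)
    }
    mean = data.reduce(0, +) / Double(data.count)
    let variance = data.map { pow($0 - mean, 2) }.reduce(0, +) / Double(data.count)
    stdDev = variance.squareRoot()
    if generation % 25 == 0 {
        print("generation \(generation): fitted stdDev ≈ \(stdDev)")
    }
}
// The fitted stdDev shrinks across generations even though nothing was
// deliberately removed: sampling noise plus refitting loses information.
```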

P.P.S. I just repeated the above experiment in Pages, using Apple's link to ChatGPT. It also produced hallucinated references. I chased down the first citation in the bibliography it created. Searching for the cited article didn't turn up anything. I did find the cited journal, went to it, and searched for the cited title: nothing. Searched for the authors: nothing. Finally, I browsed to find the issue supposedly containing the referenced article, and that article does not exist. So Apple gets demerits for subbing in ChatGPT in their uncharacteristic worry that they not be perceived as being "late." This part does not fit their usual pattern, with the exception perhaps of their hastened switch to Apple Maps, based largely at first on third-party map data. In the long run, their divorce from Google Maps was important, as location services was rapidly becoming a core OS function, not just a sat-nav driving convenience that could adequately be left to third-party apps. The race to use AI is perhaps analogous, but the hopefully temporary inclusion of ChatGPT's garbage should be as embarrassing as those early Apple Maps with bridges that went underwater.
    edited June 1
  • Reply 3 of 12
JinTech · Posts: 1,102 · member
    AppleZulu said:

P.P.S. I just repeated the above experiment in Pages, using Apple's link to ChatGPT. It also produced hallucinated references. I chased down the first citation in the bibliography it created. Searching for the cited article didn't turn up anything. I did find the cited journal, went to it, and searched for the cited title: nothing. Searched for the authors: nothing. Finally, I browsed to find the issue supposedly containing the referenced article, and that article does not exist. So Apple gets demerits for subbing in ChatGPT in their uncharacteristic worry that they not be perceived as being "late." This part does not fit their usual pattern, with the exception perhaps of their hastened switch to Apple Maps, based largely at first on third-party map data. In the long run, their divorce from Google Maps was important, as location services was rapidly becoming a core OS function, not just a sat-nav driving convenience that could adequately be left to third-party apps. The race to use AI is perhaps analogous, but the hopefully temporary inclusion of ChatGPT's garbage should be as embarrassing as those early Apple Maps with bridges that went underwater.
This is why Apple puts up a caution that you are using ChatGPT and not Siri. It would be embarrassing if Apple baked ChatGPT into Siri and passed this content off as its own.
  • Reply 4 of 12
charlesn · Posts: 1,496 · member
    AppleZulu said:
    In a nutshell, this explains why Apple is “behind” with AI, but actually isn’t. 
Indeed. No doubt Apple is "behind" in the technosphere press and on comment boards like this one for the tech-obsessed (and I include myself), but these arenas are hardly representative of the mainstream buyers who make up most of Apple's customer base. Here's what Apple is mainly selling: "it just works" ease of use, seamless integration with other Apple products, and privacy/security. If this were still about a hardware/software "features" war, Apple would have lost it long ago. Instead, they have been and remain the most successful consumer electronics company in history.
  • Reply 5 of 12
ssfe11 · Posts: 171 · member
So will Apple even need ChatGPT if Apple's models are on the same level? I guess, hey, why not?
  • Reply 6 of 12
    ssfe11 said:
So will Apple even need ChatGPT if Apple's models are on the same level? I guess, hey, why not?
There is the model, the thing that does the reasoning, and there is the data the model is trained on. So you can have two models of equal ability but trained on different sets of data, depending on their purpose. You can even have the same model trained on different data. My guess is the Siri chatbot will not be trained on as large a data set as ChatGPT. So, while they may have comparable reasoning abilities, they would be two entirely different tools with different purposes.
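A toy sketch of that distinction, with invented two-sentence "corpora" standing in for real training sets: the identical model code, fit to different data, continues the same prompt differently.

```swift
// Identical bigram "model" code fit to two different tiny corpora.
// The corpora are invented stand-ins, not anyone's real training data.
func trainBigrams(_ tokens: [String]) -> [String: [String]] {
    var table: [String: [String]] = [:]
    for i in 0..<(tokens.count - 1) {
        table[tokens[i], default: []].append(tokens[i + 1])
    }
    return table
}

let corpusA = "siri set a timer for ten minutes".split(separator: " ").map(String.init)
let corpusB = "siri answered a research question in depth".split(separator: " ").map(String.init)

let modelA = trainBigrams(corpusA)  // same code, corpus A
let modelB = trainBigrams(corpusB)  // same code, corpus B

// Same architecture, same prompt, different learned behavior:
print(modelA["siri"] ?? [])  // ["set"]
print(modelB["siri"] ?? [])  // ["answered"]
```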
  • Reply 7 of 12
I would not be surprised if this Siri chatbot gets renamed once it is ready.
But how long will it take until it is ready? At WWDC 2024, they introduced something that still does not exist.

So the countdown runs from WWDC 2024...
  • Reply 8 of 12
danvm · Posts: 1,499 · member
    AppleZulu said:
    In a nutshell, this explains why Apple is “behind” with AI, but actually isn’t. 

    It’s remarkable the consistency with which this pattern repeats, yet even people who consider themselves Apple enthusiasts don’t see it. Tech competitors “race ahead” with an iteration of some technology, while Apple seemingly languishes. Apple is doomed. Then Apple comes out “late” with their version of it, and the initial peanut gallery reception pronounces it too little, too late.

Then within a couple of years, Apple's version is the gold standard and the others (those cutting-edge innovators) race to catch up, because "first" is often also "half-baked."
    I’d agree with you if we were talking about hardware—but in this case, we’re not. Just take a look at Apple’s software and cloud services. In many areas, they’re actually lagging behind the competition.
In the news this week, it was exposed that RFK Jr's "Make America Healthy Again" report was evidently an AI-produced document, replete with hallucinations, most notably in the bibliography. Of course it was. This is what happens when the current cohort of AI models is used uncritically to produce a desired result, without any understanding of how profoundly bad these models are.

When I read about this in the news, I decided to experiment with it myself. Using MS Copilot (in commercial release as part of MS Word), I picked a subject and asked for a report taking a specific, dubious position on it, with citations and a bibliography. After it dutifully produced the report, I started checking the bibliography and, one after another, failed to find the research papers that Copilot used to back the position taken. I didn't check all the references, so it's possible some citations were real, but finding several that weren't was sufficient to bin the whole thing. It's bad enough when humans intentionally produce false and misleading information, but when a stock office product will do it for you with no disclaimers or warnings, should that product really be on the market?

I also once asked ChatGPT to write a story about green eggs and ham, in the style of Dr. Seuss. It plagiarized the actual Seuss story, almost verbatim, in a clear abuse of copyright law. This is the stuff that Apple is supposedly trailing behind.

    So the report here that Apple is developing AI but, unlike their “cutting edge” competitors, not releasing something that produces unreliable garbage, suggests that no, they’re not behind. They’re just repeating the same pattern again of carefully producing something of high quality and reliability, and in a form that is intuitively useful, rather than a gimmicky demonstration that they can do a thing, whether it’s useful or not. Eventually they’ll release something that consistently produces reliable information, and likely does so while respecting copyright and other intellectual property rights. The test will be that not only will it be unlikely to hallucinate in ways that mislead or embarrass its honest users, it will actually disappoint those with more nefarious intent. When asked to produce a report with dubious or false conclusions, it won’t comply like a sociopathic sycophant. It will respond by telling the user that the reliable data not only doesn’t support the requested position, but actually refutes it. Hopefully this will be a feature that Apple uses to market their AI when it’s released.

    P.S. As a corollary, the other thing that Apple is likely concerned with (perhaps uniquely so) is AI model collapse. This is the feedback loop where AI training data is scooped up from sources that include AI-produced hallucinations, not only increasing the likelihood that the bad data will be repeated, but reducing any ability for the AI model to discern good data from bad. Collapse occurs when the model is so poisoned with bad data that even superficial users find the model to be consistently wrong and useless. Effectively every query becomes an unamusing version of that game where you playfully ask for “wrong answers only.” Presumably the best way to combat that is to train the AI as you would a human student: start by giving it information sources known to be reliable, and eventually train it to discern those sources on its own. That takes more time. You can’t just dump the entire internet into it and tell it that the patterns repeated the most are most likely correct. 

P.P.S. I just repeated the above experiment in Pages, using Apple's link to ChatGPT. It also produced hallucinated references. I chased down the first citation in the bibliography it created. Searching for the cited article didn't turn up anything. I did find the cited journal, went to it, and searched for the cited title: nothing. Searched for the authors: nothing. Finally, I browsed to find the issue supposedly containing the referenced article, and that article does not exist. So Apple gets demerits for subbing in ChatGPT in their uncharacteristic worry that they not be perceived as being "late." This part does not fit their usual pattern, with the exception perhaps of their hastened switch to Apple Maps, based largely at first on third-party map data. In the long run, their divorce from Google Maps was important, as location services was rapidly becoming a core OS function, not just a sat-nav driving convenience that could adequately be left to third-party apps. The race to use AI is perhaps analogous, but the hopefully temporary inclusion of ChatGPT's garbage should be as embarrassing as those early Apple Maps with bridges that went underwater.

    I get where you're coming from with your experiment, but to me, it’s not all that different from using Google for research. No matter the tool—whether it’s Google, Copilot, ChatGPT, or any other AI—you still have to verify your sources and make sure the information is reliable.

    Personally, I use AI at work as a time-saver, not as a replacement for doing the work myself. It’s just like how I use the internet: as a tool to help me be more efficient. Sure, I’ve seen AI give incorrect answers, but I’ve had the same thing happen plenty of times with Google Search too. So is ChatGPT—or any AI—“garbage,” like you said? I don’t think so. When you use it with the right expectations, it can be incredibly useful. At least, that’s been my experience.

As for Apple and AI, it's pretty clear they've fallen behind. They had years to improve Siri but didn't seem to have the vision to take it further. Plus, they don't have the kind of large-scale infrastructure needed to support AI at the level of Amazon, Microsoft, or Google, so they'll likely have to rely on those companies to host their services. And by making Google the default search engine on all their devices, they basically handed over a treasure trove of user data, data that's now helping Google strengthen its own AI. Also, Apple's strong stance on privacy is admirable, but it limits what they can do with AI. I think all of these factors have contributed to where they are today. It'll be interesting to see how they respond in the next few years.
    edited June 2
  • Reply 9 of 12
danox · Posts: 3,811 · member
    charlesn said:
    AppleZulu said:
    In a nutshell, this explains why Apple is “behind” with AI, but actually isn’t. 
Indeed. No doubt Apple is "behind" in the technosphere press and on comment boards like this one for the tech-obsessed (and I include myself), but these arenas are hardly representative of the mainstream buyers who make up most of Apple's customer base. Here's what Apple is mainly selling: "it just works" ease of use, seamless integration with other Apple products, and privacy/security. If this were still about a hardware/software "features" war, Apple would have lost it long ago. Instead, they have been and remain the most successful consumer electronics company in history.

If you have no concerns (unlike Apple), you too can be at the head of the so-called AI pack, as long as you use the same solution: stay connected to the supercomputers back home when answering questions. Google, Meta, Microsoft, and the others don't have any problem having you attached to them 24/7, feeding you answers. Like Alexa, remember her?

The question is: do you (the public) have a problem with that? I don't think Apple's path is the same as their competition's. Which do you think will take longer to accomplish: the hardware/software solution, or the "we'll just keep you connected to our supercomputer" solution?

    edited June 2
  • Reply 10 of 12
danox · Posts: 3,811 · member

I would not be surprised if this Siri chatbot gets renamed once it is ready.
But how long will it take until it is ready? At WWDC 2024, they introduced something that still does not exist.

So the countdown runs from WWDC 2024...

So a work-in-progress announcement, which was clearly enunciated by Apple at WWDC in June 2024, is a release? A release did occur in one country (the USA) four months later, with plenty of documentation indicating that it was still an ongoing work in progress. What part of that is unclear?

    https://www.apple.com/apple-intelligence/

https://www.apple.com/newsroom/2024/10/apple-intelligence-is-available-today-on-iphone-ipad-and-mac/

That was an October 28, 2024 release, to one country, the USA, and Apple still reiterated that it was an ongoing work in progress. Isn't the message still very clear?

The biggest part of the announcement at WWDC 24 was not Apple Intelligence itself. The most important part was that Apple indicated it was going to use M2 Ultras as servers (which are now probably M3 Ultras, jumping to M5) as part of its solution. That meant Apple had finally gotten over its aversion (remember those three Apple engineers who ended up at Qualcomm?) to building in-house software/hardware infrastructure, Apple servers built on Apple Silicon M-series chips. Long term, Apple probably can't rely on Intel, Nvidia, Microsoft, Amazon, or AMD.


    edited June 2
  • Reply 11 of 12
blastdoor · Posts: 3,815 · member
    The "power" comes from the database that is used to train the LLM. The LLM itself is worthless without it. 
    The data is worthless without the model, too. 
  • Reply 12 of 12
blastdoor · Posts: 3,815 · member
    Asserting that Apple isn’t behind is equivalent to admitting you don’t use an LLM for productive purposes. 

    It’s like people in 1996 arguing that cooperative multitasking is just as good as preemptive multitasking. Such people just hadn’t used a preemptive multitasking operating system.