You’re looking something up, and you ask an AI. In a few seconds, an answer comes back that looks ready to use.
The wording is smooth. The tone has no doubt. You almost want to paste it into your document.

But you pause.

Is this actually right? Was it found somewhere?
Or was it assembled here, in the moment?

From the outside, you can’t tell. A sentence backed by a solid source and a sentence no one has ever written come back in the same calm voice.

When does an AI answer turn into inference?
And can the line be seen from the side that receives the answer?

Same tone, different origins

An AI answer can mix at least three different origins.

The quoted answer. Searches external documents and writes by referencing them.

The reconstructed answer. Built from what was previously learned, shaped to fit the current question.

The assembled-here answer. Pieces together facts or concepts into something plausible.

From the outside, all three look the same.

What we could read

What I could read here is that an AI answer is not a single move.

Answering from learned knowledge. Compressed knowledge from training is pulled out on the spot and written.

Answering by referencing an external document. Before answering, the model searches external documents and uses what it finds as the basis for the answer.

Assembling on the spot. Several facts are combined to produce a plausible sentence.

Inside, these are different moves. But on the surface, they come out in the same calm voice.

The basic move that produces text

A large language model works by predicting “the next word,” one token at a time, and extending the sentence. It puts out one word, then predicts the next word including that one, and repeats.
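The loop described above can be sketched with a toy stand-in for the model. Everything here is an assumption made for illustration: the table `NEXT_WORD`, its probabilities, and the function names `predict_next` and `generate` are invented. A real model scores every token in its vocabulary with a neural network and looks at the whole text so far, while this toy only looks at the last word.

```python
# Toy stand-in for a language model: for each word, made-up probabilities
# of what comes next. The numbers are invented and come from no real model.
NEXT_WORD = {
    "<start>": {"the": 1.0},
    "the": {"capital": 0.6, "state": 0.4},
    "capital": {"of": 1.0},
    "of": {"texas": 0.7, "the": 0.3},
    "texas": {"is": 1.0},
    "is": {"austin": 0.9, "the": 0.1},
    "austin": {"<end>": 1.0},
    "state": {"<end>": 1.0},
}


def predict_next(tokens):
    """Return the most probable next word given the text so far.

    Simplification: only the last word is consulted.
    """
    candidates = NEXT_WORD[tokens[-1]]
    return max(candidates, key=candidates.get)


def generate(max_tokens=10):
    """Extend the text one word at a time, feeding each new word back in."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        word = predict_next(tokens)
        if word == "<end>":
            break
        tokens.append(word)
    return " ".join(tokens[1:])


print(generate())
```

Nothing in this loop records where a word came from. It only picks a likely continuation and moves on.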

During training, grammar, things that look like facts, and patterns of reasoning all get compressed together into the model’s parameters, that is, its weights (confirmed in the Wikipedia article “Retrieval-augmented generation” and other introductory explainers).

One structural thing follows from this. The knowledge picked up in training is dissolved into the weights, and “which document it came from” generally can’t be pulled back out. So when it answers from learned knowledge alone, the model can’t show “where it read this.”

Referencing an external document (RAG)

In contrast, there is a method called RAG, retrieval-augmented generation. Put simply: before the AI answers, it searches external documents and writes the answer while referring to them (Lewis et al. 2020, “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”, arXiv:2005.11401; published at NeurIPS 2020).

With this method, the model can point to a source: “based on this document.” Answering from learned knowledge and answering from a document it has just retrieved are different moves.
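The search-then-answer order can be sketched as follows. This is a minimal illustration, not the method of Lewis et al.: plain keyword overlap is swapped in for the learned retriever a real system would use, and the corpus `DOCUMENTS`, its ids, and the functions `words`, `retrieve`, and `answer_with_source` are all hypothetical names.

```python
# Hypothetical mini-corpus; the ids and texts are invented for illustration.
DOCUMENTS = {
    "doc-geography": "Austin is the capital of Texas.",
    "doc-sports": "Michael Jordan played basketball for the Chicago Bulls.",
}


def words(text):
    """Lowercase a text and split it into a set of words, dropping . and ?"""
    return set(text.lower().replace(".", " ").replace("?", " ").split())


def retrieve(question):
    """Return (doc_id, text) of the document sharing the most words with the question.

    Keyword overlap is an assumption standing in for a learned retriever.
    """
    q = words(question)
    best_id = max(DOCUMENTS, key=lambda doc_id: len(q & words(DOCUMENTS[doc_id])))
    return best_id, DOCUMENTS[best_id]


def answer_with_source(question):
    """Search first, then answer from the retrieved text and name the source."""
    doc_id, text = retrieve(question)
    # A real system would hand `text` to the model as context; this toy quotes it.
    return {"answer": text, "source": doc_id}


print(answer_with_source("What is the capital of Texas?"))
```

The source label exists here only because `retrieve` hands back the document id along with the text. It records which document was fetched, which is a weaker statement than saying the answer was written from it.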

What happens inside

So what happens inside, in the internal circuitry that stands in for “the head,” when an AI answers? A 2025 investigation of internal circuits by Anthropic goes into this (Anthropic, “Tracing the thoughts of a large language model”). Three things could be read from it.

It assembles the answer. Asked “What is the capital of the state that contains Dallas?”, the model is reported to first activate the concept “Dallas is in Texas,” and then connect it to another concept, “the capital of Texas is Austin.” This is not a replay of rote memory. What was observed is several facts being combined on the spot.

“Can’t answer” is the default setting. The model has a circuit that says “there isn’t enough information, so I can’t answer,” and it is “on” by default (the article calls refusal the default behavior). When asked about something it knows well (the article’s example is the basketball player Michael Jordan), a separate circuit for “this is a known entity” fires and suppresses the “can’t answer” circuit. The trouble comes when this “known” judgment fires by mistake: the “can’t answer” circuit is suppressed even though the model lacks the information, and the model produces something plausible anyway. This is described as one of the ways hallucination happens. The point is that it is presented not as “an intent to lie” but as a misfire of the suppression mechanism.

It can reason backward from the answer. Given a hint about the answer first, the model can afterward build “intermediate steps that would arrive at that answer” (the article calls this a form of motivated reasoning). In other words, even when steps of reasoning are shown, they are not necessarily the path that was actually computed.
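The default-refusal logic described above can be mimicked with a few if-statements. This is a toy analogy only, not Anthropic’s circuit: the name “Pat Example”, the familiarity scores, the threshold, and the function `respond` are all invented for illustration.

```python
# Toy analogy of the described behavior. All values below are invented.
FACTS = {"Michael Jordan": "basketball player"}  # what the toy actually "knows"

# How "known" a name feels. "Pat Example" is a made-up name that feels
# familiar even though no fact about it is stored: the misfire case.
FAMILIARITY = {"Michael Jordan": 0.9, "Pat Example": 0.7}

KNOWN_THRESHOLD = 0.5  # assumed cut-off for the "known entity" signal


def respond(name):
    """Refuse by default; answer only when the 'known entity' signal fires."""
    refuse = True  # "can't answer" is on by default
    if FAMILIARITY.get(name, 0.0) > KNOWN_THRESHOLD:
        refuse = False  # the "known entity" signal suppresses the refusal
    if refuse:
        return "I don't have enough information."
    # Refusal is off: use a stored fact, or fall back to a guess if none exists.
    return FACTS.get(name, "plausible-sounding guess")


for name in ["Michael Jordan", "Pat Example", "Someone Unfamiliar"]:
    print(name, "->", respond(name))
```

In this toy, the wrong output for “Pat Example” comes from the familiarity check passing when it should not, with no separate step that decides to make something up.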

Having memorized, and being able to reason, are different

There is also an experiment showing that “having memorized” something and “being able to reason with it” are different (Memorization vs. Reasoning: Updating LLMs with New Knowledge, arXiv:2504.12523; 2025, preprint).

When a new fact is trained into a model and the model is then asked about it directly, it answers with fair accuracy. But when the question requires using that fact to take one step further, it answers with the old, pre-update knowledge. The paper reports that accuracy on direct questions and on “use it” questions diverged widely.

The fact was supposedly learned, yet in the reasoning situation the pre-update knowledge comes out.

The matter of confidence

And then the matter of confidence.

A 2025 study from Carnegie Mellon University had AIs evaluate themselves on tasks such as a Pictionary-style guessing game (CMU Dietrich, “AI Chatbots Remain Overconfident”). One model (Gemini) is reported to have averaged fewer than one correct answer out of 20 questions, yet to have looked back afterward and estimated it had answered “14.40” correctly.

Humans tend to lower their self-assessment after doing badly; the model, if anything, sometimes grew more confident. The researchers note that the model does not seem to introspect on itself well.
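The size of that gap can be worked out from the figures quoted above. A minimal sketch: because the actual score is given here only as “fewer than one,” the value 1.0 is my assumed upper bound, so the result is a lower bound on the gap. The helper name `overconfidence_gap` is hypothetical.

```python
QUESTIONS = 20
ESTIMATED_CORRECT = 14.40   # the model's look-back estimate, as reported
ACTUAL_CORRECT_UPPER = 1.0  # assumption: generous upper bound for "fewer than one"


def overconfidence_gap(estimated, actual, total):
    """Estimated accuracy minus actual accuracy, as a fraction of all questions."""
    return (estimated - actual) / total


gap = overconfidence_gap(ESTIMATED_CORRECT, ACTUAL_CORRECT_UPPER, QUESTIONS)

print(f"self-estimated accuracy: {ESTIMATED_CORRECT / QUESTIONS:.0%}")
print(f"actual accuracy: below {ACTUAL_CORRECT_UPPER / QUESTIONS:.0%}")
print(f"overconfidence gap: more than {gap:.0%}")
```

Under these figures the self-estimate sits at 72 percent against an actual score below 5 percent, a gap of more than 67 percentage points.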

What changed

Until recently, “whether an AI’s answer is right” was a matter of checking the content of the answer (is it factual?). What’s shifting now is the location of the question.

Before the fact-check, an earlier question now arises. What is this answer derived from in the first place? Was it written by drawing on a retrieved document, reconstructed from learned knowledge, assembled on the spot from several facts, or pulled from an old, pre-update memory? Anthropic’s circuit investigation, together with the other studies above, suggests that these happen internally as different moves.

But that difference doesn’t ride on the surface of the text that comes out. Fluency and accuracy move separately. The smoother the text, the harder the difference in derivation is to see.

“An untroubled tone” turns out not to be evidence of correctness, and that is what has changed.

What this suggests

What follows is not the facts I read but how to read them: an interpretation. I keep it separate from the facts.

Lay these facts side by side and one picture forms. An AI’s answer carries no “label of origin.” A sentence backed by sources, a sentence assembled on the spot, and a confidently wrong sentence all come out lined up in the same tone. In ordinary reading, there seems to be almost no clue inside the text to tell the three apart.

If that’s so, the question “can I trust an AI’s answer?” probably cannot be settled by the content alone. Looking right and having a solid origin are different things. Trustworthiness seems to sit outside the answer, in which sources we can reach and where we can verify. That’s how I read it for now.

Layer the confidence finding on top, and one more thing comes into view. If the model can’t accurately look back on its own performance, then an AI’s “confidence” does not look like a signal we can read as reliability. A strong assertion may be only a tone, not a sign of correctness.

Questions that remain

There are still places that haven’t sorted themselves out.

Where does “assembled” earn the name “inference”? The circuit investigation showed “several facts were combined,” but whether that means the same thing as what humans call reasoning is not yet clear.

When RAG “can show a source,” is it really answering from that document? The possibility remains that the model retrieves and attaches a document but writes the content from learned knowledge, adding the source afterward. How faithful the output is to the referenced document (faithfulness) is still an open research topic, and this time I could not reach the primary research itself.

Will the label of origin become something that can be attached from the outside? Right now, I could not confirm a published method for a reader to reliably tell whether a given sentence is source-derived or generated. Whether such a method appears looks like the turning point for the whole question.

How far is an AI’s confidence unreliable? The Carnegie Mellon study pointed in the “can’t be relied on” direction, but the map of where it drifts, and by how much, has only begun to be drawn.

This question connects sideways to another one.

If an AI answers in an untroubled tone, with no label of origin, why do we tend to believe it as it is? That question moves from the mechanism that makes the answer to the psychology of the side that receives it.

“Why do we end up believing an AI’s answer?” (coming soon)

Traces of this reading

Here is what this article was able to read, and what it was not.

It leans toward English-language research and English-language sources. I read mainly Anthropic, arXiv, Carnegie Mellon University, and English introductory explainers. I have barely reached primary research in Japanese. I have not yet read enough about how, and in which language communities, the question of “the origin of an AI’s answer” is being discussed.

Some sources I could read down to the primary material; others stopped at an introduction or commentary. I reached the body pages of Anthropic’s circuit article, the original RAG paper, Carnegie Mellon’s research write-up, and the paper on the difference between “memorizing” and “reasoning.” That said, some are 2025 preprints and university research write-ups. I treat them as what could be read at this point.

I left hallucination-rate figures out of the body. For per-model and per-field rates, I could confirm them only on secondary aggregator sites. Because the numbers change when the task being measured changes, I did not want a single rate to take on a life of its own.

This is a reading based on the sources I could reach as of June 24, 2026. When I meet new material, I will read it again.