Why does my agent cite sources that don't exist?

The citation has an author, a title, a venue, page numbers, even a DOI. Every surface feature is right. The only thing wrong is that the source was never real.

B

Balagei G Nagarajan

6 MIN READ


Short answer. Your agent invents citations because it was trained to produce the most plausible-looking reference string, not to check that the source exists. A made-up citation has every correct surface feature and no real source behind it. Generate references only from documents you actually retrieved, verify each one resolves and supports its claim, and let the agent say "not found", and the fabricated citations stop reaching the reader.

A pristine, official-looking citation card glowing in the foreground while the library shelf behind it where the source should sit is empty

A flawless-looking citation in front, an empty shelf behind it. The form is perfect; the source is not there. Hero image.

Key facts.

  • The base rate is not small: across 13 leading models and 40 domains, generated citations were hallucinated between 14.23% and 94.93% of the time, and fabricated DOIs reached 89.4% in the humanities against 29.1% in natural sciences (GhostCite, arXiv:2602.06718, 2026).
  • Even strong models invent sources freely: GPT-4o produced entirely fabricated citations in about one in five references in simulated literature reviews (reported, 2025).
  • It is leaking into the published record: an audit of biomedical papers found fabricated references rising from roughly one in 2,828 papers in 2023 to one in 458 by 2025 (The Lancet, 2026).
  • Courts are sanctioning it: a public database of AI-hallucination filings has logged hundreds of cases worldwide with well over a hundred lawyers sanctioned, starting with Mata v. Avianca in 2023 (Damien Charlotin, AI Hallucination Cases database, 2025).

What does a hallucinated citation look like?

It looks completely real, which is the whole problem. The model returns a citation with a plausible author, a believable title, a real-sounding venue, correct-looking page numbers, and even a well-formed DOI. Every surface feature matches what a true citation looks like, because the model learned the shape of citations from millions of real ones. The only thing missing is correspondence to an actual source. Nothing about the formatting tells you it is fake. You find out only when you try to look it up and it is not there.

# Asked for a source, the model returned this, fully formatted:
Chen, L. & Park, S. (2024). "Retrieval-Aware Decoding for
  Faithful Question Answering." Proceedings of ACL 2024,
  pp. 1123-1139. doi:10.18653/v1/2024.acl-long.412
# The authors, title, page range, and DOI do not resolve to anything.

Why does the model invent a citation instead of saying it doesn't know?

Because it was trained to produce the most plausible next token, not to check whether a source exists. When you ask for a reference, the model generates the string that best fits the pattern of citations in its training data, and for a specific claim with thin support, the most plausible-looking string is often a synthesized one rather than a remembered real one. The model has no built-in index it can query to confirm the paper exists, and nothing in standard training rewards it for abstaining. Next-token prediction optimizes for fluent, well-formed citations over verifiable ones. Retrieval-augmented generation narrows the gap, because now real sources sit in the context, but it does not close it: the model can still mis-attribute, blend two papers, or cite a retrieved document for a claim it does not actually support, unless you verify after generation.

A single citation shown as a labeled card with callouts: author, title, venue and DOI each marked 'well-formed', and a final callout 'no matching source' pointing at the whole card

The anatomy of a fake citation: every field is well-formed, and none of it resolves to a real source. Diagram.

How often does this actually happen?

Often enough that you cannot treat any model-supplied reference as trustworthy. The GhostCite analysis measured hallucinated citations between 14.23% and 94.93% across 13 models and 40 domains, with the worst rates in the humanities where fabricated DOIs hit 89.4%, against 29.1% in the natural sciences (arXiv:2602.06718). Even capable models are not safe: GPT-4o fabricated roughly one in five citations in simulated literature reviews, and the existence rate of model-generated references in some studies sat at just 40 to 50%. The rate depends on popularity: well-known works are cited correctly, while the long tail of specific, obscure sources is where fabrication concentrates, which is exactly the long tail a research agent is most useful for.

It is not hypothetical: courts are sanctioning it

The clearest public evidence is the courtroom. In 2023, in Mata v. Avianca, two New York lawyers submitted a brief full of cases ChatGPT had invented, complete with fake quotes and citations, and were sanctioned 5,000 dollars once the court found that none of the cases existed. It kept happening: in February 2025, lawyers at Morgan and Morgan were sanctioned for a motion citing eight nonexistent AI-generated cases. A public database of AI-hallucination filings has since logged hundreds of cases worldwide, with well over a hundred lawyers sanctioned (Damien Charlotin, AI Hallucination Cases database). These were not careless typists. They were professionals who trusted a fluent, confident, completely fabricated citation.

How do you stop it?

TacticWhat it does
Cite only retrieved sourcesGenerate references from documents actually fetched, never from memory
Verify every citation resolvesCheck each cited source exists and is reachable before showing it
Quote-check the attributionConfirm the source actually supports the claim, not just that it exists
Allow "not found"Let the agent abstain instead of inventing a source under pressure
Link, do not free-typeReturn a real URL or ID from the index, not a model-generated one
Show provenancePut the retrieved passage next to the claim so a human can spot a mismatch

The pattern is that a citation is a claim about the world, and a model trained to sound right will produce one that looks perfect and points nowhere. Generate references only from sources you actually retrieved, verify every one resolves and supports its claim, and let the agent say it could not find a source rather than invent one. None of that is a bigger model, which fabricates more convincingly. It is a grounding-and-verification layer that knows the difference between a citation it can stand behind and one it guessed, which is what VibeModel builds as the Pattern Intelligence Layer.

Frequently asked questions

Doesn't RAG fix hallucinated citations?
It helps but does not solve it. Real sources in the context reduce invention, but the model can still mis-attribute, merge two papers, or cite a retrieved doc for a claim it does not support. You still verify each citation resolves and backs its claim after generation.

Why are the fake ones so convincing?
Because the model learned the exact shape of real citations, author, title, venue, pages, DOI, from millions of examples. It reproduces that shape perfectly while the content underneath is synthesized, so nothing in the formatting signals that it is fake.

Which sources get fabricated most?
The obscure long tail. Popular, frequently-cited works are usually correct; specific, niche, or recent sources, exactly the ones a research agent is most useful for, are where fabrication concentrates, and where DOI hallucination is highest.

What is the one check that helps most?
Resolve every cited source against the real index before showing it, and drop any that does not exist. Pair that with quote-checking that the source supports the claim, and most fabricated citations never reach the reader.


Share this post

Join the discussion

Have a take, a war story, or a question? Sign in with GitHub to comment and react. Comments are powered by GitHub Discussions, ad-free and yours to moderate.

Continue Reading

Find where your agent breaks, before you build it

Faultmap maps where your agent will fail from the goal and your data, then hands you the first test suite it has to pass.