Google's Mueller Says llms.txt Can't Help LLMs Differentiate Sites

Google’s John Mueller argued that LLM systems cannot use files like llms.txt to decide which websites to display for a given query.

He made the comments on a recent episode of Search Off the Record, the podcast from the Google Search Relations team.

His comment points to a broader signal problem, not just deliberate gaming. Even a well-written llms.txt file still contains information self-reported by the site that wishes to be chosen.

For discovery, Mueller pointed to normal HTML pages and internal links.

What Mueller said

The conversation started with the question of whether publishers should convert websites to Markdown for LLMs. Mueller and co-host Martin Splitt agreed that HTML remains the foundation for crawling and discovery.
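As a rough illustration of what discovery through HTML and internal links involves, the sketch below pulls same-host links out of a page using only the Python standard library. It is not how Google or any LLM system crawls; the host shop.example, the sample markup, and the names LinkCollector and internal_links are all made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url: str, html: str) -> list[str]:
    """Return absolute same-host URLs found in html, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found: list[str] = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


# Made-up page: two internal links and one external link.
sample = (
    '<a href="/prints/">Prints</a> '
    '<a href="https://other.example/">Elsewhere</a> '
    '<a href="about.html">About</a>'
)
print(internal_links("https://shop.example/index.html", sample))
```

A crawler built on this idea would repeat the step for each newly found URL. Nothing in the process relies on the site describing itself.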

The discussion became more specific when Mueller turned to llms.txt. He described the discovery use case as a dead end:

“Basically, you’re telling these systems that I have the best website ever. And here are all the pages that everyone has to go to. And you have to buy all of my products or whatever you put on there. So in the LLM system, basically by design, it can’t trust what’s here to differentiate between different websites.”

His argument boils down to differentiation. If sites use llms.txt to promote themselves, the files may make similar claims. An LLM that decides which site best answers a query still needs another way to tell them apart.

What “by design” could mean

“By design” could mean two different things, and Mueller did not specify which.

One reading is architectural. LLM systems evaluate web content and cannot use self-reported files when selecting sources.

The other reading treats it as a signal problem. Self-reported signals lose value when everyone provides them. Meta keywords stopped working for the same reason. Every site stuffed them, and search engines could no longer extract a useful ranking signal.
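The signal-problem reading can be made concrete with a toy example. The sketch below assumes every site self-reports the same top score and compares that with a made-up set of externally observed scores. The site names, the numbers, and the functions can_differentiate and rank are invented for illustration and do not describe any real ranking system.

```python
from statistics import pvariance

# Hypothetical self-reported scores: every site claims to be the best.
self_reported = {
    "site-a.example": 1.0,
    "site-b.example": 1.0,
    "site-c.example": 1.0,
}

# Hypothetical externally observed scores, invented for this example.
observed = {
    "site-a.example": 0.9,
    "site-b.example": 0.4,
    "site-c.example": 0.1,
}


def can_differentiate(scores: dict[str, float]) -> bool:
    """A signal can separate sites only if its values vary across them."""
    return pvariance(list(scores.values())) > 0


def rank(scores: dict[str, float]) -> list[str]:
    """Order sites by score, highest first; ties keep insertion order."""
    return sorted(scores, key=lambda site: scores[site], reverse=True)


print(can_differentiate(self_reported))  # identical claims carry no ordering
print(can_differentiate(observed))
print(rank(observed))
```

When every value is identical, any ordering of the sites is arbitrary. That is the sense in which a signal everyone supplies stops being useful for choosing between them.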

Both readings reach the same conclusion about discovery. But they imply different things about whether the limitation might change over time.

Where Mueller sees a role

Mueller did not reject all uses of llms.txt. He identified one case where it could help:

“If someone is already on your website, perhaps some sort of automated system would be helpful.”

He used the example of an agent trying to purchase a photograph from a specific site. The LLM would visit the site and look for instructions on how to complete the purchase.

The argument separates discovery from navigation. llms.txt cannot help an LLM choose which site to visit. But it could be useful once the agent is already there, like a store directory for someone who has already entered.

Beyond the gaming argument

Mueller called creating Markdown pages for bots a “stupid idea.” He also compared llms.txt to the keywords meta tag.

Roger Montti of SEJ wrote that llms.txt is “inherently unreliable” because nothing stops site owners from adding self-serving content. An SE Ranking analysis of 300,000 domains found no link between adoption of llms.txt and citation frequency in LLM responses.

These arguments have focused on what happens when people manipulate the files. Mueller’s podcast comments add a further point: nothing in the files gives an LLM a basis for choosing one site over another.

Why it matters

The gaming argument against llms.txt has always had a counterargument available. Platforms could learn to penalize manipulation, the same way search engines treat spammy structured data.

The differentiation argument poses a more difficult problem. Penalizing manipulation may address abuse, but it does not explain how self-reported files help an LLM choose one site over another. Even the most accurate llms.txt file still can’t tell an LLM to choose your site over a competitor’s.

Looking to the future

Standards for how agents navigate sites aren’t yet set, Mueller acknowledged. He mentioned WebMCP alongside other file types under discussion.

None have become a standard. By his estimation, it could take six months to a year or more for the agent systems to agree on a format. The discovery layer, where HTML and internal links already work, is not part of this discussion.
