The Facts About Google Click Signals, Rankings, and SEO


Clicks as a ranking-related signal have been a topic of debate for over twenty years, although these days most SEOs understand that clicks are not a direct ranking factor. The simple truth is that clicks are raw data and, perhaps surprisingly, are processed much like scores from human raters.

Clicks are a raw signal

The DOJ’s September 2025 antitrust memorandum mentions clicks as a “raw signal” used by Google. It also classifies content and search queries as raw signals. This is important because a raw signal is the lowest level data point that is processed into higher level ranking signals or used to train a model like RankEmbed and its successor, RankEmbedBERT.

These are considered raw signals because they are:

  • Directly observed
  • Not yet interpreted; they still have to be processed into higher-level signals or used as training data

The DOJ document quotes Professor James Allan, who testified as an expert on behalf of Google:

“Signals vary in complexity. There are ‘raw’ signals, such as the number of clicks, the content of a web page, and the terms of a query.

…These signals can be created with simple methods, such as hit counts (for example, how many times a web page was clicked in response to a particular query).” Id. at 28:59:3–28:60:21 (Allan) (regarding the Navboost signal)

It then compares the raw signals with the way they are processed:

“At the other end of the spectrum are innovative deep learning models, which are machine learning models that discern complex patterns in large data sets.

Deep models find and exploit patterns in large data sets. They add unique capabilities at a high cost.”

Professor Allan explains that “high-level signals” are used to produce a web page’s “final” scores, particularly for popularity and quality.

Raw signals are data that must be further processed

Navboost is mentioned several times in the September 2025 antitrust document as popularity data. It is not mentioned in the context of clicks having a ranking effect on individual sites.

The document describes it as a way to measure popularity and intent:

“…popularity as measured by user intent and feedback systems, including Navboost/Glue…”

And elsewhere, as part of explaining why certain Navboost data is privileged:

“This is ‘popularity as measured by user intent and feedback systems including Navboost/Glue’…”

And the memorandum lists the data sets covered by the proposed remedy:

“As part of the proposed remedy, Google must make available to qualified competitors… the following data sets:

1. User-side data used to construct, create, or operate the GLUE statistical model(s);

2. User-side data used to train, create, or operate the RankEmbed model(s); and

3. User-side data used as training data for GenAI models used in search or any GenAI products that may be used to access search.”

Google uses the first two data sets to create search signals, and the third to train and refine the models underlying AI Overviews and (arguably) the Gemini app.

Clicks, like scores from human raters, are just a raw signal used further up the chain: either to train AI models to better match web pages to queries, or to generate a quality or relevance signal that is then combined with the rest of the ranking signals by a ranking engine or rank modifier engine.

70 days of search logs

The DOJ document refers to the use of 70 days of search logs. But that’s only eleven words in a larger context.

Here is the part that is frequently cited:

“70 days of search logs and scores generated by human raters”

Taken alone, that sounds simple and straightforward. But there is more context:

“RankEmbed and its later iteration RankEmbedBERT are ranking models that rely on two primary sources of data: % (redacted) from 70 days of search logs plus scores generated by human raters and used by Google to measure the quality of organic search results.”

The 70 days of search logs are not click data used directly for ranking in Google Search, AI Mode, or Gemini. They are aggregate data that is processed to train specialized AI models such as RankEmbedBERT which, in turn, rank web pages based on natural language analysis.

This part of the DOJ document does not claim that Google directly uses click data to rank search results. Like human rater data, it is data that other systems consume as training data or for further processing.
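To make the aggregation idea concrete, here is a minimal, hypothetical sketch of collapsing individual click events into per-query-URL counts, the kind of aggregate a downstream system could consume (for example, as model training data). The log format, queries, and domain names are invented for illustration; this is not Google’s actual pipeline:

```python
from collections import Counter

# Hypothetical click-log records as (query, clicked_url) pairs.
# The queries and URLs are invented for illustration only.
log = [
    ("best hiking boots", "example.com/boots"),
    ("best hiking boots", "example.com/boots"),
    ("best hiking boots", "example.org/review"),
    ("trail running shoes", "example.com/shoes"),
]

# Aggregate individual click events into counts per query-URL pair.
# A downstream system would see only these aggregates, never the
# individual click acting as a per-site ranking factor.
clicks_per_pair = Counter(log)

print(clicks_per_pair[("best hiking boots", "example.com/boots")])  # prints 2
```

The point of the sketch is that the individual events disappear at this step: only the aggregate survives to be used further up the chain.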

What is Google RankEmbed?

RankEmbed is a natural language approach to identifying relevant documents and ranking them.

The same DOJ document explains:

“The RankEmbed model itself is an AI-powered deep learning system that has a strong understanding of natural language. This allows the model to more efficiently identify the best documents to retrieve, even if a query is missing certain terms.”

It is trained on less data than previous models. The training data partly consists of query terms and the web pages returned for them:

“…RankEmbed is trained on 1/100th of the data used to train previous ranking models, while providing higher quality search results.

…Among the underlying training data is information about the query, including the salient terms that Google derived from the query, and the resulting web pages.”

This data trains the model to recognize the relevance of query terms to web pages.

The same document explains:

“The data behind RankEmbed models is a combination of click and query data and web page ratings by human reviewers.”

It’s clear that in the context of this specific passage, it’s describing the use of click data (and human reviewer data) to train AI models, not to directly influence rankings.

What about Google’s click ranking patent?

In 2006, Google filed a click-related patent titled Modifying search result ranking based on implicit user feedback. The invention covers the mathematical formula for creating a “relevance measure” from aggregated raw click data (clicks in the plural).

The patent distinguishes between creating the signal and the act of ranking itself. The “relevance measure” is passed to a ranking engine, which can add it to existing ranking scores when ranking results for new searches.

Here is what the patent describes:

“A ranking subsystem may include a ranking modification engine that uses implicit user feedback to cause search results to be re-ranked to improve the final ranking presented to a user of an information retrieval system.

User selections in the search results (click data) can be tracked and transformed into a click fraction that can be used to re-rank future search results.”

This “click fraction” is the relevance measure. The invention described in the patent is not the tracking of clicks; it is the mathematical measure (the click fraction) produced by combining all those individual clicks, which are categorized as short clicks, medium clicks, long clicks, and last clicks.

Technically, this is the LCC fraction (long clicks divided by total clicks). “Clicks” is plural because decisions are made from the sum of many clicks (an aggregate), not from individual clicks.
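The idea of weighting clicks by type before summing them can be sketched as follows. This is only an illustration of the concept: the duration thresholds and weight values below are invented for the sketch, not taken from the patent:

```python
# Illustrative only: assigns a weight to a click based on duration category,
# echoing the patent's short/medium/long/last click distinction. The
# thresholds and weights are invented, not taken from the patent.

def click_weight(dwell_seconds, is_last_click=False):
    """Weight a single click by how long the user stayed on the page."""
    if dwell_seconds < 30:        # "short click": quick bounce back to results
        weight = 0.1
    elif dwell_seconds < 120:     # "medium click"
        weight = 0.5
    else:                         # "long click": user likely satisfied
        weight = 1.0
    if is_last_click:             # last click of the session: strong signal
        weight = max(weight, 1.0)
    return weight
```

Under this scheme a short click contributes almost nothing to the sum of weighted clicks, while long and last clicks contribute fully, which is the intuition behind counting long clicks as the numerator.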

This click fraction is an aggregate because of three steps:

  • Summation:
    The numerator used for ranking is the sum of all those individual weighted clicks for a specific query-document pair.
  • Normalization:
    That sum is divided by the total number of clicks (the denominator).
  • Statistical smoothing:
    The system applies “smoothing factors” to the total so that a single click on a “rare” query cannot unfairly skew the results, especially for spammers.

This 2006 patent describes its weighting formula like this:

“A basic LCC click fraction can be defined as:

LCC_BASE = #WC(Q,D) / (#C(Q,D) + S0)

where #WC(Q,D) is the sum of weighted clicks for a query-URL pair…, #C(Q,D) is the total (raw, unweighted) number of clicks for the query-URL pair, and S0 is a smoothing factor.”

This formula sums and divides data from many users to create a single fraction for a document. The “query-URL” pair is a bucket of data that stores the click behavior of every user who has ever typed that specific query and clicked on that specific search result. The smoothing factor is the anti-spam part, which includes not counting single clicks on rare search queries.
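Read literally, the formula can be sketched in a few lines of Python. This is a minimal reading of the published equation, not Google’s implementation, and the smoothing value is arbitrary:

```python
def lcc_base(weighted_clicks, total_clicks, smoothing=10.0):
    """Base click fraction for one query-URL pair:
    sum of weighted clicks divided by (total clicks + smoothing).

    The smoothing term keeps a handful of clicks on a rare query
    from producing an inflated relevance measure.
    """
    return sum(weighted_clicks) / (total_clicks + smoothing)

# One click on a rare query barely moves the measure...
rare = lcc_base([1.0], total_clicks=1)              # 1 / 11, about 0.09
# ...while many long clicks on a popular query produce a strong one.
popular = lcc_base([1.0] * 900, total_clicks=1000)  # 900 / 1010, about 0.89
```

The comparison at the bottom shows the anti-spam effect of the smoothing factor: a lone click on a rare query yields a weak measure even though it is, proportionally, a 100% click rate.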

Even in 2006, clicks were just raw data, transformed further up the chain through several aggregation stages into a statistical measure of relevance before ever reaching the ranking stage. In this patent, clicks themselves are not ranking factors that directly influence whether a site ranks. They were used in aggregate as a measure of relevance, which in turn was fed to a separate ranking engine.

By the time the information reaches the ranking engine, the raw data has moved from individual user actions to an overall measure of relevance.

  • The relationship between clicks and rankings is not as simple as “clicks determine search rankings.”
  • Clicks are just raw data.
  • Clicks are used to train AI systems like RankEmbedBERT.
  • Clicks do not directly influence search results. They have always been raw data: the starting point for systems that use the data in aggregate to create a signal that is then fed into Google’s ranking decision-making systems.
  • So yes, like data from human raters, click data is processed to create a signal or to train AI systems.

Read the DOJ memorandum in PDF format here.

Read about four research articles on CTR.

Read the 2006 Google patent, Modifying search result ranking based on implicit user feedback.

Featured image by Shutterstock/Carkhe


