Google’s spam update now reaches AI answers. Enforcement is difficult


Google has started rolling out the June spam update, the second of the year. It enforces Google’s documented anti-spam policies, and one of these policies now covers more ground than before.

Google’s anti-spam rules consider attempts to “manipulate generative AI responses” in search a violation, and this is one of the policies enforced by the update.

A Cornell Tech preprint picked up by 404 Media explains why the policy is more difficult to enforce than its wording suggests. The community pages that AI search agents rely on also contain comments from third parties, and a comment may contain a recommendation that the page’s author never wrote.

What Google describes as spam therefore exploits the very retrieval step on which these agents rely. And the research shows that the obvious defenses all have drawbacks.

For anyone trying to push a brand into AI-generated responses, know that the line between optimization and spam is being redrawn.

The issues

AI Mode tracking by SE Ranking found that Google was increasingly pointing to its own properties, with self-citations accounting for about a fifth of AI Mode’s citations in its latest report.

With more citations pointing to Google and fewer to external websites, the value of earning one increases accordingly.

A gray market has already begun to form, and the Cornell authors point out that marketers are testing ways to push AI-generated responses.

Meanwhile, businesses don’t have the data they need to see what’s happening. As our past coverage of agent research showed, there is no dashboard that tells a site whether it landed in an AI response, whether it was cited in a generated report, or whether it was ignored.

The result is a violation that Google can name, but that the affected site often can’t see.

What the research found

The paper, titled “Deep Search Agents Can Be Poisoned Via User-Generated Content”, which has not been peer-reviewed, probes a weakness in the way AI research tools collect their sources. These tools answer a question by running a batch of related subqueries, grabbing the pages that keep coming back, and putting together a report with citations.

The analysis revealed that the same community pages appeared repeatedly in these subqueries. Within a single topic group, a user-generated page appeared in up to 48% of queries, and user-generated platforms accounted for 17-23% of all URLs retrieved. Edit one of these recurring pages and the change can ripple through reports on an entire topic.
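The two measurements described above can be illustrated with a minimal sketch. It is not the authors’ code: the `UGC_DOMAINS` list, the function names, and the sample data are all hypothetical, and it assumes that “share of URLs retrieved” counts every retrieval, including repeats.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical list of user-generated platforms (an assumption for illustration).
UGC_DOMAINS = {"reddit.com", "quora.com", "stackexchange.com"}


def is_ugc(url):
    """Return True if the URL's host is a listed domain or one of its subdomains."""
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in UGC_DOMAINS)


def recurrence(results):
    """Map each URL to the share of subqueries in which it was retrieved."""
    counts = Counter()
    for urls in results.values():
        counts.update(set(urls))  # count a URL once per subquery
    total = len(results)
    return {url: n / total for url, n in counts.items()}


def ugc_share(results):
    """Share of all retrieved URLs (repeats included) that are user-generated."""
    all_urls = [u for urls in results.values() for u in urls]
    if not all_urls:
        return 0.0
    return sum(is_ugc(u) for u in all_urls) / len(all_urls)


# Made-up retrieval results for one topic group: subquery -> retrieved URLs.
results = {
    "best plumber in austin": [
        "https://www.reddit.com/r/austin/plumbers",
        "https://example.com/a",
    ],
    "austin plumber reviews": [
        "https://www.reddit.com/r/austin/plumbers",
        "https://example.org/b",
    ],
    "emergency plumber austin cost": [
        "https://example.net/c",
        "https://example.com/a",
    ],
    "licensed plumber austin": [
        "https://www.reddit.com/r/austin/plumbers",
        "https://example.net/d",
    ],
}

for url, share in sorted(recurrence(results).items(), key=lambda kv: -kv[1]):
    print(f"{share:.0%}  {url}")
print(f"UGC share of retrieved URLs: {ugc_share(results):.1%}")
```

In this toy data a single community thread is retrieved by three of the four subqueries, which is the kind of recurring page an edit could exploit.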

The authors found that approximately 13 words of text inserted on a recurring page were enough to insert an attacker’s chosen entity into the final report in 38% to 51% of the sessions that retrieved the page.

Scatter the same text over a handful of pages and the figure rises to between 42% and 62%. Even buried in a full page, where it made up less than 4% of what the agent read, planted text still appeared in 30-53% of sessions.

The tests covered three open-source research agents, STORM, Co-STORM and OmniThink, all run in a simulation so nothing on the live web would be affected.

Where enforcement is difficult

Google can label the manipulation of AI responses as spam and take action based on what it detects. Catching it is the hard part. The planted text reads like real advice, and it sits on the same pages that the tools were always going to read, so distinguishing it from a normal post is the main problem.

The research team looked for a defense against planted text, but found none that worked. They tried removing user-generated sources, filtering them with a language model before use, and checking the final report for claims that didn’t hold up.

None of the three stopped the attack without worsening the results for the user. Remove user-generated sources and you lose the community detail that makes AI research tools worth using.
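The trade-off in the first defense, removing user-generated sources, can be sketched as follows. This is an illustration under stated assumptions and not the paper’s implementation: the `UGC_DOMAINS` list and the `drop_ugc` helper are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical list of user-generated platforms (an assumption for illustration).
UGC_DOMAINS = {"reddit.com", "quora.com", "stackexchange.com"}


def drop_ugc(urls):
    """Split retrieved URLs into (kept, dropped) by user-generated domain."""
    kept, dropped = [], []
    for url in urls:
        host = urlparse(url).netloc.lower()
        is_ugc = any(host == d or host.endswith("." + d) for d in UGC_DOMAINS)
        (dropped if is_ugc else kept).append(url)
    return kept, dropped


kept, dropped = drop_ugc([
    "https://www.reddit.com/r/austin/plumbers",
    "https://example.com/a",
])
print("kept:", kept)
print("dropped:", dropped)
```

Everything in `dropped` is content the agent no longer reads, whether it was planted text or a genuine first-hand recommendation, which is why the filter degrades the report along with the attack.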

The tools that most people use were not part of this test. ChatGPT Deep Research and Gemini Deep Research search the live web, which the researchers couldn’t poison without crossing an ethical line, so they only measured citation habits. Gemini relied on user-generated content 12.1% of the time, which the authors call a hint of exposure and not a tested result. The OpenAI tool relied on it much less.

Why it matters for search professionals

Actions that can help get a brand into AI responses resemble the manipulative tactics that Google calls “spam,” such as inserting mentions on sites read by these tools. We don’t know where Google draws the line between earning a mention and engineering one.

For e-commerce and local brands, the danger comes from the other side.

The test cases were ordinary questions people ask, like what service to call, what product to buy, and where to eat. A rival or scammer can slip an unfamiliar name into these responses, right next to the legitimate options, and the excluded brand will never know.

For news publishers and big brands, the concern is whether to trust the answer their name is attached to. A citation from an AI tool is considered a win, but a citation only reflects what the tool retrieved, not whether that page was correct, and the response may be skewed by content the brand never wrote.

There is no silver bullet for any of this. AI visibility has become a surface you actively monitor, not just a channel you passively optimize for.

Looking to the future

The authors called user-generated manipulation an open problem that no single platform can solve. Reddit pointed to its long-standing fight against coordinated manipulation, and Google added context labels to some Reddit-sourced material in AI Overviews. Neither addresses the retrieval concentration the paper describes.

Google has not indicated how it intends to enforce against generative AI manipulation, whether through a dedicated update or through the SpamBrain system and manual reviews that it relies on for most violations.

For now, the policy treats this behavior as prohibited, and checking the AI’s responses remains up to whoever reads them.

Featured Image: Bravo-J-ane/Shutterstock