A new analysis of 858,457 sites hosted on the Duda platform shows how AI crawlers interact with websites at scale. The data provides a clearer view of the growth in crawl activity and what SEOs and businesses should do to increase traffic from AI search.
AI crawling has already reached scale
AI crawling is growing rapidly, with an increasing share of requests tied to real-time answer generation and most of the activity coming from a single provider. The data reveals a pattern of which sites are crawled and, more importantly, why.
Year-on-year growth in LLM referrals
LLM referral traffic has grown sharply over the past year, with several platforms seeing significant gains from very different starting points.
AI Referral Traffic Numbers
- Total LLM referrals: 93,484 to 161,469 (+72.7%)
- ChatGPT: 81,652 to 136,095 (+66.7%)
- Claude: 106 to 2,488 (23x growth)
- Copilot: 22 to 9,560 (from near zero)
- Perplexity: 11,533 to 13,157 (+14.1%)
The growth is not uniform, but overall referral traffic from AI systems is increasing. That makes AI-driven discovery a growing traffic source, not a marginal one.
Crawlers increasingly fetch content to generate answers
AI crawlers are no longer used primarily for indexing; most activity now involves real-time content retrieval to generate responses for users.
Most crawl activity now happens in response to user queries rather than to build an index, changing how content is accessed and used.
- User fetching (real-time responses): 56.9% of all bot activity, driven almost entirely by ChatGPT
- Training (model learning): 28.8%, split between GPTBot and other model crawlers
- Discovery (content indexing): 14.3%, spread across multiple systems
- ChatGPT user-fetch volume: ~39.8 million visits
The trends are largely driven by ChatGPT, which is responsible for almost all real-time retrieval activity. This means the move toward answer-driven crawling is not evenly distributed, but concentrated in a single platform that shapes how content is accessed. That could change with Google's new agent crawler.
Market Concentration in AI Crawling
AI crawler activity is highly concentrated, with OpenAI responsible for the vast majority of requests, reflecting ChatGPT's position as the primary tool users rely on to find and retrieve information.
- OpenAI: 55.8 million visits (81.0%)
- Anthropic (Claude): 11.5 million (16.6%)
- Perplexity: 1.3 million (1.8%)
- Google (Gemini): 380,000 (0.6%)
Most AI crawling activity comes from OpenAI, which fits ChatGPT's role as the primary information search and retrieval tool. Claude follows with a much smaller share, suggesting a different usage pattern, while the rest of the market represents a minimal share of crawler activity.
Scale and what it actually means
AI crawling already spans much of the web, reaching hundreds of thousands of sites and generating tens of millions of requests in a single month.
More than half of all sites in the dataset received at least one visit from an AI crawler, showing that this activity is not limited to a small subset of websites.
- Total sites analyzed: 858,457
- Sites with at least one AI bot visit: 506,910 (59%)
- Total AI crawler visits (February 2026): 68.9 million
AI crawling is not limited to high-profile or high-traffic sites. It is already widespread, with constant activity across the majority of the web.
The relationship between crawling and real traffic
Sites that allow AI systems to crawl them consistently show stronger engagement across multiple metrics.
What the data actually shows is:
- Sites that enable AI crawling receive significantly more human traffic
- Higher traffic sites are more likely to be crawled
Sites that allow crawling by AI systems receive significantly more human traffic, with an average of 527.7 sessions compared to 164.9 for sites that are not crawled. This does not establish causation, but shows a clear alignment between which sites attract human visitors and how often AI systems revisit them.
- Average human traffic (crawled by AI vs. not): 527.7 vs. 164.9 sessions (3.2x higher)
- Average forms completed: 4.17 vs. 1.57 (2.7x higher)
- Average click-to-call actions: 8.62 vs. 3.46 (2.5x higher)
- Sites with more than 10,000 sessions: 90.5% crawl rate
AI systems are not surfacing weak or inactive sites; they return to sites that already attract human visitors. For marketers, this shifts the focus away from trying to "get crawled" and toward creating real audience demand, since visibility in AI systems seems to follow it.
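Since the comparisons above treat AI-crawler access as a site-level choice, it helps to see where that choice is made. The usual mechanism is robots.txt; the sketch below uses user-agent tokens the major providers publicly document (GPTBot and ChatGPT-User for OpenAI, ClaudeBot for Anthropic, PerplexityBot for Perplexity). It is illustrative, not exhaustive:

```
# robots.txt – explicitly allow the major AI crawlers
# (crawlers are allowed by default; an explicit record documents intent)

User-agent: GPTBot          # OpenAI, model training
Allow: /

User-agent: ChatGPT-User    # OpenAI, real-time user fetches
Allow: /

User-agent: ClaudeBot       # Anthropic
Allow: /

User-agent: PerplexityBot   # Perplexity
Allow: /
```

Blocking works the same way, by listing `Disallow: /` under the relevant token instead.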
What correlates with more crawling
The study compared sites that include specific third-party integrations, structured features, and content depth with those that do not and identified which ones mattered most for AI crawler activity and referrals.
Across the entire dataset, 59% of sites received at least one visit from an AI crawler in February 2026. Sites that are crawled most often tend to have a combination of three types of signals: external integrations, structured business data, and content depth.
1. External integrations
These integrations connect the site to external systems that validate and distribute business information.
- Yext integration: 97.1% crawl rate vs. ~58% without (+38.9 pp)
- Review integrations: 89.8% crawl rate vs. 58.8% without, 376.9 average crawler visits
Sites connected to external data and review systems are crawled far more often, suggesting that AI systems treat these integrations as signals that a business is real, verifiable, and worth revisiting.
2. Structured site features and business data
These are built into the site and help AI systems understand and verify the business's identity.
- Google Business Profile Sync: 92.8% crawl rate vs. 58.9% without, 415.6 average crawl visits
- Local business schema: 72.3% vs. 55.2% (+17.1 pp), 22.3% adoption
- Dynamic pages: 69.4% vs. 58.2% (+11.2 pp)
- E-commerce: 54.2% vs. 59.2% (-5.0 pp)
Sites that clearly define their business identity and structure their information in a machine-readable manner are crawled more often, showing that AI systems favor sites from which they can easily interpret, verify, and extract information.
3. Content depth (volume of usable data)
Sites with more content provide more opportunities for AI systems to retrieve, reference, and reuse information in responses.
- Sites with more than 50 blog posts: 1,373.7 average visits by bots vs. 41.6 without a blog (~33 times higher)
Sites with more content are crawled far more often, indicating that AI systems return to sources that offer a larger pool of usable information to draw on when generating responses.
Local business schema completeness = more crawling
This part of the research focuses specifically on local business schema, comparing how completely the schema is implemented to communicate business details against AI crawler activity. Fields measured include business name, phone number, address, hours, and social profiles.
- No local schema fields: 55.2% crawl rate
- 10 to 11 schema fields completed: 82% crawl rate
- Sites with a more complete local schema have a crawl rate 26.8 percentage points higher (82% vs. 55.2%)
Sites that provide more comprehensive local business information in structured form are crawled more often and receive more crawl visits. As more of these fields are populated, the crawl rate and crawl frequency increase.
The data shows that clearly defined local business data makes a site easier for AI systems to identify, verify, and revisit, all of which are prerequisites for receiving traffic from AI search.
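As an illustration of what a more complete implementation looks like, here is a minimal LocalBusiness JSON-LD sketch covering the field types the study names (name, phone, address, hours, social profiles). All values are placeholders, and the study's exact field list may differ:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Example Plumbing Co.",
  "telephone": "+1-555-0100",
  "url": "https://www.example.com",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Main St",
    "addressLocality": "Springfield",
    "addressRegion": "IL",
    "postalCode": "62701",
    "addressCountry": "US"
  },
  "openingHoursSpecification": [{
    "@type": "OpeningHoursSpecification",
    "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    "opens": "08:00",
    "closes": "17:00"
  }],
  "sameAs": [
    "https://www.facebook.com/example",
    "https://www.instagram.com/example"
  ]
}
</script>
```

The `sameAs` array is where social profiles are declared; each additional populated field is one more machine-readable signal of the kind the study associates with higher crawl rates.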
Takeaways
AI crawling is a parallel method of content discovery, and the research shows clear patterns among the sites crawlers visit most often.
- AI crawling works alongside traditional search, changing the way content is accessed and reused
- Sites with structured local signals, deeper content, and more comprehensive schema are crawled more often
- Multiple reinforcing signals appear together at the same sites, not in isolation
- Data shows direction, not causation, but patterns are consistent
The data shows that sites that make it easy for AI crawlers to access and revisit them tend to perform better. Sites that present clear, structured, and verifiable information, while also generating real audience demand, are the most likely to be revisited by AI systems and to benefit from AI search traffic.
Read the research: Duda Study Finds AI-Powered Websites Drive 320% More Traffic to Local Businesses
Featured image by Shutterstock/Preaapluem