Meet the New Web Crawlers: AI Bots Are Closing in on Search Engine Bots

Today, more bots are crawling the web than ever before, and search engines are no longer the only ones doing this work. AI bots also crawl websites to collect data for training their AI systems.

The problem is that AI bots are new at this. They still make mistakes, waste resources, and go to places they don’t need to. Many popular websites have already started blocking these bots because of this behavior. We believe AI bots will improve over time, but it might take a while. Right now, most of them don’t even load or view full pages properly.

Bing Learned from Past Mistakes

Bing once faced criticism for crawling too aggressively, creating unnecessary strain on servers. Over the years, the team corrected this by adopting a more measured and considerate crawling approach.

Key Improvements Bing Introduced

  • Adaptive crawl control: Bing redesigned its logic to read server signals in real time. If a site slows down, the bot reduces its crawl rate to prevent overload.
  • IndexNow adoption: With IndexNow, websites can notify Bing whenever a page is created, updated or deleted. This eliminates blind crawling and reduces wasted server requests.
  • Cleaner, more predictable behaviour: Bing now follows robots.txt rules more carefully, avoids repetitive requests and maintains a consistent crawl footprint that is easier for developers to manage.
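The first two improvements can be sketched in a few lines of Python. This is illustrative only and is not Bing's implementation: `next_delay` is a hypothetical throttle that doubles the wait after a 429, a 5xx or a slow response, and `build_indexnow_request` assembles (without sending) the JSON body that the IndexNow protocol expects. The shared IndexNow endpoint is used here, and the host, key, key file location and thresholds are all placeholder assumptions.

```python
import json
import urllib.request

# Shared IndexNow endpoint; participating search engines also expose their own.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host, key, urls):
    """Build (but do not send) an IndexNow POST announcing changed URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumption: the key file is hosted at the site root.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def next_delay(current_delay, status, latency, *, slow_after=1.0,
               min_delay=1.0, max_delay=60.0):
    """Hypothetical adaptive throttle (all thresholds are made up).

    Back off when the server signals overload (429, 5xx) or responds slowly;
    otherwise ease back toward the minimum delay.
    """
    if status == 429 or status >= 500 or latency > slow_after:
        return min(current_delay * 2, max_delay)
    return max(current_delay * 0.9, min_delay)
```

A crawler built this way would call `next_delay` after every response and sleep for the returned number of seconds before its next request to the same host.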

Data from Cloudflare Radar Gives Us a Clear Picture

Cloudflare Radar observes roughly one fifth of global internet traffic, making it one of the most reliable public datasets for understanding crawler behaviour.

Why Cloudflare Radar Matters

  • Covers approximately 20 percent of global internet traffic: This scale ensures the patterns we see are representative of real internet behaviour rather than isolated datasets.
  • Tracks individual bots separately: Radar identifies each crawler individually rather than grouping them under vague categories like “AI bots” or “SEO bots.”
  • Shows long term trends and spikes: This allows us to see whether AI bot growth is steady or driven by temporary spikes.

Requests for AI, Search Engine, and SEO Bots

Search engine crawlers still dominate overall crawling volume, but AI bots have quickly risen to the second position. SEO bots continue to operate at a predictable and stable pace.

Current Order of Dominance

  • Search Engine Bots: Googlebot, Bingbot and related crawlers continue to lead due to their responsibility for indexing the web.
  • AI Bots: GPTBot, Amazonbot and other model training crawlers already sit in second place despite being relatively new.
  • SEO Bots: Ahrefs, Semrush and similar tools follow in third position, with steady but limited growth.
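To reproduce this three-way split on your own traffic, one option is to bucket requests by user agent. The sketch below is a minimal illustration, not how Cloudflare Radar classifies bots: the crawler names come from the list above, while the lookup table, the `classify_user_agent` and `category_shares` helpers and the simple substring matching are assumptions. Real user agents can be spoofed, so production systems usually verify IP ranges as well.

```python
from collections import Counter

# Substring tokens per category. The grouping is an illustrative assumption
# and is far from a complete list of crawlers.
BOT_CATEGORIES = {
    "search": ("googlebot", "bingbot"),
    "ai": ("gptbot", "amazonbot"),
    "seo": ("ahrefsbot", "semrushbot"),
}


def classify_user_agent(user_agent):
    """Return 'search', 'ai', 'seo' or 'other' for a raw User-Agent string."""
    ua = user_agent.lower()
    for category, tokens in BOT_CATEGORIES.items():
        if any(token in ua for token in tokens):
            return category
    return "other"


def category_shares(user_agents):
    """Return each category's fraction of the given requests."""
    counts = Counter(classify_user_agent(ua) for ua in user_agents)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}
```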

Individual Bot Requests

When reviewing traffic bot by bot, a few clear leaders emerge. Google holds most of the crawling footprint across its search and AI systems.

Top Crawlers in the Dataset

  • Googlebot: Remains the most active crawler on the web, far ahead of any competitor.
  • Google’s AI focused crawlers: These include bots gathering data for AI Overviews, image recognition and document processing.
  • Amazonbot: Used heavily for product search, recommendations and internal AI model training.
  • GPTBot: Collects high quality text for OpenAI’s models and, in some datasets, nearly matches Bingbot in volume.
  • Ahrefsbot: The dominant SEO crawler, often surpassing other SEO tools by a wide margin.

Ahrefs and Yep Are Also Very Efficient

Ahrefs' crawler operates with purpose, drawing on both IndexNow and signals from Yep, its own search engine.

Key Strengths of the Ahrefs Crawling System

  • Intent driven crawling: It fetches pages based on actual need, freshness and relevance instead of crawling entire sites blindly.
  • IndexNow integration: Signals from Yep indicate when to revisit updated or newly created pages.
  • Prioritised crawling logic: Pages with recent changes, strong linking patterns or structured data receive priority crawling.
  • Reduced redundancy: The crawler avoids repeated retrievals, respects site limits and keeps unnecessary server load low.
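One way to picture prioritised, low-redundancy crawling is a priority queue that scores each URL from the signals listed above and refuses duplicates. This is a hypothetical sketch, not the actual Ahrefs or Yep logic: the `score` weights, the `CrawlTask` and `CrawlQueue` names and the signal parameters are all invented for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CrawlTask:
    priority: float                 # negated score, so the best URL pops first
    url: str = field(compare=False)


def score(changed_recently=False, inbound_links=0,
          has_structured_data=False, indexnow_ping=False):
    """Hypothetical scoring; the weights are made up for illustration."""
    total = 0.0
    if indexnow_ping:
        total += 5.0                # the site told us the page changed
    if changed_recently:
        total += 3.0
    if has_structured_data:
        total += 1.0
    total += min(inbound_links, 100) / 50.0   # capped link signal
    return total


class CrawlQueue:
    def __init__(self):
        self._heap = []
        self._seen = set()

    def add(self, url, **signals):
        """Queue a URL once; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, CrawlTask(-score(**signals), url))
        return True

    def pop(self):
        """Return the highest-scoring URL still in the queue."""
        return heapq.heappop(self._heap).url
```

Negating the score lets Python's min-heap behave as a max-priority queue, and the `_seen` set is what keeps the same URL from being fetched twice.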

Requests Over Time for AI, Search Engine, and SEO Bots

AI bot activity has surged noticeably throughout the year, indicating a structural shift in how data is gathered for AI models.

  • AI bot growth accelerated in early Q2: By May, AI bots were responsible for nearly one quarter of all bot traffic.
  • Search engine bots remain dominant but stabilising: Their volume now grows more slowly, showing that indexing systems are reaching maturity.
  • SEO bots remain steady: Their crawl patterns remain predictable, mostly tied to site audits and tool updates.

Why AI Bots Grew So Sharply

  • Frequent model updates and retraining cycles
  • Expansion of multimodal crawlers collecting text, images and structured data
  • New AI companies building their own data pipelines
  • Wide adoption of assistants that need real world datasets continuously

What This Signals for the Web

  • AI crawling is becoming permanent and will continue expanding
  • Websites must prepare for increased bot traffic and optimise resources
  • Bot management tools will become essential for performance and security
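The simplest form of bot management is a robots.txt rule, which only helps with crawlers that choose to honour it. The sketch below uses Python's standard `urllib.robotparser` to check what a given policy allows. The policy itself (block GPTBot, allow everyone else), the `is_allowed` helper and the example URLs are assumptions for illustration, not a recommendation.

```python
from urllib.robotparser import RobotFileParser

# Example policy: block one AI crawler site-wide, allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Running a check like this against a draft robots.txt before deploying it is a cheap way to confirm that search engine bots are not blocked by accident.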

Final Thoughts

Every time a bot crawls a site, it uses that site's resources. There is an informal deal between search engines and websites: search engines send visitors, and in return websites let them crawl. Google still sends traffic, so people don’t block it. AI assistants should follow the same rule.

If AI bots want access, they should also send traffic to websites. They should also show search data and help website owners see how visible their site is. If not, they will likely be blocked more and more. This is already starting to happen.

Right now, AI search traffic makes up just 0.1% of all website traffic. To earn access, AI tools must send more traffic and give more value to websites.

FAQs:

AI crawlers gather broad datasets for model training, while search engine crawlers focus on indexing pages for ranking and retrieval.

AI crawlers do not influence search rankings directly, but frequent AI crawling can shape how your content appears in AI generated answers.

Whether to allow or block AI crawlers depends on your goals. Allowing them can increase AI visibility, while blocking protects content from AI training.

AI crawlers cannot access private content; they can reach only what is publicly accessible. They are subject to restrictions similar to those that apply to search engine bots.

Use server logs, bot detection tools, and analytics filters to monitor AI crawler patterns.
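As a starting point for the server log approach, the sketch below counts hits per AI crawler in access log lines. It assumes the common Combined Log Format, where the user agent is the last quoted field, and it only looks for the two AI bot names mentioned in this article. The `count_ai_bot_hits` helper and the token list are illustrative and would need extending for real monitoring.

```python
import re
from collections import Counter

# Combined Log Format: the user agent is the last quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Names taken from this article; extend the tuple for other crawlers.
AI_BOT_TOKENS = ("GPTBot", "Amazonbot")


def count_ai_bot_hits(log_lines):
    """Count requests per AI crawler, matching on the User-Agent field."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue                      # no quoted user agent on this line
        user_agent = match.group(1).lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits
```

In practice this could be fed from an open log file, for example `count_ai_bot_hits(open("access.log"))`, where the file name is a placeholder.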

The main risks of AI crawling are content scraping, higher bandwidth usage, and the possibility of your material being used without attribution.

AI crawlers can increase AI visibility, improve brand recognition, and help your content appear in model generated answers.

Use clear structure, strong internal linking, and consistent topical coverage to help AI interpret your pages.

AI crawlers will not entirely replace search engine crawlers. The two serve different roles, though AI crawler activity will continue to grow.

The industries most affected are those with high informational depth, such as finance, health, education, software, and ecommerce.
