Google doesn't have unlimited time and resources to crawl every page on the web all the time, so not all pages will be crawled. Optimizing the crawl budget can be key to growing your business site.
When we talk about SEO, the first reflex is to discuss content optimization, semantics, and netlinking. How the crawl budget works is most often neglected, along with site structure and its optimization.
Many webmasters don't realize that this is an important factor.
When you add new pages and update existing ones, you want search engines to detect them as soon as possible. The sooner Google does the indexing or refreshes, the sooner you can take advantage of it.
In this article, we will explain what the crawl budget is and give you the key elements and tips. Why is it important to optimize it? What factors can influence it?
Search engines use robots, or "spiders", to browse and analyze web pages. These spiders also detect the links on the pages they visit and try to explore those new URLs. It is by following links that Google's robot, called "Googlebot", travels the web and discovers new pages and new sites.
Not all URLs are analyzed on the same day; some are added to a list of tasks to be carried out later.
The crawl frequency, also called exploration frequency, refers to how often a website is explored by robots, mainly Googlebot.
It is related to the popularity and overall visibility of the site. The only way to have your content listed in organic results is to have it indexed, and the only way to have it indexed is to have your site crawled by Googlebot.
A site with proper navigation helps in crawling and indexing your site.
How often you update your site affects how often Google crawls it. The popularity and authority of the domain also matter. Sites that drive lots of traffic and publish genuinely engaging content are crawled more often than others.
How to improve the crawl rate?
- Have an optimum site structure.
- Update your site.
- Build internal links.
- Troubleshoot errors.
- Focus exploration on important URLs.
- Reduce site loading time.
- Create sitemaps.
- Block access to unwanted pages.
- Optimize the weight of the images.
An increase in crawl frequency will not necessarily lead to better positions in the results. Google uses hundreds of signals to rank results, and although crawling is necessary to appear in the results, it is not a ranking signal.
Google has to crawl billions of sites and the staggering number of URLs that make them up. Since it cannot explore everything, nor do so continuously, it must set criteria and priorities to conserve server resources.
It therefore needs a way to prioritize its efforts, and assigning a budget to each site helps the engine perform this task.
The number of times an engine spider crawls your site in a given amount of time is called the “crawl budget” . The frequency varies slightly from day to day, but overall it is relatively stable.
The number of pages crawled each day is determined by the size of your site, the “health” of your site and its popularity (number and quality of links made to your site).
It is shared by all Google robots, of which Googlebot is the main one.
- AdSense for mobile: user-agent = Mediapartners-Google
- AdSense: user-agent = Mediapartners-Google
- AdsBot for mobile web (Android): user-agent = AdsBot-Google-Mobile
- AdsBot for mobile web: user-agent = AdsBot-Google-Mobile
- AdsBot: user-agent = AdsBot-Google
- Google Images: user-agent = Googlebot-Image
- Google News: user-agent = Googlebot-News
- Google Videos: user-agent = Googlebot-Video
- Desktop and mobile: user-agent = Googlebot
The crawl budget is allocated according to two factors:
- The limit: engine crawlers are designed to avoid overloading a server with requests, so they adjust their crawl frequency accordingly.
- Scheduling: crawlers explore a site at a variable rate according to several criteria: the site's popularity, the number of queries it ranks for, the freshness of its updates, its most visited pages, and so on.
An optimal crawl rate helps your site get indexed efficiently and quickly. If you waste this budget, Google will not be able to crawl your site effectively: it will spend time on parts of the site that don't matter, on uninteresting pages, to the detriment of those you want to rank.
This lack of optimization can result in important parts of your site not being discovered or updated, limiting their SEO potential. Wasting the budget harms your site's performance.
Crawl Budget can become a big issue for large sites with thousands of pages. Although smaller sites have less to worry about, its optimization and log analysis can still help you get better results.
Analyzing log files makes it possible to understand how engines crawl a site and how that crawl affects SEO.
This information is a great help in improving crawling and SEO performance.
You can audit crawl behavior and determine interesting metrics:
- Is your crawl budget being spent efficiently?
- What accessibility errors were encountered?
- Are there any unknown pages?
- Does the site have 404 errors?
- Does the site have spider traps?
- What are the areas where the crawl is deficient?
- What are the most active pages on the site?
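The questions above can be answered directly from your server's access logs. As a minimal sketch, the following Python snippet parses Apache "combined"-format log lines and tallies which URLs Googlebot requests and which status codes it receives; the log format and sample lines are assumptions to adapt to your own server setup.

```python
import re
from collections import Counter

# Matches Apache "combined" log format; adjust if your server logs differ.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_stats(lines):
    """Count Googlebot hits per URL and per status code."""
    urls, statuses = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "Googlebot" in m.group("agent"):
            urls[m.group("url")] += 1
            statuses[m.group("status")] += 1
    return urls, statuses

# Hypothetical sample lines; in practice, read your access.log file.
sample = [
    '66.249.66.1 - - [10/Jan/2024:10:00:00 +0000] "GET /blog/seo HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jan/2024:10:00:05 +0000] "GET /old-page HTTP/1.1" 404 310 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Jan/2024:10:00:09 +0000] "GET /blog/seo HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]
urls, statuses = googlebot_stats(sample)
print(urls.most_common())  # Googlebot hits per URL
print(statuses)            # status codes served to Googlebot
```

Grouping by status code quickly surfaces 404s and redirect chains that are eating into the budget; grouping by URL shows whether the pages you care about are the ones actually being crawled.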
Many levers exist to optimize this budget and focus resources on the most interesting content.
Variables affecting the crawl budget:
- Duplicate content
- The XML sitemap
- Low quality content
- 404 error and more
- Site architecture
- Site speed
- Redirect chains
- Internal links
Regardless of its source, duplicate content can negatively impact your SEO efforts and waste a significant share of your budget.
Duplicate content can have various origins:
- Internal duplication can occur when different URLs point to a single page.
- Parameters appended to a base URL create many duplicate pages.
- In e-commerce especially, sorting and filtering options create unintentional internal duplication.
Google doesn't want to waste resources indexing the same content multiple times.
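The standard remedy for parameter and filter duplicates is the canonical tag: every variant URL declares which version engines should index. A minimal example, with a hypothetical URL:

```html
<!-- Placed in the <head> of every sorted/filtered variant of this listing
     page; the href below is a hypothetical canonical URL. -->
<link rel="canonical" href="https://www.example.com/shoes/" />
```

This way the variants can still exist for users, but crawl and indexing effort is consolidated on a single URL.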
XML sitemaps should contain the most important URLs, the ones you want Googlebot to visit most often. Google confirms that the XML sitemap is used when building the list of URLs to crawl for indexing. Keep your XML sitemap up to date, free of errors and redirects.
Non-indexable pages and URLs returning 3xx, 4xx, and 5xx codes should not be included in your XML sitemap.
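For reference, a sitemap following the sitemaps.org protocol is a short XML file; the URLs and dates below are hypothetical examples:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- List only canonical, indexable URLs that return a 200 status -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-10</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/blog/crawl-budget</loc>
    <lastmod>2024-01-08</lastmod>
  </url>
</urlset>
```

Keeping `<lastmod>` accurate gives crawlers a reliable freshness signal, and submitting the file in Search Console lets you monitor how much of it actually gets indexed.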
The richer a page is in content, the more likely it is to be judged high quality and crawled regularly. Pages with very little content are of no interest to the engines. Keep them to a minimum or avoid them altogether if possible.
If a site contains a large number of uncorrected 404 errors, fixing them is essential to optimize your SEO.
Check if the 404 error URL has an equivalent or similar page on the site that may be useful to users. If so, redirect the broken URL to the new one via a redirect.
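On an Apache server, such a redirect is a one-line directive; the paths below are hypothetical:

```apache
# .htaccess sketch (assumes Apache with mod_alias enabled).
# Send a removed URL permanently to its closest equivalent page.
Redirect 301 /old-product /products/new-product
```

If no equivalent page exists, it is usually better to let the URL return a clean 404 (or 410) than to redirect everything to the homepage.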
Deep, complex site structures are not only unpleasant for users, but are also difficult to explore.
Always try to keep the most important pages as close to the homepage as possible.
A good method is to organize content horizontally within the site structure rather than vertically.
By performing a crawl of your site with the appropriate tools, you will be able to obtain a visual representation of your content and links, and more easily identify errors and blocking points.
A faster-loading site means Google can crawl more URLs in the same amount of time. Pages that take a long time to load negatively impact your crawl budget: slow responses signal to crawlers that your site cannot handle the demand and that your crawl limit should be lowered.
High loading times and waiting times significantly affect the user experience of your visitors, which also reduces the conversion rate.
When your site has long redirect chains, i.e. many consecutive 301 and 302 redirects, each redirected URL wastes a bit of your crawl budget.
If your site has an unreasonable number of them, crawlers will stop following the chain and the final destination URL may never be crawled. Avoid multiple consecutive URL changes.
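The fix is to collapse every chain so each old URL redirects straight to its final destination. As a sketch, given a source-to-target mapping of your existing redirects (the mapping below is a hypothetical example), this Python function computes the flattened version:

```python
def flatten_redirects(redirects):
    """Collapse redirect chains: map every source straight to its
    final destination, guarding against redirect loops."""
    flat = {}
    for src in redirects:
        seen, target = {src}, redirects[src]
        # Follow the chain until it leaves the mapping or loops back.
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        flat[src] = target
    return flat

# Hypothetical redirect map: /a -> /b -> /c is a two-hop chain.
chains = {
    "/a": "/b",
    "/b": "/c",
    "/old": "/new",
}
print(flatten_redirects(chains))
```

Applying the flattened map in your server configuration means crawlers always reach the destination in a single hop.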
The robots.txt file is important in SEO because it tells crawlers what to scan and what not to scan. By telling the bots what to crawl and what to ignore, you avoid wasting this precious resource.
Add to the robots.txt file any directories and URLs that you decide not to show. Do not block important pages by mistake.
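One way to guard against blocking important pages by mistake is to test the rules before deploying them. Python's standard library includes `urllib.robotparser` for exactly this; the rules and URLs below are hypothetical examples:

```python
from urllib.robotparser import RobotFileParser

# Draft robots.txt rules to validate before deployment (example only).
rules = """\
User-agent: *
Disallow: /cart/
Disallow: /internal-search/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Check a sample of URLs: important pages should come back allowed,
# low-value ones disallowed.
for url in ["https://www.example.com/blog/crawl-budget",
            "https://www.example.com/cart/checkout"]:
    print(url, "->", parser.can_fetch("Googlebot", url))
```

Running this check against a list of your key URLs in a CI step catches an accidental `Disallow` on a money page before it ever reaches production.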
According to numerous tests, there is a strong correlation between the number of robot visits and the number of external links. Popularity and netlinking therefore appear to be important factors in increasing a site's crawl rate.
There is also a fairly strong relationship between page authority and crawl budget.
The links between the pages of your site play an important role in optimizing the budget. Pages that have few internal links get much less attention from search engines than the most linked ones.
Make sure your most important pages receive plenty of internal links. Recently crawled pages also tend to rank better.
A well-maintained site structure with relevant internal linking makes your content easily identifiable by robots without wasting crawl budget.
Webmasters who want to track the indexing process and better understand Googlebot's interactions with their site can resort to using a server log audit tool.
The use of such a tool makes it possible to better understand this notion, to correct errors and to optimize the elements that impact it.
If you want advice on this subject and want to increase your online traffic, our SEO agency is at your disposal.