A log file is a file (or a set of files) that a server creates and maintains automatically, and that lists the activities the server has performed. For SEO purposes, the relevant file is the web server access log, which contains the history of page requests made to a website by both humans and search engine crawlers.
Log analysis shows you how Googlebot actually behaves on your site, so that you can put in place the most effective strategies to improve SEO performance and make the work of Google's crawlers easier.
Googlebot's main task when it accesses a site is to crawl a number of pages bounded by the site's crawl budget. After analysis, Google stores the URLs it has crawled in its index.
Server log analysis has evolved into a fundamental part of technical SEO audits. It can provide usable indicators that cannot be identified otherwise.
It helps you improve how bots crawl and index your site, which in turn can help your site rank better in Google results, attract more traffic and increase your sales.
Understanding crawler behavior and correcting errors that can harm SEO performance is a fundamental part of auditing.
Each connection and content request sent to your hosting web server is recorded in a file called an access log. These files usually exist for technical auditing and website troubleshooting, but they can also be extremely valuable for your audits and for optimizing certain SEO factors.
To carry out SEO-oriented analyses, you need the raw access logs of the server on which your domain is hosted, without filtering or modification. The analysis is only reliable on a sufficient volume of data, so depending on your traffic and on Google's crawl frequency, you will need logs covering a longer or shorter period of time.
By analyzing this connection data, you can examine and understand how Google crawls your site. All you have to do is export the data and filter the Googlebot connections (by user agent and IP range).
The data received is stored anonymously and includes information such as the date and time the connection was made, the IP address of the visitor or robot, the URL of the requested content and the user agent of the browser or crawler. A typical entry contains:
- Timestamp (date and time)
- Method (GET/POST)
- Requested URL
- HTTP status code
- User agent
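As an illustration, the fields listed above can be extracted from a raw log line with a regular expression. This is a minimal sketch that assumes the server writes the Apache/Nginx "combined" log format; the sample line, its IP address and the `parse_line` helper are made up for the example, and your server may use a different layout.

```python
import re
from datetime import datetime

# Assumption: Apache/Nginx "combined" log format. Adapt the pattern if your
# server is configured with a custom format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return entry


# Made-up sample line; the IP comes from a range reserved for documentation.
sample = (
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /products/shoes HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
entry = parse_line(sample)
print(entry["method"], entry["url"], entry["status"])  # GET /products/shoes 200
```

Lines that do not match the pattern return `None`, so malformed entries can be counted or skipped rather than silently misread.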
Log file analysis consists of downloading the files from your server and opening them in an analysis tool dedicated to SEO.
You can then filter by user agent and client IP address to see the details for each search engine.
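The user agent alone can be spoofed, which is why the filter combines it with an IP check. The sketch below shows one possible way to do this; the network in `ASSUMED_GOOGLEBOT_NETWORKS` is a placeholder taken from a documentation range, not a real Googlebot range, and should be replaced with the ranges the search engine publishes. The `is_googlebot` helper and the sample hits are hypothetical.

```python
import ipaddress

# Placeholder network from a range reserved for documentation. These are NOT
# real Googlebot addresses; replace them with the officially published ranges.
ASSUMED_GOOGLEBOT_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_googlebot(ip, user_agent, networks=ASSUMED_GOOGLEBOT_NETWORKS):
    """True when the user agent claims Googlebot AND the IP is in a trusted range."""
    if "googlebot" not in user_agent.lower():
        return False
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Made-up (ip, user_agent) pairs, as they might come out of a parsed log.
hits = [
    ("192.0.2.10", GOOGLEBOT_UA),                                # claimed and in range
    ("198.51.100.7", GOOGLEBOT_UA),                              # claimed, but outside the range
    ("192.0.2.55", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),  # ordinary browser
]

googlebot_hits = [hit for hit in hits if is_googlebot(*hit)]
print(len(googlebot_hits))  # 1
```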
Search Console and third-party crawler tools do not paint the full picture of how Googlebot and other search engine bots interact with a website. Only the analysis of your site's access logs shows precisely how crawlers such as Googlebot behave.
Analyzing these files is a specialty that requires advanced technical knowledge and tools that can sometimes be expensive. However, this technical data greatly helps SEO specialists solve important technical problems that generally cannot be identified through other methods.
It provides a considerable amount of useful information, which helps you to:
- Define what needs to be explored first.
- Define what should not be explored.
- Determine the problems encountered during the exploration.
- Find out which parts of the site the search engines crawl most.
- Optimize the use of your daily crawl budget.
- Help fix accessibility errors such as 404s and 500s.
- Identify pages that are not often crawled.
Whichever method you choose to access and understand your log data, analyzing it is key to uncovering important technical issues that affect a website's SEO. Here are the main SEO problems that can be identified and solved by analyzing logs with the right tools.
Your website may contain pages that return different types of error codes. Pages that do not respond, or that return status codes in the 3xx, 4xx or 5xx ranges, must be analyzed as a priority and corrected where necessary.
It is important to restore missing content and to redirect obsolete URLs to the correct ones, so that Googlebot can explore the site and discover the content without encountering errors.
It is recommended to look for URLs with 3xx, 4xx and 5xx status codes, to see any redirects or errors you send to crawlers.
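A simple way to get this overview is to group crawler hits by status code class. The sketch below assumes the log has already been parsed and filtered into `(url, status)` pairs for crawler requests; the sample data and function names are illustrative only.

```python
from collections import Counter


def status_classes(entries):
    """Count hits per status class ("2xx", "3xx", "4xx", "5xx")."""
    counts = Counter()
    for _url, status in entries:
        counts[f"{status // 100}xx"] += 1
    return counts


def urls_to_review(entries):
    """URLs that answered a crawler with a redirect (3xx) or an error (4xx/5xx)."""
    return sorted({url for url, status in entries if status >= 300})


# Made-up crawler hits: (requested URL, HTTP status code).
crawler_hits = [
    ("/", 200),
    ("/old-page", 301),
    ("/missing", 404),
    ("/checkout", 500),
    ("/missing", 404),
]

counts = status_classes(crawler_hits)
print(dict(counts))                  # {'2xx': 1, '3xx': 1, '4xx': 2, '5xx': 1}
print(urls_to_review(crawler_hits))  # ['/checkout', '/missing', '/old-page']
```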
Reducing these issues and optimizing how search engines crawl your site will make your SEO strategy more effective.
The log analysis tool will allow you to identify irrelevant content that is still crawled, as well as duplicate content that can penalize your organic search performance.
By identifying the resources that are not supposed to be indexed, you will be able to take appropriate action from a technical point of view.
Having many low-value URLs crawled and indexed by Google can negatively impact a site's indexing. The resources wasted on these pages reduce crawl activity on the pages that really have value, sometimes considerably delaying the discovery of the content you want to promote.
Low value URLs can fall into these categories:
- Faceted navigation and session IDs.
- Duplicate content.
- 404 errors on the server.
- Hacked pages.
- Poor quality content and spam.
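Some of these categories can be spotted directly in the crawled URLs. As a rough sketch, the function below flags URLs whose query string contains session or facet parameters. The parameter names in `ASSUMED_SESSION_PARAMS` and `ASSUMED_FACET_PARAMS` are hypothetical examples; the real ones depend entirely on how your site builds its URLs.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameter names; replace them with the ones your site uses.
ASSUMED_SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}
ASSUMED_FACET_PARAMS = {"color", "size", "sort"}


def classify_url(url):
    """Label a URL as 'session_id', 'faceted_navigation' or 'other'."""
    query = urlsplit(url).query
    params = {name.lower() for name in parse_qs(query, keep_blank_values=True)}
    if params & ASSUMED_SESSION_PARAMS:
        return "session_id"
    if params & ASSUMED_FACET_PARAMS:
        return "faceted_navigation"
    return "other"


print(classify_url("/shoes?color=red&size=42"))  # faceted_navigation
print(classify_url("/cart?PHPSESSID=abc123"))    # session_id
print(classify_url("/shoes"))                    # other
```

Counting crawler hits per label gives a first estimate of how much crawl activity goes to these low-value URLs.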
It's important that search bots not only get to your site, but also that they crawl the pages that matter most to your conversions. Which pages are they exploring? What is their HTTP status? Does the crawler keep returning to the same pages or visit different ones? Does it find new content quickly?
If your most important pages are not among the first crawled, you can take appropriate action to encourage visits.
Google may be ignoring URLs or crucial parts of your website. The log metrics will reveal which URLs and directories get the most and the least attention.
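To see which directories get the most and least attention, crawler hits can be grouped by their first path segment. This is a minimal sketch that assumes you already have the list of URLs requested by the crawler; the sample URLs and the `top_level_directory` helper are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_level_directory(url):
    """Return the first directory of a URL path, or "/" for pages at the root."""
    path = urlsplit(url).path
    segments = [segment for segment in path.split("/") if segment]
    # "/blog/post-1" and "/blog/" belong to "/blog/"; "/about" sits at the root.
    if len(segments) > 1 or (segments and path.endswith("/")):
        return f"/{segments[0]}/"
    return "/"


# Made-up list of URLs requested by the crawler.
crawled = ["/blog/post-1", "/blog/post-2", "/products/shoes", "/blog/", "/about"]

hits_per_directory = Counter(top_level_directory(url) for url in crawled)
for directory, hits in hits_per_directory.most_common():
    print(directory, hits)
```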
Log analysis shows the date on which the bot last visited each URL. By optimizing your site appropriately, you can influence how often the less frequently visited URLs are crawled.
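The last crawl date per URL can be derived from the timestamps of crawler hits. The sketch below assumes the hits are available as `(url, datetime)` pairs; the 30-day threshold, the dates and the function names are arbitrary choices for the example.

```python
from datetime import datetime


def last_crawl_by_url(hits):
    """Map each URL to the most recent time a crawler requested it."""
    last_seen = {}
    for url, crawled_at in hits:
        if url not in last_seen or crawled_at > last_seen[url]:
            last_seen[url] = crawled_at
    return last_seen


def stale_urls(last_seen, now, max_age_days):
    """URLs whose last crawl is older than max_age_days."""
    return sorted(
        url for url, seen in last_seen.items() if (now - seen).days > max_age_days
    )


# Made-up crawler hits: (requested URL, time of the request).
hits = [
    ("/", datetime(2023, 10, 10, 8, 0)),
    ("/old-guide", datetime(2023, 8, 1, 9, 30)),
    ("/", datetime(2023, 10, 12, 7, 15)),
]

last_seen = last_crawl_by_url(hits)
print(last_seen["/"])                                         # 2023-10-12 07:15:00
print(stale_urls(last_seen, datetime(2023, 10, 15), 30))      # ['/old-guide']
```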
Google assigns a crawl budget to each site based on many factors. If your allowance is x pages per day, you want the x pages crawled by Google to be the most relevant and useful ones.
If your crawl limit is used up on non-priority content, it will take Google longer to find the content you want crawled more often.
Google doesn't want to waste time and resources crawling low-quality websites.
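One way to put a number on this is to measure what share of crawler requests lands on the pages you consider a priority. This is only a sketch: the list of priority URLs is something you define yourself, and the sample data below is invented.

```python
def priority_crawl_share(crawled_urls, priority_urls):
    """Fraction of crawler requests that hit a priority URL (0.0 to 1.0)."""
    if not crawled_urls:
        return 0.0
    priority_hits = sum(1 for url in crawled_urls if url in priority_urls)
    return priority_hits / len(crawled_urls)


# Made-up data: one day of crawler requests and a hand-picked priority set.
crawled_today = ["/products/shoes", "/search?q=a", "/search?q=b", "/products/boots"]
priority = {"/products/shoes", "/products/boots"}

share = priority_crawl_share(crawled_today, priority)
print(f"{share:.0%} of crawler hits reached priority pages")  # 50% of crawler hits reached priority pages
```

A low share suggests that a large part of the crawl goes to non-priority URLs.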
Temporary 302 redirects may not pass popularity from the old URL to the new one. When the move is permanent, they should generally be changed to 301 redirects. Chains of redirects, from content whose URLs have changed several times in a row, may no longer be followed after a certain number of hops.
Such chains also waste crawl budget. Log analysis therefore lets you verify that your permanent redirects are organized correctly.
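Access logs usually record the status code but not the redirect target, so checking chains requires a mapping of source URL to target URL, for example exported from your server configuration or from a crawl. The sketch below assumes such a mapping exists; the `redirects` dictionary, the `max_hops` limit and the function name are hypothetical.

```python
def redirect_chain(start_url, redirects, max_hops=10):
    """Follow redirects from start_url and return the list of URLs visited.

    Stops at the final destination, after max_hops hops, or when a loop is found.
    """
    chain = [start_url]
    seen = {start_url}
    while chain[-1] in redirects and len(chain) <= max_hops:
        next_url = redirects[chain[-1]]
        chain.append(next_url)
        if next_url in seen:
            break  # redirect loop: stop instead of cycling forever
        seen.add(next_url)
    return chain


# Made-up redirect map: source URL -> target URL.
redirects = {"/a": "/b", "/b": "/c", "/c": "/final"}

chain = redirect_chain("/a", redirects)
print(chain)           # ['/a', '/b', '/c', '/final']
print(len(chain) - 1)  # 3
```

Any chain longer than one hop is a candidate for replacing with a single redirect straight to the final URL.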
The internal links between your pages are there first of all to help visitors navigate the various sections of the site, and to create continuity from subject to subject or from product to product.
Internal linking is also decisive in allowing Googlebot to discover all the pages of a site and in increasing how often they are crawled.
By analyzing the path the robots take to explore your site, you can steer crawling towards certain pages or sections, favoring the content you consider most important or the content that crawlers neglect too much.
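To find the neglected pages, you can compare the URLs you know exist (from a sitemap or a site crawl, for instance) with the URLs that appear in crawler hits. The sketch below assumes both lists are already available; the sample URLs are invented.

```python
def uncrawled_urls(known_urls, crawled_urls):
    """Known URLs that never appear in the crawler's requests."""
    return sorted(set(known_urls) - set(crawled_urls))


# Made-up data: URLs the site is known to contain, and URLs seen in crawler hits.
known = ["/", "/products/shoes", "/guides/sizing", "/about"]
crawled = ["/", "/products/shoes", "/"]

neglected = uncrawled_urls(known, crawled)
print(neglected)  # ['/about', '/guides/sizing']
```

Pages in this list are good candidates for additional internal links from sections that are crawled frequently.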
Site migrations and SEO redesigns are prone to errors (change of domain name, move to HTTPS, new design, etc.). It is important to carry out a complete audit in order to identify broken links, 404 errors and any other malfunction likely to affect organic search performance.