Crawl Budget: What Does It Mean and How to Optimize It?


Most people have heard of Googlebot, crawling and indexing.

Still, crawling is a topic that is rarely discussed and usually taken for granted. Nevertheless, it plays a real role in SEO and in increasing traffic.

In order for content to be crawled, indexed and presented to users in Google, the search engine needs to find it on the web.

This task is done by Googlebot (otherwise known as robots, crawlers or spiders).

Given that billions of sites publish new content on their blogs every day, it is imperative that their URLs are crawled constantly.

This way, each post will be indexed almost instantly and will be able to gain traffic.

If Google’s crawlers don’t do their job well, a website will lose potential traffic from the day an article is published until the day it is crawled and shown in the SERPs.

Over time this is something that can damage your ranking, traffic and business as a whole.

Based on this it is easy to tell why it’s best to optimize your crawl budget.

Yes, but how? And what does crawl budget mean in the first place?

Let’s answer these and more questions now.

What is crawl budget?

Crawl budget refers to the number of times Googlebot visits your pages on a daily basis.

Obviously, if you publish posts often (for example, if you run a news site), it is much better for Google to crawl your pages more frequently.

If you have a big website, it’s essential to keep an eye on crawl budget optimization, as many of your pages might otherwise never be crawled, indexed and ranked in Google.

As of 2017, the Crawl Stats report in Google Search Console (formerly Google Webmaster Tools) is a good way to check how much crawling your site currently gets.
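If you prefer a tool-independent view, your server access logs tell the same story. Below is a minimal sketch (not an official tool) that counts Googlebot requests per day; the “access.log” path and the combined log format are assumptions about your server setup.

    from collections import Counter
    from datetime import datetime

    hits_per_day = Counter()

    with open("access.log") as log:                 # path is an assumption; point it at your own log
        for line in log:
            if "Googlebot" not in line or "[" not in line:
                continue                            # keep only Googlebot requests
            # Combined-format timestamps look like [10/Oct/2017:13:55:36 +0000]
            raw_date = line.split("[", 1)[1].split(":", 1)[0]
            day = datetime.strptime(raw_date, "%d/%b/%Y").date()
            hits_per_day[day] += 1

    for day in sorted(hits_per_day):
        print(day, hits_per_day[day])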

What factors influence crawl budget?

Google’s Gary Illyes wrote a great post in which he explained the major factors that affect your crawl budget.

He said that having many “low-value-add URLs” can negatively affect your site’s crawl budget.

OK, but what are low-value-add URLs and how can we find them?

Here’s a short list to help you classify them:

  • Faceted navigation and session identifiers (all kinds of filters, especially if you’re an e-commerce site)
  • Duplicate, low quality and spam content
  • Soft error pages (pages that display an error message but return a 200 status code instead of a 404)
  • Hacked pages
  • Infinite spaces (huge numbers of links that provide little or no new content for Googlebot to index)
  • Proxies

Crawl rate limit and demand

Before we get to the optimization tips, I need to say a few words about crawl rate limit and crawl demand.

These factors have a direct impact on your budget.

What is crawl limit?

Crawl rate limit is based on the responsiveness of your website and on the limit you set in Google Search Console (GSC).

Basically,

If your site responds quickly to crawlers, the crawl rate will increase, and vice versa.

Fast, responsive websites will get a higher crawl rate, while server errors can diminish it.

If you wish to learn more about errors, you can access the Crawl Errors report in GSC, which shows you:

  • Site errors
  • URL errors

The second factor is the crawl rate limit you can set yourself in Google Search Console. As you can tell, this setting can be used to increase or reduce the number of crawls.

However, raising it will not help if your site has technical issues that slow crawling down.

What is crawl demand?

Crawl demand deals with your actual need for crawling.

In other words,

Websites with more pages that publish content frequently will get crawled more often.

According to Google, there are two main factors to be considered:

  • Popularity
  • Staleness

In terms of popularity, pages that have more traffic tend to get crawled more often.

On the other hand, Google also tries to keep its results as fresh as possible.

Based on this, it is easy to tell that the company is trying to help users by keeping popular pages fresh and up to date, and by preventing indexed pages from going stale.

Summary

As you can see, in order to achieve higher crawl budget you need to avoid any technical errors or anything else that will prevent crawlers from reaching your URLs.

At the same time, your content needs to be popular.

The authority of a website, as well as its posting frequency, also plays a significant role.

How to optimize your crawl budget?

Given that crawl budget optimization is an important part of your SEO and online marketing, you need to take extra care of it.

Treat it like regular optimization, where you do everything in your power to increase rankings and thus traffic. It mainly comes down to fixing the structure of your website and removing any issues you might have.

  1. Decide which pages you wish to be crawled

This process should be done regardless of your budget.

You will have to create a robots.txt file where you will add all the pages that you do not wish to be crawled.

This data will be very helpful for crawlers, as it will limit their access and force them to focus on the content that is actually important.

Every time a crawler visits a page, it consumes a unit of your site’s crawl budget.

By preventing access to these pages (which shouldn’t be crawled in the first place), you save budget for more important pages. This way you’ll also increase the speed at which Google indexes your new pages.
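As a rough illustration, a robots.txt that keeps crawlers out of low-value sections might look like the following; the paths and parameter names are hypothetical and should be replaced with the ones that actually waste budget on your site.

    User-agent: *
    Disallow: /cart/
    Disallow: /internal-search/
    Disallow: /*?sessionid=
    Disallow: /*?sort=

    Sitemap: https://example.com/sitemap.xml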

There are two more ways to save crawl budget:

  • by adding a noindex robots meta tag
  • by using the rel="canonical" tag

Make sure to use the noindex tag, the canonical tag and the robots.txt file appropriately; both tags are shown below.
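For reference, both go into the <head> of the page; the URL here is a hypothetical example.

    <!-- Keep a page out of the index -->
    <meta name="robots" content="noindex">

    <!-- On a duplicate or variant page, point to the preferred version -->
    <link rel="canonical" href="https://example.com/preferred-page/">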

  2. Broken links and errors

All the categories and all the pages on your site are seen as resources.

As previously mentioned, not all of them should be indexed. This is especially true for pages which are not meant for human visitors (like your sitemap, which I will mention in a bit).

The same goes for every link. If there are a lot of broken links on your pages, crawlers will consume crawl budget for nothing. By fixing these broken links you will rectify the issue.

Various errors can also be problematic, as they confuse both visitors and bots.
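Here is a quick sketch of how you could spot such links yourself. The URL list is hypothetical; in practice you would feed in your sitemap or a crawl export. It uses the third-party requests library.

    import requests

    urls = [
        "https://example.com/",              # hypothetical URLs; load your own list here
        "https://example.com/blog/",
        "https://example.com/old-page/",
    ]

    for url in urls:
        try:
            response = requests.head(url, allow_redirects=True, timeout=10)
            if response.status_code >= 400:
                print(f"Broken: {url} -> {response.status_code}")
            elif response.history:
                print(f"Redirect chain: {url} -> {response.url}")
        except requests.RequestException as error:
            print(f"Failed: {url} ({error})")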

  3. Fix your sitemap

Sitemaps are very important for robots.

The general rule is that every web page should be within three clicks of your home page. This is very important for user experience.

A similar rule applies to robots.

Your task during this step is to improve your sitemap by making it more accessible. Make sure to remove any pages that fall into any of these categories:

  • not important
  • blocked
  • with excessive redirects

In this case, redirects can be really expensive.

The more you have, the more budget will be spent on them, so make sure to simplify things.
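A trimmed sitemap following the sitemaps.org protocol looks like this; the URLs and dates are placeholders, and only pages you actually want crawled and indexed belong in it.

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/blog/crawl-budget/</loc>
        <lastmod>2017-06-15</lastmod>
      </url>
      <url>
        <loc>https://example.com/blog/</loc>
        <lastmod>2017-06-10</lastmod>
      </url>
    </urlset>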

  4. Duplicate and thin content

Needless to say, duplicate content is one of the biggest issues for any website.

If you’re into SEO you know how harmful this type of content can be.

It diminishes your site’s reputation in the eyes of Google, as you’re not seen as a useful resource. At the same time, it can cause issues with the crawling and indexing process.

You see, each page on your website has a unique URL address. This also goes for duplicate content.

Although these pages are absolutely (or almost) the same, Google will still employ crawlers to go through all of them.

This means you will spend twice as much (or more) of your crawl budget for nothing.
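One simple (and admittedly crude) way to spot exact duplicates is to hash the HTML of your URLs and group the matches. This is just an illustrative sketch with made-up URLs, not a substitute for a proper crawl tool.

    import hashlib
    from collections import defaultdict

    import requests

    urls = [
        "https://example.com/shoes/",
        "https://example.com/shoes/?sort=price",   # typical parameterized duplicate
        "https://example.com/contact/",
    ]

    pages_by_hash = defaultdict(list)
    for url in urls:
        html = requests.get(url, timeout=10).text
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        pages_by_hash[digest].append(url)

    for group in pages_by_hash.values():
        if len(group) > 1:
            print("Likely duplicates:", ", ".join(group))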

  5. Consider Accelerated Mobile Pages (AMP)

Accelerated Mobile Pages is something that every site should consider for better mobile performance.

Let’s say a few words about the AMP project first.

Basically, AMP pages are lighter versions of your pages that are served to users searching on mobile devices.

While this is a great way to become mobile-friendly and focus your pages on mobile devices, it can cause some big issues.

Each AMP page is a separate document with its own URL. In that regard, just by adding AMP versions of your pages, you will double the number of your crawlable URLs.

There is no way to work around it; it is simply something you need to deal with if you want AMP versions of your pages.

The best thing you can do in this case is to make sure you have enough crawl budget before deciding to implement AMP. This way, the transition will be painless.
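The standard way to connect the two versions is a pair of link tags (the URLs below are placeholders): the regular page advertises its AMP counterpart, and the AMP page points back to the canonical version so the duplicate doesn’t compete with it.

    <!-- On the regular (canonical) page -->
    <link rel="amphtml" href="https://example.com/post/amp/">

    <!-- On the AMP page -->
    <link rel="canonical" href="https://example.com/post/">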

  6. Use RSS feeds for freshness

Feeds such as RSS are visited frequently by Google’s bots. Although this may seem like a problem at first glance, it actually helps you.

Through feeds, users get updates regarding their favorite site.

Crawlers use feeds in a similar way, staying informed about new content posted on your site.
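To make the feed easy to find, most sites advertise it with an auto-discovery tag in the page <head>; the title and URL here are placeholders.

    <link rel="alternate" type="application/rss+xml"
          title="Example blog feed" href="https://example.com/feed/">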

Google’s FeedBurner is a great tool that can help you out here. Whenever you post something new, make sure to submit it through this tool.

Another great way to make sure the search engine finds and indexes your new content is Google Search Console. Just use the “Fetch as Google” feature and your content can be included in the index within minutes.

To sum it up

The best way to manage your crawl budget is to keep a clean, error-free and functional website.

Even if you have lots of content and post on a daily basis, if you’re meticulous enough you can prevent most of these issues from ever happening.

Website architecture is also very important. Convoluted or out of date sitemaps can be disastrous for crawlers (the same way they are for users) so make sure to fix them up.

When was the last time you checked your crawl budget? Are you satisfied with the speed at which Google indexes your new pages? Share your views in the comment section below and follow me on social media!
