Google’s Crawling Algorithm: Discovery & Refresh

Google uses two types of web crawling: one for discovering new content, and one for refreshing content that has already been crawled. For discovering new content, the crawling algorithm follows links from one page to another, collecting URLs it has never seen before.

Googlebot starts from a set of seed URLs and crawls pages by following the links it finds on them. When it encounters a link that doesn’t yet exist in Google’s index, it adds that URL to the list of URLs to be crawled next. For refreshing existing content, Googlebot revisits URLs it already knows about, using signals such as sitemaps to help decide which pages are likely to have changed.
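The discovery process described above can be sketched as a simple breadth-first crawl over a link graph. This is a toy illustration, not Google’s actual implementation; the URLs and the in-memory link graph are made up for the example:

```python
from collections import deque

# Toy in-memory link graph standing in for the web (hypothetical URLs).
LINK_GRAPH = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
    "https://example.com/b": ["https://example.com/a"],
    "https://example.com/c": [],
}

def discovery_crawl(seeds):
    """Breadth-first discovery crawl: follow links from seed URLs,
    queueing any URL not yet seen - a simplified crawl frontier."""
    frontier = deque(seeds)
    discovered = set(seeds)
    crawl_order = []
    while frontier:
        url = frontier.popleft()
        crawl_order.append(url)
        for link in LINK_GRAPH.get(url, []):
            if link not in discovered:  # new URL: add it to the frontier
                discovered.add(link)
                frontier.append(link)
    return crawl_order
```

Starting from the homepage as the only seed, the crawl reaches every page that is connected by links; anything not linked from a crawled page is never found.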

What are the differences between the two types of crawling?

Google has two types of crawling: discovery and refresh. Discovery crawling is when Googlebot discovers new URLs, while refresh crawling is when Googlebot revisits previously crawled URLs. Refresh crawling is an important part of how Google ensures that its index stays fresh.
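The difference can be made concrete with a toy refresh scheduler: refresh crawling picks from URLs that are already known, prioritising pages that have historically changed more often. This is purely illustrative, not Google’s real scheduling logic, and the URLs and change-frequency numbers are invented:

```python
# Toy refresh scheduler: choose which already-known URLs to recrawl,
# favouring pages that change frequently. Illustrative only.

def refresh_queue(known_pages, budget):
    """known_pages: dict of url -> estimated changes per day.
    Returns up to `budget` URLs, most frequently changing first."""
    ranked = sorted(known_pages, key=known_pages.get, reverse=True)
    return ranked[:budget]

known_pages = {
    "https://example.com/news": 5.0,    # changes several times a day
    "https://example.com/blog": 0.5,    # changes every couple of days
    "https://example.com/about": 0.01,  # almost never changes
}
```

With a recrawl budget of two pages, the news and blog pages would be refreshed first, while the rarely changing about page waits.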

But refresh crawling only works on URLs Google already knows about. For new content, the new URLs must be linked to from other pages on your site. Without those internal links, Googlebot can only find them through slower discovery crawling, and they may take a long time to be crawled and indexed, if they are found at all.

Google’s John Mueller made this point in a recent Google+ post, saying “refresh crawling is the process where we revisit URLs that are linked to from other pages on your site.”

This means that if you publish new content on your site but none of your other pages link to it or reference it in any way, that content may not get indexed. As John said, “discovery crawling then becomes more important again,” because these pages can only be found through discovery crawling until another page links to them. It can take months for Googlebot to return and discover new URLs that are not linked from any other page on the site.

The solution is to make sure all of your pages are linked to from other pages on your site. John said, “this would be a fairly typical example of what I’d expect people to do when creating new content – create the content and then link it into their existing system.” He added that this will obviously depend on the type of site being built, but in most cases this should work well.

Unfortunately, even with a good link structure and many internal links, some of the URLs on your site may never get indexed. The larger or more dynamic your site is, the greater this problem becomes. There’s not much you can do about it directly: if Google doesn’t already know about a URL and cannot find any path to it on your site, that page stays invisible to the index.
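One practical check for the orphan-page problem above is to scan your own internal link graph for pages that nothing links to. A minimal sketch, with a made-up site map as the input:

```python
# Toy orphan-page check: given a site's internal link graph, list pages
# that no other page links to. The URLs here are hypothetical.

def find_orphans(link_graph, homepage):
    """Pages with no incoming internal link (other than the homepage)
    are 'orphans': a crawler following links would never reach them."""
    linked_to = {target for links in link_graph.values() for target in links}
    return sorted(url for url in link_graph
                  if url != homepage and url not in linked_to)

site = {
    "https://example.com/": ["https://example.com/blog"],
    "https://example.com/blog": [],
    "https://example.com/old-promo": [],  # nothing links here: an orphan
}
```

Running `find_orphans(site, "https://example.com/")` flags the unlinked promo page, which is exactly the kind of URL that would only ever be reached by discovery crawling, if at all.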

Are you looking for a new way to get your website found on Google?

Nua Search Engine Optimisation is an SEO company that can help you rank higher in search results. We’ve helped countless businesses increase their online visibility and grow their revenue through our proven SEO strategies. Whether it’s organic traffic, lead generation, or social media marketing, we have the expertise to make sure your site is found on Google.

If you want more people visiting your site and buying from you, contact us today! Our team of experts will work with you one-on-one to create a customised plan that fits both your budget and your business goals.

Contact us today for a free consultation about how we can help drive more customers to your website!
