
Tuesday, February 7, 2023

Understanding and resolving ‘Discovered – currently not indexed’

If you see “Discovered – currently not indexed” in Google Search Console, it means Google is aware of the URL, but hasn’t crawled and indexed it yet. 

It doesn’t necessarily mean the page will never be processed. As Google’s documentation says, it may come back to the URL later without any extra effort on your part.

But other factors could be preventing Google from crawling and indexing the page, including:

  • Server issues and onsite technical issues restricting or preventing Google’s crawl capability.
  • Issues relating to the page itself, such as quality.

You can also use the Google Search Console URL Inspection API to query URLs for their coverageState status (as well as other useful data points) en masse.
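A minimal sketch of what a bulk check might involve, assuming the URL Inspection API's index:inspect endpoint and an OAuth access token obtained elsewhere. The helper names and the sample response are illustrative and not from Google's documentation. Only the request payload and the response parsing are shown, so the network call itself is left out.

```python
import json

# Endpoint of the Search Console URL Inspection API (index:inspect method).
INSPECT_ENDPOINT = "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"


def build_inspection_request(page_url, property_url):
    """Build the JSON body for one inspection call.

    `property_url` is assumed to match a property verified in Search Console.
    """
    return {"inspectionUrl": page_url, "siteUrl": property_url}


def extract_coverage_state(response_body):
    """Pull coverageState out of a parsed response, or None if it is absent."""
    result = response_body.get("inspectionResult", {})
    return result.get("indexStatusResult", {}).get("coverageState")


# Hypothetical response, trimmed to the fields used here.
sample_response = {
    "inspectionResult": {
        "indexStatusResult": {
            "verdict": "NEUTRAL",
            "coverageState": "Discovered - currently not indexed",
        }
    }
}

payload = build_inspection_request(
    "https://www.example.com/new-page/", "https://www.example.com/"
)
print(json.dumps(payload))
print(extract_coverage_state(sample_response))
```

In practice, each payload would be sent as a POST to the endpoint with an Authorization: Bearer header. The API is subject to per-property quotas, so long URL lists need to be spread out over time.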

Request indexing via Google Search Console

This is the obvious first step and, in the majority of cases, it will resolve the issue.

Sometimes, Google is simply slow to crawl new URLs – it happens. But other times, underlying issues are the culprit. 

When you request indexing, one of two things might happen:

  • URL becomes “Crawled – currently not indexed”
  • Temporary indexing

Both are symptoms of underlying issues. 

The second happens because requesting indexing sometimes gives your URL a temporary “freshness boost” which can take the URL above the requisite quality threshold and, in turn, lead to temporary indexing.




Page quality issues

This is where vocabulary can get confusing. I've been asked, "How can Google determine the page quality if it hasn't been crawled yet?"

This is a good question, and the answer is that it can't.

Google is making an assumption about the page's quality based on other pages on the domain. Its classifications are likewise based on URL patterns and website architecture.

As a result, moving these pages from "awareness" to the crawl queue can be de-prioritized based on the lack of quality Google has found on similar pages.

It's possible that pages on similar URL patterns or those located in similar areas of the site architecture have a low-value proposition compared to other pieces of content targeting the same user intents and keywords.

Possible causes include:

  • The main content depth.
  • Presentation. 
  • Level of supporting content.
  • Uniqueness of the content and perspectives offered.
  • More manipulative issues (e.g., the content is low quality and auto-generated, spun, or directly duplicates already established content).

Working on improving the content quality within the site cluster and the specific pages can have a positive impact on reigniting Google's interest in crawling your content with greater purpose.

You can also noindex other pages on the website that you acknowledge aren't of the highest quality to improve the ratio of good-quality pages to bad-quality pages on the site.

Crawl budget and efficiency

Crawl budget is an often misunderstood mechanism in SEO. 

The majority of websites don't need to worry about this. In fact, Google's Gary Illyes has gone on the record claiming that probably 90% of websites don't need to think about crawl budget. It is often regarded as a problem for enterprise websites.

Crawl efficiency, on the other hand, can affect websites of all sizes. If overlooked, it can lead to issues with how Google crawls and processes the website.

To illustrate, if your website: 

  • Duplicates URLs with parameters.
  • Resolves with and without trailing slashes.
  • Is available on HTTP and HTTPS.
  • Serves content from multiple subdomains (e.g., https://website.com and https://www.website.com).

…then you might have duplication issues that affect how Google prioritizes crawling, since it bases those priorities on wider assumptions about the site.

You might be sapping Google's crawl budget with unnecessary URLs and requests. Given that Googlebot crawls websites in portions, this can lead to Google's resources not stretching far enough to discover all newly published URLs as fast as you would like.
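To see how many of these variants exist on a site, you could group a crawl export by a normalized key. The sketch below is one rough way to do that. It assumes that protocol, the www prefix, trailing slashes and all query parameters are interchangeable, which will not hold for every site. The function names and sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url):
    """Collapse protocol, www, query string and trailing-slash variants to one key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Assumption: parameters never change the main content, so they are dropped.
    path = parts.path.rstrip("/") or "/"
    return host + path


def find_duplicate_groups(urls):
    """Group URLs that share a normalized key; keep groups with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


crawled = [
    "http://website.com/shoes",
    "https://www.website.com/shoes/",
    "https://www.website.com/shoes/?sort=price",
    "https://www.website.com/about/",
]
duplicates = find_duplicate_groups(crawled)
for key, members in duplicates.items():
    print(key, len(members))
```

Each group with more than one member is a candidate for a redirect or a canonical tag pointing at the preferred version.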

You want to crawl your website regularly, and ensure that:

  • Pages resolve to a single subdomain (as desired).
  • Pages resolve to a single protocol (HTTP or HTTPS).
  • URLs with parameters are canonicalized to the root (as desired).
  • Internal links don't use redirects unnecessarily.

If your website utilizes parameters, such as ecommerce product filters, you can curb the crawling of these URI paths by disallowing them in the robots.txt file.
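Before deploying rules like this, it can help to check which URLs a pattern would catch. The sketch below imitates the two wildcards Google documents for robots.txt paths (* for any run of characters and a trailing $ for end of URL). It is a simplified matcher that ignores Allow rules and rule precedence, and the color and size parameters are hypothetical.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    `*` matches any run of characters and a trailing `$` anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query, disallow_patterns):
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(p).match(path_and_query) for p in disallow_patterns)


# Hypothetical filter parameters for an ecommerce category page.
disallow = ["/*?*color=", "/*?*size="]

print(is_disallowed("/shoes?color=red", disallow))
print(is_disallowed("/shoes?sort=price&size=9", disallow))
print(is_disallowed("/shoes", disallow))
```

Each pattern in the list corresponds to a line such as Disallow: /*?*color= under the relevant User-agent group in robots.txt. Bear in mind that a disallowed URL can't be crawled, so Google won't see any canonical tag on it.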

Your server can also be important in how Google allocates the budget to crawl your website.

If your server is overloaded and responding too slowly, crawling issues may arise. In this case, Googlebot may not be able to access the page, resulting in some of your content not getting crawled.

Google will try to come back later to crawl the website, but this will no doubt delay the whole process.

Internal linking

When you have a website, it's important to have internal links from one page to another. 

Google usually pays less attention to URLs that don't have any or enough internal links – and may even exclude them from its index.

You can check the number of internal links to pages through crawlers like Screaming Frog and Sitebulb.
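If your crawler can export internal links as source and target pairs, the same count is easy to reproduce yourself. The following is a small sketch under that assumption. The export format, function names and sample data are hypothetical and not tied to either tool.

```python
from collections import Counter


def count_inlinks(edges, all_pages):
    """Count internal links pointing at each page, ignoring self-links."""
    counts = Counter({page: 0 for page in all_pages})
    for source, target in edges:
        if source != target and target in counts:
            counts[target] += 1
    return counts


def weakly_linked(counts, minimum=1):
    """Return pages with fewer than `minimum` inlinks, sorted for stable output."""
    return sorted(page for page, n in counts.items() if n < minimum)


pages = ["/", "/blog/", "/blog/post-a/", "/blog/post-b/"]
edges = [
    ("/", "/blog/"),
    ("/blog/", "/blog/post-a/"),
    ("/blog/post-a/", "/"),
    ("/blog/", "/blog/"),  # self-link, not counted
]

counts = count_inlinks(edges, pages)
print(weakly_linked(counts))
```

Raising the minimum argument widens the list from orphaned pages to pages that are merely thinly linked.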

Having an organized and logical website structure with internal links is the best way to go when it comes to optimizing your website. 

But if you have trouble with this, one way to make sure all of your internal pages are connected is to "hack" into the crawl depth using HTML sitemaps. 

These are designed for users, not machines. Although they may be seen as relics now, they can still be useful.

Additionally, if your website has many URLs, it's wise to split them up among multiple pages. You don't want them all linked from a single page.
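As a rough illustration of that split, the sketch below chunks a list of URL and title pairs into several HTML sitemap pages. The per_page value and the sample URLs are arbitrary placeholders, since no particular limit is being recommended here.

```python
from html import escape


def build_sitemap_pages(links, per_page):
    """Split (url, title) pairs into chunks and render each chunk as an HTML list."""
    pages = []
    for start in range(0, len(links), per_page):
        items = "\n".join(
            f'  <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
            for url, title in links[start:start + per_page]
        )
        pages.append(f"<ul>\n{items}\n</ul>")
    return pages


# Hypothetical URLs and titles.
links = [(f"/guides/page-{n}/", f"Guide {n}") for n in range(1, 6)]
sitemap_pages = build_sitemap_pages(links, per_page=2)

print(len(sitemap_pages))
print(sitemap_pages[0])
```

Each rendered page would then be linked from a sitemap index page so that every URL stays within a few clicks of the homepage.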

Internal links also need to use the <a> tag with an href attribute rather than relying on JavaScript event handlers such as onClick().

If you're utilizing a Jamstack or JavaScript framework, investigate how it or any related libraries handle internal links. These must be presented as <a> tags.
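One way to spot-check this is to parse a page's HTML and separate real <a href> links from elements that only navigate through a click handler. The sketch below uses Python's built-in HTML parser on a hypothetical snippet. It only sees the markup it is given, so for client-rendered frameworks you would need to feed it the rendered HTML.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect crawlable <a href> links and elements that rely on onclick alone."""

    def __init__(self):
        super().__init__()
        self.crawlable = []
        self.js_only = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if tag == "a" and href and not href.startswith("javascript:"):
            self.crawlable.append(href)
        elif "onclick" in attributes:
            # Clickable element with no usable href: record the tag name.
            self.js_only.append(tag)


# Hypothetical markup; goTo() stands in for any client-side navigation helper.
markup = """
<a href="/pricing/">Pricing</a>
<span onclick="goTo('/features/')">Features</span>
<a onclick="goTo('/contact/')">Contact</a>
"""

audit = LinkAudit()
audit.feed(markup)
print(audit.crawlable)
print(audit.js_only)
```

Anything that lands in js_only is a candidate for conversion to a standard link.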

The post Understanding and resolving ‘Discovered – currently not indexed’ appeared first on Search Engine Land.



from Search Engine Land https://ift.tt/FfPb4Lu