A systematic, opinionated checklist to find and fix the exact reason a page stays out of Google’s index. No fluff. Step-by-step diagnosis from robots.txt to content quality signals.
When a URL is not indexed, most people jump to the loudest suspect: thin content, duplicate content, or a penalty. Those guesses waste weeks. In practice, when you run a structured diagnostic, over 60% of indexation failures come from three silent killers: a robots.txt disallow, a noindex meta tag left by a plugin, or a sitemap that never included the URL. The rest is noise.
This checklist focuses on the 20% of causes that generate 80% of failures. You will test each layer in order. No skipping.
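1. Open the URL in a browser. If it returns a 404 or 5xx, fix the server error or redirect the URL. Append a cache-busting parameter such as ?nocache=1 so you are not looking at a cached copy.
2. Check robots.txt for a disallow rule that matches the URL, and check the HTML head (and the X-Robots-Tag response header) for noindex. Confirm with the URL Inspection tool in Google Search Console.
3. Confirm the URL is listed in the sitemap, its lastmod date is accurate, and the sitemap URL matches the page's canonical URL.
4. Count internal links to the URL. If there are none, add at least 2-3 from contextual pages.
5. Check the server logs for the last Googlebot crawl date. If crawl frequency is low, reduce the number of low-value pages to free up crawl budget.
6. Check content length, uniqueness, and E-E-A-T signals. If the page is thin or duplicates another page, consolidate or remove it.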
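The server-log review can be scripted. The sketch below is a minimal example, assuming access logs in the Apache/Nginx combined format; the function name and the sample path are hypothetical. It reports the most recent request for a path whose user agent contains "Googlebot". User agents can be spoofed, so a stricter check would also verify the requesting IP with a reverse DNS lookup.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Combined log format:
# ip ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def last_googlebot_hit(lines: Iterable[str], target_path: str) -> Optional[datetime]:
    """Return the most recent time a Googlebot user agent requested target_path."""
    latest: Optional[datetime] = None
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or "Googlebot" not in match.group("agent"):
            continue
        # Compare paths without the query string, so /post?ref=x counts as /post.
        if match.group("path").split("?", 1)[0] != target_path:
            continue
        # Example timestamp: 10/Oct/2024:13:55:36 +0000 (%b assumes an English/C locale).
        seen = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        if latest is None or seen > latest:
            latest = seen
    return latest
```

Because iterating over an open file yields lines, you can pass a log file directly, e.g. `last_googlebot_hit(open("access.log"), "/blog/post")`. A result of None means Googlebot never requested that path in the log window you supplied.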
| Blocker | Diagnostic Method | Immediate Fix | Hidden Failure Mode |
|---|---|---|---|
| Robots.txt disallow (directive blocks crawling) | Run a robots.txt checker. Look for `Disallow: /your-path/`. | Remove or update the disallow rule, then wait for a recrawl. | Wildcard rules like `Disallow: /*?*` block every URL that contains a query string, with no warning. |
| Noindex meta tag (often set by SEO plugins) | View the page source and search for a robots meta tag containing noindex. Also check the X-Robots-Tag HTTP header. | Remove the tag via plugin or theme settings, then resubmit the URL. | Plugins like Yoast or Rank Math sometimes apply noindex sitewide to categories or pagination. |
| URL not in sitemap (or excluded by a lastmod policy) | Compare the sitemap contents to your known URLs. Flag lastmod dates older than 30 days. | Add the URL to the sitemap, set lastmod to the date the content last changed, and resubmit the sitemap in Search Console. | Some CMSs automatically drop posts older than X days from the sitemap. Check your XML generation settings. |
| Orphan page (zero internal links, no discovery path) | Use Sitebulb or Screaming Frog to find pages with an inlink count of 0. | Add 2-3 contextual internal links from high-authority pages. | A homepage link is gold; footer links are weak. One link from a category page beats five from a privacy policy. |
| Thin or duplicate content (under 300 words or identical to another page) | Use Copyscape or a site: search for duplicate fragments. Count words. | Expand to 800+ words with unique research. Add schema markup. | Google may treat thin pages as soft 404s. Check the Page indexing report in Search Console for 'Soft 404' or 'Crawled - currently not indexed'. |
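The word-count and duplicate checks in the last row can be roughed out locally before reaching for Copyscape. This is a sketch under stated assumptions: the 300-word floor comes from the table above, while the 5-word shingle size and the 0.8 similarity cutoff are arbitrary choices, and the function names are hypothetical. Word-shingle Jaccard similarity is a generic near-duplicate technique, not how Google measures duplication.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(tokens: list[str], k: int = 5) -> set[tuple[str, ...]]:
    """All overlapping k-word sequences in the token list."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two pages have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def content_flags(page: str, others: dict[str, str], min_words: int = 300,
                  dup_threshold: float = 0.8, k: int = 5) -> dict:
    """Flag a page as thin and list other URLs whose text nearly duplicates it."""
    tokens = words(page)
    page_shingles = shingles(tokens, k)
    near_duplicates = {}
    for url, text in others.items():
        score = jaccard(page_shingles, shingles(words(text), k))
        if score >= dup_threshold:  # compare before rounding for display
            near_duplicates[url] = round(score, 2)
    return {
        "word_count": len(tokens),
        "thin": len(tokens) < min_words,
        "near_duplicates": near_duplicates,
    }
```

Pass the page text plus a dict mapping other URLs to their text; an empty `near_duplicates` dict means no page crossed the cutoff. Pages shorter than the shingle size never register as duplicates in this sketch.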
1. Run the URL through the URL Inspection tool in Google Search Console. Read the page indexing section (labeled 'Coverage' in older versions of the tool) line by line.
2. Check robots.txt against the official <a href='https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt'>Google robots.txt documentation</a>. Look for accidental disallows on your path.
3. Remove any noindex meta tag. If you use Yoast, go to the 'Advanced' tab in the post editor and confirm 'Index' is selected.
4. Verify the URL appears in your sitemap. If it does not, regenerate the sitemap and resubmit it in the Sitemaps report in Search Console.
5. Ensure at least one high-authority page on your domain links to the target URL. A homepage link is best.
6. Review <a href='https://googlecrawlw.vercel.app/google-crawl-errors'>Google crawl errors</a> for server errors (5xx) or soft 404s that block indexation.
7. If you manage bulk operations, use <a href='https://pythongoogleindexingu.vercel.app/python-google-indexing-api-setup'>Python Google Indexing API setup</a> to automate submission for job postings or livestream event pages, the only page types Google officially supports for the Indexing API.
A client had 47 blog posts with status 'Discovered - currently not indexed' in GSC. The diagnostic sequence:
The root cause was not content quality. It was orphan status plus a stale lastmod, a situation we often see when a CMS auto-generates sitemaps without updating lastmod on content edits.
Not every indexation problem follows the standard path. Here are three real edge cases we have debugged:
1. Blocked by robots.txt from a staging site. A developer copied robots.txt from staging to production and forgot to remove the Disallow: / directive, so the entire site was blocked for 6 weeks. The fix: delete the staging robots.txt and confirm the paths are allowed in the robots.txt report in Google Search Console.
2. Canonical tag pointing to a different domain. A client's canonical tags pointed to an old blog subdomain, a leftover from a www vs non-www conflict. Google followed the canonical and never indexed the actual URL. The fix: remove the cross-domain canonical, or correct the URL pattern so each page's canonical points to itself.
3. JavaScript rendering timeout. A page with heavy client-side JS did not render its main content in time for Googlebot's renderer, and the page ended up as 'Crawled - currently not indexed'. Solution: server-side render critical content. Use dynamic rendering as a stopgap if you cannot switch to SSR.
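Edge case 2 can be caught early with a quick check of each page's canonical tag. A minimal standard-library sketch with hypothetical function names: it reads the first `<link rel="canonical">`, resolves relative hrefs against the page URL, and reports the canonical if its scheme, host, path, or query differs from the page. The comparison is deliberately strict, so a trailing-slash difference counts as a mismatch.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def _comparable(url: str) -> tuple:
    # Scheme and host are case-insensitive; path and query are compared as-is.
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def canonical_mismatch(page_url: str, html: str) -> Optional[str]:
    """Return the resolved canonical URL if it points away from page_url, else None."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.canonical is None:
        return None  # no canonical declared, so nothing to compare
    target = urljoin(page_url, finder.canonical)  # resolves relative hrefs
    return None if _comparable(target) == _comparable(page_url) else target
```

Run it over the pages you expect to be indexed; any non-None result is a canonical that sends Google elsewhere.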
If you need a full diagnostic workflow, see the pages not indexed diagnostic guide for additional edge case coverage.
The most common reason a newly published post is not indexed is that no other indexed page links to it. Google discovers content through links, so if nothing on your site links to the new post, Google may never find it. Add at least one link from the homepage, a category page, or a related article immediately after publishing.
Open yourdomain.com/robots.txt in a browser. Look for a line like 'Disallow: /your-folder/'. If the URL path matches a disallowed pattern, Googlebot cannot crawl it. Validate with the robots.txt report in Google Search Console (which replaced the older robots.txt Tester) and the URL Inspection tool. For full syntax, refer to the official robots.txt documentation.
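A robots.txt rule can also be tested offline with Python's built-in `urllib.robotparser`. This sketch assumes you already have the file's text (for a live site, `RobotFileParser` also offers `set_url()` and `read()`); the helper name is hypothetical. Be aware that the standard-library parser matches rules as plain path prefixes in file order. It does not implement Google's `*` and `$` wildcards or Google's most-specific-rule precedence, so treat it as a first pass and confirm anything borderline in Search Console.

```python
from urllib.robotparser import RobotFileParser


def blocked_for_googlebot(robots_txt: str, url: str) -> bool:
    """True if robots_txt disallows url for the Googlebot user agent.

    Caveat: urllib.robotparser uses plain prefix matching in file order,
    not Google's wildcard matching or longest-match precedence.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return not parser.can_fetch("Googlebot", url)
```

A staging file with `Disallow: /` blocks every URL, while a Googlebot-specific group overrides the `*` group for Googlebot, which is why the user agent is passed explicitly.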
Yes. A noindex directive tells Google to drop the page from its index the next time the page is crawled. If a plugin or theme mistakenly adds noindex to all posts (e.g., during a site migration), the pages stay out of the index until the tag is removed and the URLs are recrawled. Always check the page source for <meta name='robots' content='noindex'>, and check the HTTP response headers for X-Robots-Tag: noindex.
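To check both places a noindex can hide, the page source and the X-Robots-Tag response header, a small standard-library parser is enough. A minimal sketch with hypothetical function names. It looks only for the literal `noindex` token and ignores the `none` shorthand (equivalent to noindex, nofollow), which you may want to add.

```python
import re
from html.parser import HTMLParser
from typing import Iterable, Tuple

NOINDEX = re.compile(r"\bnoindex\b", re.IGNORECASE)


class RobotsMetaFinder(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.contents: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() in ("robots", "googlebot"):
            self.contents.append(attributes.get("content") or "")


def noindex_sources(html: str, headers: Iterable[Tuple[str, str]]) -> list[str]:
    """Return where a noindex directive was found: 'meta', 'x-robots-tag', or both."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    found = []
    if any(NOINDEX.search(content) for content in finder.contents):
        found.append("meta")
    # Headers are (name, value) pairs because X-Robots-Tag may repeat.
    # A value such as "googlebot: noindex" also matches.
    if any(name.lower() == "x-robots-tag" and NOINDEX.search(value) for name, value in headers):
        found.append("x-robots-tag")
    return found
```

Headers are passed as (name, value) pairs, which is what `getheaders()` returns on a `urllib.request.urlopen()` response. An empty list means neither location carries noindex.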
A URL can sit in the sitemap and still stay unindexed for three common reasons: (1) the URL returns a 4xx or 5xx status; (2) the lastmod date is so old that Google deprioritizes it; (3) the sitemap lists a URL whose canonical tag points to a different URL. Use a sitemap validator and compare each sitemap URL with the page's actual HTTP status and canonical tag.
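Two sitemap checks are easy to script: whether the URL is listed at all, and how old its lastmod is. Status codes and canonical tags need a live fetch and are left out here. A minimal sketch, assuming a single `<urlset>` sitemap (not a sitemap index) in the standard sitemaps.org namespace; the 30-day threshold mirrors the table above, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_entry(sitemap_xml: str, url: str) -> Optional[dict]:
    """Return {'loc', 'lastmod'} for url in a <urlset> sitemap, or None if absent."""
    root = ET.fromstring(sitemap_xml)
    for node in root.findall("sm:url", NS):
        loc = node.findtext("sm:loc", default="", namespaces=NS).strip()
        if loc == url:
            lastmod = node.findtext("sm:lastmod", default=None, namespaces=NS)
            return {"loc": loc, "lastmod": lastmod.strip() if lastmod else None}
    return None


def lastmod_age_days(lastmod: str, now: datetime) -> int:
    """Days between a W3C-format lastmod value and now (a timezone-aware datetime)."""
    stamp = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)  # date-only values: assume UTC
    return (now - stamp).days


def audit_sitemap(sitemap_xml: str, url: str, now: datetime, max_age_days: int = 30) -> list[str]:
    """List sitemap problems for one URL; an empty list means none were found."""
    entry = find_entry(sitemap_xml, url)
    if entry is None:
        return ["not in sitemap"]
    if entry["lastmod"] is None:
        return ["no lastmod"]
    if lastmod_age_days(entry["lastmod"], now) > max_age_days:
        return [f"lastmod older than {max_age_days} days"]
    return []
```

An old lastmod is only a problem if the content really changed since then; the fix is an accurate date, not simply today's date.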
Google found the URL through a sitemap or link but has not crawled or indexed it yet. Common causes: the page has thin or duplicate content, low authority, or Google is crawling higher-priority pages first. Fix by improving content quality, adding internal links, and reducing the total number of crawlable pages on your site to free up crawl budget.
Not guaranteed. The Indexing API asks Google to crawl a URL promptly, and for URLs with strong quality signals indexing can follow within 24-72 hours, but Google still evaluates the page for value, uniqueness, and server response. Google officially supports the API only for pages with JobPosting or livestream BroadcastEvent structured data, so it fits job postings and live event pages rather than general content. See the Python Google Indexing API setup guide for implementation.
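For orientation, this is roughly what a single Indexing API notification looks like on the wire. It is a minimal standard-library sketch, not the code from the setup guide: it assumes you already hold an OAuth 2.0 access token with the `https://www.googleapis.com/auth/indexing` scope (typically from a service account; obtaining it is not shown), and `build_notification` and `publish` are hypothetical helper names. The endpoint and the `URL_UPDATED`/`URL_DELETED` types are the ones documented for the Indexing API.

```python
import json
import urllib.request

PUBLISH_URL = "https://indexing.googleapis.com/v3/urlNotifications:publish"


def build_notification(url: str, deleted: bool = False) -> bytes:
    """JSON body for one Indexing API notification."""
    body = {"url": url, "type": "URL_DELETED" if deleted else "URL_UPDATED"}
    return json.dumps(body).encode("utf-8")


def publish(url: str, access_token: str, deleted: bool = False) -> dict:
    """POST one notification and return the parsed JSON response.

    access_token is assumed to be a valid OAuth 2.0 bearer token with the
    indexing scope; this sketch does not obtain or refresh it.
    """
    request = urllib.request.Request(
        PUBLISH_URL,
        data=build_notification(url, deleted),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Each call notifies Google about one URL; a successful response only confirms the notification was accepted, not that the page was indexed.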
'Crawled - currently not indexed' means Googlebot downloaded the page but decided not to store it in the index, often because of content quality or duplication. 'Discovered - currently not indexed' means Google knows the URL exists (via sitemap or link) but has not attempted to crawl it yet, usually because of crawl budget limits. The fix differs: for crawled, improve the content; for discovered, add internal links.
Yes. If a page takes longer than 3-5 seconds to respond, Googlebot may time out and leave the page uncrawled. Check average response time in the Crawl Stats report in Search Console, and run the URL Inspection tool's Live Test to confirm the page fetches and renders. Aim for responses under 1.5 seconds, and use a CDN and server-side caching to reduce TTFB. Slow servers waste crawl budget and cause partial rendering.
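A rough response-time check can be run from your own machine. This sketch assumes Python's `urllib` is an acceptable stand-in for Googlebot's fetcher (it is not identical), and the helper names and the `index-check/0.1` user agent are hypothetical. It also covers the ?nocache=1 tip from the checklist: the cache-busting value is a timestamp by default, so repeated runs are not served the same cached copy.

```python
import time
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url: str, value: Optional[str] = None, param: str = "nocache") -> str:
    """Add a cache-busting query parameter (a timestamp unless value is given)."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, value if value is not None else str(int(time.time()))))
    return urlunsplit(parts._replace(query=urlencode(query)))


def timed_fetch(url: str, timeout: float = 10.0) -> tuple[int, float]:
    """Return (final HTTP status, seconds until the response headers arrived).

    The time covers connection setup, any redirects (urlopen follows them),
    and waiting for the headers, roughly what browsers report as TTFB.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "index-check/0.1"})
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, time.perf_counter() - start
    except urllib.error.HTTPError as error:  # 4xx/5xx responses still carry a status
        return error.code, time.perf_counter() - start
```

For example, `timed_fetch(with_cache_buster("https://example.com/blog/post"))` returns the status and elapsed seconds. Network failures such as DNS errors raise `URLError`, which this sketch does not catch.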
Aim for at least 2-3 contextual internal links from pages that are already indexed. One link from a high-authority page (e.g., the homepage) is more effective than five links from low-authority pages. Avoid relying on footer links; they are weak signals. Place links in the main content area with relevant anchor text. More than five links from the same source page dilutes their value.
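Contextual internal links can be spot-checked from saved HTML. A minimal sketch with hypothetical function names: it counts `<a href>` links to a target URL across a dict mapping page URLs to their HTML, resolves relative links, and skips links inside `<nav>` and `<footer>` to reflect the advice above about weak footer links. It does not measure page authority, and it treats URLs as equal when host and path match, ignoring scheme, query, fragment, and a trailing slash.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class InlinkCounter(HTMLParser):
    """Counts <a href> links to one target URL, skipping <nav> and <footer> areas."""

    SKIP = {"nav", "footer"}

    def __init__(self, page_url: str, target_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.target = self._key(target_url)
        self.skip_depth = 0  # > 0 while inside a <nav> or <footer>
        self.count = 0

    @staticmethod
    def _key(url: str) -> tuple[str, str]:
        # Host plus path without a trailing slash; scheme, query and fragment are ignored.
        parts = urlsplit(url)
        return (parts.netloc.lower(), parts.path.rstrip("/") or "/")

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "a" and self.skip_depth == 0:
            href = dict(attrs).get("href")
            if href and self._key(urljoin(self.page_url, href)) == self.target:
                self.count += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1


def contextual_inlinks(pages: dict[str, str], target_url: str) -> dict[str, int]:
    """Map each source page URL to its number of contextual links to target_url."""
    counts = {}
    for page_url, html in pages.items():
        counter = InlinkCounter(page_url, target_url)
        counter.feed(html)
        counter.close()
        if counter.count:
            counts[page_url] = counter.count
    return counts
```

An empty result means the target is an orphan as far as body-content links go, even if it still appears in menus or footers.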
Verify the site in Google Search Console. Check for a manual action under Security & Manual Actions. Then check robots.txt (no global disallow), server headers (200 OK), and whether the site is blocked by a noindex tag in the <head>. If everything seems fine but pages remain unindexed, use the pages not indexed diagnostic guide for a complete step-by-step workflow.