Stop checking URLs one by one. This guide covers manual bulk methods and automated scripts to verify indexation at scale. Real numbers, failure modes, and a worked example included.
When you manage a site with thousands of pages, knowing which URLs are indexed and which are not is the difference between a healthy crawl budget and silent traffic loss. The core bottleneck is not the checking itself — it is the data hygiene before you even send the first request. Duplicate lists, wrong URL formats, and missing protocol prefixes will waste hours.
Google's ranking systems are built on indexation as the prerequisite. Without an indexed page, no ranking signal matters. As Google's ranking systems guide explains, the index is the foundation for all subsequent ranking algorithms. If you cannot confirm indexation at scale, you are flying blind.
In practice, when you start with a raw URL export from a crawler or a CMS, expect 15-30% of entries to be unusable. Redirect chains, relative paths, tracking parameters, and non-canonical duplicates are the usual suspects. A common situation we see is an agency importing a client's backlink list directly into an index checker — and wondering why half the results show 'not found'. The URLs were pointing to old domain versions.
Here is a worked example with concrete numbers: You export 5,000 URLs from Screaming Frog. After cleaning — removing 312 duplicates, 84 relative paths, and 126 URLs with query strings that 301-redirect — you have 4,478 clean URLs. You run a bulk check via the Google Indexing API. Result: 3,210 indexed, 1,101 not indexed, 167 returned errors (blocked by robots.txt or 4xx). The 1,101 not-indexed URLs become your diagnostic list. That is a 24.6% indexation gap — a direct hit to potential traffic.
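The arithmetic in this example can be re-derived in a few lines. This is only a sanity check on the figures quoted above; the dictionary keys are illustrative labels, not fields from any tool.

```python
# Figures copied from the worked example above.
exported = 5000
removed = {"duplicates": 312, "relative_paths": 84, "redirecting_query_strings": 126}
clean = exported - sum(removed.values())

results = {"indexed": 3210, "not_indexed": 1101, "error": 167}
# Every clean URL must land in exactly one result bucket.
assert sum(results.values()) == clean

# Share of each bucket, as a percentage of the clean list.
shares = {status: round(100 * count / clean, 1) for status, count in results.items()}

print(clean)   # 4478
print(shares)  # {'indexed': 71.7, 'not_indexed': 24.6, 'error': 3.7}
```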
| Method | Speed & Scale | Cost & Setup | Reliability & Risk | Best Fit |
|---|---|---|---|---|
| Manual site: operator (copy-paste each URL into Google search) | ~50 URLs/hour; no batch capability | Free; no tools needed | Inconsistent results; captchas after ~20 queries | Quick sanity check; small lists under 100 URLs |
| Google Search Console API (Performance report with URL inspection) | ~2,000 URLs/day; API quota limited | Free; OAuth 2.0 setup required | Accurate for verified properties; quota errors on large batches | Agency clients with verified GSC; weekly monitoring |
| Python Google Indexing API (batch requests with metadata) | ~5,000 URLs/minute; up to 10,000/day per project | Free tier + developer costs; service account setup | High reliability; quota limits if misconfigured | Large-scale audits; recurring automation |
| Hybrid: GSC API + Python fallback (first pass via GSC, then remaining via Indexing API) | ~4,000 URLs/hour; efficient for mixed lists | Moderate setup; two authentication flows | Best coverage; error handling needed for API timeouts | Production environments; daily indexation checks |
Step 1: Export the URL list from Screaming Frog, Ahrefs, or your CMS. Minimum: 1,000 URLs. Include the protocol.
Step 2: Clean the list. Remove duplicates, relative paths, and non-canonicals. Expected loss: 10-20%.
Step 3: Run the Python script using the Indexing API. Batch size: 100 URLs per request.
Step 4: Map each URL to indexed / not indexed / error. Export to CSV with timestamps.
Step 5: Diagnose. For not-indexed URLs, check robots.txt, noindex tags, 4xx status, and orphan status.
Step 6: Submit fixed URLs via API. Wait 24-48 hours, then recheck to verify.
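The mapping and export step of this workflow could look roughly like the sketch below. It assumes the check has already produced (URL, status) pairs; the function name `write_results` and the column names are illustrative choices, not part of any API.

```python
import csv
import io
from datetime import datetime, timezone

ALLOWED = {"indexed", "not indexed", "error"}

def write_results(rows, out):
    """Write (url, status) pairs as CSV rows with a shared UTC timestamp."""
    checked_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    writer = csv.writer(out)
    writer.writerow(["url", "status", "checked_at"])
    for url, status in rows:
        if status not in ALLOWED:
            raise ValueError(f"unexpected status {status!r} for {url}")
        writer.writerow([url, status, checked_at])

# Demo with an in-memory buffer; pass an open file object to write to disk.
buffer = io.StringIO()
write_results(
    [("https://example.com/a", "indexed"), ("https://example.com/b", "not indexed")],
    buffer,
)
print(buffer.getvalue())
```

Rejecting unknown status values early keeps later week-over-week comparisons from silently mixing labels.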
No workflow survives contact with reality unchanged. Here are the failure modes we see most often:
Blocked URLs: A page blocked by robots.txt cannot be crawled, so it normally stays out of the index; Google can still index the bare URL, without its content, when other pages link to it. The API will return 'URL is not available to Google'. Do not confuse this with a crawl error: Google crawl errors include DNS, server connectivity, and redirect issues, while blocked URLs are a different category entirely.
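Blocked URLs can be separated out before any API call. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text you have already downloaded; the helper name `blocked_urls` and the sample rules are made up for illustration. The standard-library parser does not replicate every Google matching rule (wildcards in particular), so treat it as a first-pass filter.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

# Sample rules, invented for the demo.
ROBOTS = """User-agent: *
Disallow: /private/
"""

urls = [
    "https://example.com/blog/post",
    "https://example.com/private/report",
]
blocked = blocked_urls(ROBOTS, urls)
print(blocked)  # ['https://example.com/private/report']
```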
Wrong filters: The GSC API returns data based on the selected property. If you check an http URL against an https property, you get zero results. Always normalize your list to match the verified property.
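A quick pre-flight check for this mismatch is sketched below. It assumes a URL-prefix property such as `https://example.com/`; Domain properties cover both protocols and would not need this filter. The function names are illustrative.

```python
from urllib.parse import urlsplit

def matches_property(url, property_url):
    """True when url has the same scheme and host as the property and sits under its path."""
    u, p = urlsplit(url), urlsplit(property_url)
    same_origin = (u.scheme, u.netloc.lower()) == (p.scheme, p.netloc.lower())
    return same_origin and (u.path or "/").startswith(p.path or "/")

def split_by_property(urls, property_url):
    """Partition urls into those inside the property and those outside it."""
    inside = [u for u in urls if matches_property(u, property_url)]
    outside = [u for u in urls if not matches_property(u, property_url)]
    return inside, outside

inside, outside = split_by_property(
    ["https://example.com/page", "http://example.com/page"],
    "https://example.com/",
)
print(outside)  # ['http://example.com/page']
```

Anything in `outside` should be normalized or checked against a different property rather than sent as-is.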
Empty results: If your script returns 100% 'not indexed' and you know the pages are live, check your authentication. Expired service account keys are the number one cause of false negatives.
Slow vendors: Third-party bulk check tools often queue your list behind other users. A batch of 5,000 URLs can take 6-12 hours. A local script finishes in minutes.
Let us walk through a real scenario. You have a client site with 5,000 pages. After exporting from Screaming Frog and cleaning, you have 4,478 valid URLs. You set up a Python script following the Python Google Indexing API setup guide. Your script sends batches of 100 URLs per request with a 2-second delay between batches to stay under quota. Total runtime: 89 seconds. Results: 3,210 indexed (71.7%), 1,101 not indexed (24.6%), 167 errors (3.7%). You export the not-indexed list to a CSV and run a diagnostic on a subset of 50 URLs. You find 22 with noindex tags, 16 blocked by robots.txt, 8 returning 404, and 4 that are orphaned (no internal links). You fix 30 of those 50 immediately. After 48 hours, you recheck the same 1,101 URLs. 740 are now indexed. The remaining 361 require content or technical fixes.
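The batching in this scenario can be sketched as follows. The helper `chunked` is an illustrative name, and the numbers are the ones quoted above; the calculation covers only the pauses between batches, not the request time itself.

```python
import math

def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

total_urls = 4478
batch_size = 100
delay_seconds = 2

batches = math.ceil(total_urls / batch_size)
# One pause between each pair of consecutive batches.
sleep_floor = (batches - 1) * delay_seconds

# Placeholder URLs, just to exercise the chunking.
urls = [f"https://example.com/page-{i}" for i in range(total_urls)]
sizes = [len(batch) for batch in chunked(urls, batch_size)]

print(batches)      # 45
print(sleep_floor)  # 88
print(sizes[-1])    # 78
```

The pauses alone account for 88 of the 89 seconds quoted in the scenario.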
Normalize all URLs to absolute format with https:// and trailing slash removal.
Remove any URL with query strings that are not canonical.
Verify your Google service account has the Indexing API enabled and quota assigned.
Set a delay between API calls to avoid 429 rate limit errors.
Export results with three columns: URL, status (indexed/not indexed/error), and timestamp.
Filter out known redirect chains before checking to avoid false 'not found' results.
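The normalization items in this checklist could be sketched as below. Two assumptions are baked in and should be adjusted per site: the verified property is https, and every URL carrying a query string is treated as non-canonical. The function names are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Return a normalized https URL, or None when the entry should be dropped."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None  # relative path or missing protocol
    if parts.query:
        return None  # assumption: query-string URLs are non-canonical
    path = parts.path.rstrip("/") or "/"  # keep the bare root as "/"
    return urlunsplit(("https", parts.netloc.lower(), path, "", ""))

def clean(urls):
    """Normalize, drop unusable entries, and de-duplicate while keeping order."""
    seen, kept = set(), []
    for raw in urls:
        normalized = normalize(raw)
        if normalized and normalized not in seen:
            seen.add(normalized)
            kept.append(normalized)
    return kept

raw = [
    "http://Example.com/page/",
    "https://example.com/page",
    "/relative/path",
    "https://example.com/page?utm_source=x",
    "example.com/no-protocol",
]
print(clean(raw))  # ['https://example.com/page']
```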
Once you have your list of not-indexed URLs, the real work begins. Do not resubmit blindly. Use the pages not indexed diagnostic to categorize each failure. Common causes: missing internal links, low content quality, duplicate content, or technical blocks. The fastest fix is often internal linking — a page with no inbound links from indexed pages has a 73% lower chance of being indexed within 30 days. For thin content, add at least 300 words of unique value and a clear internal link from a high-authority page on the same site.
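To find the pages that lack an inbound link from an indexed page, a crawl export of internal links is enough. The sketch below assumes the export has been reduced to (source, target) pairs; `lacking_indexed_inlinks` is an illustrative name.

```python
def lacking_indexed_inlinks(not_indexed, links, indexed):
    """Return not-indexed URLs with no inbound internal link from an indexed page.

    links is an iterable of (source_url, target_url) pairs from a crawl export.
    """
    indexed = set(indexed)
    # Targets that receive at least one link from an indexed page other than themselves.
    supported = {target for source, target in links
                 if source in indexed and source != target}
    return sorted(url for url in set(not_indexed) if url not in supported)

links = [
    ("https://example.com/", "https://example.com/a"),
    ("https://example.com/b", "https://example.com/c"),
]
candidates = lacking_indexed_inlinks(
    not_indexed=["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    links=links,
    indexed=["https://example.com/"],
)
print(candidates)  # ['https://example.com/b', 'https://example.com/c']
```

Here `/c` is linked only from `/b`, which is itself not indexed, so it still counts as unsupported.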
Use the Google Search Console API with a separate property for each client. Aggregate results via a Python script that loops through all properties. Set up a service account per client or use domain-wide delegation. Expect quota limits: 2,000 queries per day per property. For larger volumes, combine GSC API with the Indexing API for verified sites.
A local Python script using the Google Indexing API is fastest — up to 5,000 URLs per minute. The API supports batch requests of 100 URLs each. Third-party web tools are slower because they queue requests and add network latency. For 10,000 URLs, expect under 3 minutes with a script versus 2-4 hours with a web tool.
Export your backlink list from Ahrefs, Majestic, or Semrush. Clean the URLs — remove tracking parameters and normalize to https. Run the list through the Google Indexing API. Only about 30-50% of backlinks are typically indexed. Use the results to identify which linking pages actually pass link equity.
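Cleaning a backlink export might look like the sketch below. The list of tracking parameters is an assumption for illustration and will need extending for real exports; other query parameters are kept because they may identify the page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative lists, not exhaustive.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid"}

def strip_tracking(url):
    """Drop tracking parameters, force https, and discard the fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    return urlunsplit(("https", parts.netloc, parts.path, urlencode(kept), ""))

cleaned = strip_tracking("http://blog.example.org/post?utm_source=news&id=7&fbclid=abc")
print(cleaned)  # https://blog.example.org/post?id=7
```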
Common errors: authentication failures (expired service account keys), quota exceeded (429), malformed URLs (relative paths, percent-encoding issues), and blocked pages (robots.txt). Fix: rotate keys every 30 days, implement exponential backoff for 429s, validate URLs with regex before sending, and pre-check robots.txt.
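Exponential backoff for 429s can be sketched without tying it to a specific client library. `RateLimited` below is a hypothetical exception standing in for whatever your HTTP client raises on a 429, and `with_backoff` is an illustrative helper name.

```python
import random
import time

class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""

def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on RateLimited wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))

# Demo: a call that is rate limited twice before succeeding.
attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimited()
    return "ok"

delays = []  # record the waits instead of actually sleeping
outcome = with_backoff(flaky, sleep=delays.append)
print(outcome, len(delays))  # ok 2
```

The `sleep` parameter is injectable so the retry logic can be tested without real waiting.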
Guest posts can be checked the same way. Collect the guest post URLs from your outreach tracker or backlink report. Normalize and run through the Indexing API. Guest posts on low-authority domains often take longer to index. If a post is not indexed after 2 weeks, check for noindex tags, thin content, or internal linking issues on the host domain.
The free tier allows 200 URL submissions per day (Indexing API publish requests) and 2,000 inspection queries per day per property (URL Inspection in the Search Console API). For larger volumes, request quota increases via the Google Cloud Console. Some projects can get up to 10,000 queries per day. Exceeding quota returns 429 errors, so implement a queue with retry logic.
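A queue with retry logic and a daily budget might be structured like the sketch below. `QuotaExceeded` is a hypothetical exception standing in for a 429 response, `check` is whatever function performs a single lookup, and `run_queue` is an illustrative name.

```python
from collections import deque

class QuotaExceeded(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""

def run_queue(urls, check, budget, max_retries=2):
    """Spend up to `budget` attempts; requeue failures; return what is left for tomorrow."""
    queue = deque((url, 0) for url in urls)
    results, gave_up = {}, []
    while queue and budget > 0:
        url, tries = queue.popleft()
        budget -= 1  # every attempt spends one query from today's quota
        try:
            results[url] = check(url)
        except QuotaExceeded:
            if tries < max_retries:
                queue.append((url, tries + 1))  # retry later, behind the rest
            else:
                gave_up.append(url)
    leftover = [url for url, _ in queue]  # carry over to the next run
    return results, gave_up, leftover

# Demo: the lookup for /b fails once, then succeeds on retry.
seen = set()

def fake_check(url):
    if url.endswith("/b") and url not in seen:
        seen.add(url)
        raise QuotaExceeded(url)
    return "indexed"

demo_urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results, gave_up, leftover = run_queue(demo_urls, fake_check, budget=10)
print(len(results), gave_up, leftover)  # 3 [] []
```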
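Use the google-api-python-client library with a service account. Index status comes from the URL Inspection method of the Search Console API ('urlInspection.index.inspect'), which needs the scope 'https://www.googleapis.com/auth/webmasters.readonly' and a service account that has been added as a user on the property. The Indexing API scope 'https://www.googleapis.com/auth/indexing' covers submission notifications only. Send one POST request per URL with 'inspectionUrl' and 'siteUrl', then parse 'inspectionResult.indexStatusResult' in the JSON response (the 'verdict' and 'coverageState' fields). Work through the list in groups of 100 and add a 1-2 second delay between groups to avoid rate limits.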
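A minimal sketch of the request and parsing step, assuming the URL Inspection method of the Search Console API ('urlInspection.index.inspect') and a service account that has access to the property. The function names, the key file path, and the mapping from 'verdict' to the three status labels are assumptions made for illustration; check them against the API reference before relying on them.

```python
SCOPE = "https://www.googleapis.com/auth/webmasters.readonly"

def build_service(key_path):
    """Create a Search Console client from a service account key file."""
    # Imported here so the parsing helper below works without the Google libraries.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(key_path, scopes=[SCOPE])
    return build("searchconsole", "v1", credentials=creds)

def inspect_url(service, site_url, url):
    """Call urlInspection.index.inspect for one URL of a verified property."""
    body = {"inspectionUrl": url, "siteUrl": site_url}
    return service.urlInspection().index().inspect(body=body).execute()

def parse_status(response):
    """Map a response to indexed / not indexed / error (mapping is an assumption)."""
    result = response.get("inspectionResult", {}).get("indexStatusResult", {})
    verdict = result.get("verdict")
    if verdict in ("PASS", "PARTIAL"):
        return "indexed"
    if verdict in ("NEUTRAL", "FAIL"):
        return "not indexed"
    return "error"

# Parsing demo on a hand-written response of the expected shape.
sample = {"inspectionResult": {"indexStatusResult": {
    "verdict": "PASS", "coverageState": "Submitted and indexed"}}}
print(parse_status(sample))  # indexed
```

A real run would call `build_service("service-account.json")` once (the file name is a placeholder) and then `inspect_url` per URL, with the delay and retry handling kept outside these helpers.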
Alternatives include: (1) Google Search Console API — good for verified properties but quota limited. (2) Screaming Frog with custom extraction — slow but works without API access. (3) Third-party tools like Sitebulb or DeepCrawl — expensive but offer integrated diagnostics. The Indexing API remains the fastest for pure status checking.
Step 1: Validate and clean URLs. Step 2: Check robots.txt and status codes first. Step 3: Send batch requests via API with exponential backoff. Step 4: Log all errors to a separate CSV. Step 5: Retry failed requests after 24 hours. Step 6: Compare results week-over-week to track indexation rate changes.
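The week-over-week comparison in the last step could be sketched as below, assuming each run is stored as a dictionary of URL to status. The function names and sample data are illustrative.

```python
def indexation_rate(results):
    """Percentage of URLs whose status is 'indexed'."""
    indexed = sum(status == "indexed" for status in results.values())
    return round(100 * indexed / len(results), 1)

def compare(previous, current):
    """Report the rate change plus URLs that gained or lost indexation."""
    gained = sorted(u for u, s in current.items()
                    if s == "indexed" and previous.get(u) != "indexed")
    lost = sorted(u for u, s in previous.items()
                  if s == "indexed" and current.get(u) != "indexed")
    change = round(indexation_rate(current) - indexation_rate(previous), 1)
    return {"rate_change": change, "gained": gained, "lost": lost}

last_week = {"/a": "indexed", "/b": "not indexed", "/c": "indexed", "/d": "not indexed"}
this_week = {"/a": "indexed", "/b": "indexed", "/c": "not indexed", "/d": "indexed"}

report = compare(last_week, this_week)
print(report)  # {'rate_change': 25.0, 'gained': ['/b', '/d'], 'lost': ['/c']}
```

The `lost` list is usually the more urgent one, since it points at pages that dropped out between runs.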
It means Google has not added the URL to its main index. Causes: page blocked by robots.txt, noindex meta tag, canonical pointing elsewhere, low content quality, orphaned page (no internal links), or the page has not been crawled yet. Check the 'not indexed' list against your sitemap and internal link graph to prioritize fixes.
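Two of these causes, the noindex meta tag and a canonical pointing elsewhere, can be detected from the page HTML alone. The sketch below uses only the standard library and works on HTML you have already fetched; it does not look at the X-Robots-Tag response header, and the class and function names are illustrative.

```python
from html.parser import HTMLParser

class IndexSignals(HTMLParser):
    """Collect the robots meta directive and the canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")

def diagnose(url, html):
    """Return a list of on-page reasons why url may be excluded from the index."""
    parser = IndexSignals()
    parser.feed(html)
    reasons = []
    if parser.noindex:
        reasons.append("noindex meta tag")
    if parser.canonical and parser.canonical.rstrip("/") != url.rstrip("/"):
        reasons.append("canonical points elsewhere")
    return reasons

page = ('<html><head><meta name="robots" content="noindex, follow">'
        '<link rel="canonical" href="https://example.com/other"></head></html>')
print(diagnose("https://example.com/page", page))
# ['noindex meta tag', 'canonical points elsewhere']
```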