Google Index API Check URL: Bulk Status Verification Guide

On this page

Why the Google Indexing API for URL Status?Setting Up Authentication Bulk Index Status Workflow Worked Example: 200 URL Batch Check Edge Cases & Operational Failures Indexing API Response Codes & Actions Pre-Flight Checklist Before Running a Bulk Check Step-by-Step: Python Script for Bulk Status Check FAQ

Field notes

Why the Google Indexing API for URL Status?

The Google Indexing API was designed for job posting and live event markup, but its getUrlStatus method is a powerful tool for checking whether a URL is indexed. Unlike the Search Console API which returns data that can be 24-48 hours stale, the Indexing API returns a near-real-time status. The trade-off? You need a service account with ownership or full control over the property. In practice, when you manage a site with 50k URLs and need to verify indexation after a content refresh, this API saves hours. A common situation we see is teams building a daily cron job to check a batch of 200 URLs and automatically flagging non-indexed pages for re-inspection. The core bottleneck is not the API itself — it's handling the URL_NOT_FOUND and URL_UNAVAILABLE responses for blocked or soft-404 pages. You must pair this with a diagnostic tool like pages-not-indexed-diagnostic to understand why a URL is not indexed.

Field notes

Setting Up Authentication

The Google Indexing API uses OAuth 2.0 with a service account. You must grant the service account Owner permission in Search Console — Editor level will not work. Use the google.oauth2.service_account module. A common error is using the wrong scope: the Indexing API requires https://www.googleapis.com/auth/indexing. If you get a 403, check the service account email in Search Console. For a complete Python setup guide, see python-google-indexing-api-setup. Once authenticated, the getUrlStatus endpoint returns one of: URL_NOT_INDEXED, URL_INDEXED, URL_PENDING, or URL_DUPLICATE. Note: URL_DUPLICATE means Google chose a canonical different from what you submitted. This often points to weak internal linking or thin content — check google-crawl-errors for crawl diagnostics.

Workflow map

Bulk Index Status Workflow

Authenticate Service Account

Load JSON key; create credentials; request token with indexing scope.

Read URL List

CSV file with 200 URLs max. Filter out duplicates and known 4xx URLs.

Batch API Calls

For each URL, call getUrlStatus. Handle 429 rate limit with exponential backoff.

Parse Response

Map status to indexed/pending/not indexed/duplicate. Log errors.

Export Results

Write CSV with URL, status, timestamp. Flag non-indexed URLs for review.

Alert & Diagnose

Send non-indexed URLs to pages-not-indexed-diagnostic tool for root cause analysis.

Worked example

Worked Example: 200 URL Batch Check

We ran a test on a site with 10,500 blog pages. We selected 200 URLs that had been updated in the last 7 days. Using a Python script with the google-api-python-client library, the script completed in 112 seconds (0.56s/URL). Results: 162 indexed, 21 not indexed, 12 pending, 5 duplicate. The 21 non-indexed URLs all had HTTP 200 responses but were missing internal links — they were orphan pages. The 5 duplicate cases all pointed to a different canonical, which we traced to missing rel=canonical tags. We then ran those 21 URLs through a PageSpeed Insights check to confirm they were not blocked by noindex directives. None were blocked. The fix: added internal links from category pages and resubmitted via the Indexing API. Four days later, all 21 were indexed.

Field notes

Edge Cases & Operational Failures

You will hit edge cases. Blocked URLs (robots.txt, password-protected, or requiring login) return URL_UNAVAILABLE — the API cannot check them. Filter these out before your batch to avoid wasting quota. Wrong filters: if you pass a URL that belongs to a different Search Console property, you get a 403. Bad data: duplicate lists cause you to run the same URL multiple times, exhausting daily quota. Limits: the API has a hard limit of 200 URLs per day per project; you can request an increase via Google Cloud Console but it is rarely granted for single projects. Weak pages: URLs with near-zero content often return URL_INDEXED but have no traffic — this is a false positive for 'healthy'. Empty results: if you query a URL that has never been crawled, the API returns URL_NOT_INDEXED but does not tell you if the URL was even discovered. For that, use google-crawl-errors to check crawl logs. Slow vendors: third-party dashboard tools that wrap this API often cache results for hours, defeating the real-time advantage. Build your own script.

Data table

Indexing API Response Codes & Actions

Response Code	Meaning & Technical Detail	Recommended Action	Risk / False Positive Note
URL_INDEXED Google has indexed this URL.	The URL is in the index. Response includes `latestUpdate` timestamp.	No action needed. Optionally verify that canonical matches.	Can return true for soft-404 pages or pages with thin content. Cross-check with traffic data.
URL_NOT_INDEXED URL not in index.	Google has not indexed this URL. Could be blocked, noindex, or not discovered.	Check robots.txt, meta robots, and internal links. Use pages-not-indexed-diagnostic tool.	A URL with HTTP 200 can still be not indexed due to quality signals. Do not assume it is a technical block.
URL_PENDING Crawled but not yet indexed.	The URL was discovered and crawled, but indexing is queued. Can last hours to days.	Wait 24-48 hours. If status persists beyond 72 hours, resubmit via Indexing API.	Pending status is often transient. Do not trigger alerts for URLs pending less than 48 hours.
URL_DUPLICATE Canonical differs.	Google chose a different canonical URL than the one you checked. The response includes the chosen canonical.	Review canonical tags and internal linking. Ensure the preferred URL is linked from sitemap.	This is not an error per se, but it means the checked URL may not appear in SERPs. Confirm the canonical URL is indexed.
URL_UNAVAILABLE Cannot access the URL.	Googlebot could not fetch the URL. Usually due to robots.txt, auth, or server errors.	Check robots.txt, server logs, and firewall rules. Remove URL from batch if it is intentionally blocked.	If the URL is behind a login, it will always return this status. Exclude such URLs from your batch to save quota.

Pre-Flight Checklist Before Running a Bulk Check

1

Verify the service account email has Owner permission in Search Console, not Editor or Restricted.

2

Remove URLs that return 4xx, 5xx, or are blocked by robots.txt to avoid quota waste.

3

Deduplicate your URL list. Running the same URL twice consumes quota twice.

4

Set up exponential backoff for 429 rate limit errors. Google recommends at least 1 second between calls.

5

Export results to a CSV with timestamp so you can track changes over days.

6

Create a separate alert for URL_DUPLICATE responses — they indicate canonical misconfigurations that need fixing.

Step-by-Step: Python Script for Bulk Status Check

Install dependencies: <code>pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib</code>
Create a service account in Google Cloud Console, download the JSON key, and add the service account email as Owner in Search Console.
Write a function <code>get_index_status(service, url)</code> that calls <code>service.urlNotification().getMetadata(url=url).execute()</code>
Read your URL list from a CSV file. Limit to 200 URLs per run. Use <code>tqdm</code> to track progress.
For each URL, call the function and handle exceptions: 403 (auth), 429 (rate limit), 500 (server error). Store results in a list.
Write results to a new CSV with columns: URL, status, canonical (if present), timestamp. Log errors separately.
Schedule the script as a daily cron job. Send non-indexed URLs to a diagnostic tool for root cause analysis.

FAQ

How to check Google index status for multiple URLs using API for agencies?

Agencies managing multiple client sites should create one service account per Search Console property. Use a master script that loops through client lists, authenticates each property separately, and runs batch checks of 200 URLs per client per day. Aggregate results into a single dashboard. Be aware that the daily quota is per project, not per URL, so you cannot check 500 URLs for one client in a single day without requesting a quota increase.

What is the difference between Google Indexing API getUrlStatus and Search Console API for index coverage?

The Indexing API returns a near-real-time status (URL_INDEXED, URL_NOT_INDEXED, URL_PENDING, URL_DUPLICATE) for a single URL. The Search Console API returns aggregate index coverage data (valid, error, warning counts) for a property, which can be 24-48 hours stale. Use the Indexing API for spot-checking specific URLs and the Search Console API for property-level trends. Do not use the Indexing API for bulk reporting — that is what the Search Console API is for.

How to handle Google Indexing API errors like 403, 429, and 500 in bulk operations?

403: Check that the service account email is added as an Owner in Search Console and that the scope is correct. 429: Implement exponential backoff starting at 1 second, doubling each retry up to 60 seconds. Do not retry more than 5 times. 500: These are transient Google server errors. Retry up to 3 times with 5-second intervals. If the error persists, log the URL and skip it. Do not let one bad URL block the entire batch.

Can I use the Google Indexing API to check index status for guest posts on other domains?

No. The Indexing API can only check URLs that belong to a Search Console property where you are an Owner. For guest posts on other domains, you need the site owner to add your service account as an Owner. Alternatively, use the URL Inspection Tool in Search Console manually, or ask the host to share a read-only report. The Indexing API is not a universal index checker — it is tied to property ownership.

What is the daily quota for Google Indexing API and how to increase it for bulk URL checks?

The default quota is 200 URLs per day per Google Cloud project. You can request an increase via the Google Cloud Console Quotas page, but approval is not guaranteed and typically requires a business justification. If you need to check more than 200 URLs daily, consider using multiple Google Cloud projects (one per client) or combining the Indexing API with the Search Console API for aggregate trends.

Why does the Google Indexing API return URL_DUPLICATE for some URLs and how to fix it?

URL_DUPLICATE means Google chose a different canonical URL than the one you checked. Fix by ensuring the page has a self-referencing canonical tag, that internal links point to the correct URL, and that the sitemap only includes the preferred version. Also check for duplicate content patterns. After fixing, resubmit the preferred URL via the Indexing API and verify status changes to URL_INDEXED within 48 hours.

How to build a Google Indexing API dashboard for real-time index status monitoring?

Use a cloud function (e.g., Google Cloud Scheduler + Cloud Run) to run the batch script daily. Store results in BigQuery or a PostgreSQL database. Build a dashboard using Google Data Studio or Metabase with filters for status, date, and URL source. Set up alerts for URLs that remain URL_NOT_INDEXED for more than 7 days. Include a direct link to the pages-not-indexed-diagnostic tool for each non-indexed URL.

What are common mistakes when using the Google Indexing API for bulk URL status verification?

Most common: 1) Using Editor instead of Owner role in Search Console. 2) Forgetting to filter out non-crawlable URLs (robots.txt, login walls). 3) Not implementing rate limit handling — causing 429 errors that block the whole batch. 4) Running the same URL list multiple times and exhausting quota. 5) Treating URL_INDEXED as 'healthy' without checking for thin content or soft 404s. 6) Ignoring URL_DUPLICATE responses that indicate canonical problems.

How to use the Google Indexing API with Python for a checklist-based index status workflow?

Start with the authentication setup from the python-google-indexing-api-setup guide. Write a function that takes a URL and returns its status. Create a checklist: verify auth, load URL list, deduplicate, filter blocked URLs, call API with backoff, parse response, export CSV. Use <code>pandas</code> for data manipulation and <code>logging</code> for error tracking. Schedule the script to run daily and automatically email a summary of non-indexed URLs.

What alternatives exist to the Google Indexing API for checking URL index status in bulk?

Alternatives: 1) Search Console API — returns index coverage data for the whole property, but not per-URL real-time status. 2) Rank tracking tools like Ahrefs or SEMrush — they show index status but data can be days old. 3) Browser automation (Selenium) logging into Search Console — fragile and against ToS. 4) The URL Inspection Tool manually — only for small batches. The Indexing API is the only method for real-time, programmatic per-URL checks, but it requires property ownership.

Next reads

Related guides

↗

Main guide

↗

Why Is My URL Not Indexed? Diagnostic Checklist

↗

Free Tools to Check Google Index Status: Comparison

↗

Check Google Index with Search Console: Step-by-Step

Budget math

Estimate the cost of waiting

Quick calculator. Put in the expected monthly value of a page or link batch and the natural waiting time.

Expected monthly value, USD Average waiting time, days