The site: operator is the fastest way to check if Google has indexed a URL — but it lies. This guide gives you the exact syntax, the failure modes, and a practical workflow to diagnose indexation without getting fooled by partial results.
The site: search operator is the oldest trick in the SEO playbook. You type site:example.com/page into Google, and it tells you if that page is indexed. Simple. Wrong. In practice, on sites with more than a few thousand pages, the operator becomes unreliable. Google's documentation cautions that site: does not necessarily return every indexed URL; treat the results as a 'representative sample', not a complete index. A common situation we see is an SEO who runs site:example.com, sees 1,200 results, and assumes that is the full index. Meanwhile, the sitemap shows 18,000 submitted URLs. That gap is not a bug; it is how the operator is designed. Misreading this limitation is the core bottleneck: most practitioners waste hours chasing phantom indexation issues because they treat site: as authoritative. It is not. It is a diagnostic clue, not a measurement tool.
For a deeper understanding of how search engines evaluate pages, refer to the broader context of search engine optimization theory. But for the raw mechanics of checking indexation, you need to know the syntax cold, the edge cases, and the fallback methods.
| Query Pattern | What It Actually Returns | Best Use Case | Hidden Risk / Failure Mode |
|---|---|---|---|
| <code>site:example.com</code> (no space after colon) | Sample of indexed pages from that domain (not full count) | Quick sanity check: is the domain in the index at all? | Results capped at ~1,000. Large sites see only a fraction. |
| <code>site:example.com/page</code> (full URL path) | Shows if that specific URL has at least one indexed version | Confirm single URL status. Fastest check for one page. | May show a different canonical version. A 301 redirect URL can also show as indexed even if the destination is not. |
| <code>site:example.com inurl:keyword</code> (combined operators) | URLs containing the keyword within the index sample | Find pages Google associates with a term on your domain | Operator stacking reduces result quality further. Google may ignore one operator if the query is complex. |
| <code>site:example.com -inurl:blog</code> (exclusion filter) | Indexed pages not containing 'blog' in the URL path | Isolate core pages vs blog section for inventory audit | Exclusion can accidentally remove pages that have blog in the path but are important (e.g., /blog/category/core-service). |
| <code>site:example.com filetype:pdf</code> (file type filter) | Indexed PDFs on the domain | Check if PDF assets are being indexed separately | PDF indexing counts toward total 'indexed pages' but may lack metadata. Can inflate perceived indexation. |
| <code>site:example.com & site:example2.com</code> (multiple domains) | Not supported. Google ignores the second site: operator. | N/A | You cannot compare domains in one query. Run separate searches and compare manually. |
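The query patterns in the table can be assembled mechanically, which helps avoid the stray space after the colon. The following is a minimal sketch, not an official tool: `site_query` and `search_url` are hypothetical helper names, and the search URL format is assumed to be the ordinary `https://www.google.com/search?q=` form you would open in a browser.

```python
from urllib.parse import quote_plus


def site_query(domain, path="", inurl=None, exclude_inurl=None, filetype=None):
    """Build a site: query string with no space after any operator's colon."""
    parts = [f"site:{domain}{path}"]
    if inurl:
        parts.append(f"inurl:{inurl}")
    if exclude_inurl:
        parts.append(f"-inurl:{exclude_inurl}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query):
    """Return a URL-encoded search link to open manually in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    q = site_query("example.com", exclude_inurl="blog")
    print(q)
    print(search_url(q))
```

The helpers only build strings for manual use; they do not fetch or scrape results.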
Type <code>site:example.com/your-page</code>. Wait for full load. Do not rely on the snippet count — look for the page in results.
Paste the URL into the URL Inspection tool. This is the only source of truth. It shows 'URL is on Google' or 'URL is not on Google' with crawl details.
Export your sitemap URLs. Cross-reference them against the GSC Page indexing report, filtered to the submitted sitemap. Any URL not listed as indexed needs deeper investigation.
In GSC, check the crawl-related reasons in the Page indexing report (the successor to the legacy 'Crawl errors' section). Blocked resources, 404s, and soft 404s are common culprits. Fix these before re-checking indexation.
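For large sites, check status programmatically. The Search Console URL Inspection API returns the index status of a URL; the Google Indexing API, by contrast, notifies Google that a URL was added or updated and does not report whether it is indexed. <a href="https://pythongoogleindexingu.vercel.app/python-google-indexing-api-setup">Python Google Indexing API setup</a> covers automating batch requests and flagging discrepancies.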
After fixes, wait 3 days. Google's recrawl cycle is not instant. Run the <code>site:</code> operator again and compare with GSC data to confirm resolution.
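The cross-reference step in this workflow is a set difference between what you submitted and what GSC reports as indexed. Here is a minimal sketch under stated assumptions: the sitemap is a standard `urlset` file, the indexed URLs come from a GSC export you have already loaded into a collection, and `sitemap_urls` and `not_indexed` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from sitemaps.org.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def not_indexed(submitted, indexed):
    """Return submitted URLs that are absent from the indexed set."""
    return sorted(set(submitted) - set(indexed))


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blue-widget</loc></url>
  <url><loc>https://example.com/red-widget</loc></url>
</urlset>"""

if __name__ == "__main__":
    submitted = sitemap_urls(SAMPLE_SITEMAP)
    indexed = {"https://example.com/blue-widget"}  # stand-in for a GSC export
    for url in not_indexed(submitted, indexed):
        print("needs investigation:", url)
```

Exact string comparison is deliberate here: a trailing slash or protocol mismatch between sitemap and export is itself worth investigating.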
Let us walk through a concrete case. You have a client page, example.com/blue-widget. You run site:example.com/blue-widget and get no results. Panic? Not yet.
Step 1: Open Google Search Console URL Inspection. Paste the URL. Result: 'URL is not on Google', with the reason 'Crawled - currently not indexed'. This is a specific Google status that means the page was crawled but left out of the index, usually for quality or duplication reasons.
Step 2: Check Google crawl errors in GSC. You find that the page has a 'Noindex' meta tag inherited from a template. The tag was set on the category template and inadvertently applied to all child products.
Step 3: Remove the noindex tag. Submit the URL for indexing via GSC. Wait 3 days.
Step 4: Re-run site:example.com/blue-widget. Now the page shows. But the snippet count shows '1 of about 1 results'. That is correct in this case because only one URL matches.
Step 5: Run a broader site:example.com inurl:widget to check all widget pages. You find 18 out of 24 expected widget pages are showing. The missing 6 are likely still affected by the same template issue. This is where you use the pages not indexed diagnostic workflow to batch-identify all failing URLs.
Numbers: The site has 5,000 product pages. The sitemap has 4,800 submitted. GSC shows 3,200 indexed. The gap between submitted and indexed is 1,600 pages (4,800 - 3,200). Using the workflow above, you find that 1,200 have the noindex issue, 300 are soft 404s (empty product pages with no stock), and 100 are blocked by robots.txt, which accounts for all 1,600. The site: operator alone would never have revealed this breakdown.
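A quick arithmetic check confirms that the diagnosed causes account for the whole gap. This sketch uses only the figures from the example above; the variable names are illustrative.

```python
submitted = 4800  # URLs in the sitemap
indexed = 3200    # URLs GSC reports as indexed

# Causes found during diagnosis, with page counts from the example.
causes = {
    "noindex inherited from template": 1200,
    "soft 404 (empty product page)": 300,
    "blocked by robots.txt": 100,
}

gap = submitted - indexed
unexplained = gap - sum(causes.values())

print(f"gap between submitted and indexed: {gap}")
for cause, count in causes.items():
    print(f"  {cause}: {count} ({count / gap:.0%} of gap)")
print(f"unexplained: {unexplained}")
```

An `unexplained` value above zero would mean the diagnosis is incomplete and more URLs need inspection.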
Run the query in an incognito window to avoid personalized results skewing the sample.
Scroll to the bottom of the search results page — Google hides results if you do not manually load more.
Check if your page has a canonical tag pointing to a different URL. The site: operator shows the canonical, not the original URL.
Verify that the page is not blocked by robots.txt, meta noindex, or X-Robots-Tag header.
Look for 'Crawled - currently not indexed' in GSC — this is the most common false negative for site: checks.
Use the URL Inspection tool in GSC as the definitive check, not the site: operator.
For bulk checks, export a list of URLs and use the Search Console URL Inspection API or a crawling tool with GSC integration. Manual verification stops being practical beyond about 50 URLs.
Document the date and time of your site: check; indexation status changes frequently and you need a baseline.
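Several items on this checklist (canonical target, robots.txt, meta noindex, X-Robots-Tag) can be tested from data you have already fetched. The sketch below is one way to do that with the standard library. It assumes you supply the page HTML, response headers, and robots.txt text yourself; `blocking_signals` is a hypothetical helper, and its directive parsing is simplified (it does not handle every X-Robots-Tag variant).

```python
import re
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def _has_noindex(value):
    """True if a robots directive string contains noindex (or none)."""
    tokens = set(re.split(r"[\s,:]+", value.lower()))
    return bool(tokens & {"noindex", "none"})


class _HeadSignals(HTMLParser):
    """Collect the meta robots directive and canonical link from HTML."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            if _has_noindex(a.get("content", "")):
                self.noindex = True
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None


def blocking_signals(url, html, headers, robots_txt, user_agent="Googlebot"):
    """Report which checklist items could explain a missing URL."""
    page = _HeadSignals()
    page.feed(html)

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    x_robots = next(
        (v for k, v in headers.items() if k.lower() == "x-robots-tag"), ""
    )
    return {
        "robots_txt_blocked": not robots.can_fetch(user_agent, url),
        "meta_noindex": page.noindex,
        "header_noindex": _has_noindex(x_robots),
        "canonical_elsewhere": page.canonical is not None and page.canonical != url,
    }
```

Python's `urllib.robotparser` does not match rules exactly the way Googlebot does, so treat a `robots_txt_blocked` result as a prompt to confirm in GSC rather than a verdict.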
Here is where most guides stop short. They give you the happy path. Let us talk about the failures.
Empty results for indexed pages. You run site:example.com/page and get zero results. Yet GSC says the page is indexed. This happens when the page is indexed but ranked so low that Google excludes it from the sample. The operator is not a comprehensive index — it is a search result. If no user query would plausibly surface that page, Google may omit it even from the site: results.
Blocked URLs that still show. A page with a noindex tag and a robots.txt disallow can still appear in site: results. The disallow stops Googlebot from fetching the page, so it never sees the noindex tag, and the copy indexed before the restrictions were applied stays in place. That entry can persist for weeks. You remove the page, block it, and it still shows. This is a stale data problem.
Duplicate lists in large sites. For domains with over 10,000 indexed pages, the site: operator returns a 'representative sample' that is not statistically representative. Google picks pages arbitrarily. You cannot use this sample to estimate your total index count — the variance is too high.
Wrong filters from URL parameters. If your CMS generates session IDs or tracking parameters in URLs, the site: operator may treat each unique parameter combination as a separate page. You could see thousands of 'indexed' URLs that are actually duplicates of the same page. This inflates your perceived indexation and masks the real issue.
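To see how much of a URL list is parameter noise, you can normalize each URL and group the variants. This is a minimal sketch: the parameter names in `IGNORED_PARAMS` and `IGNORED_PREFIXES` are assumptions chosen for illustration, so replace them with whatever your CMS and analytics setup actually append.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of session/tracking parameters; adjust for your site.
IGNORED_PARAMS = {"sessionid", "sid", "gclid", "fbclid"}
IGNORED_PREFIXES = ("utm_",)


def normalize(url):
    """Drop session/tracking parameters and sort the rest."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
        and not key.lower().startswith(IGNORED_PREFIXES)
    ]
    query = urlencode(sorted(kept))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def group_duplicates(urls):
    """Map each normalized URL to the raw variants that collapse into it."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize(url), []).append(url)
    return groups


if __name__ == "__main__":
    found = [
        "https://example.com/p?utm_source=x&color=blue",
        "https://example.com/p?color=blue&sid=42",
        "https://example.com/p?color=red",
    ]
    for page, variants in group_duplicates(found).items():
        print(len(variants), page)
```

A group with many variants points to a single page being counted repeatedly.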
Slow vendors. If you use a third-party SEO tool that wraps the site: operator, be aware that Google rate-limits these queries aggressively. Tools that claim to do 'bulk site: checks' often use cached data that is days or weeks old. You are better off running the query directly in a browser.
The site: operator is about 80-90% accurate for single URLs that are well-ranked and have no canonical issues. For URLs with low authority, redirect chains, or canonical tags pointing elsewhere, the false negative rate jumps to 30-40%. Always verify with GSC URL Inspection for critical pages.
If a page blocked by robots.txt has never been crawled, it will not appear in site: results at all. If it was crawled before the block, Google may still show a cached snippet for days or weeks. The only way to confirm a block is to check the robots.txt file itself and the robots.txt report in GSC, which replaced the older robots.txt Tester.
Technically you can use site: to gauge how many pages are indexed, but practically no. Google caps the visible results at around 1,000 and the sample is not representative. For large sites, use the Search Console URL Inspection API or export sitemap data and compare with GSC coverage reports. The site: operator will give you a misleading sense of completeness.
This happens when the page is indexed but ranked so low that Google excludes it from the site: sample. The operator is a search result, not an index dump. Pages with thin content, no backlinks, or high competition for their queries are often omitted even though they are technically in the index.
The site: operator is a manual, one-off search that returns a sample. The Indexing API is a programmatic interface for notifying Google that a URL was added, updated, or removed; its default quota is 200 publish requests per day, and it does not report whether a URL is indexed. For definitive status (indexed, not indexed, error) at scale, pair it with the Search Console URL Inspection API. For agencies managing multiple sites, this kind of automation is essential. See our <a href="https://pythongoogleindexingu.vercel.app/python-google-indexing-api-setup">Python Google Indexing API setup</a> guide for automation.
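Index status can be read programmatically through the Search Console URL Inspection API. The sketch below shows only the request body and response handling, with no network call or authentication. The endpoint and field names (`inspectionUrl`, `siteUrl`, `indexStatusResult`, `verdict`, `coverageState`, `googleCanonical`) are taken from that API's documentation as I understand it, so verify them against the current reference before relying on this; the helper names and `daily_limit` parameter are hypothetical.

```python
# Endpoint for the URL Inspection API (requires an OAuth-authorized POST).
INSPECT_ENDPOINT = (
    "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"
)


def inspect_request_body(page_url, property_url):
    """Build the JSON body for one inspection request."""
    return {"inspectionUrl": page_url, "siteUrl": property_url}


def summarize(response):
    """Reduce an inspection response to the fields used for triage."""
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    return {
        "verdict": status.get("verdict", "VERDICT_UNSPECIFIED"),
        "coverage": status.get("coverageState", ""),
        "google_canonical": status.get("googleCanonical"),
    }


def daily_batches(urls, daily_limit):
    """Split a URL list into chunks that fit a per-day request quota."""
    return [urls[i:i + daily_limit] for i in range(0, len(urls), daily_limit)]


if __name__ == "__main__":
    # Hand-written sample shaped like an API response, for illustration only.
    sample = {
        "inspectionResult": {
            "indexStatusResult": {
                "verdict": "NEUTRAL",
                "coverageState": "Crawled - currently not indexed",
            }
        }
    }
    print(inspect_request_body("https://example.com/blue-widget", "https://example.com/"))
    print(summarize(sample))
```

Keeping the quota as a parameter lets you plug in whatever limit applies to your property instead of hard-coding a number.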
The 'Crawled - currently not indexed' status means Google crawled the page but chose not to index it, usually due to thin content, duplication, or low quality. Improve the page content to be unique and valuable, ensure internal links point to it, and submit it for indexing again. If it persists after 2-3 weeks, check for canonical issues or soft 404s.
The top three are: 1) Soft 404s (page returns 200 status but has no meaningful content), 2) Noindex meta tags inherited from templates, 3) Blocked JavaScript or CSS resources that prevent Google from rendering the page. Use the <a href="https://googlecrawlw.vercel.app/google-crawl-errors">Google crawl errors</a> report to identify these.
Indirectly. You can run site:example.com -inurl:blog to see a sample of indexed pages, but you cannot query for 'missing' pages. To find unindexed pages, you need to compare your sitemap or database against GSC coverage data. The <a href="https://websiteindexing6.vercel.app/pages-not-indexed-diagnostic">pages not indexed diagnostic</a> workflow provides a step-by-step method for this.
The site: operator can update within hours for high-authority pages, but typically takes 3-7 days for normal pages. Google's recrawl schedule is not real-time. Do not rely on site: for immediate feedback — use GSC URL Inspection which updates as soon as the crawl completes.
For single URLs: GSC URL Inspection tool. For bulk checks: the Search Console URL Inspection API or third-party tools like Screaming Frog with GSC integration. For monitoring: set up indexation reports in GSC and track coverage over time. The site: operator is a quick glance, not a reliable diagnostic instrument.