How to diagnose technical indexing issues
Some pages you want to rank just aren't in Google's index. The Pages report in GSC tells you the "why" - but the reasons are cryptic. Here's a triage flow that walks a common set of statuses from "clear signal to fix" down to "accept and move on."
Goal: Triage non-indexed pages and identify the 2-3 fixes with highest upside.
Time: 15 minutes for triage + hours/days to ship fixes.
You'll need: GSC Pages report access, developer help for some fixes.
Before you start: read the primer
This playbook assumes you've read Indexing explained and know the difference between "Discovered," "Crawled," and "Indexed." Also useful: GSC not indexing my site for a broader overview.
The triage flow
- Open GSC's Pages report. (This lives in GSC itself, not in GSC Wizard.) Note the total of non-indexed pages and open the list of statuses underneath.
- For each status, ask: "are these pages I actually want indexed?" Half the non-indexed count on any site is pagination, filters, tag pages, or admin URLs that shouldn't be indexed anyway. Move those into the "accept" column.
- Prioritize by impact. A status affecting 50 of your money pages beats a status affecting 5,000 tag pages. Open each status bucket and check which URLs are affected - that's the real signal.
- Run the most-likely fix for each status (see table below).
- Validate with URL Inspection. Paste one representative URL into URL Inspection and read Google's reasoning. Confirm the fix worked before scaling.
- Re-crawl sparingly. Use "Request indexing" for individual money pages after a fix, not as a policy. It doesn't scale.
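Step 2 above - sorting non-indexed URLs into "accept" and "investigate" buckets - can be sketched as a simple pass over the exported URL list. The patterns below are illustrative assumptions, not a standard list; adapt them to your own site's URL structure:

```python
import re

# URL patterns that usually shouldn't be indexed anyway.
# Illustrative assumptions -- swap in your own site's junk patterns.
ACCEPT_PATTERNS = [
    r"/tag/",          # tag archive pages
    r"/page/\d+",      # pagination
    r"[?&]filter=",    # faceted-filter URLs
    r"/wp-admin/",     # admin URLs
]

def bucket_urls(urls):
    """Split non-indexed URLs into 'accept' (fine as-is) and 'investigate'."""
    accept, investigate = [], []
    for url in urls:
        if any(re.search(p, url) for p in ACCEPT_PATTERNS):
            accept.append(url)
        else:
            investigate.append(url)
    return accept, investigate
```

Run it on the CSV export from the Pages report; whatever lands in `investigate` is where the table below applies.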
Common statuses and their fixes
Discovered - currently not indexed
Google found the URL but hasn't even crawled it. Usually a crawl-budget problem. Fix: improve internal linking to the affected pages, reduce low-value pages Google is wasting crawls on, submit a fresh sitemap. Deep dive: Discovered not indexed.
Crawled - currently not indexed
Googlebot downloaded the page but chose not to index it. Usually a quality or duplicate signal. Fix: make the page substantially more useful or different from existing indexed pages, or accept the non-indexing if the page is thin. See Crawled not indexed.
Soft 404
Page returns 200 OK but looks like an error page to Google. Fix: either return a real 404/410, or beef up the content so it doesn't look empty. Walkthrough: Soft 404 explained.
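A quick heuristic check can flag likely soft-404 candidates before you open each one by hand. This is a sketch under loose assumptions - the word-count threshold and error phrases are illustrative, not Google's actual rules:

```python
ERROR_PHRASES = ("not found", "no longer available", "0 results")

def looks_like_soft_404(status_code, body_text):
    """Heuristic: a 200 response whose body reads like an error page.

    Threshold and phrases are illustrative assumptions, not Google's rules.
    """
    if status_code != 200:
        return False  # real error statuses are not *soft* 404s
    text = body_text.lower()
    too_thin = len(text.split()) < 40                    # near-empty page
    error_wording = any(p in text for p in ERROR_PHRASES)
    return too_thin or error_wording
```

Anything it flags gets the choice above: return a real 404/410, or add real content.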
Duplicate, Google chose different canonical
Your page has a canonical tag pointing to itself, but Google indexed another URL as the canonical. Fix: consolidate with a 301 redirect, or add a real canonical pointing to Google's chosen URL. See Canonical URLs.
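Before debugging Google's choice, verify your own canonical tags say what you think they say. A regex-based sketch (`canonical_mismatch` is a hypothetical helper; a production audit should use a real HTML parser):

```python
import re

def canonical_mismatch(page_url, html):
    """Return the declared canonical URL if it differs from the page's own
    URL, else None. Regex-based sketch; use a real HTML parser in production."""
    match = re.search(
        r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']',
        html, re.IGNORECASE)
    if not match:
        return None
    canonical = match.group(1).rstrip("/")
    return canonical if canonical != page_url.rstrip("/") else None
```

If this reports a mismatch on pages you expected to be self-canonical, fix the tag first; only then worry about Google overriding it.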
Excluded by noindex tag
Page has a noindex meta tag or HTTP header. If intentional, fine. If not, remove the tag. A lot of frameworks add noindex by accident when a staging/production toggle ships in the wrong state. Verify with URL Inspection.
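Noindex can arrive two ways - a meta robots tag in the HTML or an `X-Robots-Tag` response header - so check both. A simplified sketch (regex-based; a production check should use a real HTML parser and also handle bot-specific tags like `<meta name="googlebot">`):

```python
import re

def has_noindex(html, headers):
    """Check for noindex via a meta robots tag or X-Robots-Tag header.

    Simplified sketch: assumes name= appears before content= in the tag.
    """
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        html, re.IGNORECASE)
    if meta and "noindex" in meta.group(1).lower():
        return True
    header = headers.get("X-Robots-Tag", "")
    return "noindex" in header.lower()
```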
Blocked by robots.txt
robots.txt disallows crawling, so Google can't fetch the page at all. Note: blocking doesn't remove already-indexed URLs; they may still show in results without a description. If you want a URL out of the index, allow crawling and use a noindex tag instead. See robots.txt in GSC.
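If you're unsure whether a robots.txt rule actually blocks a given URL, Python's standard library can answer locally, before you deploy anything. The rules and URLs below are examples, not a recommended configuration:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt rules -- substitute your own.
rules = """\
User-agent: *
Disallow: /search
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse rules locally, no fetch needed

# can_fetch() answers: may this user agent crawl this URL?
blocked = parser.can_fetch("Googlebot", "https://example.com/search?q=shoes")
allowed = parser.can_fetch("Googlebot", "https://example.com/pricing")
print(blocked, allowed)
```

Handy for testing a proposed robots.txt change against your real URL list before it goes live.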
Sitemap errors
Sitemap points to URLs that 404, redirect, or are non-canonical. Fix: regenerate the sitemap so it lists only canonical 200-OK URLs. See Sitemap errors.
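The first step of that regeneration check is mechanical: pull every `<loc>` out of the sitemap so each URL can be verified. A sketch using the standard library (after extraction you'd request each URL and flag anything that doesn't return 200 at its canonical address):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Extract the <loc> URLs from a sitemap document.

    Each extracted URL should then be checked for a 200 response
    and a self-referencing canonical before it stays in the sitemap.
    """
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]
```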
A worked example
A B2B SaaS site sees 1,800 URLs in "Discovered - currently not indexed." Drilling in, 1,500 of them are internal search-result URLs (/search?q=...) - those should be robots.txt-disallowed, not investigated. Of the remaining 300, 120 are real product pages launched last month. Fix: add those 120 pages to the main nav or an internal hub, regenerate the sitemap, submit it. Four weeks later 95 of the 120 have been crawled and indexed; the remaining 25 get manual review.
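For the 1,500 internal search-result URLs, the disallow is a one-line robots.txt rule. The path shown is illustrative - match it to your own site's search URL pattern:

```
User-agent: *
Disallow: /search
```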
When to call it
Not every non-indexed page is a bug. If you've made a page measurably better and Google still won't index it after 60 days, the market signal is "this page isn't distinct enough." Consolidate it into an existing indexed page instead of chasing the status.
Example outcome
On a mid-sized site with thousands of URLs, this flow typically reduces the "Discovered - currently not indexed" count by 30-60% within 90 days - mostly by disallowing junk URLs and improving internal links to real content. The pages that graduate into the index are the ones that drive the long-tail traffic.
Next playbook
You've finished the playbooks track. Loop back to the Week 1 checklist to turn this into a weekly habit, or browse the glossary for any term that still feels jargony.