Just In Time AI

Just In Time AI -- AI-SEO Research

How Did Just In Time AI Score 100 B2B SaaS Sites For AI Discoverability?

Just In Time AI built the 4-Layer Analyzer and curated a 100-site B2B SaaS dataset to measure exactly how visible companies are to AI agents like ChatGPT, Claude, and Perplexity. Every headline percentage in our AI-SEO content traces directly to this methodology. We publish it so you can reproduce the analysis, challenge the numbers, and apply the same measurement to your own site or dataset.

What Are the 4 Layers of AI-Discoverable Content?

AI agents use a different evaluation surface than traditional search engines. PageRank measures who links to you. AI agents measure whether your content directly answers the questions buyers are typing. The 4-Layer framework maps the four technical and content conditions a site must satisfy to appear in AI-agent citation results.

Layer 1 -- Indexable Surface

Can a crawler and a bot user-agent reach the full page content? This covers the full set of standard crawl-permission and indexing signals, plus whether the server returns full HTML to non-browser agents. A site that blocks bots or renders only via client-side JavaScript fails Layer 1 before any content is evaluated.

Scoring approach: HTTP checks against standard crawl-permission signals; page-source parse for indexing directives; server response type verification. No LLM calls at this layer.
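As a rough illustration of the machine checks Layer 1 involves, here is a minimal Python sketch. It assumes the robots.txt text, page HTML, and HTTP status have already been fetched with a bot user-agent; the `GPTBot` agent string and the 500-character substance threshold are illustrative placeholders, not the analyzer's actual values.

```python
import re
from urllib import robotparser

def layer1_checks(robots_txt: str, html: str, status: int,
                  agent: str = "GPTBot", min_text_chars: int = 500) -> dict:
    """Sketch of Layer 1's machine-checkable indexability signals."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    allowed = rp.can_fetch(agent, "/")       # crawl-permission signal
    noindex = "noindex" in html.lower()      # indexing directive in source
    # Client-side-rendered shells return little visible text to a plain fetch.
    visible_text = re.sub(r"<[^>]+>", " ", html)
    has_substance = len(visible_text.strip()) >= min_text_chars
    passed = status == 200 and allowed and not noindex and has_substance
    return {"allowed": allowed, "noindex": noindex,
            "has_substance": has_substance, "pass": passed}
```

A site blocked in robots.txt, returning a non-200 status, carrying a noindex directive, or serving a JavaScript skeleton shell fails before any content quality is judged, which matches the layer definition above.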

Layer 2 -- Citable Substance

Does the content answer specific questions in a form an AI agent can lift verbatim and attribute? This measures structured-data signals, headline structure, answer-first writing pattern, and a quotability score derived from an LLM pass. Sites that pass Layer 1 but bury answers behind preamble or write only in keyword-phrase format fail Layer 2.

Scoring approach: Structured-data signal detection, H2/H3 question-density parsing, listicle pattern heuristics, and an LLM-judged quotability score (0-100).
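The structural half of Layer 2 can be sketched in a few lines of Python. The question-word list and regexes below are illustrative heuristics, and the LLM-judged quotability pass is omitted entirely.

```python
import re

QUESTION_WORDS = {"how", "what", "why", "when", "where", "which", "who",
                  "can", "does", "do", "is", "are", "should"}

def layer2_signals(html: str) -> dict:
    """Structural Layer 2 signals; the LLM quotability score is omitted."""
    # Structured-data signal: any JSON-LD block present in the page.
    has_jsonld = bool(re.search(r"application/ld\+json", html, re.I))
    # Question density: share of H2/H3 subheadings phrased as questions.
    headings = [h.strip() for h in
                re.findall(r"<h[23][^>]*>(.*?)</h[23]>", html, re.I | re.S)]
    def is_question(h: str) -> bool:
        words = h.lower().split()
        return h.endswith("?") or (bool(words) and words[0] in QUESTION_WORDS)
    questions = [h for h in headings if is_question(h)]
    density = len(questions) / len(headings) if headings else 0.0
    return {"has_jsonld": has_jsonld,
            "heading_count": len(headings),
            "question_density": round(density, 2)}
```

A page with question-shaped subheadings and structured data scores high on these signals; a page of keyword-phrase headings scores near zero even before the quotability judgment runs.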

Layer 3 -- Authoritative Source

Do AI agents actually cite this site when asked relevant buyer questions? This is the direct citation-density check: we run real buyer-intent queries through AI agents (Claude with web search, Perplexity) and record whether the site's domain appears in the response with substantive context. A site can pass Layers 1-2 perfectly and still fail Layer 3 if it has no citation history in the AI-agent training and retrieval corpus.

Scoring approach: A curated set of vertical-specific buyer questions per site, queried via frontier LLMs with web-search capability. Citation rate = questions in which the site is cited / total questions asked.
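The citation-rate arithmetic is simple enough to show directly. In a real run the responses come from live AI-agent queries and the "substantive context" judgment is applied; the sketch below counts any domain mention in plain strings.

```python
def citation_rate(domain: str, responses: list[str]) -> float:
    """Layer 3 sketch: fraction of buyer-question responses that cite
    the domain. The substantive-context judgment used in real runs is
    omitted -- this counts any mention of the domain."""
    if not responses:
        return 0.0
    cited = sum(1 for r in responses if domain.lower() in r.lower())
    return cited / len(responses)
```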

Layer 4 -- Conversational Match

Do this site's headlines and H1/H2 structure match how buyers actually ask questions in 2026? Keyword-phrase headlines ("Best CRM Software for Small Business") score near-zero on this layer because AI agents match queries to question-shaped content, not keyword-shaped content. This layer measures the semantic distance between a site's actual headlines and the buyer questions active in its vertical.

Scoring approach: H1/H2/title/meta scrape; LLM-judged semantic similarity scoring per (headline, buyer-question) pair; mean of best-match scores across all questions in the vertical set.
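The "mean of best-match scores" aggregation can be sketched as follows. The real analyzer uses an LLM judge per (headline, buyer-question) pair; crude token overlap stands in for it here so the aggregation logic is visible.

```python
def conversational_match(headlines: list[str], questions: list[str]) -> float:
    """Layer 4 sketch: for each buyer question, take the best-matching
    headline's similarity, then average over all questions. Token
    overlap is a stand-in for the LLM similarity judge."""
    def similarity(a: str, b: str) -> float:
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb) if sa or sb else 0.0
    if not headlines or not questions:
        return 0.0
    best_per_question = [max(similarity(h, q) for h in headlines)
                         for q in questions]
    return sum(best_per_question) / len(best_per_question)
```

Because each question is scored against the site's single best headline, one well-matched question page lifts the score more than many keyword-phrase pages, which is exactly the behavior the layer definition describes.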

How Are the Layers Scored?

Each layer combines multiple signals weighted to balance objectivity (machine-checkable structural data) with judgment (LLM-evaluated content quality). Layers can be tuned independently as the AI-discovery surface evolves.

The analyzer is a Python pipeline. Each layer runs as an independent module with structured JSON output: scores, evidence captured, and a pass/fail verdict per layer.
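The per-layer structured output described above might look like the sketch below. The field names and schema are hypothetical, not the analyzer's actual output format.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class LayerResult:
    layer: int
    score: float                                  # 0-100
    passed: bool
    evidence: dict = field(default_factory=dict)  # raw signals captured

def site_report(domain: str, results: list[LayerResult]) -> str:
    """Assemble a per-site JSON record: one entry per layer plus an
    overall verdict (a site must pass every layer to count as
    AI-discoverable)."""
    return json.dumps({
        "domain": domain,
        "layers": [asdict(r) for r in results],
        "ai_discoverable": all(r.passed for r in results),
    }, indent=2)
```

Keeping each layer's output in a shared record shape is what lets the layers run as independent modules and be tuned independently as the discovery surface evolves.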

Layer 1 (Indexable Surface) -- evaluated against standard crawl-permission signals and server-response checks. Machine-checkable, no LLM calls. Most sites pass.

Layer 2 (Citable Substance) -- combines structural signals with an LLM-judged answerability score. Tests whether AI agents can lift specific answers from the page or find only marketing copy.

Layer 3 (Authoritative Source) -- direct citation-density check. We probe AI agents with vertical-specific buyer questions and record whether the site's domain is cited in substantive context. The hardest layer to pass; citation history compounds slowly.

Layer 4 (Conversational Match) -- semantic-distance scoring between the site's headlines and active buyer questions in its vertical. Keyword-phrase headlines score near zero; question-phrased headlines pass.

Specific scoring weights, pass thresholds, and the buyer-question library are part of the proprietary methodology Just In Time AI shares with paying clients. The framework above is sufficient to apply the diagnosis to your own site -- run the free scan at jitai.co/ai-seo to see your scores.

How Was the 100-Site B2B SaaS Sample Curated?

The dataset is representative, not statistically random. Sites were selected before running the analyzer -- outcome-blind -- based on stage and category criteria. No site was added or removed after seeing its score.

25 Public / Late-Stage Sites

$1B+ valuation, established market leaders: Salesforce, HubSpot, Slack, Figma, Zoom, Stripe, Notion, Okta, Datadog, ServiceNow, Workday, and similar. These are the "expected standard" for discoverability. Including them anchors the dataset: if even category leaders are failing certain layers, the problem is structural, not just a resource gap at smaller companies.

25 Growth-Stage Sites (Series C-D)

$100M-$1B, rapid-growth category leaders: Loom, Linear, Retool, Vercel, Pitch, Front, Ramp, Zapier, and similar. This tier is most representative of the buyers and users of the Just In Time AI scanner -- companies large enough to have a content investment but small enough that AI-discovery gaps are common.

25 Early-Stage Sites (Series A-B)

Under $100M, venture-backed, emerging categories. Heavy AI-native representation: Anthropic, Together AI, ElevenLabs, PostHog, Cal.com, Descript, LangChain, Weights & Biases. This tier tests the hypothesis that founder-led early-stage companies have the largest content debt -- they are shipping product faster than content.

25 Niche / Vertical SaaS Sites

Specialized B2B categories often overlooked in general SaaS analysis: field service (ServiceTitan, Jobber), legal (Clio, LawGeex), manufacturing (Plex, Acumatica), cybersecurity (CrowdStrike, Tenable, KnowBe4), and SEO tools (Ahrefs, SEMrush). These verticals represent Just In Time AI's direct addressable market and add industry diversity beyond the generic "tech SaaS" slice.

The full site list -- all 100 companies with stage and vertical labels -- is available on request. Individual scores are not published; see the FAQ below for why.

What Are the Pass Thresholds per Layer?

Each layer has a defined pass threshold calibrated against the 100-site sample. Thresholds are tuneable based on validation runs -- if a threshold produces clearly wrong pass/fail verdicts on hand-checked sites, it gets adjusted before scaling to the full 100. Specific threshold values are part of the proprietary methodology Just In Time AI shares with paying clients.

What you can observe from the dataset: a site must pass all four layers to be considered "AI-discoverable." A site that passes only Layer 1 is technically crawlable but not citable, not authoritative, and not conversationally matched -- it is invisible to AI agents in any practical sense.

What Are the Limitations and Caveats of This Methodology?

This section exists because the limitations are real and the methodology is more credible for naming them first. Critics who push on these points will find we already acknowledged them.

Sample Size and Representativeness

100 sites is a sample, not a census. The sample is representative of active B2B SaaS as of Q2 2026 but is not statistically random. It has a US/English-primary bias (all sites are English-language; most are US-founded). The findings should be understood as "this is what we observed in this sample" -- not as a universal law about all SaaS websites globally.

AI Agent Variability

Different AI agents return different citation sets for the same query. The primary Layer 3 signal uses one agent with web search; a site cited by a different agent may fail our Layer 3 test but pass an alternate-agent test. The citation-rate finding is real but model-dependent.

Layer 4 Semantic Scoring is Model-Dependent

The conversational match score for Layer 4 is produced by one LLM call per (headline, buyer-question) pair. The score reflects that model's judgment about semantic similarity. A different model or a different system prompt would produce different scores. We report the scores as produced; we do not claim they are objective ground truth.

JavaScript-Heavy Sites May Score Artificially Low on Layer 1

Sites that render full content only after JavaScript execution may show as partial HTML to the analyzer's bot user-agent fetch. We use a Playwright fallback for sites where the initial fetch returns a skeleton shell, but Playwright adds latency and some heavily bot-protected sites may still return less content than a logged-in browser session would see.

Sites with Paywalls or Aggressive Bot Blocking

Sites that require login to access substantial content, or that return Cloudflare or similar bot challenges to the analyzer's user-agent, will score low on Layer 1 even if their actual content is rich. We flag these cases in the evidence output rather than treating them as failures in the Layer 1 sense -- a paywalled site may be intentionally inaccessible to bots.

Scores Are a Point-in-Time Snapshot

The dataset reflects the state of each site as of the scan date (Q2 2026). Sites change -- content gets added, structured data gets added, citation history builds. Headline percentages are accurate for this snapshot; re-running the same sites in Q3 2026 may produce different numbers as the market responds to AEO guidance.

How Can I Reproduce This Analysis?

The methodology is open. The components you need to reproduce the analysis are:

  1. The 4-Layer framework -- the conceptual definitions for each layer are documented in the "What Are the 4 Layers?" section above. The framework is sufficient to build your own diagnostic for any site.
  2. The buyer-question sets per vertical -- a curated vertical-specific buyer-question library drives Layers 3 and 4. Available on request. Contact us at jitai.co to discuss your vertical.
  3. The 100-site sample frame -- full list with stage and vertical labels available on request. The selection criteria are documented on this page so you can independently curate a comparable sample.
  4. The analyzer implementation -- the Python pipeline is currently private. Implementation specifics (scoring weights, pass thresholds) are part of the proprietary methodology Just In Time AI shares with paying clients. Reach out via the discovery call link below to discuss applying them to your dataset.

If you run this methodology against your own dataset and find materially different results, we want to know. Counter-datasets make the methodology stronger. The contact for methodology questions is the discovery call at the bottom of this page.

Frequently Asked Questions About This Methodology

Are the headline percentages cherry-picked?

No. Headline percentages come from running the 4-Layer Analyzer against 100 B2B SaaS sites selected before the scan, not after. Sites were chosen by stage and category -- 25 public/late-stage, 25 growth, 25 early-stage, and 25 niche vertical -- so the sample reflects the actual market shape rather than the worst-case tail. The selection criteria are documented on this page. We report the numbers the data produces, not numbers chosen to fit a headline.

Why these 100 sites and not the top 100 by traffic?

A top-100-by-traffic list is dominated by public companies with large SEO teams, which would skew the pass rate upward and make the dataset less representative of the B2B SaaS market most founders and marketers operate in. The 25/25/25/25 stage distribution intentionally includes early-stage companies (Series A-B) and niche vertical SaaS -- the segments where AI-discoverability gaps are largest and where the findings are most actionable.

How does this differ from Ahrefs, SEMrush, or Profound?

Ahrefs and SEMrush measure PageRank-adjacent signals: backlinks, organic keyword rankings, domain authority. Those signals predict visibility in traditional search results. The 4-Layer Analyzer measures a different surface: whether AI agents (ChatGPT, Claude, Perplexity) actually cite a site when a buyer asks a question. Layer 3 (citation rate) and Layer 4 (conversational match) are not measured by any standard SEO tool because they require live AI-agent probing, not crawler data.

Why don't you publish individual site scores?

Three reasons. First, scores change quickly -- a site that scores poorly today may have shipped fixes by the time you read this, and publishing a snapshot as if it were permanent would mislead. Second, the value of the dataset is the aggregate pattern, not individual shaming. Third, individual brands can run the free scanner at jitai.co/ai-seo and see their own score without us publishing it on their behalf.

Will this methodology be open-sourced?

The framework, sample frame, and limitations are documented here. The implementation specifics -- analyzer code, scoring weights, pass thresholds -- are proprietary and currently private. Reach out at jitai.co if you want to discuss applying the methodology to your own dataset.

How often is the dataset refreshed?

Quarterly. Each refresh retires 5-10 sites that have been acquired, shut down, or pivoted out of B2B SaaS and replaces them with emerging AI-native and vertical SaaS companies. Annual full re-curation aligns the stage mix with current market conditions. The dataset version number and refresh date are noted in the methodology header so any claim traces to a specific snapshot.

Who Built This Methodology?

Dan Stolts has 30+ years in field engineering, including 14 years at Microsoft, where he became a frequent conference speaker and authored two Microsoft Press / Pearson books on architecting cloud solutions. He has consulted for Nasdaq, Fidelity Investments, Harvard, MIT, and many more. Founder of Just In Time AI and architect of JitNeuro -- the Directive Orchestration Execution (DOE) framework for AI-native operations. He designed the 4-Layer Analyzer and ran the 100-site B2B SaaS dataset analysis documented on this page. He is not an SEO influencer -- he is an engineer who measured a real signal and reported the data.

Run the Scan on Your Own Site

The free AI-SEO scanner at jitai.co/ai-seo runs Layers 1 and 2 against your site and returns a prioritized fix list in under 10 minutes. No signup required.