08 / AI Tool · Diagnostics Engine · 2026

Paste a page.
Get a verdict.
From two AIs, side by side.

Signal is a landing page auditor that scores clarity, conversion, SEO, accessibility, and trust from pasted copy, pasted or uploaded HTML, or a live URL. Two of those five scores come from static analysis of the actual markup — no AI, no drift, the same input always gives the same number. The other three come from an AI's qualitative judgement, and the tool lets you run that judgement through Gemini, Groq, or both side by side — turning a landing page audit into a live comparison of how two different models read the same page.

Next.js
TypeScript
CSS Modules
Gemini 2.5 / 3.x
Groq · Llama 3.3

View live project

Fig. 01: Signal in compare mode, with Gemini and Groq auditing the same live URL side by side.

Not Every Score Should Come From the AI

If you're using AI to compare AI, the score has to mean something stable first.

The starting brief was simple: score a landing page on clarity, conversion, SEO, accessibility, and trust. The obvious approach is to hand all five to an LLM and let it judge. The problem surfaces the moment you add a second feature — letting people compare Gemini against Groq on the same page. If every score is a language model's opinion, then SEO and accessibility scores will drift between runs of the identical page, for reasons that have nothing to do with which model is actually better. You'd be comparing noise dressed up as signal.

The fix was to stop treating all five categories as the same kind of question. SEO and accessibility are checkable facts — is there a meta description, does every image have alt text, is there exactly one H1, do form fields have labels. Those are computed by a deterministic parser against the page's actual markup, with zero AI involvement. The same HTML produces the same score every time, with a plain-language reason attached to every point lost. Clarity, conversion, and trust are not checkable facts — they're judgement calls about whether the writing actually works on a human, which is exactly the kind of question worth asking two different models and comparing.
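
One way to picture the deterministic half is a pure function from structural facts to a score plus a list of reasons. The sketch below is illustrative only: the `PageFacts` fields, the point values, and the name `scoreSeoAndAccessibility` are assumptions made for the example, not Signal's actual checks or weights.

```typescript
// Hypothetical structural facts; Signal's real ExtractedPage shape may differ.
interface PageFacts {
  title: string | null;
  metaDescription: string | null;
  h1Count: number;
  images: { src: string; alt: string | null }[]; // alt === null means the attribute is absent
  unlabeledFieldCount: number;
}

interface Deduction {
  points: number;
  reason: string;
}

interface StaticScore {
  score: number; // 0-100
  deductions: Deduction[];
}

// Pure function: no network, no model, no randomness, so identical
// input always yields an identical score. Point values are illustrative.
function scoreSeoAndAccessibility(facts: PageFacts): StaticScore {
  const deductions: Deduction[] = [];

  if (!facts.title) {
    deductions.push({ points: 20, reason: "The page has no title tag." });
  }
  if (!facts.metaDescription) {
    deductions.push({ points: 15, reason: "The page has no meta description." });
  }
  if (facts.h1Count !== 1) {
    deductions.push({
      points: 15,
      reason: `Expected exactly one H1, found ${facts.h1Count}.`,
    });
  }

  // An empty alt ("") is valid for decorative images, so only a missing attribute counts.
  const missingAlt = facts.images.filter((img) => img.alt === null).length;
  if (missingAlt > 0) {
    deductions.push({
      points: Math.min(25, missingAlt * 5),
      reason: `${missingAlt} image(s) have no alt attribute.`,
    });
  }

  if (facts.unlabeledFieldCount > 0) {
    deductions.push({
      points: Math.min(25, facts.unlabeledFieldCount * 5),
      reason: `${facts.unlabeledFieldCount} form field(s) have no label.`,
    });
  }

  const lost = deductions.reduce((sum, d) => sum + d.points, 0);
  return { score: Math.max(0, 100 - lost), deductions };
}
```

Because every lost point is pushed alongside its reason, the explanation cannot drift out of sync with the number, which is the property the comparison feature depends on.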

That split turned out to do more than fix the comparison feature. It made the AI's job narrower and better-defined — it's never asked to guess whether a meta description exists, only to read the words and decide if they land. And it made the comparison fairer: when Gemini and Groq disagree, the disagreement is interesting, because it's about something only a model can judge, not a fact a parser could have settled in milliseconds.


Technical Stack

Two providers, one contract, and an input layer built to take anything thrown at it.

Input Layer: Four entry points, one normalized shape
Pasted copy, pasted HTML, an uploaded .html file, and a live URL all pass through a single extractor that produces one consistent ExtractedPage shape: visible text plus a structural fact sheet (headings, images, links, forms). Everything downstream, static analysis and AI prompts alike, reads from that shape and never needs to know where the content came from. Live URLs get their own path with SSRF guards: private/loopback/link-local addresses are blocked, redirects are re-validated, and non-HTML responses are rejected before any content is read.

Static Analysis: Deterministic SEO + accessibility scoring
A jsdom-based parser checks real things (title tag presence and length, meta description, H1 count, heading hierarchy, alt text coverage, generic link text, unlabeled form fields) and produces a 0–100 score with a plain-English reason for every point deducted. No network call, no model, no variance between runs.

AI Providers: Gemini (model-chain fallback) + Groq
Gemini calls walk a chain of models (2.5 Flash → 2.5 Flash-Lite → 3.1 Flash-Lite → 3.5 Flash), retrying on capacity errors with exponential backoff and silently advancing to the next model if one is deprecated or unavailable. Groq runs a single call against Llama 3.3 70B. Both return through the same validator and the same JSON contract, so the UI never needs to know which provider answered.

Run Modes: Single provider or live compare
Single mode calls one provider and returns one report. Compare mode calls both concurrently via Promise.allSettled: if one provider fails, the other's full result still renders, with the failure isolated to its own panel rather than sinking the whole request.

Type System: TypeScript + a shared audit contract
A single types/audit.ts defines the shape every layer agrees to: ExtractedPage, StaticAnalysisResult, AiAnalysisResult, and the combined AuditReport. The AI's JSON response is validated against this shape before anything renders, so a malformed response from either provider is caught and reported, never silently passed through to the UI.
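
To make the compare-mode and validation rows concrete, here is a minimal sketch of how two providers could be run through `Promise.allSettled` and one shared validator. It is a sketch under stated assumptions, not the project's code: the three-field `AiAnalysisResult`, the `Provider` signature, and the names `validateAiResult` and `runCompare` are all hypothetical, since the real types/audit.ts is not shown here.

```typescript
// Hypothetical slice of the AI contract; the real AiAnalysisResult likely carries more.
interface AiAnalysisResult {
  clarity: number;
  conversion: number;
  trust: number;
}

type ProviderName = "gemini" | "groq";

// A provider returns untrusted, already-parsed JSON.
type Provider = (pageText: string) => Promise<unknown>;

type PanelResult =
  | { provider: ProviderName; ok: true; result: AiAnalysisResult }
  | { provider: ProviderName; ok: false; error: string };

// Validate model output against the contract before it reaches the UI.
function validateAiResult(raw: unknown): AiAnalysisResult {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("AI response is not an object");
  }
  const record = raw as Record<string, unknown>;
  const read = (key: string): number => {
    const value = record[key];
    if (typeof value !== "number" || !Number.isFinite(value) || value < 0 || value > 100) {
      throw new Error(`AI response field "${key}" is not a 0-100 number`);
    }
    return value;
  };
  return { clarity: read("clarity"), conversion: read("conversion"), trust: read("trust") };
}

// Run both providers concurrently; a failure stays inside its own panel.
async function runCompare(
  pageText: string,
  providers: Record<ProviderName, Provider>
): Promise<PanelResult[]> {
  const names: ProviderName[] = ["gemini", "groq"];
  const settled = await Promise.allSettled(
    names.map(async (name) => validateAiResult(await providers[name](pageText)))
  );
  return settled.map((outcome, i): PanelResult =>
    outcome.status === "fulfilled"
      ? { provider: names[i], ok: true, result: outcome.value }
      : {
          provider: names[i],
          ok: false,
          error: outcome.reason instanceof Error ? outcome.reason.message : String(outcome.reason),
        }
  );
}
```

Running validation inside each provider's promise means a malformed response and a network failure end up in the same place: a rejected entry that becomes an error panel for that provider only.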

A Model Got Deprecated Mid-Build

The most useful bug in this project was one Google caused, not one I wrote.

While I was live-testing the tool against real sites, a compare-mode run came back with Groq's full result rendered cleanly and a red error panel where Gemini's should have been: "models/gemini-1.5-flash-8b is not found for API version v1beta, or is not supported for generateContent." Google had deprecated a model sitting in the middle of the fallback chain. The chain's error-classification logic only recognised capacity errors and malformed JSON as "try the next model"; a 404 for a model that simply no longer exists fell through every check and was thrown as fatal.

The actual fix was less about updating the model list — Google's lineup changes on its own schedule, and the list will be stale again — and more about making that inevitability survivable. A model-not-found response, whether by status code or by the specific text Google returns, now gets its own error type and is treated exactly like a capacity error: skip silently, try the next model in the chain, and only surface a failure if every model in the chain is actually gone.

The fix was verified the same way the bug was found — not by reasoning about it, but by running it. A model that 404s mid-chain now produces a clean fallback to the next model with no user-visible failure; a model that succeeds outright still skips the fallback machinery entirely; and a genuinely unexpected error — one that has nothing to do with availability — still fails loudly instead of being silently absorbed.

// gemini.ts — a 404'd model is a reason to fall back, not to fail

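// `response`, `errorMessage`, and `modelName` are assumed to be in scope
// from the surrounding per-model request.
// Match on the status code or on the text Google returns for a retired model.
const isModelNotFound =
  response.status === 404 ||
  errorMessage.includes('is not found for API version') ||
  errorMessage.includes('not supported for generateContent')

// A dedicated error type lets the chain skip to the next model instead of failing.
if (isModelNotFound) {
  throw new GeminiModelNotFoundError(
    `MODEL_NOT_FOUND:${modelName}:${errorMessage}`
  )
}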
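
The excerpt above only throws the typed error; something further up has to catch it and move on. A minimal sketch of what that loop might look like follows. Everything here except the name `GeminiModelNotFoundError` is an assumption for illustration: `GeminiCapacityError`, `callWithFallback`, the injected `callModel` function, and the retry defaults are hypothetical, and the real chain may be structured differently.

```typescript
// Only GeminiModelNotFoundError is named in the excerpt; the rest is hypothetical.
class GeminiModelNotFoundError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
  }
}

class GeminiCapacityError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

type CallModel = (modelName: string, prompt: string) => Promise<string>;

interface ChainOptions {
  maxRetriesPerModel?: number;
  baseDelayMs?: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function callWithFallback(
  models: string[],
  prompt: string,
  callModel: CallModel,
  options: ChainOptions = {}
): Promise<{ model: string; text: string }> {
  const { maxRetriesPerModel = 2, baseDelayMs = 500, sleep = defaultSleep } = options;
  const skipped: string[] = [];

  for (const model of models) {
    for (let attempt = 0; attempt <= maxRetriesPerModel; attempt++) {
      try {
        // Success returns immediately, so the fallback machinery never runs.
        return { model, text: await callModel(model, prompt) };
      } catch (error) {
        if (error instanceof GeminiModelNotFoundError) {
          break; // the model is gone: retrying cannot help, move to the next one
        }
        if (error instanceof GeminiCapacityError) {
          if (attempt < maxRetriesPerModel) {
            await sleep(baseDelayMs * 2 ** attempt); // exponential backoff
            continue;
          }
          break; // retries exhausted, move to the next model
        }
        throw error; // unrelated to availability: fail loudly
      }
    }
    skipped.push(model);
  }

  throw new Error(`All models in the chain failed: ${skipped.join(", ")}`);
}
```

Passing `callModel` and `sleep` in as parameters keeps the loop testable without a network: a fake that throws for one model name is enough to exercise the three behaviours described earlier (clean fallback, untouched success path, loud failure on unexpected errors).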

Deterministic scoring core · dual-provider compare mode · graceful model-chain fallback · SSRF-guarded URL intake
