# Inside the AIO score: why subtractive penalties beat weighted pillars
> The AIO score starts at 100 and subtracts. We chose this over a weighted-average formula because every point loss is traceable to a named structural cause — and traceability matters more than mathematical elegance.

**Source:** /blog/inside-the-aio-score
**Extracted:** 2026-06-24T17:42:54.795Z

---
When we first sketched the AIO score, the natural formula was a weighted average: take three pillars (signal-to-noise ratio, semantic richness, content-negotiation health), assign weights that add to one, multiply, sum, done. It looked clean on a whiteboard. It scored well across a sample of 226 sites. The draft PRD even shipped with weights pre-allocated: `0.4 × React_Tax + 0.35 × Semantic + 0.25 × Negotiation`.

We threw it out. The engine that's shipping today — and that has been running on every Onto scan since launch — is a subtractive penalty model. Every URL starts at 100. Each detected issue subtracts a specific, named number of points. The final score is plain arithmetic. No opaque weights, no hidden coefficients. This post explains why.

### The traceability problem

Weighted averages have a subtle failure mode: the user can't invert them. If a page scores 62/100 under `0.4×A + 0.35×B + 0.25×C`, the breakdown isn't obvious. Is it bad because A is 40, or because B is 30, or because C is 70 but its weight is too low to lift the total? Many different pillar combinations produce the same 62, so the formula itself hides the cause. Even if you publish the weights, every score becomes a small inverse problem to interpret.
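To make the ambiguity concrete, here is a minimal sketch that applies the draft PRD weights to three made-up pillar profiles. The `Pillars` type, the `weightedScore` helper, and every pillar value are illustrative assumptions, not engine code; rounding to a whole number is also an assumption.

```typescript
// Hypothetical pillar scores, each on a 0-100 scale.
interface Pillars {
  reactTax: number;
  semantic: number;
  negotiation: number;
}

// Weights taken from the draft PRD formula quoted earlier in the post.
const WEIGHTS: Pillars = { reactTax: 0.4, semantic: 0.35, negotiation: 0.25 };

// Weighted average, rounded to a whole score (rounding is an assumption).
function weightedScore(p: Pillars): number {
  const raw =
    WEIGHTS.reactTax * p.reactTax +
    WEIGHTS.semantic * p.semantic +
    WEIGHTS.negotiation * p.negotiation;
  return Math.round(raw);
}

// Three structurally different pages (made-up numbers).
const profiles: Pillars[] = [
  { reactTax: 40, semantic: 80, negotiation: 72 }, // weak on React Tax
  { reactTax: 90, semantic: 30, negotiation: 62 }, // weak on semantics
  { reactTax: 62, semantic: 62, negotiation: 62 }, // uniformly mediocre
];

const scores = profiles.map(weightedScore);
console.log(scores); // all three come out as 62
```

The three pages need different fixes, yet the single number cannot tell them apart.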

The subtractive model has the opposite property. A page that scored 62/100 doesn't need explanation by formula — it needs explanation by line item. Every penalty in the engine is a named, atomic deduction:

```
score = 100
      - 25  // No Markdown layer detected
      - 10  // React Tax 15–30%
      -  3  // 5 images missing alt
      = 62
```

That breakdown is what the API returns. `penalties[]` is a string array, one entry per deduction, traceable to the exact structural cause. Your agent can route on it. Your dev tooling can surface it. A user looking at the dashboard can read why their site scored what it scored without learning a formula.
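The arithmetic can be sketched in a few lines. This is a hedged illustration, not the code in `aio.ts`: the `Deduction` and `ScoreResult` types, the `applyPenalties` helper, the string format of each penalty entry, and the floor at 0 are all assumptions.

```typescript
interface Deduction {
  points: number; // positive number of points to subtract
  reason: string; // named structural cause
}

interface ScoreResult {
  score: number;
  penalties: string[]; // one entry per deduction
}

// Start at 100 and subtract every deduction that fired.
function applyPenalties(deductions: Deduction[]): ScoreResult {
  const total = deductions.reduce((sum, d) => sum + d.points, 0);
  return {
    score: Math.max(0, 100 - total), // floor at 0 is an assumption
    penalties: deductions.map((d) => `-${d.points}: ${d.reason}`), // format is an assumption
  };
}

const result = applyPenalties([
  { points: 25, reason: "No Markdown layer detected" },
  { points: 10, reason: "React Tax 15–30%" },
  { points: 3, reason: "5 images missing alt" },
]);

console.log(result.score); // 62
console.log(result.penalties);
```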

### The math doesn't actually matter

Here's the uncomfortable truth: the scoring engine's specific penalty values aren't principled. They're empirical. We chose −25 for "no Markdown layer" and −20 for "no JSON-LD" because, after running the engine against hundreds of sites and looking at the resulting distribution, those values produced grade bands that matched human intuition about which sites were actually agent-ready.

A weighted average pretends to be principled — the weights look like coefficients in a regression. They aren't. They're also hand-tuned. The subtractive model is honest about being hand-tuned, and that honesty is what makes it useful.

If we ever discover that "no JSON-LD" should cost −15 instead of −20 because some downstream task gets better results with the new calibration, we can change one number in `aio.ts`, redeploy, and every existing score page on the dashboard still makes sense. The penalty was already a line item, not a weight inside a formula. The shape of the explanation doesn't change.
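A small sketch of that recalibration, under stated assumptions: the `PENALTIES` table, its keys, and the `scoreWith` helper are hypothetical, and only the two point values quoted in this post are used.

```typescript
// Hypothetical penalty table; keys and structure are assumptions.
type PenaltyId = "no_markdown_layer" | "no_json_ld";
type PenaltyTable = Record<PenaltyId, { points: number; label: string }>;

const PENALTIES: PenaltyTable = {
  no_markdown_layer: { points: 25, label: "No Markdown layer detected" },
  no_json_ld: { points: 20, label: "No JSON-LD" },
};

// Score a page given the table in force and the penalties that fired.
function scoreWith(table: PenaltyTable, fired: PenaltyId[]): { score: number; labels: string[] } {
  const total = fired.reduce((sum, id) => sum + table[id].points, 0);
  return { score: 100 - total, labels: fired.map((id) => table[id].label) };
}

// Recalibrate one value; nothing else in the table changes.
const recalibrated: PenaltyTable = {
  ...PENALTIES,
  no_json_ld: { ...PENALTIES.no_json_ld, points: 15 },
};

const fired: PenaltyId[] = ["no_markdown_layer", "no_json_ld"];
const before = scoreWith(PENALTIES, fired);
const after = scoreWith(recalibrated, fired);

console.log(before.score, after.score); // 55 60
console.log(before.labels.join("|") === after.labels.join("|")); // true
```

The score moves, but the list of named causes a reader sees is identical before and after.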

### Five grades and their typical hallucination risk

The 0–100 score maps to five grade bands via `scoreToGrade()`:

*   **90–100 Excellent** — agent-ready, hallucination risk low
*   **75–89 Good** — minor penalties, hallucination risk low
*   **50–74 Needs work** — multiple penalties, medium risk
*   **25–49 AI-hostile** — heavy React Tax + structural gaps, high risk
*   **0–24 Invisible** — robots block or compound failures, high risk
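The bands above amount to a threshold ladder. The sketch below uses the boundaries from the list; the function name `scoreToGradeSketch` and the `Grade` type are illustrative and are not the SDK's actual `scoreToGrade()` implementation.

```typescript
type Grade = "Excellent" | "Good" | "Needs work" | "AI-hostile" | "Invisible";

// Map a 0-100 score to a grade band, checking the highest threshold first.
function scoreToGradeSketch(score: number): Grade {
  if (score >= 90) return "Excellent";
  if (score >= 75) return "Good";
  if (score >= 50) return "Needs work";
  if (score >= 25) return "AI-hostile";
  return "Invisible";
}

console.log(scoreToGradeSketch(62)); // "Needs work"
```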

Hallucination risk is a separate calculation, `getHallucinationRisk(score, insights)`, that considers not just the score but which structural signals passed or failed. A page can have a low score from React Tax alone but still be rated "low" risk if it ships JSON-LD and clean semantic hierarchy. The score and the risk band are correlated but not derivable from each other. Both are useful.

### What fires which penalty

The full penalty table is small, by design. Eight signals, one fatal short-circuit (robots disallow), and a handful of graded tiers (React Tax has two thresholds, image alt-text accumulates by batch). The complete list lives in [/scoring](/scoring) and in the SDK source at `onto-sdk/packages/core/src/score/aio.ts` — same code, same numbers, whether you call the API, install the SDK, or scan a competitor.

The penalties are deliberately blunt instruments. Each one corresponds to a structural property an agent can detect at scan time — not to subjective "quality." Did the page ship JSON-LD? Yes or no. Did the heading hierarchy have at least two levels? Yes or no. The score is a fingerprint of structure, not a judgment about content.
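As a rough illustration of what a yes-or-no structural check looks like, the sketch below tests raw HTML for the two properties just mentioned. It is an approximation under stated assumptions: the helper names are hypothetical, and regex matching stands in for whatever parsing the real scanner does.

```typescript
// Does the HTML contain at least one JSON-LD script tag?
function hasJsonLd(html: string): boolean {
  return /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>/i.test(html);
}

// Does the page use at least two distinct heading levels (h1-h6)?
function hasHeadingHierarchy(html: string): boolean {
  const levels = new Set<string>();
  const re = /<h([1-6])\b/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const level = m[1];
    if (level !== undefined) levels.add(level);
  }
  return levels.size >= 2;
}

const structuredPage =
  '<html><head><script type="application/ld+json">{"@type":"Article"}</script></head>' +
  "<body><h1>Title</h1><h2>Section</h2></body></html>";
const barePage = '<html><body><div id="root"></div></body></html>';

console.log(hasJsonLd(structuredPage), hasHeadingHierarchy(structuredPage)); // true true
console.log(hasJsonLd(barePage), hasHeadingHierarchy(barePage)); // false false
```

Neither check looks at what the content says, only at whether the structure is present.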

### What we got wrong

The first version of the docs described the engine as a weighted average — copy-pasted from the original PRD draft. That stayed in the docs for weeks after the actual engine had moved to subtractive penalties. The result was confusing for anyone reading the docs and then poking at the API: the math didn't match. We fixed it. (If you spot another inconsistency between the docs and the engine, tell us — the engine is canonical, and the docs should match it, full stop.)

We also initially exposed the score and the per-penalty list but not the insights object. People wanted to branch on specific structural facts ("has JSON-LD?", "robots allowed?") without re-parsing the penalty strings. That's now in `insights` on every scoring response.
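A hedged sketch of branching on `insights` instead of parsing penalty strings. The response shape below is an assumption for illustration: `hasJsonLd` and `robotsAllowed` are hypothetical field names, as is the `nextAction` helper; check the API reference for the real schema.

```typescript
// Assumed response shape; the insights field names are hypothetical.
interface ScoreResponse {
  score: number;
  penalties: string[];
  insights: {
    hasJsonLd: boolean;
    robotsAllowed: boolean;
  };
}

// Decide what to fix first from structural facts, without string parsing.
function nextAction(res: ScoreResponse): "fix-robots" | "add-json-ld" | "ok" {
  if (!res.insights.robotsAllowed) return "fix-robots";
  if (!res.insights.hasJsonLd) return "add-json-ld";
  return "ok";
}

const sample: ScoreResponse = {
  score: 80,
  penalties: ["-20: No JSON-LD"], // string format is an assumption
  insights: { hasJsonLd: false, robotsAllowed: true },
};

console.log(nextAction(sample)); // "add-json-ld"
```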

### What we'd change next

The scoring model is stable enough that the next interesting work is around it, not in it. Specifically: per-tier confidence ranges (one URL scoring 62/100 means something different from ten URLs averaging 62), score drift over time, and the relationship between score and downstream LLM accuracy on concrete tasks (not just "does the model hallucinate less" but "does retrieval-augmented Q&A get more factual answers").

We'll write about each of those once we have data worth showing. In the meantime: drop any URL into the [scanner](/scanner) and see the full breakdown. Or hit `/v1/score` from your terminal — same engine, same answer.