# Onto has no AI inside, and that's the whole point
> Firecrawl's JSON mode and context.dev's aiQuery run an LLM in their own pipeline — and bill you for it. Onto runs none. Putting a model inside a data layer is self-defeating; here's the economics, and how our new batch and extract endpoints stay true to it.

**Source:** /blog/no-ai-inside
**Extracted:** 2026-06-24T17:42:54.789Z

---
A developer asked us a sharp question last week: _“If you're using AI behind the scenes to give me clean data, what's the point of you?”_

It's the right question, and the answer is the most important design decision we've made: **there is no AI inside Onto.** No model cleans your HTML. No model writes your score. No model decides what a page “means.” And that isn't a limitation we're apologizing for — it's the entire reason the product is worth using.

### The trap: paying for the thing you're trying to avoid

Here's the job Onto does. An agent without us fetches a page — call it 600 KB of HTML — and feeds the whole thing to its own frontier model just to _read_ it. That's roughly 150,000 tokens spent on nav bars, utility classes, and analytics scripts to find the few kilobytes that matter. Onto hands the agent the clean version instead: often 10× smaller, sometimes far more.

Now look at what happens if Onto used an LLM to produce that clean version. We'd be running a model over the same 600 KB — paying frontier-model prices — and then charging you for it. You'd be paying for an LLM to save yourself from paying for an LLM. The economics don't just fail to improve; they invert. A data layer with a model inside is a more expensive version of the problem it claims to solve.
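To make the inversion concrete, here is a back-of-the-envelope sketch in Python. The 600 KB page and the 10× reduction come from the text above; the 4-bytes-per-token ratio is the rule of thumb implied by "600 KB is roughly 150,000 tokens", and the price per million tokens is a purely hypothetical placeholder, not any vendor's real rate. Output-token costs are ignored to keep the arithmetic simple.

```python
BYTES_PER_TOKEN = 4          # rough rule of thumb implied by 600 KB ~ 150,000 tokens
RAW_HTML_BYTES = 600_000     # the example page from the text
REDUCTION_FACTOR = 10        # "often 10x smaller"
USD_PER_MILLION_TOKENS = 3.0 # hypothetical input price, for illustration only


def estimate_tokens(size_bytes: int) -> int:
    """Approximate token count from payload size."""
    return size_bytes // BYTES_PER_TOKEN


def inference_cost(tokens: int, usd_per_million: float = USD_PER_MILLION_TOKENS) -> float:
    """Input-token cost of running a model over `tokens` tokens."""
    return tokens / 1_000_000 * usd_per_million


raw_tokens = estimate_tokens(RAW_HTML_BYTES)
clean_tokens = raw_tokens // REDUCTION_FACTOR

# 1. The agent's own model reads the raw HTML.
agent_reads_raw = inference_cost(raw_tokens)

# 2. A data layer runs a model over the raw HTML (and passes that cost on),
#    then the agent's model reads the cleaned output.
model_inside_layer = inference_cost(raw_tokens) + inference_cost(clean_tokens)

# 3. A deterministic layer cleans the page (treated as ~free here),
#    then the agent's model reads the cleaned output.
deterministic_layer = inference_cost(clean_tokens)

for label, cost in [
    ("agent reads raw HTML", agent_reads_raw),
    ("model inside the data layer", model_inside_layer),
    ("deterministic data layer", deterministic_layer),
]:
    print(f"{label}: ${cost:.3f} per page")
```

Whatever price you plug in, the ordering stays the same: the model-backed layer costs more than doing nothing at all, because someone still pays for a model to read the full page.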

This is the line between Onto and tools like Firecrawl's JSON extraction or context.dev's `aiQuery`. Those run a model in their pipeline — genuinely useful, but you pay for that inference whether you see the line item or not. We don't run one, so there's nothing to pass on.

### What Onto actually is

The cleaning step is deterministic parsing: strip the chrome, walk the DOM, convert the semantic content to Markdown. It runs in tens of milliseconds and costs effectively nothing. The [AIO score](/scoring) is a rule engine, not a judge: it starts at 100 and subtracts named penalties for missing [headings](/scoring), no content negotiation, missing JSON-LD, or a bloated payload. Same URL in, same score out, every time. You can cache it, audit it, and reason about it, because nothing in the path is probabilistic.
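A rule engine of this shape is small enough to sketch. The version below is illustrative only: the four penalty names mirror the ones listed above, but the point values, the bloat threshold, and the `PageFacts` structure are assumptions made up for the example, not Onto's actual scoring rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageFacts:
    """Observable facts about a page (hypothetical structure)."""
    has_headings: bool
    supports_content_negotiation: bool
    has_json_ld: bool
    payload_bytes: int


BLOAT_THRESHOLD_BYTES = 500_000  # assumed threshold, for illustration

# (penalty name, points subtracted, predicate). Point values are invented.
RULES = (
    ("no_headings", 15, lambda p: not p.has_headings),
    ("no_content_negotiation", 10, lambda p: not p.supports_content_negotiation),
    ("json_ld_missing", 10, lambda p: not p.has_json_ld),
    ("payload_bloated", 20, lambda p: p.payload_bytes > BLOAT_THRESHOLD_BYTES),
)


def score(page: PageFacts) -> tuple[int, list[str]]:
    """Start at 100, subtract every penalty that applies, and name each one."""
    total = 100
    applied = []
    for name, points, applies in RULES:
        if applies(page):
            total -= points
            applied.append(name)
    return max(total, 0), applied


page = PageFacts(
    has_headings=True,
    supports_content_negotiation=False,
    has_json_ld=False,
    payload_bytes=600_000,
)
result = score(page)
print(result)
```

Because the score is a pure function of the page facts, the returned penalty list doubles as an audit trail: every point lost has a name attached.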

That's the property an LLM can never give you: a model that cleans a page is a model that can _hallucinate_ a page. Determinism is the feature.
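For a feel of what "strip the chrome, walk the DOM, convert to Markdown" can look like, here is a deliberately tiny sketch using only Python's standard-library `html.parser`. It is not Onto's parser: the set of skipped tags and the handful of supported block elements are assumptions chosen to keep the example short, and a production cleaner would handle far more cases.

```python
from html.parser import HTMLParser

# Tags treated as page chrome or non-content (an assumed, minimal set).
SKIPPED = {"script", "style", "nav", "footer", "aside", "noscript"}
HEADINGS = {f"h{n}": "#" * n for n in range(1, 7)}
BLOCKS = set(HEADINGS) | {"p", "li"}


class MarkdownExtractor(HTMLParser):
    """Collects text from headings, paragraphs, and list items only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0       # > 0 while inside a skipped subtree
        self.current_tag = None   # block element currently being read
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCKS:
            self.current_tag = tag
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(self.skip_depth - 1, 0)
        elif self.skip_depth == 0 and tag == self.current_tag:
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            if text:
                prefix = HEADINGS.get(tag, "-" if tag == "li" else "")
                self.blocks.append(f"{prefix} {text}".strip())
            self.current_tag = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.current_tag is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


SAMPLE_HTML = """<html><head><style>.a{color:red}</style><script>track()</script></head>
<body><nav><a href="/">Home</a></nav>
<h1>Pricing</h1><p>Plans start at <em>ten</em> dollars.</p>
<ul><li>Fast</li><li>Cheap</li></ul>
<footer><p>Copyright</p></footer></body></html>"""

markdown = html_to_markdown(SAMPLE_HTML)
print(markdown)
```

There is no sampling and no prompt anywhere in that path, so feeding the same HTML twice yields byte-identical Markdown. The parser can drop content it doesn't recognize, but it cannot add content the page never contained.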

### The new endpoints stay on the right side of that line

We just shipped three more primitives, and every one of them is the same deterministic fetch-and-process — no model, no third-party service:

`POST /v1/batch` takes a list of URLs, or just a base URL whose pages it discovers from the sitemap, and reads, scores, or extracts every one in a single billable call. Pull fifty pages of a site without spending fifty credits. `POST /v1/map` returns a site's URLs so you can plan a crawl. `POST /v1/extract` returns the structured data a page _already declares_ — its JSON-LD, OpenGraph, and meta tags.
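Discovering pages from a sitemap is itself a plain parsing job. The sketch below shows the general idea using Python's standard library and the public sitemap XML format; it is an illustration of the technique, not the code behind `/v1/batch` or `/v1/map`, and the function name and `limit` parameter are hypothetical. It handles a simple `<urlset>` document and does not follow sitemap index files.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Namespace defined by the sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str, limit: Optional[int] = None) -> List[str]:
    """Return the unique <loc> URLs of a <urlset> sitemap, in document order."""
    root = ET.fromstring(xml_text)
    urls = []
    seen = set()
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls if limit is None else urls[:limit]


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/</loc></url>
</urlset>"""

discovered = urls_from_sitemap(SAMPLE_SITEMAP)
print(discovered)
```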

That last one is the tell. A model-backed “extract” would happily invent a `price` field for a page that never stated one. Ours returns what the page published and nothing else. If the data isn't there, we don't make it up. We hand your model the clean Markdown and let _it_ reason, on your terms, on what is often a tenth of the tokens.
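The difference is easy to see in code. Below is a minimal sketch of "return only what the page declares", built on Python's standard-library `html.parser` and `json`. The class name, function name, and output shape are assumptions for illustration and say nothing about the real `/v1/extract` response format.

```python
import json
from html.parser import HTMLParser


class DeclaredDataParser(HTMLParser):
    """Collects JSON-LD blocks, OpenGraph properties, and named meta tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.json_ld = []
        self.open_graph = {}
        self.meta = {}
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and a.get("content") is not None:
            if (a.get("property") or "").startswith("og:"):
                self.open_graph[a["property"]] = a["content"]
            elif a.get("name"):
                self.meta[a["name"]] = a["content"]

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it rather than guess at its contents

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)


def extract_declared(html: str) -> dict:
    parser = DeclaredDataParser()
    parser.feed(html)
    parser.close()
    return {
        "json_ld": parser.json_ld,
        "open_graph": parser.open_graph,
        "meta": parser.meta,
    }


PAGE = """<html><head>
<meta name="description" content="A sturdy desk lamp.">
<meta property="og:title" content="Desk Lamp">
<script type="application/ld+json">{"@type": "Product", "name": "Desk Lamp"}</script>
</head><body><p>Call for pricing.</p></body></html>"""

declared = extract_declared(PAGE)
print(declared)
```

The sample page says "Call for pricing" and declares no price anywhere in its metadata, so no `price` key appears in the result. A parser has nowhere to get one from.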

### The intelligence lives in the caller

The cleanest way to think about Onto: it's a dumb primitive, on purpose. Either the thing calling it is a deterministic pipeline — in which case it already knows which URLs it wants — or it's an AI agent, in which case it does its own reasoning. Neither one needs Onto to be smart. Both need it to be fast, cheap, and predictable. So Onto provides; it doesn't decide.

That principle is also why we said no to some obvious-looking features. A search endpoint would need a third-party index with its own rate limits and bill — and deciding _what_ to fetch is the caller's job anyway. A JavaScript-rendering mode would need a third-party headless-browser farm. LLM extraction would need a paid model. Each one would have made Onto depend on someone else's meter. We'd rather ship fewer endpoints that we fully control and can scale natively than a longer feature list propped up on services we don't own.

### Why this is the durable bet

Models will keep getting smarter and cheaper. That makes some AI tooling disposable — but it makes a deterministic, near-free data layer _more_ valuable, not less, because the cheapest thing in the stack is the thing that survives. Onto's job is to make sure your expensive model never touches raw HTML. The only way that math works is if the layer doing it isn't expensive itself.

So: no AI inside. That's not us falling behind. That's us refusing to sell you the problem twice.