We use cookies to understand how you use this site and improve your experience.

226 Milliseconds

Alexandru Mareș

Published17/02/2026

Read time3 min

Topics

Architecture Performance YON

Actions

00

Comments

Loading comments…

The Wrong Number

I stared at the benchmark results. The parse speed was ten times slower than JSON. I should have been horrified.

Instead I was smiling.

I was looking at the wrong number. Everyone looks at parse speed. How many operations per second can the parser consume a complete document? JSON wins this race easily. A hundred thousand operations per second versus roughly eleven thousand for YON.

But this metric is a relic of the batch era. It measures how fast you can eat a meal that has already been cooked. In the world of AI agents, the meal is being cooked while you eat.

The Physics of Waiting

An LLM generates text slowly. It produces perhaps fifty to a hundred tokens per second. A JSON parser sits idle during this generation. It buffers the entire response. It waits for the closing brace. It creates a pause, a dead silence, that breaks the illusion of intelligence.

I realized that the metric that matters is not parse speed. It is Time to First Action.

How long from the moment the first byte arrives until the system can do something useful?

For YON, that number is 0.226 milliseconds.

Not 226 milliseconds. 0.226.

The Architecture

The trick is the newline.

In block formats, context is nested. To understand a node, you must know its parent. You track depth. The parser maintains a stack frame. You must see the whole tree before you can read a single leaf.

In YON, context is local. Every record is self-contained. The architecture is simple:

Input. A stream of bytes from the LLM.
Buffer. Accumulate bytes until \n.

Action. Parse the line. Dispatch the record. Clear the buffer.

This loop executes in microseconds. The 0.226ms metric represents the time from receiving the first byte to structuring the first usable object.

The Shadow

This architecture enables something I call shadow execution.

Consider a five-step agent plan. In a block format, the system must generate all five steps. The document closes. The runner begins executing step one.

In a YON stream, the system generates step one. The runner executes step one while the system generates step two. Execution happens in the shadow of generation.

The total wall-clock time shrinks. The user sees activity instantly. The waiting spinner disappears.

The Trade-Off

I disclose the trade-off because I believe in honesty.

YON is verbose. It carries a 13% token overhead compared to minified JSON. It is slower to parse in bulk. If you have a complete document sitting on disk, JSON is the faster read.

I accept this cost because I am not optimizing for storage. I am optimizing for perception. I am trading raw CPU cycles for perceived latency. I am trading density for resilience.

If a stream fails at 99%, JSON loses everything. The document is syntactically invalid. YON preserves the 99%. Every completed line is a valid record. Partial failure does not destroy the whole.

The Rules

When building YON parsers, I follow three rules:

Never buffer the whole stream. If you wait for EOF, you have failed the architecture.

Dispatch on newline. The newline is the transaction boundary.

Handle partials gracefully. If the stream ends without a final newline, the last record is incomplete. Discard it. Keep the rest.

Speed is not throughput. Speed is responsiveness. The stream moves. The agent acts. The user knows.