

226 Milliseconds

Alexandru Mareș

Published: 17/02/2026
Read time: 3 min
Topics: Architecture, Performance, YON

The Wrong Number

I stared at the benchmark results. The parse speed was nearly ten times slower than JSON. I should have been horrified.

Instead I was smiling.

I was looking at the wrong number. Everyone looks at parse speed. How many complete documents can the parser consume per second? JSON wins this race easily. A hundred thousand operations per second versus roughly eleven thousand for YON.

But this metric is a relic of the batch era. It measures how fast you can eat a meal that has already been cooked. In the world of AI agents, the meal is being cooked while you eat.

The Physics of Waiting

An LLM generates text slowly. It produces perhaps fifty to a hundred tokens per second. A conventional JSON parser sits idle during this generation. It buffers the entire response. It waits for the closing brace. The result is a pause, a dead silence, that breaks the illusion of intelligence.

I realized that the metric that matters is not parse speed. It is Time to First Action.

How long from the moment the first byte arrives until the system can do something useful?

For YON, that number is 0.226 milliseconds.

Not 226 milliseconds. 0.226.
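
A rough way to measure Time to First Action yourself is sketched below. This is not the benchmark behind the number above. The function name `time_to_first_action` and the `parse` callback are hypothetical, and the sketch assumes only that a record ends at the first newline. With an in-memory list of chunks there are no network gaps, so the result reflects buffering and parsing cost alone.

```python
import time
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def time_to_first_action(
    chunks: Iterable[bytes],
    parse: Callable[[bytes], T],
) -> Optional[Tuple[float, T]]:
    """Return (seconds, record) for the first complete line, or None.

    The clock starts when the first non-empty chunk arrives and stops
    once the first newline-terminated line has been parsed.
    `parse` is a caller-supplied stand-in for a real record parser.
    """
    buffer = bytearray()
    first_byte_at: Optional[float] = None
    for chunk in chunks:
        if not chunk:
            continue
        if first_byte_at is None:
            first_byte_at = time.perf_counter()
        buffer.extend(chunk)
        newline = buffer.find(b"\n")
        if newline != -1:
            record = parse(bytes(buffer[:newline]))
            return time.perf_counter() - first_byte_at, record
    # The stream ended before any line was completed.
    return None
```

Feeding it the chunks of a live model response instead of a list gives the end-to-end figure, network included.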

The Architecture

The trick is the newline.

In block formats, context is nested. To understand a node, you must know its parent. You track depth. The parser maintains a stack frame. You must see the whole tree before you can read a single leaf.

In YON, context is local. Every record is self-contained. The architecture is simple:

  1. Input. A stream of bytes from the LLM.
  2. Buffer. Accumulate bytes until \n.
  3. Action. Parse the line. Dispatch the record. Clear the buffer.

This loop executes in microseconds. The 0.226 ms metric represents the time from receiving the first byte to structuring the first usable object.
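
The three steps above can be sketched in a few lines. This is a minimal illustration, not the YON reference parser. The names `iter_lines` and `run_stream` are hypothetical, and `parse` and `dispatch` are caller-supplied callbacks, because the record syntax itself is not described here.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def iter_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each newline-terminated line as soon as it is complete."""
    buffer = bytearray()
    for chunk in chunks:               # 1. Input: bytes arrive in pieces
        buffer.extend(chunk)           # 2. Buffer: accumulate until \n
        while True:
            newline = buffer.find(b"\n")
            if newline == -1:
                break
            line = bytes(buffer[:newline])
            del buffer[:newline + 1]   # clear the consumed bytes
            yield line
    # Bytes left in the buffer never saw a newline and are not yielded.


def run_stream(
    chunks: Iterable[bytes],
    parse: Callable[[bytes], T],
    dispatch: Callable[[T], None],
) -> int:
    """3. Action: parse and dispatch each line. Returns the record count."""
    count = 0
    for line in iter_lines(chunks):
        dispatch(parse(line))
        count += 1
    return count
```

Memory stays bounded by the longest line, not by the length of the stream, because the buffer is trimmed after every dispatch.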

The Shadow

This architecture enables something I call shadow execution.

Consider a five-step agent plan. In a block format, the system must generate all five steps. The document closes. The runner begins executing step one.

In a YON stream, the system generates step one. The runner executes step one while the system generates step two. Execution happens in the shadow of generation.

The total wall-clock time shrinks. The user sees activity instantly. The waiting spinner disappears.
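
The saving can be worked out with a small timing model. The sketch below is an assumption-laden illustration, not a measurement: every step takes the same time to generate (`gen`) and to run (`run`), the runner handles one step at a time, and no step depends on text generated later. The function names and the sample durations are invented for the example.

```python
def block_wall_clock(steps: int, gen: float, run: float) -> float:
    """Block format: generate every step, then execute every step."""
    return steps * gen + steps * run


def streamed_wall_clock(steps: int, gen: float, run: float) -> float:
    """Line stream: each step runs as soon as it has been generated
    and the runner is free."""
    finished = 0.0
    for i in range(1, steps + 1):
        available = i * gen               # when step i has been emitted
        start = max(available, finished)  # wait for the runner if busy
        finished = start + run
    return finished


# Illustrative numbers only: 2 s to generate a step, 1 s to run it.
block = block_wall_clock(5, 2.0, 1.0)
streamed = streamed_wall_clock(5, 2.0, 1.0)
```

With these made-up durations, the block plan finishes at 15 seconds and the streamed plan at 11. The first visible action moves from the 10-second mark to the 2-second mark, which is the part the user actually notices.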

The Trade-Off

I disclose the trade-off because I believe in honesty.

YON is verbose. It carries a 13% token overhead compared to minified JSON. It is slower to parse in bulk. If you have a complete document sitting on disk, JSON is the faster read.

I accept this cost because I am not optimizing for storage. I am optimizing for perception. I am trading raw CPU cycles for perceived latency. I am trading density for resilience.

If a stream fails at 99%, JSON loses everything. The document is syntactically invalid. YON preserves the 99%. Every completed line is a valid record. Partial failure does not destroy the whole.
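
The difference is easy to reproduce. In the sketch below the JSON side uses Python's standard `json` module. The line side uses placeholder text such as `step 1` in place of real YON records, since only the newline boundary matters here. The helper names `recover_json` and `recover_lines` are hypothetical.

```python
import json
from typing import Any, List, Optional


def recover_json(payload: str) -> Optional[Any]:
    """A truncated JSON document fails as a whole."""
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


def recover_lines(payload: str) -> List[str]:
    """Keep every newline-terminated line; drop the unfinished tail."""
    complete, _, _partial = payload.rpartition("\n")
    return complete.split("\n") if complete else []


steps = [f"step {i}" for i in range(1, 6)]

# Both payloads lose their last few characters, as if the stream died.
json_cut = json.dumps({"steps": steps})[:-3]
lines_cut = "\n".join(steps)[:-3]

json_result = recover_json(json_cut)     # None: nothing survives
line_result = recover_lines(lines_cut)   # the four completed steps survive
```

The fifth line is discarded because it never reached its newline, so the runner cannot tell whether it was complete.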

The Rules

When building YON parsers, I follow three rules:

  1. Never buffer the whole stream. If you wait for EOF, you have failed the architecture.
  2. Dispatch on newline. The newline is the transaction boundary.
  3. Handle partials gracefully. If the stream ends without a final newline, the last record is incomplete. Discard it. Keep the rest.

Speed is not throughput. Speed is responsiveness. The stream moves. The agent acts. The user knows.