mdz_stream_parser.ts

Streaming opcode parser for mdz.

Fed chunks of text (e.g., from LLM output), the parser emits opcodes as rendering instructions. It makes optimistic assumptions about ambiguous syntax and emits revert opcodes to correct them when they turn out wrong. It never re-parses: corrections are bounded, explicit opcodes (revert, wrap, trim_text).
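
To make the optimistic-open-then-revert idea concrete, here is a toy sketch that handles only ** bold. The opcode shapes and the ToyBoldParser class are hypothetical illustrations, not mdz's actual opcodes or parser (mdz also uses append_text and recognizes far more syntax). The sketch opens bold as soon as it sees **, holds a trailing lone * that might be half of a delimiter, and at finish() emits a revert for any bold still open instead of re-parsing.

```typescript
// Hypothetical opcode shapes for this sketch; the real MdzOpcode union differs.
type Op =
  | {type: 'open'; id: number; kind: 'bold'}
  | {type: 'text'; id: number; content: string}
  | {type: 'close'; id: number}
  | {type: 'revert'; id: number};

// Toy streaming parser that recognizes only '**' bold (hypothetical, not mdz's parser).
class ToyBoldParser {
  private ops: Op[] = [];
  private next_id = 0;
  private open_bold: number | null = null; // id of the optimistically opened bold, if any
  private held = ''; // a trailing '*' that may be the first half of '**'

  feed(chunk: string): void {
    const s = this.held + chunk;
    this.held = '';
    let text = '';
    let i = 0;
    while (i < s.length) {
      if (s.startsWith('**', i)) {
        this.flush(text);
        text = '';
        this.toggle_bold(); // optimistic: open now, correct later if no closer arrives
        i += 2;
      } else if (s[i] === '*' && i === s.length - 1) {
        this.held = '*'; // ambiguous until the next chunk arrives
        i += 1;
      } else {
        text += s[i];
        i += 1;
      }
    }
    this.flush(text);
  }

  finish(): void {
    this.flush(this.held); // a held '*' at EOF is plain text
    this.held = '';
    if (this.open_bold !== null) {
      // The optimistic open never found its closer: emit a correction, don't re-parse.
      this.ops.push({type: 'revert', id: this.open_bold});
      this.open_bold = null;
    }
  }

  take_opcodes(): Op[] {
    const out = this.ops;
    this.ops = [];
    return out;
  }

  private flush(text: string): void {
    if (text !== '') this.ops.push({type: 'text', id: this.next_id++, content: text});
  }

  private toggle_bold(): void {
    if (this.open_bold === null) {
      const id = this.next_id++;
      this.open_bold = id;
      this.ops.push({type: 'open', id, kind: 'bold'});
    } else {
      this.ops.push({type: 'close', id: this.open_bold});
      this.open_bold = null;
    }
  }
}
```

Feeding 'a **b' and then calling finish() yields text, open, text, and a trailing revert for the bold; the consumer undoes the wrapper rather than the parser re-reading its input.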

The design is derived from ideas by @pngwn.at (https://bsky.app/profile/pngwn.at/post/3mi527zntb22n): restrict the syntax so streaming is tractable, render optimistically and correct when wrong, emit serializable opcodes to avoid re-parsing, and keep the opcodes target-agnostic so any renderer can consume them. mdz diverges in one respect: the Svelte consumer (MdzStreamState) does build a reactive tree from opcodes, because the platform dictates this, but mutations are fine-grained via $state, not diffed.
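
Because the opcodes are plain serializable data, any renderer can consume them, including one on the far side of a JSON boundary. A minimal sketch of a string-producing consumer, assuming hypothetical open/text/close/revert opcode shapes (not the real MdzOpcode type, and not how MdzStreamState works): it keeps a small tree, and a revert unwraps the element back into its literal delimiter followed by its children.

```typescript
// Hypothetical opcode shapes; the real MdzOpcode union differs.
type Op =
  | {type: 'open'; id: number; kind: 'bold'}
  | {type: 'text'; id: number; content: string}
  | {type: 'close'; id: number}
  | {type: 'revert'; id: number};

interface ElementNode {
  kind: 'bold';
  children: TreeNode[];
}
type TreeNode = ElementNode | string;

// Builds a tiny tree from opcodes; a revert unwraps the element in place.
function apply_opcodes(ops: Op[]): TreeNode[] {
  const root: TreeNode[] = [];
  const stack: TreeNode[][] = [root]; // children arrays of currently open containers
  const elements = new Map<number, {node: ElementNode; parent: TreeNode[]}>();
  for (const op of ops) {
    const current = stack[stack.length - 1];
    switch (op.type) {
      case 'open': {
        const node: ElementNode = {kind: op.kind, children: []};
        current.push(node);
        elements.set(op.id, {node, parent: current});
        stack.push(node.children);
        break;
      }
      case 'text':
        current.push(op.content);
        break;
      case 'close':
        stack.pop(); // toy: assumes well-nested closes
        break;
      case 'revert': {
        const entry = elements.get(op.id);
        if (!entry) break;
        const {node, parent} = entry;
        const open_index = stack.indexOf(node.children);
        if (open_index !== -1) stack.length = open_index; // still open: drop it from the stack
        const index = parent.indexOf(node);
        if (index !== -1) parent.splice(index, 1, '**', ...node.children); // delimiter + children
        break;
      }
    }
  }
  return root;
}

const escape_html = (s: string): string =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

// One possible target; a DOM or terminal renderer could consume the same opcodes.
function render_html(nodes: TreeNode[]): string {
  return nodes
    .map((n) => (typeof n === 'string' ? escape_html(n) : `<strong>${render_html(n.children)}</strong>`))
    .join('');
}
```

Since the opcodes are plain objects, JSON.parse(JSON.stringify(ops)) round-trips them unchanged, so the producer and this renderer could sit in different processes.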

The parser is split across sibling modules: this file holds the public MdzStreamParser class and the process_loop / process_inline orchestrators. Per-category handlers (block / inline / link / url / text) live in mdz_stream_parser_*.ts as free functions taking the shared MdzStreamParserState as first argument.
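
The split between a thin public class, an orchestrating loop, and per-category free functions that take the shared state first can be sketched as follows. ParserState, try_newline, try_text, process_loop, and SketchParser here are hypothetical stand-ins (the real MdzStreamParserState and handlers are far richer, and a real text handler would hold ambiguous tails instead of emitting them eagerly).

```typescript
// Hypothetical shared state; the real MdzStreamParserState carries far more.
interface ParserState {
  buffer: string;
  pos: number;
  out: string[]; // stand-in for the opcode queue
}

// Per-category handlers as free functions taking the shared state first.
// Each returns true if it consumed input.
function try_newline(state: ParserState): boolean {
  if (state.buffer[state.pos] !== '\n') return false;
  state.out.push('newline');
  state.pos += 1;
  return true;
}

function try_text(state: ParserState): boolean {
  const end = state.buffer.indexOf('\n', state.pos);
  const stop = end === -1 ? state.buffer.length : end;
  if (stop === state.pos) return false;
  state.out.push(`text:${state.buffer.slice(state.pos, stop)}`);
  state.pos = stop;
  return true;
}

const HANDLERS = [try_newline, try_text];

// Orchestrator: offers the input to each handler in order until none makes progress.
function process_loop(state: ParserState): void {
  while (state.pos < state.buffer.length) {
    if (!HANDLERS.some((handler) => handler(state))) break;
  }
}

// The public class only owns the state and delegates to the free functions.
class SketchParser {
  private state: ParserState = {buffer: '', pos: 0, out: []};

  feed(chunk: string): void {
    this.state.buffer += chunk;
    process_loop(this.state);
  }

  take(): string[] {
    const out = this.state.out;
    this.state.out = [];
    return out;
  }
}
```

Keeping handlers as free functions over one state object lets them live in separate modules without each needing a reference to the class.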

Usage:

import {MdzStreamParser} from '@fuzdev/mdz/mdz_stream_parser.js';

const parser = new MdzStreamParser();
parser.feed('hello **bold');
const ops1 = parser.take_opcodes(); // open Paragraph, text "hello ", open Bold, text "bold"
parser.feed('** world');
const ops2 = parser.take_opcodes(); // close Bold, text " world"
parser.finish();
const ops3 = parser.take_opcodes(); // close Paragraph

Declarations

MdzStreamParser

import {MdzStreamParser} from '@fuzdev/mdz/mdz_stream_parser.js';

Streaming opcode parser for mdz content. Feed chunks via feed(), retrieve opcodes via take_opcodes(), call finish() at end.

The opcode sequence depends on chunk boundaries: the same input fed in different chunk sizes may produce different text/append_text splits and different optimistic/revert sequences. The final tree (via mdz_opcodes_to_nodes) matches the one-shot result except in the residual divergence classes listed below. Optimistic opens are corrected by revert opcodes at the first failed closer, at run close (paragraph, heading, or list-item run), or at EOF. At EOF, the delimiter-paired opens (bold/italic/strikethrough) and inline-code candidates are gated on a closer scan of the complete final buffer, so held tails parse as they would in the one-shot parse. The residual divergence classes are:

  • Backtick-adjacent chunking: an inline-code candidate held across chunks can have its text-vs-code decision bounded by a wrongly optimistic italic (opened before its failed closer was visible), whereas the one-shot parse greedily rejects that italic and scans unbounded. For example, `` ___ `` chunked at the italic stays flat text where the one-shot parse produces the code span. Italic is the only wedge, since it is the only delimiter whose one-shot form rejects on a failed first closer; see try_code's hold and code_search_limit.
  • Optimistic inline code unclosed at EOF: a backtick that opened optimistically consumes its tail as raw code text, so formatting inside it never forms (`` h x **b** z ` stays flat where the one-shot parse produces the bold). The parser never re-parses, so the EOF revert can only flatten it to text.
  • Link/tag opens at EOF: finish() doesn't open links or tags, so a held tail containing complete [text](url) / <Tag>…</Tag> syntax parses flat.
  • Block elements interrupt optimistic inlines — at column 0 a heading/HR/fence/list/blockquote line interrupts the open paragraph even when an optimistic inline container spans it (**a\n# h\nb** parses the heading here; the one-shot parse, knowing the closer exists, swallows the line as bold text). Inherent: the swallow is only correct when a closer eventually arrives, which streaming can't know — interrupting matches the one-shot parse on the no-closer flip side (**a\n# h) and renders blocks promptly.
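
The difference between a chunk-dependent opcode stream and a chunk-independent final tree can be seen with a toy producer and consumer. Both functions and the text/append_text shapes below are hypothetical simplifications (the real opcodes carry more fields): the same string fed as one chunk or two yields different opcode arrays, yet the consumer folds both into the same text node.

```typescript
// Hypothetical opcode shapes; the real MdzOpcode union has more variants and fields.
type TextOp =
  | {type: 'text'; id: number; content: string}
  | {type: 'append_text'; id: number; content: string};

// Toy producer for plain text: the first non-empty chunk creates text node 0,
// later chunks append to it, so the opcode array depends on the chunking.
function emit_plain(chunks: string[]): TextOp[] {
  const ops: TextOp[] = [];
  let opened = false;
  for (const content of chunks) {
    if (content === '') continue;
    ops.push(opened ? {type: 'append_text', id: 0, content} : {type: 'text', id: 0, content});
    opened = true;
  }
  return ops;
}

// Toy consumer: folds opcodes into a map of text node id -> content.
// The resulting content does not depend on how the input was chunked.
function apply_text_ops(ops: TextOp[]): Map<number, string> {
  const nodes = new Map<number, string>();
  for (const op of ops) {
    const prior = op.type === 'append_text' ? (nodes.get(op.id) ?? '') : '';
    nodes.set(op.id, prior + op.content);
  }
  return nodes;
}
```

Equality checks in tests of a streaming parser are therefore best made on the folded result, not on the raw opcode arrays.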

Lifecycle: one parser instance per stream — feed() any number of times, finish() exactly once, then a final take_opcodes(). There is no reset(); calling feed() or finish() after finish() is undefined. To restart, construct a new parser (and a new consumer — see MdzStreamState).
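
The lifecycle above maps onto a small driver loop. This sketch assumes only the three documented methods (feed, finish, take_opcodes); the pump function and the OpcodeSource interface are illustrative names, and the opcode type is left generic. It calls finish() exactly once, follows it with one final take_opcodes(), and expects a fresh parser per stream.

```typescript
// Structural type covering the documented MdzStreamParser methods (opcode type left generic).
interface OpcodeSource<Op> {
  feed(chunk: string): void;
  finish(): void;
  take_opcodes(): Op[];
}

// Hypothetical driver: feeds each chunk, forwards opcodes as soon as they exist,
// then finishes exactly once and drains the final batch.
function pump<Op>(parser: OpcodeSource<Op>, chunks: Iterable<string>, apply: (ops: Op[]) => void): void {
  for (const chunk of chunks) {
    parser.feed(chunk);
    const ops = parser.take_opcodes();
    if (ops.length > 0) apply(ops);
  }
  parser.finish();
  const tail = parser.take_opcodes();
  if (tail.length > 0) apply(tail);
}
```

Draining after every feed keeps rendering incremental; draining only once at the end would also be correct but would defeat the point of streaming.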

feed

Feed a chunk of text to the parser. Opcodes are accumulated and retrieved via take_opcodes().

type (chunk: string): void

chunk

type string
returns void

finish

Signal end of input. Resolves all pending state: closes open blocks, reverts unclosed optimistic opens, trims trailing newlines.

Trailing-newline trimming is handled in one place: trim_trailing_newline(), called at the top of close_paragraph() and close_codeblock_at_eof(), before either function reverts its inner stack. The trim sees the just-flushed text node's last_text_id (or a still-accumulated \n) and emits a trim_text opcode. Revert opcodes fire only after the trim.
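
A sketch of the trim-then-revert ordering, with loudly hypothetical details: the TrimState fields, the count field on trim_text, and the choice to drop a still-pending newline locally without an opcode are all assumptions made for illustration, not the library's actual behavior. Only the ordering (trim at the top of the close, reverts afterward) comes from the description above.

```typescript
// Hypothetical opcode shapes for this sketch; field names are assumptions.
type Op = {type: 'trim_text'; id: number; count: number} | {type: 'revert'; id: number};

// Hypothetical slice of parser state relevant to trimming.
interface TrimState {
  pending: string; // text accumulated but not yet emitted as an opcode
  last_text_id: number | null; // id of the most recently flushed text node
  last_text: string; // that node's content, tracked so the sketch can see a trailing '\n'
  ops: Op[];
}

function trim_trailing_newline(state: TrimState): void {
  if (state.pending.endsWith('\n')) {
    // Never emitted: drop it locally (assumption: no opcode is needed in this case).
    state.pending = state.pending.slice(0, -1);
  } else if (state.last_text_id !== null && state.last_text.endsWith('\n')) {
    // Already emitted: ask the consumer to remove `count` characters from that text node.
    state.ops.push({type: 'trim_text', id: state.last_text_id, count: 1});
    state.last_text = state.last_text.slice(0, -1);
  }
}

// Trim first, mirroring the order described above; reverts of the inline ids follow,
// innermost (last opened) first.
function close_paragraph(state: TrimState, open_inline_ids: number[]): void {
  trim_trailing_newline(state);
  for (const id of [...open_inline_ids].reverse()) {
    state.ops.push({type: 'revert', id});
  }
}
```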

Optimistic-container revert is handled by close_paragraph and close_heading (each reverts everything above its own block frame via revert_above), so no separate revert pass is needed — optimistic containers can only exist inside an open Paragraph or Heading (parser invariant).
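
The revert_above idea can be sketched with a single stack that holds block and inline frames together. The Frame shape, the op shapes, and close_block are hypothetical: everything above the block frame is reverted innermost first, then the block itself closes. Under the stated invariant that optimistic containers live only inside an open Paragraph or Heading, this one pass reaches all of them.

```typescript
type FrameKind = 'paragraph' | 'heading' | 'bold' | 'italic' | 'strikethrough';

// Hypothetical frame: blocks and inlines share one stack.
interface Frame {
  id: number;
  kind: FrameKind;
}

type Op = {type: 'revert'; id: number} | {type: 'close'; id: number};

// Pops and reverts every frame above `block_index`, innermost first,
// leaving the block frame itself on top for the caller to close.
function revert_above(stack: Frame[], block_index: number, ops: Op[]): void {
  while (stack.length > block_index + 1) {
    const frame = stack.pop()!;
    ops.push({type: 'revert', id: frame.id});
  }
}

// Hypothetical close for a paragraph or heading: revert whatever is still open
// inside it, then close the block.
function close_block(stack: Frame[], kind: 'paragraph' | 'heading', ops: Op[]): void {
  const index = stack.map((frame) => frame.kind).lastIndexOf(kind);
  if (index === -1) throw new Error(`no open ${kind}`);
  revert_above(stack, index, ops);
  const block = stack.pop()!;
  ops.push({type: 'close', id: block.id});
}
```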

type (): void

returns void

take_opcodes

Drain and return all accumulated opcodes. Destructive — empties the internal queue. The returned array is owned by the caller.
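
One straightforward way to get these drain semantics is to swap in a fresh array rather than copying and clearing. The OpcodeQueue class below is a hypothetical illustration, not the library's internals: the swap is O(1), and because later pushes go to the new array, the caller can mutate the returned one freely.

```typescript
// Hypothetical queue illustrating drain-by-swap.
class OpcodeQueue<Op> {
  private queue: Op[] = [];

  push(op: Op): void {
    this.queue.push(op);
  }

  // Hands the current array to the caller and starts a fresh one, so later
  // pushes never alias the returned array.
  take(): Op[] {
    const out = this.queue;
    this.queue = [];
    return out;
  }
}
```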

type (): MdzOpcode[]

returns MdzOpcode[]

Depends on