mdz Formal Grammar

This document is the normative reference for mdz (strict markdown dialect) syntax, using Rust-inspired grammar notation. It is verified against the parser source (mdz_lexer.ts, mdz_helpers.ts) and the fixture suite.

For a guided tour with live examples, see /docs/usage. For the streaming model, see /docs/streaming. The full table of divergences from CommonMark and GFM follows the grammar at the end of this page.

Grammar Notation

Notation              Examples                      Meaning
--------              --------                      -------
CAPITAL               TEXT, NEWLINE, BACKTICK       A terminal token
_ItalicCamelCase_     _Paragraph_, _Heading_        A non-terminal production
`literal`             `**`, `#`, `---`              The exact character(s)
x?                    _LanguageHint_?               An optional item
x*                    _InlineNode_*                 0 or more of x
x+                    _InlineNode_+                 1 or more of x
x{a..b}               BACKTICK{3..}                 a to b repetitions of x
Rule1 Rule2           `#` SPACE _Text_              Sequence of rules in order
|                     _Bold_ | _Italic_             Either one or another
[ ]                   [`a`-`z`]                     Any character in the set
~[ ]                  ~[`\n`]                       Any character except those listed
( )                   (`*` _InlineNode_)?           Groups items
<column 0>            <must start at column 0>      Special constraint
<followed by>         <followed by newline or EOF>  Lookahead constraint

Sequences have higher precedence than | alternation.

Character Classes

NEWLINE           = `\n`
CR                = `\r`
SPACE             = ` `
TAB               = `\t`
LINE_WS           = SPACE | TAB | CR
WHITESPACE        = SPACE | TAB | NEWLINE
LETTER            = [`a`-`z`] | [`A`-`Z`]
DIGIT             = [`0`-`9`]
WORD_CHAR         = LETTER | DIGIT
URI_CHAR          = WORD_CHAR | [`-` `.` `_` `~` `!` `$` `&` `'` `(` `)` `*` `+` `,` `;` `=` `:` `@` `/` `?` `#` `%`]

SPACE is the literal space character only. Where a block production requires SPACE (after heading hashes), a tab does not qualify — a tab there makes the construct literal text. Trailing LINE_WS on an HR or fence line is tolerated and never significant.

URI_CHAR is the RFC 3986 path character set (unreserved + sub-delims + :@/?#%). It is what URL and path scans accept; notably it excludes whitespace, ], <, >, and quotes.

Document Structure

_Document_ = _Node_*

_Node_ = _Paragraph_
       | _Heading_
       | _HorizontalRule_
       | _List_
       | _Blockquote_
       | _Codeblock_
       | _Table_

_Paragraph_ = _InlineNode_+ (_ParagraphBreak_ | EOF)

_ParagraphBreak_ = NEWLINE _BlankLine_+

_BlankLine_ = LINE_WS* NEWLINE

A document is zero or more block-level nodes. Block elements (headings, horizontal rules, lists, blockquotes, codeblocks, tables) are recognized only at column 0 — at the start of the input or immediately after a newline. They can interrupt a paragraph without a blank line. Everything else accumulates into paragraphs, separated by one or more blank lines. A line is blank when it's empty or contains only whitespace (spaces, tabs, \r) — an invisible trailing space never changes document structure. Leading and trailing blank lines around the document are skipped.

Inline Formatting

_InlineNode_ = _Bold_
             | _Italic_
             | _Strikethrough_
             | _InlineCode_
             | _Link_
             | _Component_
             | _Element_
             | _Text_
             | NEWLINE

_Bold_ = `**` _InlineNode_+ `**`

_Italic_ = `_` _InlineNode_+ `_`

_Strikethrough_ = `~~` _InlineNode_+ `~~`

_InlineCode_ = BACKTICK ~[BACKTICK | NEWLINE]+ BACKTICK

Matching:

An opening delimiter pairs with the first matching closing delimiter after it (greedy-first). Children are then parsed only within that range, so nested formatting cannot consume a parent's closer. A paragraph break (a blank line) between opener and closer invalidates the pair — the opener renders as literal text.

All inline formatting nests: **~~_text_~~** is bold containing strikethrough containing italic. Bold, italic, and strikethrough may span single newlines (soft breaks). Inline code may not — a newline before the closing backtick makes both backticks literal.

Validation:

Empty formatting renders as literal text: ****, __, ~~~~, and two adjacent backticks (empty inline code) all stay literal. If any check fails (no closer, bad boundary, paragraph break), the delimiter renders as literal text — mdz prefers false negatives over false positives.

Word boundaries:

Bold and strikethrough have no boundary restrictions — their doubled delimiters are unambiguous enough to pair anywhere, so foo**bar**baz and foo~~bar~~baz format intraword. A single ~ is always literal text (~/dev/paths and ~approximations never open a span); only a paired ~~ delimits, and the first ~~ after the opener closes it.

Italic checks two characters: the character immediately before the opening delimiter, and the character immediately after the closing delimiter. Each must not be a WORD_CHAR and must not be _ itself. This keeps snake_case_identifiers literal while allowing word _emphasis_ word.

Other formatting delimiters are transparent to these checks (they don't count as word characters), so adjacent and flush-nested formatting works:

**bold**_italic_ → Bold then Italic
_~~x~~_          → Italic containing Strikethrough

But _ is opaque to itself — it counts as a word character in its own boundary checks — so underscore runs never pair:

__init__ → literal text (dunder identifiers never half-match)

This keeps the doubled-underscore bold convention from other dialects (__bold__) fully literal instead of half-matching. Doubled tildes are mdz's own strikethrough delimiter, so ~~struck~~ pairs.

Inline code has no boundary checks — backticks work anywhere, including intraword.

Links

_Link_ = _MarkdownLink_
       | _AutoUrl_
       | _AutoPath_

_MarkdownLink_ = `[` _LinkText_ `]` `(` _LinkRef_ `)`
_LinkText_     = _InlineNode_*        <no bare `]` between children — inside a child (e.g. a code span) `]` is content>
_LinkRef_      = URI_CHAR+

_AutoUrl_ = (`https://` | `http://`) URI_CHAR+
            <scheme match is case-insensitive>
            <preceded by a non-WORD_CHAR or start of input>

_AutoPath_ = _AbsolutePath_ | _RelativePath_
             <preceded by SPACE, TAB, NEWLINE, or start of input>
_AbsolutePath_ = `/` URI_CHAR+        <second character must not be `/` or whitespace>
_RelativePath_ = (`./` | `../`) URI_CHAR+
                 <character after the prefix must not be `/` or whitespace>

Markdown links:

Display text can contain inline formatting and may be empty. Only ] delimits it — parentheses in display text are ordinary content, balanced or not ([see (note)](/u), [a ) b](/u)). The reference must be non-empty and consist entirely of URI_CHARs — any other character (including whitespace) invalidates the link. The reference ends at the first ) after ](, so parentheses never nest in references ([a](/u(1)) links to /u(1 and the trailing ) is text). References with unsafe schemes (javascript:, data:, anything with : that isn't http(s)://) are rejected at parse time. In every rejection case, no link forms: the leading [ renders as literal text and the rest of the input re-parses as ordinary content (formatting inside the would-be display text still applies).

Auto-detected URLs:

The scheme match is case-insensitive (RFC 3986 §3.1) — Https:// from mobile auto-capitalization works; original casing is preserved in the reference. The character before the scheme must not be a word character: xhttps://... stays plain text. The scan consumes URI_CHARs and stops at whitespace or any other character.

Auto-detected paths:

Must be preceded by whitespace or start of input. // is rejected (protocol-relative/comment-like), as is a prefix followed immediately by whitespace.

Trailing punctuation trim:

After scanning, trailing . , ; : ! ? characters are trimmed from the end of the reference (GFM convention), so See https://fuz.dev. links the URL without the period. Unbalanced trailing ) characters are also trimmed, but balanced parentheses are kept: https://en.wikipedia.org/wiki/Markdown_(markup_language) keeps its closing paren. The trimmed characters render as ordinary text after the link.

Link types and rendering:

Link nodes carry link_type: 'external' for http(s):// references, 'internal' for paths. Absolute paths resolve through SvelteKit's resolve(); relative paths resolve against the base prop/context when provided, or pass through as raw hrefs otherwise.

As defense in depth, the renderers also strip href from any reference that contains : without an http(s):// prefix — see mdz_is_safe_reference.

Headings

_Heading_ = <column 0> `#`{1..6} SPACE _InlineNode_+ <followed by newline or EOF>

Must start at column 0, have 1-6 hashes, a literal space after the hashes (a tab does not qualify), and at least one non-whitespace character on the rest of the line. Heading content can include any inline formatting.

Invalid forms render as paragraph text: #No space, # Indented, ####### Seven, ## (no content).

Each heading gets a lowercase slugified id for fragment links — inline formatting is stripped, then the text is lowercased, non-word characters dropped, and whitespace/hyphen runs collapsed to single hyphens (## Async Patterns → id="async-patterns"). See mdz_heading_id.

Horizontal Rules

_HorizontalRule_ = <column 0> `---` LINE_WS* <followed by newline or EOF>

Exactly three hyphens at column 0; only whitespace (spaces, tabs, \r) may follow on the line. ---, ----, and ---text are paragraph text, and - - - is a one-item list containing literal - - (it fails the three-hyphen rule, and - + space is a valid list marker).

mdz has no setext headings, so --- after a paragraph is always a horizontal rule — unlike CommonMark, where text\n--- becomes an <h2>.

Lists

_List_ = _ListItem_+                      <all items share one marker type>

_ListItem_ = (_MarkerLine_ | _EmptyMarkerLine_) _ItemContent_*

_MarkerLine_      = _Marker_ SPACE LINE_WS* _InlineNode_+
_EmptyMarkerLine_ = _Marker_ LINE_WS* <followed by newline or EOF>
_Marker_          = `-` | DIGIT{1..9} `.`

_ItemContent_ = _ContinuationLine_ | _Paragraph_ | _List_ | _Codeblock_ | _Blockquote_ | _Table_

_ContinuationLine_ = NEWLINE LINE_WS+ _InlineNode_+
                     <indent deeper than the item's marker;
                      not a marker, fence opener, or quote opener line>

Markers. Unordered items use -, ordered items use 1–9 digits and . — each followed by exactly one literal space (a tab does not qualify). Extra whitespace after the marker space is trimmed content slop, not a deeper content column. Ten or more digits, */+ bullets, and N) markers are literal text. Leading zeros parse to the integer (007. is 7) and 0. is valid.

Starting a list. A marker line at column 0 starts a list; indented markers only have meaning while a list is open. The marker line must have content — a lone - or - outside a list is literal text, so empty items can never start a list. Lists can interrupt a paragraph without a blank line, with one guard adopted from CommonMark and applied uniformly: against a current inline run (a paragraph's or an item's), only a 1. marker opens an ordered list — a hard-wrapped line beginning 2024. The… stays prose. After a blank line or block boundary, any number starts a list.

Nesting is strict-relative. Each open level remembers its marker's column. A marker indented past the deepest level's marker opens a nested list inside that level's current item (subject to the ordered guard when the item's run is current); any other marker closes levels deeper than it and joins the nearest surviving level as a sibling. A sibling marker of the other type closes the list and opens a new one at the same indent. A tab counts as one column. Unlike CommonMark there is no content-column arithmetic: one space of extra indent nests.

Continuation — no lazy form. A non-marker, non-blank line indented deeper than an open item's marker continues that item: it joins the item's current inline run across a soft break, with the structural indent stripped from content. If the item's run is no longer current (a blank line or block child intervened), the line opens a new Paragraph block child instead. A column-0 non-marker line always ends the list — there is no lazy continuation.

Blank lines contain rather than terminate. A blank run inside an open list is structural noise while what follows is still list-shaped: a marker at an open level continues as a sibling, a deeper marker nests, a deeper non-marker line becomes a new block child of the item it attaches to (a Codeblock, Blockquote, or Table when it's a fence opener, quote opener, or table start, else a Paragraph), and anything else (a column-0 non-marker line, EOF) closes the list. - a, blank, - b is one list.

Empty items are valid mid-list only: a bare - or N. (trailing whitespace tolerated) at an indent exactly matching an open level of the same marker type. Anywhere else a bare marker is literal text or continuation content.

Code blocks inside items. A line deeper than an open item's marker whose shape is a complete fence opener starts a Codeblock block child — no blank line needed. The fence's raw mode is absolute (see Codeblocks); its closer matches at any indent at or below the opener's (a deeper-indented backtick run is content), and content lines strip min(opener indent, line indent) leading whitespace, never non-whitespace.

Blockquotes inside items. A line deeper than an open item's marker that is a quote opener (a valid prefix with content after it — see Blockquotes) starts a Blockquote block child, also without a blank line. Continuation lines carry any indent deeper than the item's marker before their prefix (the pre-prefix indent is structural slop). A deeper bare prefix is continuation text, never an empty quote.

Tables inside items. A line deeper than an open item's marker that begins a table — a pipe row whose next line is a likewise-deeper delimiter row — starts a Table block child, no blank line needed (the fence/quote precedent, with a two-line lookahead). Body rows continue while each stays a pipe row indented past the item's marker; a line that dedents to the marker, a blank line, or a non-row line ends the table and is re-classified by the list. Cells stay inline-only, exactly as at the top level.

Marker-line content is inline-only. The remainder of a marker line is the item's inline run, never block constructs: - - a is an item containing literal - a, and a fence or pipe row on a marker line is literal text (`` - | a | `` is the item content | a |, not a table header). Nested lists, fences, quotes, and tables all require their own lines.

Inline scoping. Each inline run — a marker line plus its continuation lines, or one Paragraph child — is a paragraph-equivalent region: delimiter pairs are bounded to the run, so **a in one item never pairs with ** in the next, while bold does span an item's own soft breaks.

Numbering and rendering. The AST preserves each item's authored number; rendering follows GFM — <ol start> from the first item, the browser numbers the rest. Items render tight, per item: the first run renders bare and every subsequent block child renders as a block. There is no CommonMark loose/tight distinction (it's retroactive, which streaming forbids).

Blockquotes

_Blockquote_ = _QuoteLine_+      <the stripped line remainders form a _Document_>

_QuoteLine_   = <column 0> _QuotePrefix_ ~[NEWLINE]* (NEWLINE | EOF)
_QuotePrefix_ = `>` (SPACE? `>`)* (SPACE | <only LINE_WS to the line end>)

A quote's content is a mini-document behind a per-line prefix. Strip one prefix level from each line and the remainders parse by the full document grammar, with the quote's content column as column 0: paragraphs, headings, horizontal rules, lists (with all their machinery, including the ordered interrupt guard), code blocks, and nested blockquotes all work inside by recursion.

The prefix. A run of > markers — tight (>>) or separated by single spaces (> >), interchangeably, even line to line — terminated by exactly one space or the end of the line. Depth is the count of >. Like every mdz block marker, the space is required: >a is literal text, and a tab does not qualify (> followed by a tab is literal unless only whitespace follows to the line end). Extra whitespace after the terminating space is content. A > not followed by a space, another >, or the line end is content, and the separator space before it re-serves as the terminator: > >x is a depth-1 quote containing literal >x, and >>x has no valid prefix at all.

Starting a quote. A prefix line at column 0 whose line has non-whitespace content after the full prefix starts a quote — contentless prefix lines never start one (no empty start, mirroring lists), so a lone > outside a quote is literal text. Multi-level prefixes open all their levels at once. Quotes interrupt paragraphs and item runs without a blank line, like every block construct.

Continuation — no lazy form. A line continues an open quote only if it carries a prefix at column 0. The line's depth resolves structure: deeper opens nested quotes (interrupting a current paragraph), equal continues the deepest level's content, shallower closes inner levels — and the closed level's paragraph never continues (the remainder classifies fresh at the surviving level). A line with no prefix ends the quote entirely, along with everything open inside it; there is no lazy continuation. An indented prefix line also ends it — quote prefixes have no meaning at an indent except inside an open list item.

Empty prefix lines. A bare prefix (just > markers, trailing whitespace tolerated) is a blank line in the quote's content document — the in-quote paragraph separator: > a, >, > b is one quote with two paragraphs. A bare prefix deeper than the open depth is content, not structure (no-empty-start means it can't open a level).

A blank line ends the quote — the whole stack. Two prefix runs separated by a blank line are two adjacent blockquotes, exactly as in CommonMark; the in-band bare-prefix form above is how a single quote holds multiple paragraphs. A partial-depth bare prefix composes by recursion: at depth 2, a bare > line is a blank in depth 1's content, closing the depth-2 quote and breaking the paragraph.

Code blocks inside quotes follow the document rules on the stripped content, with one boundary consequence: a fence binds to the innermost marker-delimited container, so an unclosed fence inside a quote consumes to the quote's end, not the input's. A non-prefix line ends the quote and the fence together — there is no lazy continuation into fences.

Rendering. <blockquote> wrapping block children; content is uniformly block-level, so paragraphs render as <p>. Headings inside quotes get their slugified ids as anywhere else.

Codeblocks

_Codeblock_ = _OpenFence_ _CodeLine_* (_CloseFence_ | EOF)

_OpenFence_  = <column 0> BACKTICK{3..} _LanguageHint_? LINE_WS* NEWLINE
_CloseFence_ = <column 0> BACKTICK{n..} LINE_WS* <followed by newline or EOF>
               <where n is the opening fence length>

_CodeLine_     = ~[NEWLINE]* NEWLINE  <any line that is not a _CloseFence_>
_LanguageHint_ = ~[LINE_WS | NEWLINE]+

The opening fence is 3 or more backticks at column 0. The language hint runs to the first whitespace or newline — it cannot contain whitespace, and only whitespace may follow it on the fence line. The closing fence is a line of at least as many backticks as the opening fence, optionally followed by whitespace — a longer run still closes.

Codeblocks may be empty: an immediately-closed fence pair is a codeblock with empty content. Raw mode is absolute: a complete opening fence line always opens a codeblock, and a fence with no closing fence consumes the rest of the input as content — streaming-first, the fence decision is made from the opener line alone, never from input that may arrive later. A fence binds to the innermost marker-delimited container: the document at top level, or the enclosing blockquote (an unclosed fence inside a quote ends at the quote's end). An opening fence cut off at end of input without its newline stays paragraph text. The content is raw text — nothing inside is parsed. The closing fence must be its own line: a backtick run at the end of a content line never closes the block, and a fence-length run with anything but whitespace after it is content.

Shorter fences nest inside longer ones:

```
Use ``` for codeblocks.
```

The inner ``` does not close the outer fence because a closing run must be at least the opener's length. Keep the outer fence the longest — a run longer than the opener closes it.

Tables

_Table_        = _Row_ _DelimiterRow_ _Row_*
                 <header row, delimiter row with a matching column count, then body rows>
_Row_          = <column 0> `|` _Cell_ (`|` _Cell_)* `|` LINE_WS* (NEWLINE | EOF)
_Cell_         = <trimmed inline content; a literal `|` only via `\|` or inside an inline-code span>
_DelimiterRow_ = `|` _DelimCell_ (`|` _DelimCell_)* `|` LINE_WS*
_DelimCell_    = LINE_WS* `:`? `-`+ `:`? LINE_WS*
                 <`:`…`:` center, `:`… left, …`:` right, … none>

GFM-style pipe tables, tightened to the streaming grain: a header row, a delimiter row, then zero or more body rows.

Required outer pipes. Every row starts and ends with | — | a | b |, not the GFM loose form a | b. The leading pipe makes a row self-identifying at column 0 (and tells a one-cell | --- | apart from a horizontal rule); a row missing either pipe is not a table row. Trailing whitespace after the last pipe is tolerated.

The delimiter row is mandatory and fixes the columns. The second line must be a delimiter row — each cell is one or more - with an optional leading and/or trailing :. Its column count must equal the header's, or the construct is not a table and the held header line flushes to a paragraph (false negatives over false positives). This is the grammar's only two-line lookahead; the streaming parser holds the header until its delimiter line resolves, a bounded one-line hold.

Alignment comes from the delimiter colons: :-- left, :-: center, --: right, --- none. The default rendering applies it per cell as style="text-align:…"; the alignment array is also on the Table node, so a consumer that avoids inline styles (e.g. under a strict CSP) can re-theme from it.

Cells are inline-only. A cell is the trimmed span between pipes, parsed as inline content (text, bold, italic, strikethrough, code, links) — never block constructs, so a leading # or - in a cell is literal text. A literal | reaches a cell two ways: inside an inline-code span (`` a | b `` keeps its pipe — mdz protects code spans, unlike GFM), or via \|. That \| is the only escape mdz recognizes and is scoped to table cells — \*, \_, and a lone \ stay literal everywhere, tables included.

Body rows are tolerant. The AST preserves each row's authored cells; the renderer normalizes to the column count, padding short rows with empty cells and dropping cells past the count (their content survives in the tree for tooling). A header/delimiter mismatch rejects the table, but a ragged body row does not.

Boundaries. A table interrupts a paragraph without a blank line, like every block construct, and ends at a blank line, a line that is not a valid | … | row, or EOF — the ending line then re-parses as a fresh block.

Rendering. <table> with a <thead> of <th> cells for the header and a <tbody> of <td> cells for the body rows (the <tbody> is omitted when there are none); each cell carries the column's text-align when set.

Components and Elements

_Component_ = _SelfClosingTag_ | _TagWithChildren_     <name starts [`A`-`Z`]>
_Element_   = _SelfClosingTag_ | _TagWithChildren_     <name starts [`a`-`z`]>

_SelfClosingTag_  = `<` _TagName_ SPACE* `/>`
_TagWithChildren_ = `<` _TagName_ SPACE* `>` _InlineNode_* `</` _TagName_ `>`

_TagName_ = LETTER (LETTER | DIGIT | `-` | `_`)*

Components start with an uppercase letter (<Alert>), elements with lowercase (<aside>). Attributes are not supported — a tag with anything other than optional spaces between the name and >//> is literal text.

A non-self-closing tag commits only if its exact closing tag exists later in the input with no paragraph break in between; otherwise the < renders as literal text. Children can include any inline formatting and nested tags.

Tag scoping. A tag's children are a bounded region, like a heading line or a list-item run: an inline delimiter inside the tag never pairs across the closing tag (in ``<b>`a</b>x , the backtick is literal — a code span can't swallow </b>), and a nested same-name tag consumes the nearest closer, leaving the outer tag to find its own later (<b>x<b>y</b></b> nests; when no closer remains for the outer tag, it degrades to literal <` and the rest re-parses).

Inline nesting is bounded. Link and tag containers nest only so deep — past a fixed cap, a further [ or <tag> renders as literal text instead of opening a new container. It's a strict untrusted-content bound (CommonMark permits such a limit), set deliberately far above any real content, so only pathological input reaches it. Delimiters (**, _, ~~, `` `) don't count toward the cap — first-closer-wins keeps them from deep-nesting on their own.

Components and elements must be registered (via MdzRoot or the contexts directly) to render; unregistered tags render as inert <Name /> placeholders. See /docs/usage for the registration API.

HTML void elements (br, img, hr, …) always render self-closing. The grammar still parses the open/close form (<br>x</br> is an element with children — the parser doesn't know void-ness), but renderers drop the children, matching HTML, where a </br> closer is ignored.

MDX convention: a paragraph consisting of a single component or element (plus only whitespace) is not wrapped in <p>.

Whitespace

Within a paragraph, whitespace is preserved in text nodes exactly as authored — single newlines stay literal \n characters and no <br> nodes are created. A single newline is a soft break: the default rendering applies no white-space style, so it displays as a space. Consumers opt into other presentations via the whitespace prop (pre-line, pre-wrap).

Structural whitespace is markup: a blank line ends a paragraph (a line of only spaces, tabs, or \r is blank), any longer blank run is still a single paragraph break, and trailing newlines at the end of a paragraph are trimmed. Whitespace-only paragraphs are dropped. Both definitions are ASCII-only: a line or paragraph of Unicode whitespace (U+00A0 NBSP, U+3000, etc.) is content, not blank — matching CommonMark. Trailing whitespace on a horizontal rule or fence line is tolerated but never significant — nothing in the grammar depends on end-of-line whitespace. \r counts as line-end whitespace, so CRLF input parses; mdz does no other CRLF handling (a \r before the newline stays in text content).

Block constructs must start at column 0 — leading whitespace makes them paragraph text.

Parsing Model

Single forward pass, no backtracking. Delimiter matching is greedy and bounded: an opener finds its first closer, then children parse only within that window. Invalid syntax never errors — it renders as literal text (false negatives over false positives). All nodes carry start/end source offsets.

The same grammar is implemented twice: the synchronous lexer pipeline behind mdz_parse, and the incremental MdzStreamParser (see /docs/streaming). Parity tests assert both produce identical trees for the same input.

AST

mdz_parse produces MdzNode trees: Text, Code, Codeblock, Bold, Italic, Strikethrough, Link, Paragraph, Hr, Heading, List, ListItem, Blockquote, Table, TableRow, TableCell, Element, and Component nodes. The type definitions live in mdz.ts — see its API docs for the authoritative shapes rather than a copy here.

CommonMark and GFM Divergences

mdz takes after CommonMark and GFM — their behavior is the baseline wherever it fits streaming. But mdz is a dialect, not a subset: it deliberately supports one syntax per feature and drops constructs whose ambiguity costs more than they're worth:

Feature	CommonMark/GFM	mdz
bold	`text` or `__text__`	`text` only
italic	`text` or `_text_`	`_text_` only — single `*` is literal
strikethrough	`~~text~~`, and `~text~` on github.com	`~~text~~` only — a single `~` is always literal, so `~/dev/paths` never strike
doubled delimiters	`__text__` bold, `~~text~~` strike	`~~text~~` strikes; `__text__` stays literal — underscore runs never pair, so `__init__` is safe
horizontal rule	`---`, `***`, `___`, any length	exactly `---`
setext headings	`text` + `---` = h2	none — `---` is always an HR
indented code	4-space indent	none — fenced only
block indentation	up to 3 leading spaces	column 0 required for all block starts (list nesting indents within an open list)
list markers	`-` `*` `+`, `N.` `N)`	`-` and `N.` only, exactly one space
spaces after list marker	1–4, moves the content column	fixed — extras are trimmed content
empty list items	allowed anywhere (trailing-space sensitive)	mid-list bare `-`/`N.` only; never starts a list
list nesting	content-column arithmetic, 4-column tab stops	indent past the deepest level nests, dedent snaps to the nearest level; tab = 1 column
lazy continuation	yes — a dedented line continues the item	none — continuation must be indented past the marker
list end	blank-line + indent arithmetic	column-0 non-marker line, non-list-shaped blank, or EOF
loose/tight lists	per-list, retroactive — one blank wraps every item in `<p>`	always tight, per item — blank-separated block children render as blocks
block constructs on a marker line	`- - a` nests; a fence, quote, or table can open mid-marker-line	marker-line remainder is always inline content (a table must start on its own indented line)
dedent inside a list item's fence	force-closes the fence and list	raw mode absolute — content until the closer or EOF
task lists	`- [ ]` checkboxes (GFM)	literal `[ ]` for now (matches CommonMark core)
blockquote marker	`>`, space optional, tab ok, 0–3 indent slop	a `>` run (`>>` or `> >`) + exactly one space or end of line; `>a` and an indented `>` stay literal
empty blockquotes	`>` alone is an empty blockquote; leading bare lines absorbed	never starts — a bare `>` outside a quote is literal text
blockquote lazy continuation	yes — unprefixed lines continue paragraphs at any depth	none — every quote line carries its prefix
blank line after a quote	ends the quote (two quotes); bare `>` lines separate paragraphs	same
unclosed fence in a quote	closes at the quote's end	same — a fence binds to the innermost marker-delimited container
tables	pipe tables; outer `\|` optional (`a \| b`)	leading and trailing `\|` required (`\| a \| b \|`)
literal pipe in a table cell	`\\|` escape; a `\|` inside `code` still splits the cell	`code` protects its pipes; `\\|` also escapes one
tag scoping	a code span can contain raw-HTML closers — backtick scans ignore tags	inline delimiters never pair across the enclosing tag's closer, so <b>`a</b>x` keeps the tag; nested same-name tags consume the nearest closer first
reference links	`[text][ref]`	none — inline `[text](url)` only
link text	balanced `[ ]` nest	the first bare `]` between children ends it (inside a code span it's content); parens are plain text, balanced or not
link destinations	balanced parens, `<url>` form, optional `"title"`	the first `)` after `](` ends the reference — no nesting, no titles; any whitespace invalidates the link
inline nesting depth	no depth limit	link/tag containers cap out — deeply nested `[`/`<tag>` render literal (a strict untrusted-content bound)
hard breaks	trailing double-space or `\`	`<br />` (registered element)
relative path links	not auto-linked	`./path` and `../path` auto-link