Schema

The schema defines the structure of documents in ProseKit. It specifies what types of nodes and marks exist, their attributes, and how they can be composed.

What is a Schema?

A schema is a ProseMirror Schema object that defines:

Nodes: The types of content blocks (paragraph, heading, image, etc.)
Marks: Formatting that can be applied to text (bold, italic, link, etc.)
Content Rules: What content is allowed where
Attributes: Data attached to nodes and marks
Serialization: How to convert to/from HTML/DOM

In ProseKit, schemas are built automatically from extensions.

Schema Construction

Schemas are constructed through the facet system in two steps.

1. Schema Spec Facet

Extensions contribute to the schema through schemaSpecFacet (packages/core/src/facets/schema-spec.ts):

import type { SchemaSpec } from '@prosekit/pm/model'

// The imported SchemaSpec type has roughly this shape:
//
// type SchemaSpec = {
//   nodes?: OrderedMap<NodeSpec> | { [name: string]: NodeSpec }
//   marks?: OrderedMap<MarkSpec> | { [name: string]: MarkSpec }
//   topNode?: string
// }

2. Schema Facet

The schema facet creates the actual schema (packages/core/src/facets/schema.ts):

import { Schema, type SchemaSpec } from '@prosekit/pm/model'
import { defineFacet, type Facet } from './facet.ts'
import { rootFacet, type RootPayload } from './root.ts'

export const schemaFacet: Facet<SchemaSpec, RootPayload> = defineFacet({
  reducer: (specs) => {
    assert(specs.length <= 1)
    const spec = specs[0]
    const schema = spec ? new Schema(spec) : null
    return { schema }
  },
  parent: rootFacet,
  singleton: true,
})

The schema facet is a singleton: only one schema can exist per editor, because changing the schema requires recreating the entire editor.
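The two steps can be pictured with a toy model. The sketch below is not the ProseKit implementation: ToyNodeSpec, ToySchemaSpec, combineSchemaSpecs, and reduceToSchema are hypothetical names, and the combining rule (later entries overwrite earlier ones) is an assumption made for illustration.

```typescript
// Toy stand-ins for the ProseMirror types; the real NodeSpec and SchemaSpec are richer.
interface ToyNodeSpec {
  content?: string
  group?: string
}

interface ToySchemaSpec {
  nodes: Record<string, ToyNodeSpec>
  topNode?: string
}

// Step 1: combine the partial specs contributed by extensions into one spec.
function combineSchemaSpecs(parts: Array<Partial<ToySchemaSpec>>): ToySchemaSpec {
  const combined: ToySchemaSpec = { nodes: {} }
  for (const part of parts) {
    Object.assign(combined.nodes, part.nodes ?? {})
    if (part.topNode !== undefined) combined.topNode = part.topNode
  }
  return combined
}

// Step 2: a singleton reducer accepts at most one combined spec.
function reduceToSchema(specs: ToySchemaSpec[]): { schema: ToySchemaSpec | null } {
  if (specs.length > 1) {
    throw new Error('Only one schema spec is allowed per editor')
  }
  const spec = specs[0]
  // The real reducer calls `new Schema(spec)`; this toy version returns the spec itself.
  return { schema: spec ?? null }
}

const combinedSpec = combineSchemaSpecs([
  { nodes: { doc: { content: 'block+' } }, topNode: 'doc' },
  { nodes: { paragraph: { content: 'inline*', group: 'block' } } },
  { nodes: { text: { group: 'inline' } } },
])
const { schema: toySchema } = reduceToSchema([combinedSpec])
```

Passing two specs to reduceToSchema throws, which mirrors the assertion in the reducer shown above.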

Node Specifications

Nodes are defined using defineNodeSpec() (packages/core/src/extensions/node-spec.ts):

export interface NodeSpecOptions<
  NodeName extends string = string,
  Attrs extends AnyAttrs = AnyAttrs,
> extends NodeSpec {
  /** The name of the node type */
  name: NodeName

  /** Whether this is the top-level node type */
  topNode?: boolean

  /** The attributes that nodes of this type get */
  attrs?: {
    [key in keyof Attrs]: AttrSpec<Attrs[key]>
  }

  // From ProseMirror NodeSpec:
  content?: string       // Content expression
  marks?: string         // Allowed marks
  group?: string         // Node groups
  inline?: boolean       // Inline vs block
  atom?: boolean         // Atomic node
  selectable?: boolean   // Can be selected
  draggable?: boolean    // Can be dragged
  code?: boolean         // Preserves whitespace
  defining?: boolean     // Structure boundary
  isolating?: boolean    // Isolates content
  parseDOM?: ParseRule[] // HTML parsing rules
  toDOM?: (node: Node) => DOMOutputSpec // DOM rendering
  // ... and more
}

Content Expressions

Content expressions define what content is allowed inside a node:

// Basic expressions
'text*'           // Zero or more text nodes
'inline*'         // Zero or more inline nodes
'block+'          // One or more block nodes
'paragraph+'      // One or more paragraphs

// Sequences
'heading paragraph+' // Heading followed by one or more paragraphs

// Alternatives
'(paragraph | heading)+' // One or more paragraphs or headings

// Groups
'block*'          // Any node in the 'block' group

// Ranges
'paragraph{2}'    // Exactly two paragraphs
'paragraph{1,5}'  // One to five paragraphs
'paragraph{2,}'   // Two or more paragraphs

Content expressions use a regular expression-like syntax. See ProseMirror’s content expression guide for full details.
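To make the regular-expression analogy concrete, here is a minimal sketch that translates a simplified content expression into a JavaScript RegExp and tests a list of child node names against it. ProseMirror itself compiles expressions into a finite automaton, so this is only an illustration; compileContent, matchesContent, and the ToyGroups map are hypothetical helpers, not ProseMirror or ProseKit APIs.

```typescript
// Maps a group name to the node names that belong to it (assumed input).
type ToyGroups = Record<string, string[]>

function compileContent(expr: string, groups: ToyGroups = {}): RegExp {
  const source = expr
    // Replace each node or group name with an alternation over concrete
    // node names. Every name is terminated by a comma so names cannot run together.
    .replace(/[A-Za-z_]\w*/g, (name) => {
      const members = groups[name] ?? [name]
      return `(?:${members.map((member) => member + ',').join('|')})`
    })
    // Whitespace only separates sequence items, so it can be dropped.
    .replace(/\s+/g, '')
  return new RegExp(`^${source}$`)
}

function matchesContent(expr: string, children: string[], groups: ToyGroups = {}): boolean {
  // Encode the children the same way: each name followed by a comma.
  const encoded = children.map((child) => child + ',').join('')
  return compileContent(expr, groups).test(encoded)
}

const blockGroup: ToyGroups = { block: ['paragraph', 'heading'] }

const sequenceOk = matchesContent('heading paragraph+', ['heading', 'paragraph', 'paragraph'])
const sequenceTooShort = matchesContent('heading paragraph+', ['heading'])
const groupOk = matchesContent('block+', ['paragraph', 'heading'], blockGroup)
const rangeTooMany = matchesContent('paragraph{2}', ['paragraph', 'paragraph', 'paragraph'])
```

Here sequenceOk and groupOk are true, while sequenceTooShort and rangeTooMany are false.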

Example: Document Node

import { defineNodeSpec } from '@prosekit/core'

export function defineDoc() {
  return defineNodeSpec({
    name: 'doc',
    topNode: true,          // This is the document root
    content: 'block+',      // Contains one or more block nodes
  })
}

Example: Paragraph Node

import { defineNodeSpec } from '@prosekit/core'

export function defineParagraph() {
  return defineNodeSpec({
    name: 'paragraph',
    content: 'inline*',     // Contains inline content
    group: 'block',         // Is a block node
    parseDOM: [{ tag: 'p' }],
    toDOM() {
      return ['p', 0]       // Render as <p> tag, 0 is content slot
    },
  })
}
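The array returned by toDOM describes a DOM element: the first item is the tag name, an optional plain object holds attributes, and 0 marks the slot where child content goes. The sketch below renders a simplified version of that structure to an HTML string to show how the pieces fit together. ToyOutputSpec and renderSpec are hypothetical names, and the renderer skips details such as HTML escaping and void elements.

```typescript
type ToyAttrs = Record<string, string | number | null | undefined>
type ToyOutputSpec = string | [string, ...(ToyAttrs | ToyOutputSpec | 0)[]]

function renderSpec(spec: ToyOutputSpec, content: string): string {
  // A bare string stands for a text node.
  if (typeof spec === 'string') return spec
  const [tag, ...rest] = spec
  let attrs = ''
  let inner = ''
  rest.forEach((item, index) => {
    if (item === 0) {
      // The content hole: child content is placed here.
      inner += content
    } else if (index === 0 && typeof item === 'object' && !Array.isArray(item)) {
      // An attribute object directly after the tag name; null values are omitted.
      for (const [key, value] of Object.entries(item)) {
        if (value != null) attrs += ` ${key}="${value}"`
      }
    } else if (typeof item === 'string' || Array.isArray(item)) {
      // A nested child spec.
      inner += renderSpec(item, content)
    }
  })
  return `<${tag}${attrs}>${inner}</${tag}>`
}

const paragraphHtml = renderSpec(['p', 0], 'Hello')
const linkHtml = renderSpec(['a', { href: 'https://example.com', title: null }, 0], 'link')
```

With these inputs, paragraphHtml is `<p>Hello</p>` and linkHtml is `<a href="https://example.com">link</a>`.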

Example: Image Node

import { defineNodeSpec } from '@prosekit/core'

export function defineImage() {
  return defineNodeSpec({
    name: 'image',
    inline: true,           // Inline node
    atom: true,             // Cannot be directly edited
    group: 'inline',        // Is an inline node
    draggable: true,        // Can be dragged
    attrs: {
      src: { default: '' },
      alt: { default: '' },
      title: { default: null },
    },
    parseDOM: [
      {
        tag: 'img[src]',
        getAttrs(dom) {
          if (typeof dom === 'string') return false
          return {
            src: dom.getAttribute('src'),
            alt: dom.getAttribute('alt'),
            title: dom.getAttribute('title'),
          }
        },
      },
    ],
    toDOM(node) {
      return ['img', node.attrs]
    },
  })
}

Mark Specifications

Marks are defined using defineMarkSpec() (packages/core/src/extensions/mark-spec.ts):

export interface MarkSpecOptions<
  MarkName extends string = string,
  Attrs extends AnyAttrs = AnyAttrs,
> extends MarkSpec {
  /** The name of the mark type */
  name: MarkName

  /** The attributes that marks of this type get */
  attrs?: { [K in keyof Attrs]: AttrSpec<Attrs[K]> }

  // From ProseMirror MarkSpec:
  inclusive?: boolean       // Mark extends when typing
  excludes?: string         // Marks that can't coexist
  group?: string            // Mark groups
  spanning?: boolean        // Can span nodes
  parseDOM?: ParseRule[]    // HTML parsing rules
  toDOM?: (mark: Mark, inline: boolean) => DOMOutputSpec
  // ... and more
}

Example: Bold Mark

import { defineMarkSpec } from '@prosekit/core'

export function defineBold() {
  return defineMarkSpec({
    name: 'bold',
    parseDOM: [
      { tag: 'strong' },
      { tag: 'b' },
      {
        style: 'font-weight',
        getAttrs: (value) => /^(bold(er)?|[5-9]\d{2,})$/.test(value as string) && null,
      },
    ],
    toDOM() {
      return ['strong', 0]  // Render as <strong> tag
    },
  })
}
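The style rule above relies on the getAttrs convention that returning null means "match, with default attributes" and returning false means "no match". A small standalone check of the same regular expression shows which font-weight values count as bold; fontWeightGetAttrs is a hypothetical name used only for this sketch.

```typescript
// Same pattern as in the bold mark's parse rule above.
const boldWeightPattern = /^(bold(er)?|[5-9]\d{2,})$/

// Returns null when the value should produce a bold mark, false otherwise.
function fontWeightGetAttrs(value: string): null | false {
  return boldWeightPattern.test(value) && null
}

const boldKeyword = fontWeightGetAttrs('bold')
const heavyNumber = fontWeightGetAttrs('700')
const regularNumber = fontWeightGetAttrs('400')
const normalKeyword = fontWeightGetAttrs('normal')
```

Here boldKeyword and heavyNumber are null (a match), while regularNumber and normalKeyword are false.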

Example: Link Mark

import { defineMarkSpec } from '@prosekit/core'

export function defineLink() {
  return defineMarkSpec({
    name: 'link',
    attrs: {
      href: { default: '' },
      title: { default: null },
    },
    inclusive: false,       // Doesn't extend when typing
    parseDOM: [
      {
        tag: 'a[href]',
        getAttrs(dom) {
          if (typeof dom === 'string') return false
          return {
            href: dom.getAttribute('href'),
            title: dom.getAttribute('title'),
          }
        },
      },
    ],
    toDOM(mark) {
      return ['a', mark.attrs, 0]
    },
  })
}

Node and Mark Attributes

Attributes store data on nodes and marks:

Defining Attributes

import { defineNodeSpec } from '@prosekit/core'

const heading = defineNodeSpec({
  name: 'heading',
  content: 'inline*',
  group: 'block',
  attrs: {
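    level: {
      default: 1,
      // ProseMirror expects validate to throw on invalid values;
      // a returned boolean is ignored.
      validate: (value) => {
        if (!Number.isInteger(value) || value < 1 || value > 6) {
          throw new RangeError(`Invalid heading level: ${value}`)
        }
      },
    },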
  },
  parseDOM: [
    { tag: 'h1', attrs: { level: 1 } },
    { tag: 'h2', attrs: { level: 2 } },
    { tag: 'h3', attrs: { level: 3 } },
    { tag: 'h4', attrs: { level: 4 } },
    { tag: 'h5', attrs: { level: 5 } },
    { tag: 'h6', attrs: { level: 6 } },
  ],
  toDOM(node) {
    return ['h' + node.attrs.level, 0]
  },
})
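The six near-identical parse rules and the level check can also be written as standalone pieces, which makes them easy to test in isolation. This is a sketch under the assumption that a validator signals failure by throwing, as ProseMirror's validate option expects; validateHeadingLevel and headingParseRules are hypothetical names.

```typescript
// Throws if the value is not an integer between 1 and 6.
function validateHeadingLevel(value: unknown): void {
  if (typeof value !== 'number' || !Number.isInteger(value) || value < 1 || value > 6) {
    throw new RangeError(`Invalid heading level: ${String(value)}`)
  }
}

// Builds the rules { tag: 'h1', attrs: { level: 1 } } through h6.
const headingParseRules = [1, 2, 3, 4, 5, 6].map((level) => ({
  tag: `h${level}`,
  attrs: { level },
}))
```

The generated array has the same shape as the hand-written parseDOM list above, so it could be passed in its place.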

Adding Attributes to Existing Types

Use defineNodeAttr() or defineMarkAttr() to add attributes:

import { defineNodeAttr } from '@prosekit/core'

const paragraphId = defineNodeAttr({
  type: 'paragraph',
  attr: 'id',
  default: null,
  parseDOM: (node) => node.getAttribute('id'),
  toDOM: (value) => value ? ['id', value] : null,
})

Attribute extensions must be defined after the base node/mark. They modify existing schema entries rather than creating new ones.

Splittable Attributes

Node attributes can be marked as splittable so that their value is carried over to the new node when a node is split:

import { defineNodeAttr } from '@prosekit/core'

const paragraphAlign = defineNodeAttr({
  type: 'paragraph',
  attr: 'align',
  default: 'left',
  splittable: true,  // Preserves alignment when splitting paragraph
  parseDOM: (node) => node.style.textAlign || 'left',
  toDOM: (value) => ['style', `text-align: ${value}`],
})
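The effect of splittable can be modeled in a few lines: when a node is split, splittable attributes keep their current value on the new node, and all other attributes fall back to their default. The sketch below is a toy model of that rule, not ProseKit's implementation; ToyAttrConfig and attrsAfterSplit are hypothetical names.

```typescript
interface ToyAttrConfig {
  default: unknown
  splittable?: boolean
}

// Computes the attributes of the node created by a split.
function attrsAfterSplit(
  config: Record<string, ToyAttrConfig>,
  current: Record<string, unknown>,
): Record<string, unknown> {
  const next: Record<string, unknown> = {}
  for (const [name, spec] of Object.entries(config)) {
    // Splittable attributes are inherited; the rest are reset to their default.
    next[name] = spec.splittable ? current[name] : spec.default
  }
  return next
}

const paragraphAttrConfig: Record<string, ToyAttrConfig> = {
  id: { default: null },
  align: { default: 'left', splittable: true },
}

const newParagraphAttrs = attrsAfterSplit(paragraphAttrConfig, { id: 'p1', align: 'center' })
```

In this model the new paragraph keeps align 'center' but gets a null id, so two paragraphs never share the same id after a split.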

Schema Merging

When multiple extensions define the same node or mark, their specs are merged:

import { defineNodeAttr, defineNodeSpec, union } from '@prosekit/core'

const base = defineNodeSpec({
  name: 'paragraph',
  content: 'inline*',
  group: 'block',
})

const withId = defineNodeAttr({
  type: 'paragraph',
  attr: 'id',
  default: null,
})

const withAlign = defineNodeAttr({
  type: 'paragraph',
  attr: 'align',
  default: 'left',
})

// Resulting schema has paragraph with both id and align attributes
const extension = union(base, withId, withAlign)

Merge Process

From packages/core/src/extensions/node-spec.ts:142-197:

Collect Specs: All node/mark specs are collected
Merge Specs: Specs with the same name are merged using mergeSpecs()
Add Attributes: Attribute extensions modify existing specs
Wrap DOM Methods: toDOM and parseDOM are wrapped to handle new attributes
Build Schema: Final merged specs create the schema

The order matters: later extensions override earlier ones. Use the priority system to control precedence.
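The collect-and-merge steps can be sketched with plain objects. This toy version merges specs that share a name, lets later entries override earlier fields, and combines their attrs. It is an assumption-laden illustration rather than the code in node-spec.ts; mergeToySpecs and mergeAllSpecs are hypothetical names, and the DOM-wrapping step is left out.

```typescript
interface ToyAttr {
  default?: unknown
}

interface ToyMergeSpec {
  content?: string
  group?: string
  attrs?: Record<string, ToyAttr>
}

// Later fields override earlier ones; attrs from both sides are kept.
function mergeToySpecs(a: ToyMergeSpec, b: ToyMergeSpec): ToyMergeSpec {
  return { ...a, ...b, attrs: { ...a.attrs, ...b.attrs } }
}

// Collects specs in order and merges entries that share a name.
function mergeAllSpecs(entries: Array<[string, ToyMergeSpec]>): Record<string, ToyMergeSpec> {
  const result: Record<string, ToyMergeSpec> = {}
  for (const [name, spec] of entries) {
    const existing = result[name]
    result[name] = existing ? mergeToySpecs(existing, spec) : spec
  }
  return result
}

const mergedSpecs = mergeAllSpecs([
  ['paragraph', { content: 'inline*', group: 'block' }],
  ['paragraph', { attrs: { id: { default: null } } }],
  ['paragraph', { attrs: { align: { default: 'left' } } }],
])
```

After merging, the single paragraph entry keeps its content and group and carries both the id and align attributes.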

Node Groups

Node groups allow content expressions to reference multiple node types:

// Define nodes in groups
const paragraph = defineNodeSpec({
  name: 'paragraph',
  group: 'block',
  // ...
})

const heading = defineNodeSpec({
  name: 'heading',
  group: 'block',
  // ...
})

// Reference the group in content expressions
const doc = defineNodeSpec({
  name: 'doc',
  topNode: true,
  content: 'block+',  // Allows any node in 'block' group
})

Common groups:

block: Block-level nodes (paragraph, heading, etc.)
inline: Inline nodes (text, image, etc.)
list: List nodes (bullet list, ordered list, etc.)
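A node can belong to several groups at once, because ProseMirror treats the group field as a space-separated list of names. The sketch below resolves a group name to its member node types over plain spec objects; nodesInGroup and the sample specs are hypothetical and only illustrate the lookup.

```typescript
interface ToyGroupedSpec {
  group?: string
}

// Returns the names of all node types whose group list contains `group`.
function nodesInGroup(specs: Record<string, ToyGroupedSpec>, group: string): string[] {
  return Object.entries(specs)
    .filter(([, spec]) => (spec.group ?? '').split(' ').includes(group))
    .map(([name]) => name)
}

const sampleSpecs: Record<string, ToyGroupedSpec> = {
  doc: {},
  paragraph: { group: 'block' },
  heading: { group: 'block' },
  bulletList: { group: 'block list' },
  image: { group: 'inline' },
}

const blockNodes = nodesInGroup(sampleSpecs, 'block')
const listNodes = nodesInGroup(sampleSpecs, 'list')
```

Here blockNodes contains paragraph, heading, and bulletList, while listNodes contains only bulletList.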

Schema Validation

ProseMirror can validate a document against the schema. Calling check() on a node throws if the node or any of its descendants does not conform:

const doc = editor.schema.node('doc', null, [
  editor.schema.node('paragraph', null, [
    editor.schema.text('Hello'),
  ]),
])

// Validate the document
doc.check() // Throws if invalid

Validation checks:

Content matches content expressions
Attributes have valid values
Marks are allowed on the nodes they’re applied to
Structure is internally consistent

Invalid documents can cause editor crashes. Always validate user-provided content before setting it.
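One way to validate user-provided content is to parse and check it inside a try/catch and fall back when it fails. The sketch below is written against small structural interfaces so that it stands alone; it assumes the schema exposes nodeFromJSON and the node exposes check, as ProseMirror's Schema and Node do. parseAndCheck, SchemaLike, and CheckableNode are hypothetical names.

```typescript
interface CheckableNode {
  check(): void
}

interface SchemaLike<N extends CheckableNode> {
  nodeFromJSON(json: unknown): N
}

// Returns the parsed node, or null if parsing or validation throws.
function parseAndCheck<N extends CheckableNode>(schema: SchemaLike<N>, json: unknown): N | null {
  try {
    const node = schema.nodeFromJSON(json)
    node.check()
    return node
  } catch {
    return null
  }
}
```

A caller could then write `const doc = parseAndCheck(editor.schema, userJson)` and only set the content when the result is not null.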

Working with Schemas

Accessing the Schema

const editor = createEditor({ extension })

// Access schema from editor
const schema = editor.schema

// Access node types
const paragraphType = schema.nodes.paragraph
const headingType = schema.nodes.heading

// Access mark types
const boldType = schema.marks.bold
const linkType = schema.marks.link

Creating Nodes

const schema = editor.schema

// Create a paragraph
const para = schema.node('paragraph', null, [
  schema.text('Hello world'),
])

// Create a heading with attributes
const heading = schema.node('heading', { level: 1 }, [
  schema.text('Title'),
])

// Or use node actions
const para2 = editor.nodes.paragraph('Hello world')
const heading2 = editor.nodes.heading({ level: 1 }, 'Title')

Creating Marks

const schema = editor.schema

// Create a bold mark
const bold = schema.mark('bold')

// Create a link mark with attributes
const link = schema.mark('link', { href: 'https://example.com' })

// Apply to text
const text = schema.text('Hello', [bold, link])

// Or use mark actions
const [text2] = editor.marks.bold('Hello')

Schema Immutability

Schemas are immutable after editor creation. You cannot:

Add new node or mark types
Remove existing types
Change content expressions
Modify attributes

To change the schema, you must create a new editor:

// ❌ Bad - can't add nodes dynamically
const editor = createEditor({ extension: basicExtension })
editor.use(defineTable()) // Error! Schema cannot be changed

// ✅ Good - include all nodes upfront
const fullExtension = union(
  basicExtension,
  defineTable(),
)
const editor = createEditor({ extension: fullExtension })

Common Schema Patterns

Minimal Document Schema

import { union, defineDoc, defineText, defineParagraph } from '@prosekit/core'

const extension = union(
  defineDoc(),         // Root node
  defineText(),        // Text node
  defineParagraph(),   // At least one block type
)

Rich Text Schema

import { union } from '@prosekit/core'
// The define* functions below are assumed to be imported from their respective extension modules.

const extension = union(
  // Structure
  defineDoc(),
  defineText(),
  defineParagraph(),
  defineHeading(),
  defineBlockquote(),
  defineCodeBlock(),
  defineHorizontalRule(),
  
  // Lists
  defineBulletList(),
  defineOrderedList(),
  defineListItem(),
  
  // Inline
  defineImage(),
  defineHardBreak(),
  
  // Marks
  defineBold(),
  defineItalic(),
  defineCode(),
  defineLink(),
  defineStrike(),
  defineUnderline(),
)

Nested Content Schema

const table = defineNodeSpec({
  name: 'table',
  content: 'tableRow+',
  group: 'block',
  // ...
})

const tableRow = defineNodeSpec({
  name: 'tableRow',
  content: 'tableCell+',
  // ...
})

const tableCell = defineNodeSpec({
  name: 'tableCell',
  content: 'block+',  // Cells contain block content
  // ...
})

Best Practices

Always define doc and text: These are required for any schema
Use content expressions carefully: Invalid expressions can break the editor
Group related nodes: Use groups to simplify content expressions
Validate attributes: Use the validate option to prevent invalid values
Test schema changes: Schema changes can break existing documents
Document your schema: Clearly document custom nodes and their purpose

Next Steps

Architecture

Extensions