The schema defines the structure of documents in ProseKit. It specifies what types of nodes and marks exist, their attributes, and how they can be composed.

What is a Schema?

A schema is a ProseMirror Schema object that defines:
  • Nodes: The types of content blocks (paragraph, heading, image, etc.)
  • Marks: Formatting that can be applied to text (bold, italic, link, etc.)
  • Content Rules: What content is allowed where
  • Attributes: Data attached to nodes and marks
  • Serialization: How to convert to/from HTML/DOM
In ProseKit, schemas are built automatically from extensions.

Schema Construction

Schemas are constructed through the facet system:

1. Schema Spec Facet

Extensions contribute to the schema through schemaSpecFacet (packages/core/src/facets/schema-spec.ts):
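import type { SchemaSpec } from '@prosekit/pm/model'

// Simplified shape of the imported SchemaSpec type, shown for reference only
// (redeclaring it next to the import would be a duplicate identifier):
//
// type SchemaSpec = {
//   nodes?: OrderedMap<NodeSpec> | { [name: string]: NodeSpec }
//   marks?: OrderedMap<MarkSpec> | { [name: string]: MarkSpec }
//   topNode?: string
// }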

2. Schema Facet

The schema facet creates the actual schema (packages/core/src/facets/schema.ts):
import { Schema, type SchemaSpec } from '@prosekit/pm/model'
import { defineFacet, type Facet } from './facet.ts'
import { rootFacet, type RootPayload } from './root.ts'

export const schemaFacet: Facet<SchemaSpec, RootPayload> = defineFacet({
  reducer: (specs) => {
    assert(specs.length <= 1)
    const spec = specs[0]
    const schema = spec ? new Schema(spec) : null
    return { schema }
  },
  parent: rootFacet,
  singleton: true,
})
The schema facet is a singleton: only one schema can exist per editor, because changing the schema requires recreating the entire editor.
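The two stages can be pictured with a small self-contained model. This is only a sketch: `ToySpec`, `mergeContributions`, and `buildToySchema` are hypothetical names rather than ProseKit APIs, and a plain object stands in for ProseMirror's `Schema`.

```typescript
// Hypothetical, simplified stand-in for a schema spec (not the ProseMirror type).
type ToySpec = {
  nodes: Record<string, unknown>
  marks: Record<string, unknown>
  topNode?: string
}

// Stage 1 (models the schema spec facet): fold every extension's partial
// contribution into at most one merged spec.
function mergeContributions(parts: Partial<ToySpec>[]): ToySpec[] {
  if (parts.length === 0) return []
  const merged: ToySpec = { nodes: {}, marks: {} }
  for (const part of parts) {
    Object.assign(merged.nodes, part.nodes)
    Object.assign(merged.marks, part.marks)
    if (part.topNode) merged.topNode = part.topNode
  }
  return [merged]
}

// Stage 2 (models the schema facet): accept at most one spec and build a
// single schema-like object from it, or null when nothing was contributed.
function buildToySchema(specs: ToySpec[]) {
  if (specs.length > 1) throw new Error('Expected at most one schema spec')
  const spec = specs[0]
  if (!spec) return null
  return {
    nodeNames: Object.keys(spec.nodes),
    markNames: Object.keys(spec.marks),
    topNode: spec.topNode ?? 'doc', // ProseMirror also defaults the top node to "doc"
  }
}

const toySchema = buildToySchema(
  mergeContributions([
    { nodes: { doc: { content: 'block+' } }, topNode: 'doc' },
    { nodes: { paragraph: { content: 'inline*', group: 'block' } } },
    { marks: { bold: {} } },
  ]),
)
```

In this model, three separate contributions end up in one schema-like object, which mirrors why the real schema facet only ever sees a single spec.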

Node Specifications

Nodes are defined using defineNodeSpec() (packages/core/src/extensions/node-spec.ts):
export interface NodeSpecOptions<
  NodeName extends string = string,
  Attrs extends AnyAttrs = AnyAttrs,
> extends NodeSpec {
  /** The name of the node type */
  name: NodeName

  /** Whether this is the top-level node type */
  topNode?: boolean

  /** The attributes that nodes of this type get */
  attrs?: {
    [key in keyof Attrs]: AttrSpec<Attrs[key]>
  }

  // From ProseMirror NodeSpec:
  content?: string       // Content expression
  marks?: string         // Allowed marks
  group?: string         // Node groups
  inline?: boolean       // Inline vs block
  atom?: boolean         // Atomic node
  selectable?: boolean   // Can be selected
  draggable?: boolean    // Can be dragged
  code?: boolean         // Preserves whitespace
  defining?: boolean     // Structure boundary
  isolating?: boolean    // Isolates content
  parseDOM?: ParseRule[] // HTML parsing rules
  toDOM?: (node: Node) => DOMOutputSpec // DOM rendering
  // ... and more
}

Content Expressions

Content expressions define what content is allowed inside a node:
// Basic expressions
'text*'           // Zero or more text nodes
'inline*'         // Zero or more inline nodes
'block+'          // One or more block nodes
'paragraph+'      // One or more paragraphs

// Sequences
'heading paragraph+' // Heading followed by one or more paragraphs

// Alternatives
'(paragraph | heading)+' // One or more paragraphs or headings

// Groups
'block*'          // Any node in the 'block' group

// Counts and ranges
'paragraph{2}'    // Exactly two paragraphs
'paragraph{1,5}'  // One to five paragraphs
'paragraph{2,}'   // Two or more paragraphs
Content expressions use a regular expression-like syntax. See ProseMirror’s content expression guide for full details.
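To make the regular-expression analogy concrete, here is a minimal sketch that translates a content expression into a `RegExp` over a list of child type names. The helpers `contentExprToRegExp` and `allows` are hypothetical and exist only for illustration; ProseMirror compiles content expressions with its own parser, not with `RegExp`.

```typescript
// Hypothetical helper: turns a content expression into a RegExp that matches
// a sequence of children encoded as "<type>,<type>,...".
function contentExprToRegExp(expr: string, groups: Map<string, string[]>): RegExp {
  const pattern = expr
    // Each name matches exactly one child. A group name expands to an
    // alternation of the node types that belong to the group.
    .replace(/[A-Za-z_]\w*/g, (name) => {
      const members = groups.get(name) ?? [name]
      return `(?:(?:${members.join('|')}),)`
    })
    // Whitespace in a content expression only separates sequence elements.
    .replace(/\s+/g, '')
  return new RegExp(`^(?:${pattern})$`)
}

// Checks whether a list of child type names satisfies a content expression.
function allows(expr: string, children: string[], groups: Map<string, string[]>): boolean {
  const encoded = children.map((type) => `${type},`).join('')
  return contentExprToRegExp(expr, groups).test(encoded)
}

const exampleGroups = new Map([['block', ['paragraph', 'heading']]])

const results = {
  blockPlus: allows('block+', ['heading', 'paragraph'], exampleGroups), // true
  emptyDoc: allows('block+', [], exampleGroups), // false: '+' needs at least one child
  sequence: allows('heading paragraph+', ['heading', 'paragraph'], exampleGroups), // true
  wrongOrder: allows('heading paragraph+', ['paragraph', 'heading'], exampleGroups), // false
  exactCount: allows('paragraph{2}', ['paragraph', 'paragraph'], exampleGroups), // true
}
```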

Example: Document Node

import { defineNodeSpec } from '@prosekit/core'

export function defineDoc() {
  return defineNodeSpec({
    name: 'doc',
    topNode: true,          // This is the document root
    content: 'block+',      // Contains one or more block nodes
  })
}

Example: Paragraph Node

import { defineNodeSpec } from '@prosekit/core'

export function defineParagraph() {
  return defineNodeSpec({
    name: 'paragraph',
    content: 'inline*',     // Contains inline content
    group: 'block',         // Is a block node
    parseDOM: [{ tag: 'p' }],
    toDOM() {
      return ['p', 0]       // Render as <p> tag, 0 is content slot
    },
  })
}
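The array returned by `toDOM` can be read as "tag name, optional attributes, then children", with `0` marking where the node's content goes. The following sketch renders such an array to an HTML string to show the idea. `ToyOutputSpec` and `renderToySpec` are hypothetical, and the model is deliberately incomplete: it ignores attribute escaping and void elements such as `img`.

```typescript
// Simplified model of a DOM output spec: a tag name, an optional attribute
// object, then children. The number 0 marks the hole for the node's content.
type ToyAttrs = Record<string, string | number | null | undefined>
type ToyOutputSpec = string | [string, ...(ToyAttrs | ToyOutputSpec | 0)[]]

function renderToySpec(spec: ToyOutputSpec, content: string): string {
  if (typeof spec === 'string') return spec // a plain string is a text node
  const [tag, ...rest] = spec
  let attrs = ''
  let inner = ''
  rest.forEach((item, index) => {
    if (item === 0) {
      inner += content // fill the content hole
    } else if (typeof item === 'string' || Array.isArray(item)) {
      inner += renderToySpec(item, content) // nested spec
    } else if (index === 0 && item) {
      // Attribute object directly after the tag name; null and undefined
      // values are skipped.
      for (const [key, value] of Object.entries(item)) {
        if (value != null) attrs += ` ${key}="${value}"`
      }
    }
  })
  return `<${tag}${attrs}>${inner}</${tag}>`
}

const paragraphHtml = renderToySpec(['p', 0], 'Hello')
const linkHtml = renderToySpec(['a', { href: 'https://example.com', title: null }, 0], 'link')
```

Here `paragraphHtml` is `<p>Hello</p>` and `linkHtml` is `<a href="https://example.com">link</a>`: the `title` attribute is dropped because its value is `null`.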

Example: Image Node

import { defineNodeSpec } from '@prosekit/core'

export function defineImage() {
  return defineNodeSpec({
    name: 'image',
    inline: true,           // Inline node
    atom: true,             // Cannot be directly edited
    group: 'inline',        // Is an inline node
    draggable: true,        // Can be dragged
    attrs: {
      src: { default: '' },
      alt: { default: '' },
      title: { default: null },
    },
    parseDOM: [
      {
        tag: 'img[src]',
        getAttrs(dom) {
          if (typeof dom === 'string') return false
          return {
            src: dom.getAttribute('src'),
            alt: dom.getAttribute('alt'),
            title: dom.getAttribute('title'),
          }
        },
      },
    ],
    toDOM(node) {
      return ['img', node.attrs]
    },
  })
}

Mark Specifications

Marks are defined using defineMarkSpec() (packages/core/src/extensions/mark-spec.ts):
export interface MarkSpecOptions<
  MarkName extends string = string,
  Attrs extends AnyAttrs = AnyAttrs,
> extends MarkSpec {
  /** The name of the mark type */
  name: MarkName

  /** The attributes that marks of this type get */
  attrs?: { [K in keyof Attrs]: AttrSpec<Attrs[K]> }

  // From ProseMirror MarkSpec:
  inclusive?: boolean       // Mark extends when typing
  excludes?: string         // Marks that can't coexist
  group?: string            // Mark groups
  spanning?: boolean        // Can span nodes
  parseDOM?: ParseRule[]    // HTML parsing rules
  toDOM?: (mark: Mark, inline: boolean) => DOMOutputSpec
  // ... and more
}

Example: Bold Mark

import { defineMarkSpec } from '@prosekit/core'

export function defineBold() {
  return defineMarkSpec({
    name: 'bold',
    parseDOM: [
      { tag: 'strong' },
      { tag: 'b' },
      {
        style: 'font-weight',
        getAttrs: (value) => /^(bold(er)?|[5-9]\d{2,})$/.test(value as string) && null,
      },
    ],
    toDOM() {
      return ['strong', 0]  // Render as <strong> tag
    },
  })
}
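The `font-weight` rule in the bold example relies on the `getAttrs` convention that `false` rejects the rule while `null` accepts it without setting attributes. The sketch below isolates that expression so its behavior on a few CSS values is easy to see. `matchFontWeight` is a hypothetical name used only here.

```typescript
// Same pattern as in the bold example: "bold", "bolder", or a numeric weight
// of 500 and above.
const boldWeight = /^(bold(er)?|[5-9]\d{2,})$/

// false means "this rule does not apply"; null means "it applies, no attributes".
function matchFontWeight(value: string): false | null {
  return boldWeight.test(value) && null
}

const fontWeightResults = {
  bold: matchFontWeight('bold'), // null (accepted)
  bolder: matchFontWeight('bolder'), // null (accepted)
  w700: matchFontWeight('700'), // null (accepted)
  w400: matchFontWeight('400'), // false (rejected)
  normal: matchFontWeight('normal'), // false (rejected)
}
```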

Example: Link Mark

import { defineMarkSpec } from '@prosekit/core'

export function defineLink() {
  return defineMarkSpec({
    name: 'link',
    attrs: {
      href: { default: '' },
      title: { default: null },
    },
    inclusive: false,       // Doesn't extend when typing
    parseDOM: [
      {
        tag: 'a[href]',
        getAttrs(dom) {
          if (typeof dom === 'string') return false
          return {
            href: dom.getAttribute('href'),
            title: dom.getAttribute('title'),
          }
        },
      },
    ],
    toDOM(mark) {
      return ['a', mark.attrs, 0]
    },
  })
}

Node and Mark Attributes

Attributes store data on nodes and marks:

Defining Attributes

import { defineNodeSpec } from '@prosekit/core'

const heading = defineNodeSpec({
  name: 'heading',
  content: 'inline*',
  group: 'block',
  attrs: {
    level: {
      default: 1,
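      // ProseMirror expects a validate function to throw on invalid values;
      // its return value is ignored.
      validate: (value) => {
        if (typeof value !== 'number' || value < 1 || value > 6) {
          throw new RangeError(`Invalid heading level: ${value}`)
        }
      },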
    },
  },
  parseDOM: [
    { tag: 'h1', attrs: { level: 1 } },
    { tag: 'h2', attrs: { level: 2 } },
    { tag: 'h3', attrs: { level: 3 } },
    { tag: 'h4', attrs: { level: 4 } },
    { tag: 'h5', attrs: { level: 5 } },
    { tag: 'h6', attrs: { level: 6 } },
  ],
  toDOM(node) {
    return ['h' + node.attrs.level, 0]
  },
})

Adding Attributes to Existing Types

Use defineNodeAttr() or defineMarkAttr() to add attributes:
import { defineNodeAttr } from '@prosekit/core'

const paragraphId = defineNodeAttr({
  type: 'paragraph',
  attr: 'id',
  default: null,
  parseDOM: (node) => node.getAttribute('id'),
  toDOM: (value) => value ? ['id', value] : null,
})
Attribute extensions must be defined after the base node/mark. They modify existing schema entries rather than creating new ones.

Splittable Attributes

Node attributes can be marked as splittable so that their values carry over to the new node when a node is split (for example, when pressing Enter inside a paragraph):
import { defineNodeAttr } from '@prosekit/core'

const paragraphAlign = defineNodeAttr({
  type: 'paragraph',
  attr: 'align',
  default: 'left',
  splittable: true,  // Preserves alignment when splitting paragraph
  parseDOM: (node) => node.style.textAlign || 'left',
  toDOM: (value) => ['style', `text-align: ${value}`],
})
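As a rough mental model, splitting a node can be thought of as creating a new node whose splittable attributes are copied from the original. The sketch below assumes that non-splittable attributes fall back to their defaults on the new node; that detail is an assumption of this illustration, and `ToyAttrSpec` and `attrsAfterSplit` are hypothetical names, not ProseKit APIs.

```typescript
// Hypothetical, simplified attribute spec.
interface ToyAttrSpec {
  default: unknown
  splittable?: boolean
}

// Computes the attributes for the node created by a split:
// splittable attributes are copied, the others use their default (assumed).
function attrsAfterSplit(
  specs: Record<string, ToyAttrSpec>,
  current: Record<string, unknown>,
): Record<string, unknown> {
  const next: Record<string, unknown> = {}
  for (const [name, spec] of Object.entries(specs)) {
    next[name] = spec.splittable ? current[name] : spec.default
  }
  return next
}

const paragraphAttrSpecs: Record<string, ToyAttrSpec> = {
  id: { default: null },
  align: { default: 'left', splittable: true },
}

// The new paragraph keeps the alignment but not the id.
const newParagraphAttrs = attrsAfterSplit(paragraphAttrSpecs, { id: 'intro', align: 'center' })
```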

Schema Merging

When multiple extensions define the same node or mark, their specs are merged:
import { defineNodeAttr, defineNodeSpec, union } from '@prosekit/core'

const base = defineNodeSpec({
  name: 'paragraph',
  content: 'inline*',
  group: 'block',
})

const withId = defineNodeAttr({
  type: 'paragraph',
  attr: 'id',
  default: null,
})

const withAlign = defineNodeAttr({
  type: 'paragraph',
  attr: 'align',
  default: 'left',
})

// Resulting schema has paragraph with both id and align attributes
const extension = union(base, withId, withAlign)

Merge Process

From packages/core/src/extensions/node-spec.ts:142-197:
  1. Collect Specs: All node/mark specs are collected
  2. Merge Specs: Specs with the same name are merged using mergeSpecs()
  3. Add Attributes: Attribute extensions modify existing specs
  4. Wrap DOM Methods: toDOM and parseDOM are wrapped to handle new attributes
  5. Build Schema: Final merged specs create the schema
The order matters: later extensions override earlier ones. Use the priority system to control precedence.
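The first three steps can be sketched with plain objects. This is a simplified model, not the code in node-spec.ts: `ToyNodeSpec`, `ToyAttrExtension`, and `mergeToySpecs` are hypothetical names, and the wrapping of `toDOM` and `parseDOM` (step 4) is left out.

```typescript
interface ToyNodeSpec {
  name: string
  content?: string
  group?: string
  attrs?: Record<string, { default: unknown }>
}

interface ToyAttrExtension {
  type: string // name of the node the attribute is added to
  attr: string
  default: unknown
}

function mergeToySpecs(
  specs: ToyNodeSpec[],
  attrExtensions: ToyAttrExtension[],
): Map<string, ToyNodeSpec> {
  const merged = new Map<string, ToyNodeSpec>()

  // Steps 1 and 2: collect specs and merge those sharing a name.
  // Later specs override fields from earlier ones.
  for (const spec of specs) {
    const prev = merged.get(spec.name)
    merged.set(spec.name, { ...prev, ...spec, attrs: { ...prev?.attrs, ...spec.attrs } })
  }

  // Step 3: attribute extensions modify specs that already exist.
  for (const ext of attrExtensions) {
    const spec = merged.get(ext.type)
    if (!spec) {
      throw new Error(`Node type "${ext.type}" must be defined before adding attribute "${ext.attr}"`)
    }
    merged.set(ext.type, {
      ...spec,
      attrs: { ...spec.attrs, [ext.attr]: { default: ext.default } },
    })
  }

  return merged
}

const mergedSpecs = mergeToySpecs(
  [{ name: 'paragraph', content: 'inline*', group: 'block' }],
  [
    { type: 'paragraph', attr: 'id', default: null },
    { type: 'paragraph', attr: 'align', default: 'left' },
  ],
)
```

After the call, the `paragraph` entry in `mergedSpecs` keeps its `content` and `group` and carries both the `id` and `align` attributes.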

Node Groups

Node groups allow content expressions to reference multiple node types:
// Define nodes in groups
const paragraph = defineNodeSpec({
  name: 'paragraph',
  group: 'block',
  // ...
})

const heading = defineNodeSpec({
  name: 'heading',
  group: 'block',
  // ...
})

// Reference the group in content expressions
const doc = defineNodeSpec({
  name: 'doc',
  topNode: true,
  content: 'block+',  // Allows any node in 'block' group
})
Common groups:
  • block: Block-level nodes (paragraph, heading, etc.)
  • inline: Inline nodes (text, image, etc.)
  • list: List nodes (bullet list, ordered list, etc.)
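One way to picture group resolution is as a lookup table from group name to member node types. The sketch below builds such a table from a list of specs. `NamedSpec` and `buildGroupMap` are hypothetical names, and the node names in the sample data are illustrative only.

```typescript
interface NamedSpec {
  name: string
  group?: string
}

// Builds a map from group name to the node types that belong to it.
function buildGroupMap(specs: NamedSpec[]): Map<string, string[]> {
  const groups = new Map<string, string[]>()
  for (const spec of specs) {
    // ProseMirror allows `group` to list several space-separated groups.
    for (const group of (spec.group ?? '').split(/\s+/).filter(Boolean)) {
      const members = groups.get(group) ?? []
      members.push(spec.name)
      groups.set(group, members)
    }
  }
  return groups
}

const groupMap = buildGroupMap([
  { name: 'doc' },
  { name: 'paragraph', group: 'block' },
  { name: 'heading', group: 'block' },
  { name: 'bulletList', group: 'block list' },
  { name: 'text', group: 'inline' },
  { name: 'image', group: 'inline' },
])
```

With this table, a content expression such as 'block+' accepts any of `paragraph`, `heading`, or `bulletList`, while `doc` belongs to no group.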

Schema Validation

ProseMirror validates documents against the schema:
const doc = editor.schema.node('doc', null, [
  editor.schema.node('paragraph', null, [
    editor.schema.text('Hello'),
  ]),
])

// Validate the document
doc.check() // Throws if invalid
Validation checks:
  • Content matches content expressions
  • Attributes have valid values
  • Marks are allowed on the nodes they’re applied to
  • Structure is internally consistent
Invalid documents can cause editor crashes. Always validate user-provided content before setting it.
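A defensive pattern for user-provided content is to build the document, check it, and fall back to a known-good document if either step throws. The helper below is a sketch: `buildCheckedDoc` is a hypothetical name, and the `Checkable` interface only assumes an object with a `check()` method that throws, as ProseMirror nodes have.

```typescript
// Anything with a check() method that throws when the value is invalid.
interface Checkable {
  check(): void
}

// Builds a document, validates it, and falls back if building or checking throws.
function buildCheckedDoc<T extends Checkable>(
  build: () => T,
  fallback: () => T,
): { doc: T; error: unknown } {
  try {
    const doc = build()
    doc.check()
    return { doc, error: null }
  } catch (error) {
    return { doc: fallback(), error }
  }
}

// Stand-in documents for demonstration purposes only.
const validDoc = { label: 'valid', check() {} }
const brokenDoc = {
  label: 'broken',
  check() {
    throw new RangeError('Invalid content')
  },
}
const emptyDoc = { label: 'empty', check() {} }

const accepted = buildCheckedDoc(() => validDoc, () => emptyDoc)
const rejected = buildCheckedDoc(() => brokenDoc, () => emptyDoc)
```

With a real editor, `build` could be something like `() => editor.schema.nodeFromJSON(json)`, and `fallback` could return an empty document created from the same schema.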

Working with Schemas

Accessing the Schema

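import { createEditor } from '@prosekit/core'

// `extension` is assumed to be defined elsewhere, for example with union()
const editor = createEditor({ extension })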

// Access schema from editor
const schema = editor.schema

// Access node types
const paragraphType = schema.nodes.paragraph
const headingType = schema.nodes.heading

// Access mark types
const boldType = schema.marks.bold
const linkType = schema.marks.link

Creating Nodes

const schema = editor.schema

// Create a paragraph
const para = schema.node('paragraph', null, [
  schema.text('Hello world'),
])

// Create a heading with attributes
const heading = schema.node('heading', { level: 1 }, [
  schema.text('Title'),
])

// Or use node actions
const para2 = editor.nodes.paragraph('Hello world')
const heading2 = editor.nodes.heading({ level: 1 }, 'Title')

Creating Marks

const schema = editor.schema

// Create a bold mark
const bold = schema.mark('bold')

// Create a link mark with attributes
const link = schema.mark('link', { href: 'https://example.com' })

// Apply to text
const text = schema.text('Hello', [bold, link])

// Or use mark actions
const [text2] = editor.marks.bold('Hello')

Schema Immutability

Schemas are immutable after editor creation. You cannot:
  • Add new node or mark types
  • Remove existing types
  • Change content expressions
  • Modify attributes
To change the schema, you must create a new editor:
// ❌ Bad - can't add nodes dynamically
const editor = createEditor({ extension: basicExtension })
editor.use(defineTable()) // Error! Schema cannot be changed

// ✅ Good - include all nodes upfront
const fullExtension = union(
  basicExtension,
  defineTable(),
)
const fullEditor = createEditor({ extension: fullExtension })

Common Schema Patterns

Minimal Document Schema

import { union, defineDoc, defineText, defineParagraph } from '@prosekit/core'

const extension = union(
  defineDoc(),         // Root node
  defineText(),        // Text node
  defineParagraph(),   // At least one block type
)

Rich Text Schema

import { union } from '@prosekit/core'
// The define* functions below are assumed to be in scope; their imports are omitted here.

const extension = union(
  // Structure
  defineDoc(),
  defineText(),
  defineParagraph(),
  defineHeading(),
  defineBlockquote(),
  defineCodeBlock(),
  defineHorizontalRule(),
  
  // Lists
  defineBulletList(),
  defineOrderedList(),
  defineListItem(),
  
  // Inline
  defineImage(),
  defineHardBreak(),
  
  // Marks
  defineBold(),
  defineItalic(),
  defineCode(),
  defineLink(),
  defineStrike(),
  defineUnderline(),
)

Nested Content Schema

const table = defineNodeSpec({
  name: 'table',
  content: 'tableRow+',
  group: 'block',
  // ...
})

const tableRow = defineNodeSpec({
  name: 'tableRow',
  content: 'tableCell+',
  // ...
})

const tableCell = defineNodeSpec({
  name: 'tableCell',
  content: 'block+',  // Cells contain block content
  // ...
})

Best Practices

  1. Always define doc and text: These are required for any schema
  2. Use content expressions carefully: Invalid expressions can break the editor
  3. Group related nodes: Use groups to simplify content expressions
  4. Validate attributes: Use the validate option to prevent invalid values
  5. Test schema changes: Schema changes can break existing documents
  6. Document your schema: Clearly document custom nodes and their purpose

Next Steps

  • Architecture: Understand ProseKit’s overall architecture
  • Extensions: Learn more about the extension system