
🛠 Throughout the tutorial, whenever you’re supposed to do something you will see a 🛠

Cloning the Repository

The rest of this tutorial requires that you have basic knowledge of using Git and running commands in a terminal/shell (using one of the major operating systems). Although you can author a new MyST feature on any (supported) operating system, we will assume that you are using a typical Linux distribution for simplicity.

🛠 Clone the mystmd repository

$ git clone https://github.com/jupyter-book/mystmd

This will populate a new mystmd directory in the working directory with the current checkout (snapshot) of the MyST source code. This checkout may include new features that have yet to be released to the public, or new bugs that have yet to be identified!

🛠 Change to the mystmd directory

$ cd mystmd

From this point in the tutorial, terminal sessions will show the current working directory before the $ prefix, excluding the path to the mystmd directory itself, for example:

$ echo MyST is cool!
MyST is cool!
$ cd packages
(packages)$ echo MyST is cool!
MyST is cool!

Defining a Role

The core specification for the MyST markup language is defined in the MyST spec. Most features in MyST should, over time, be incorporated into this specification so that consumers of MyST documents (such as myst-parser from the Jupyter Book software stack) agree on the manner in which the content should be parsed and rendered. The process of adding features to the MyST Spec is more formalized, and is described in the MyST Enhancement Proposals. This tutorial does not cover updating the MyST Spec.

We should begin by asking the question “What is a role?” The spec defines roles as:

similar to directives, but they are written entirely in one line.

One such role is the {underline} role, which can be used to format text:

We want to create a new word-count role that injects the total word count into a document. It should accept a format string that controls how the resulting text is presented. For example,

This is a lengthy document ...

{word-count}`The number of words in this document is {number} words`

should become

This is a lengthy document ...

The number of words in this document is 5 words
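
The substitution itself can be sketched as plain string replacement. This is only an illustration of the intended behavior; `formatWordCount` is a hypothetical helper name, and the transform written later in this tutorial performs the same replacement inline.

```typescript
// Hypothetical helper: fill the {number} placeholder of a format string.
function formatWordCount(template: string, wordTotal: number): string {
  // With a string pattern, replace() substitutes only the first match.
  return template.replace('{number}', String(wordTotal));
}

const formattedCount = formatWordCount(
  'The number of words in this document is {number} words',
  5,
);
console.log(formattedCount);
```

Because only the first `{number}` is substituted, a format string with two placeholders would keep the second one verbatim.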

Many of the “core” roles in mystmd (including {underline}) are implemented in the myst-roles package. Although a word-count role might not be considered a “core” feature, we will pretend it is for this tutorial. Let’s start by looking at the existing {abbreviation} role in packages/myst-roles/src/abbreviation.ts.

We can see that abbreviationRole is annotated with the type RoleSpec. This is the basic type of a role declaration defined by the MyST specification. There are a number of important fields, such as the name, alias, and body.

🛠 Add a new source file[1] word-count.ts in the myst-roles package, and write the following.

packages/myst-roles/src/word-count.ts
import type { RoleSpec, RoleData, GenericNode } from 'myst-common';

export const wordCountRole: RoleSpec = {
  name: 'word-count',
  body: {
    type: String,
    required: true,
  },
  run(data: RoleData): GenericNode[] {
    // TODO!
    return [];
  }
};

This defines a simple role called word-count, whose body (the raw text between the backticks in {word-count}`<BODY>`) is a string.

With an empty run function, this role doesn’t do anything! To determine what our run function should do, we must understand how MyST documents are built. MyST generates a website or article from a MyST Markdown file in roughly three phases, shown in the diagram below.

At the heart of MyST is the AST, defined by the MyST Specification, which serves as a structured representation of a document. Whilst directives, roles, and fragments of Markdown syntax are individually processed to build this AST, transforms are performed on the entire tree, i.e. over the entire document. As computing the word-count requires access to the entire document, it is clear that all of the logic of our new feature will need to be implemented as a transform. Therefore, our role definition will be very simple - generating a simple AST node that our yet-unwritten transform can later replace.

🛠 Modify word-count.ts

packages/myst-roles/src/word-count.ts
import type { RoleSpec, RoleData, GenericNode } from 'myst-common';

export const wordCountRole: RoleSpec = {
  name: 'word-count',
  body: {
    type: String,
    required: true,
  },
  run(data: RoleData): GenericNode[] {
    return [
        {
          type: 'wordCount',
          value: data.body as string
        }
    ];
  }
};
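
To see what this role produces without building MyST, here is a minimal, self-contained sketch. The `SketchRoleData` and `SketchNode` types are simplified stand-ins assumed for illustration; they are not the real `RoleData` and `GenericNode` types from myst-common.

```typescript
// Simplified stand-ins for the myst-common types (assumptions for illustration only).
type SketchNode = { type: string; value?: string };
type SketchRoleData = { name: string; body?: unknown };

// Mirrors the run function above: wrap the raw role body in a placeholder node.
function runWordCountRole(data: SketchRoleData): SketchNode[] {
  return [{ type: 'wordCount', value: data.body as string }];
}

const roleOutput = runWordCountRole({
  name: 'word-count',
  body: 'It is {number} words long',
});
console.log(JSON.stringify(roleOutput));
```

The role does no counting at all. It only records the format string on a placeholder node so that a later whole-document step can fill it in.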

Our new role is not yet ready to be used. We next need to tell MyST that it should be included in the main program.

🛠 Import the role in packages/myst-roles/src/index.ts

packages/myst-roles/src/index.ts
import { wordCountRole } from './word-count.js';

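Notice the .js extension instead of .ts. This is important: the import path must name the JavaScript file that TypeScript emits, because TypeScript does not rewrite import paths when it compiles ES modules.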

Next, we must instruct MyST to load our role when parsing a document.

🛠 Add wordCountRole to defaultRoles:

packages/myst-roles/src/index.ts
export const defaultRoles = [
  wordCountRole,
  abbreviationRole,
  chemRole,
  citeRole,
  ...
];

Finally, we should export the role from myst-roles, so that other packages may use it (should they need to!).

🛠 Add an export statement from the myst-roles package

packages/myst-roles/src/index.ts
export { wordCountRole } from './word-count.js';

In order to try out our new word-count role, we need to build the myst application.

🛠 Install packages, build and link myst. See Installing & Building MyST Locally for more details.

$ npm install
$ npm run build
$ npm run link

After running these steps, the MyST CLI (as described in Scientific Articles) can be used.

Create a Demo

With our custom role now included in a development build of MyST, we can see it in action. First, we’ll create a playground directory in which we can build a MyST project.

🛠 Create a new demo directory, outside of the mystmd source.

$ mkdir demo
$ cd demo

Then add a new file main.md to this directory, containing the following:

demo/main.md
# Demo

This document is not very long.
{word-count}`It is {number} words long`.

🛠 Initialize a simple MyST project with:

(demo)$ myst init

Once myst init has finished setting up the project, it will ask whether you want to run myst start. We don’t want to start the server now, as we are going to create a single build.

🛠 Press n to exit myst init[2]

Investigate the AST

Now we can run myst build in the demo/ directory to run MyST.

🛠 Build the AST for the entire project

(demo)$ myst build

MyST outputs the final AST in the _build/site/content directory. Running ls, we can see

(demo)$ ls _build/site/content
main.json

The contents of main.json are the MyST AST. We can pretty-print the file with the jq utility

(demo)$ jq . _build/site/content/main.json

The new wordCount node generated by our word-count role can clearly be seen:

{
    ...
    {
      "type": "wordCount",
      "value": "It is {number} words long"
    },
    ...
}

Writing a Transform

As discussed in Defining a Role, the logic of our word count feature needs to be implemented as a transform, so that we can view the entire document. Most transforms in MyST are defined in the myst-transforms package, such as the image alt text transform which generates an image alt text from figure captions. Our transform will need to visit every text-like node and perform a basic word-count.

Let’s make a start. First, we need to implement a function that accepts a MyST AST tree, and modifies it in-place.

🛠 Create a wordCountTransform in a new file packages/myst-transforms/src/word-count.ts

packages/myst-transforms/src/word-count.ts
import type { GenericParent } from 'myst-common';


export function wordCountTransform(tree: GenericParent) {

}

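The MyST AST is inspired by (and re-uses parts of) the MDAST specification for a Markdown abstract syntax tree. MDAST, like MyST, implements the unist specification, which has only three node types: Node (the base type, identified by its type field), Parent (a node that contains child nodes), and Literal (a node that holds a value).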

These nodes form the basic building blocks of any abstract syntax tree, and unist defines some utility functions to manipulate trees composed from them.

Given that we want to count meaningful words, we must look at the MyST specification to determine which nodes we need to look at. As MyST AST is a unist AST, and only Literal unist nodes can hold values, we can start by only considering Literal MyST nodes. The MyST specification contains a list of all node types, and it can be seen that there are only a few Literal types, such as Text or HTML.
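
As a rough illustration, the three unist node kinds can be modelled with a few TypeScript interfaces and type guards. The names below are prefixed with `Unist` to avoid clashing with built-in DOM types, and they are simplified assumptions for this sketch rather than the official typings.

```typescript
// Minimal models of the three unist node kinds (illustrative, not the official typings).
interface UnistNode { type: string }
interface UnistLiteral extends UnistNode { value: unknown }
interface UnistParent extends UnistNode { children: UnistNode[] }

// A Literal is recognised here by the presence of a value field.
function isLiteral(node: UnistNode): node is UnistLiteral {
  return 'value' in node;
}

// A Parent is recognised here by an array-valued children field.
function isParent(node: UnistNode): node is UnistParent {
  return Array.isArray((node as UnistParent).children);
}

const sampleText: UnistLiteral = { type: 'text', value: 'MyST is cool!' };
const sampleParagraph: UnistParent = { type: 'paragraph', children: [sampleText] };

console.log(isLiteral(sampleText), isParent(sampleParagraph));
```

Only nodes that pass a check like `isLiteral` carry text to count, which is why the transform can ignore container nodes such as paragraphs.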

To begin with, let’s count the words only in Text nodes. To do this, we’ll need to pull out a list of all of the Text nodes from the AST. We can use the unist-util-select package to find all nodes with a particular type.

🛠 Modify the transform to select all the 'text' nodes

packages/myst-transforms/src/word-count.ts
import { selectAll } from 'unist-util-select';
import type { GenericParent, GenericNode } from 'myst-common';


export function wordCountTransform(tree: GenericParent) {
    const textNodes = selectAll('text', tree) as GenericNode[];
}
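
For a plain type selector such as 'text', selection amounts to a depth-first walk that collects every node of the matching type. The sketch below is a hand-written stand-in and not the unist-util-select implementation, which supports richer CSS-like selectors; `collectByType` and `TreeNode` are hypothetical names, and the tree is hand-built rather than real MyST output.

```typescript
// Simplified node shape (an assumption for illustration).
type TreeNode = { type: string; value?: string; children?: TreeNode[] };

// Walk the tree depth-first and collect nodes whose type matches.
function collectByType(nodeType: string, root: TreeNode): TreeNode[] {
  const found: TreeNode[] = [];
  const visit = (node: TreeNode): void => {
    if (node.type === nodeType) found.push(node);
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return found;
}

const demoTree: TreeNode = {
  type: 'root',
  children: [
    { type: 'heading', children: [{ type: 'text', value: 'Demo' }] },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'This document is not very long.\n' },
        { type: 'wordCount', value: 'It is {number} words long' },
        { type: 'text', value: '.' },
      ],
    },
  ],
};

const collectedTextNodes = collectByType('text', demoTree);
console.log(collectedTextNodes.map((node) => node.value));
```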

It’s conventional to also define a “plugin” that makes it easy to include this transform in a suite of transformations.

🛠 Export a plugin to invoke the transform

import type { Plugin } from 'unified';
import { selectAll } from 'unist-util-select';
import type { GenericParent, GenericNode } from 'myst-common';


export function wordCountTransform(tree: GenericParent) {
    const textNodes = selectAll('text', tree) as GenericNode[];
}

export const wordCountPlugin: Plugin<[], GenericParent, GenericParent> =
  () => (tree) => {
    wordCountTransform(tree);
  };
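
The shape of such a plugin, a function that returns the transformer to run on the tree, can be illustrated without any dependencies. The following is a simplified model of how a pipeline might call plugins in order; `runPipeline` and the `Pipeline*` types are hypothetical, and the real unified processor does considerably more (plugin options and asynchronous transformers, for example).

```typescript
// Simplified shapes (assumptions for illustration; the real unified types are richer).
type PipelineTree = { type: string; children: { type: string }[] };
type PipelineTransformer = (tree: PipelineTree) => void;
type PipelinePlugin = () => PipelineTransformer;

// Each plugin returns a transformer that mutates the tree in place.
const markFirstPlugin: PipelinePlugin = () => (tree) => {
  tree.children.push({ type: 'first' });
};
const markSecondPlugin: PipelinePlugin = () => (tree) => {
  tree.children.push({ type: 'second' });
};

// Call each plugin once to obtain its transformer, then apply it to the tree.
function runPipeline(tree: PipelineTree, plugins: PipelinePlugin[]): PipelineTree {
  for (const plugin of plugins) {
    const transformer = plugin();
    transformer(tree);
  }
  return tree;
}

const pipelineResult = runPipeline({ type: 'root', children: [] }, [
  markFirstPlugin,
  markSecondPlugin,
]);
console.log(pipelineResult.children.map((child) => child.type));
```

Because transformers run in the order in which their plugins were registered, a transform that depends on the output of another must be added after it.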

Now that we have the text nodes, let’s split the text of each one on spaces, and count the total number of words.

🛠 Implement the logic to count the words

packages/myst-transforms/src/word-count.ts
import type { Plugin } from 'unified';
import { selectAll } from 'unist-util-select';
import type { GenericParent, GenericNode } from 'myst-common';


export function wordCountTransform(tree: GenericParent) {
  const textNodes = selectAll('text', tree) as GenericNode[];
  const numWords = textNodes
    // Split by space
    .map(node => (node.value as string).split(" "))
    // Filter punctuation-only words
    .map(words => words.filter(word => word.match(/[^.,:!?]/)))
    // Count words in each `Text` node
    .map(words => words.length)
    // Sum together the counts
    .reduce(
      (total, value) => total + value,
      0
    );
}

export const wordCountPlugin: Plugin<[], GenericParent, GenericParent> =
  () => (tree) => {
    wordCountTransform(tree);
  };
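
Note that split(" ") only splits on single space characters, so two words separated by a newline or a tab are counted as one. If that matters for your documents, one possible variant splits on any run of whitespace instead. This is a sketch; `countWords` is a hypothetical helper and is not part of the tutorial's code.

```typescript
// Hypothetical helper: count words separated by any whitespace.
function countWords(text: string): number {
  return text
    // Split on runs of spaces, tabs, and newlines
    .split(/\s+/)
    // Drop empty strings and punctuation-only tokens
    .filter((word) => /[^.,:!?]/.test(word))
    .length;
}

console.log(countWords('This document is not very long.\nIt is'));
```

The filter keeps any token that contains at least one character other than the listed punctuation marks, which also discards the empty strings that split produces for leading or trailing whitespace.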

Having computed the total number of words, let’s replace our word-count nodes with text formatted with this value. We can find these nodes in the same way that we selected all Text nodes.

🛠 Select all nodes with a ‘type’ of wordCount

packages/myst-transforms/src/word-count.ts
import type { Plugin } from 'unified';
import { selectAll } from 'unist-util-select';
import type { GenericParent, GenericNode } from 'myst-common';


export function wordCountTransform(tree: GenericParent) {
  const textNodes = selectAll('text', tree) as GenericNode[];
  const numWords = textNodes
    // Split by space
    .map(node => (node.value as string).split(" "))
    // Filter punctuation-only words
    .map(words => words.filter(word => word.match(/[^.,:!?]/)))
    // Count words in each `Text` node
    .map(words => words.length)
    // Sum together the counts
    .reduce(
      (total, value) => total + value,
      0
    );

  const countNodes = selectAll('wordCount', tree) as GenericParent[];
}

export const wordCountPlugin: Plugin<[], GenericParent, GenericParent> =
  () => (tree) => {
    wordCountTransform(tree);
  };

Now we can use the value attribute of each wordCount node to turn it into a node of a known type: a text node.

🛠 Change the node to text and replace {number} with the word count

packages/myst-transforms/src/word-count.ts
import type { Plugin } from 'unified';
import { selectAll } from 'unist-util-select';
import type { GenericParent, GenericNode } from 'myst-common';


export function wordCountTransform(tree: GenericParent) {
  const textNodes = selectAll('text', tree) as GenericNode[];
  const numWords = textNodes
    // Split by space
    .map(node => (node.value as string).split(" "))
    // Filter punctuation-only words
    .map(words => words.filter(word => word.match(/[^.,:!?]/)))
    // Count words in each `Text` node
    .map(words => words.length)
    // Sum together the counts
    .reduce(
      (total, value) => total + value,
      0
    );

  const countNodes = selectAll('wordCount', tree) as GenericParent[];
  countNodes.forEach(node => {
    // Change the node type to text
    node.type = 'text';
    // Replace the number with the word count
    node.value = (node.value as string).replace('{number}', `${numWords}`);
  });
}

export const wordCountPlugin: Plugin<[], GenericParent, GenericParent> =
  () => (tree) => {
    wordCountTransform(tree);
  };
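
To check the logic without building MyST, the same counting and replacement steps can be run against a small hand-built tree. This is a dependency-free sketch: `nodesOfType` is a hand-written stand-in for selectAll, `DemoNode` is a simplified node shape, and the tree only approximates what MyST would produce for the demo document.

```typescript
// Simplified node shape (an assumption for illustration, not the myst-common types).
type DemoNode = { type: string; value?: string; children?: DemoNode[] };

// Hand-written stand-in for selectAll with a plain type selector.
function nodesOfType(nodeType: string, root: DemoNode): DemoNode[] {
  const found: DemoNode[] = [];
  const visit = (node: DemoNode): void => {
    if (node.type === nodeType) found.push(node);
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return found;
}

// Same counting and replacement logic as wordCountTransform above.
function sketchWordCountTransform(tree: DemoNode): void {
  const numWords = nodesOfType('text', tree)
    .map((node) => (node.value ?? '').split(' '))
    .map((words) => words.filter((word) => /[^.,:!?]/.test(word)).length)
    .reduce((total, value) => total + value, 0);

  nodesOfType('wordCount', tree).forEach((node) => {
    node.type = 'text';
    node.value = (node.value ?? '').replace('{number}', `${numWords}`);
  });
}

const placeholderNode: DemoNode = { type: 'wordCount', value: 'It is {number} words long' };
const handBuiltTree: DemoNode = {
  type: 'root',
  children: [
    { type: 'heading', children: [{ type: 'text', value: 'Demo' }] },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'This document is not very long.\n' },
        placeholderNode,
        { type: 'text', value: '.' },
      ],
    },
  ],
};

sketchWordCountTransform(handBuiltTree);
console.log(placeholderNode.value);
```

For this tree the placeholder becomes "It is 7 words long": one word from the heading plus six from the first sentence. The trailing "." is filtered out as punctuation, and the role's own text is not counted because the words are totalled before the placeholder is converted into a text node.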

This pattern, of mutating existing nodes using a transform, is commonly used in the MyST ecosystem. If you want to keep information about the node’s source, for example, to style it differently, you could store a flag on data or kind. If you are introducing a new node into the final rendered document, that will require you to add a new renderer. By staying within the existing node types, all existing renderers will work out of the box.

If we rebuilt MyST now with npm run build, we would find that this transform never runs. As with our earlier word-count role, we need to include the wordCountTransform in the set of MyST transforms that run during building.

🛠 Add a new export line in packages/myst-transforms/src/index.ts

packages/myst-transforms/src/index.ts
export { wordCountTransform, wordCountPlugin } from './word-count.js';

Then we must use this transform in the myst-cli package, which contains much of the myst build logic.

🛠 Import the wordCountPlugin in packages/myst-cli/src/process/mdast.ts

packages/myst-cli/src/process/mdast.ts
import {
  ...,
  wordCountPlugin,
} from 'myst-transforms';

Finally, we’ll use this plugin as part of the MyST transformations in the same file

🛠 Add the wordCountPlugin to the unified pipe of transformations

packages/myst-cli/src/process/mdast.ts
export async function transformMdast(...) {
  ...
  const pipe = unified()
    .use(...)
    .use(wordCountPlugin);
}

We have now modified all of the source files required to implement our word count feature.

🛠 Run npm run build to build the myst package

Now let’s see what happens over in our demo!

🛠 In your demo directory, run myst start

(demo)$ myst start

This will result in the following page, with a word count that excludes its own text!

The result of running myst start with support for our new word-count role in our document.


Contributing

The next steps to bring this into being a core feature would be adding documentation and running npm run changeset to add a description of what you have completed. You can then open a pull request, and the developers of MyST will aim to get this into MyST and released so everyone can use it!

A full, unmerged pull request of this feature is available in #1027 to see the end-result.

Thanks for your contributions! 🥳

Footnotes
  1. Source files are files that are added under the src/ directory of a package, e.g. packages/myst-roles/src/abbreviation.ts

  2. If you have already started the server, use Ctrl-C to kill the process.
