Commonmarker
Ruby wrapper for Rust’s comrak crate.
It passes all of the CommonMark test suite, and is therefore spec-complete. It also includes extensions to the CommonMark spec as documented in the GitHub Flavored Markdown spec, such as support for tables, strikethroughs, and autolinking.
> [!NOTE]
> By default, several extensions not in any spec have been enabled, for the sake of end user convenience when generating HTML.
>
> For more information on the available options and extensions, see the documentation below.
Installation
Add this line to your application’s Gemfile:
gem 'commonmarker'
And then execute:
$ bundle
Or install it yourself as:
$ gem install commonmarker
Usage
This gem expects to receive UTF-8 strings. Ensure your strings are the right encoding before passing them into Commonmarker.
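One way to meet this requirement is to normalize input before handing it to the gem. The sketch below uses only Ruby's standard String API. Note that ensure_utf8 is a hypothetical helper, not part of Commonmarker, and it assumes that binary-tagged input already holds UTF-8 bytes:

```ruby
# Hypothetical helper (not part of Commonmarker): return a valid UTF-8 copy of text.
def ensure_utf8(text)
  utf8 =
    if text.encoding == Encoding::BINARY
      # Assumption: raw bytes (e.g. read from a socket) already hold UTF-8 data.
      text.dup.force_encoding(Encoding::UTF_8)
    else
      # Transcode from the declared encoding; this is a plain copy for UTF-8 input.
      text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
    end
  # Replace any remaining invalid byte sequences with U+FFFD.
  utf8.scrub
end

latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)
clean = ensure_utf8(latin1)
puts clean.encoding        # => UTF-8
puts clean.valid_encoding? # => true
```

The result of ensure_utf8 can then be passed to Commonmarker as usual. Whether to replace invalid bytes or raise an error instead is a design choice that depends on where your input comes from.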
Converting to HTML
Call Commonmarker.to_html with a string to convert it to HTML:
    require 'commonmarker'

    Commonmarker.to_html('"Hi *there*"', options: { parse: { smart: true } })
    # => <p>“Hi <em>there</em>”</p>\n
(The second argument is optional; see below for more information.)
Generating a document
You can also parse a string to receive a :document node. You can then print that node to HTML, iterate over the children, and do other fun node stuff. For example:
    require 'commonmarker'

    doc = Commonmarker.parse("*Hello* world", options: { parse: { smart: true } })
    puts(doc.to_html) # => <p><em>Hello</em> world</p>\n

    doc.walk do |node|
      puts node.type # => [:document, :paragraph, :emph, :text, :text]
    end
(The second argument is optional; see below for more information.)
When it comes to modifying the document, you can perform the following operations:
insert_before
insert_after
prepend_child
append_child
delete
You can also get the source position of a node by calling source_position:
    doc = Commonmarker.parse("*Hello* world")
    puts doc.first_child.first_child.source_position
    # => {:start_line=>1, :start_column=>1, :end_line=>1, :end_column=>7}
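A position hash of this shape can be mapped back onto the original text. The following is a minimal sketch that needs only the Ruby standard library. Note that source_slice is a hypothetical helper, not part of Commonmarker. It assumes lines and columns are 1-based and inclusive (as the example above suggests) and counts columns in characters, so it is only reliable for ASCII input if the parser counts bytes:

```ruby
# Hypothetical helper: cut the span described by a position hash out of the
# original Markdown. Assumes 1-based, inclusive lines and columns.
def source_slice(markdown, pos)
  lines = markdown.lines
  selected = lines[(pos[:start_line] - 1)..(pos[:end_line] - 1)]
  return "" if selected.nil? || selected.empty?

  if selected.length == 1
    # Span starts and ends on the same line.
    selected[0][(pos[:start_column] - 1)..(pos[:end_column] - 1)].to_s
  else
    # Tail of the first line, any whole middle lines, head of the last line.
    first = selected.first[(pos[:start_column] - 1)..].to_s
    last = selected.last[0..(pos[:end_column] - 1)].to_s
    ([first] + selected[1...-1] + [last]).join
  end
end

markdown = "*Hello* world"
position = { start_line: 1, start_column: 1, end_line: 1, end_column: 7 }
puts source_slice(markdown, position) # => *Hello*
```

This kind of lookup can be handy for error messages or editor integrations that need to point at the Markdown that produced a given node.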
You can also modify the following attributes:
url
title
header_level
list_type
list_start
list_tight
fence_info
Example: Walking the AST
You can use walk or each to iterate over nodes:
walk will iterate on a node and recursively iterate on a node’s children.
each will iterate on a node’s direct children, but no further.
    require 'commonmarker'

    # parse some string
    doc = Commonmarker.parse("# The site\n\n [GitHub](https://www.github.com)")

    # Walk tree and print out URLs for links
    doc.walk do |node|
      if node.type == :link
        printf("URL = %s\n", node.url)
      end
    end
    # => URL = https://www.github.com

    # Transform links to regular text
    doc.walk do |node|
      if node.type == :link
        node.insert_before(node.first_child)
        node.delete
      end
    end
    # => <h1><a href="#the-site"></a>The site</h1>\n<p>GitHub</p>\n
Example: Converting a document back into raw CommonMark
You can use to_commonmark on a node to render it as raw text:
    require 'commonmarker'

    # parse some string
    doc = Commonmarker.parse("# The site\n\n [GitHub](https://www.github.com)")

    # Transform links to regular text
    doc.walk do |node|
      if node.type == :link
        node.insert_before(node.first_child)
        node.delete
      end
    end

    doc.to_commonmark
    # => # The site\n\nGitHub\n
Options and plugins
Options
Commonmarker accepts the same parse, render, and extension options that comrak does, as a hash with symbol keys:
Commonmarker.to_html('"Hi *there*"', options: { parse: { smart: true }, render: { hardbreaks: false } })
Note that comrak distinguishes between “parse” options and “render” options, which are listed in the tables below. To disable any non-boolean option, pass in nil.
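If you pass the same options in many places, it can help to build the hash once and override it per call. The sketch below is plain Ruby and does not call the gem. BASE_OPTIONS and commonmarker_options are hypothetical names, the option keys come from the tables in this document, and the "user-content-" value is purely illustrative:

```ruby
# Hypothetical application-wide defaults. Keys are taken from the option
# tables in this README; the values are only illustrative.
BASE_OPTIONS = {
  parse: { smart: true },
  render: { hardbreaks: false },
  extension: { footnotes: true, header_ids: "user-content-" },
}.freeze

# Merge per-call overrides into the defaults, one option group at a time,
# without mutating BASE_OPTIONS.
def commonmarker_options(overrides = {})
  BASE_OPTIONS.merge(overrides) do |_group, defaults, custom|
    defaults.merge(custom)
  end
end

# Passing nil switches off a non-boolean option such as header_ids.
options = commonmarker_options(extension: { header_ids: nil })
p options[:extension][:footnotes]  # => true
p options[:extension][:header_ids] # => nil

# The resulting hash is what you would hand to the gem, for example:
#   Commonmarker.to_html(text, options: options)
```

Merging group by group keeps unrelated defaults (here, footnotes) intact when a single key is overridden.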
Parse options
Name | Description | Default |
---|---|---|
smart | Punctuation (quotes, full-stops and hyphens) is converted into ‘smart’ punctuation. | false |
default_info_string | The default info string for fenced code blocks. | "" |
relaxed_tasklist_matching | Enables relaxing of the tasklist extension matching, allowing any non-space to be used for the “checked” state instead of only x and X. | false |
relaxed_autolinks | Enables relaxing of the autolink extension parsing, allowing links to be recognized when in brackets, as well as permitting any URL scheme. | false |
Render options
Name | Description | Default |
---|---|---|
hardbreaks | Soft line breaks translate into hard line breaks. | true |
github_pre_lang | GitHub-style <pre lang="xyz"> is used for fenced code blocks with info tags. | true |
full_info_string | Gives info string data after a space in a data-meta attribute on code blocks. | false |
width | The wrap column when outputting CommonMark. | 80 |
unsafe | Allow rendering of raw HTML and potentially dangerous links. | false |
escape | Escape raw HTML instead of clobbering it. | false |
sourcepos | Include source position attribute in HTML and XML output. | false |
escaped_char_spans | Wrap escaped characters in span tags. | true |
ignore_setext | Ignores setext-style headings. | false |
ignore_empty_links | Ignores empty links, leaving the Markdown text in place. | false |
gfm_quirks | Outputs HTML with GFM-style quirks; namely, not nesting <strong> inlines. | false |
prefer_fenced | Always output fenced code blocks, even where an indented one could be used. | false |
tasklist_classes | Add CSS classes to the HTML output of the tasklist extension. | false |
As well, there are several extensions which you can toggle in the same manner:
Commonmarker.to_html('"Hi *there*"', options: { extension: { footnotes: true, description_lists: true }, render: { hardbreaks: false } })
Extension options
Name | Description | Default |
---|---|---|
strikethrough | Enables the strikethrough extension from the GFM spec. | true |
tagfilter | Enables the tagfilter extension from the GFM spec. | true |
table | Enables the table extension from the GFM spec. | true |
autolink | Enables the autolink extension from the GFM spec. | true |
tasklist | Enables the task list extension from the GFM spec. | true |
superscript | Enables the superscript Comrak extension. | false |
header_ids | Enables the header IDs Comrak extension. | "" |
footnotes | Enables the footnotes extension per cmark-gfm. | false |
description_lists | Enables the description lists extension. | false |
front_matter_delimiter | Enables the front matter extension. | "" |
multiline_block_quotes | Enables the multiline block quotes extension. | false |
math_dollars, math_code | Enables the math extension. | false |
shortcodes | Enables the shortcodes extension. | true |
wikilinks_title_before_pipe | Enables the wikilinks extension, placing the title before the dividing pipe. | false |
wikilinks_title_after_pipe | Enables the wikilinks extension, placing the title after the dividing pipe. | false |
underline | Enables the underline extension. | false |
spoiler | Enables the spoiler extension. | false |
greentext | Enables the greentext extension. | false |
subscript | Enables the subscript extension. | false |
alerts | Enables the alerts extension. | false |
For more information on these options, see the comrak documentation.
Plugins
In addition to the possibilities provided by generic CommonMark rendering, Commonmarker also supports plugins as a means of providing further niceties.
Syntax Highlighter Plugin
The library comes with a set of pre-existing themes for highlighting code:
"base16-ocean.dark"
"base16-eighties.dark"
"base16-mocha.dark"
"base16-ocean.light"
"InspiredGitHub"
"Solarized (dark)"
"Solarized (light)"
    code = <<~CODE
      ```ruby
      def hello
        puts "hello"
      end
      ```
    CODE

    # pass in a theme name from a pre-existing set
    puts Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "InspiredGitHub" } })

    # <pre lang="ruby"><code>
    # <span>def </span><span>hello
    # </span><span>puts </span><span>"hello"
    # </span><span>end
    # </span>
    # </code></pre>
By default, the plugin uses the "base16-ocean.dark" theme to syntax highlight code.
To disable this plugin, set the value to nil:
    code = <<~CODE
      ```ruby
      def hello
        puts "hello"
      end
      ```
    CODE

    Commonmarker.to_html(code, plugins: { syntax_highlighter: nil })

    # <pre lang="ruby"><code>def hello
    #   puts "hello"
    # end
    # </code></pre>
To output CSS classes instead of style attributes, set the theme key to "":
    code = <<~CODE
      ```ruby
      def hello
        puts "hello"
      end
      ```
    CODE

    Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "" } })

    # <pre class="syntax-highlighting"><code><span class="source ruby"><span class="meta function ruby"><span class="keyword control def ruby">def</span></span><span class="meta function ruby">
    # <span class="entity name function ruby">hello</span></span>
    # <span class="support function builtin ruby">puts</span> <span class="string quoted double ruby"><span class="punctuation definition string begin ruby">"</span>hello<span class="punctuation definition string end ruby">"</span></span>
    # <span class="keyword control ruby">end</span>\n</span></code></pre>
To use a custom theme, you can provide a path to a directory containing .tmtheme files to load:
Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "Monokai", path: "./themes" } })
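Before pointing the plugin at a directory, it can be useful to check which theme files it actually contains. The sketch below uses only the Ruby standard library. Note that theme_names is a hypothetical helper, not part of Commonmarker, and it assumes that a theme is referred to by its file name without the extension (as "Monokai" in the example above suggests):

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: list candidate theme names in a directory.
# Assumption: a theme's name is its file name minus the .tmtheme extension
# (matched case-insensitively, so Monokai.tmTheme also counts).
def theme_names(path)
  Dir.children(path)
     .select { |file| File.extname(file).casecmp?(".tmtheme") }
     .map { |file| File.basename(file, ".*") }
     .sort
end

# Demonstration with a throwaway directory standing in for ./themes.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "Monokai.tmTheme"))
  FileUtils.touch(File.join(dir, "notes.txt"))
  p theme_names(dir) # => ["Monokai"]
end
```

A check like this lets you fail early with a clear message when the requested theme is missing from the directory.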
Output formats
Commonmarker can currently only generate output in one format: HTML.
HTML
puts Commonmarker.to_html('*Hello* world!') # <p><em>Hello</em> world!</p>
Developing locally
After cloning the repo:
    script/bootstrap
    bundle exec rake compile
If there were no errors, you’re done! Otherwise, make sure to follow the comrak dependency instructions.
Benchmarks
    ❯ bundle exec rake benchmark
    input size = 11064832 bytes

    ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
    Warming up --------------------------------------
    Markly.render_html             1.000 i/100ms
    Markly::Node#to_html           1.000 i/100ms
    Commonmarker.to_html           1.000 i/100ms
    Commonmarker::Node.to_html     1.000 i/100ms
    Kramdown::Document#to_html     1.000 i/100ms
    Calculating -------------------------------------
    Markly.render_html             15.606 (±25.6%) i/s - 71.000 in 5.047132s
    Markly::Node#to_html           15.692 (±25.5%) i/s - 72.000 in 5.095810s
    Commonmarker.to_html           4.482 (± 0.0%) i/s - 23.000 in 5.137680s
    Commonmarker::Node.to_html     5.092 (±19.6%) i/s - 25.000 in 5.072220s
    Kramdown::Document#to_html     0.379 (± 0.0%) i/s - 2.000 in 5.277770s

    Comparison:
    Markly::Node#to_html:          15.7 i/s
    Markly.render_html:            15.6 i/s - same-ish: difference falls within error
    Commonmarker::Node.to_html:    5.1 i/s - 3.08x slower
    Commonmarker.to_html:          4.5 i/s - 3.50x slower
    Kramdown::Document#to_html:    0.4 i/s - 41.40x slower