Markdown parser, done right. 100% CommonMark support, extensions, syntax plugins & high speed
?
. I can't say right now why it's missed.
? is a bad idea; it's not used. But you can modify it for your needs.
I've written this test, but even after upgrading linkify-it it still fails:
should not terminate on slash underscore
.
https://www.riotinto.com/documents/_Iron%20Ore/Port_Handbook_Final_-_July_2016(2).pdf
.
<p><a href="https://www.riotinto.com/documents/_Iron%20Ore/Port_Handbook_Final_-_July_2016(2).pdf">https://www.riotinto.com/documents/_Iron%20Ore/Port_Handbook_Final_-_July_2016(2).pdf</a></p>
.
Any pointers to where in the library I should start looking into this?
Does anyone know of a quick markdown pre-parser that would split a large document into chunks that can each be rendered on their own (for perf reasons)? I.e., something less naive than splitting on newlines, because that would split lists/code blocks midway. Could markdown-it be repurposed to do this?
The idea is to lazy-render in chunks and render the chunks to the DOM continually in separate frames. It's also because I'm using the mark.js lib, which takes hundreds of ms to highlight a large DOM tree, so the idea is to split the markdown document into chunks and then render and highlight each one separately.
block: true, level: 0 (if self-closing, else level: -1) or sth like that. Are there going to be issues with this approach? If I pass the same env object to all the split token arrays, there shouldn't be a problem?
{ level: 0, nesting: -1 } (nesting 0 pretty much happens only for hr, so let's disregard it). That said, this approach works only if the document isn't one (or a few) huge level-0 element, such as a list or code block, in which case we'd need to split those big blocks up. Anyway, I ended up not doing this at all, so I'm not aware whether this approach has any unintended consequences. Thanks @puzrin for your time.
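For anyone who wants to try the splitting idea anyway, here's a rough sketch of it, not something that was actually run against markdown-it in this thread. `splitTopLevelChunks` is a made-up name, and the tokens below are hand-written plain objects standing in for parser output (only `level` and `nesting` are read). One assumption that differs from the message above: the sketch also ends a chunk on level-0 tokens with nesting 0, since a standalone top-level token such as hr has no closing token to wait for.

```javascript
// Sketch: split a flat markdown-it-style token array into top-level chunks.
// Only `level` and `nesting` are read, so plain objects work for illustration.
function splitTopLevelChunks(tokens) {
  const chunks = [];
  let current = [];
  for (const token of tokens) {
    current.push(token);
    // A top-level block ends at a level-0 closing token (nesting -1) or at a
    // level-0 standalone token (nesting 0, e.g. hr).
    if (token.level === 0 && token.nesting <= 0) {
      chunks.push(current);
      current = [];
    }
  }
  // Keep any trailing tokens rather than dropping them.
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Hand-written tokens standing in for parser output (assumed shapes).
const tokens = [
  { type: "paragraph_open", level: 0, nesting: 1 },
  { type: "inline", level: 1, nesting: 0 },
  { type: "paragraph_close", level: 0, nesting: -1 },
  { type: "hr", level: 0, nesting: 0 },
  { type: "bullet_list_open", level: 0, nesting: 1 },
  { type: "list_item_open", level: 1, nesting: 1 },
  { type: "paragraph_open", level: 2, nesting: 1 },
  { type: "inline", level: 3, nesting: 0 },
  { type: "paragraph_close", level: 2, nesting: -1 },
  { type: "list_item_close", level: 1, nesting: -1 },
  { type: "bullet_list_close", level: 0, nesting: -1 },
];

const chunks = splitTopLevelChunks(tokens);
console.log(chunks.map((chunk) => chunk.length));
```

With real tokens, each chunk could presumably be passed to md.renderer.render(chunk, md.options, env) with the same env object, but as noted above a single huge list or code block would still end up as one chunk.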
strong and em marks in different places of a word :p). I'm sure there would be cases where a block image wouldn't work, such as in tables or inside links, so perhaps force a block-level image only if it's valid?
image output block-level tokens at all?
Yeah, I ended up doing just that, and implemented the two major cases where the image is separated from a paragraph by a single newline ([ image, softbreak, ...inline-content ] and [ ...inline-content, softbreak, image ]). Doing this makes it non-CM-compliant, but we weren't CM-compliant to begin with (using softbreaks).
For anyone interested, here's a partial implem:
// Assumes `md` is the markdown-it instance and `state` is the core-rule state,
// with this code running after the "inline" rule has filled in token.children.
function createToken(type = "inline", tag = "", nesting = 0, level = 0, children) {
  const token = new md.core.State.prototype.Token(type, tag, nesting);
  token.level = level;
  if (children) token.children = children;
  return token;
}

let skipNext = false;
state.tokens = state.tokens.reduce((acc, token, idx, tokens) => {
  // The paragraph_close of a split paragraph was already emitted; skip it.
  if (skipNext) {
    skipNext = false;
    return acc;
  }
  const open = tokens[idx - 1];
  const close = tokens[idx + 1];
  if (
    token.type === "inline" &&
    open && open.type === "paragraph_open" &&
    close && close.type === "paragraph_close" &&
    token.children.length > 2
  ) {
    const children = token.children;
    const last = children.length - 1;
    // case: [ image, softbreak, ...inline-content ]
    if (children[0].type === "image" && children[1].type === "softbreak") {
      skipNext = true;
      const imageToken = children[0];
      token.children = children.slice(2);
      // Drop the paragraph_open already in acc, emit the image in its own
      // paragraph, then re-emit the original paragraph with the remaining content.
      // New paragraph tokens take the paragraph's level, the inline token its own.
      return acc.slice(0, -1).concat([
        createToken("paragraph_open", "p", 1, open.level),
        createToken("inline", "", 0, token.level, [imageToken]),
        createToken("paragraph_close", "p", -1, open.level),
        open,
        token,
        close,
      ]);
    }
    // case: [ ...inline-content, softbreak, image ]
    if (children[last].type === "image" && children[last - 1].type === "softbreak") {
      skipNext = true;
      const imageToken = children[last];
      token.children = children.slice(0, -2);
      // Close the original paragraph first, then emit the image in its own paragraph.
      return acc.concat([
        token,
        close,
        createToken("paragraph_open", "p", 1, open.level),
        createToken("inline", "", 0, token.level, [imageToken]),
        createToken("paragraph_close", "p", -1, open.level),
      ]);
    }
  }
  acc.push(token);
  return acc;
}, []);
^Note:
and put this in aside element
I came across the ParseDown PHP package on Packagist. It converted every tab to spaces and the alignment was correct.
I have been using the marked NPM package for parsing markdown. I found that it converts each tab character to 4 spaces every time, irrespective of how many characters already occupy the current tab block. This breaks the alignment of the parsed markdown.
Because of this, I had to write custom code to convert tabs to the appropriate number of spaces before parsing with marked.
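The custom code isn't shown here, so below is only a minimal sketch of how such a pre-pass could look, assuming tab stops every 4 columns and one column per character. `expandTabs` and `tabSize` are made-up names for illustration and are not part of marked.

```javascript
// Sketch: expand each tab to the next tab stop instead of a fixed 4 spaces.
// Assumes every character occupies exactly one column (no wide characters).
function expandTabs(text, tabSize = 4) {
  return text
    .split("\n")
    .map((line) => {
      let out = "";
      let col = 0; // current column within this line
      for (const ch of line) {
        if (ch === "\t") {
          // Pad up to the next multiple of tabSize.
          const pad = tabSize - (col % tabSize);
          out += " ".repeat(pad);
          col += pad;
        } else {
          out += ch;
          col += 1;
        }
      }
      return out;
    })
    .join("\n");
}

// "a" fills 1 column, so the tab adds 3 spaces; "abcd" fills the block, so it adds 4.
const expanded = expandTabs("a\tb\nabcd\te");
console.log(JSON.stringify(expanded));
```

The result would then be handed to the parser in place of the raw text.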