Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List items wrapped in <p> tags due to trailing space #114

Open
4 tasks done
scottf3 opened this issue Jul 19, 2022 · 3 comments
Open
4 tasks done

List items wrapped in <p> tags due to trailing space #114

scottf3 opened this issue Jul 19, 2022 · 3 comments
Labels
👍 phase/yes Post is accepted and can be worked on 🙆 yes/confirmed This is confirmed and ready to be worked on

Comments

@scottf3
Copy link

scottf3 commented Jul 19, 2022

Initial checklist

Affected packages and versions

3

Link to runnable example

No response

Steps to reproduce

In chrome's console run:

const mm = await import('https://esm.sh/micromark@3?bundle');
console.log(mm.micromark('List1\n* item1\n* item2\n\n\n\n'));
console.log('------');
console.log(mm.micromark('List1\n* item1\n* item2\n\n\n \n'));

Note the only difference between the two examples is a single space some blank lines away from the list. Those two examples return different html, the latter has the list elements wrapped in <p>

<p>List1</p>
<ul>
<li>item1</li>
<li>item2</li>
</ul>
------
<p>List1</p>
<ul>
<li>
<p>item1</p>
</li>
<li>
<p>item2</p>
</li>
</ul>

Expected behavior

I'm not clear enough on the markdown spec to say which case is actually correct. Certainly other markdown parsers I've tried (though that is not a long list) render it like the first example.

Regardless I'd expect it to be the same between the two. In most markdown editors the trailing space is impossible to see and it can take a long time to track down why some list elements render with increased padding.

Actual behavior

See repro steps. Two examples output visually different HTML whereas I feel they should render the same.

Runtime

No response

Package manager

No response

OS

No response

Build and bundle tools

No response

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Jul 19, 2022
@wooorm
Copy link
Member

wooorm commented Jul 19, 2022

Thanks for raising this issue.
I am working on a port of micromark in another language, which improves a couple of architectural things, likely also including this problem.
It's going to take a bit but I am planning to port those back.
In the meantime, perhaps you can find out how to fix this here now.

@wooorm wooorm added the 🙆 yes/confirmed This is confirmed and ready to be worked on label Jul 19, 2022
@github-actions github-actions bot added 👍 phase/yes Post is accepted and can be worked on and removed 🤞 phase/open Post is being triaged manually labels Jul 19, 2022
@github-actions
Copy link

Hi! This was marked as ready to be worked on! Note that while this is ready to be worked on, nothing is said about priority: it may take a while for this to be solved.

Is this something you can and want to work on?

Team: please use the area/* (to describe the scope of the change), platform/* (if this is related to a specific one), and semver/* and type/* labels to annotate this. If this is first-timers friendly, add good first issue and if this could use help, add help wanted.

@sisp
Copy link

sisp commented Nov 24, 2022

I think there are two possible places where this problem could be fixed:

  1. micromark/dev/lib/compile.js::prepareList: When compiling to basic HTML, the events related to a list are preprocessed such that the items of a list without blank lines between items are compiled as <li>...</li> and items of a list with blank lines in at least one item are compiled as <li><p>...</p></li> (tight vs. loose lists). This preprocessing logic could be extended to ignore trailing blank lines which are part of the tokenized list.
  2. micromark-core-commonmark/dev/lib/list.js::tokenizeListContinuation: The list continuation tokenizer could be modified to not only check whether the current line is a blank line but rather keep checking subsequent lines until a non-blank line is found that terminates the list and abort then, so that the blank lines (including non-empty blank lines, which are treated as list item prefixes + line endings) are excluded in the last list item.

I tend to prefer option 2, but I'm not sure whether that's easily done with a document parser as lines are parsed independently and content is parsed implicitly. Any comments and/or suggestions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
👍 phase/yes Post is accepted and can be worked on 🙆 yes/confirmed This is confirmed and ready to be worked on
Development

No branches or pull requests

3 participants