I wanted to check whether there is interest in adding support for superscript and subscript as an extension to our current Markdown support.
The CommonMark markdown specification does not seem to have a proposal for this. But I think we have two options:
- Using 32^nd^ (for 32nd) and H~2~O (for H2O), as described by Pandoc
- Using HTML tags: 32<sup>nd</sup> and H<sub>2</sub>O.
Do you support one of these options, and if so which one do you prefer?
My vote goes to option 1, as it seems more accessible and is cleaner, and is not HTML which needs to be sanitized.
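To make option 1 concrete, here is a minimal sketch of how a client might translate the Pandoc-style markers into the equivalent tags. It assumes the marked text contains no whitespace (roughly Pandoc's rule) and leaves `~~strikethrough~~` runs alone. The function name `render_sup_sub` is hypothetical, and the sketch does not escape or sanitize anything else in the input.

```python
import re

# Superscript: ^text^ with no whitespace or carets inside the markers.
SUP = re.compile(r"\^([^\s^]+)\^")
# Subscript: ~text~ with no whitespace or tildes inside, and not
# adjacent to another tilde, so ~~strikethrough~~ is left untouched.
SUB = re.compile(r"(?<!~)~([^\s~]+)~(?!~)")


def render_sup_sub(text: str) -> str:
    """Replace ^x^ with <sup>x</sup> and ~x~ with <sub>x</sub> (illustrative only)."""
    text = SUP.sub(r"<sup>\1</sup>", text)
    return SUB.sub(r"<sub>\1</sub>", text)


print(render_sup_sub("32^nd^"))  # 32<sup>nd</sup>
print(render_sup_sub("H~2~O"))   # H<sub>2</sub>O
```

Either way the rendered output is the same pair of tags, so the choice is mainly about what form authors have to type.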
The HTML tags <sup> and <sub> are already supported in Collect.
I forgot about that rogue HTML support... (long story - we sort of agreed this was a mistake and would not be documented - I cannot seem to find the forum post about that).
So my proposal is for official cross-client markdown support for superscript and subscript.
Some history here
I don't think we agreed that it wouldn't be documented, but I don't mind removing those docs for the sake of compatibility (and a peace offering).
Relevant issue at https://github.com/opendatakit/docs/issues/699
Yes, you're right. I think that was about XLSForm documentation. I've been trying to ban those memories :). Your peace offering is MUCH appreciated. Thank you!
For superscript and subscript, I have a small preference for the HTML option because it's easier to remember and because it's a more common standard. In several Markdown variants, including GitHub's, tilde (~) is interpreted as strikethrough. None of the Markdown variants I've used treat ^ as superscript or ~ as subscript.
Given that CommonMark doesn't have a way to do this, I don't think it's wise to make up our own because then we are back to the dark old days of making up our own "standard".
The thing is, most Markdown flavors (including CommonMark) allow embedded HTML. There is a very long discussion, two years' worth, at https://talk.commonmark.org/t/make-commonmark-safe-by-default/1265 asking for the removal of HTML/JS, and it's not going well for the folks who want to remove HTML.
So it's looking like HTML tags are likely the most standard option. And if you are going to include HTML, you will have to sanitize (https://github.com/punkave/sanitize-html looks like a good choice), and if you are going to sanitize we can probably have some agreed upon list that works across compliant clients...
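As a rough illustration of what an agreed-upon list could mean in practice, here is a minimal allowlist sketch using only Python's standard library. It is not the sanitize-html package linked above, and the allowlist of just `sup` and `sub` is an assumption for the example. Allowed tags are re-emitted without attributes, every other tag is dropped, and text is escaped. It does not try to balance unclosed tags, so treat it as a sketch rather than a production sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist for illustration; a real cross-client list would be agreed separately.
ALLOWED_TAGS = {"sup", "sub"}


class AllowlistSanitizer(HTMLParser):
    """Keeps allowed tags (without attributes) and escapes all text content."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes are intentionally dropped

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(text: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(sanitize('H<sub onclick="x()">2</sub>O'))  # H<sub>2</sub>O
```

Because the allowlist lives in one place, every compliant client could share the same set and extend it only when a new tag is agreed on.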
Thanks @LN and @yanokwa. Interesting discussion link. It's a complicated issue.
I don't know how authoritative Pandoc is. I am less concerned about GitHub's handling of ~ because that appears to be an artefact. Officially they support the more common ~~ for strikethrough.
However, I'd be okay with supporting <sup> and <sub> if there are no other voices preferring a Mark(ish)Down syntax.
Generally, I think our approach should be to add official support for certain HTML tags only grudgingly when there is no acceptable MarkDown syntax available.
I agree with not adding to Markdown's syntax. The proliferation of flavors is a serious problem with Markdown, and I don't want to contribute to it.
I know a lot of tech writers who use Pandoc, but I don't get the sense that it is anything like a standard.
I don't see any reason not to support embedded HTML. (Not incidentally, this was a part of the original Markdown "specification").
So I think we have a way forward by supporting <sub> and <sup>. Any violent opposition from the @TSC?