Problem
The fact that we have two separate types of HTML (internal and external) that we can serialize documents to is unconventional, kind of confusing, difficult to test, and prone to bugs.
Ideal Solution
Ideally, we would have only one HTML format that is both cross-compatible with other editors and contains all the information needed to recreate the original blocks 1:1 when the HTML is parsed back into the editor.
Currently, we can get the cross-compatible HTML by calling blocksToHTMLLossy, and parse it (or any other HTML) into blocks with tryParseHTMLToBlocks. These functions are currently missing some functionality, but they're a good place to start. Let's specify what we would ideally like these functions to do:
blocksToHTMLLossy
Serializes blocks to HTML. The HTML should retain all information about the blocks, such that it's possible to recreate them 1:1 (i.e. it would no longer actually be lossy). The HTML should also be structured to be as compatible with other editors as possible, so blocks, inline content, and styles should be serialized as generically as possible, without BlockNote-specific elements/attributes.
tryParseHTMLToBlocks
Parses HTML into blocks. Should be as broad as possible, since different sources serialize their HTML differently, and we want to cover as many cases as possible so that as much information as possible carries over from the HTML into blocks. For example, we could infer the width prop of an image block from a width HTML attribute or from a width inline style, and should account for both of these cases. While this function is not expected to always retain all information from arbitrary HTML, parsing the output of blocksToHTMLLossy should recreate the original blocks 1:1.
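To make the width example concrete, here is a minimal sketch of reading a pixel width from either source. The ImgLike shape and the inferImageWidth and styleValue helpers are hypothetical names for illustration, not part of BlockNote's API. The sketch assumes the inline style should win when both are present (as it does in CSS), and it only understands plain numbers and px values.

```typescript
// Hypothetical, simplified view of an <img> element: its attributes plus the
// raw text of its inline `style` attribute. Not a BlockNote type.
type ImgLike = { attributes: Record<string, string>; style?: string };

// Returns the value of the last declaration of `property` in an inline style
// string. Naive split on ";", so values containing ";" are not handled.
function styleValue(style: string, property: string): string | undefined {
  let found: string | undefined;
  for (const decl of style.split(";")) {
    const idx = decl.indexOf(":");
    if (idx === -1) continue;
    if (decl.slice(0, idx).trim().toLowerCase() === property) {
      found = decl.slice(idx + 1).trim();
    }
  }
  return found;
}

// Infers a width in pixels, trying the inline style first and falling back
// to the `width` attribute. Returns undefined if neither is a px/plain number.
function inferImageWidth(img: ImgLike): number | undefined {
  const fromStyle = img.style ? styleValue(img.style, "width") : undefined;
  const candidates: (string | undefined)[] = [fromStyle, img.attributes["width"]];
  for (const raw of candidates) {
    if (raw === undefined) continue;
    const match = /^(\d+(?:\.\d+)?)(?:px)?$/.exec(raw.trim());
    if (match) return Number(match[1]);
  }
  return undefined;
}
```

Units such as % or em are deliberately skipped here; whether and how to map them to a block prop is an open question.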
Current Limitations
These functions aren't able to convert blocks to HTML and back while retaining all information in all scenarios. This is mainly due to some limitations with blocksToHTMLLossy:
Because we want the cross-compatible HTML to be as stripped down and semantically correct as possible, we strip away all the wrapper divs that are used to structure blocks. As a result, blocks that are not list items lose their nesting, as you can't (or at least shouldn't) nest elements like ps inside each other. List items are still fine to nest, as HTML allows this with ul/ol and li tags. There is probably a way to work around this, like a data-parent-id attribute. It goes against keeping things as generic as possible, but it is only relevant for BlockNote and would only be read by BlockNote, so it should be ok.
There is some ambiguity when serializing certain props and styles to HTML. For example, bold text can just be within a strong element, but what about colored text? You can put the text in a span, but there's no definitive way to represent the color that all other editors are sure to read. It probably makes most sense to use inline styles in this case, but then is it better to write a named color like color: red or the actual hex code? Currently, we use data-* attributes as this is how the props are rendered in the DOM, but this is pretty BlockNote-specific and other editors most likely won't read them. There are other cases, like the textAlignment prop, where this is also an issue.
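For the first limitation, the data-parent-id idea could be read back roughly as follows. This is only a sketch: FlatBlock, NestedBlock, and rebuildNesting are hypothetical names, and it assumes each serialized element carries a unique id plus an optional data-parent-id that have already been read off the elements. Blocks whose parent id is missing or unknown are kept at the top level; cycles are not handled.

```typescript
// Hypothetical flat record per serialized element; `parentId` stands in for
// the value of the proposed data-parent-id attribute.
type FlatBlock = { id: string; parentId?: string; type: string };
type NestedBlock = { id: string; type: string; children: NestedBlock[] };

// Rebuilds the block tree from a flat, document-ordered list.
function rebuildNesting(flat: FlatBlock[]): NestedBlock[] {
  // Prototype-less object so arbitrary ids cannot collide with Object members.
  const byId: Record<string, NestedBlock | undefined> = Object.create(null);
  for (const b of flat) {
    byId[b.id] = { id: b.id, type: b.type, children: [] };
  }

  const roots: NestedBlock[] = [];
  for (const b of flat) {
    const node = byId[b.id]!;
    const parent = b.parentId !== undefined ? byId[b.parentId] : undefined;
    if (parent && parent !== node) {
      parent.children.push(node); // sibling order follows document order
    } else {
      roots.push(node); // no parent, unknown parent, or self-reference
    }
  }
  return roots;
}
```

Other editors would simply ignore the extra attribute and see a flat sequence of elements, which matches what they get today.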
Additionally, tryParseHTMLToBlocks currently doesn't handle certain fairly obvious cases, such as inferring text color and alignment from inline styles. There are probably other, less obvious cases, but these depend on how different editors and websites serialize their HTML, so it's best to check individual ones.
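As a rough illustration of the inline-style case, the sketch below pulls a text color and an alignment out of a raw style string. InferredProps and inferPropsFromStyle are hypothetical names, the prop names textColor and textAlignment and the four accepted alignment values are assumptions, and the color is passed through as written rather than normalized to a name or hex code.

```typescript
// Assumed set of alignment values; anything else (e.g. "start") is ignored.
const ALIGNMENTS = ["left", "center", "right", "justify"] as const;
type Alignment = (typeof ALIGNMENTS)[number];

// Hypothetical result shape; prop names are assumptions for illustration.
type InferredProps = { textColor?: string; textAlignment?: Alignment };

function asAlignment(value: string): Alignment | undefined {
  for (const a of ALIGNMENTS) {
    if (a === value) return a;
  }
  return undefined;
}

// Reads `color` and `text-align` from an inline style string. Naive split on
// ";", so values that themselves contain ";" are not handled.
function inferPropsFromStyle(style: string): InferredProps {
  const props: InferredProps = {};
  for (const decl of style.split(";")) {
    const idx = decl.indexOf(":");
    if (idx === -1) continue;
    const name = decl.slice(0, idx).trim().toLowerCase();
    const value = decl.slice(idx + 1).trim().toLowerCase();
    if (name === "color" && value !== "") {
      props.textColor = value; // kept as written: "red", "#ff0000", "rgb(...)"
    } else if (name === "text-align") {
      const alignment = asAlignment(value);
      if (alignment !== undefined) props.textAlignment = alignment;
    }
  }
  return props;
}
```

Whether the parser should normalize colors (named vs. hex) is the same open question raised for serialization above.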
Additional Notes
In terms of code organization, HTML serialization/export should probably work the same way as PDF/DOCX/ODT export, as this is a common API and our serialization code is pretty messy at the moment.
The fact that we have two separate types of HTML (internal and external) that we can serialize documents to is unconventional, kind of confusing, difficult to test, and prone to bugs.
With the solutions proposed, wouldn't there still be separate types of "html"? I.e., is this (the quoted intro text) really the problem we're solving?
static rendering
One other thing that I'm missing is the scenario where people want to render "static pages" that look 1:1 as they appear in the editor. The current solution for this is that people use the BlockNote stylesheet directly, together with an HTML version of the blocks (blocksToFullHTML) that includes the div structure for nested blocks.
With the solutions proposed; wouldn't there still be separate types of "html"?
I don't think there should be more than one type of HTML. The output of blocksToFullHTML should be removed, and blocksToHTMLLossy should be the main output (and made lossless with a corresponding parse).
This would simplify things by operating on a single HTML representation, assuming that we can output something that we can losslessly convert back into the original BlockNote blocks.
editor.tryParseHTML(editor.blocksToHTML()) == editor.blocks
The HTML format currently named "external" should be considered the "normal" one.
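The round-trip property above could be checked in a test along these lines. Note that == on arrays compares references in JavaScript, so a structural comparison is needed. Everything here is a toy stand-in: ToyBlock, the two serializers, and fromHTML are hypothetical and far simpler than the real editor functions; the sketch only shows the shape of the check.

```typescript
// Toy block with one prop that a lossy serializer can drop.
// Assumes `text` contains no "<" so no escaping is needed.
type ToyBlock = { text: string; textAlignment: string };

// Drops textAlignment, so parsing falls back to the default.
function toLossyHTML(blocks: ToyBlock[]): string {
  return blocks.map((b) => `<p>${b.text}</p>`).join("");
}

// Keeps textAlignment as a generic inline style.
function toLosslessHTML(blocks: ToyBlock[]): string {
  return blocks
    .map((b) => `<p style="text-align: ${b.textAlignment}">${b.text}</p>`)
    .join("");
}

// Parses the toy formats above; a missing style means "left".
function fromHTML(html: string): ToyBlock[] {
  const re = /<p(?: style="text-align: (\w+)")?>([^<]*)<\/p>/g;
  const blocks: ToyBlock[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    blocks.push({ text: m[2] ?? "", textAlignment: m[1] ?? "left" });
  }
  return blocks;
}

// True if serializing and re-parsing yields structurally equal blocks.
// JSON.stringify is key-order sensitive, which is acceptable for a sketch.
function roundTrips<B>(
  blocks: B[],
  serialize: (blocks: B[]) => string,
  parse: (html: string) => B[],
): boolean {
  return JSON.stringify(parse(serialize(blocks))) === JSON.stringify(blocks);
}
```

With a centered paragraph as input, the check passes for toLosslessHTML and fails for toLossyHTML, which is the kind of regression such a test would catch.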