HTML
After editing the title page, I tried to start a link and found it disappearing, which made me wonder whether Creole is intended to replace HTML coding or to be overlaid on it. Judging from this wiki (if it is Creole 1.0 compliant?), at least some HTML coding is included within the wiki, and Creole is an addition rather than a replacement.
The question really boils down to whether the document should have all HTML tags removed before the Creole parser runs, after it runs, or not at all!
How should: This is%20a test be displayed?
If a document includes: <a href="www.example.com">This is an example</a>, should it be passed through character for character as is (creating a link), should it be escaped htmlspecialchars-style so that all HTML is escaped, or should it be displayed with the tags stripped as "This is an example"?
I couldn't find a page explicitly saying what the standard says on this, so I started this one!
In pure Creole, HTML elements and entities aren't supposed to be preserved. An engine that does preserve them could probably still qualify as "Creole-compliant", if this means anything. Wikicreole.org wasn't Creole-compliant last time I checked, a few weeks ago; you shouldn't try to duplicate its current behavior.
Note: this should eventually be moved to a talk page.
-- YvesPiguet, 2008-Apr-28
Yves, the real question I'm asking is what processing ought to happen on the text input by the user WHEN that text includes HTML which will affect the output displayed and/or affect the whole creole page, if e.g. a <table> element is included. Not explicitly stating the pre-processing environment required by the creole specification is a bit like selling venomous snakes in a supermarket expecting the customer to know that venomous snakes can kill.
I take it the specification would say "All html special characters should be escaped prior to parsing by creole", in which case any special characters in the Creole specification should refer to the escaped characters and not to the original raw characters!!!!
-- Isonomia, 2008-Apr-28
The translator must care about the output format: it should escape less-than, ampersand and double-quote in the HTML and XML it produces, or backslash, percent and a few other characters for TeX output, or backslash and brace for RTF output, or parentheses for PostScript output, etc. But some engines might prefer to pass HTML constructs as is to permit the author to use features not supported natively in Creole. In that case, filtering unsafe HTML constructs might be wise.
-- YvesPiguet, 2008-Apr-29
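To make the point about output formats concrete, here is a minimal Python sketch of per-format escaping. The function names `escape_html` and `escape_tex` and the exact TeX character table are assumptions for illustration, not something the Creole specification defines.

```python
import html

def escape_html(text: str) -> str:
    """Escape text for HTML/XML output (hypothetical helper)."""
    # html.escape replaces &, < and >; with quote=True it also
    # replaces both double and single quote characters.
    return html.escape(text, quote=True)

# Assumed table of TeX specials; a real back end may need more entries.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "%": r"\%",
    "&": r"\&",
    "#": r"\#",
    "$": r"\$",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
}

def escape_tex(text: str) -> str:
    """Escape text for TeX output (hypothetical helper)."""
    # One pass over the characters, so braces introduced by a
    # replacement are never escaped a second time.
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)
```

The same user text goes through a different function depending on the back end, which is why the escaping belongs to the translator's output stage rather than to the Creole markup itself.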
Yves, the specification isn't very helpful when it comes to deciding what needs to be filtered and what does not. E.g. when text is included as a hyperlink, but isn't a full URL, should it be escaped or should it be urlencoded?
-- Isonomia
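One possible answer to the "escaped or urlencoded" question is "both, at different layers": percent-encode whatever is not valid in a URL, then HTML-escape the result when writing it into the attribute. The sketch below assumes a hypothetical `render_link` helper and an assumed set of characters to leave untouched; the specification itself does not prescribe this.

```python
import html
from urllib.parse import quote

# Assumption: keep URL delimiters and existing %XX escapes as they are.
URL_SAFE = "/:?#[]@!$&'()*+,;=%~"

def render_link(target: str, label: str) -> str:
    """Render a link to HTML (hypothetical helper, not from the spec)."""
    # Layer 1: make the target a syntactically valid URL.
    href = quote(target, safe=URL_SAFE)
    # Layer 2: make both the URL and the label safe inside HTML.
    return '<a href="{}">{}</a>'.format(
        html.escape(href, quote=True),
        html.escape(label),
    )
```

With this choice, a target such as `This is%20a test` keeps its existing `%20` and has its remaining spaces encoded, while a double quote in the target can no longer close the `href` attribute.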
I'm implementing the Creole spec for Ruby and came across the same issue.
It makes sense to me that HTML is escaped by default; however, strictly forbidding HTML is not pragmatic. I'd like an "allow HTML" block. Triple braces denote a "nowiki" block, and perhaps we could use something similar (as it feels like a similar idea), like triple brackets "(((", to denote an "html" block.
((( <b>bold</b> works here ))) but <b>bold</b> does not work here but (((<b>works here</b>)))
Note that it can work "block" or "inline". I can't think of any situation where accidental triple bracketing would occur. The only situation where this might happen at all is in code, but my guess is that code would be triple braced {{{}}} anyway.
-- SunnyHirai, 2008-Jun-11
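A rough Python sketch of the triple-bracket idea, for illustration only: everything is HTML-escaped except the contents of `((( ... )))`, which pass through untouched. The function name `render_with_html_blocks` is hypothetical, the sketch ignores `{{{ }}}` nowiki handling, and it does no filtering of unsafe HTML.

```python
import html
import re

# Non-greedy, so each ((( ... ))) pair is matched on its own;
# DOTALL lets a block span several lines.
HTML_BLOCK = re.compile(r"\(\(\((.*?)\)\)\)", re.DOTALL)

def render_with_html_blocks(text: str) -> str:
    """Escape HTML everywhere except inside ((( ... ))) (hypothetical)."""
    out = []
    pos = 0
    for match in HTML_BLOCK.finditer(text):
        # Text before the block is ordinary content: escape it.
        out.append(html.escape(text[pos:match.start()]))
        # Text inside the block is emitted raw, without the brackets.
        out.append(match.group(1))
        pos = match.end()
    # Whatever follows the last block is escaped as well.
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

Because the pattern does not care about line breaks, the same rule covers both the "block" and the "inline" use.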
One way to incorporate HTML into an implementation could be through a macro:
<<html>> <b>bold</b> works here <</html>> but <b>bold</b> does not work here but <<html>><b>works here</b><</html>>
This is the direction I've been going with Creoparser.py. I've made a little tutorial about its macro support here.
-- StephenDay, 2008-Jun-11
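For readers who want to see the macro idea in isolation, here is a small generic sketch in Python. It is not the API of the parser mentioned above: the `MACROS` registry, the `html_macro` handler and the `render` function are all hypothetical names, and the `<<name>>...<</name>>` matching is done with a plain regular expression.

```python
import html
import re

# Matches <<name>>body<</name>> with the same name on both ends.
MACRO = re.compile(r"<<(\w+)>>(.*?)<</\1>>", re.DOTALL)

def html_macro(body: str) -> str:
    """Hypothetical 'html' macro: emit the body unescaped."""
    return body

# Hypothetical registry mapping macro names to handlers.
MACROS = {"html": html_macro}

def render(text: str) -> str:
    """Escape everything except the output of known macros."""
    out, pos = [], 0
    for match in MACRO.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        handler = MACROS.get(match.group(1))
        if handler is None:
            # Unknown macro: show it literally rather than drop it.
            out.append(html.escape(match.group(0)))
        else:
            out.append(handler(match.group(2)))
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

One attraction of a registry is that an engine can simply leave `html` out of it (or swap in a sanitizing handler) on wikis where raw HTML is not trusted.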