
Copyright (C) by the contributors. Some rights reserved, license BY-SA.

Sponsored by the Wiki Symposium and the Nuveon GmbH.


Discussion from Talk.PreformattedAndNowiki

Comments about preformatted text

Preformatted text is important for me, and is turning out to be quite tricky for a variety of reasons.

First, I need to handle inline comment enclosed in curly braces (this is a common notation in set theory, like {A, B}). So if I have four open curlies followed by four closes, how do I handle that? One provisional solution I'm considering is associating the extra close curlies with the inside of the span, so the regex for the closing markup is three closing curlies not followed by a fourth: \}\}\}(?!\}) in Python syntax, for the coders out there. This would also provide an escape for three or more closing curlies inside a span: just follow those with three closing curlies followed by three opening ones, to open a new span.
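A minimal sketch of that closing rule, using the negative lookahead quoted above. The names INLINE_NOWIKI and nowiki_spans are made up for illustration and are not part of any Creole implementation.

```python
import re

# Three opening braces, a lazily matched body, then three closing braces
# that are NOT followed by a fourth one. Any extra closing braces therefore
# end up inside the span.
INLINE_NOWIKI = re.compile(r"\{\{\{(.*?)\}\}\}(?!\})", re.DOTALL)


def nowiki_spans(text):
    """Return the raw contents of every inline nowiki span in text."""
    return [m.group(1) for m in INLINE_NOWIKI.finditer(text)]


# Four opens followed by four closes: the set notation stays intact.
print(nowiki_spans("{{{{A, B}}}}"))

# The escape trick: closing braces inside a span, then close and reopen.
print(nowiki_spans("{{{a}}}}}}{{{b}}}"))
```

The first call yields the single span `{A, B}`; the second yields `a}}}` followed by `b`, so the three literal closing braces survive inside the first span.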

Second, the 0.3 spec seems to indicate that wiki markup is processed inside the inline preformatted element. That would be a problem, because there's all kinds of stuff that might go in the tag, some of which might look like markup. Block elements clearly do not have their markup processed.

In any case, even if no wiki markup were allowed inside preformatted inline spans, it would be possible to get the bold preformatted effect by reversing the nesting of the markup, putting the bold on the outside and the pre on the inside (as a reminder of how tricky this is, I was not able to get a sample to render correctly in this Wiki engine).

Third, I'm currently using backslash to escape wiki markup (something often done with a <nowiki> tag). This will conflict with the proposed \\ syntax for linebreaks. My current inclination is to remove it entirely, but that leaves me with no good way to indicate nowiki outside monospaced spans. A workable hack is to intersperse empty preformatted spans to break up what would otherwise be interpreted as markup.

-- Raph Levien 31 Dec 2006

For the first of your problems I can see two solutions. If it's only an occasional thing, then you can go with {{{{A,B} }}}. The extra space does look weird, but hopefully doesn't have a meaning in math language. The second solution, if the wiki site you use talks about math often, would be to introduce an additional $$...$$ markup for math (not part of Creole, but Creole is extensible) that would either use LaTeX or MathML for formatting, or just typeset the mathematical formulas in a distinct font (and of course escape the markup).

Second, obviously no markup should be interpreted inside "nowiki", except the three closing curly braces, and maybe an escape character, if we will have any. Of course, markup defined outside of the "nowiki" span, like lists, emphasis and tables, is still in effect.

Third, we still don't know what markup to use for the line ends. We are open to suggestions. "\\" is just one common solution, but I'm sure we can do better than that.

Thank you for your feedback, and please remember that Creole is supposed to be a "common part" of wiki markups (or an alternate input method), not necessarily the only markup available -- there are so many different use cases that this wouldn't be possible.

Btw, your idea with the closing curly braces is interesting -- I don't think it breaks anything, and it lets you write {{{{{{...}}}}}} without the need for escape characters or spaces -- I doubt anybody would need a "nowiki" span immediately followed by a curly brace.

-- RadomirDopieralski, 2006-01-01

One idea would be to extend Creole to have a heredoc-style syntax. That way one could vary the terminating token as needed.

-- JaredWilliams, 2006-01-01

Isn't it a little complicated and, unfortunately, impossible to parse with plain regexps? (OK, it's possible with Perl's and Python's regexps, by use of Python's (?P=...) for example.) What's more, do we need/want to have Creole "complete", in the sense that every (valid) output is possible to achieve?

-- RadomirDopieralski, 2006-01-01
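For the curious, here is roughly what the Python named back reference looks like in this role. The heredoc syntax itself is an assumption made up for this sketch (an opening line of three braces plus a user-chosen tag, and a closing line of the same tag plus three braces); nothing like it is in any Creole draft.

```python
import re

# Hypothetical heredoc syntax, for illustration only:
#   {{{TAG
#   ...body...
#   TAG}}}
HEREDOC = re.compile(
    r"^\{\{\{(?P<tag>\w+)\n"  # opening line: three braces plus a tag
    r"(?P<body>.*?)\n"         # body, matched lazily across lines
    r"(?P=tag)\}\}\}$",        # closing line: the same tag, then three braces
    re.DOTALL | re.MULTILINE,
)


def heredoc_blocks(text):
    """Return the body of every heredoc-style block found in text."""
    return [m.group("body") for m in HEREDOC.finditer(text)]


sample = "{{{END\nline one\n}}}\nline two\nEND}}}\nafter"
print(heredoc_blocks(sample))
```

The bare closing braces on the third line stay inside the block because the terminator has to repeat the tag. Engines without back references would need a hand-written scan instead.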

It's just a back reference, isn't it? Something like /{{{(\w+)\n(.*)\n\1/ }}}. Obviously most wikis probably wouldn't need such a feature, so it probably shouldn't be in Creole. But it also offers a syntax for marking a range of text (if generalised further) to assign an identifier to a section of a page for transclusion.

-- JaredWilliams, 2006-01-01

The extra spaces in Radomir's [CheatSheet] suggest the need for extra escaping logic for three closing braces on a line by itself within a preformatted block. I suggest that one backslash is removed from any line within a preformatted block consisting entirely of one or more backslashes followed by three closing braces. This could be optional in the ExtensibleByOmission sense, to make life easier for implementors of simple Creole engines. In any case the chance of this pattern being an actual collision is near zero. Note also that this pattern is the same as in the JSPWiki hosting this page, except that JSPWiki uses tilde instead of backslash. (I don't have a strong feeling about this choice, and could live with tilde.)

I was curious how the main page PreformattedAndNowiki implemented the escaping, and found zero-width spaces between the closing braces. This is a rather extreme form of InvisibleMarkup.

I have implemented the suggestion of associating extra curlies to the inside of an inline preformatted block in [Barghest], and am happy with the way it turned out.

-- RaphLevien 2007-01-02

Escape logic is not required for preformatted text if (a) the Creole parser reference counts the number of opening and closing braces inside the block; and (b) the block contains a balance of opening and closing braces. Of course, (a) might be difficult to implement in some languages, and (b) might not be possible in all cases. Anyhow, it works well for me in Key/C and generally the text I work with has a balance of opening and closing braces, so it's no problem.

I'm hoping to avoid escape sequences in my wiki implementation but perhaps at some point they will become necessary. As a point of interest, I can copy the whole [Creole 0.1 Test Cases] text into my preformatted block without making any changes. It's kind of fun and certainly a good challenge for the parser! For what it's worth, I think the JSPWiki use of tilde for escaping three closing braces is nice and clear.

-- MarkWharton, 2007-01-03

Counting is a possibility, and handles the CheatSheet use case fairly handily, but I think it's not as wiki-like and not quite as general (what if you wanted separate blocks to show the beginning and ending sections?). I was going to write that because there are a couple of good ways of doing this, it's a good candidate for ExtensibleByOmission, but if the CheatSheet becomes popular, it really would be a good idea to have it work without alterations in almost all Creole parsers.

-- RaphLevien 2007-01-02

I'm against special cases in the spec, even if they allow some popular use cases. If we are going to have an escape character, it's best (IMHO) to make it work the same everywhere. Why not solve the problem of three closing braces the same way as in the inline nowiki blocks? If there are two (or more) consecutive lines containing only "}}}" in them, make all but the last one of them a part of the pre block, and treat the last one as the ending:

   some example
This doesn't limit the expressive power of Creole -- if someone does desire to put three closing curly braces right after the pre block, he/she can separate them with a blank line (this introduces an explicit end-of-block instead of the implicit one after the <pre>) and have the expected result:
   some example


I don't even think it needs to be mentioned in user cheat sheets -- because that's the expected way it should behave, people will try this first before trying to escape things.

-- RadomirDopieralski, 2007-01-03
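A rough line-based sketch of the "consecutive closing lines" rule described above, assuming that block delimiters are lines consisting of exactly three braces. The function name split_pre_blocks is invented for this example.

```python
def split_pre_blocks(lines):
    """Return the contents of each preformatted block as a list of lines."""
    blocks = []
    i = 0
    while i < len(lines):
        if lines[i] != "{{{":
            i += 1
            continue
        i += 1  # step past the opening line
        body = []
        # Collect ordinary lines up to the first closing line.
        while i < len(lines) and lines[i] != "}}}":
            body.append(lines[i])
            i += 1
        # In a run of consecutive closing lines, every line but the last
        # one belongs to the block.
        while i + 1 < len(lines) and lines[i + 1] == "}}}":
            body.append(lines[i])
            i += 1
        blocks.append(body)
        i += 1  # step past the terminating line
    return blocks


print(split_pre_blocks(["{{{", "some example", "}}}", "}}}", "after"]))
print(split_pre_blocks(["{{{", "some example", "}}}", "", "}}}"]))
```

In the first call the block contains `some example` and one literal `}}}` line; in the second the blank line ends the run, so the block contains only `some example`. This sketch treats an unterminated block as running to the end of the input, which is a choice made here rather than anything discussed above.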

Thanks for the suggestions! (I should try thinking about things from a general user point of view ;-) I've implemented a combination of counting the opening curly braces (to monitor the balance) and treating all closing curly braces as terminators. The parser scans backwards to "adjust" whenever any unbalanced closing curly braces are found. It handles most of the weird combinations I put to it with only a few exceptions (like closing curly braces on the first line and opening curly braces on the last line etc.). It's pretty general, however, I've had to add a special case for inline {{{}}} {{{}}} to include }}} inside. I do this by checking for }}} immediately following the first }}}. It seems to work but probably needs more testing before I can be sure.

-- MarkWharton, 2007-01-06
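To make the counting idea concrete, here is a sketch of the balance-monitoring part only. It is not the parser described above (no backwards scanning, no special case for adjacent inline spans), and the function name balanced_nowiki_end is made up.

```python
def balanced_nowiki_end(text, start):
    """Given the index just past an opening '{{{', return the index of the
    closing '}}}' found by brace counting, or -1 if there is none."""
    depth = 0  # number of unmatched '{' seen inside the block so far
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth > 0:
                depth -= 1  # this brace closes one opened inside the block
            elif text.startswith("}}}", i):
                return i  # balanced so far: this is the real terminator
        i += 1
    return -1


text = "{{{{A, B}}}}"
end = balanced_nowiki_end(text, 3)
print(text[3:end])
```

For balanced content such as `{A, B}` or a nested `{{{x}}}` the terminator is found without any escaping. Unbalanced content defeats this simple version.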

I admire your courage with the balancing of braces -- I'd never try it, for I view it as a path to sure madness :). Consider an ASCII-art wiki... Yet it works in a similar way to how *we* read text, so maybe it will be "intuitive" after all? Your work is very interesting.

Another possibility is to use more greedy matching -- always use the last "}}}" as the delimiter. Of course, there is a problem when there are multiple blocks:

 preformatted text
 normal text
 preformatted text
would get parsed as a preformatted text in whole. But we can amend the situation by watching both "{{{" and "}}}". Basically, the preformatted block continues until we hit the last "}}}" before "{{{" or end of document.

This is a loose thought, I haven't tested it. I don't think it gives large improvement over the "multiple consecutive closing braces" rule.

-- Radomir Dopieralski, 2007-01-06
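An untested-in-practice sketch of that greedy rule, again assuming line-based delimiters of exactly three braces. The name greedy_pre_blocks is hypothetical.

```python
def greedy_pre_blocks(lines):
    """Each block runs from an opening line to the LAST closing line that
    appears before the next opening line or the end of the document."""
    blocks = []
    i = 0
    n = len(lines)
    while i < n:
        if lines[i] != "{{{":
            i += 1
            continue
        start = i + 1
        # Find the next opening line, or the end of the document.
        limit = start
        while limit < n and lines[limit] != "{{{":
            limit += 1
        # Walk backwards from there to the last closing line.
        end = None
        for j in range(limit - 1, start - 1, -1):
            if lines[j] == "}}}":
                end = j
                break
        if end is None:
            i = start  # no terminator: not treated as a block in this sketch
            continue
        blocks.append(lines[start:end])
        i = end + 1
    return blocks


doc = ["{{{", "preformatted text", "}}}", "normal text",
       "{{{", "preformatted text", "}}}"]
print(greedy_pre_blocks(doc))
```

The two blocks stay separate, and a block can contain `}}}` lines as long as a later `}}}` follows before the next opener. The price is that a block can never contain a bare `{{{` line.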

Thanks again Radomir! I find the idea of properly handling unbalanced nesting rather interesting (is it even possible I begin to wonder?). It might turn out to be a can of worms, but anyhow, I think it's worth pursuing a little more.

I've experimented a little with your idea and it basically works except for an exceptional circumstance with inline preformatted text interfering with block preformatted text! I've put together some rough preformatted notes in an attempt to understand the problem a little better. The page is on my experimental site because it needs to use my parser. It might be too much for a general discussion here, but nevertheless, I intend to draw some conclusions which might be useful in the end. Hopefully there's some magic in there somewhere.

-- MarkWharton, 2007-01-07

I recommend against trying to do anything complex. One way or another, any quoting strategy is going to either leave some results impossible or require escaping. The strategies based on counting or searching for patterns have the disadvantage of being non-local - a change to line 75 can affect the rendering of a pre block on line 15. That guarantees user confusion.

If you want to leave three close curlies on a line by themselves impossible, the CheatSheet might be possible using indent:

 Preformatted markup

But those spaces will interfere with cut and paste. Therefore, I'm in favor of escaping syntax, either tilde or backslash. It will only be used by expert users, but then I think the results will be maximally useful to the people who use the cheat sheet.

-- RaphLevien, 2007-01-07

Well, the brace-counting is definitely not going into Creole as a requirement -- at least I would fight this with all my might -- but of course it's an interesting subject and a great addition "for the ambitious students".

On the other hand, the rule for including the last "}" seems to be easy to implement in most parsing techniques I can imagine, and greatly reduces the "what the heck is going on?" factor, taking care of the most common use case. This needs testing in practice -- that's why I'd like to have it in draft for at least one version, so that we can receive complaints from implementors if it's hard for them.

Now, do we have any data on what kinds of markup are used for escaping?

-- RadomirDopieralski, 2007-01-08

Thanks for your comments Raph! It's great to look at all these different ideas.

Perhaps escaping makes the most sense, in the end. At the same time, it's good to explore the possibilities and find interesting solutions. The methods which I'm testing for myself are working well in most cases and don't require escaping, but at the same time, I admit it's not a general enough solution.

Let's continue with the escape character discussion in the next round of the spec. If we can find a nice alternative then let's talk about that too.

P.S. Radomir, exactly, I'm not proposing anything I'm doing for the spec. It's just an interesting technical discussion at this point.

-- MarkWharton, 2007-01-08

I personally like the idea of using a tilde as an escape character. Who would oppose that?

--ChuckSmith, 2007 Jan 10

Tilde "~"

  • Used on:
    • JSPWiki
  • Has a semantic connotation with "not", which might suggest the meaning when seen
  • Pretty rare in normal text (but not unused)
  • Not available on many Keyboards
  • Hard to type under Microsoft Windows (Could you be more precise? I have no trouble typing it using MS Windows with a US or German keyboard)
  • Traditionally used in text as an alternative to hyphen (eg. in number ranges) or meaning "roughly".
  • The exact way of escaping is not obvious (eg. compare ~ }}} vs ~ }~ }~ })
  • Used in programming languages for operators -- pasted code would need escaping, or could change its meaning.
  • Conflicts:

Backslash "\"

  • Extremely widespread for escaping in:
    • Programming languages
    • File names
    • Regular expressions and search queries
  • Traditional, obvious way of use (always escapes a single character)
  • Never used in normal text
  • Used in Microsoft Windows for paths
  • Commonly used for forced line break (usually doubled)
  • Used in programming languages for escaping -- pasted code would need double escaping, or could look bad.
  • A bit confusing considering the double backslash proposed in Creole 0.4.

Caret "^"

  • Never used in normal text
  • Rarely used even in programming languages
  • Sometimes used to mean "not"
  • Not on all Keyboards
  • No precedent (?)
  • Conflicts with superscript markup on all wikis that have it

Percent "%"

  • Commonly used for an alternate escape character, when backslash is not available
    • for printf parameters
    • for time and date formatting
    • for colored text
  • Established way of using (escapes whole chunks)
  • Sometimes used in normal text to denote, uh, percents
  • Used in programming languages for operators, escaping and variable marks -- code needs escaping or it will change meaning
  • Conflicts:
    • TODO

In my recommendation, the escape character (whether backslash or tilde) is used only to prevent }}} on a line by itself from being interpreted as closing preformatted markup. The regex "one or more escape chars followed by exactly three closing braces" is rare to the point that it doesn't make much sense to worry about collisions.

If we're looking for an escape character to be used in more contexts (as backslash is in Crossmark), then we also need to have that discussion. But I think it's a separate issue from closing of pre blocks, because it generally wouldn't be interpreted inside pre anyway.

-- Raph Levien 2007-01-10
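A small sketch of the rule in that recommendation: inside a preformatted block, a line made up entirely of one or more escape characters followed by exactly three closing braces loses one escape character. The function name unescape_pre_line and the default of tilde are choices made for this example only.

```python
import re


def unescape_pre_line(line, escape="~"):
    """Strip one escape character from a line that consists entirely of
    one or more escape characters followed by exactly three closing braces.
    Any other line is returned unchanged."""
    pattern = re.escape(escape) + r"+\}\}\}"
    if re.fullmatch(pattern, line):
        return line[len(escape):]
    return line


print(unescape_pre_line("~}}}"))                # a literal closing line
print(unescape_pre_line("~~}}}"))               # a literal "~}}}" line
print(unescape_pre_line("\\}}}", escape="\\"))  # same rule with backslash
print(unescape_pre_line("x ~}}}"))              # not the pattern: unchanged
```

Because the pattern must cover the whole line, the check is purely local: no other line in the block can change how this one is read.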

Updated advantages and disadvantages. Removed mention of Terms, because that list is only informational stating which characters are used in Creole, not for stating which characters should be used in Creole.

-- ChuckSmith, 2007-Jan-11


« This particular version was published on 12-Jan-2007 16:31 by RadomirDopieralski.