Talk.Nyctergatis

I'm getting this:

Unknown option --htmlbody
Usage: /home/www/e6920da4aa34854a1e7ec7f172bb9ab4/web/cgi-bin/creole [options]
Filter Creole stdin and renders it to another format.
--body    naked body without header and footer
--creole  Creole output
--help    this help message
--html    HTML output (default)
--latex   LaTeX output
--rtf     RTF output
--test    test input (stdin ignored)
--text    plain text output

-- Radomir Dopieralski, 2007-Mar-06

Oops, sorry. I'd forgotten to upload one of the files I'd modified. That should be fixed now.

Thanks for your feedback!

-- YvesPiguet, 2007-Mar-06

Tilde doesn't escape pipes in tables. Also, putting the tilde before closing "=" characters of a title only escapes one of them. Escaping the pipe in a link disables the whole link (the [[ and ]] and url are still consumed though).

-- Radomir Dopieralski, 2007-Mar-06

Tilde-pipe in tables: |abc~|def produces <table><tr><td>abc|def</td></tr></table> as it should. Do you have a counter-example?
Tilde-closing "=" in titles: the tilde escapes one character, not the whole markup. The remaining "=" are consumed as the end-of-title markup (the parser doesn't care if the count doesn't match the opening one).
Tilde-pipe in link: it's what I wanted, even if it isn't what I endorsed or documented. I'm not sure it's wise either. In my parser, all Creole markup is ignored in links, including the tilde. I have to check whether pipes are valid in URLs. Considering that links aren't always URLs, it's probably better to recognize tildes as escape characters there as well.
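
The rule described above (a tilde escapes exactly one following character, nothing more) can be illustrated with a small sketch. This is not the actual parser code: the function name `classify` and the 'L'/'M' labels are invented for the example, only '|' and '=' are treated as markup, and special cases such as a tilde followed by whitespace are ignored.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Copies `in` to `text` with escaping tildes removed, and fills `kind`
   with one label per output character: 'L' = literal, 'M' = markup.
   Both output buffers must hold at least strlen(in) + 1 bytes.
   Returns the number of characters written. */
size_t classify(const char *in, char *text, char *kind)
{
    size_t n = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        if (in[i] == '~' && in[i + 1] != '\0') {
            i++;                  /* drop the tilde itself */
            text[n] = in[i];      /* the next character is always literal */
            kind[n] = 'L';
        } else {
            text[n] = in[i];
            /* assumption: only these two characters count as markup here */
            kind[n] = (in[i] == '|' || in[i] == '=') ? 'M' : 'L';
        }
        n++;
    }
    text[n] = '\0';
    kind[n] = '\0';
    return n;
}
```

With the input abc~|def, every character of the resulting abc|def is labelled literal, so the pipe does not split the cell. With ~==, only the first "=" becomes literal and the second is still markup, which is the title behaviour described above.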

Thanks,

-- YvesPiguet, 2007-Mar-06

I'm sorry about the pipe in tables -- indeed, I cannot replicate this now. I must have left a space between the tilde and the pipe.

Great work!

By the way, what parsing technique do you use? How many passes? Do you create a document tree or generate the output immediately?

-- Radomir Dopieralski, 2007-Mar-06

Thanks!

It's a parser written in C which performs one pass and generates output immediately. Here is a sketch of its main loop:

set state to "between par"
while not finished
{
  read next token (single char, or markup taking context into account)
  switch state
  {
    case ...
      switch token
      {
        case char
          if start and/or end of element, write corresponding fragment
          write char, encoding it if necessary
          change state if necessary
        case some markup token
          if start and/or end of element, write corresponding fragment
          change state if necessary
        ...
      }
    ...
  }
}
write end of element corresponding to current state, if any

Styles are pushed onto a stack and popped in such a way as to always produce matching pairs in the output. I've chosen C to be able to embed it easily into different projects, some of them running on platforms with very tight resources, such as small embedded systems or PDAs.
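
To make the loop and the style stack more concrete, here is a minimal sketch in C. It is not the actual parser: the names (`creole_to_html`, `toggle`, the two states) are assumptions made for the example, it only knows paragraphs, `**` and `//`, and it would wrongly treat the `//` in a URL as italic. Output that doesn't fit in the buffer is silently dropped.

```c
#include <stddef.h>
#include <string.h>

enum state { BETWEEN_PAR, IN_PAR };          /* assumed state names */
enum style { BOLD, ITALIC };

static const char *open_tag[]  = { "<strong>", "<em>" };
static const char *close_tag[] = { "</strong>", "</em>" };

struct parser {
    enum state state;
    enum style stack[2];   /* each style is open at most once */
    int depth;
    char *out;
    size_t len, cap;
};

/* Append s to the output buffer if it still fits. */
static void emit(struct parser *p, const char *s)
{
    size_t n = strlen(s);
    if (p->len + n < p->cap) {
        memcpy(p->out + p->len, s, n + 1);
        p->len += n;
    }
}

/* Open style s, or close it while keeping tags properly nested. */
static void toggle(struct parser *p, enum style s)
{
    int i, pos = -1;
    for (i = 0; i < p->depth; i++)
        if (p->stack[i] == s) pos = i;
    if (pos < 0) {                           /* not open yet: push it */
        p->stack[p->depth++] = s;
        emit(p, open_tag[s]);
        return;
    }
    for (i = p->depth - 1; i >= pos; i--)    /* close innermost first */
        emit(p, close_tag[p->stack[i]]);
    for (i = pos; i < p->depth - 1; i++) {   /* reopen what was inside s */
        p->stack[i] = p->stack[i + 1];
        emit(p, open_tag[p->stack[i]]);
    }
    p->depth--;
}

static void start_par(struct parser *p)
{
    if (p->state == BETWEEN_PAR) { emit(p, "<p>"); p->state = IN_PAR; }
}

static void end_par(struct parser *p)
{
    if (p->state != IN_PAR) return;
    while (p->depth > 0)                     /* close styles left open */
        emit(p, close_tag[p->stack[--p->depth]]);
    emit(p, "</p>\n");
    p->state = BETWEEN_PAR;
}

/* One pass over the input; HTML is written as soon as it is known. */
void creole_to_html(const char *in, char *out, size_t cap)
{
    struct parser p = { BETWEEN_PAR, {0}, 0, out, 0, cap };
    if (cap > 0) out[0] = '\0';
    for (size_t i = 0; in[i] != '\0'; i++) {
        char c = in[i], next = in[i + 1];
        if (c == '\n' && next == '\n') {     /* blank line ends paragraph */
            end_par(&p); i++;
        } else if (c == '*' && next == '*') {
            start_par(&p); toggle(&p, BOLD); i++;
        } else if (c == '/' && next == '/') {
            start_par(&p); toggle(&p, ITALIC); i++;
        } else if (c == '\n' && p.state == BETWEEN_PAR) {
            /* ignore extra newlines between paragraphs */
        } else {
            start_par(&p);
            switch (c) {                     /* encode HTML specials */
            case '<': emit(&p, "&lt;");  break;
            case '>': emit(&p, "&gt;");  break;
            case '&': emit(&p, "&amp;"); break;
            default: { char s[2] = { c, '\0' }; emit(&p, s); }
            }
        }
    }
    end_par(&p);                             /* close whatever is open */
}
```

Calling `creole_to_html` on `a **b //c** d//` shows the stack at work: bold is closed while italic is still open, so the sketch closes italic first, closes bold, then reopens italic, giving `<p>a <strong>b <em>c</em></strong><em> d</em></p>`. A style left open at the end of a paragraph is closed automatically.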

As you must have guessed from the error message above, for tests I've compiled it as a stand-alone command-line app, and I run it from a simple CGI script written in (sigh) sh.

-- YvesPiguet, 2007-Mar-06

Hmm... Maybe I should try to roll my own state machine too? The built-in regexp parser is faster in Python, though, even when I do three passes -- at least on such short input as wiki pages.

-- Radomir Dopieralski, 2007-Mar-06

Do you plan to keep the source closed, or would you publish your code? Looking at the code of my regexp-based parser, I think it would be better to use a state machine. In the beginning I planned to use one, but I must admit that I failed. My code got a bit complicated and finally I decided just to do it with regular expressions. But regular expressions have limitations, so a state machine would definitely be better.

I also have an idea right now: assuming that the state machine solves all our parsing problems (your implementation seems to be one of the best Creole parsing implementations) and the code is easily understandable, why not implement it for all wiki engines? The state machine could be documented in a language-independent format (e.g. UML). Your C implementation would be the working example implementation. Then it could be reimplemented in Perl, Python, Java and so on. The Creole markup would not only have its grammar, but also a documented way of parsing it. So instead of wasting time as every implementor struggles with their own implementation, everyone could work on the same parser. The more I think about it, the more I like this.

Of course you don't have to publish your code if you don't want to. But even in that case we should focus on building the one Creole parser that works, is documented and can be implemented for all wiki engines with reasonable effort. I'm not sure whether this approach works as well as I currently "dream" it would, but I had this idea just now and wanted to publish it.

-- Steffen Schramm, 2007-Mar-16

This particular version was published on 18-Mar-2007 17:30 by SteffenSchramm.