Common Wiki Parsing Techniques

Braindamaged substitution

This is the simplest parsing method, used by some scratch wikis and "write a wiki in a minimal number of lines of code" engines. It is easy to understand and to implement in any language that supports regular expressions, but it is very limited and is not guaranteed to produce valid HTML. Still, it works most of the time.

The idea is to apply a series of regexp substitutions in order, each working on the output of the previous one:

# escape HTML special characters first
s/&/&amp;/g
s/</&lt;/g
s/>/&gt;/g
# wrap the whole text in one paragraph
s/^/<p>\n/
s/$/\n<\/p>/
# block-level markup; longer heading markers must be tried before shorter ones
s/\n----/<\/p><hr><p>/g
s/\n====(.*?)=*$/<\/p><h3>\1<\/h3><p>/mg
s/\n===(.*?)=*$/<\/p><h2>\1<\/h2><p>/mg
s/\n==(.*?)=*$/<\/p><h1>\1<\/h1><p>/mg
s/\n\s*\*\s+(.*)/<\/p><ul><li>\1<\/li><\/ul><p>/g
s/\n\s*\*\*\s+(.*)/<\/p><ul><li><ul><li>\1<\/li><\/ul><\/li><\/ul><p>/g
s/\n\s*\*\*\*\s+(.*)/<\/p><ul><li><ul><li><ul><li>\1<\/li><\/ul><\/li><\/ul><\/li><\/ul><p>/g
s/\n\{\{\{(([^}]|\}[^}]|\}\}[^}])*)\n\}\}\}/<\/p><pre>\1<\/pre><p>/g
# blank lines separate paragraphs (after the block rules, which need the leading \n)
s/\n(\s*\n)+/<\/p><p>/g
# inline markup
s/\{\{\{(([^}]|\}[^}]|\}\}[^}])*)\}\}\}/<code>\1<\/code>/g
s/\/\/(([^\/]|\/[^\/])*)\/\//<em>\1<\/em>/g
s/\*\*(([^*]|\*[^*])*)\*\*/<strong>\1<\/strong>/g
s/\[\[(\w+)\]\]/<a href="wiki?\1">\1<\/a>/g
s/\[\[(\w+)\|(.*?)\]\]/<a href="wiki?\1">\2<\/a>/g
s/\[\[(http:[^\]|]*)\]\]/<a href="\1">\1<\/a>/g
s/\[\[(http:[^\]|]*)\|(.*?)\]\]/<a href="\1">\2<\/a>/g
# remove empty paragraphs, then merge adjacent lists (one pass per nesting level)
s/<p>\s*<\/p>//g
s/<\/ul><ul>//g
s/<\/ul><\/li><li><ul>//g
s/<\/ul><\/li><li><ul>//g
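
The same chain can be sketched in Python with `re.sub`. This is only an illustration under a few assumptions: it covers a subset of the rules above (escaping, paragraphs, italic, bold, plain wiki links), the names `RULES` and `render` are made up for the example, and the italic and bold patterns capture the whole marked span in one group.

```python
import re

# Ordered (pattern, replacement) pairs; each rule sees the previous rule's output.
RULES = [
    (r"&", "&amp;"),                      # escape HTML special characters first
    (r"<", "&lt;"),
    (r">", "&gt;"),
    (r"\A", "<p>"),                       # wrap the whole text in one paragraph
    (r"\Z", "</p>"),
    (r"\n(\s*\n)+", "</p><p>"),           # blank lines separate paragraphs
    (r"//((?:[^/]|/[^/])*)//", r"<em>\1</em>"),
    (r"\*\*((?:[^*]|\*[^*])*)\*\*", r"<strong>\1</strong>"),
    (r"\[\[(\w+)\]\]", r'<a href="wiki?\1">\1</a>'),
    (r"<p></p>", ""),                     # drop empty paragraphs
]


def render(text):
    """Apply every rule in order to the whole text (hypothetical helper)."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text


print(render("Hello **world** & //friends//\n\nSee [[HomePage]]"))
```

The order of the list is the whole design: the escaping rules have to run before any rule that emits tags, otherwise the generated `<p>` and `<em>` would be escaped as well.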

You get the point. This technique has a number of drawbacks:

no mixing of bold and italic
no markup inside lists, headings, etc.
no mixing of list types, unless a large number of regexps are introduced to accommodate them (like /\n\*\#\*/ etc.)
limited number of heading and list levels
impossible (or extremely hard) to accommodate special cases, such as converting \n to <br> in paragraphs only, or the list/bold special case (a line starting with ** may be either a nested list item or bold text)
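
The first drawback can be seen with a small experiment. The sketch below applies only the italic and bold substitutions, with the capture group covering the whole marked span, to text where the two kinds of markup overlap; `inline` is a hypothetical helper name used just for this example.

```python
import re


def inline(text):
    """Apply the italic rule, then the bold rule, each blind to the other."""
    text = re.sub(r"//((?:[^/]|/[^/])*)//", r"<em>\1</em>", text)
    text = re.sub(r"\*\*((?:[^*]|\*[^*])*)\*\*", r"<strong>\1</strong>", text)
    return text


# Overlapping markup: the bold span closes inside the italic span.
print(inline("**bold //both** italic//"))
```

Because each substitution only looks for its own delimiters, the result contains `</strong>` before the enclosing `<em>` is closed, which is not well-formed HTML. Fixing this needs some notion of nesting, which plain substitution does not have.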

This particular version was published on 15-Dec-2006 17:53 by RadomirDopieralski.