For Creole 0.4 I'd like to bring up the issue of spaces after the bullets. The current (0.3) draft and previous specs have this ugly special case:

{{{
About unordered lists and bold: a line starting with ** (including optional whitespace before and afterwards), immediately following an unordered list element a line above, will be treated as a nested unordered list element. Otherwise it will be treated as the beginning of bold text. Also note that bold and/or italics cannot span lines in  a list.
}}}

I think it's ugly and complicates the parser needlessly. Also, many wikis already have very similar list markup, just without this special case -- making them accept both Creole and native markup at the same time would require some sort of a hack (I can't even imagine one currently).

One possible way of getting rid of that special case and still keeping list markup unambiguous with bold markup is //requiring// a space after the bullet.

Now, this is a different case than with space //before// the bullet. There are wiki engines that don't allow space before the bullet, and those that require it -- making it optional is really the only way to make them agree.

On the other hand, no wiki engine I know prohibits the space after the bullet. Some require it.

Moreover, putting a space after most punctuation characters is a tradition, and for many people -- a reflex. I can see nothing unnatural in requiring it -- and it simplifies the parsers and the specs -- making Creole both easier to implement and to teach.

By the way, there is a (pretty ugly) hack to get a bold line even if the above special case is removed (remove the single space):
{{{
 {{{}} }**bold line**
}}}
-- [[RadomirDopieralski]], 2006-12-14

Why not accept both (asterisks and dashes)? And it goes with the unofficial [Goals] {{{Rule of least surprise}}} and some others...

-- [EricChartre], 2006-12-28

Regarding the possible ambiguity of the asterisks, there is none (for the parser anyway) if the specs do not allow bold text to span multiple lines and require that bold text end at some point with **. Also, I __don't__ think that a user would ever, on purpose, do something like:

{{{
** is this text bold
** or are these just two second-level list items
}}}

meaning 

{{{
<strong> is this text bold<br />
</strong> or are these just two second-level list items
}}}


However, the parser must do a look-ahead or a two-level parsing...

-- [EricChartre], 2006-12-28

I don't think there is any ambiguity in the example given above. I believe the asterisks signify strong, as it seems illogical to start a sub-list directly.

And the following would be considered list items.
{{{
* List
** SubItem 1
** SubItem 2
}}}

-- [JaredWilliams], 2006-12-30

Yes, the problem is rather with these examples:
{{{
**foo**bar**baz
**one**two
}}}

They could be parsed as:
----
__foo__bar__baz__
__one__two
----
or
----
** foo__bar__baz
** one__two__
----
or
----
** foo__bar__baz
__one__two
----
You can't really decide without infinite (unbounded) lookahead -- and that's a great problem if you need to use a ready-made parsing algorithm or parser framework -- this rules out most of the extensible, plugin-based wiki engines.

You can't just make list or bold the default here -- because there are popular use cases for both:

__Paragraph titles__ are often integrated in the paragraph, like in this example. They are traditionally distinguished by making them bold. Italics won't do.

* multilevel lists
** can contain __bold__ fragments

Really, I think that requiring a space after the list bullets is a simple and effective solution. And it also removes the conflict with {{{#pragma}}} and {{{# numbered list}}} for many wiki engines.
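
A minimal sketch of that rule in Python, assuming a simple line-by-line classifier. The names LIST_ITEM and classify are made up for illustration and are not part of any spec or engine. With a mandatory space, a single regular expression separates list items from bold text and from pragma-style lines, with no lookahead past the bullet:

```python
import re

# Hypothetical rule: optional leading whitespace, one or more bullets,
# at least one whitespace character, then the item text.
LIST_ITEM = re.compile(r"^\s*([*#]+)\s+(.*)$")

def classify(line):
    """Return ("list", depth, text) or ("text", 0, line) for a single line."""
    m = LIST_ITEM.match(line)
    if m:
        return ("list", len(m.group(1)), m.group(2))
    # No space after the bullets: leave the line for the inline parser.
    return ("text", 0, line)
```

Under this sketch, a line such as "**foo**bar**baz" falls through to plain text (and so to bold), while "** foo**bar**baz" becomes a second-level item.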

-- RadomirDopieralski, 2006-12-30

I have my parser doing this

{{{
**foo**bar**baz
**one**two
}}}
is
{{{<div><p>
<strong>foo</strong>bar<strong>baz</strong>one<strong>two</strong>
</p></div>}}}
But
{{{
*list
**foo**bar**baz
**one**two
}}}
is
{{{<div><ul><li>list<ul>
   <li>foo<strong>bar</strong>baz</li>
   <li>one<strong>two</strong></li>
</ul></li></ul></div>}}}

Which I think covers it.

-- [JaredWilliams], 2006-12-30

How does it look in the regular expressions? Something like:
{{{
(?=\n\s*\*+\s*.*)\n\s*\*+\s*(.*)
}}}
as an additional rule for the lists? Or did you just write your own algorithm and remember the state between the lines?
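
For illustration, the second option (remembering state between lines) might look roughly like the Python sketch below. It is an assumption about how such a parser could be written, not anyone's actual implementation. The function name classify_lines is hypothetical, and it only handles asterisk bullets:

```python
def classify_lines(lines):
    """Yield (kind, depth, text) per line, applying the 0.3 special case."""
    in_list = False  # was the previous line a list item?
    for line in lines:
        stripped = line.lstrip()
        # Count the asterisks at the start of the line.
        stars = len(stripped) - len(stripped.lstrip("*"))
        if stars == 1 or (stars >= 2 and in_list):
            # A single star always opens an item; more stars only count
            # as bullets directly after another list item.
            in_list = True
            yield ("list", stars, stripped[stars:].strip())
        else:
            in_list = False
            yield ("text", 0, line)
```

Note that the sketch simply treats three or more stars after a list item as a deeper item, so it does not settle the bold-item reading of such lines.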

-- RadomirDopieralski, 2006-12-30

I don't use regular expressions. 

But here is the algorithm in PHP in any case, called when the parser has seen {{{\n[-*#]}}}, with $i holding the position of the {{{[-*#]}}}.

{{{
/*
 * $text is the creole text
 * $i is the current position in $text
 * $l is the strlen($text)
 * $doc is the DOM Document
 * $node is the current position in the DOM Document
 * $listMap = array('-' => 'ul', '*' => 'ul', '#' => 'ol');
 */

// Traverse up the DOM tree, from our current position, looking for open lists.
$lists = array();
for($n = $node; $n; $n = $n->parentNode)
	if ($n->nodeName == 'ol' || $n->nodeName == 'ul')
		array_unshift($lists, $n);

// See how many lists we can match... from the $text 
$j = 0;
while (isset($text[$i + $j], $lists[$j], $listMap[$text[$i + $j]])
		&& $listMap[$text[$i + $j]] == $lists[$j]->nodeName)
	++$j;

// See how many list markers left...
$k = strspn($text, '-#*', $i + $j);
switch ($k)
{
	case 1:
		// Going a level deeper..
		if (isset($lists[$j - 1]))
			$node = $lists[$j - 1]->lastChild;
		else if ($j == 0 && $node->nodeName == 'li')
			$node = $node->parentNode;

		// Create UL or OL...
		$node = $this->insertElement($node, $listMap[$text[$i + $j]]);

		$node = $node->appendChild($doc->createElement('li'));
		$i += $j + $k;
		break;

	case 0:
		// List item of the most recent open list.
		$node = $this->insertElement($lists[$j - 1], 'li');
		$i += $j;
		break;

	default:
		// Horizontal line...
		if (strspn($text, '-', $i) >= 4)
		{
			$this->insertElement($node, 'hr');
			$i += $j + $k;
		}
		break;
}
}}}

So {{{**foo**bar**baz}}} doesn't get recognised as a list, as $k = 2, and gets left alone for the inline parser to interpret as {{{<strong>}}}. But for {{{*list\n**foo**bar**baz}}}, $k = 1 for both lines.
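
The counting step can be restated as a small Python sketch. This is a simplified paraphrase of the PHP above, not a port of the whole routine: the DOM walk is replaced by a plain list of open list types, and the names LIST_MAP and count_markers are hypothetical.

```python
LIST_MAP = {"-": "ul", "*": "ul", "#": "ol"}

def count_markers(line, open_lists):
    """Return (j, k): markers matching already-open lists, and markers left over.

    open_lists holds the enclosing list types, outermost first, e.g. ["ul", "ul"].
    """
    j = 0
    # Match leading markers against the lists that are already open.
    while (j < len(line) and j < len(open_lists)
           and LIST_MAP.get(line[j]) == open_lists[j]):
        j += 1
    k = 0
    # Count any further list markers beyond the matched ones.
    while j + k < len(line) and line[j + k] in LIST_MAP:
        k += 1
    return j, k
```

With no list open, a line starting with two asterisks gives k = 2 and is left to the inline parser. With one list open, the same line gives j = 1 and k = 1, which opens a nested list.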

-- [JaredWilliams], 2006-12-30

----

As I've mentioned in [Raph's 0.4 recommendations], I'm in favor of using trailing whitespace to disambiguate second level list bullets from bold. It's simple and easy to understand. I am not in favor of "magic" algorithms to resolve the ambiguity. I think that non-local algorithms are especially undesirable for bullet lists, because they're often rearranged by cutting and pasting. Requiring trailing whitespace is also NotNew.

From what I can tell in the above tangled discussion, it's also Radomir's favored solution. It seems to me we should be able to reach consensus on this issue fairly easily. Am I off base?

-- [RaphLevien], 2007-01-07

Indeed, I was in favor of that solution, but now after this discussion I think that both can be considered fairly equivalent. I still prefer the added whitespace slightly -- it has the advantage of being easier to explain, and it also fixes the {{{#pragma}}} conflict in many wikis.

Raph, I'm not really in any way more "core" than you are -- the fact that I dominated RecentChanges recently is a coincidence. On the other hand, I'd really like Creole to be designed in an OpenProcess, while minimising arbitrary decisions and [bikeshedding|http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/misc.html#BIKESHED-PAINTING]. That's why I want every possible difference in opinions discussed, even if they seem no-brainers.

So then, I have listed some advantages I perceive in requiring these whitespaces. During the discussion, some alternative solutions have been brought up, most of them pretty much acceptable. Now I just miss one thing: //Is there anything important __against__ the requirement of whitespace after list bullets, other than the desire to have as free and unrestricted format as possible?//

-- RadomirDopieralski, 2007-01-08


I don't see anything against a required space after list bullets __except__ for end-user freedom. Personally, I never put a space after the bullet because that feels like it types faster and allows my thoughts to flow more smoothly. (silly maybe)

I'd like to point out once more that there is no ambiguity between bold and second level bullet items here precisely because of the "ugly special case". Also, it is relatively simple to parse ** so that it is always interpreted as bold __except__ when at the beginning of the line __and__ preceded by a first level bullet item. At least if you are parsing using something like flex, I'm not sure about it when using regular expressions though.

--MartijnVanDerKleijn, 2007-01-11

No problem when parsing using regular expressions, according to my experience.

-- MicheleTomaiuolo, 2007-01-11

Another argument for requiring a space after bullets is that Creole should represent a minimal common set of rules shared by other wiki dialects, which all wiki engines should interpret correctly. Right? So I think requiring a space makes it simpler for engines to handle Creole. The stricter, the better.

If engines relax this constraint, well, it's an extension and it's allowed.

OT now, but this could also stand for titles of subsections, for example. If we say that trailing equal signs are required, it would make it simpler for existing engines to interpret Creole. It would be a single case, and not two.

-- MicheleTomaiuolo, 2007-01-31

I think requiring the space after bullet removes a lot of unnecessary ambiguity.

The note **about unordered lists and bold** in the Creole spec cannot always be applied.

{{{
About unordered lists and bold: a line starting with ** (including optional whitespace 
before and afterwards), immediately following an unordered list element a line above, 
will be treated as a nested unordered list element. Otherwise it will be treated as the 
beginning of bold text. Also note that bold and/or italics cannot span lines in a list. 
}}}

Considering that note, and without a space after //each// bullet in the sample below, the following could be misinterpreted:

{{{
**Schedule**

* **Start Date:** 01 Jul 2006
* **End Date:** 31 Dec 2006
* **Status:** Complete
}}}

**Schedule**

* **Start Date:** 01 Jul 2006
* **End Date:** 31 Dec 2006
* **Status:** Complete

-- [MarkWharton], 2007-02-01


I don't like the proposal. Again, as with linebreaks, it is a proposal from the viewpoint of programmers enamored of the simplicity of their code, forgetting about the users - for God's sake, [MakeTheMachineWorkHarder]!

-- ChristophSauer, 2007-Feb-01

I strongly agree with Christoph.  I use the TracWiki quite often and it requires a space before the bullet, and even after messing that up several times, I still sometimes forget to add the space and I am a professional wiki researcher specializing in wiki markup!  I can't imagine the problem being different for spaces after a bullet.  Users will forget.  Then they'll forget again.  Then again.  And each time they'll be frustrated, because they tried to use wiki markup and it didn't do what they expected.  Then, they'll complain that wiki markup is stupid, and rightfully so, because imho requiring a space after a bullet is stupid.

Yes, there are possible problems with ambiguity of unordered lists and bold.  But I would say these cases happen about 0.1% of the time, and no one should primarily develop a system to specifically account for obscure edge cases.  Also, I don't see many people here arguing from the user's point of view, but from the programmer's point of view.  A user doesn't care at all about regular expressions or how beautiful the code is.  A user doesn't even care what a wiki is.  They just want to be able to collaborate as easily as possible and that is what we are trying to help them accomplish.

-- ChuckSmith, 2007-Feb-01

I think a space should be required only after two stars. There must be some way to distinguish between
unordered sublists and bold; making it too complicated (depending on the context, for instance) or
unspecified will make the life of the programmer //and the user// more complicated, imo. I'm against the
systematic use of spaces after any other number or combination of bullets.

A simple rule will also be important when we discuss list items spanning multiple lines in the source
code.

-- [[YvesPiguet]], 2007-Feb-01

Ok, I removed that "makes parsers easier to write" advantage, especially that it's only true for some values of "easier" and for some languages and approaches.

The proposal still stands.

-- RadomirDopieralski, 2007-02-01

I agree with Christoph about making the machine work harder, but ambiguity is ambiguity. With the current spec it is possible to produce sequences of text which cannot be determined to be one way or another. That was the point of my example above. Forgive me if I'm wrong, but I don't believe the linebreaks proposal has been argued on the point of making implementation easier. Very real and valid arguments have been put forward there. Anyhow, getting back to the subject...

The following is not clear and cannot be determined:

{{{
*first level list item 1
**second level list item 1
***first level bold list item 2
***first level bold list item 3
}}}

Is it as described or is it actually first, second, and third level list items?

The following is clear and can be determined:

{{{
* first level list item 1
** second level list item 1
* **first level bold list item 2
* **first level bold list item 3
}}}

The only way I can see that making the machine work harder can deal with the first case is to require closing the bold sequence. But that's a whole other argument...

-- [MarkWharton], 2007-02-02

I think a user will almost immediately see the problem, after previewing or saving, and fix it. I see no reason for requiring whitespace for what I suspect to be a rare piece of markup compared with general unordered list usage. I left in the use of the hyphen (-) in my parser so
{{{
-first level list item 1
--second level list item 1
-**first level bold list item 2
-**first level bold list item 3
}}}

isn't ambiguous.

-- [JaredWilliams], 2007-02-

We already ruled out hyphen because of different kinds of ambiguity:
{{{
Look at the following numbers:
-1
-2~5
--3
--4
Which ones you think are positive and which are negative?
}}}
and also this rare case (more common when blog-like newlines are used):
{{{
When hyphenating compound words, you put the hyphen on both sides of the line-
-break.
}}}
I think that a single hyphen is just too common in normal language to be used for markup. It's also rarely used in wikis.

Incidentally, requiring white space after the bullet resolves this ambiguity as well :).

As for "user freedom", I don't quite get it. It doesn't restrict your freedom more than a "don't jump out of the window" sign. It's not an assault on your freedom when you're forbidden to do something you don't want to do anyway. It's a similar case with "forgetting" to put the space after the bullet. That's also not possible -- it's a muscle reflex. You can't forget how to ride a bike. Of course, you //could// get confused if there were two kinds of bikes, requiring different handling. But the space after the bullet is used **everywhere**. Sometimes it's not forced, but it is always allowed. This is a typographic tradition, picked up from all the books and magazines and pretty much everything you read -- just like the space after an end-of-sentence period. There are two exceptions I can think of: when using dashes for bullets, some typographers advise using only a very thin space, so as not to break the page composition, and of course when you want to be "original" on some kind of a poster -- but then the bullets are usually of weird shapes and a different color than the text.

Some examples from the sylabus wiki:
* http://sylabus.wmid.amu.edu.pl/Podstawowe_pojecia_i_narz%C4%99dzia_informatyki?action=raw
* http://sylabus.wmid.amu.edu.pl/Algorytmy_i_struktury_danych?action=raw
* http://sylabus.wmid.amu.edu.pl/Matematyka_dyskretna?action=raw
* http://sylabus.wmid.amu.edu.pl/Podstawy_programowania?action=raw

These are some pages that were made public by their editors, so I can show them. But I've looked at all the pages in the wiki (about 160 now) and I haven't found a single case of no space after the bullet (although the hyphens dominate).

We can include this test in [[TheStudentExperiment]].
-- RadomirDopieralski, 2007-02-02

Why not ignore whitespace at the beginning of the line except when required to separate tokens.  

{{{
*Item 1
**Item 1.1
** Item 1.2
* **Bold Item 2
}}} 

Does not force a required space, unless needed.

-- JaredWilliams, 2007-02-02

I think I've put my idea very badly. Sorry. Actually, I was referring to rules more than engines (I wrote engines but I meant wiki languages).

My point is:

# if Creole requires a space after bullets, both languages requiring a space and those not requiring one are 100% Creole-compatible (in the sense they can interpret Creole, they extend it)
# if Creole doesn't require a space, some wiki languages (those which will expect a space) won't fully understand Creole texts

Allowing two different syntaxes for the same semantic makes Creole (a bit) harder to be adopted. I'm talking in general, here, more than specifically on lists, bullets and spaces. I cannot see this in [[Goal]]s, but I would put it as "the stricter, the better". Please note that I'm not arguing it should be respected in every case, but it should be //one of// the goals, to be balanced against others.

-- MicheleTomaiuolo, 2007-02-02

There is something that [[http://www.raskincenter.org/|Jef Raskin]] has to say about monotony:
{{{
[...] Archy counters these problems by eliminating modes, which can be a significant source of confusion and error, and streamlining the decision process through "monotony," that is, giving you only one way to accomplish a task. Modelessness and monotony encourage the formation of useful habits that enable you to work faster and more confidently. When such habits are fully formed, you can perform those tasks without conscious thought, and thus not be distracted from your content and your intentions. This is called achieving automaticity.
}}}

So it's not always "the stricter - the better", but it is "there should be one obvious way of doing something". Incidentally, this is also one of the "guidelines" in Python, a language that scores high in readability and ease of editing existing code. Here's the [[http://www.python.org/dev/peps/pep-0020/|Zen of Python]].

-- RadomirDopieralski, 2007-02-02

Looking at random Wikipedia articles, I have found that about half of them include bullet points that do not start with a space.  Users are not used to having to put a space after bullet points.  The above ambiguities are solved just by requiring a space between the bullet and start of bold.  A triple asterisk at the beginning of a line indicates a third-level list item, clearly not a first-level bold item.  This is the only case where it can come up, and I imagine the first thing users will do in light of such a problem is to add the space.  I think we are going overboard with edge cases in order to make the syntax in some sense "more consistent" instead of going with what most users are used to.

-- ChuckSmith, 2007-Feb-05

I assume this is also a response to my post at [[Talk.Lists]].

So... A {{{<ul>}}} block (after something other than a list item) **must** start with a single asterisk followed by a non-asterisk. If it starts with two asterisks, it's just a normal paragraph starting with bold text. When it starts with three asterisks, it's just a normal paragraph starting with a bold asterisk. And when it starts with four asterisks, they are just deleted and a normal paragraph follows. Is that right? If so, I'm going to implement it like this now.
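
Spelled out as a rough Python sketch (an illustration of the reading above, not an agreed rule; the function name classify_block_start and the returned labels are made up), the decision depends only on how many asterisks open the first line of a block:

```python
def classify_block_start(line):
    """Classify the first line of a block that does not follow a list item."""
    body = line.lstrip()  # whitespace before the bullet is optional
    stars = len(body) - len(body.lstrip("*"))
    if stars == 0:
        return "paragraph"
    if stars == 1:
        return "list item"
    if stars == 2:
        return "paragraph starting with bold text"
    if stars == 3:
        return "paragraph starting with a bold asterisk"
    if stars == 4:
        return "paragraph, empty bold run dropped"
    # Five or more asterisks are not covered by the description above.
    return "unspecified"
```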

-- RadomirDopieralski, 2007-02-05


An advantage of requiring this space is that notations to vote using lists {{{ *#v }}} will work nicely, since the "v" can't be interpreted as text.  This saves having to overload a character and lets users specify 26 kinds of votes...

''See [[MakeTheMachineWorkHarder]] and [[Talk.ListsReasoning]] for more on this idea.''

-- Anonymous, 2007-02-05

According to WikiMatrix, there are 17 wiki [[Engines Using Asterisks For Lists And Bold]].  How do they resolve the ambiguity problem?

-- ChuckSmith, 2007-Feb-06

Having the space after the bullet solves the ambiguity problem and makes the Creole markup itself more collision free. I proposed another collision type for this, see [Talk.CollisionFree].

-- SteffenSchramm, 2007-02-07

Since removing single newlines in lists could make the ambiguity about bold/list more serious, I've yet another proposal. In fact:

* Most occasional users won't need nested lists at all
* Experienced users will be able to remember the space

What about a compromise? Let's be forgiving for the first level, and require a space for nested lists. No ambiguity, while preserving usability.

I mean:

{{{
*One
** Two
*** Three
}}}

We could say that "a space is required, but implementers are strongly encouraged to be forgiving for the first level".
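
A minimal Python sketch of the forgiving variant, under the assumption that only asterisk bullets are considered. The pattern names FIRST_LEVEL and NESTED and the function list_depth are hypothetical. A single bullet may be followed directly by text, while two or more bullets must be followed by whitespace:

```python
import re

# One asterisk, not followed by another asterisk; the space is optional.
FIRST_LEVEL = re.compile(r"^\s*(\*)(?!\*)\s*(.*)$")
# Two or more asterisks; the space is required.
NESTED = re.compile(r"^\s*(\*{2,})\s+(.*)$")

def list_depth(line):
    """Return the list depth of a line, or 0 if it is not a list item."""
    m = NESTED.match(line) or FIRST_LEVEL.match(line)
    return len(m.group(1)) if m else 0
```

With these patterns the three lines of the example come out at depths 1, 2 and 3, and a line that opens with two asterisks and no space is left for the inline parser to treat as bold.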

-- [[Michele Tomaiuolo]], 2007-02-08

I like this idea.

-- ChuckSmith, 2007-Feb-09