Moved from Talk.Linebreaks
Why are linebreaks evil? #
I've been meaning to pick up this subject for a long time, but I don't want to present my arguments in the wrong way, or to miss any important argument. So I was preparing for this carefully. This is going to be a rant against converting newlines to <br>, as you can easily guess. OK, let's begin.
Target audience #
The justification of the current handling of newlines I've heard goes something like this: "this is intuitive, people coming to wikis from MS Word or blogging software will expect it".
So new wiki users will be happy to have this feature in Creole, as it means fewer new things for them, right?
In the meantime, people who already use wikis will stumble on it and find it at least as awkward as I do. And it's not just a matter of time until they get used to it -- because it's not a "standard" they can get used to. 99% of wikis that don't use Creole handle newlines as spaces. Many web forums, message boards and less advanced blogging software handle newlines like that too -- simply because that's how HTML does it, and because it's easy to translate such text directly into HTML. Practically all professional or half-professional typesetting languages treat a single newline the same as a space -- from HTML, through PostScript and Rich Text Format, up to LaTeX and TeX. A single newline has no meaning other than a simple space. And, by the way, the same goes for handling multiple spaces as a single one (although RTF is different in this regard).
This means that while new users are happily typing their text the way they think it should work, the experienced users need to maintain this "split personality", remembering where they can use enter, and where they can't, because it will produce invalid layout (mind you, there are literally 2 or 3 cases when a newline is actually considered correct and needed in typesetting).
Now, the oldies will eventually die off, and we need to look with hope at the new generation of blogg^H^H^H^H^Hwikizens. But even they, as they discover more advanced software, will have to maintain the "schism" between the ways newlines are handled. And it's in an especially sensitive "transition" time, when a small obstacle like that can make them give up and forever stay in the newbie world of MS Word and blogs, locked out from professional software and non-Creole wikis.
Usability experts know of a thing called the "myth of the experienced user" -- a trap that interface designers often fall into by designing two interfaces, one for newbies and one for experienced users -- thinking that when a newbie becomes experienced, they can switch to the more powerful but also more complicated interface. But this never happens, and users are locked forever in the "newbie" interface, simply because using the less complicated interface doesn't make them any more experienced with the more powerful one.
I'm strongly convinced that this kind of "pro-newbie" decision creates a similar chasm -- not between interfaces of a single application, but between different wikis. Creole is not intended to be the one and only wiki markup. We don't want to lock users out of other wikis; we want to introduce them gently to them. Thus, Creole should be simple and easy to learn, but it should not be substantially different from other wiki markups.
Technical difficulties #
Everyone who has tried to implement a Creole parser knows that this rule, together with some other special cases, increases the parser's complexity considerably, making it much harder to create, debug and extend. A hard-to-write parser means worse adoption across different wiki engines and more accidental inconsistencies between implementations. That's on the side of the developers.
Difficulties on the side of users usually involve copying and pasting of text from their e-mails or text editor -- different line-lengths will result in text that looks like this:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.
That's the effect of mixing the browser's automatic line wrapping with the user's manual one. The same thing happens when the wiki's textarea has a different width than the rendered page -- when a wiki site has a large sidebar, for example.
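As a rough illustration of the effect described above, here is a minimal Python sketch. The `render` helper, the 40-column "e-mail" width and the 30-column "page" width are all assumptions made up for this example; they are not part of any wiki engine. The helper only simulates a renderer that either keeps every source newline as a break or treats it as a space.

```python
import textwrap


def render(source: str, width: int, newline_is_break: bool) -> list[str]:
    """Return the lines a page of the given column width would display."""
    if newline_is_break:
        # Every newline typed (or pasted) by the user survives as a break.
        pieces = source.split("\n")
    else:
        # Single newlines are treated as plain spaces.
        pieces = [" ".join(source.split("\n"))]
    lines: list[str] = []
    for piece in pieces:
        # The browser then soft-wraps each piece to the page width.
        lines.extend(textwrap.wrap(piece, width=width) or [""])
    return lines


original = (
    "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam "
    "nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat "
    "volutpat."
)

# Text hard-wrapped at 40 columns, as an e-mail client might do (assumed width).
pasted = textwrap.fill(original, width=40)

# The same text shown on a 30-column page (assumed width) under both rules.
ragged = render(pasted, 30, newline_is_break=True)
flowed = render(pasted, 30, newline_is_break=False)

print("\n".join(ragged))
print("-" * 30)
print("\n".join(flowed))
```

With newlines kept as breaks, long and short lines alternate; with newlines treated as spaces, the same words fill the column evenly.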
The wiki admins will also have a hard time with this. Text produced by the wiki users will practically lock them in one layout, because changing (especially decreasing) the line length will lead to the effect above.
Even worse, for "flowing" layouts, users that have non-standard font size or window size will experience the effect above.
I don't even want to think about printing wiki pages.
Now, cleaning up this mess automatically is impossible (while it is possible to convert the text the other way around) -- it involves manually browsing through the text and removing all the spurious newlines. I've done it several times. It's debilitating.
Often just *looking* for the spurious newlines is hard -- because with the automatically wrapped textarea they are InvisibleMarkup. Another reason to drop them.
Solution #
Remove the rule about newlines from the Creole specification. Add a separate rule for forcing a linebreak when it is really absolutely required -- something like "\\" or "##" or "||" at the end of the line, for example.
If we absolutely need to accommodate the blog users, then just don't specify the newline handling in the spec -- allow different wikis to use what is best for their user base. But I'm all for specifying that single newlines shall be ignored and treated as spaces.
-- RadomirDopieralski, 2006-12-11
I found some surveys about how different wiki engines treat newlines on MeatBall: http://www.usemod.com/cgi-bin/mb.pl?ParagraphFormattingRules
Shall we start a discussion on ChangeLinebreakMarkupProposal?
To start moving this forward, I'd like to propose "\\\s*$" as the regular expression for forcing line breaks. It's simple to type, looks similar to the markup used for marking newlines in many markup languages, and is consistent with the use of "\" as an escape character.
The requirement for an end of line makes this markup more WYSIWYG, but makes it inadequate to use in table cells. "\\(\s*$|\s)" could be used instead...
-- RadomirDopieralski, 2007-01-01
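The two expressions proposed above can be tried out with a short Python sketch. This reads the patterns literally as they appear on this page, so as regular expressions they match a single backslash; if the wiki swallowed a character when rendering the comment, the intended pattern may differ. The function names are hypothetical, and the group in the second pattern is made non-capturing only so that `split` returns clean pieces.

```python
import re

# Proposal as written: a backslash, optional trailing whitespace, end of line.
EOL_BREAK = re.compile(r"\\\s*$")

# Table-friendly variant: the backslash may also be followed by whitespace
# in the middle of a line (group made non-capturing for this sketch).
CELL_BREAK = re.compile(r"\\(?:\s*$|\s)")


def ends_with_forced_break(line: str) -> bool:
    """True if a single source line ends with the forced-break markup."""
    return EOL_BREAK.search(line) is not None


def split_cell(cell: str) -> list[str]:
    """Split the text of one table cell at every forced break."""
    return CELL_BREAK.split(cell)


print(ends_with_forced_break("first line\\  "))
print(ends_with_forced_break("C:\\temp is a path"))
print(split_cell("one\\ two"))
```

The end-of-line form leaves a backslash in the middle of a line alone, which is why it cannot express a break inside a table cell that has to stay on one source line.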
For me, the difference between a line break and a new paragraph is the vertical space between the two lines. Am I wrong? Is there a semantic difference?
So, my first thought was to go even further: a single newline is sufficient to change *paragraphs* (<p></p>). Two or more newlines would be treated as one and only one paragraph change. Food for thought...
A newline is considered as a new paragraph (marked by a pilcrow) in Microsoft Word. In Word 2007, the spacing between paragraphs is now obvious in the default template (10 points after a paragraph).
Break line #
To break a line (<br />), any \s+\\\\\s+ (that's two backslashes (\\) preceded and followed by at least one whitespace character) would be sufficient in my opinion. The rendering engine would eliminate these spaces.
Then, the questions are:
- Why do we *absolutely* need two newlines for paragraphs? (The spec could say: At least one newline is necessary to create a new paragraph.)
- Is one backslash sufficient to break a line?
- Do we need to put a newline after it, or is one whitespace character sufficient?
- What about one whitespace before?
- Should two or more subsequent whitespaces be treated as only one space?
Remarks:
- The backslash key is not easy to find on several keyboard layouts.
-- EricChartre, 2007-01-03
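For comparison, the whitespace-delimited variant described above (two backslashes with at least one whitespace character on each side) could be sketched in Python as follows. The `render_breaks` name and the choice of emitting `<br />` directly are assumptions for illustration only.

```python
import re

# Two literal backslashes, with at least one whitespace character before
# and after; the surrounding whitespace is consumed by the match.
BREAK = re.compile(r"\s+\\\\\s+")


def render_breaks(text: str) -> str:
    """Replace each whitespace-delimited double backslash with <br />."""
    return BREAK.sub("<br />", text)


print(render_breaks("first line \\\\ second line"))
print(render_breaks("a\\\\b stays untouched"))
```

Because `\s` also matches a newline, this sketch accepts either a space or a newline after the markup, which is one possible answer to the questions in the list above.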
Good questions, and I'm glad you asked them, so that we can explicitly substantiate the choices made so far.
I will start with the double newline for separating paragraphs. This comes from a long tradition of message boards, newsgroups, various FAQs and RFCs, walkthroughs and similar text documents -- they are usually preformatted, with single line breaks used merely to fold the lines of text, and an empty line to separate paragraphs. This "tradition" is also very practical in more formal cases, where you can use newlines to emphasize the structure of the raw source code, while keeping the rendering independent -- the "double newline" rule is present in TeX and many markup languages derived from it, including the markup of practically all the wiki engines.
The textareas used commonly to edit wiki markup are usually very simple and pretty hard to use. Until recently, they didn't even have any support for wrapping the text -- if you had any long lines, you had to scroll. Today's browsers are a little better in this regard, but only minimally -- line wrapping is pretty much broken in most of them -- thus providing some means to manually control the flow of text in the source, without impact to the rendering, is still important.
Even if one uses an external editor to edit text -- and there are both special browser plugins and editor scripts that allow this -- one is not free from the line-wrapping problems. Many text editors will wrap text by default (which is useful for writing e-mails, for example). You also might need to put on the wiki pages some text taken from e-mails, newsposts, various text files, web pages (some browsers will reformat text when it's copied), or even scanned text. Ignoring single newlines allows you to minimize the work of reformatting such text (adding newlines for several paragraphs is always easier than removing them for several hundred lines).
There is also a question of presentation -- with "modern" line-wrapping textareas, a single newline is effectively InvisibleMarkup if it comes near the edge of the editor area. Tracking down and removing such "spurious" newlines is not easy and usually pretty annoying.
As for treating consecutive whitespace as a single space, it's a similar deal. Plus, we don't want people to use spaces to indent or center text -- it's not only ugly and hard to maintain -- it's also totally unportable to devices/software/sites using different fonts and screen widths. Of course, any space at the end of lines should be ignored because it forms InvisibleMarkup.
-- RadomirDopieralski, 2007-01-03
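The rules argued for above (an empty line separates paragraphs, a single newline counts as a space, runs of whitespace collapse, trailing spaces are ignored) fit in a few lines of Python. This is only a sketch of that behaviour under those assumptions; `to_paragraphs` is a made-up name and not taken from any Creole implementation.

```python
import re


def to_paragraphs(source: str) -> list[str]:
    """Split raw wiki text into paragraphs, TeX-style."""
    # One or more blank (whitespace-only) lines separate paragraphs.
    blocks = re.split(r"\n\s*\n", source.strip())
    # Inside a paragraph, every whitespace run, newlines and trailing
    # spaces included, collapses into one space.
    return [" ".join(block.split()) for block in blocks if block.strip()]


raw = "one\ntwo   three \n\n\nfour\n"
for paragraph in to_paragraphs(raw):
    print("<p>" + paragraph + "</p>")
```

Text pasted from a hard-wrapped e-mail goes through this unchanged in meaning, since its line folds are single newlines.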
As I said before, we should merge Linebreaks and Paragraphs.
Also, see my proposal for linebreaks and paragraphs in Talk.Quoting. It would make the markups generic and usable in normal text, Tables and Lists.
-- EricChartre, 2007-01-10
I don't like the newline rule, but I got used to it. It didn't complicate my parser at all. So yes, I changed my mind.
I think we should have no illusions, here. Wiki is not the be-all and end-all. Blog-like (and forum-like!) treatment of newlines will help wikis to blend in. I think keeping the newline = br rule will mean the least surprise for the greatest number of users.
It's true, LaTeX users will complain. Emacs users will complain. And I am one of them. But most people use MS Word, blogs, and forums, not LaTeX, Emacs, and wikis.
I agree, Alex. What I see on our wikis with endusers is that first of all it feels odd to them to have to use special characters. As soon as you have explained this to them, they use it much too often to make sure they get their linebreaks, even if they should not use it (like in a paragraph). After that a text usually looks like this.
I agree, Alex. What I see on our wikis with endusers is that first of\\ all it feels odd to them to have to use special characters.\\ As soon as you have explained this to them, they use it much too\\ often to make sure they get their linebreaks, even if they should not use it,\\ something like this
Creole should be for endusers - as Radomir mentioned on a page for our children ;-). Talking about LaTeX etc. is academic. It's good to hear that it does not complicate the parser. I have to admit though that I have no experience with the new rule, so I welcome the discussion on this proposal. I am still not sure if we should really change the spec in 0.4 already. I would like to have more feedback and experience with the new rule. I just know that there is something wrong with forced linebreak syntax. We should make no premature decisions.
-- Christoph
I gather that we give up on all mixed-mode efforts and only allow the 'edit as Creole' approach then? Because 99% of wiki engines use sane line breaks, and I can see no way of removing the conflict. This alone is, in my opinion, a good reason for a change.
Or are there any suggestions for a solution?
-- RadomirDopieralski, 2007-01-13
The contrived nature of Christoph's example argues against converting newlines to <br> tags. The important thing about <br> markup is that it's rare. There are a few good uses, for example setting of verse, but many of the uses in practice would be better marked up as, say, a bulletless list.
In a 1M-page fragment of the English Wikipedia XML dump (3GB uncompressed data), there are 400k instances of the <br> tag. Of those, about half are inside tables. Another large fraction seem to be for doing relatively sophisticated layout-like things, such as positioning captions for images.
Therefore: very unsophisticated users don't need to know what the linebreak markup is. It's vastly less important than meat-and-potatoes markup like links and emphasis. In the other direction, stray newlines easily find their way into text files, and it's easy to predict that a large number of these will be unwanted. And for specific tasks like prose, it's easy to view source and copy.
-- RaphLevien, 2007-01-13
I am VERY STRONGLY in favour of having newlines be rendered as line breaks.
I have observed over 100 newbie users working on a wiki for about 6h each. Most of them were very confused by the fact that newlines were not rendered as line breaks. This confusion did not go away, even after I explained to them numerous times that newlines were not linebreaks (I didn't explain it in such technical terms of course). They just kept making the same mistake over and over again. Granted, my subjects were Grade 4 kids, but having served as the "help line" for many a wiki used by adults, I have noticed that non-techie adult users are also confused by that. And that they too keep making the same mistakes over and over again even after I repeatedly tell them that newlines are not linebreaks.
This is the majority of the world out there folks! Let's stop thinking about designing wikis for the techie type or the non-techie but highly motivated type. These are just the tip of the iceberg. We need to design wikis for the common folk.
-- AlainDesilets, 2007-01-16
But we are not designing a wiki or a wiki engine. They are already out there -- working, with hundreds of pages. New ones appear every day, and they are based on the existing ones. We will not change that -- the only result of trying to force this kind of thing would be just not implementing Creole in them. The goal of Creole is to be a common markup for most wikis -- not a way to "fix fundamental design flaws" of wikis.
The best that could be done is not specifying this in the spec at all -- but then we are dodging the problem, and leaving a large hole that leads to even more incompatibilities and confusion for users later on.
I'd really like to see a suggestion of an approach that would allow having both blog-like line breaks *and* MixedMode Implementation in the majority of wiki engines. Creole has in it a number of decisions and work solely dedicated to removing collisions -- so that MixedMode Implementation is possible. We have rejected or modified a number of things. It would be a waste to dump the MixedMode now because of that. Any ideas?
By the way, are there any reports about users that fumble on blog-like linebreaks? I know I do. I don't have any hard data, but I'd guess it's 50-50 for newbie users who have never seen either a wiki or Microsoft Word.
-- RadomirDopieralski, 2007-01-16
Just for the record, Ward Cunningham said at the WikiSym Creole Workshop that when he first invented wikis in 1995, browsers didn't support adding line breaks to a textarea field. From that precedent, wikis run the way they do today with regard to line breaks.
However, having said that, I have to say that I agree with Radomir. In Drupal, text areas are HTML filtered (with automatic line breaks). I can't remember how many times I've been very annoyed by wanting to paste in information from another source and having to try to remove all the line breaks. Also, in one Drupal system, I installed the TinyMCE plugin and it completely destroyed all the line breaks. I realize this is a design flaw of TinyMCE, but still...
In any case, it's really infuriating. So, both ways have their disadvantages, and I strongly believe line breaks in wikis should follow the traditional wiki pattern.
-- ChuckSmith, 2007-Jan-17
I don't actually like the new proposal, I think the original Creole "treat line breaks as line breaks" rule was very good. You could type a text file and have a pretty fair idea of how it would look in the wiki. I don't consider line breaks to be an invisible form of markup... you can easily see line breaks when they're meant to be there (i.e. word wrapping is pretty obvious). Perhaps there are some cases where this is not true, but generally speaking I believe it to be so.
How text is copied and pasted should be a function of the wiki engine. Messy issues with process should not find their way into the Creole spec. Is it too much for a wiki engine to provide an option to save "cleaned" text? What happens when you continually copy and paste the "broken" text which people are talking about... it'll be a complete mess in the end and you'll probably go and clean it up by hand anyhow...
-- MarkWharton, 2007-01-18
That's the point Mark -- it's extremely easy to add linebreaks automatically -- actually most text editors and word processors do it today, some even try to do proper hyphenation. But it's totally impossible to remove spurious line breaks from the text automatically, because without mind reading or (less efficient) understanding the text, there is no way to know which line breaks are meaningful and which are just a result of wrapping long lines. And no, people can't be taught to not hit enter when the cursor gets near the edge of their editing area.
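The asymmetry claimed above can be shown with a tiny Python sketch. The two sample strings and the 16-column width are invented for the example: one is flowing prose that an editor hard-wraps, the other is a two-line verse whose break is intentional.

```python
import textwrap

# Flowing prose, typed as one long line.
prose = "roses are red violets are blue"

# A two-line verse, where the line break is meaningful.
verse = "roses are red\nviolets are blue"

# Adding line breaks is trivial: an editor wrapping at 16 columns does it.
hard_wrapped = textwrap.fill(prose, width=16)

# After wrapping, the prose is character-for-character identical to the verse.
print(hard_wrapped == verse)


def unwrap(text: str) -> str:
    """Naive cleanup: treat every newline as a wrapping artifact."""
    return " ".join(text.split("\n"))


# The cleanup restores the prose, but it flattens the verse just the same.
print(unwrap(hard_wrapped) == prose)
print(unwrap(verse) == prose)
```

Since both inputs reach the cleanup step as the same string, no function of that string alone can restore one and preserve the other.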
Of course, we could design a markup language that requires the wiki engines to use monospaced font, 80-character wide text areas and display all the characters typed, including spaces and line breaks. You don't need any other markup but links then, actually, as lists, headings, tables, etc. can be easily made using spacing, and maybe even some special Unicode characters like · or •. And the pages could be then served as PDF files.
Such a "markup language" (not really) would be extremely intuitive for new users -- it's practically 1:1 WYSIWYG. Things look as good as you make them. If you want the text to be formatted nicely, you just need to spend an hour or two formatting it. You need to change something? No problem, just go through all your pages and change it -- very intuitive. Some users insert additional spaces and line breaks reflexively? Well, that will teach them to be careful what they type.
But for some reason I have a feeling that such a Creole would be adopted in, maybe, one wiki engine and two or three blog engines and CMSes. And that it wouldn't be really loved by copywriters. The whole idea behind a markup language is that you don't have to care about irrelevant details like line breaks, or spacing, or font family, or heading alignment, or line wrapping, or font size, or colors. You just type the copy, and the software takes care of the rest for you. That's how it works in wikis, at least. You want to go and try to redefine what a wiki is? Go ahead, wish you luck. Nobody will look at Creole then.
-- RadomirDopieralski, 2007-01-18
I understand and appreciate most of what you're saying Radomir, but I don't understand one thing... "there is no way to know which line breaks are meaningful and which are just a result of wrapping long lines"... How do "wrapping long lines" turn into linebreaks? Where and how does that happen? I'm not aware of any text editors which insert wrapping line breaks when a block of text is copied. It's a visual thing right? I must be missing something here... let me re-read the above comments and think about it a little more carefully. If you have some insight to share it would be great, thanks!
-- MarkWharton, 2007-01-18
I think Radomir is mostly referring to email. This is where I most frequently have my problems... when someone emails me text to put into Drupal or a blog for example. But, after reading other opinions on here, I've changed my mind. I just noticed today that my LiveJournal account (now there's a mass audience!) also does line breaking like Drupal. Is anyone besides Radomir against me removing the change to Creole 0.4 for line breaks?
-- ChuckSmith, 2007-Jan-18
Yes, for the reasons I stated above, plus it's essential for any hope of Crossmark compatibility.
I took a quick look at LJ, and, exactly as I suspected, the only instances of <br> I was able to find were "lazy paragraph delimiters" (see http://kristogre.livejournal.com/ for an example), and lists. Both have better markup choices.
A big part of the reason why people like <br> as a paragraph breaker is that the default styling for paragraphs in HTML (and LJ as well, perhaps not so coincidentally) is vast, yawning chasms between them.
I stand by my assertion that if <br> tags are a little harder to get to, their relative scarcity won't be missed.
-- RaphLevien, 2007-01-18