The markup should not hinder reading of the RawText of the page.
Only characters rarely used in normal text should be used for markup. When a common character is used (like a dash "-" or a slash "/"), it shouldn't be used by itself. Escaping should rarely be necessary.
The markup should be visually separated from the content text; it should be obvious what is markup and what is content.
AvoidTextTags, as they form words that make text hard to read.
Make it easy to find the parts of RawText that correspond to fragments of the RenderedPage.
Avoid InvisibleMarkup.
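As a rough illustration of the rule about common characters, the sketch below counts how often a candidate markup token already occurs in ordinary prose. Each hit is a place where an author would need to escape it. The function name `escapes_needed` and the sample sentence are made up for this example; a real check would run over a larger body of actual page text.

```python
def escapes_needed(plain_text: str, token: str) -> int:
    """Count non-overlapping occurrences of `token` in ordinary prose.

    Every occurrence is a spot where the author would have to escape
    the token to stop it from being read as markup.
    """
    return plain_text.count(token)


# Hypothetical sample of normal text containing slashes and dashes.
sample = "Pick read/write or read-only mode - and/or ask on 2004-05-01."

# Compare single common characters with their doubled forms.
for token in ["/", "//", "-", "--"]:
    print(repr(token), escapes_needed(sample, token))
```

In this sample the single slash and single dash collide with normal text, while the doubled forms do not, which is why a common character used in combination tends to need far less escaping than the same character used alone.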
There is a nice article about The Science of Word Recognition on the Microsoft typography page. Their software might suck, but they do have great researchers ;)