Jim Driscoll's Blog

Notes on Technology and the Web

Hosted WordPress Markdown support is badly broken

Just a quick update to let users of Hosted WordPress know that the recently advertised Markdown support is badly broken if you’re trying to post source code. (If you’re actually reading my blog regularly, you may have noticed some odd formatting problems come through – I’ve finally figured out what’s causing them.)

The Problem

Standard Markdown lets you specify code blocks by just indenting by 4 characters, while the WordPress version of Markdown also claims to support fenced blocks, opened with three backticks followed by a language name, for syntax highlighting. For instance:

```java
java code goes here
```

Despite this being very explicitly described in their docs, the parser they’re using still strips out all &, <, and > characters within the source code in either kind of block. Not cool – if I’m testing for greater than or equal to, all my code is silently deleted when I post.

No problem, there’s an alternate syntax that’s WordPress-specific – just use shortcodes. To use shortcodes, you say:

[ code lang="java"]
java code here
[/code]

This prevents the stripping of the < and > characters, which is nice, but there’s one more problem to deal with if you use a product that posts through the WordPress API instead of the website directly: &, <, and > are escaped when read back out by the API. So if any of those characters appear in your source code block, they arrive escaped in your editing program, and if you then post an update, the incorrect escaping becomes visible to your readers.

I use MarsEdit, but the WordPress official iPhone app also reportedly does this. And since these special characters are only converted on one side of the round trip, if you make more than one edit, this conversion happens multiple times, leaving you with code that looks like this:

java code containing  &amp;amp;amp;amp;amp;gt; here
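To make the compounding concrete, here is a minimal Python sketch of the round trip. It assumes the API's behavior is equivalent to one plain HTML escape per read with no matching unescape on write; `round_trip` is a made-up helper for illustration, not anything from WordPress or MarsEdit.

```python
import html


def round_trip(source: str, edits: int) -> str:
    """Escape `source` once per simulated edit, never unescaping in between.

    Assumption: each read through the API behaves like a single HTML escape,
    and the editor posts the escaped text back unchanged.
    """
    text = source
    for _ in range(edits):
        # html.escape turns & into &amp; first, then < and > into entities,
        # so an existing "&gt;" becomes "&amp;gt;" on the next pass.
        text = html.escape(text, quote=False)
    return text


for n in range(1, 4):
    print(n, round_trip("a >= b", n))
# 1 a &gt;= b
# 2 a &amp;gt;= b
# 3 a &amp;amp;gt;= b
```

Under that assumption, the five-times-amp string above is what a single > looks like after six such passes.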

The workaround

There is a workaround, but it is a bit unpleasant. The following code will display correctly, as well as survive roundtripping with the API:

<pre class="brush: java">
List&lt;Integer&gt; l = new ArrayList&lt;&gt;();
</pre>

So, if you’re using the website to post, and never use any other way, then just use the shortcodes. If you are using the API, or might even consider using it, the workflow you will need to adopt to deal with this is as follows:

Create your post in Markdown, as usual, and mark your code blocks with three backticks followed by the language name. Then, right before posting, search and replace each opening fence with <pre class="brush: language"> and each closing fence with </pre>. Then, inside the code, replace &, >, and < with &amp;, &gt;, and &lt; (do the & first, so you don’t re-escape the entities you just created) – which should work around all the bugs in place, and should also continue working once they fix the bugs as well.
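If you'd rather not do that search and replace by hand, the steps can be sketched in a few lines of Python. This is only a sketch under my own assumptions: `fences_to_pre` and `FENCE_MARK` are names I made up, the fences are assumed to start at the beginning of a line and carry a language name, and only the code inside the fences is escaped (the rest of the post is left alone).

```python
import html
import re

# Three backticks, built programmatically so this listing contains no literal fence.
FENCE_MARK = "`" * 3

# Opening fence with a language name, the body (lazily), then a closing fence.
FENCE = re.compile(
    r"^" + FENCE_MARK + r"(\w+)[ \t]*\n(.*?)^" + FENCE_MARK + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def fences_to_pre(markdown: str) -> str:
    """Rewrite fenced code blocks as escaped <pre class="brush: ..."> blocks."""

    def convert(match):
        language, body = match.group(1), match.group(2)
        # html.escape handles & before < and >, which is the order that matters.
        escaped = html.escape(body, quote=False)
        return f'<pre class="brush: {language}">\n{escaped}</pre>'

    return FENCE.sub(convert, markdown)


post = (
    "Some text\n\n"
    + FENCE_MARK + "java\n"
    + "List<Integer> l = new ArrayList<>();\n"
    + FENCE_MARK + "\n"
)
print(fences_to_pre(post))
```

Run it once, right before posting, and keep the original Markdown as your master copy. Running it on already converted text does nothing, since the fences are gone by then.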

I’ll be reporting these issues to WordPress, but to say I’m disappointed would be an understatement.

Update (Jan 11, 2014):

WordPress has partially fixed the issue. Here’s the current situation:

List<Integer> - works, auto converted to the form on the next line

List&lt;Integer&gt; - works, no conversion

Set<List<Integer>> - doesn't work, converted to Set&lt;List&gt;

```java
List<Integer> - works, autoconverted - (added \ to avoid additional rendering)

```

[ \code lang="java"]
List<Integer>  - doesn't work (added \ to avoid rendering here), still has round trip problem
[\/code]

List&lt;Integer&gt; - doesn't work, "tag" stripped

List&amp;lt;Integer&amp;gt; - doesn't work, auto converted, but displayed as though it wasn't!

<pre>
List<Integer> - doesn't work, "tag" stripped
</pre>

<pre>
List&lt;Integer&gt; - works, no conversion
</pre>

List&lt;Integer&gt;  - doesn't work, "tag" stripped

So, if you use > and < in normal text, it works, but they're auto converted on save to their entity representations. Annoying, but bearable - and if that was all it had done out of the gate, I probably wouldn't even have found that annoying.

If you use those chars in a code block via triple-backtick, that works. Yay! But it's converted on save as well.

And if you use the entity in regular text, that works, and isn't auto converted. Additionally, my original fix continues to work (by wrapping in pre and escaping).

But that's the end of the good news. Shortcodes still don't work, and still have the roundtrip problem.

If you use indentation to mark your code blocks, WordPress will see a "tag" and strip it. Same if you use a pre tag and don't escape.

Much worse, though, is if you use a single backtick to show inline code: if you don't escape the code, it strips the "tag", but if you do escape the code, it displays the raw escaped code to the user, so there's no way to do this:

List&lt;Integer&gt;

and have it work.

The real problem, of course, is that WordPress is trying to re-use their HTML safety code for their Markdown support, when they really, really shouldn't have. The logic I think they should be using is this:

First, scan for all tags using their safety code. If an allowable tag is found, then that's left in. If it is not allowable, then escape it as part of rendering. The stopgap solution of escaping gt and lt in the saved code is probably acceptable if that escaping on render is too expensive, but it's hard to see how it could be - they're already doing rendering on display anyway, since they have to convert the rest of the document. But really, there shouldn't be any code, anywhere, that strips tags in the WP Markdown task flow.
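Here's a rough Python sketch of the logic I have in mind. To be clear, this is my guess at an approach, not WordPress's code: the allow-list is hypothetical, `render_safely` is a made-up name, and the tag regex is deliberately naive.

```python
import re

# Hypothetical allow-list; the real WordPress list is not something I know.
ALLOWED_TAGS = {"div", "span", "pre", "code", "em", "strong"}

# Anything that looks like an opening or closing tag.
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def escape_angles(text: str) -> str:
    """Escape only < and >, leaving & alone so existing entities survive."""
    return text.replace("<", "&lt;").replace(">", "&gt;")


def render_safely(text: str) -> str:
    """Keep allowed tags verbatim; escape everything else instead of stripping it."""
    out = []
    pos = 0
    for match in TAG.finditer(text):
        # Text between tags: escape any stray angle brackets.
        out.append(escape_angles(text[pos:match.start()]))
        if match.group(1).lower() in ALLOWED_TAGS:
            out.append(match.group(0))  # allowed tag, left in
        else:
            out.append(escape_angles(match.group(0)))  # not allowed: escape, never drop
        pos = match.end()
    out.append(escape_angles(text[pos:]))
    return "".join(out)


print(render_safely("Set<List<Integer>>"))  # Set&lt;List&lt;Integer&gt;&gt;
print(render_safely("<div>List<Integer></div>"))  # <div>List&lt;Integer&gt;</div>
```

The point of the sketch is the invariant: nothing the author typed is ever removed, so the worst case is a tag that shows up as text rather than code that silently disappears.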

Sigh. But I've been where they are, and I understand that it's probably not as easy as that.

Anyway, nice to know they're working on it.

Please keep in mind that there are certainly render bugs other than the ones I've pointed out here - it's not hard to imagine other combinations of Markdown that tickle their "safe HTML" library.

But at least if you're going to just use the triple back tick code block, that should work.

Further update (Jan 11, 2014):

If you include text with <S> in your code markup (for instance as a Java Generic Type), the entire markup, escaping, and special handling of WordPress will... mangle... the code it saves to the CMS. So, avoid that. Looks like another way to tickle the safe HTML library. (Hey, I did promise there would be more of these bugs.) While the <S> tag isn't escaped in code markup in Markdown (per the spec, it's treated as a valid HTML tag), the complete mangling is certainly a bug - you essentially leave the "fenced" area, and the code is passed through the safe HTML filter.

And don't even ask about using the html shortcode

Another bug - guess what happens if you try:

```html
<nav>
```

Yep, it doesn't work. The nav tag is stripped, though tags like div and span work fine. The workaround of using the shortcode with html works, but roundtripping still has the same problem of doing repeated substitutions. So, there's no way to do this. Frustrating.

A note about the broken formatting

At some point, the formatting of this page was broken by WordPress markdown parsing changes - it used to work. I really can't figure out how to fix it, so I'm leaving it as is. Sorry about the mess.

Written by Jim Driscoll

January 7, 2014 at 6:24 PM

Posted in tools
