Critical WordPress XSS Update
Last Updated: 2014-11-20 19:42:42 UTC
by Johannes Ullrich (Version: 1)
Today, WordPress 4.0.1 was released, which addresses a critical XSS vulnerability (among other vulnerabilities).
The XSS vulnerability deserves a bit more attention, as it is an all too common problem, and often underestimated. First of all, why is XSS "Critical"? It doesn't allow direct data access like SQL Injection, and it doesn't allow code execution on the server. Or does it?
XSS does allow an attacker to modify the HTML of the site. With that, the attacker can easily modify form tags (think about the login form, and changing the URL it submits its data to), or the attacker could use XMLHttpRequest to conduct CSRF-style requests without being limited by the same-origin policy, since the injected script runs inside the site's own origin. The attacker will know what you type, and will be able to change what you type. In short: the attacker is in full control. This is why XSS is critical.
The particular issue here was that WordPress allows some limited HTML tags in comments. This is always a very dangerous undertaking. The WordPress developers did attempt to implement the necessary safeguards. Only certain tags are allowed, and even for these tags, the code checks for unsafe attributes. Sadly, this check wasn't done quite right. Remember that browsers will also parse somewhat malformed HTML just fine.
A better solution would probably have been to use a standard library instead of trying to do this themselves. HTML Purifier is one such library for PHP. Many developers shy away from using it as it is pretty bulky. But it is bulky for a reason: it tries to cover a lot of ground. It not only normalizes HTML and eliminates malformed HTML, but it also provides a rather flexible configuration file. Many "lightweight" alternatives, like the solution WordPress came up with, rely on regular expressions. Regular expressions are typically not the right tool to parse HTML. Too much can go wrong, starting with newlines and ending somewhere around multi-byte characters. In short: don't use regular expressions to parse HTML (or XML), in particular for security.
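To make the newline problem concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the WordPress code and not HTML Purifier). The regex, the allowlist, and the names naive_strip, is_safe_href, AllowlistSanitizer and sanitize are all hypothetical. It compares a regex that strips event-handler attributes with a parser-based allowlist that rebuilds the markup from parsed tags and attributes.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical "lightweight" filter: remove on* event handlers with a regex.
# By default "." does not match a newline, so a value that spans two lines
# is never matched and the handler survives.
NAIVE_EVENT_ATTR = re.compile(r'\son\w+=".*?"')


def naive_strip(markup):
    return NAIVE_EVENT_ATTR.sub("", markup)


# Illustrative allowlist (an assumption): tag name -> attributes permitted on it.
ALLOWED = {"a": {"href", "title"}, "b": set(), "i": set(), "em": set(), "strong": set()}


def is_safe_href(value):
    # Deliberately strict: only absolute http(s) URLs and site-relative paths.
    return value.strip().lower().startswith(("http://", "https://", "/"))


class AllowlistSanitizer(HTMLParser):
    """Re-emits only allowlisted tags and attributes; everything else is dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not is_safe_href(value):
                continue
            kept.append(' {}="{}"'.format(name, escape(value, quote=True)))
        self.out.append("<{}{}>".format(tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        # Text content is always re-escaped on output.
        self.out.append(escape(data, quote=False))


def sanitize(markup):
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


# The attribute value contains a newline, which is still valid JavaScript.
payload = '<a href="/x" onmouseover="alert(\n1)">hi</a>'
print("onmouseover" in naive_strip(payload))  # True: the regex missed it
print(sanitize(payload))                      # <a href="/x">hi</a>
```

The parser-based version never tries to recognize "bad" input. It emits only what is on the allowlist, so unexpected quoting, line breaks, or unknown attributes are dropped by default. A real deployment would still be better served by a maintained library than by a sketch like this one.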