WASHTML

Washtml is a PHP script that lets only safe HTML through, for web applications that have to display untrusted data. It washes your HTML of dangerous content such as JavaScript, references to unchecked remote files, and forms that let anyone remotely control web applications in the same domain. The script is short enough to be easily reviewed (around 100 lines).

/* OVERVIEW:
 *
 * Washtml takes an untrusted HTML string and returns a safe HTML string.
 *
 * SYNOPSIS:
 *
 * washtml($html, $config, $full);
 * It returns a sanitized copy of the $html parameter, without the html and head tags.
 * $html is a string containing the html code to wash.
 * $config is an array containing options:
 *   $config['allow_remote'] is a boolean that allows links to remote images.
 *   $config['cid_map'] is an array that maps cid URLs to the URLs that replace them.
 *   $config['charset'] is a string containing the charset of the HTML document, used if the document does not define one.
 * $full is a reference to a boolean that is set to true if no remote images are removed. (FE: show remote images link)
 *
 * INTERNALS:
 *
 * Only tags and attributes listed in the globals $html_elements and
 * $html_attributes are kept. Inline styles are also filtered: all style
 * identifiers matching /[a-z\-]/i are allowed. Values matching colors, sizes,
 * or /[a-z\-]/i are kept, as are safe URLs if allowed and cid URLs if mapped.
 * 
 * BUGS: It *MUST* be safe!
 *  - Check regexp
 *  - urlencode URLs instead of using htmlspecialchars
 *  - Check if the first byte of a 3-byte UTF-8 character can eat '">'
 *  - Update PCRE: CVE-2007-1659 - CVE-2007-1660 - CVE-2007-1661 - CVE-2007-1662 
 *                 CVE-2007-4766 - CVE-2007-4767 - CVE-2007-4768  
 *    http://lists.debian.org/debian-security-announce/debian-security-announce-2007/msg00177.html 
 *  - ...
 *
 * MISSING:
 *  - relative links; they can be implemented by prefixing an absolute path.
 *    Ask me if you need it...
 *  - ...
 *
 * Don't be a fool:
 *  - Don't alter data on a GET: '<img src="http://yourhost/mail?action=delete&uid=3267" />'
 *  - ...
 */
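
The inline style filtering described under INTERNALS can be sketched roughly as follows. This is a minimal illustration and not the code shipped in washtml: the function name sketch_wash_style, the exact regular expressions, the choice to drop a whole declaration when one of its values is rejected, and the example URLs are all assumptions. It also assumes the values in $config['cid_map'] are trusted, and that the caller escapes the result before writing it into an attribute.

```php
<?php
// Hypothetical sketch only: names, patterns and output format are assumptions.
function sketch_wash_style($style, array $config, &$full)
{
    // Colors (#rgb, #rrggbb), sizes (10, 1.5em, 80%), or plain keywords.
    $safe_token = '/^(#[0-9a-f]{3}|#[0-9a-f]{6}|[0-9]+(\.[0-9]+)?(px|pt|em|%)?|[a-z\-]+)$/i';
    $kept = array();

    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // not a "name: value" pair
        }
        $name = trim($parts[0]);
        if (!preg_match('/^[a-z\-]+$/i', $name)) {
            continue; // style identifiers: letters and hyphens only
        }

        $tokens = array();
        foreach (preg_split('/\s+/', trim($parts[1]), -1, PREG_SPLIT_NO_EMPTY) as $token) {
            if (preg_match($safe_token, $token)) {
                $tokens[] = $token;
            } elseif (preg_match('/^url\(([^()\s\'"]+)\)$/i', $token, $m)) {
                $url = $m[1];
                if (isset($config['cid_map'][$url])) {
                    // Mapped cid URL: substitute the replacement URL.
                    $tokens[] = 'url(' . $config['cid_map'][$url] . ')';
                } elseif (preg_match('/^https?:\/\//i', $url)) {
                    if (!empty($config['allow_remote'])) {
                        $tokens[] = $token;
                    } else {
                        $full = false; // a remote reference was removed
                        $tokens = null;
                        break;
                    }
                } else {
                    $tokens = null; // unknown scheme, e.g. javascript:
                    break;
                }
            } else {
                $tokens = null; // anything else (expression(...), quotes, ...)
                break;
            }
        }

        if ($tokens) {
            $kept[] = $name . ': ' . implode(' ', $tokens);
        }
    }

    return implode('; ', $kept);
}

// Usage: the caller initialises $full to true before washing.
$full = true;
$config = array(
    'allow_remote' => false,
    'cid_map' => array('cid:img1' => 'http://localhost/part1.png'),
);
echo sketch_wash_style(
    'color: #f00; width: 10px; background: url(http://example.com/a.png); '
    . 'behavior: url(javascript:x); background-image: url(cid:img1)',
    $config,
    $full
), "\n";
// prints: color: #f00; width: 10px; background-image: url(http://localhost/part1.png)
```

After the call $full is false, because one remote image reference was dropped; a front end could use that to offer a link that shows remote images.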
      

WORK IN PROGRESS, PLEASE REVIEW.


Thank you.
Frederic Motte
Liazo.fr, high availability & security for applications, systems and networks