Implemented detection and cleaning of malformed UTF-8 sequences and non-SGML characters
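
For illustration only, a check of this kind might look like the Python sketch below. It is not the code this entry refers to: the function name `clean`, the U+FFFD replacement, and the reading of "non-SGML" as the code points left unused by the HTML 4 SGML declaration (C0 controls other than tab, LF and CR, plus DEL and the C1 range) are all assumptions.

```python
import re
from typing import Tuple

# Assumed definition of "non-SGML": C0 controls except tab/LF/CR,
# DEL, and the C1 control range (U+007F to U+009F).
NON_SGML = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")


def clean(raw: bytes, replacement: str = "\ufffd") -> Tuple[str, bool]:
    """Hypothetical helper: return (cleaned text, True if anything was fixed)."""
    try:
        # Strict decoding raises on any malformed UTF-8 sequence.
        text = raw.decode("utf-8")
        malformed = False
    except UnicodeDecodeError:
        # Fall back to substituting U+FFFD for each undecodable sequence.
        text = raw.decode("utf-8", errors="replace")
        malformed = True
    # Replace every non-SGML character and count how many were found.
    cleaned, hits = NON_SGML.subn(replacement, text)
    return cleaned, malformed or hits > 0
```

Python's strict UTF-8 decoder also rejects encoded surrogates, so in this sketch those count as malformed input and need no separate pattern.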