
Double byte and PHP (Unicode problems)


A while back I ran into a problem with PHP: how can I read in files that contain double-byte (Unicode) characters and display them in a form that any browser can read? Many string functions assume one byte per character, so unless the encoding is handled explicitly the output ends up as nonsense instead of the correct text.

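This function finds the Unicode characters in UTF-8 text (those matched by its pattern, U+0090 through U+3000) and returns the text with each of them replaced by an HTML entity. None of the core PHP functions do this on their own; htmlentities(), for example, only converts characters that have a named entity.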

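function removeuni($content){
    // Match each character from U+0090 through U+3000. The /u modifier makes
    // PCRE treat both the pattern and the subject as UTF-8.
    preg_match_all('/[\x{90}-\x{3000}]/u', $content, $matches);

    // array_unique() avoids repeating the replacement for duplicate characters.
    foreach (array_unique($matches[0]) as $match) {
        // Note: the "HTML-ENTITIES" encoding is deprecated as of PHP 8.2.
        $content = str_replace($match, mb_convert_encoding($match, "HTML-ENTITIES", "UTF-8"), $content);
    }

    return $content;
}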
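
For comparison, here is a minimal sketch of the same idea that covers every non-ASCII character rather than a fixed range, and that avoids the "HTML-ENTITIES" encoding. It is not from the original function: the name nonAsciiToEntities is hypothetical, and the sketch assumes PHP 7.2 or later with the mbstring extension (for mb_ord()) and UTF-8 input. It always emits numeric entities such as &#233; rather than named ones.

```php
<?php
// Hypothetical helper (an illustration, not the original removeuni()):
// converts every non-ASCII character in a UTF-8 string to a numeric entity.
function nonAsciiToEntities(string $content): string
{
    // The /u modifier makes PCRE treat the pattern and subject as UTF-8,
    // so each match is one whole code point rather than a single byte.
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        static function (array $m): string {
            // mb_ord() returns the Unicode code point of the character.
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $content
    );

    // preg_replace_callback() returns null on failure (for example, when the
    // input is not valid UTF-8); fall back to the untouched input then.
    return $result ?? $content;
}

// Usage: "Café 日本" written with escapes so this file itself stays ASCII.
echo nonAsciiToEntities("Caf\u{E9} \u{65E5}\u{672C}"), "\n";
// prints: Caf&#233; &#26085;&#26412;
```

Doing the replacement in a single preg_replace_callback() pass also avoids calling str_replace() once per matched character, which may matter for large files.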