htmlentities() is a well-known PHP function used to convert strings into an HTML-safe representation. PHP 5.4 changed its default behaviour, and the current documentation page can mislead people working with 5.3 or below.
A classic escaping
One of my developers escaped a UTF-8 string with htmlentities(). He expected something like “I like chocolate éclairs.” and got “I like chocolate Ã©clairs.” instead.
You may recognise the typical characters - Ã© - produced by an ISO-8859-1 to UTF-8 conversion applied to a source that is already UTF-8.
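To make the symptom concrete, here is a minimal sketch that reproduces it on any PHP version. It assumes the input is UTF-8 and passes ISO-8859-1 explicitly to mimic the old default. The variable names are only illustrative, and "é" is written as its two UTF-8 bytes so the result does not depend on how the file is saved.

```php
<?php
// "é" spelled out as its two UTF-8 bytes (0xC3 0xA9).
$utf8 = "I like chocolate \xC3\xA9clairs.";

// Telling htmlentities() the input is ISO-8859-1 makes it treat each
// byte as a separate character: 0xC3 is "Ã" and 0xA9 is "©".
$wrong = htmlentities($utf8, ENT_COMPAT, 'ISO-8859-1');

echo $wrong, "\n"; // I like chocolate &Atilde;&copy;clairs.
```

Rendered in a browser, those two entities display as "Ã©", which is exactly the garbled output described above.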
When you look at the function description in the documentation, it says:
My developer told me that, according to the documentation page, the function expects a UTF-8 string by default, which is what he provided, so everything should have been fine. Unfortunately, it is more complicated than that.
PHP changed with the 5.4 release, including its documentation.
If you scroll down a few pages, you’ll see that PHP 5.4 brought quite a lot of changes, including new constants and different default settings.
In particular, the default encoding is ISO-8859-1 for PHP versions prior to 5.4.0 and UTF-8 from 5.4.0 onwards. Unaware of that, the developer - who was working with PHP 5.3 - stopped reading at the function description and did not specify any encoding (it has always been an optional parameter).
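One way to avoid depending on the version-specific default is to always pass the encoding yourself. The sketch below assumes your strings are UTF-8. escape_html() is a hypothetical helper name, not part of PHP, and the choice of ENT_QUOTES is an assumption you may want to adapt.

```php
<?php
// Hypothetical helper: always states the encoding explicitly instead of
// relying on the default, which differs before and after PHP 5.4.
function escape_html($string, $encoding = 'UTF-8')
{
    return htmlentities($string, ENT_QUOTES, $encoding);
}

// "é" spelled out as its two UTF-8 bytes (0xC3 0xA9).
$safe = escape_html("I like chocolate \xC3\xA9clairs.");

echo $safe, "\n"; // I like chocolate &eacute;clairs.
```

With the encoding stated, the same call should give the same result on 5.3 and on later versions.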
Here is what the developer and I ended up with after reading the documentation more carefully.
- htmlentities() converts strings into an HTML-safe representation, but its behaviour depends on the version of PHP
- 5.4 changes a lot of things and this affects the documentation: reading the description is not sufficient. Read everything. Keeping a local copy of the documentation that matches your PHP version is recommended.
- Mind that htmlentities() does not recognise all Unicode symbols: only characters that have a named HTML entity are converted, the others are left as they are.
- The html_entity_decode() function is the opposite of htmlentities(). Yep, with underscores in the middle of the function name.
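The last two points can be illustrated with a short sketch, again assuming UTF-8 input and an explicit encoding. The sample string is made up for the example: "é" has a named entity, while the snowman symbol (U+2603) has none in the default HTML 4.01 entity table and is left untouched.

```php
<?php
// "café" followed by a snowman, both written as raw UTF-8 bytes.
$original = "caf\xC3\xA9 \xE2\x98\x83";

// Only "é" is converted; the snowman bytes pass through unchanged.
$escaped = htmlentities($original, ENT_QUOTES, 'UTF-8');

// html_entity_decode() reverses the conversion, given the same encoding.
$decoded = html_entity_decode($escaped, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
echo ($decoded === $original) ? "round trip ok\n" : "round trip failed\n";
```

Note that the encoding argument matters for decoding too, for the same reason it matters for escaping.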