Background
I was having some problems with double escaping when creating XML with the PHP 5 SimpleXML library. It turns out I had stumbled upon (IMHO at least) an inconsistency in how SimpleXML handles escaping.
When generating XML, you need to decide how the reserved characters < > & " ' are to be escaped (although the apostrophe is an interesting case by itself).
My problem was that I did not realise that, depending on how you use SimpleXML, the library will either escape your text automatically or require you to do the escaping yourself. In the former case, if you have already done the escaping, you end up with double escaping. The real issue is that this behavior does not seem to be documented at all. That, coupled with the fact that there have been bugs around this module in PHP versions prior to 5.2.6 (#4478), makes the whole issue utterly confusing. I hope that documenting what I have found will help the traveler who follows.
Much of the information that follows came from reading this bug report: http://bugs.php.net/bug.php?id=45253
SimpleXML dual API
The first thing to understand is that SimpleXML offers two ways to create text nodes and attributes. I am going to call them the magic methods (property and array-style assignment) and the non-magic methods (addChild() and addAttribute()).
Creating a text node
Non-magic:
$sxml = new SimpleXMLElement('<test></test>');
$sxml->addChild('child1', 'One & Two');
Magic:
$sxml = new SimpleXMLElement('<test></test>');
$sxml->child3 = 'One & Two';
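To make the difference concrete, here is a minimal sketch, assuming PHP 5.2.6 or later with the SimpleXML extension enabled. Escaping the addChild() input with htmlspecialchars() is my choice for illustration, and the extra child4 node is hypothetical, added only to show what happens when already-escaped text goes through the magic form.

```php
<?php
// Sketch: compare how the two ways of creating a text node treat "&".
// child4 is an illustrative extra node, not part of the original example.

function textNodeDemo()
{
    $sxml = new SimpleXMLElement('<test></test>');

    // Non-magic: addChild() treats "&" as the start of an entity
    // reference, so the text must be escaped first.
    $sxml->addChild('child1', htmlspecialchars('One & Two', ENT_QUOTES));

    // Magic: property assignment escapes "&" for you, so pass raw text.
    $sxml->child3 = 'One & Two';

    // Magic with pre-escaped text: this is where double escaping happens.
    $sxml->child4 = htmlspecialchars('One & Two', ENT_QUOTES);

    return $sxml->asXML();
}

echo textNodeDemo();
```

With this input, child1 and child3 should both come out as One &amp; Two, while child4 comes out as One &amp;amp; Two.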
Creating an attribute
Non-magic:
$sxml = new SimpleXMLElement('<test></test>');
$sxml->addAttribute('child1', 'One & Two');
Magic:
$sxml = new SimpleXMLElement('<test></test>');
$sxml['child3'] = 'One & Two';
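The same kind of sketch for attributes (same assumptions; child5 is a hypothetical extra attribute) shows both forms escaping raw input, and what happens if you escape the value yourself first:

```php
<?php
// Sketch: both ways of creating an attribute escape "&" automatically.
// child5 is an illustrative extra attribute, not part of the original example.

function attributeDemo()
{
    $sxml = new SimpleXMLElement('<test></test>');

    // Non-magic: addAttribute() escapes for you, so pass raw text.
    $sxml->addAttribute('child1', 'One & Two');

    // Magic: array-style assignment also escapes for you.
    $sxml['child3'] = 'One & Two';

    // Escaping first leads to double escaping.
    $sxml->addAttribute('child5', htmlspecialchars('One & Two', ENT_QUOTES));

    return $sxml->asXML();
}

echo attributeDemo();
```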
The dual API, although cool, can lead to confusion, because the automatic escaping rules differ not only between the magic and non-magic forms but also depending on whether you are creating text nodes or attributes!
Finding
For attributes, never escape the values yourself (whether you use the magic or non-magic methods); SimpleXML will do it for you. For text nodes, don't escape if you are using the magic methods, but if you set the value directly through addChild(), you have to escape it yourself, because addChild() treats & as the start of an entity reference. My own approach is to always use the magic methods (for both attributes and text nodes) and let SimpleXML handle the escaping. The test script I used is here. Tested in PHP 5.2.6.
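The "always use the magic methods" approach could be wrapped in a small helper. This is only a sketch: setChild() is a hypothetical name, not part of SimpleXML, and it assumes child element names are unique, since assigning to an existing child through the magic form overwrites it rather than adding a sibling.

```php
<?php
// Sketch of the "always use the magic methods" approach: callers pass raw,
// unescaped strings and SimpleXML does all of the escaping.
// setChild() is a hypothetical helper. Assumption: child names are unique,
// because assigning to an existing child overwrites it.

function setChild(SimpleXMLElement $parent, $name, $text, array $attributes = array())
{
    // Magic text node: raw text in, escaped by SimpleXML on output.
    $parent->$name = $text;

    // Magic attributes on that child: also escaped by SimpleXML.
    $child = $parent->$name;
    foreach ($attributes as $attrName => $attrValue) {
        $child[$attrName] = $attrValue;
    }
    return $child;
}

$sxml = new SimpleXMLElement('<test></test>');
setChild($sxml, 'title', 'Tom & Jerry <classic>', array('lang' => 'en & fr'));
echo $sxml->asXML();
```

Because callers only ever pass raw strings, there is no point where text could be escaped twice.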
Apostrophe
It turns out that the apostrophe is not escaped automatically. The only explanation I can find is here, which points to libxml as the reason.
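A small sketch (the note element and title attribute are illustrative names) shows the behavior: the apostrophe comes through untouched in both a text node and an attribute, while a double quote inside the attribute value is escaped as &quot;. The output is still well-formed, because libxml2 wraps attribute values in double quotes, and an apostrophe only needs escaping inside an apostrophe-delimited attribute value.

```php
<?php
// Sketch: the apostrophe is left as-is in text nodes and attributes,
// while a double quote in an attribute value is escaped as &quot;.
// "note" and "title" are illustrative names only.

$sxml = new SimpleXMLElement('<test></test>');

// Magic text node containing an apostrophe.
$sxml->note = "It's fine";

// Magic attribute containing both an apostrophe and a double quote.
$sxml['title'] = "It's \"quoted\"";

echo $sxml->asXML();
```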