Tuesday, August 12, 2008

PHP and XSL Part 1

This is the first entry in a three-part series on the advantages of using XML and XSL to generate a web page. Part one focuses on the downsides of echoing HTML through PHP and how an XML/XSL driven system overcomes these shortcomings. Part two focuses on how PHP uses DOMDocument and the XSLTProcessor to generate XML documents and convert the XML into XHTML with an XSL stylesheet. Part three introduces the DOMi object, a purpose-built class designed to simplify and speed up the process of building XSL driven websites.

Part 1 - Discouraging Echo

#1 - HTML echo loses points for unwieldy whitespacing, unreliable highlighting, and bouncing back and forth between C style code and DOM style code.

The modern convention for PHP based websites is to use the echo statement to send output to the display. This is problematic for many reasons, yet only a handful of sites use a system other than the tried and true echo system. One of the main drawbacks to using echo is the need to jump back and forth between two vastly different syntaxes within the same document.

In a single document, we frequently switch between PHP and HTML. Occasionally, the two are nested within one another (for instance, in an HTML list created by a PHP foreach). We can echo out the HTML as well, but that causes whitespacing issues, because awkwardly formed strings have to be passed to echo.
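As a rough illustration of the two styles described above, here is a minimal sketch that builds the same list both ways. The $tasks array and its contents are made up for the example, and output buffering is used only so the two results can be compared.

```php
<?php
// Hypothetical data, for illustration only.
$tasks = ['Write report', 'Call client', 'Fix printer'];

// Style 1: dropping in and out of PHP inside the HTML.
ob_start();
?>
<ul>
<?php foreach ($tasks as $task): ?>
    <li><?php echo htmlspecialchars($task); ?></li>
<?php endforeach; ?>
</ul>
<?php
$mixed = ob_get_clean();

// Style 2: echoing the HTML as strings. The indentation and newlines
// now have to be written by hand inside each string.
ob_start();
echo "<ul>\n";
foreach ($tasks as $task) {
    echo "    <li>" . htmlspecialchars($task) . "</li>\n";
}
echo "</ul>\n";
$echoed = ob_get_clean();
```

Both versions produce a three-item list, but neither reads cleanly as PHP or as HTML, which is the point being made here.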

In most IDEs, we also lose our syntax highlighting on the HTML inside the echo. In addition to the whitespacing, HTML uses DOM style formatting (nested nodes, attributes, etc.), while PHP uses C style formatting (curly braces indicate scope changes, etc.), and it is occasionally hard to reconcile the two while keeping indentation intact.

#2 - HTML echo loses points for requiring derivative data and clumsiness in scanning a large, deeply nested data tree.

The next drawback of HTML echo lies in derivative data. If you had a multidimensional array, where each element contained an employee task, and each task was given a category, there would be no direct way to get a count of tasks by category. You would need to run a foreach loop across the array and keep a count of how many tasks fell into each category.
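The counting loop described above might look something like the following sketch. The array layout and the field names (employee, task, category) are assumptions made for the example, not taken from a real project.

```php
<?php
// Hypothetical task list; the field names are assumptions.
$tasks = [
    ['employee' => 'Ben',   'task' => 'Call client', 'category' => 'sales'],
    ['employee' => 'Logan', 'task' => 'Audit links', 'category' => 'seo'],
    ['employee' => 'Logan', 'task' => 'Fix titles',  'category' => 'seo'],
];

// Walk the list once and tally how many tasks fall into each category.
$countByCategory = [];
foreach ($tasks as $task) {
    $category = $task['category'];
    if (!isset($countByCategory[$category])) {
        $countByCategory[$category] = 0;
    }
    $countByCategory[$category]++;
}
// $countByCategory is now ['sales' => 1, 'seo' => 2]
```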

The main problem with this foreach loop is that a count of tasks by category is derivative data from a list of all tasks. Rather than each piece of data being unique and necessary, you now have muddied up the data with something that can be figured out by looking at the rest of the data. If we had a system where extracting deeply nested data were incredibly easy, then we could have PHP only pull unique, essential data and allow the display tools to derive anything additional that is needed.
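To give a feel for what "letting the display tools derive it" could mean, here is a small sketch using PHP's DOMDocument and DOMXPath, where the count is computed by an XPath expression instead of being stored alongside the data. The XML layout and the category attribute are assumptions made for this example, and the same count() expression could equally sit inside an XSL stylesheet.

```php
<?php
// Hypothetical XML holding only the essential data: one node per task.
$xml = <<<XML
<tasks>
    <task category="sales">Call client</task>
    <task category="seo">Audit links</task>
    <task category="seo">Fix titles</task>
</tasks>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// The XPath engine derives the count; evaluate() returns it as a float.
$seoCount = (int) $xpath->evaluate('count(/tasks/task[@category="seo"])');
```

Here PHP supplies only the list of tasks, and the count is worked out at the point where it is needed.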

A related issue is PHP's difficulty in extracting information from a variety of places within an array in a single call. One project I had some time ago involved a multidimensional array containing employee groups. Each group had either subgroups or an employee list, and the depth of an employee was variable, depending on how many subgroups an employee group had. Some employees were fairly shallow, for instance /sales/Ben, whereas some were fairly deep, such as /operations/it/seo/blue/Logan. Because of this, PHP couldn't get me a list of all employees very easily: a single foreach loop only scans one level at a time, so walking the whole tree takes recursion or a stack of nested loops.
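A sketch of the kind of recursive helper this situation calls for is below. The shape of the $groups array is a guess at the structure described above (group names as keys, employee names as plain string values), and listEmployees is a hypothetical function written for this example.

```php
<?php
// Hypothetical structure: a group holds either subgroups or employee names.
$groups = [
    'sales' => ['Ben'],
    'operations' => [
        'it' => [
            'seo' => [
                'blue' => ['Logan'],
            ],
        ],
    ],
];

// Recursively collect every employee, at any depth, as a path string.
function listEmployees(array $node, string $path = ''): array
{
    $found = [];
    foreach ($node as $key => $value) {
        if (is_array($value)) {
            // A subgroup: descend, extending the path with the group name.
            $found = array_merge($found, listEmployees($value, $path . '/' . $key));
        } else {
            // A leaf: an employee name.
            $found[] = $path . '/' . $value;
        }
    }
    return $found;
}

$employees = listEmployees($groups);
// $employees is ['/sales/Ben', '/operations/it/seo/blue/Logan']
```

This works, but it is a dozen lines of plumbing for a question ("give me every employee") that a path-based query language can answer in a single expression.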

#3 - HTML echo loses points for forcing data and display to be bound together.

The final point I would like to make in this first entry is on how difficult it can be to alter the design of a page that is using an HTML echo system. When using HTML echo, you are often linking the data and the display in hard-to-separate ways. While there are some systems that do a fairly good job of keeping data processing away from the display, most of them are not complete and will still leave traces of data touching the display.

When the data and display are intertwined, it becomes difficult to create a truly headless system - one where the display can be changed at will without ever touching the data. A headless system has many advantages, such as ease of converting to an RSS feed, an API or vast site redesigns. When you have to alter the data, even just a bit, to alter the display, you lose mobility and flexibility.

In brief, HTML echo systems were good for a while, but we have moved beyond that. Just as CSS overtook tables when designing a site, the time is right for XSL to overtake HTML echo. HTML echo is ugly to code, requires derivative data, makes it difficult to extract deeply nested data, and makes it difficult to change the design.

Part 2 will focus on using DOMDocument and the XSLTProcessor to alleviate these problems. I will show you how you can have PHP build a pure data tree and use XSL to convert that data tree into a display.

Part 3 will focus on the DOMi object, which is available at sourceforge.net, and how that smooths out the kinks in DOMDocument and integrates XSLTProcessor and DOMXpath into a single, powerful object.
