Saturday, September 20, 2008

Promoting and Documenting an Open Source Project

As any of my readers will know, I was part of a team to build the DOMi project, which is an open source PHP object designed for XSL driven websites. The project has had less activity than we would like, and as it is the first project like this for any of us, we are at a loss on how to promote this project.

The three of us genuinely feel that this paradigm is the way of future web development, and all of us use it exclusively in our projects. So there's no lack of love or passion or dedication for this project; we just don't know how to get others to give it a shot and see how well it works.

I recently did some documentation for the project. I hope that helps move it along. We should've done this from the start. It's hard to just give out an object like this and assume everyone can figure out how to use it, especially since some of the internals are doing really odd things. It has some good Javadoc style markup in it, though, which should've helped.

Documentation is a bore, though. A necessary evil that must be endured to get a project on its feet. I'll be thrilled once this is out there and being used for the next generation of web development. HTML echo needs to be phased out, just like table based designs are being phased out. There are no two ways about it - XSL driven systems are superior to HTML echo driven systems.

Why are RSS feeds so dirty?

At work, we run a web framework that incorporates some RSS feeds for automatic content generation on one of the modules and a weather display that can be enabled on the main skin. My boss (who wrote the original version over a year ago) and I (I took the project over about 4 months ago) tried in vain to get a fully W3C compliant XSL / MVC driven system. We ended up giving up because we could not get the XSL to generate valid code. I recently looked into it again, now that I understand the XSLTProcessor a lot better than either of us did a year ago. I managed to correct the output settings to get all of our code valid, but now I am struggling against the horrors of RSS feeds.

Why do so many feeds use noncompliant code?

I am now working on an extension to DOMi that is used for RSS, with the most central tenet of its design being cleaned up code. My short list of tasks is to get all attributes placed in double quotes, lowercase all attribute and tag names, add required attributes to certain nodes, and convert deprecated tags to more modern ones.

It'll be regex heavy, but I'm quite handy with regex (rightfully owning the xkcd regex shirt), so it shouldn't be too difficult.
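To give a rough idea of the first two tasks on that list, here is a minimal sketch. The function name tidyFragment is hypothetical and is not part of DOMi or the planned extension. It only lowercases tag and attribute names and double-quotes attribute values, and it assumes no attribute value contains a ">" character.

```php
<?php
// Hypothetical helper for illustration only; not part of DOMi.
// Lowercases tag and attribute names and double-quotes attribute values.
function tidyFragment(string $markup): string
{
    return preg_replace_callback(
        '/<(\/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>/',
        function (array $tag): string {
            // Rewrite each name=value pair found inside the tag.
            $attributes = preg_replace_callback(
                '/([A-Za-z_:][\w:.-]*)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s"\'>]+)/',
                function (array $attr): string {
                    $value = $attr[2];
                    // Drop the surrounding quotes if the value already had them.
                    if ($value[0] === '"' || $value[0] === "'") {
                        $value = substr($value, 1, -1);
                    }
                    return strtolower($attr[1]) . '="' . str_replace('"', '&quot;', $value) . '"';
                },
                $tag[3]
            );
            return '<' . $tag[1] . strtolower($tag[2]) . $attributes . '>';
        },
        $markup
    );
}

echo tidyFragment("<A HREF=foo.html TITLE='x'>Link</A>");
```

Adding required attributes and converting deprecated tags would need per-tag rules on top of this, so they are left out of the sketch.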

Once it's done, it'll go onto sourceforge with the rest of the DOMi stuff.

Monday, August 18, 2008

I cannot wait for Cuil to crash and burn

In the past ten years, Google has risen from a small startup into one of the most powerful companies in existence. Google knows you better than you know yourself. Google would make a better president than Obama, McCain, Paul or Barr. Google is the new overlord, and I'm happy with that.

So what happens when some minor players who popped up halfway through Google's rise branch off and create a search engine, with the mistaken belief that the dot com bubble never burst?

Cuil is what happens.

Cuil is when a few ex-Google employees start a new company, constantly preach themselves as ex-Google employees, sucker a few investors to drop $33m, and then mismanage themselves into a search engine that has all the usefulness and relevance of Infoseek circa 2000.

Cuil seems to be stuck in the past. The site layout is atrocious, the relevancy is terrible, it has no extra search features (images, maps, news, etc.), and it blows money like it's 1999. Cuil pays for lavish extravagance for all employees, including weekly barbecues, catered lunch daily, gym memberships, on-site doctor, and fridges packed with food and snacks. In principle, this is fantastic; in practice, this is not a good method to run a business, especially when the product is as poor as Cuil.

Cuil is run by thieves. Not in the traditional sense, but in the sense that they were not responsible for Google's success, but they happily took money for proclaiming themselves as ex-Google employees building a Google killer. Cuil is built by a group who backstabbed Google, and the only good thing is that the product is so miserably bad, I know their treachery won't affect Google.

Thursday, August 14, 2008

PHP and XSL Part 3

This is the third entry in a three part series on the advantages of using XML and XSL to generate a web page. Part one will focus on the downsides of echoing HTML through PHP and how an XML/XSL driven system overcomes these shortcomings. Part two will focus on how PHP uses DOMDocument and the XSLTProcessor to generate XML documents and convert the XML into XHTML with an XSL stylesheet. Part three will introduce the DOMi object, a purpose built class that is designed to simplify and speed up the process of building XSL driven websites.

Part 3 - Introducing DOMi

In Part 1 and Part 2 of this article, I explained why the traditional method of HTML echo is on the brink of obsolescence, and why the DOMDocument object is the next logical step in web development. However, if you have used DOMDocument and XSLTProcessor before, you know one major downfall to using those systems - monotonous, tedious tasks to build even a simple object, followed by a hard to remember series of commands to render the XML through the XSLTProcessor.

In Part 3, I will introduce an open source tool that is my bread and butter for XSL driven websites - the DOMi object. DOMi was created purely for XSL driven websites, and as such houses several very useful tools for speeding up the process and bridging the gap between PHP data structures, such as arrays, and an XML tree.

How is DOMi used?

DOMi is used in the same way that DOMDocument is used. An instance is created and manipulated. In fact, DOMi has every single function that DOMDocument has. DOMi contains a DOMDocument, XSLTProcessor and DOMXpath, transparently accessed through DOMi. In essence, the following two lines of code are the same...

$Domi->createElement('element');
$Domi->Dom->createElement('element');


The first line attempts to invoke createElement as though it were a member method of $Domi, whereas the second line accesses the $Domi->Dom member property, which is a DOMDocument, and invokes its createElement method. DOMi transparently passes the parameters to the DOMDocument object, and will even return its results. This means that anything written for DOMDocument can be transferred to DOMi with absolutely no issues.
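For readers wondering how this kind of transparent forwarding can work in PHP, one common way is the __call magic method. The class below is a hypothetical stand-in written for this article, not DOMi's actual source, and DOMi may well do it differently.

```php
<?php
// Hypothetical wrapper for illustration; this is not DOMi's source code.
class DomWrapper
{
    /** @var DOMDocument The wrapped document, exposed like $Domi->Dom. */
    public $Dom;

    public function __construct(string $rootName)
    {
        $this->Dom = new DOMDocument('1.0', 'UTF-8');
        $this->Dom->appendChild($this->Dom->createElement($rootName));
    }

    // Any method not defined on the wrapper is forwarded to the inner
    // DOMDocument, and its return value is handed back to the caller.
    public function __call(string $name, array $arguments)
    {
        return call_user_func_array([$this->Dom, $name], $arguments);
    }
}

$Wrapper = new DomWrapper('root');
$Node = $Wrapper->createElement('element', 'text'); // forwarded call
$Wrapper->Dom->documentElement->appendChild($Node); // direct access
$Xml = $Wrapper->saveXML($Wrapper->Dom->documentElement);
echo $Xml;
```

A real wrapper would also have to decide which of its inner objects (DOMDocument, XSLTProcessor or DOMXpath) should receive a given call; the sketch only handles one.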

How does DOMi make it easier to build an XML tree?

PHP wasn't designed to manipulate the DOMDocument. It was designed to manipulate arrays and strings and integers. So when you try to use DOMDocument's createElement, appendChild and setAttribute methods to build an XML tree, you will quickly find the process to be extremely time consuming. Building even a simple tree is a labor of many lines and more time than should be required to simply convert one data structure into another.

Enter DOMi::AttachToXml - a method that receives a large variety of data types and converts them to XML and attaches them to the DOMDocument stored within DOMi. AttachToXml accepts two parameters, a data structure and a prefix. The prefix determines what to name the node that is being built, and the data structure is then built into that node, and the node is attached to the root of the DOMDocument. In addition, if you don't want the node attached at the base, you can pass a third, optional parameter that is a DOMNode within the DOMDocument, and the data will be attached as a child node to that DOMNode.

With this system, you can pass an arbitrarily deep multidimensional array to AttachToXml and have a perfect XML tree, ready for the XSLTProcessor. Also - by naming keys in a certain fashion, you can even set up your attributes just as easily. With this system, a proper database abstraction layer and a well set up database, you can convert a store's database into an XML tree ready for display in less than 5 lines of code.
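To make the idea concrete, here is a minimal sketch of an array to XML conversion written against plain DOMDocument. The function attachArray is hypothetical and is not how AttachToXml is implemented; in particular it assumes string keys are valid element names and it ignores the attribute naming convention mentioned above.

```php
<?php
// Hypothetical illustration of the idea behind AttachToXml; not DOMi's code.
function attachArray(DOMDocument $dom, DOMNode $parent, array $data, string $prefix): void
{
    foreach ($data as $key => $value) {
        // Numeric keys cannot be element names, so fall back to the prefix.
        $name = is_int($key) ? $prefix : $key;
        $node = $parent->appendChild($dom->createElement($name));

        if (is_array($value)) {
            // Nested arrays become nested nodes.
            attachArray($dom, $node, $value, $prefix);
        } else {
            // createTextNode escapes special characters such as & and <.
            $node->appendChild($dom->createTextNode((string) $value));
        }
    }
}

$dom = new DOMDocument('1.0', 'UTF-8');
$root = $dom->appendChild($dom->createElement('root'));

$employees = [
    ['name' => 'Ben', 'group' => 'sales'],
    ['name' => 'Logan', 'group' => 'seo'],
];
attachArray($dom, $root, $employees, 'employee');

$xml = $dom->saveXML($root);
echo $xml;
```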

How does DOMi render pages?

When DOMi was built for XSL driven websites, there was an obvious need for rendering support. The DOMi::Render method is designed to display a page in one of three formats - HTML, XML or View. The HTML output is the final rendered output, combining the XML and XSL into a web page. The XML output is a debugging mode that shows what the current DOMDocument XML tree looks like - which is essential when you are designing your XSL stylesheets. The View output is a system to convert from one DOMDocument into another and display it on screen as XML. It is used much like a database view, in that it allows you to alter the real storage structure, and so long as the view is properly updated, any external calls to the XML will still work. This is a great tool for building an API.

To render a page with DOMi, you must first import a stylesheet through the XSLTProcessor::importStylesheet() method. Due to the transparent nature of DOMi, this method can be invoked as DOMi::importStylesheet(). Just as with a regular XSL driven system, importStylesheet must be given a DOMDocument that contains an XSL stylesheet, but DOMi provides tools for importing those stylesheets easily. DOMi::GenerateXSL() receives an array that contains file locations of XSL stylesheets and returns a DOMDocument of an XSL stylesheet that includes all provided stylesheets. This allows for dynamic inclusion of XSL stylesheets, which is useful for CMS skinning.

The stylesheet that is returned by DOMi::GenerateXSL can then be provided to DOMi::importStylesheet and used for final rendering. The last task is to render the page using DOMi::Render(), which accepts one parameter, which is either DOMi::RENDER_XML, DOMi::RENDER_HTML, or DOMi::RENDER_VIEW, and based on what is sent, the DOMi will render the desired results.
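As a rough illustration of what combining several stylesheets into one can look like, the sketch below builds a wrapper stylesheet that pulls in each file with xsl:include. The function name buildIncludingStylesheet and the file names are made up for this example, and GenerateXSL itself may be implemented differently.

```php
<?php
// Hypothetical sketch; DOMi's GenerateXSL may work differently.
function buildIncludingStylesheet(array $files): DOMDocument
{
    $ns = 'http://www.w3.org/1999/XSL/Transform';

    $xsl = new DOMDocument('1.0', 'UTF-8');
    $root = $xsl->createElementNS($ns, 'xsl:stylesheet');
    $root->setAttribute('version', '1.0');
    $xsl->appendChild($root);

    // One xsl:include per file; the files are only read once the
    // stylesheet is handed to XSLTProcessor::importStylesheet().
    foreach ($files as $file) {
        $include = $xsl->createElementNS($ns, 'xsl:include');
        $include->setAttribute('href', $file);
        $root->appendChild($include);
    }

    return $xsl;
}

// Example file names are assumptions for illustration.
$Combined = buildIncludingStylesheet(array('skin.xsl', 'modules.xsl'));
echo $Combined->saveXML();
```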

A brief look at DOMi

An example of DOMi code to get a list of employees and render on screen is shown below. This script assumes you already have a stylesheet and a function that returns an array of employee data.

//create the document with the root node as 'root'
$Domi = new DOMi('root');

$Domi->AttachToXml(GetEmployeeList(), 'employee'); //attach the array to the DOMDocument

$Domi->importStylesheet($Domi->GenerateXSL(array('stylesheet.xsl')));

$Domi->Render(DOMi::RENDER_HTML);


Conclusion

HTML echo is an old system that needs to be put to pasture. Its limitations are now too great to consider it a viable system for much longer, and a better system has been established. Why continue writing sites in a system that fails in so many regards? Upgrade to an XSL based system as soon as possible. After you get used to it, you'll begin viewing HTML echo in the same light you now view table based designs (I hope, if not, I highly recommend looking into CSS based designs). XSL has a bit of a hard barrier to break. Initially, it seems backwards and worse than HTML echo, but as you get more skilled with it, you begin to realize its strength, and once you are as skilled with XSL as you are with HTML echo, it becomes painfully clear that HTML echo is a dying system, or at the very least, will be a dying system in the near future.

Sourceforge.net - DOMi

Wednesday, August 13, 2008

PHP and XSL Part 2

This is the second entry in a three part series on the advantages of using XML and XSL to generate a web page. Part one will focus on the downsides of echoing HTML through PHP and how an XML/XSL driven system overcomes these shortcomings. Part two will focus on how PHP uses DOMDocument and the XSLTProcessor to generate XML documents and convert the XML into XHTML with an XSL stylesheet. Part three will introduce the DOMi object, a purpose built class that is designed to simplify and speed up the process of building XSL driven websites.

Part 2 - Encouraging DOMDocument

In part one of this series, I covered several shortcomings of the traditional system of using the PHP echo statement to display page contents to the user. I highlighted three main areas:
  1. Code cleanliness - issues with inconsistent whitespacing, faulty syntax highlighting, and the need to rectify two different code styles or risk losing proper indentation.
  2. Data organization - PHP has no clean methods for easily navigating complex data structures, and thus issues arise when trying to display the contents of complex data structures
  3. Non headless system - by echoing the display throughout the execution of the script, it is difficult to fully separate the data from the display, which reduces flexibility of design.
All three of these issues are solved in one fell swoop by switching over to an XSL driven system, and this entry in this series is going to explain how to use DOMDocument and the XSLTProcessor to do this.

What is DOMDocument?

DOMDocument is an object built into PHP that is used to manipulate a document that is built under the Document Object Model. The Document Object Model represents a document as a tree of nodes and is used primarily for web based applications. Anyone familiar with HTML or XML is already familiar with the structure of DOM. Anyone not familiar with the structure of DOM should go read up on it before proceeding, as I will assume you are already familiar with DOM.

For someone with knowledge of DOM, using DOMDocument will prove to be easy. Just as a DOM structure contains a document, nodes, attributes and text, so does a DOMDocument contain a DOMDocument, DOMNode, DOMAttr and DOMText. The following code snippet is a very basic DOMDocument being built with a single root node that contains 3 child nodes.

<?php

$Dom = new DOMDocument('1.0', 'UTF-8');

$Root = $Dom->createElement('root');
$Dom->appendChild($Root);

$Root->appendChild($Dom->createElement('child', 'first'));
$Root->appendChild($Dom->createElement('child', 'second'));
$Root->appendChild($Dom->createElement('child', 'third'));

?>

Echoing $Dom->saveXML() at this point would produce the following output (line breaks added for readability)...

<?xml version="1.0" encoding="UTF-8" ?>
<root>
<child>first</child>
<child>second</child>
<child>third</child>
</root>

As you can see, DOMDocument usage is not very difficult at all. DOMDocument::createElement is used to create a DOMElement, and DOMDocument, DOMElement and DOMNode support the member method appendChild, which attaches a provided DOMElement to its new parent. However, this series of articles isn't meant to teach how to use DOMDocument, so I'll leave that up to you to learn more than what I've explained here.

What is the XSLTProcessor?

The XSLTProcessor is an object built into PHP that uses XSL, Xpath and XSLT to convert an XSL stylesheet and an XML tree into XHTML. The object itself is very simple, as it just combines the XSL stylesheet and the XML tree. The more complex, and more powerful, part of the equation lies in the XSL stylesheet itself.

If we were to add the following lines to the previous sample, we would load an XSL stylesheet and output the results of the XSLTProcessor's conversion.

<?php

$Xsl = new XSLTProcessor();

$Stylesheet = new DOMDocument();
$Stylesheet->load('stylesheet.xsl');

$Xsl->importStylesheet($Stylesheet);

echo $Xsl->transformToXml($Dom);

?>

As you can see, the XSLTProcessor needs to be given a DOMDocument stylesheet, and the transformToXml method receives the xml tree as a DOMDocument and returns an XHTML output. For basic work, it really is that simple. The complex part is the new part - the XSL stylesheet.

What is an XSL stylesheet?

An XSL stylesheet is an XML document that is formatted with special xsl nodes. These nodes, such as value-of, template, call-template, for-each, variable, param, etc., are used to dictate commands to an XSLTProcessor.

Here is an example XSL stylesheet that uses the previously created DOM tree to display a list of the child nodes.

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<ul>
<xsl:for-each select="root/child">
<li>
<xsl:value-of select="." />
</li>
</xsl:for-each>
</ul>
</xsl:template>
</xsl:stylesheet>

If we break down the document and analyze each node, we can easily see what it is doing.

xsl:stylesheet is the root node to encapsulate the entire stylesheet.

xsl:template is comparable to a PHP function. It is a discrete code snippet that can be executed individually. This template is given the match attribute of "/". Templates can either be given a name attribute or a match attribute. Naming a template allows it to be called at will, and matching a template allows it to be called when the XML tree contains the specified node. In this case, the specified node is "/", which is the root of the document. In other words, this template will be called for any input document, and is essentially our starting point for the display.

Next, we put up a simple HTML ul node.

Within the ul node, we use xsl:for-each and a provided xpath to set up a for-each loop. The xpath that was provided is "root/child". Without going outside the scope of this article and explaining xpath, I'll just say that it selects all of the child nodes that we created in our earlier document. An xpath is nothing more than a ruleset that is used to match one or more nodes. These rules can be customized in incredible ways to identify a nodelist, and then the contents of the for-each node will reflect each item in the nodelist. This overcomes PHP's foreach shortcoming of only scanning one level of one array at a time, and breaks us from the need to contain derivative or redundant data.

The final xsl node listed here is the xsl:value-of node, which is used much like PHP's echo. It will take the specified value, which in this case is simply ".", meaning the current node's value, and write it to the output. In this case, the for-each creates an li node for each child, and value-of puts the value of that child within the li node.

In the end, the XML tree and the XSL stylesheet will combine in the XSLTProcessor to create the following HTML

<ul>
<li>first</li>
<li>second</li>
<li>third</li>
</ul>
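For anyone who wants to try this without creating any files, the pieces above can be put together in a single script. This is only a sketch: the stylesheet is inlined as a string rather than loaded from stylesheet.xsl, it uses the standard XSLT 1.0 namespace URI, and it assumes PHP's dom and xsl extensions are enabled.

```php
<?php
// Build the same XML tree as the earlier sample.
$Dom = new DOMDocument('1.0', 'UTF-8');
$Root = $Dom->appendChild($Dom->createElement('root'));
foreach (array('first', 'second', 'third') as $Text) {
    $Root->appendChild($Dom->createElement('child', $Text));
}

// Inline copy of the stylesheet so the script needs no external file.
$Stylesheet = new DOMDocument();
$Stylesheet->loadXML(
    '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    . '<xsl:template match="/"><ul>'
    . '<xsl:for-each select="root/child"><li><xsl:value-of select="."/></li></xsl:for-each>'
    . '</ul></xsl:template>'
    . '</xsl:stylesheet>'
);

// Combine the tree and the stylesheet.
$Xsl = new XSLTProcessor();
$Xsl->importStylesheet($Stylesheet);
$Output = $Xsl->transformToXml($Dom);

echo $Output;
```

Depending on the stylesheet's output settings, the result may also carry an XML declaration before the ul node.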

The formatting of XSL is identical to HTML (as both use the DOM structure), and thus blends in to create attractive, clean code. Since XSL stylesheets are so commonly used with HTML, almost any IDE that supports XSL will blend the syntax highlighting to keep it nicely viewable. Thus, the first problem with HTML echo is eliminated.

Due to the flexibility of Xpath, complex data structures are navigated with ease, allowing a single xsl:for-each node to loop across data anywhere in the structure, regardless of nesting or location. When you can easily scan and interact with data anywhere in the structure, during the display phase, you no longer have any need to store redundant data in the XML. If you want to derive data from within the tree, you can do it immediately with a single call. Thus, the second problem with HTML echo is eliminated.
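The same Xpath expressions can be tried from PHP through DOMXPath, which makes for a quick way to experiment. The sketch below uses a made-up company tree (the node and attribute names are assumptions for illustration) where employees sit at different depths, and pulls all of them out with a single expression.

```php
<?php
// Made-up sample data: employees nested at different depths.
$xml = '<company>'
     . '<group name="sales"><employee>Ben</employee></group>'
     . '<group name="operations"><group name="it"><group name="seo">'
     . '<group name="blue"><employee>Logan</employee></group>'
     . '</group></group></group>'
     . '</company>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// "//employee" matches employee nodes anywhere in the tree.
$names = array();
foreach ($xpath->query('//employee') as $employee) {
    $names[] = $employee->textContent;
}

// Derived data is computed on demand instead of being stored in the tree.
$total = (int) $xpath->evaluate('count(//employee)');

print_r($names);
echo $total, "\n";
```

Inside a stylesheet, the same "//employee" expression could be given to xsl:for-each, and count(//employee) to xsl:value-of.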

Since the display is dictated by an XSL stylesheet that can be swapped out on the fly, your data and display are now firmly separated. The data tree is pure, unencumbered by redundant data, and not forced into any particular look or order. You now have a headless system, and the third and final problem with HTML echo is eliminated.

What I have demonstrated here is just a taste of the power of an XSL driven system. Add in more templates, XSLT functions and a little creativity, and you can turn an XML data tree into any kind of output that you want. In addition to being headless, the data tree is so easily navigated through Xpath, you never, ever need to put in derivative data. If any piece of data can be obtained by looking at the rest of the data, then it is not needed for an XSL driven system.

The final part of this three piece set will focus on the DOMi object. This open source tool is built specifically for XSL driven websites, and contains tools to rapidly transform PHP data types, such as arrays, into XML data trees, while keeping the structure perfectly intact. In addition, it blends in XSLTProcessor to allow quick rendering without the need for intricate knowledge of XSLTProcessor's commands.

Tuesday, August 12, 2008

PHP and XSL Part 1

This is the first entry in a three part series on the advantages of using XML and XSL to generate a web page. Part one will focus on the downsides of echoing HTML through PHP and how an XML/XSL driven system overcomes these shortcomings. Part two will focus on how PHP uses DOMDocument and the XSLTProcessor to generate XML documents and convert the XML into XHTML with an XSL stylesheet. Part three will introduce the DOMi object, a purpose built class that is designed to simplify and speed up the process of building XSL driven websites.

Part 1 - Discouraging Echo

#1 - HTML echo loses points for unwieldy whitespacing, unreliable highlighting, and bouncing back and forth between C style code and DOM style code.

The modern convention for PHP based websites is to use the echo statement to send output to the display. This is problematic for many reasons, yet only a handful of sites use a system other than the tried and true echo system. One of the main drawbacks to using echo is the need to jump back and forth between two vastly different codebases within the same document.

In a single document, we often go into and out of PHP and HTML frequently. Occasionally, these calls are nested within one another (for instance - in an HTML list created by a PHP foreach). We can echo out the HTML as well, but that causes some whitespacing issues with awkwardly formed strings being passed to echo.

In most IDEs, we also lose our syntax highlighting on the HTML inside the echo. In addition to the whitespacing, HTML uses DOM style formatting (nested nodes, attributes, etc.), while PHP uses C style formatting (curly braces indicate scope changes, etc.), and it is occasionally hard to rectify the differences while keeping indentation intact.

#2 - HTML echo loses points for requiring derivative data and clumsiness in scanning a large, deeply nested data tree.

The next drawback of HTML echo lies in derivative data. If you had a multidimensional array, where each element contained an employee task, and each task was given a category, you would have a difficult time getting a count of tasks by category. You would need to run a foreach loop across the array and keep a count of how many tasks fell into each category.
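The loop in question might look something like the sketch below. The task list and its field names are made up for illustration.

```php
<?php
// Hypothetical task list; the field names are assumptions for illustration.
$tasks = array(
    array('employee' => 'Ben',   'category' => 'sales'),
    array('employee' => 'Logan', 'category' => 'seo'),
    array('employee' => 'Ben',   'category' => 'sales'),
);

// Tally the tasks per category before anything can be displayed.
$countByCategory = array();
foreach ($tasks as $task) {
    $category = $task['category'];
    if (!isset($countByCategory[$category])) {
        $countByCategory[$category] = 0;
    }
    $countByCategory[$category]++;
}

print_r($countByCategory);
```

The resulting $countByCategory array is exactly the kind of derivative data described here: it has to be built and passed along to the display even though it adds nothing that the task list did not already contain.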

The main problem with this foreach loop is that a count of tasks by category is derivative data from a list of all tasks. Rather than each piece of data being unique and necessary, you now have muddied up the data with something that can be figured out by looking at the rest of the data. If we had a system where extracting deeply nested data were incredibly easy, then we could have PHP only pull unique, essential data and allow the display tools to derive anything additional that is needed.

A related issue is PHP's difficulty in extracting information from a variety of places within an array in a single call. One project I had some time ago involved a multidimensional array containing employee groups. Each group had either subgroups or an employee list, and the depth of an employee was variable, depending on how many subgroups an employee group had. Some employees were fairly shallow, for instance /sales/Ben, whereas some were fairly deep, such as /operations/it/seo/blue/Logan. Because of this, PHP couldn't get me a list of all employees very easily, as a foreach loop only scans one level at a time.

#3 - HTML echo loses points for forcing data and display to be bound together.

The final point I would like to make in this first entry is on how difficult it can be to alter the designs in a page that is using an HTML echo system. When using HTML echo, you are often linking the data and the display in hard to separate ways. While there are some systems that do a fairly good job of keeping data processing away from the display, most of them are not complete and will still leave traces of data touching the display.

When the data and display are intertwined, it becomes difficult to create a truly headless system - one where the display can be changed at will without ever touching the data. A headless system has many advantages, such as ease of converting to an RSS feed, an API or vast site redesigns. When you have to alter the data, even just a bit, to alter the display, you lose mobility and flexibility.

In brief, HTML echo systems were good for a while, but we have moved beyond that. Just as CSS overtook tables when designing a site, the time is right for XSL to overtake HTML echo. HTML echo is ugly to code, requires repetitive data, makes data difficult to extract, and makes the design difficult to change.

Part 2 will focus on using DOMDocument and the XSLTProcessor to alleviate these problems. I will show you how you can have PHP build a pure data tree and use XSL to convert that data tree into a display.

Part 3 will focus on the DOMi object, which is available at sourceforge.net, and how that smooths out the kinks in DOMDocument and integrates XSLTProcessor and DOMXpath into a single, powerful object.

Monday, August 11, 2008

The launch of DOMi

Earlier today, the first release of the PHP DOMi object went up on sourceforge. This object was created by a small group of which I am a member. This is my first foray into contributing to the open source community I so vocally adore. This object was built with XSL driven websites in mind. It is a tool to ease the transition between PHP data structures and XML data trees, and allow for quick rendering of the XML tree through the XSL stylesheet.

I'm quite excited about all of this. I love XSL driven websites and would like to see this technology overtake traditional HTML echo websites as soon as possible. It is far neater, far cleaner, far more powerful. On top of that, I really want to give back to the open source community and have long been a proponent of knowledge and information being free for all, and now that I have a portable, stable object that I feel can help many people, the first thing I wanted to do was make this available to anyone and everyone.

If you are a PHP developer, I ask that you check out DOMi. The documentation is a bit sparse right now, but it's not hard, and I'm always available to answer any questions. The object is fairly simple, though, so it shouldn't be hard to figure out how to use it.