Saturday, September 20, 2008

Why are RSS feeds so dirty?

At work, we run a web framework that incorporates some RSS feeds for automatic content generation on one of the modules and a weather display that can be enabled on the main skin. My boss (who wrote the original version over a year ago) and I (I took the project over about 4 months ago) tried in vain to get a fully W3C-compliant XSL / MVC driven system. We ended up giving up because we could not get the XSL to generate valid code. I recently looked into it again, now that I understand the XSLTProcessor a lot better than either of us did a year ago. I managed to correct the output settings to get all of our own code valid, but now I am struggling against the horrors of RSS feeds.

Why do so many feeds use noncompliant code?

I am now working on an extension to DOMi for RSS, and the central tenet of its design is cleaned-up code. My short list of tasks is to get all attribute values placed in double quotes, lowercase all attribute and tag names, add required attributes to certain nodes, and convert deprecated tags to more modern ones.
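To make that task list concrete, here is a rough sketch of what such a cleanup pass might look like, written in Python purely for illustration. It is not the DOMi extension's code. The names `clean_markup`, `DEPRECATED_TAGS`, `REQUIRED_ATTRS`, and `VOID_TAGS` are made up for this example, and the tables they hold are assumptions that a real implementation would have to fill in properly.

```python
import re

# Hypothetical lookup tables (assumptions for this sketch, not DOMi's lists).
DEPRECATED_TAGS = {"strike": "del", "s": "del"}   # old tag -> replacement
REQUIRED_ATTRS = {"img": {"alt": ""}}             # tag -> attributes it must carry
VOID_TAGS = {"img", "br", "hr"}                   # tags that are always self-closed

# One tag: optional slash, tag name, then everything up to the closing bracket.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")

# One attribute: name, then an optional double-quoted, single-quoted,
# or unquoted value.
ATTR_RE = re.compile(
    r"""([A-Za-z_:][-\w:.]*)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?"""
)


def _clean_tag(match):
    closing = match.group(1)
    name = match.group(2).lower()               # lowercase the tag name
    name = DEPRECATED_TAGS.get(name, name)      # swap deprecated tags
    if closing:
        return f"</{name}>"

    raw_attrs = match.group(3).strip()
    self_closing = raw_attrs.endswith("/") or name in VOID_TAGS
    if raw_attrs.endswith("/"):
        raw_attrs = raw_attrs[:-1]

    attrs = {}
    for m in ATTR_RE.finditer(raw_attrs):
        key = m.group(1).lower()                # lowercase the attribute name
        # A bare attribute such as "checked" gets its own name as its value.
        value = next((g for g in m.groups()[1:] if g is not None), key)
        attrs[key] = value.replace('"', "&quot;")

    # Add any required attributes that are missing.
    for key, default in REQUIRED_ATTRS.get(name, {}).items():
        attrs.setdefault(key, default)

    rendered = "".join(f' {k}="{v}"' for k, v in attrs.items())
    return f"<{name}{rendered}{' /' if self_closing else ''}>"


def clean_markup(fragment):
    """Apply the four cleanup steps to every tag in a markup fragment."""
    return TAG_RE.sub(_clean_tag, fragment)


dirty = "<IMG SRC=pic.png><STRIKE>old</STRIKE><A HREF='x.html'>go</A>"
print(clean_markup(dirty))
```

A regex pass like this has known blind spots: an attribute value containing `>` will confuse it, and an unquoted value that ends in `/` is indistinguishable from a self-closing slash. It also does nothing about comments or CDATA sections, which a real feed cleaner would have to skip over.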

It'll be regex heavy, but I'm quite handy with regex (rightfully owning the xkcd regex shirt), so it shouldn't be too difficult.

Once it's done, it'll go onto SourceForge with the rest of the DOMi stuff.
