XML Backlash?


Now in the fourth edition of version 1.0 (and the second edition of 1.1), XML has enjoyed a popularity matched by few other technologies. Introduced in 1998 as a more general-purpose (and extensible) markup language than HTML (and, like HTML, derived from SGML), XML has spawned a host of related technologies (XPath, XSL/XSLT, XQuery, XML Schema, Relax NG, etc.) as well as a plethora of XML-based dialects covering every conceivable purpose.

In the world of enterprise programming (most notably Java), XML extended its reach to become the data/configuration/metadata format of choice. At one point, any software framework with even a vaguely enterprise-y smell to it relied on XML almost as a matter of course: J2EE (EJB, Servlet API, JSF, etc.), Struts, Spring, Tapestry, and the list goes on. When asynchronous HTTP requests from the browser, made without reloading the page, were popularized in 2005 (famously by Jesse James Garrett), XML was so prevalent that it was simply assumed to be the data serialization format of choice, hence the term AJAX (Asynchronous JavaScript and XML) and the name of the "XMLHttpRequest" object. Use of XML was simply unquestioned.

Then something happened, or started to happen... In the last few years, other technologies have begun to encroach on some of the areas where XML was once so dominant: annotations in Java (attributes in C#), JSON, Protocol Buffers, and even YAML. Dissatisfaction with XML is on the rise. Could it be that developers are realizing that XML is not good for everything? The evidence is growing:

  • Much of the motivation behind Google Guice was to create a "pure Java" (i.e., XML-free) implementation of a dependency injection framework. One of Guice's creators, "Crazy" Bob Lee, has made no bones about his disdain for XML as a framework design tool.
  • Spring itself followed suit, introducing an annotation-based approach to dependency injection (alongside the XML-based approach) in Spring 2.5.
  • Wicket advertises itself as having a "refreshing lack of XML".
  • JSON's compactness and ease of serializing/deserializing to and from JavaScript has made it a very appealing alternative to use of XML, and has taken a big bite out of the X in AJAX.
  • Numerous official Java specifications (EJB 3.0, JPA, JSF 2.0, Servlet 3.0) are moving away from the use of XML metadata and towards Java annotations. This is really one of the largest pieces of damning evidence, as Java specifications were one of the major drivers behind the canonization of XML as a key enterprise technology.
  • Some people have even gone as far as to dedicate web sites explaining why XML sucks.
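To make the JSON point concrete, here is a minimal sketch in Python (standing in for browser-side JavaScript) that round-trips the same small record through JSON and through XML using only the standard library. The record and the element names are illustrative assumptions; the point is that JSON maps directly onto native lists and maps, while the XML mapping has to be designed and written by hand and loses the types along the way.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative record, as it might be sent to a browser.
record = {"id": 42, "name": "Widget", "tags": ["a", "b"]}

# JSON: one call each way, and int/list types survive the round trip.
json_text = json.dumps(record, separators=(",", ":"))
assert json.loads(json_text) == record

# XML: the element layout is our own (hypothetical) choice,
# and every value comes back as a string that must be converted.
def to_xml(rec):
    root = ET.Element("record")
    ET.SubElement(root, "id").text = str(rec["id"])
    ET.SubElement(root, "name").text = rec["name"]
    tags = ET.SubElement(root, "tags")
    for tag in rec["tags"]:
        ET.SubElement(tags, "tag").text = tag
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    root = ET.fromstring(text)
    return {
        "id": int(root.findtext("id")),  # manual type conversion
        "name": root.findtext("name"),
        "tags": [t.text for t in root.find("tags").findall("tag")],
    }

xml_text = to_xml(record)
assert from_xml(xml_text) == record

print(len(json_text), json_text)
print(len(xml_text), xml_text)
```

With this data the XML form is noticeably longer than the compact JSON, mostly because every element name is written twice.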

Of course, as one may point out, this could just be the "vocal minority" voicing objections while the quieter majority continue to use it--and indeed the use of XML and the development of XML-related technology shows no sign of really slowing down. But this trend does raise the legitimate question of whether or not XML has really over-reached its original purpose (and usefulness), and needs to be re-evaluated for some of the use cases to which it is currently being applied. So, now that XML is in the denouement of its hype cycle, it is a good candidate for a more honest evaluation of its strengths and weaknesses.

Let's start with the minuses...

Drawbacks of XML

  • Verbose: By its very nature as a markup language, XML contains considerable redundancy (e.g., every element name is repeated in its closing tag, as in <tag></tag>). While this suits the hierarchical structure of markup languages well, it can be a big drawback, especially when dealing with large amounts of data. This verbosity has real costs in processing efficiency and network transmission overhead. Technologies like MTOM and XOP, which move binary data out of the XML payload instead of base64-encoding it inline, are really a hack to get around this problem.
  • Trees: The hierarchical tree structure of XML is a generally useful structure, but is not naturally suited to every problem domain. Some types of data are simply better suited to other data structures: lists, maps, etc. Unnecessary representation of this data as a tree carries some consequences in terms of processing efficiency and complexity.
  • Markup Language: XML is a markup language, not an imperative or functional language. And it is not good at faking either one. This seems to be a fundamental point missed by some fairly knowledgeable people. The otherwise well-designed BPEL is a case in point: right ideas, wrong technology. This doesn't mean that XML can't be used as a kind of "Poor Man's DSL", but being declarative is about as far as one should stretch a markup language.
  • Language Metadata: Though specifications like XML Schema brought a kind of type system to XML, this was a type system meant to be language agnostic. Historically speaking, however, it is common to see XML applied as a tool for language metadata, forcing the tedious and non-typesafe use of references to language types. This is the classic XML attribute class="com.example.Foo" seen in way too many Java enterprise frameworks. A real facility for language metadata (annotations in Java or attributes in C#) is a much better solution.
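The class="com.example.Foo" problem can be sketched in a few lines. The following Python analog (roughly what Java frameworks do with reflection) resolves a class from a string, as a framework would after reading it from an XML attribute. The helper name load_class is hypothetical; the point is that a misspelled name is only caught when the string is resolved at runtime, whereas a direct reference is checked when the code is loaded (and, in Java, at compile time).

```python
import importlib
from collections import OrderedDict

def load_class(qualified_name):
    """Hypothetical helper: resolve 'package.module.ClassName' from a string,
    the way a framework might handle an XML attribute like class="...".
    Nothing checks the string until this function runs."""
    module_name, _, class_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

cls = load_class("collections.OrderedDict")  # resolves fine
assert cls is OrderedDict  # same class as the direct, checked reference

try:
    load_class("collections.OrderdDict")  # typo: only detected here, at runtime
except AttributeError as exc:
    print("configuration error:", exc)
```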

Nothing world-shaking here. Most developers having to type out XML documents have probably thought of these at some point in time. So what are the good points?

Advantages of XML

  • Platform and Language Neutral: Although other competing technologies can make the same claim, this is one of the big reasons for the rise in XML's popularity in the first place.
  • Great tools: There is a very rich set of tools for working with XML, which is certainly one of the reasons for its great popularity. This makes working with XML a much simpler choice, since in most languages the parser and other tools have already been written for you.
  • Readability: Some people may argue with this and provide good counter-examples (EJB deployment descriptor files come to mind), but in general 90+% of XML documents I've ever seen are fairly readable. This readability, however, certainly does not scale: larger, more complex XML documents tend to be fairly unreadable, but this is often more of a consequence of the misapplication of the technology.
  • Namespaces: Although using different namespaces in an XML document can hold some surprises for the beginner, generally speaking namespaces are a pretty powerful feature of XML. They enable, among other things, ideas like "mashups," i.e., XML documents being extended or combined with other XML documents (or content) in ways not necessarily foreseen by the providers of those documents (think Yahoo Pipes). Being able to avoid conflicts between different data sources is one of XML's great advantages over other technologies that do not support namespaces.
  • Validation: Built-in data validation is another one of XML's advantages. However one may feel about the widely-used standard, XML Schema, having the heavy lifting of this tedious functionality off-loaded from the author to the tools is truly a blessing.
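As a small illustration of the namespace point, here is a sketch using Python's standard xml.etree.ElementTree. Two vocabularies (with hypothetical example.com namespace URIs) both define a <title> element, and the namespace keeps them apart when the document is queried.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both define a <title> element.
doc = """<catalog xmlns:book="http://example.com/book"
         xmlns:film="http://example.com/film">
  <book:title>Dune</book:title>
  <film:title>Dune</film:title>
</catalog>"""

root = ET.fromstring(doc)
ns = {"book": "http://example.com/book", "film": "http://example.com/film"}

# Same local name, different namespace URIs, so the lookups never collide.
book_titles = root.findall("book:title", ns)
film_titles = root.findall("film:title", ns)

# ElementTree stores qualified tags in "{uri}localname" form.
print([e.tag for e in book_titles], [e.tag for e in film_titles])
```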

Using simple math, it would seem that the advantages outweigh the disadvantages. But of course, it isn't that simple: the benefits and drawbacks have to be weighed case by case. Most abuses of any given technology stem from simple facts like these being overlooked or forgotten amid the hype. In the worst cases, this kind of thinking results in elaborate specifications designed simply as workarounds for the limitations of the technology. "The right tool for the right job" is the caveat here, but the warning seems easily forgotten.

The popularity of XML is definitely here to stay and the technology is generally "good enough" for most purposes to which it has been applied. But I think that it is important, as with any technology, to apply some critical thinking before using it for a given purpose. If you find yourself in the middle of coding an elaborate workaround to a problem you are encountering (performance or otherwise), the question "Why am I doing this?" should be more than a passing thought. A little (un?)common sense goes a long way in building the right solution.

With annotations, you do Java configuration in native .java code. With JSON, you can represent JavaScript data in native .js code. XML isn't a problem, we just don't need it for everything anymore.

I agree that XML isn't good for everything. When all those XML-based config files started to appear in the wake of the 1.0 spec, I couldn't understand it. It was as if XML had to be used, for some strange reason. Well, it doesn't have to. There are (plenty of) times when it's a really bad choice.

XML was designed for structured documentation and as a "web-friendly" SGML Lite, not as yet another toolkit for yet another programming language.

It's not really that easy to read. It takes plenty of power to parse. It doesn't save any space transferring across the net. I'm glad it's getting used less.

"Most of the abuses of any given technology usually stem from the case that simple facts like this get overlooked or forgotten amidst the hype. In the worst cases, this kind of thinking results in elaborate specifications that are designed simply as workarounds to the limitations of the technology."

The CSV format (Comma Separated Values) is used a lot.

  • It is better at representing tabular data
  • If the data is going to be stored in an RDBMS, there is less of an impedance mismatch
  • It is easier to parse
  • It is more human readable

I am not passing judgment on whether it should be used, but I am saying it is used. It is so commonplace that we don't notice it and overlook it.
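A quick sketch of the "tabular data, less impedance mismatch" point, using Python's standard csv and sqlite3 modules. The table name and sample rows are made up for illustration; each CSV row maps directly onto one table row, and the csv module handles the quoted field that contains a comma.

```python
import csv
import io
import sqlite3

# Hypothetical tabular data; the quoted field contains a comma.
data = 'id,name,city\n1,Alice,"Portland, OR"\n2,Bob,Boston\n'

# Each CSV row becomes a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(data)))

# One CSV row maps straight onto one relational row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, city TEXT)")
conn.executemany("INSERT INTO people VALUES (:id, :name, :city)", rows)

for row in conn.execute("SELECT id, name, city FROM people ORDER BY id"):
    print(row)
```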

LOL. Yes, CSV is actually most likely the king of file formats (if one were to peek behind the firewalls of most corporations). And you can't get much more to the point in representing a simple list.

What kills it is the encoding. CSV usually uses the local 8-bit encoding instead of Unicode.
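The encoding problem is easy to reproduce. In this Python sketch, the same CSV line is encoded with the Windows-1252 code page (a typical "local 8-bit encoding", assumed here for illustration) and with UTF-8. A reader that guesses the wrong encoding either fails outright or silently garbles the text.

```python
text = "Zoë,Café\n"

local_bytes = text.encode("cp1252")  # typical Windows "ANSI" CSV export
utf8_bytes = text.encode("utf-8")

# Reading the cp1252 bytes as UTF-8 fails outright...
try:
    local_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)

# ...while reading the UTF-8 bytes as cp1252 silently produces mojibake.
garbled = utf8_bytes.decode("cp1252")
print(garbled)
```

Since a CSV file carries no declaration of its encoding (unlike an XML prolog), the usual mitigation is to agree on it out of band and state it explicitly when opening the file, e.g. open(path, encoding="utf-8", newline="").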

CSV's use of a common character as the separator, compounded by the painfully boneheaded use of quotes to escape it, means that interpreting CSV files is always heuristic guesswork. Tab-separated files don't have this problem.

Tabs are only slightly less common than commas in the text you're trying to separate, plus they're awful to read manually. Pipes are a much better choice.

Tabs are only suitable for cutting and pasting in and out of a spreadsheet.
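To test the delimiter debate, here is a sketch with Python's csv module. The sample rows are hypothetical and deliberately contain a comma, a tab, a pipe, and double quotes. With the writer's default minimal quoting, the data round-trips exactly whichever delimiter is chosen, which suggests that the guesswork comes from writers and readers that don't agree on quoting rules rather than from the delimiter itself.

```python
import csv
import io

# Hypothetical rows containing commas, tabs, pipes, and double quotes.
rows = [["id", "note"], ["1", 'says "hi", then\tleaves'], ["2", "a|b"]]

def round_trip(rows, delimiter):
    """Write rows with the given delimiter, then read them back."""
    buf = io.StringIO()
    # Default QUOTE_MINIMAL quotes only fields that need it and doubles
    # any embedded quote characters.
    csv.writer(buf, delimiter=delimiter).writerows(rows)
    text = buf.getvalue()
    return text, list(csv.reader(io.StringIO(text), delimiter=delimiter))

for delim in [",", "\t", "|"]:
    text, back = round_trip(rows, delim)
    print(repr(text))
    assert back == rows  # lossless round trip for every delimiter
```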

I like XML, it's my favorite language!

I am one of the haters of XML. My biggest problem with it is config files.

Config files need to be human readable. XML is not human readable - it is too complex.

I hope that every author who made XML config files for other humans will reconsider quickly and offer an alternative that is less verbose.

Less XML for it.

PS: I don't know why so many people used XML for config stuff. Maybe it was hyped so much in the past.
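For comparison, here is the same small configuration written as XML and as an INI file, both read with Python's standard library (xml.etree.ElementTree and configparser). The settings are hypothetical; the sketch just shows that for flat key/value settings the INI form carries the same information with much less markup.

```python
import configparser
import xml.etree.ElementTree as ET

# The same hypothetical settings, written two ways.
xml_config = """<config>
  <database>
    <host>localhost</host>
    <port>5432</port>
  </database>
</config>"""

ini_config = """
[database]
host = localhost
port = 5432
"""

root = ET.fromstring(xml_config)
xml_host = root.findtext("database/host")
xml_port = int(root.findtext("database/port"))  # XML text is always a string

parser = configparser.ConfigParser()
parser.read_string(ini_config)
ini_host = parser.get("database", "host")
ini_port = parser.getint("database", "port")

print((xml_host, xml_port), (ini_host, ini_port))
```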

"[XML was] Introduced in 1998..."
"in 1995 ... XML was so prevalent"

So, wait, XML was prevalent before it was introduced? I think AJAX was popularised in 2005, not 1995.

Hmm...must have something to do with the space-time continuum...either that or it's a typo ;-).

Fixed.

Maybe five years ago, it seemed that many business-oriented people thought that if applications used XML in any form, it would magically allow them to interconnect. So in many projects it was "mandatory" to use XML in some form, and XML ended up being used in, for example, config files. As some previous commenters pointed out, XML might not be the best choice for config files because of its size, hierarchy, etc.

I think your article has a little too much attitude that misrepresents the past. Ex. "Use of XML was simply unquestioned." and "Could it be that developers are realizing that XML is not good for everything?". There has always been heavy criticism of XML, it's not as if the developer community one day realized it "sucks". Also, using a technology should not be considered a vote of "satisfaction".

On an unrelated note; what is the state of validation for these supposed XML alternatives? Are there any easy, standard ways to validate the contents of a JSON or YAML document? This was always the thing I looked to XML for. When I have a program that wants to read some data, I want to be able to validate that the data is coming in exactly how I expect, completely independent even of my program that reads it. It's nice to have a separate tool that anyone can run that validates "OK, this is broken, don't bother trying". Thus schemas can be used as a contract between a program and a data provider. And it saves lots of boilerplate and bug-prone validation code from the application, assuming the application trusts the validator (which, admittedly, sounds suspect).

Not that validation has ever been exactly rosy in the XML world, in my opinion... so I'm wondering if the alternatives are feasible.

Actually, there was a time when using XML really was unquestioned. As a Java developer, I can't remember how many times some new framework or specification was rolled out, and the first thing you heard was "OK, this is how you write the XML file...". Nobody batted an eyelash at the idea. Many times it gave your application the aura of being more "enterprise" suitable. I'm not trying to really bash XML, per se; I am just trying to be a little more pragmatic about what it is suitable for...

As far as validation is concerned, I know that there is a proposal for JSON Schema. Not sure if there is anything formal for YAML (though there are some informal efforts out there). My general impression of JSON/YAML is that the tools are far less mature than XML's, which may be a good reason for using XML in many cases. Personally, I'd like to see a good alternative to XML (tools and all), but that may take a while to develop; it certainly took a while for XML to become as mature as it is today.
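To illustrate the "schema as contract" idea from the question above, here is a minimal hand-rolled checker in Python. It is explicitly not JSON Schema: the schema format (required field name mapped to expected type) and the validate helper are hypothetical, chosen only to show validating a JSON document before the application touches it.

```python
import json

# Hypothetical, deliberately tiny schema: required field -> expected type.
# This is NOT JSON Schema; it only illustrates validating before processing.
SCHEMA = {"id": int, "name": str, "tags": list}

def validate(doc, schema):
    """Return a list of problems; an empty list means the document passed."""
    if not isinstance(doc, dict):
        return ["top-level value must be an object"]
    problems = []
    for field, expected in schema.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        # Exact type check, so JSON true/false is not accepted as an int.
        elif type(doc[field]) is not expected:
            actual = type(doc[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems

good = json.loads('{"id": 1, "name": "Widget", "tags": []}')
bad = json.loads('{"id": "1", "tags": []}')

print(validate(good, SCHEMA))
print(validate(bad, SCHEMA))
```

A real deployment would more likely use a JSON Schema implementation, but even this sketch shows the appeal: the check lives outside the code that consumes the data.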

XML is readable, but not very much so. YAML is much more readable. I'm not sure readability should be in the plus column.

Dude, you're a bit late to the party. Some of us have been backlashing against XML since '97 or so. It was a bad idea.
