Slightly skeptical look at XML


Another major theme in the keynote was that XML developers are asked to use "APIs from Hell." For example, a programmer working with a purchase order in XML format must deal with events, or child/sibling nodes in a tree, rather than application-level concepts such as products and quantities. Hmm, that's a posting in and of itself, because it ties in with a town hall meeting on storing/querying XML that turned into a discussion of XML APIs. More later.

In his XML 2003 keynote address, Adam Bosworth emphasized the need for us to return to simplicity in the current wave of existing and emerging Web Services standards.


Producing documentation and reusing information in XML, Part 1: Document publishing using XML


YML 2.1.9

YML (Why a Markup Language?!) is an easy language to compile into XML. YSLT is an easy language for code generation, automating your software development tasks.

Why Web services sucks...

Guest Article
Why "Web Services" Sucks...

ZapThink LLC, special to SearchWebServices
by Ronald Schmelzer and Jason Bloomberg

The phrase "Web services" (note the lower case "s") has been in use for several years now, either to mean a service offered on a Web site (eCommerce, for example), or Web site-related professional services. The meaning of Web Services discussed in this report, of course, has little to do with either of these vernacular uses of the phrase. In addition, the word "Web" has come to refer to the World Wide Web, which again is only tangentially related to Web Services in that it uses HTTP as a transport protocol and HTML instead of XML for its data format. Just what "Web" is the phrase referring to?

It is true that the World Wide Web and Web Services share the Hypertext Transfer Protocol (HTTP) as their fundamental communications protocol. However, it is possible to use HTTP for types of communication that aren't related to either kind of Web, and furthermore, Web Services do not require HTTP. Nevertheless, the fact that both concepts share HTTP is probably how the term Web got involved in Web Services.

The word "Services" is more straightforward, but still leaves room for some confusion. We are referring here to the fact that we are using a Service-Oriented Architecture (SOA) to expose, bind, and locate available functionality on a network. However, are Web Services the Services themselves, the overall computing architecture, or the software that provides the Services? Do you build Web Services, build applications that provide functionality using Web Services, or build software that provides Web Services? If the latter, then what do you call the software? The answer, unfortunately, is "Web Services": the term is used to refer to the Service, the computing paradigm, and the software.

This multiple meaning is especially pernicious, because Web Services represent a shift from thinking about the software and its functionality to thinking about the Services first, and then thinking about the software behind them. Fundamentally, Service-oriented architectures are not software architectures; they are Service architectures. More annoyingly, is "Web Services" singular or plural? Is Web Services a singular concept, or are Web Services a collection of modular components? Is "Web Services" a noun, an adjective, or a direct object? These are small annoyances that bother not just developers, but the marketers who must promote the technology.

Because Web Services are an enabling technology, the terminology isn't particularly important. What's important is what people and companies do with Web Services.

Today, "Web Services" is a buzzword, and as such, marketing departments have chosen it to lead many of their campaigns. As with all buzzwords, however, the phrase's lifetime will be relatively short. As Web Services themselves become increasingly important in the context of the solutions companies build with them, the terminology will either shift to more accurate and descriptive words, or the phrase will simply lose its buzzword status. The real point is that this discussion of terminology is irrelevant to the fundamental technology and business issues that Web Services represent.

ZapThink Opinion:
ZapThink is not interested in the term "Web Services," or any other terminology, for that matter. We'd be perfectly happy if someone coined a different term altogether! In fact, our definition of Web Services doesn't even depend on SOAP or WSDL, but on open, non-proprietary technologies for loosely coupling systems. That said, for the concept to work, those technologies must be standardized and accepted by all implementing bodies.


Adam Bosworth, Sloppy KISSes, and WS-Mess

About two months ago, I linked to a tiny little paragraph Adam Bosworth wrote at the end of a completely unrelated weblog entry, where he mentions that he had been trying to justify all of the WS-Complexity when simple XML over HTTP works so well. People have been proposing for a while that simple XML over HTTP hits the 80/20 point, and it's beginning to catch on, but today might have been a watershed event for the Loyal WS-Opposition. Adam evidently thought about this stuff really hard over the past two months and has just published the transcript of a brilliant talk he gave at ISCOC04, where he emphasizes simplicity and organic growth over complexity and cathedral building in the Web Services space. Herewith some notes and speculation on What It All Might Mean.

What makes this talk so special?

This talk is about this conflict as it relates to computing on the Internet. This talk is also a polemic in support of KISS. As such it is unfair, opinionated, and perhaps even unconscionable. Indeed, at times it will verge on a jeremiad.

Well, for starters, Adam is a complete bad-ass as is obvious by his use of words like jeremiad, which turns out to mean exactly the kind of thing bad-asses talk about all the time:

jer-e-mi-ad : A literary work or speech expressing a bitter lament or a righteous prophecy of doom.

But seriously, this eWeek article from July 2004 talks about Bosworth leaving his Chief Architect/SVP of development post at BEA for Google and gives some history behind Bosworth's other adventures in technology. He's been involved in—and is often given credit for—the success of many applications and technological achievements over the past decade or so.

The other reason this is an important event for the REST people, and the KISS/YAGNI people in general, is because Bosworth worked primarily on WS technology when he was at BEA. So not only is he a really smart guy in general but his really smart brain has been cranking away on concepts surrounding Web Services for the past couple of years. And now he just casually plops the following out on his weblog:

On the one hand we have RSS 2.0 or Atom. The documents that are based on these formats are growing like a bay weed. Nobody really cares which one is used because they are largely interoperable. Both are essentially lists of links to content with interesting associated metadata. Both enable a model for capturing reputation, filtering, stand-off annotation, and so on. There was an abortive attempt to impose a rich abstract analytic formality on this community under the aegis of RDF and RSS 1.0. It failed. It failed because it was really too abstract, too formal, and altogether too hard to be useful to the shock troops just trying to get the job done. Instead RSS 2.0 and Atom have prevailed and are used these days to put together talk shows and play lists (podcasting) photo albums (Flickr), schedules for events, lists of interesting content, news, shopping specials, and so on. There is a killer app for it, Blogreaders/RSS Viewers. Anyone can play. It is becoming the easy sloppy lingua franca by which information flows over the web. As it flows, it is filtered, aggregated, extended, and even converted, like water flowing from streams to rivers down to great estuaries. It is something one can get directly using a URL over HTTP. It takes one line of code in most languages to fetch it. It is a world that Google and Yahoo are happily adjusting to, as media centric, as malleable, as flexible and chaotic, and as simple and consumer-focused as they are.
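Bosworth's "one line of code to fetch it" claim is easy to illustrate. Here is a minimal sketch in Python; the feed content is invented sample data, and in practice that string would come from the one-liner `urllib.request.urlopen(url).read()`:

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 sample; a real feed would be fetched with
# urllib.request.urlopen(url).read() and parsed the same way.
rss = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First post</title><link>http://example.org/1</link></item>
  <item><title>Second post</title><link>http://example.org/2</link></item>
</channel></rss>"""

channel = ET.fromstring(rss).find("channel")
items = [(item.findtext("title"), item.findtext("link"))
         for item in channel.findall("item")]
print(items)
```

That really is the whole program: anyone can play, which is exactly the point of the "sloppy lingua franca" argument.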

On the other hand we have the world of SOAP and WSDL and XML SCHEMA and WS_ROUTING and WS_POLICY and WS_SECURITY and WS_EVENTING and WS_ADDRESSING and WS_RELIABLEMESSAGING and attempts to formalize rich conversation models. Each spec is thicker and far more complex than the initial XML one. It is a world with which the IT departments of the corporations are profoundly comfortable. It appears to represent ironclad control. It appears to be auditable. It appears to be controllable. If the world of RSS is streams and rivers and estuaries, laden with silt picked up along the way, this is a world of Locks, Concrete Channels, Dams and Pure Water Filters. It is a world for experts, arcane, complex, and esoteric. The code written to process these messages is so early bound that it is precompiled from the WSDL’s and, as many have found, when it doesn’t work, no human can figure out why. The difference between HTTP, with its small number of simple verbs, and this world with its innumerable layers which must be composed together in Byzantine complexity cannot be overstated. It is, in short, a world only IBM and MSFT could love. And they do.

What does that mean? I mean, other than the obvious things he’s saying like simple is better than complex. What did he just say in that last sentence? Did he just say that IBM and Microsoft, the two biggest contributors to WS-Madness, stand to gain significantly from making things require complex toolkits as well as certified experts? I think he did.

In fact, that's what really pisses me off more than anything about the whole WS-Situation. I've never really been able to put my finger on it, but I think that he just nailed it for me. When the very first SOAP specs were published five or six years ago, SOAP was extremely simple and lightweight, more concept than specification. It was all about "hey, why don't you expose that customer record as XML over HTTP and then I don't need access to your database and we won't have to mess with CORBA and," pause/think "well, shit! If we slap some SSL on that pipe we could even do this over the public internet." At some point, the wrong people got involved and turned these simple ideas into another piece of massive complexity, and it became a tool for vendor lock-in.
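The "customer record as XML over HTTP" idea really was that small. A hedged sketch in Python (field names are invented, and the payload bytes would normally travel over an HTTPS connection rather than sit in a local variable):

```python
import xml.etree.ElementTree as ET

# Producer side: expose a customer record as XML (field names invented).
customer = ET.Element("customer", id="42")
ET.SubElement(customer, "name").text = "Acme Corp"
ET.SubElement(customer, "balance").text = "1250.00"
payload = ET.tostring(customer)  # the bytes an HTTP handler would return

# Consumer side: no database access, no CORBA stubs -- just parse the bytes.
record = ET.fromstring(payload)
balance = float(record.findtext("balance"))
print(record.get("id"), record.findtext("name"), balance)
```

Everything the early SOAP pitch promised is in those few lines; the later WS-* stack layered on top of this, it did not replace it.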

Real quick, I want to make sure I'm not giving the impression that Bosworth was some kind of WS-Nazi and suddenly saw the light in the REST architectural style, joined Google and is now working off the evil points he earned at Microsoft to meet some kind of not-evil quota required by Google; that’s not the case. In fact, I believe he was one of the first people to really champion loosely coupled, late bound, message-based SOAP web services as opposed to tightly coupled, early bound, RPC style web services. But today was the first time I've seen him go so far as to state publicly that the WS stack probably isn’t going to work in a large number of scenarios.

I think we just need to get more enterprise developers hanging out on the public web and seeing what kind of things are possible with a simple set of semi-standard protocols and formats. Bosworth had to leave BEA (enterprise) for Google (web) before he could recognize and think objectively about the value of simple concepts like REST and loosely specified XML messages.

ongoing · XML Is Too Hard For Programmers

XML is a bouncing, thriving five-year-old now, and yet I've been feeling unsatisfied with it, particularly in recent times and particularly in my capacity as a programmer.

[Hello there, visitors from /. - there's a whole lot of feedback out there; give me a few days to soak it up and I'll follow-up with some more on the subject, since obviously people care. In the meantime, there's other stuff here you might find interesting.]

During the process of setting up ongoing, for the first time in a year or more I wrote a bunch of code to process arbitrary incoming XML, and I found it irritating, time-consuming, and error-prone.

Some other recent data points:

Programming Baskets

Some more background. Serious programming these days more or less all falls into three baskets: object-oriented, scripting, and close-to-the-metal.

I think all of these communities are having more trouble than they really ought to with XML. Oddly enough, the problem isn't in writing the XML processor, which isn't that hard, look at the number that are out there. The difficulty is in using one.

An XML-Oriented Programming Language?

One response has been a suggestion that we need a language whose semantics and native data model are optimized for XML. That premise is silly on the face of it, for at least two reasons.

Life in the Scripting Basket

As regards XML, I've been living in the land of scripting generally and Perl specifically in recent times; the internals of the Antarctica runtime codebase are all C, the back end has Java and C++, but these all build and manage internal data structures that look nothing like XML, and the XML we generate is via the venerable printf()-plus-markup-escaping approach.

That leaves input data munging, which I do a lot of, and a lot of input data these days is XML. Now here's the dirty secret; most of it is machine-generated XML, and in most cases, I use the perl regexp engine to read and process it. I've even gone to the length of writing a prefilter to glue together tags that got split across multiple lines, just so I could do the regexp trick.

The reasons are not complicated: If I use any of the perl+XML machinery, it wants me either to let it read the whole thing and build a structure in memory, or go to a callback interface.

Since we're typically reading very large datasets, and typically looking at the vast majority of it, preloading it into a data structure would be impractical, not to say stupid. Thus we'd be forced to use parser callbacks of one kind or another, which is sufficiently non-idiomatic and awkward that I'd rather just live in regexp-land.

When I came to do ongoing, I decided as a matter of principle that the input had to be XML and had to be read with a real XML processor. Since, once again, I was going to be using every byte of every file, I decided that loading it all into an in-memory data structure so I could run through it in order was egregiously stupid, and went with callbacks. Which are irritating.

The program that writes ongoing sets up for processing an entry by initializing a bunch of global state variables, unleashes the XML parser, and stands back. I've been writing Perl since 1993 or so and this just feels awkward and unnecessary. The canonical Perl program, in my idiom anyhow, looks something like:

my ($state_var1, $state_var2) = (0, '');
my (%collector1, $collector2);
while (<STDIN>) {
  next if (/regexp-for-something-I-ignore/);
  if    (/something-I'm-interested-in/)
  { $state_var1 = &foo($1, $4, \%collector1); }
  elsif (/something-else/)
  { $state_var2 = &bar($_, $state_var1); }
  elsif (/yet-another/)
  { $state_var1 = $state_var2 + $collector1{baz}; }
  else { print; }
}

This may feel primitive to the O-O heavies out there, but it's the way a lot of the Net is stitched together.
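The callback style Bray objects to looks much the same outside Perl. As a sketch, here is the equivalent shape in Python's xml.sax, on a hypothetical title-collecting task; the state variable and collector mirror the idiom above, with the logic smeared across three separate event handlers:

```python
import xml.sax

# State lives in handler attributes; the logic for one simple task is
# scattered across start/characters/end callbacks.
class TitleCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_title = False   # state variable
        self.titles = []        # collector

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.titles.append("")

    def characters(self, content):
        # characters() may fire more than once per element, so accumulate.
        if self.in_title:
            self.titles[-1] += content

    def endElement(self, name):
        if name == "title":
            self.in_title = False

handler = TitleCollector()
xml.sax.parseString(
    b"<doc><title>One</title><p>skip</p><title>Two</title></doc>", handler)
print(handler.titles)
```

The parser drives your code instead of the other way around, which is precisely the inversion that feels awkward and unnecessary.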

I'm not sure what the right solution to the XML awkwardness is in O-O land or close-to-the-metal-ville, but I'm pretty damn sure what I'd like to see in Scripting Village. By example:

while (<STDIN>) {
  next if (X<meta>X);
  if    (X<h1>|<h2>|<h3>|<h4>X)
  { $divert = 'head'; }
  elsif (X<img src="/^(.*\.jpg)$/i>X)
  { &proc_jpeg($1); }
  # and so on...

The idea is that the element-ish and attribute-y syntax in regexps abstracts away all the XML syntax weirdness, ignoring line-breaks, attribute order, choice of quote marks and so on. I've invented some Perl syntax off the top of my head, which is a highly dangerous thing to do, particularly in the fraught land of regexps, particularly since the Perloids are re-inventing all that right now in the Perl6 project; so let's be clear that the above is not a serious syntax proposal. But essentially, I want to have my idiomatic regexp cake and eat my well-formed XML goodness too. Too much to ask?

Out of the Scripting Basket I suspect there are parallel proposals to be made for the people who live in the O-O and close-to-the-metal worlds, but they don't leap to the front of my mind. I will make one slightly-brave prediction though: I think that the stream-processing mode of reading and using XML is going to occupy a substantial part of the landscape no matter which basket you're living in; the costs of the alternatives are frequently going to be just too high.

So I think the key first step is to make XML stream processing idiomatic in as many programming languages as possible. Rumor has it that the .NET CLR is going the right way on this one, but I haven't been there.
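Python's ElementTree iterparse comes close to the pull model described here: the loop drives the parser, so ordinary next/if/elif control flow returns while memory stays flat. A sketch, with an invented log format:

```python
import io
import xml.etree.ElementTree as ET

doc = "<log><entry level='info'>ok</entry><entry level='error'>boom</entry></log>"

# Pull-style stream processing: iterate over parse events instead of
# registering callbacks; clear elements once processed to bound memory.
errors = []
for event, elem in ET.iterparse(io.StringIO(doc), events=("end",)):
    if elem.tag == "entry" and elem.get("level") == "error":
        errors.append(elem.text)
    elem.clear()
print(errors)
```

No global state, no handler class: the stream-processing mode becomes idiomatic in the language's own terms.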

I guess I ought to say in closing that even given the irritation which programmers encounter in dealing with XML, the benefits are sufficient that the current trend toward using it as the interchange format for more or less everything still seems sound. But we can make people's lives easier I think.

XML & Web Services Magazine – Speaking XML December 2002/January 2003 Issue

XML poses some interesting challenges for programmers. This is the first of a series of columns in which I will look at XML's interaction with programming languages.

Adam Bosworth
Vice President, Engineering
BEA Systems Inc.

XML's schema model is not as hardened as are types in a programming language, but in some ways it is richer. Programming languages have nothing even remotely equivalent to mixed content, for example. Mapping XML into program data structures inherently risks losing semantics and even data, because any unexpected annotations may be stripped out or the schema may simply be too flexible for the language.

To illustrate, given an incoming XML message x, imagine that the programmer wants to compute the price-earnings ratio:

XML x = getxml("somewhere");
PERatio = x.price/(x.revenues - x.costs);

Today's programmer has two tools available to parse and manipulate XML files: the Document Object Model (DOM) and Simple API for XML (SAX). Both, as we shall see, are infinitely more painful and infinitely more prolix than the previous code example.

While the DOM can be used to access elements, the language doesn't know how to navigate through the XML's structure or understand its schema and node types. Methods must be used to find elements by name. Instead of the previous simple instruction, now the programmer must write something like:

Tree t = ParseXML("somewhere");

PERatio = number(t.getmember(
	"/stock/price")) /
	(number(t.getmember(
	"/stock/revenues")) - number(
	t.getmember("/stock/costs")));
In this example, number converts an XML leaf node into a double. This is not only hideously baroque, it's seriously inefficient. Building up a tree in memory uses up huge amounts of memory, which must then be garbage collected—bad news indeed in a server environment.

Now let's examine how a developer might use SAX to implement the same task. First the developer must set up a Content Handler to parse the XML file and then fetch the result of the expression. This requires a charming piece of Java like:

XMLReader xmlreader = new SAXParser();
MyContentHandler contentHandler =
	new MyContentHandler();
xmlreader.setContentHandler(contentHandler);
String uri = "test.xml";
InputSource is = new InputSource(
	new FileInputStream(new File(uri)));
xmlreader.parse(is);
double result = contentHandler.getPERatio();

Of course, the developer must write the class that implements the ContentHandler as well as the method getPERatio(), which requires more warm and fuzzy code than will fit on this page (see Listing 1).

Imagine if the object-oriented revolution had been ushered in with such syntax just to access an object. It is as if the only way to interact with objects were to use reflection. The object-oriented revolution would have been stillborn. Instead, object-oriented languages took care of this plumbing and typing for the programmer.

In short, the current situation is unacceptable. With the increasing ubiquity of XML both as a way to describe metadata and exchange information between programs and applications, and with the rocketing acceptance of XML Web services, it is becoming increasingly necessary for developers to directly access and manipulate XML documents. It should not require that they be rocket scientists to do so.
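For a sense of what more direct access could look like, here is an approximation using Python's ElementTree rather than the native-language syntax the column argues for; the element names follow the article's hypothetical stock example:

```python
import xml.etree.ElementTree as ET

stock = ET.fromstring(
    "<stock><price>90.0</price><revenues>12.0</revenues><costs>3.0</costs></stock>")

# One readable expression, close in spirit to x.price / (x.revenues - x.costs),
# though the string-to-float conversions still leak through.
pe_ratio = float(stock.findtext("price")) / (
    float(stock.findtext("revenues")) - float(stock.findtext("costs")))
print(pe_ratio)
```

It is far terser than the DOM and SAX versions above, but the type conversions show why the column wants the language itself to understand the schema.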

In the next issue I'll discuss how work that's brewing in the developer community to address these matters holds extraordinary promise for developers everywhere.

xmlsuck Posted by PaulT on Saturday March 08 2003, @11:00AM

Code first, then specify. Anticipatory specs for problems people haven't tried to solve yet are just wild, random shots in the dark; at best, they waste everyone's time, and at worst, they cause confusion and hostility. Most existing XML-related specs should not have been written yet: we don't need a spec to cover X until many, many people have been trying to implement X for a while and have discovered where a common spec might be beneficial. A new field of development shouldn't *start* with a spec; it should *end* with one. --David Megginson on the xml-dev mailing list, Sun, 27 Oct 2002


I have been on the XML-DEV mailing list for a long time. In that time, XML 1.0, SAX, DOM, XML Namespaces (urgh), XSLT, XSL-FO, XML Schema, RELAX-NG, SVG, and other specs have been discussed in depth on the list and released by the W3C. It is interesting to note that OASIS specs were not discussed as thoroughly as W3C specs on XML-DEV, yet OASIS took over hosting XML-DEV from Peter Murray-Rust and Henry Rzepa sometime in 1999. Peter and Henry have since noted that OASIS has not been appropriately supportive of the list.

An accurate observation by Don Park. Actually, it is funny what happens on and with all the 'W3C' mailing lists. XML mailing lists are 'bad' compared to almost any mailing list I've seen in my life. XML mailing lists are polluted by W3C politicians, suffer from permathreads, very few people write what they really think, etc. (Actually, most of the interesting and honest stuff about XML is presented off-list.)

On the other hand, XML-DEV (still) has some smart and honest people, who post some good stuff once in a while.

Still, I would not bother 'moving' XML-DEV anywhere. I think it would be better if more ex-XML-DEV participants created their own weblogs. Exactly like Don did himself, BTW.

Xml Sucks

XML does suck, in many ways:


XML describes hierarchically structured text or data, i.e., trees. To handle tree structures you need context-free grammars. All our search procedures are record- or string-oriented and based on regular expressions, which are regular grammars. Regexps do NOT work on trees. This is a fundamental issue and not a problem of botched implementations or overly complex standards. So until someone comes up with a simple way to code CF grammars for searching and manipulating XML trees, XML will be horrible. Still, you cannot shoehorn a tree-based structure into a record-oriented database. So XML is still unavoidable.
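The point about regular grammars is easy to demonstrate: a regexp cannot count nesting depth, so it pairs the wrong tags the moment elements nest. A small Python illustration:

```python
import re
import xml.etree.ElementTree as ET

doc = "<sec><sec>inner</sec>outer tail</sec>"

# The non-greedy regexp stops at the FIRST closing tag, pairing the outer
# open tag with the inner close tag:
m = re.search(r"<sec>(.*?)</sec>", doc)
print(m.group(1))   # the match straddles the nesting boundary

# A real parser tracks depth and recovers the actual tree:
root = ET.fromstring(doc)
inner = root.find("sec")
print(inner.text, "/", inner.tail)
```

The regexp "works" on flat, machine-generated XML of known shape (Tim Bray's dirty secret above), but it falls over as soon as the structure it ignores actually matters.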

Does anyone whine more than LISP/Scheme programmers? (Dont DisagreeByDeleting)

What an inane comment. Seems quite appropriate to me given the sophomoric criticisms of XML on this page. The frat-boy mentality of the approach is amusing: a mid-level XML programmer hates some project because of XML, and so he escalates his ranting into a website. Cool, dude! Maybe they can spin off some books on the name, e.g. an "X Sucks" series of books to compete with the For Dummies/Idiots books.

Well lots of people posted on this page. What about the non-midlevel XML programmer who is frustrated by the technological shortcomings of XML, coupled with marketeering? What about the programmer who is frustrated because some of these design mistakes are decades old and only being repeated in XML? XML could never live up to some of its more ridiculous hype (but then again, what can?), but it could have been much better than it is. If enough of the industry gets behind it (and it looks that way now), the political benefits of a common format (hell, nearly *any* common format) could outweigh the technical issues. That isn't a sure thing, but there is hope. The original "whining" comment was stupid, a weak attempt to wash away a valid criticism by name calling. Drawing connections between sexpr's and XML syntax does not make one a 'whiner', nor a lisp programmer for that matter (anyone with a competent CS background should at least see these connections).

To the original author: please evaluate s-expressions before the knee-jerk reaction. (What does this mean? The anti-sexpr knee-jerk reaction?)

What is too complex?

In summary, XML started out simple, and then caught really nasty FeatureCreep and DesignByCommittee.

It's pretty easy to parse. Not if you actually want to support schemas.

Everything takes training. Some things more than others.

It's not a data model. You build your model on top of it.

Why XML is awful


XML is awful, and I shall prove it to you. First, I feel I need to justify my authority in saying this. I spent more than a year on a project that used XML as the interchange language for its internal and external interfaces. I also later wrote a validating XML parser and XPath expression engine in both Java and C to work with XHTML, word processor documents and component assembly instructions.

I think XML is awful. It is harmful, and it is crap. This essay attempts to organise my reasons why.

It is not easy to write an XML parser

Contrary to popular belief, XML is not an easy language to write a (correct) parser for. It is commonly heralded as being a textual language for which an entire marketplace of processors and tools exist. Its textual nature carries the implication that it is platform-neutral and easy and cheap to work with - like you can do most things in XML with a simple text editor, as separate tools are optional.

At first blush, XML's form does indeed look simple. It looks like HTML, with familiar tags in angled brackets, quoted attribute values, a handful of entities like '&amp;' and everything else is unadulterated 'CDATA' text. Actually, from this position, reading "simple XML" is a breeze, and I think that's how people get addicted in the first place.

But, in fact for real projects, the inclusion of XML is just like having a rose bush: roses have a strong attractive appearance, about which there is much romantic talk. They're planted in the richest of establishments and their presence seems to put those walking by at a pleasant ease. But if your job happens to be gardener, you'll find out pretty fast that they are bristling underneath with spines that spike through into your finger with that cold, unnatural bone pain that doesn't leave you.

The first painful thorn of XML is the document type definition (DTD). You can't skip DTDs: you need to parse them. Some tools put them in; some don't. It's not clear what to do with processing instructions found in the DTD section, either. You also have to deal with entity expansion, and the strange recursive rules of entity expansion in strange contexts, and the subtle whitespace-stripping rules of XML "string-values".
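To make the DTD point concrete: even a parser with no validation ambitions must read the internal subset, because an entity declared there changes the document's string value. A sketch using Python's ElementTree, with an invented document:

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares an entity that the body then references;
# a parser that skipped the DTD would see &co; as an undefined entity.
doc = ('<?xml version="1.0"?>'
       '<!DOCTYPE note [<!ENTITY co "Example Corp">]>'
       '<note>&co; rules</note>')
note = ET.fromstring(doc)
print(note.text)
```

This same machinery is why unbounded entity expansion ("billion laughs") is a classic denial-of-service vector against naive XML parsers.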

As you wrench your hand outwards, you feel XML Namespace splinters snapping off to lie dormant for some later, hideous infection. While namespaces look like a good idea, their implementation side-effects begin to pollute every part of your software. No longer is a tag name just a simple string - it's a QName - and they're everywhere. All comparisons of QNames now have to consider null parts and what that means in the context.
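The QName problem can be seen in miniature in any namespace-aware API: after parsing, the prefix is gone, and every comparison must use the expanded namespace-URI-plus-local name. For example, in Python's ElementTree:

```python
import xml.etree.ElementTree as ET

# Two documents binding DIFFERENT prefixes to the SAME namespace URI:
a = ET.fromstring('<x:root xmlns:x="urn:example:demo"/>')
b = ET.fromstring('<y:root xmlns:y="urn:example:demo"/>')

# The tag is no longer the simple string you see in the source text;
# only the expanded form compares correctly.
print(a.tag)            # the {uri}local expanded form
print(a.tag == b.tag)   # prefixes differ, expanded names match
```

A tag name is no longer just a string you can compare with ==; it is a pair, and that pair turns up in every corner of your code.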

The heady aroma of the bush now long forgotten, you examine the bleeding wounds down your arm from XML Schema. It's not clear what to make of XML Schema. It's a complex type system for what were originally "obvious" string attributes and CDATA. Its type system is peculiar, and is hard to consistently align with the various type systems of the languages you would think of using or need to use. It just appears to be an ill fit, and reeks of over-standardisation.

XPath 2.0 is some kind of new neurotoxin that makes your brain twist when you try to understand the subtleties of manipulated tuple values walking up and along path chain filters - when all the time the other half of your brain tells you it looks like a simple directory pathname!

I'll complete the rose bush analogy by suggesting that as the gardener you'll spend the remainder of your lifetime maintaining it with manure and prayers, hoping it doesn't wilt and die in the poor ill-chosen soil it was planted in. But it will.

You cannot think in XML

The second reason why XML is awful is because it is useless for human comprehension. XML is rarely easy to read, although it has been simultaneously promoted as being machine-readable and human-readable. The machine-readable part is only true if the machine is expensive (i.e. correctly programmed), and the human-readable part is misleading, or hopeful at best.

Sure, you can open an XML document in your editor and see the tag names, and some of them might even make sense by themselves. But it is quite difficult to read large documents for the purposes of reasoning about data structure, even with the luxury of indentation and collapsing viewers. Custom tools can help a lot, but I find that it is just less confusing to display the content without any reference to XML syntax if you can help it.

The Sapir-Whorf hypothesis from linguistics says that language is the workhorse of thought. If you have no word to express a thought, then it is difficult to think it. On the other hand, if you happen to need to think about some concept, it helps considerably if you label it with a word. If you learnt to speak another language fluently, you might be aware of the new world, or subtle way of thinking differently, that was opened up when you finally got to think "inside" that new language. More relevantly to the topic at hand, I note that mathematicians think in symbolic arrangement, and most computer programmers (regardless of first language) think with existing English terms.

XML is verbose, clumsy and unrelenting, and makes direct reasoning in it very uncomfortable. The close-tag abbreviation (</>) was dropped early in the XML standards process, while the SGML/HTML trick of implied/optional opening/closing tags was also ditched - all in favour of machine readability and ability to check for errors.

Syntax-highlighting editors help a little bit in the comprehension of XML, especially if they try to factor out all the namespaces for you, and you can collapse a lot of the stuff you don't want to see. But the strict hierarchical nature of XML documents makes easy viewing of the separate aspects of the document impossible. The uniform "shape" of the data makes structural recognition very hard and increases the difficulty of comprehension no end.

These problems all add up to the unhappy situation where only trivially small XML documents can be imagined in your mind. Practically, people working with large XML documents view the content in some other, more natural syntax that they can work with at the different levels needed. The same thing happens conceptually with programmers who go to pains to extract the document content into more useful structures in their language rather than use an XML DOM.

Document types are not standard

I have yet to find a working repository of standardised XML document types. For example, is there a useful standard for book content? Well, yes, it exists, but it's not really standard: OpenEBook was "embraced and extended" by Microsoft to make their proprietary eBook standard. I also went looking for one about recipes, but no luck.

XML Schema has provided a muddy pit from which competing standards may each draw in order to claim a bit more standard-ness, but the real problem of DTD exchange is the lack of a central authority. Compare this with IANA, which maintains the lists of assigned numbers - TCP port numbers, MIME media types, and IP addresses. It is the central place to look when you are extending the internet. But if you want to build a standard document format on top of XML? Well, the W3C assumes that your industry will take that role, which is silly if your work spans multiple industries - or countries.

XML alternatives

Alternatives to XML exist. I mentioned CSV for tabular data, but for hierarchical structures there is the older ASN.1. ASN.1 has been around for a long time and is used in things like LDAP, security certificates, and some network protocols. It is self-describing, like XML, but it's all binary - which makes it efficient. It is amusing to read the standards submissions for efficiently encoding XML as binary!
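The size difference is easy to demonstrate. The following toy comparison is NOT real ASN.1 encoding (which has its own tag-length-value rules); it is just a plain binary pack, to show why binary formats are compact:

```python
# A toy size comparison: the same two integers as XML text and as raw
# binary. This is not ASN.1's actual encoding, just an illustration.
import struct

xml_form = "<point><x>1</x><y>2</y></point>"
binary_form = struct.pack("!ii", 1, 2)  # two big-endian 32-bit integers

print(len(xml_form), "bytes as XML")      # 31 bytes
print(len(binary_form), "bytes as binary")  # 8 bytes
```

Most of the XML bytes are markup overhead that a binary format simply does not carry.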

SGML, the mother of HTML, is also quite old, and has shown its utility, although it suffers a lot of the XML problems I described above.

Prior to XML, standard interchange formats were developed by industry. For example, mapping data formats included ArcInfo and even DWF. These were well documented and efficient, albeit proprietary.

Today, documents are shipped about in Word and Adobe PDF format which have become de facto standards. PDF incorporates special features to make display possible even with partial file data.

The future of XML

XML has been around for a while. It is certainly mature enough to accept this criticism now, especially as it seems to be going out of vogue, so I can write what I feel without fearing too much flak!

I doubt that XML will disappear the way WAP hopefully will. (WAP is crap.) XML is now so well entrenched in places like the standard Java library and some major web tools that it will linger for many years to come.


Well, I've griped for a while now about how awful XML is. I think the root cause of the problem is that XML has been over-standardised in the wrong areas (semantics representation) and under-standardised in the needed areas (semantics coordination). Because of this, and because a good-looking idea was heavily sold to the IT industry, a lot of people are in pain right now. (On the plus side, it means more chance of employment to fix up all the mess.)

XML Indent

XML Indent is an XML stream reformatter written in ANSI C. It is analogous to GNU indent.
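XML Indent itself is a C program; as a sketch of what such a reformatter does, Python's standard library can pretty-print a compact stream:

```python
# A minimal sketch of an XML reformatter, using Python's stdlib rather
# than the ANSI C implementation described above. The input document is
# an invented example.
from xml.dom import minidom

raw = "<config><db><host>localhost</host></db><cache/></config>"
pretty = minidom.parseString(raw).toprettyxml(indent="  ")
print(pretty)
```

The output is the same document with one element per line and two-space indentation, which is essentially what an indent-style tool provides for XML.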

DocBook XML 4.1.2 Quick Start Guide

DocBook with LyX

DocBook XSL Stylesheets

Get Going With DocBook


Welcome to the DocBook Open Repository

IBM developerWorks: A gentle guide to DocBook - How to use the portable document creator

(Oct 10, 2000, 21:24 UTC) (1742 reads) (0 talkbacks) (Posted by marty)
"This article explains what DocBook is and how to create a simple document using DocBook."  

[Jul 9, 2000] IBM developerWorks: Dare to script tree-based XML with Perl

"Parsing an XML document into tree structures makes it possible to operate on the tree structure of the data. Find out how to use the functions for accessing and manipulating the document tree, and follow a sample stock-trading application that uses Perl, DOM, XML, and a database to evaluate trading rules."

XML Standards Move Forward

(Jul 9, 2000, 13:50 UTC) (103 reads) (0 talkbacks) (Posted by marty)
"Along with providing linking data structures, XLink additionally provides a minimal link behavior model. Therefore, higher level applications layered on XLink will often specify alternate or more sophisticated rendering and processing treatments."

AxKit - XML web publishing with Apache and mod_perl

(May 27, 2000, 23:08 UTC) (125 reads) (0 talkbacks) (Posted by john)
"...the reality is that HTML or XHTML will be served from web servers for a long time to come. This means server-side XML transformation is the most viable option for publishing with XML today."

One of XML's major benefits to web developers is that it is a standard way to separate data from presentation and create a consistent templating system for a web site. But that promise has yet to be fully realized by many, due to the immature state of XML tool support, especially in authoring.

An important part of using XML for web publishing is content delivery. Although XML-to-HTML conversion is partially possible in browsers such as Internet Explorer, the reality is that HTML or XHTML will be served from web servers for a long time to come. This means server-side XML transformation is the most viable option for publishing with XML today.

Server-side transformations can be handled at various levels. The most basic of these is static transformation (e.g., using an XSLT processor and some shell scripts), but this method can quickly become awkward, and it is not satisfactory for dynamic web sites.
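The static approach can be as simple as a script that reads each XML source and writes an HTML page. A toy version in Python follows; real setups would run an XSLT processor such as xsltproc over the files instead, and the element names here are invented for illustration:

```python
# A toy static transformation: turn a <page> document into an HTML
# string. Real sites would apply an XSLT style sheet; this hand-written
# function just shows the shape of the step.
import xml.etree.ElementTree as ET
from html import escape

def page_to_html(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    title = escape(root.findtext("title", ""))
    paras = "".join(
        f"<p>{escape(p.text or '')}</p>" for p in root.iter("para")
    )
    return f"<html><head><title>{title}</title></head><body>{paras}</body></html>"

print(page_to_html("<page><title>Hi</title><para>Hello, world.</para></page>"))
```

Run once per source file at build time, this is exactly the "static transformation" level: fine for a handful of pages, awkward as soon as output must vary per request.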

Another option is application server environments such as Zope or Enhydra. If you have a real need to use these products, they are a good choice. But keep in mind that they have a tendency to operate within their own enclosed universe.

A third choice is to use an XML content delivery infrastructure such as that provided by the Apache Cocoon project. Cocoon is a Java-based environment for pipelined transformation of XML resulting in web pages served to the user. It also offers more advanced features for active server pages etc.

AxKit, a mod_perl and Apache-based XML content delivery solution, takes an approach similar to Cocoon. It provides simple ways for web developers to deliver XML utilizing multiple processing stages and style sheets, all programmable through Perl. AxKit takes care of caching so that the developer doesn't have to worry about it. It's also tightly bound to the Apache web server, providing a good route forward for those with an existing investment in mod_perl and Apache.

The fundamental way in which XML is delivered to a client in AxKit is through transformation with one or more style sheets. AxKit does not see style sheets solely in terms of XSLT transformations, but as more generic processing stages allowing arbitrary languages and operations.
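Treating style sheets as generic processing stages amounts to a pipeline. A language-neutral sketch in Python follows (AxKit itself is written in Perl, and the stage functions here are invented stand-ins for real style-sheet processors):

```python
# A sketch of pipelined content delivery: a document flows through an
# ordered list of stages, each taking text and returning text. The
# stages below are invented stand-ins for real processing steps.
def expand_includes(doc: str) -> str:
    return doc.replace("<include/>", "<p>boilerplate</p>")

def to_html(doc: str) -> str:
    return doc.replace("<doc>", "<html>").replace("</doc>", "</html>")

def run_pipeline(doc: str, stages) -> str:
    for stage in stages:  # each stage plays the role of one style sheet
        doc = stage(doc)
    return doc

result = run_pipeline("<doc><include/></doc>", [expand_includes, to_html])
print(result)  # → <html><p>boilerplate</p></html>
```

Because each stage has the same text-in, text-out shape, stages can be reordered, cached individually, or swapped for an XSLT transformation without the rest of the pipeline noticing.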

In this article, I will describe AxKit's architecture, and give details of its installation and future development. Some familiarity with transforming XML would be helpful in reading this article.

ZDNet Sm@rt Reseller - Exposing XML Myths

Recommended Links

Softpanorama Recommended

Top articles


DocBook Resources -- O'Reilly & Associates

XML.ORG - The XML Industry Portal, hosted by OASIS


TCI XML Tutorials and References


XHTML 1.0 The Extensible HyperText Markup Language

Character Encodings in XML and Perl

XML support in IE5

XML Tutorial List of Lessons -- microsoft tutorial (10 lessons)

The Express Way to the Internet

WebTools for HTML, XML, & CSS - Manipulate And Display

Welcome to XHTML School

XML and Javascript menu

This is Constantin Kuznetsov's Javascript Topbar Static Menu. Here an XML file is used to create the interface file for the menu script.


This excellent cross-browser, two-level, Javascript top-bar menu is the one I use on this site's home page. It has advanced features (although most of them work only in IE), and is easy to use and maintain.


Dynamically changing an XML stylesheet using javascript

TechRepublic: How XML will resolve the COM versus CORBA debate and end world hunger

(Mar 18, 2000, 15:33 UTC) (Posted by john) (1 talkback posted) (1038 reads)
"...[legacy] system interfaces are hard coded and require intimate knowledge of both systems in order to make them work together properly. But by using XML to define the interfaces and any custom object types, we can make the process more universal and accessible."


SECURITY: When XML gets ugly

(Mar 4, 2000, 01:27 UTC) (Posted by dwj) (0 talkbacks posted) (2003 reads)
"The XML Web dreams of a world where machines can read information readily from the web.... Sadly, there are some real problems with this dream..."


IBM developerWorks: Tutorial: XML and scripting languages

(Mar 4, 2000, 01:17 UTC) (Posted by dwj) (0 talkbacks posted) (1227 reads)
"Manipulating XML documents with Perl and other scripting languages."


IBM developerWorks: The Tcl/SMAPI Project

(Mar 4, 2000, 01:09 UTC) (Posted by dwj) (0 talkbacks posted) (390 reads)
"Developers may also use Tcl/SMAPI to quickly prototype graphical, speech aware applications using Tcl/Tk."

SGMLtools project

Draft sketch of an SGML Editor

On the Road to XML Remaking the LDP as a Digital Library

The SGMLXML Web Page - Home Page

Recommended Papers


Editors - XML Editors  -- Free or trial versions of XML Authoring Software.


Arbortext Products and Services

Arbortext's XML-based E-Content Software Selected By Leading International Journal And Book Publisher

John Wiley & Sons, Inc. chooses Arbortext's Epic e-content software for speedy preparation of high value content for the Web and print

ANN ARBOR, Mich., July 11, 2000 - Arbortext, Inc., a leading provider of Extensible Markup Language (XML)-based e-content software for e-publishing, e-commerce, and B2B e-marketplaces, today announced that its Epic e-content management software has been selected by global publisher John Wiley & Sons, Inc., (NYSE:JWa) (NYSE:JWb). New York-based Wiley currently develops, publishes and produces products in print and electronic format for the scientific, technical and medical communities as well as the educational, professional and consumer markets worldwide. Wiley will use Arbortext's Epic software to accelerate Internet and print publishing of their journal articles.


Athame XML Editor

An experimental Java-based GUI editor that comes pre-bundled with DocBook support, Swing, and lots of other packages which were probably better left as options. I don't know about the Java 1.2 edition, or whether the included packages conflicted with my sgml-tools, but all this thing did for me was sit there and/or crash the kvt terminal window. To be fair, it is way too early to expect anything: Sean hopes to develop this into a standard cross-platform free DocBook editor for general use by the open source documentation community. Worth checking back later.

 Framemaker for Linux

The great grand-daddy of them all, Framemaker, is now running with the penguins. At 40Mb to download the core system and help files, this is not for the faint-hearted, but that's only about a third the size of Corel WordPerfect, and if it's anything like its namesake, this is a full-featured, professional-quality publishing workstation capable of taking just about anything you can throw at it. The real question is whether that includes DocBook...

The unfortunate answer is "apparently not", or at least, not easily. As with AbiWord, you can save your file as XML with an automatically generated CSS file, but it is not configurable to your own DTD.

There have, however, been posts on the DSSSL mailing list suggesting it is possible to translate Framemaker MIF files to and from other DTDs such as DocBook, and there is a collection of Framemaker tools on Norm's website, so we shouldn't give up on FM just yet; as Yogi Berra said, "It ain't over 'til it's over."

 The XML Handbook
Goldfarb, Prescod 1999

Co-author Paul Prescod also maintains a companion website.

 XML Apache

The goals of the Apache XML Project are:

  • to provide commercial-quality standards-based XML solutions that are developed in an open and cooperative fashion,
  • to provide feedback to standards bodies (such as IETF and W3C) from an implementation perspective, and
  • to be a focus for XML-related activities within Apache projects.

The Apache XML Project currently consists of four sub-projects, each focused on a different aspect of XML:

  • Xerces - XML parsers in Java, C++, and Perl
  • Xalan - XSLT stylesheet processors, in Java and C++
  • Cocoon - XML-based web publishing, in Java
  • FOP - XSL formatting objects, in Java


 OpenJade Homepage
OpenJade is a GPL SGML/XML processing kit for converting documents and DTDs to TeX, RTF, PDF and HTML. The site also includes many tutorials and news items about style languages and document-processing issues.
A Linux-oriented guide to the sgml-tools as they are used to produce Linux documents in the Debian project.