spudtater | XML Parsing with Java the Quick And Dirty™ way

If you've ever tried to parse an XML file into a set of Java objects, you'll know that you have to jump through hoops to do so. Creating a "SAXParser"¹ is easy enough, but in order to actually do anything with the parsed document, you have to create a so-called XML Handler (a subclass of "DefaultHandler") to help you.

Now the first time you write a parser, this is fine. You define an XML Handler, give it a stack, write a bunch of "if-then-else" statements to handle the various element names etc., and set it off to generate your objects.
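To make that concrete, a first hand-written handler could look roughly like the sketch below. It assumes a document with reviewer, cheese and about elements, like the sample file later in this post. The class name HandWrittenCheeseParser, the POJO fields and the parse helper are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch: every name here is hypothetical, chosen to fit the sample file.
class HandWrittenCheeseParser extends DefaultHandler {
    static class Cheese {
        String name;
        double pungency;
    }
    static class Reviewer {
        String name;
        String about;
        final List<Cheese> cheeses = new ArrayList<>();
    }

    final List<Reviewer> reviewers = new ArrayList<>();
    private final Deque<Object> stack = new ArrayDeque<>();   // objects still being built
    private final StringBuilder text = new StringBuilder();   // text of the current element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if (qName.equals("reviewer")) {
            Reviewer reviewer = new Reviewer();
            reviewer.name = atts.getValue("name");
            stack.push(reviewer);
        } else if (qName.equals("cheese")) {
            Cheese cheese = new Cheese();
            cheese.name = atts.getValue("name");
            cheese.pungency = Double.parseDouble(atts.getValue("pungency"));
            stack.push(cheese);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("cheese")) {
            Cheese cheese = (Cheese) stack.pop();
            ((Reviewer) stack.peek()).cheeses.add(cheese);
        } else if (qName.equals("about")) {
            ((Reviewer) stack.peek()).about = text.toString().trim();
        } else if (qName.equals("reviewer")) {
            reviewers.add((Reviewer) stack.pop());
        }
    }

    // Creating the SAXParser is the easy part; the handler above does the real work.
    static List<Reviewer> parse(String xml) throws Exception {
        HandWrittenCheeseParser handler = new HandWrittenCheeseParser();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.reviewers;
    }
}
```

Every element name appears twice, once in startElement and once in endElement, and that is exactly the part you end up copying and editing for the next file format.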

The second time, it's a bit easier, since you've done it before. You take your previously built parser, substitute different element and object names into your if statements, tweak it arbitrarily, and set it going again.

The third time, you start to wonder to yourself: why am I reusing all this code? Surely there are some basic principles that I could follow to write my handler once, and use it multiple times?

This project is an attempt at generalising an XML handler. Let's start with the XML file below:

<?xml version="1.0"?>
<cheese-reviews>
	<reviewer name="Alice">
		<cheese name="Cheddar" pungency="0.5"/>
		<cheese name="Epoisses" pungency="5.0"/>
		<cheese name="Stilton" pungency="9.5"/>
		<cheese name="Brie" pungency="1.0"/>
		<about>When not tasting edible cheeses, Alice likes to listen to 80's pop music and watch Disney movies</about>
	</reviewer>
	<reviewer name="Bob">
		<cheese name="Cheddar" pungency="1.0"/>
		<cheese name="Edam" pungency="0.5"/>
		<cheese name="Gruyere" pungency="3.5"/>
		<about>With over 20 years in the dairy industry, Bob certainly knows his way around a cow!</about>
	</reviewer>
</cheese-reviews>

What actual sequence of Java constructor and method calls does that suggest to you? To me, it suggests:

ArrayList<Reviewer> cheeseReviews = new ArrayList<Reviewer>();
Reviewer alice = new Reviewer();
alice.setName("Alice");

Cheese cheddar = new Cheese();
cheddar.setName("Cheddar");
cheddar.setPungency(0.5);
alice.add(cheddar);

Cheese epoisses = new Cheese();
epoisses.setName("Epoisses");
epoisses.setPungency(5.0);
alice.add(epoisses);

<elided for sanity>

alice.addAbout("When not tasting edible cheeses, Alice likes to listen to 80's pop music and watch Disney movies");
cheeseReviews.add(alice);

Reviewer bob = new Reviewer();
bob.setName("Bob");

<elided for sanity>

cheeseReviews.add(bob);
return cheeseReviews;

And thus, QuickAndDirtyXMLHandler was born. After implementing Cheese and Reviewer as simple POJOs (Plain Old Java Objects) and adding the necessary methods above, the cheesy XML file above was parsed very nicely indeed.
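For readers who want a feel for the idea without downloading anything, here is one way a generalised handler could be put together with reflection. This is a guess at the technique, not the released QuickAndDirtyXMLHandler source. The class name ReflectiveHandlerSketch, the elementClasses map and the parseCheeseReviews helper are assumptions. Attributes are mapped to String setters, child objects to add(...), and text-only elements to add<ElementName>(String), following the call sequence listed above.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
// Illustrative sketch only; not the released QuickAndDirtyXMLHandler source.
class ReflectiveHandlerSketch extends DefaultHandler {
    // Hypothetical POJOs with String setters, named after the calls in the post.
    public static class Cheese {
        public String name, pungency;
        public void setName(String name) { this.name = name; }
        public void setPungency(String pungency) { this.pungency = pungency; }
    }
    public static class Reviewer {
        public String name, about;
        public final List<Cheese> cheeses = new ArrayList<>();
        public void setName(String name) { this.name = name; }
        public void add(Cheese cheese) { cheeses.add(cheese); }
        public void addAbout(String about) { this.about = about; }
    }

    final Map<String, Class<?>> elementClasses = new HashMap<>(); // element name -> class
    private final Deque<Object> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    Object root;

    private static String capitalise(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        text.setLength(0);
        Class<?> type = elementClasses.get(qName);
        if (type == null) return; // text-only element such as <about>; see endElement
        try {
            Object obj = type.getDeclaredConstructor().newInstance();
            for (int i = 0; i < atts.getLength(); i++) {
                // name="Alice" becomes obj.setName("Alice")
                type.getMethod("set" + capitalise(atts.getQName(i)), String.class)
                        .invoke(obj, atts.getValue(i));
            }
            stack.push(obj);
        } catch (ReflectiveOperationException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) { text.append(ch, start, length); }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        try {
            if (!elementClasses.containsKey(qName)) {
                // <about>text</about> becomes owner.addAbout("text")
                Object owner = stack.peek();
                owner.getClass().getMethod("add" + capitalise(qName), String.class)
                        .invoke(owner, text.toString().trim());
                return;
            }
            Object child = stack.pop();
            if (stack.isEmpty()) { root = child; return; }
            Object parent = stack.peek();
            for (Method m : parent.getClass().getMethods()) {
                // hand the finished child to the first compatible parent.add(...)
                if (m.getName().equals("add") && m.getParameterCount() == 1
                        && m.getParameterTypes()[0].isAssignableFrom(child.getClass())) {
                    m.invoke(parent, child);
                    return;
                }
            }
            throw new SAXException("No add(...) on " + parent.getClass() + " for " + qName);
        } catch (ReflectiveOperationException e) {
            throw new SAXException(e);
        }
    }

    static List<?> parseCheeseReviews(String xml) throws Exception {
        ReflectiveHandlerSketch handler = new ReflectiveHandlerSketch();
        handler.elementClasses.put("cheese-reviews", ArrayList.class);
        handler.elementClasses.put("reviewer", Reviewer.class);
        handler.elementClasses.put("cheese", Cheese.class);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return (List<?>) handler.root;
    }
}
```

The only document-specific part left is the three-line element-to-class table. The sketch is naive about names: an attribute or text element containing a hyphen would not map onto a valid method name, and a text-only element outside any registered parent would fail.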

If you're interested, I'm releasing the code under the LGPL: here's the binary and here's the source (including cheeses).

There's still plenty I want to do with it. For example, so far it only looks for String setters, rather than int, float, etc., which isn't ideal. But it's a good start for now.
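One possible way to lift the String-only restriction is to look at the setter's parameter type and convert the attribute text before invoking it. The helper below is a sketch of that idea, not part of the released code. TypedSetterSketch, convert and callSetter are hypothetical names, and only a handful of primitive types are handled.

```java
import java.lang.reflect.Method;

// Illustrative sketch: names are hypothetical and only a few target types are supported.
class TypedSetterSketch {
    // Hypothetical POJO with a typed setter, as in cheddar.setPungency(0.5).
    public static class Cheese {
        public String name;
        public double pungency;
        public void setName(String name) { this.name = name; }
        public void setPungency(double pungency) { this.pungency = pungency; }
    }

    // Converts attribute text to the setter's parameter type, or returns null if unsupported.
    static Object convert(String raw, Class<?> target) {
        if (target == String.class) return raw;
        if (target == int.class || target == Integer.class) return Integer.valueOf(raw);
        if (target == long.class || target == Long.class) return Long.valueOf(raw);
        if (target == float.class || target == Float.class) return Float.valueOf(raw);
        if (target == double.class || target == Double.class) return Double.valueOf(raw);
        if (target == boolean.class || target == Boolean.class) return Boolean.valueOf(raw);
        return null;
    }

    // Finds a one-argument set<Attribute> method of a supported type and calls it.
    static void callSetter(Object obj, String attribute, String raw)
            throws ReflectiveOperationException {
        String setter = "set" + Character.toUpperCase(attribute.charAt(0)) + attribute.substring(1);
        for (Method m : obj.getClass().getMethods()) {
            if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                Object value = convert(raw, m.getParameterTypes()[0]);
                if (value != null) {
                    m.invoke(obj, value); // boxed values are unboxed for primitive parameters
                    return;
                }
            }
        }
        throw new NoSuchMethodException(setter + " on " + obj.getClass().getName());
    }
}
```

With this, callSetter(cheese, "pungency", "9.5") would end up calling setPungency(9.5). Malformed numbers surface as a NumberFormatException, which a real handler would probably want to wrap with the element and attribute name.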

¹ I won't cover DOM parsing here, because I'm not very familiar with it.


Dom parsers are easier than stream parsers

Date: 2009-02-01 07:50 pm (UTC)

fizzyboot.livejournal.com

They can be. If you've got a multi-gigabyte input file, dom parsers may not be particularly practical. (Although I suppose you could store the file on disk and have several disk based indexes to it -- hey I've just invented a cross between SQL and XML).

BTW, how far have you got with that proposal/spec/quote you were going to write up for me?

Date: 2009-02-01 09:27 pm (UTC)

re spec: I wrote a sax xml parser that returns dom elements.

So you do:

def callback(dom_object):
    # hello I have a dom object
    pass

a = XPath('/thing/blah', callback)

parse(a)

This seems like a simpler way to parse large xml objects, and it is reusable.

Date: 2009-02-01 09:31 pm (UTC)

P.s. xpath is a query language for xml. It is awesome.

Date: 2009-02-01 09:33 pm (UTC)

p.p.s on the whole pulldom looks nicer though.

Date: 2009-02-01 09:09 pm (UTC)

spudtater.livejournal.com

But is there any reason they should be?

Yeah, with dom parsers you don't have to build objects on the fly, and you have random access to information.

(On the whole xpath is much nicer than any api I have used for xml).
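For comparison with the SAX approach in the post, the DOM plus XPath style described here can be sketched in Java using the JDK's built-in javax.xml.xpath package. The class and method names (DomXPathSketch, cheeseNames) are made up for illustration, and the element names assume the cheese-reviews file from the post.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch: loads the whole document into memory, then queries it with XPath.
class DomXPathSketch {
    // Returns the name attribute of every element selected by the XPath expression.
    static List<String> cheeseNames(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("name"));
        }
        return names;
    }
}
```

A query such as /cheese-reviews/reviewer[@name='Alice']/cheese[@pungency > 4] picks out Alice's stronger cheeses without any handler code, at the cost of holding the entire tree in memory.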

Date: 2009-02-01 09:42 pm (UTC)

I suppose I'm starting from the assumption that you want these objects to all be created. But if you're using XML as a database, then yes, I see your point.
