XML Processing in Java: JAXP SAX Example

A Java XML tutorial with an example: how to parse XML in Java using the JAXP SAX API.

1 What is SAX

SAX stands for Simple API for XML. SAX reads an XML document as a stream, from top to bottom, instead of loading the entire document into memory at once.

SAX is event-driven and uses the event-handler pattern.

[Figure: architecture of the SAX parser]

When the SAX parser recognizes an XML node in the input stream, it triggers an event that notifies the SAX reader. The SAX reader then calls the corresponding callback method of its handler to process the node's data.

[Figure: SAX events and handler]

So the developer's main task is to provide a handler and implement its callbacks to process the XML data.
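
To make the event flow concrete, here is a minimal sketch that records the order in which the callbacks fire for a small in-memory document. The class name SaxEventTrace and its trace() method are hypothetical and exist only for this illustration; the input is read from a string instead of a file to keep the sketch self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the example project below.
class SaxEventTrace {

    // Parses the given XML text and returns the callbacks in the order they fired.
    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<String>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("startElement " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped only to keep the sketch short; real code should handle these.
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : trace("<books><book><name>How To Be A Cat</name></book></books>")) {
            System.out.println(event);
        }
    }
}
```

The events arrive in document order, and each endElement event mirrors an earlier startElement event, which is why a handler usually has to keep track of where it is in the document.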

2 Steps

The JAXP (Java API for XML Processing) SAX APIs are in the packages javax.xml.parsers and org.xml.sax (with helper classes such as DefaultHandler in org.xml.sax.helpers).

The main steps for parsing an XML document with SAX are:

  1. Get an instance of SAXParserFactory.

    SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
  2. Obtain a SAXParser from the SAX parser factory.

    SAXParser saxParser = saxParserFactory.newSAXParser();
  3. Create a handler.

    SAX defines four handler interfaces for different purposes: ContentHandler for processing XML content, ErrorHandler for error handling, DTDHandler for DTD-related events and EntityResolver for resolving external entities.

    There are two ways to create your own handler. First, your handler class can implement one or more of these interfaces directly. However, you then have to implement every callback method they declare, and some of the interfaces declare many methods.

    class YourHandler implements ContentHandler, ErrorHandler {
        // Implement all methods here
    }

    Second, SAX provides the DefaultHandler class, which already implements all of these interfaces with empty implementations of their callback methods. For convenience, your own handler can extend DefaultHandler and override only the callbacks you need.

    class YourHandler extends DefaultHandler {
        // Only override what you need
    
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {}
    
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {}
    
        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {}
    }
  4. Set the handler instance on the SAX reader or the SAX parser, and start parsing the XML document.

    XMLReader xmlReader = saxParser.getXMLReader();
    xmlReader.setContentHandler(new YourHandler());
    xmlReader.parse("src/Books.xml");

    Alternatively, pass your handler directly to the SAX parser, which sets the handler and starts parsing in a single call.

    saxParser.parse("src/Books.xml", new YourHandler());
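
Step 3 mentions ErrorHandler without showing it in use. As a minimal sketch, the handler below extends DefaultHandler and overrides the three ErrorHandler callbacks (warning(), error() and fatalError()) to record what the parser reports. The names SaxErrorDemo, RecordingHandler and problemsIn() are hypothetical, and the document is read from a string to keep the sketch self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the example project below.
class SaxErrorDemo {

    // Records every problem reported through the ErrorHandler callbacks.
    static class RecordingHandler extends DefaultHandler {
        final List<String> problems = new ArrayList<String>();

        @Override
        public void warning(SAXParseException e) {
            problems.add("warning at line " + e.getLineNumber());
        }

        @Override
        public void error(SAXParseException e) {
            problems.add("error at line " + e.getLineNumber());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            problems.add("fatal error at line " + e.getLineNumber());
            // The document is not well-formed, so stop parsing.
            throw e;
        }
    }

    // Parses the XML text and returns the recorded problems (empty if there were none).
    static List<String> problemsIn(String xml) {
        RecordingHandler handler = new RecordingHandler();
        try {
            SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
            // Passing a DefaultHandler to parse() also registers it as the error handler.
            saxParser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Already recorded in fatalError(); nothing more to do in this sketch.
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        // The <book> element is never closed, so the parser reports a fatal error.
        System.out.println(problemsIn("<books><book></books>"));
    }
}
```

If you use the XMLReader variant from step 4 instead, the error handler has to be registered separately with xmlReader.setErrorHandler(), because setContentHandler() only registers the content callbacks.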

3 Example

3.1 Goal

  1. Read data about books from an XML file;
  2. Encapsulate each book item in a JavaBean object.

The file and folder structure of this example project:

[Figure: file and folder structure of the example project]

3.2 Sample data

An XML file about books: src/Books.xml.

<?xml version="1.0" encoding="UTF-8"?>
<books>
    <book ISBN="ISBN-001">
        <name>How To Be A Cat</name>
        <author>Tom</author>
        <price>$29</price>
    </book>
    <book ISBN="ISBN-002">
        <name>How To Be A Mouse</name>
        <author>Jerry</author>
        <price>$25</price>
    </book>
</books>

3.3 Sample code

The JavaBean class for a book, in BookBean.java:

package saxdemo;

public class BookBean {
    private String name;
    private String author;
    private String price;
    private String ISBN;
    
    public String getName() {
        return name;
    }
    
    public void setName(String name) {
        this.name = name;
    }
    
    public String getAuthor() {
        return author;
    }
    
    public void setAuthor(String author) {
        this.author = author;
    }
    
    public String getPrice() {
        return price;
    }
    
    public void setPrice(String price) {
        this.price = price;
    }
    
    public String getISBN() {
        return ISBN;
    }
    
    public void setISBN(String iSBN) {
        ISBN = iSBN;
    }
    
    @Override
    public String toString() {
        return "[" + this.name + "]" + " by " + this.author + " at " + this.price;
    }
}

Create a BookHandler class that extends org.xml.sax.helpers.DefaultHandler and overrides the startElement(), characters() and endElement() methods.

package saxdemo;

import java.util.ArrayList;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BookHandler extends DefaultHandler {

    // Text of the element currently being read. The parser may deliver the
    // text of one element in several characters() calls, so it is buffered here.
    private final StringBuilder mText = new StringBuilder();
    private BookBean mBook;
    
    private ArrayList<BookBean> mBookList = new ArrayList<BookBean>();
    
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        
        // Start collecting the text of the new element
        this.mText.setLength(0);
        
        // If current tag is a new book element item, create a new BookBean object
        if ("book".equals(qName)) {
            this.mBook = new BookBean();
            this.mBook.setISBN(attributes.getValue("ISBN"));
        }
    }
    
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
    
        // Only buffer here; the complete text is known when the element ends
        this.mText.append(ch, start, length);
    }
    
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
    
        String text = this.mText.toString().trim();
        
        // Copy the completed text into the matching property of the current book
        if (this.mBook != null) {
            if ("name".equals(qName)) {
                this.mBook.setName(text);
            } else if ("author".equals(qName)) {
                this.mBook.setAuthor(text);
            } else if ("price".equals(qName)) {
                this.mBook.setPrice(text);
            }
        }
        
        // If parsing of a book item is finished, add it to the list and reset mBook
        if ("book".equals(qName)) {
            this.mBookList.add(this.mBook);
            this.mBook = null;
        }
        
        // Discard the buffered text so it does not leak into the next element
        this.mText.setLength(0);
    }
    
    public ArrayList<BookBean> getBookList() {
        return this.mBookList;
    }
}

Parse Books.xml and print the resulting book list in the main method of SAXDemo.java.

package saxdemo;

import java.util.ArrayList;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.XMLReader;

public class SAXDemo {
    public static void main(String[] args) {
        ArrayList<BookBean> bookList = null;
        BookHandler bookHandler = new BookHandler();
        
        SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
        SAXParser saxParser;
        
        try {
            saxParser = saxParserFactory.newSAXParser();
            
            XMLReader xmlReader = saxParser.getXMLReader();
            xmlReader.setContentHandler(bookHandler);
            xmlReader.parse("src/Books.xml");
            
            /* or */
            // saxParser.parse("src/Books.xml", bookHandler);
        } catch (Exception e) {
            e.printStackTrace();
        }
        
        bookList = bookHandler.getBookList();
        
        if (bookList != null) {
            for (BookBean book : bookList) {
                System.out.println(book);
            }
        }
    }
}

4 Summary

Pros:

SAX is faster and uses less memory than DOM, because it does not build a tree of the entire XML document in memory.
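
As an illustration of the memory point, the sketch below counts elements while the document streams past and keeps nothing but a single counter, regardless of how large the input is. The class name SaxElementCounter and its count() method are hypothetical, and the input is a string only to keep the sketch self-contained; for a large file you would pass a file or an input stream to parse() instead.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the example project above.
class SaxElementCounter {

    // Counts the elements with the given tag name without storing any of them.
    static int count(String xml, final String elementName) {
        final int[] count = {0};

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (elementName.equals(qName)) {
                    count[0]++;
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped only to keep the sketch short; real code should handle these.
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<books><book ISBN=\"ISBN-001\"/><book ISBN=\"ISBN-002\"/></books>";
        System.out.println(count(xml, "book"));
    }
}
```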

Cons:

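SAX reads the document in a single forward pass and does not keep it in memory, so it is inconvenient for modifying an XML document or for revisiting nodes that have already been read.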
