XML Processing in Java: JAXP DOM Example

Java XML tutorial and example: how to parse XML in Java using the JAXP DOM API.

1 What is DOM

DOM (Document Object Model) is an in-memory object representation of an XML document.

In DOM, the XML document is represented as a tree-structured Document object.

[Figure: Document Object Model (DOM)]

According to DOM, everything in an XML document is a node, but nodes come in different types:

  • the entire XML document is a Document DOM node;
  • every XML element is an Element DOM node;
  • every attribute of an XML element is an Attr (attribute) DOM node;
  • XML comments are Comment DOM nodes;
  • the text inside XML elements is held in Text DOM nodes.

Node is defined as a Java interface in the JAXP (Java API for XML Processing) API. Document, Element, Attr, Comment and Text are interfaces that inherit from Node.

[Figure: JAXP DOM node interfaces inheritance hierarchy]

Different DOM node types are illustrated with a real XML document as follows.

[Figure: different node types]
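To make the node types concrete, here is a minimal sketch that walks a tiny XML document and records one label per node. The class name NodeKindsDemo and its parse and collect helpers are made up for this illustration. Note that attributes are not child nodes, so they are read through getAttributes().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Attr;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class NodeKindsDemo {

    // Hypothetical helper: parse an XML string into a DOM Document
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Visit the node and its descendants depth first, adding one label per node
    static void collect(Node node, List<String> out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.add("Document");
                break;
            case Node.ELEMENT_NODE:
                out.add("Element:" + node.getNodeName());
                // Attributes hang off the element, they are not child nodes
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Attr attr = (Attr) attrs.item(i);
                    out.add("Attr:" + attr.getName() + "=" + attr.getValue());
                }
                break;
            case Node.COMMENT_NODE:
                out.add("Comment:" + ((Comment) node).getData());
                break;
            case Node.TEXT_NODE:
                out.add("Text:" + ((Text) node).getData());
                break;
            default:
                out.add("Other:" + node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<student id=\"001\"><!--Tom is a cat--><name>Tom</name></student>";
        List<String> labels = new ArrayList<>();
        collect(parse(xml), labels);
        labels.forEach(System.out::println);
    }
}
```

The sketch checks getNodeType() rather than using instanceof, because a DOM implementation is free to implement several node interfaces in one class.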

2 Load XML file

The JAXP DOM APIs are spread across the javax.xml.* packages (for example javax.xml.parsers) and the org.w3c.dom package.

JAXP provides DocumentBuilder to load an XML document as a DOM Document object.

[Figure: JAXP DocumentBuilder]

For example, take a sample XML file named students.xml:

<?xml version="1.0" encoding="UTF-8"?>
<students>
    <student id="001">
        <![CDATA[
        CDATA section may use reserved characters like < > & "
        ]]>
        <name>Tom</name>
        <gender>male</gender>
        <!-- Tom is a cat -->
    </student>
    <student id="002">
        <name>Jerry</name>
        <gender>male</gender>
        <!-- Jerry is a mouse -->
    </student>
</students>

Load the sample XML file using DocumentBuilder.

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// ...

DocumentBuilderFactory df;
DocumentBuilder builder;
Document document;

try {
    // Obtain DocumentBuilder factory
    df = DocumentBuilderFactory.newInstance();
    
    // Get DocumentBuilder instance from factory
    builder = df.newDocumentBuilder();
    
    // Document object instance now is the in-memory representation of the XML file
    document = builder.parse("src/students.xml");
} catch (Exception e) {
    e.printStackTrace();
}
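DocumentBuilder.parse has several overloads besides the String one used above, which treats its argument as a URI. The following sketch, with a made-up class name LoadXmlDemo and made-up helper names, parses the same XML from a String (through an InputSource) and from a File.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LoadXmlDemo {

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Parse XML that is already held in a String
    static Document fromString(String xml) throws Exception {
        return newBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Parse XML stored in a file on disk
    static Document fromFile(File file) throws Exception {
        return newBuilder().parse(file);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<students><student id=\"001\"/></students>";
        System.out.println(fromString(xml).getDocumentElement().getNodeName());

        // Write the same XML to a temporary file and load it again
        File tmp = File.createTempFile("students", ".xml");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(fromFile(tmp).getDocumentElement().getNodeName());
    }
}
```

Both calls return the same kind of Document object; getDocumentElement() gives the root element, here students.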

Note: DOM loads the entire XML document into memory and builds the whole document tree at once. If the XML file is very large, the application may run out of memory and fail with an OutOfMemoryError.

3 More about Node interface

The Node interface defines the common operations on a DOM node.

[Figure: overview of DOM node CRUD interfaces]

3.1 Query

For instance, retrieve all student elements in the given sample XML file.

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// ...

DocumentBuilderFactory df;
DocumentBuilder builder;
Document document;

try {
    df = DocumentBuilderFactory.newInstance();
    builder = df.newDocumentBuilder();
    document = builder.parse("src/students.xml");
    
    // Query by tag name
    NodeList studentNodesList = document.getElementsByTagName("student");
    
    for (int i = 0; i < studentNodesList.getLength(); i++) {
        Element studentItem = (Element) studentNodesList.item(i);
        System.out.println(studentItem.getAttribute("id"));
    }
} catch (Exception e) {
    e.printStackTrace();
}
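getElementsByTagName is also available on Element, where it searches only that element's descendants. As a rough sketch (the class QueryDemo and the method studentSummaries are invented for this example), the following pairs each student's id attribute with the text of its name child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class QueryDemo {

    // Hypothetical helper: parse an XML string into a DOM Document
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Build one "id:name" entry per student element
    static List<String> studentSummaries(Document document) {
        List<String> result = new ArrayList<>();
        NodeList students = document.getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++) {
            Element student = (Element) students.item(i);
            // Search only inside this student element
            NodeList names = student.getElementsByTagName("name");
            String name = names.getLength() > 0 ? names.item(0).getTextContent() : "";
            result.add(student.getAttribute("id") + ":" + name);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<students>"
                + "<student id=\"001\"><name>Tom</name></student>"
                + "<student id=\"002\"><name>Jerry</name></student>"
                + "</students>";
        studentSummaries(parse(xml)).forEach(System.out::println);
    }
}
```

getTextContent() returns the concatenated text of the element, which avoids reaching for getFirstChild() when the element might be empty.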

3.2 Node types

As mentioned earlier, several node types inherit from the Node interface, such as Document, Element, Text and Comment. The DOM API lets you check a node's type conveniently with getNodeType().

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.CharacterData;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// ...

DocumentBuilderFactory df;
DocumentBuilder builder;
Document document;

try {
    df = DocumentBuilderFactory.newInstance();
    builder = df.newDocumentBuilder();
    document = builder.parse("src/students.xml");
    
    // Query by tag name
    NodeList studentNodesList = document.getElementsByTagName("student");
    
    for (int i = 0; i < studentNodesList.getLength(); i++) {
        Element studentItem = (Element) studentNodesList.item(i);
        System.out.println(studentItem.getAttribute("id"));

        NodeList studentItemChildNodes = studentItem.getChildNodes();
        for (int j = 0; j < studentItemChildNodes.getLength(); j++) {
            Node childNode = studentItemChildNodes.item(j);

            // Element: <name>
            if (childNode.getNodeType() == Node.ELEMENT_NODE && "name".equals(childNode.getNodeName())) {
                System.out.println("name: " + childNode.getFirstChild().getNodeValue());
            }

            // Element: <gender>
            if (childNode.getNodeType() == Node.ELEMENT_NODE && "gender".equals(childNode.getNodeName())) {
                System.out.println("gender: " + childNode.getFirstChild().getNodeValue());
            }

            // Comment
            if (childNode.getNodeType() == Node.COMMENT_NODE) {
                Comment comment = (Comment) childNode;
                System.out.println("comment: " + comment.getData());
            }

            // CDATA section (CDATASection is a CharacterData node)
            if (childNode.getNodeType() == Node.CDATA_SECTION_NODE) {
                CharacterData cData = (CharacterData) childNode;
                System.out.println("CDATA: " + cData.getData());
            }
        }
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
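One thing the loop above passes over silently: the line breaks and indentation between child elements are Text nodes too. The sketch below, using an invented class ChildNodeTypesDemo, lists the types of an element's children with and without those whitespace-only Text nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodeTypesDemo {

    // Hypothetical helper: parse an XML string into a DOM Document
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Map the numeric node type to a readable label
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: return "ELEMENT";
            case Node.TEXT_NODE: return "TEXT";
            case Node.COMMENT_NODE: return "COMMENT";
            case Node.CDATA_SECTION_NODE: return "CDATA";
            default: return "OTHER";
        }
    }

    // True for Text nodes that contain only whitespace
    static boolean isBlankText(Node node) {
        return node.getNodeType() == Node.TEXT_NODE && node.getNodeValue().trim().isEmpty();
    }

    // List the direct children's types, optionally skipping whitespace-only text
    static List<String> childTypes(Element parent, boolean skipBlankText) {
        List<String> types = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (skipBlankText && isBlankText(child)) {
                continue;
            }
            types.add(typeName(child));
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<student id=\"001\">\n  <name>Tom</name>\n  <!-- Tom is a cat -->\n</student>";
        Element student = parse(xml).getDocumentElement();
        System.out.println(childTypes(student, false));
        System.out.println(childTypes(student, true));
    }
}
```

This is why code that walks getChildNodes() usually checks the node type before casting to Element.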

3.3 Delete and save

We'll delete the gender child node of the first student element and save the result to a new XML file.

import java.io.FileOutputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Node;

// ...

DocumentBuilderFactory df;
DocumentBuilder builder;
Document document;

try {
    df = DocumentBuilderFactory.newInstance();
    builder = df.newDocumentBuilder();
    document = builder.parse("src/students.xml");
    
    Node firstGenderNode = document.getElementsByTagName("gender").item(0);
    
    // Remove self from parent
    firstGenderNode.getParentNode().removeChild(firstGenderNode);

    TransformerFactory tfFactory = TransformerFactory.newInstance();
    Transformer tf = tfFactory.newTransformer();
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    // try-with-resources closes the output stream once the document is written
    try (FileOutputStream out = new FileOutputStream("src/students_modified.xml")) {
        tf.transform(new DOMSource(document), new StreamResult(out));
    }
} catch (Exception e) {
    e.printStackTrace();
}

Notice that we save the modified Document object to an XML file using javax.xml.transform.Transformer.

[Figure: Transformer saves DOM to XML]
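The same Transformer can also write into a java.io.StringWriter, which is handy for inspecting a modified document without touching the file system. This is only a sketch; the class SerializeToStringDemo and its method names are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SerializeToStringDemo {

    // Hypothetical helper: parse an XML string into a DOM Document
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serialize the DOM tree into a String instead of a file
    static String toXmlString(Document document) throws Exception {
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> declaration to keep the output short
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        tf.transform(new DOMSource(document), new StreamResult(writer));
        return writer.toString();
    }

    // Remove the first gender element, if any, and return the resulting XML
    static String removeFirstGender(String xml) throws Exception {
        Document document = parse(xml);
        Node gender = document.getElementsByTagName("gender").item(0);
        if (gender != null) {
            gender.getParentNode().removeChild(gender);
        }
        return toXmlString(document);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<students><student id=\"001\">"
                + "<name>Tom</name><gender>male</gender>"
                + "</student></students>";
        System.out.println(removeFirstGender(xml));
    }
}
```

The null check matters because item(0) returns null when no gender element exists.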

3.4 Create and insert a new element

Create a new student element and insert it before the first student element.

import java.io.FileOutputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// ...

DocumentBuilderFactory df;
DocumentBuilder builder;
Document document;

try {
    df = DocumentBuilderFactory.newInstance();
    builder = df.newDocumentBuilder();
    document = builder.parse("src/students.xml");
    
    // New name element
    Element name = document.createElement("name");
    name.setTextContent("Bugs Bunny");
    
    // New gender element
    Element gender = document.createElement("gender");
    gender.setTextContent("male");
    
    // New student element
    Element newStudent = document.createElement("student");
    // Add id attribute
    newStudent.setAttribute("id", "003");
    
    newStudent.appendChild(name);
    newStudent.appendChild(gender);
    
    // Obtain the first student element
    Element firstStudent = (Element) document.getElementsByTagName("student").item(0);
    
    // Insert the newly created element before the first student element
    firstStudent.getParentNode().insertBefore(newStudent, firstStudent);
    
    TransformerFactory tfFactory = TransformerFactory.newInstance();
    Transformer tf = tfFactory.newTransformer();
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    // try-with-resources closes the output stream once the document is written
    try (FileOutputStream out = new FileOutputStream("src/students_modified.xml")) {
        tf.transform(new DOMSource(document), new StreamResult(out));
    }
} catch (Exception e) {
    e.printStackTrace();
}
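Updating an existing node follows the same pattern as deleting or inserting one: locate the node, then change it in place. Here is a small sketch that renames a student selected by its id attribute. The class UpdateDemo and the method renameStudent are hypothetical names used only for this illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UpdateDemo {

    // Hypothetical helper: parse an XML string into a DOM Document
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Replace the name text of the student with the given id.
    // Returns true if a matching student with a name element was found.
    static boolean renameStudent(Document document, String id, String newName) {
        NodeList students = document.getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++) {
            Element student = (Element) students.item(i);
            if (!id.equals(student.getAttribute("id"))) {
                continue;
            }
            Node name = student.getElementsByTagName("name").item(0);
            if (name != null) {
                // setTextContent replaces all children with a single Text node
                name.setTextContent(newName);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<students>"
                + "<student id=\"001\"><name>Tom</name></student>"
                + "<student id=\"002\"><name>Jerry</name></student>"
                + "</students>";
        Document document = parse(xml);
        renameStudent(document, "002", "Jerry Mouse");
        System.out.println(document.getElementsByTagName("name").item(1).getTextContent());
    }
}
```

Attributes can be changed the same way with Element.setAttribute, and the result saved with a Transformer as in the listings above.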

4 Summary

Pros:

DOM loads the entire XML document into memory as a tree-structured object.

It's easy to read, update and delete nodes in the in-memory document tree.

Cons:

The DOM object tree may take too much memory when the XML document is large.
