| Home: www.vipan.com | Vipan Singla | e-mail: vipan@vipan.com |
Node: The base datatype of the DOM.
Element: The vast majority of the objects you’ll deal with are "Elements".
Attr: Represents an attribute of an "Element".
Text: The actual content of an "Element" or "Attr".
Document: Represents the entire XML document. A "Document" object is often referred to as a DOM tree.
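Document.getDocumentElement(): Returns the root element of the document.
Node.getFirstChild() and Node.getLastChild(): Return the first or last child of a given "Node".
Node.getNextSibling() and Node.getPreviousSibling(): Return the next or previous sibling of a given "Node".
Element.getAttribute(attrName): Returns the value (a String) of the attribute with the requested name. For example, to get the value of the attribute named id, use getAttribute("id"). To get the Attr object itself, use getAttributeNode("id").
getElementsByTagName("tag_name"): Returns a NodeList of all <tag_name> elements in the document. This method saves the trouble of writing code to traverse the entire tree. Or, you can use XPath. See below.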
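The navigation methods listed above can be tried out with a small sketch like the following. The class name DomNavigationSketch, its helper methods and the sample XML are made up for illustration; the sample is written without whitespace between tags so that no whitespace-only "Text" nodes show up among the children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomNavigationSketch {
    // Hypothetical sample document (not from the article).
    static final String XML =
        "<doc id=\"d1\"><para>one</para><para>two</para><note>three</note></doc>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Walks the tree with the basic DOM methods and reports what was found.
    static String summarize(Document doc) {
        Element root = doc.getDocumentElement();            // <doc>
        Node first = root.getFirstChild();                  // first <para>
        Node second = first.getNextSibling();               // second <para>
        Node last = root.getLastChild();                    // <note>
        String id = root.getAttribute("id");                // attribute value
        Attr idNode = root.getAttributeNode("id");          // attribute node
        NodeList paras = doc.getElementsByTagName("para");  // all <para> elements
        return root.getNodeName()
            + " first=" + first.getTextContent()
            + " second=" + second.getTextContent()
            + " last=" + last.getNodeName()
            + " id=" + id
            + " attr=" + idNode.getName()
            + " paras=" + paras.getLength();
    }

    public static void main(String[] args) {
        // prints: doc first=one second=two last=note id=d1 attr=id paras=2
        System.out.println(summarize(parse(XML)));
    }
}
```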
The "Document" object is a type of "Node", so it can be used as the context node of an XPath query. For example, in:
NodeIterator nl = XPathAPI.selectNodeIterator(node, "para");
the argument node is the context node you want to start searching from. You may obtain the "Document" object using:
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(new File("C:\\some_dir\\some_file.xml"));
The parse method can also take an "InputStream", a URI given as a String, or an XML "InputSource" object.
After you get the "Document" object, you should merge all adjacent "Text" nodes into one "Text" node (and drop empty ones) using:
doc.getDocumentElement().normalize();
Otherwise, the text of an element may be split across several "Text" nodes, and you are going to have a tough time reaching the useful textual content within an element. Note that normalize() does not remove the whitespace-only "Text" nodes that sit between elements in an indented file.
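What normalize() does can be checked with a small sketch. The class name NormalizeSketch and its methods are made up for illustration. The first method builds an element with two adjacent "Text" nodes by hand and counts the children before and after normalize(); the second parses indented XML to show that whitespace-only "Text" nodes between elements are still there afterwards.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NormalizeSketch {
    static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM parser available", e);
        }
    }

    // Returns {child count before normalize(), child count after}.
    static int[] adjacentTextCounts() {
        Document doc = builder().newDocument();
        Element para = doc.createElement("para");
        doc.appendChild(para);
        para.appendChild(doc.createTextNode("Hello, "));
        para.appendChild(doc.createTextNode("world"));
        int before = para.getChildNodes().getLength();
        doc.getDocumentElement().normalize();   // merges the two Text nodes
        int after = para.getChildNodes().getLength();
        return new int[] {before, after};
    }

    // Parses the XML, normalizes it, and counts the root element's children.
    static int childrenAfterNormalize(String xml) {
        try {
            Document doc = builder().parse(new InputSource(new StringReader(xml)));
            doc.getDocumentElement().normalize();
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        int[] counts = adjacentTextCounts();
        System.out.println(counts[0] + " -> " + counts[1]);   // prints: 2 -> 1
        // whitespace Text, <b/>, whitespace Text: prints 3
        System.out.println(childrenAfterNormalize("<a>\n  <b/>\n</a>"));
    }
}
```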
para selects the "para" element children of the context node
* selects all element children of the context node
text() selects all text node children of the context node
@name selects the "name" attribute of the context node
@* selects all the attributes of the context node
para[1] selects the first "para" child of the context node
para[last()] selects the last "para" child of the context node
*/para selects all para grandchildren of the context node
/doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc
chapter//para selects the "para" element descendants of the "chapter" element children of the context node
//para selects all the para descendants of the "document root" and thus selects all "para" elements in the same document as the context node
//olist/item selects all the "item" elements in the same document as the context node that have an "olist" parent
. selects the context node itself
.//para selects the "para" element descendants of the context node
.. selects the parent of the context node
../@lang selects the "lang" attribute of the parent of the context node
para[@type="warning"] selects all "para" children of the context node that have a "type" attribute with value "warning"
para[@type="warning"][5] selects the fifth "para" child of the context node that has a "type" attribute with value "warning"
para[5][@type="warning"] selects the fifth "para" child of the context node if that child has a "type" attribute with value "warning"
chapter[title="Introduction"] selects the "chapter" children of the context node that have one or more "title" children with string-value equal to "Introduction" (Use this to match to a particular element which contains the text value you desire)
chapter[title] selects the "chapter" children of the context node that have one or more "title" children
employee[@secretary and @assistant] selects all the "employee" children of the context node that have both a "secretary" attribute and an "assistant" attribute
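A few of the location paths above can be tried with the sketch below. It is only an illustration under some assumptions: it uses the javax.xml.xpath API that ships with the JDK instead of the Xalan XPathAPI class used elsewhere in this article, and the class name XPathExamplesSketch, its helper methods and the sample XML are made up. The context node is the "doc" document element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathExamplesSketch {
    // Hypothetical sample document (not from the article).
    static final String XML =
        "<doc>"
        + "<chapter><title>Introduction</title>"
        + "<para type=\"warning\">w1</para><para>p2</para></chapter>"
        + "<chapter><title>Usage</title><para>p3</para></chapter>"
        + "</doc>";

    // The context node for all queries: the <doc> element.
    static Node context() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    static NodeList select(String expr) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, context(), XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    // Text content of every selected node, joined with commas.
    static String texts(String expr) {
        NodeList nodes = select(expr);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (i > 0) sb.append(',');
            sb.append(nodes.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(texts("*/para"));                             // w1,p2,p3
        System.out.println(texts("chapter/para[1]"));                    // w1,p3
        System.out.println(texts("chapter/para[last()]"));               // p2,p3
        System.out.println(texts("chapter/para[@type='warning']"));      // w1
        System.out.println(texts("chapter[title='Introduction']/para")); // w1,p2
        System.out.println(texts("chapter/para/@type"));                 // warning
        System.out.println(texts("/doc/chapter[2]/para[1]"));            // p3
    }
}
```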
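child:: can be omitted from a location step, because child is the default axis. For example, div/para is short for child::div/child::para.
The abbreviation for attribute:: is @. For example, a location path para[@type="warning"] is short for child::para[attribute::type="warning"].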
// is short for /descendant-or-self::node()/. For example, //para is short for /descendant-or-self::node()/child::para. Here, even a "para" element that is a document element will be selected since the document element node is a child of the root node.
//para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
. is short for self::node(). This is particularly useful in conjunction with //. For example, the location path .//para is short for self::node()/descendant-or-self::node()/child::para and so will select all para descendant elements of the context node.
.. is short for parent::node(). For example, ../title is short for parent::node()/child::title and so will select the title children of the parent of the context node.
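The equivalences above, and the difference between //para[1] and /descendant::para[1], can be checked by counting the selected nodes. This is a sketch under assumptions: it uses the JDK's javax.xml.xpath API rather than the Xalan XPathAPI class, and the class name AbbreviationSketch and the sample XML are made up. The context node here is the "Document" itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AbbreviationSketch {
    // Hypothetical sample: two chapters, the first with two paras.
    static final String XML =
        "<doc>"
        + "<chapter><para type=\"warning\">w1</para><para>p2</para></chapter>"
        + "<chapter><para>p3</para></chapter>"
        + "</doc>";

    // Number of nodes selected by expr, with the Document as context node.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(" + expr + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Abbreviated and unabbreviated forms select the same nodes.
        System.out.println(count("//para"));                                   // 3
        System.out.println(count("/descendant-or-self::node()/child::para"));  // 3
        System.out.println(count(".//para"));                                  // 3
        System.out.println(
            count("self::node()/descendant-or-self::node()/child::para"));     // 3
        System.out.println(count("doc/chapter/para[@type='warning']"));        // 1
        System.out.println(count(
            "child::doc/child::chapter/child::para[attribute::type='warning']")); // 1

        // First para child of each parent versus first para in the document.
        System.out.println(count("//para[1]"));             // 2
        System.out.println(count("/descendant::para[1]"));  // 1
    }
}
```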
Save the following code in the XPathDemo.java file:
import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.w3c.dom.*;
import org.w3c.dom.traversal.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.apache.xpath.*;

/**
 * This class demonstrates how to use Java to parse an XML file and get
 * any element's content or attribute's value WITHOUT "walking the tree".
 * It uses XPath to achieve this goal. Also shown is a trivial usage of
 * an XML transform to print the parsed XML file to console.
 *
 * Some of the program snippets are by http://xml.apache.org.
 *
 */
public class XPathDemo {
    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("Usage: ");
            System.out.println(
                "java -classpath xerces.jar;.;xalan.jar "
                    + " XPathDemo your-file.xml your-xpath-string");
            return;
        }
        try {
            /****************************************************************
             * How to turn an XML file into a document object in Java
             ****************************************************************/
            System.out.println("Parsing XML file " + args[0] + " ...");
            DocumentBuilderFactory docBuilderFactory =
                DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            // Parse the XML file and build the Document object in RAM
            Document doc = docBuilder.parse(new File(args[0]));
            // Normalize text representation.
            // Collapses adjacent text nodes into one node.
            doc.getDocumentElement().normalize();
            /****************************************************************/

            /****************************************************************
             * How to use xpath to extract info from document object in Java
             ****************************************************************/
            String xpath = args[1];
            System.out.println("\nQuerying DOM using xpath string:" + xpath);
            // Catches the first node that meets the criteria of xpath string
            String str = XPathAPI.eval(doc, xpath).toString();
            System.out.println("=>" + str + "<=\n");
            /****************************************************************/

            /****************************************************************
             * How to get root node of the document object
             ****************************************************************/
            Node root = doc.getDocumentElement();
            System.out.println("\nRoot element of the doc is =>"
                + root.getNodeName() + "<=");
            /****************************************************************/

            /****************************************************************
             * How to print the parsed xml file right back to system out
             ****************************************************************/
            String xpathString = args[1];
            // Set up an identity transformer to use as serializer.
            // This one can write input to output stream
            Transformer serializer =
                TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Use the simple XPath API to select a nodeIterator.
            System.out.println("\nPrinting subtree under xpath =>"
                + xpathString + "<=");
            NodeIterator nl = XPathAPI.selectNodeIterator(doc, xpathString);
            Node n;
            while ((n = nl.nextNode()) != null) {
                // Serialize the found nodes to System.out
                serializer.transform(
                    new DOMSource(n),
                    new StreamResult(System.out));
            }
            /****************************************************************/
        }
        catch (SAXParseException err) {
            String msg =
                "** SAXParseException"
                    + ", line "
                    + err.getLineNumber()
                    + ", uri "
                    + err.getSystemId()
                    + "\n"
                    + " "
                    + err.getMessage();
            System.out.println(msg);
            // print stack trace
            Exception x = err.getException();
            ((x == null) ? err : x).printStackTrace();
        }
        catch (SAXException e) {
            String msg = "SAXException";
            System.out.println(msg);
            Exception x = e.getException();
            ((x == null) ? e : x).printStackTrace();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
        catch (Throwable t) {
            t.printStackTrace();
            String msg = "Some other exception while getting XML";
            System.out.println(msg);
        }
    }
}
Get the xerces.jar and xalan.jar files from the Xalan download and copy these files into the same directory where you saved the above code in the XPathDemo.java file (just to make the demonstration easier).
The download is about 7MB although the two files you need are about 2MB combined. The rest is documentation and the full Java source of Xalan!
Compile XPathDemo.java using:
javac -classpath xerces.jar;.;xalan.jar XPathDemo.java
Save the following as the example.xml file in the same directory as the above files (just to make the demonstration easier).
<demo-xpath>
<database-access db-name="db1">
Here is to xpath!
<username>scott</username>
<password>tiger</password>
May be some text here.
Some more text here.
</database-access>
Last text line!
</demo-xpath>
Use . for the current node (in this Java program, same as the root node) and / for the root node.
Run XPathDemo using these commands one by one as examples:
java -classpath xerces.jar;.;xalan.jar XPathDemo example.xml /
java -classpath xerces.jar;.;xalan.jar XPathDemo example.xml .
java -classpath xerces.jar;.;xalan.jar XPathDemo example.xml /demo-xpath
java -classpath xerces.jar;.;xalan.jar XPathDemo example.xml //@db-name
java -classpath xerces.jar;.;xalan.jar XPathDemo example.xml //username
These runs will demonstrate different ways to use XPath to get the content of an element or the value of an attribute.
When the XPath string matches nothing, the toString() method of the XObject obtained from the XPathAPI.eval(...) method returns an empty string instead of causing a NullPointerException, by design. Actually, a subclass of XObject, XNull, is returned whose toString() method has been programmed to return an empty string. See Xalan's javadoc.
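If you do not have the Xalan jars at hand, similar queries can be tried with the javax.xml.xpath API that ships with the JDK. The sketch below is an assumption-laden stand-in for XPathDemo, not the same program: the class name DemoQueriesSketch and its query method are made up, and the XML is the example.xml content embedded as a string. With this API, too, a query that matches nothing gives an empty string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DemoQueriesSketch {
    // Same content as example.xml, embedded so the sketch is self-contained.
    static final String XML =
        "<demo-xpath>\n"
        + "<database-access db-name=\"db1\">\n"
        + "Here is to xpath!\n"
        + "<username>scott</username>\n"
        + "<password>tiger</password>\n"
        + "May be some text here.\n"
        + "Some more text here.\n"
        + "</database-access>\n"
        + "Last text line!\n"
        + "</demo-xpath>";

    // Evaluates expr against the document and returns the result as a string
    // (for a node-set: the string-value of the first node, or "" if empty).
    static String query(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            doc.getDocumentElement().normalize();
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("=>" + query("//@db-name") + "<=");          // =>db1<=
        System.out.println("=>" + query("//username") + "<=");          // =>scott<=
        System.out.println("=>" + query("//password") + "<=");          // =>tiger<=
        System.out.println("=>" + query("count(//*)") + "<=");          // =>4<=
        System.out.println("=>" + query("//no-such-element") + "<=");   // =><=
    }
}
```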
Each function in the function library is specified using a function prototype, which gives the return type, the name of the function, and the type of the arguments. If an argument type is followed by a question mark, then the argument is optional; otherwise, the argument is required.
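number last(): The context size, that is, the position number of the last node in the node-set currently being evaluated.
number position(): The context position, that is, the position number of the current node within the node-set currently being evaluated.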
number count(node-set): Number of nodes in the node-set.
node-set id(object): id("foo") selects the element with unique ID "foo" and id("foo")/child::para[position()=5] selects the fifth "para" child of the element with unique ID "foo".
string local-name(node-set?): Local part of the expanded-name of the node in the argument node-set that is first in document order. If the argument node-set is empty or the first node has no expanded-name, an empty string is returned. If the argument is omitted, it defaults to a node-set with the context node as its only member.
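string namespace-uri(node-set?): Namespace URI of the expanded-name of the node in the argument node-set that is first in document order. If the argument node-set is empty, or the first node has no expanded-name or no namespace, an empty string is returned. If the argument is omitted, it defaults to a node-set with the context node as its only member.
string name(node-set?): Qualified name (QName) of the node in the argument node-set that is first in document order, including any namespace prefix, such as p:item. If the argument is omitted, it defaults to a node-set with the context node as its only member.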
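The node-set functions above can be tried with the following sketch. It assumes the JDK's javax.xml.xpath API instead of the Xalan XPathAPI class; the class name NodeSetFunctionsSketch, the prefix p and the namespace URI urn:example are made up for illustration. The parser is made namespace-aware so that local-name and namespace-uri have something to report.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NodeSetFunctionsSketch {
    // Hypothetical sample document using a made-up namespace.
    static final String XML =
        "<p:list xmlns:p=\"urn:example\">"
        + "<p:item>a</p:item><p:item>b</p:item><p:item>c</p:item>"
        + "</p:list>";

    // Evaluates expr and returns the result converted to a string.
    static String eval(String expr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // needed for namespace information
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("count(/*/*)"));          // 3
        System.out.println(eval("/*/*[last()]"));         // c
        System.out.println(eval("/*/*[position()=2]"));   // b
        System.out.println(eval("name(/*)"));             // p:list
        System.out.println(eval("local-name(/*)"));       // list
        System.out.println(eval("namespace-uri(/*)"));    // urn:example
    }
}
```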
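string string(object?): Converts an object to a string as follows:
A node-set is converted to the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned.
A number is converted as follows: NaN is converted to the string NaN; positive zero and negative zero are converted to the string 0; positive infinity is converted to the string Infinity and negative infinity to the string -Infinity; an integer is written with no decimal point and no leading zeros; any other number is written in decimal form with at least one digit before the decimal point.
The boolean false value is converted to the string false. The boolean true value is converted to the string true.
If the argument is omitted, it defaults to a node-set with the context node as its only member.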
NOTE: The string function is not intended for converting numbers into strings for presentation to users. The format-number function and xsl:number element in [XSLT] provide this functionality.
string concat(string, string, string*): Concatenates its arguments.
boolean starts-with("string1", "string2"): Checks if "string1" starts with "string2".
boolean contains("string1", "string2"): Checks if "string1" contains "string2".
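string substring-before("string1", "string2"): Returns the part of "string1" that precedes the first occurrence of "string2", or the empty string if "string1" does not contain "string2". For example, substring-before("1999/04/01","/") returns 1999.
string substring-after("string1", "string2"): Returns the part of "string1" that follows the first occurrence of "string2", or the empty string if "string1" does not contain "string2". For example, substring-after("1999/04/01","/") returns 04/01.
string substring(string, number1, number2?): Substring starting at position number1 with a length of number2 characters. If number2 is omitted, the substring continues to the end of the string. For example, substring("12345",2,3) returns "234".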
More precisely, each character in the string (see [3.6 Strings]) is considered to have a numeric position: the position of the first character is 1, the position of the second character is 2 and so on. This differs from Java and ECMAScript, in which the String.substring method treats the position of the first character as 0.
The returned substring contains those characters for which the position of the character is greater than or equal to the rounded value of the second argument and, if the third argument is specified, less than the sum of the rounded value of the second argument and the rounded value of the third argument; the comparisons and addition used for the above follow the standard IEEE 754 rules; rounding is done as if by a call to the round function. The following examples illustrate various unusual cases:
substring("12345", 1.5, 2.6) returns "234"
substring("12345", 0, 3) returns "12"
substring("12345", 0 div 0, 3) returns ""
substring("12345", 1, 0 div 0) returns ""
substring("12345", -42, 1 div 0) returns "12345"
substring("12345", -1 div 0, 1 div 0) returns ""
number string-length(string?): Number of characters in the string. If no argument, returns length of string-value of context node.
string normalize-space(string?): Removes leading and trailing whitespace and replaces sequences of whitespace characters with a single space. If no argument, normalizes the string-value of the context node.
string translate(string, string1, string2): In "string", replaces occurrences of characters in "string1" with character at the corresponding position in "string2". For example, translate("bar","abc","ABC") returns the string BAr. If there is a character in the second argument string with no character at a corresponding position in the third argument string (because the second argument string is longer than the third argument string), then occurrences of that character in the first argument string are removed. For example, translate("--aaa--","abc-","ABC") returns "AAA". If a character occurs more than once in the second argument string, then the first occurrence determines the replacement character. If the third argument string is longer than the second argument string, then excess characters are ignored. Generally used for case-conversion.
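The string functions can be tried from Java without any XML file, since they work on string literals. The sketch below assumes the JDK's javax.xml.xpath API rather than the Xalan XPathAPI class; the class name StringFunctionsSketch is made up, and an empty "Document" is used only to have some context node.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class StringFunctionsSketch {
    // Evaluates expr against an empty document and returns the string result.
    static String eval(String expr) {
        try {
            Document empty = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            return XPathFactory.newInstance().newXPath().evaluate(expr, empty);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("concat('a', 'b', 'c')"));                 // abc
        System.out.println(eval("starts-with('xpath', 'xp')"));            // true
        System.out.println(eval("contains('xpath', 'pat')"));              // true
        System.out.println(eval("substring-before('1999/04/01', '/')"));   // 1999
        System.out.println(eval("substring-after('1999/04/01', '/')"));    // 04/01
        // Positions start at 1; the third argument is a length.
        System.out.println(eval("substring('12345', 2, 3)"));              // 234
        System.out.println(eval("substring('12345', 1.5, 2.6)"));          // 234
        System.out.println(eval("substring('12345', 0, 3)"));              // 12
        System.out.println(eval("string-length('hello')"));                // 5
        System.out.println(eval("normalize-space('  a   b  ')"));          // a b
        System.out.println(eval("translate('bar', 'abc', 'ABC')"));        // BAr
        System.out.println(eval("translate('--aaa--', 'abc-', 'ABC')"));   // AAA
    }
}
```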
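boolean boolean(object): Converts object to a boolean as follows:
A number is true if and only if it is neither positive or negative zero nor NaN.
A node-set is true if and only if it is non-empty.
A string is true if and only if its length is non-zero.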
boolean not(boolean): Reverses the argument.
boolean true(): Returns true.
boolean false(): Returns false.
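boolean lang(string): Returns true if the language of the context node, as given by the nearest xml:lang attribute on the node or its ancestors, is the same as or a sublanguage of the language named by the argument string. For example, lang("en") is true for an element with xml:lang="en-US".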
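The boolean functions can be checked with a sketch like this one, which asks for the result as a Java Boolean. It assumes the JDK's javax.xml.xpath API rather than the Xalan XPathAPI class; the class name BooleanFunctionsSketch and the tiny sample XML are made up. Note that boolean('false') is true, because any non-empty string is true.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class BooleanFunctionsSketch {
    // Hypothetical sample: <a> has a <b> child but no <c> child.
    static final String XML = "<a><b/></a>";

    static boolean eval(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("boolean(/a/b)"));      // true  (non-empty node-set)
        System.out.println(eval("boolean(/a/c)"));      // false (empty node-set)
        System.out.println(eval("boolean('')"));        // false (empty string)
        System.out.println(eval("boolean('false')"));   // true  (non-empty string)
        System.out.println(eval("boolean(0)"));         // false (zero)
        System.out.println(eval("boolean(0 div 0)"));   // false (NaN)
        System.out.println(eval("boolean(-1)"));        // true
        System.out.println(eval("not(true())"));        // false
    }
}
```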
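number number(object?): Converts object to a number as follows:
A string that consists of optional whitespace, an optional minus sign, a number and optional whitespace is converted to the nearest IEEE 754 number. Any other string is converted to NaN.
Boolean true is converted to 1. Boolean false is converted to 0.
A node-set is first converted to a string as if by a call to the string function and then converted in the same way as a string argument.
If the argument is omitted, it defaults to a node-set with the context node as its only member.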
number sum(node-set): Sum total of all nodes in node-set after converting their string-values to numbers.
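number floor(number): Largest integer that is not greater than the argument. For example, floor(2.7) is 2 and floor(-2.5) is -3.
number ceiling(number): Smallest integer that is not less than the argument. For example, ceiling(2.1) is 3 and ceiling(-2.5) is -2.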
number round(number):
The round function returns the number that is closest to the argument and that is an integer. If there are two such numbers, then the one that is closest to positive infinity is returned. If the argument is NaN, then NaN is returned. If the argument is positive infinity, then positive infinity is returned. If the argument is negative infinity, then negative infinity is returned. If the argument is positive zero, then positive zero is returned. If the argument is negative zero, then negative zero is returned. If the argument is less than zero, but greater than or equal to -0.5, then negative zero is returned.
NOTE: For these last two cases, the result of calling the round function is not the same as the result of adding 0.5 and then calling the floor function.
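The number functions can be tried with the sketch below, which asks for the result as a Java double. It assumes the JDK's javax.xml.xpath API rather than the Xalan XPathAPI class; the class name NumberFunctionsSketch and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NumberFunctionsSketch {
    // Hypothetical sample with two numeric values to sum.
    static final String XML = "<n><v>1</v><v>2.5</v></n>";

    static double eval(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("sum(/n/v)"));        // 3.5
        System.out.println(eval("floor(2.7)"));       // 2.0
        System.out.println(eval("floor(-2.5)"));      // -3.0
        System.out.println(eval("ceiling(2.1)"));     // 3.0
        System.out.println(eval("ceiling(-2.5)"));    // -2.0
        // Ties go toward positive infinity.
        System.out.println(eval("round(2.5)"));       // 3.0
        System.out.println(eval("round(-2.5)"));      // -2.0
        System.out.println(eval("number('12')"));     // 12.0
        System.out.println(eval("number(true())"));   // 1.0
        System.out.println(eval("number('abc')"));    // NaN
    }
}
```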
NOTE: For element nodes and root nodes, the string-value of a node is not the same as the string returned by the DOM nodeValue method (see [DOM]).
The string-value of the root node is the concatenation of the string-values of all text node descendants of the root node in document order.
The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order.
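The difference between the XPath string-value and the DOM node value can be seen in a short sketch. It assumes the JDK's javax.xml.xpath API rather than the Xalan XPathAPI class; the class name StringValueSketch, its methods and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StringValueSketch {
    // Hypothetical sample: an element with mixed text and element content.
    static final String XML = "<p>Hello <b>big</b> world</p>";

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // DOM view: getNodeValue() of an element node is null.
    static String domNodeValue() {
        return parse().getDocumentElement().getNodeValue();
    }

    // DOM view: only a Text node carries a node value.
    static String firstTextNodeValue() {
        return parse().getDocumentElement().getFirstChild().getNodeValue();
    }

    // XPath view: string-value concatenates all text node descendants.
    static String xpathStringValue() {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate("string(/p)", parse());
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(domNodeValue());                      // null
        System.out.println("[" + firstTextNodeValue() + "]");    // [Hello ]
        System.out.println(xpathStringValue());                  // Hello big world
    }
}
```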
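In XPath, an element node is the parent of each of its attribute nodes, although an attribute node is not a child of its parent element.
NOTE: This is different from the DOM, which does not treat the element bearing an attribute as the parent of the attribute.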
The = operator tests whether two nodes have the same value, not whether they are the same node. Thus attributes of two different elements may compare as equal using =, even though they are not the same node.
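A small sketch can show that = compares values rather than node identity. It assumes the JDK's javax.xml.xpath API rather than the Xalan XPathAPI class; the class name EqualitySketch, its methods and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EqualitySketch {
    // Hypothetical sample: two different elements with equal id values.
    static final String XML = "<r><a id=\"7\"/><b id=\"7\"/></r>";

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // XPath = on the two attributes: compares their string-values.
    static boolean valuesEqual() {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate("/r/a/@id = /r/b/@id", parse(), XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // DOM identity check on the same two attribute nodes.
    static boolean sameNode() {
        Element root = parse().getDocumentElement();
        Attr first = ((Element) root.getFirstChild()).getAttributeNode("id");
        Attr second = ((Element) root.getLastChild()).getAttributeNode("id");
        return first.isSameNode(second);
    }

    public static void main(String[] args) {
        System.out.println(valuesEqual());   // true: both values are "7"
        System.out.println(sameNode());      // false: two distinct nodes
    }
}
```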
The string-value of a comment is the content of the comment, not including the opening <!-- or the closing -->.