Java API for XML Processing

{{Short description|Java application programming interface}}
{{Out of date}}
{{Multiple issues|
{{more footnotes|date=June 2013}}
{{primary sources|date=June 2013}}
}}
In [[computing]], the '''Java API for XML Processing''' ('''JAXP''') ({{IPAc-en|ˈ|dʒ|æ|k|s|p|iː}} {{respell|JAKS|pee}}), one of the [[Java XML]] [[application programming interface]]s (APIs), provides the capability of validating and parsing [[XML]] documents. It has three basic parsing interfaces:
 
* the [[Document Object Model]] parsing interface or '''DOM''' interface
* the [[Simple API for XML]] parsing interface or '''SAX''' interface
* the [[StAX|Streaming API for XML]] or '''StAX''' interface (part of JDK 6; separate jar available for JDK 5)
 
In addition to the parsing interfaces, the API provides an [[XSLT]] interface to perform data and structural transformations on an XML document. [[Java Platform, Standard Edition|J2SE]] 1.4 was the first version of Java to ship with an implementation of JAXP. JAXP version 1.4.2 was released on May 30, 2007.
 
JAXP was developed under the [[Java Community Process]] as JSR 5 (JAXP 1.0), JSR 63 (JAXP 1.1 and 1.2), and JSR 206 (JAXP 1.3).
{| class="wikitable"
|-
! [[Java Platform, Standard Edition|Java SE]] version !! JAXP version bundled
|-
| 1.4 || 1.1
|-
| 1.5 || 1.3
|-
| 1.6 || 1.4
|-
| 1.7.0 || 1.4.5
|-
| 1.7.40 || 1.5
|-
| 1.8 || 1.6<ref>{{Cite web|url=https://www.jcp.org/en/jsr/detail?id=206|title = The Java Community Process(SM) Program - JSRS: Java Specification Requests - detail JSR# 206}}</ref>
|}
 
JAXP version 1.4.4 was released on September 3, 2010. JAXP 1.3 was declared [https://web.archive.org/web/20110807094938/http://jaxp.java.net/1.3/EndofLife.html end-of-life] on February 12, 2008.
 
== DOM interface ==
{{main article|Document Object Model}}
 
The DOM interface parses an entire XML document and constructs a complete in-memory representation of the document using the classes and modeling the concepts found in the Document Object Model Level 2 Core Specification.
 
The DOM parser is called a {{java|DocumentBuilder}}, as it builds an in-memory <code>Document</code> representation. The {{Javadoc|module=java.xml|package=javax.xml.parsers|class=DocumentBuilder|monotype=y}} is created by the {{Javadoc|module=java.xml|package=javax.xml.parsers|class=DocumentBuilderFactory|monotype=y}}.{{sfn | Horstmann | 2022 | loc=§3.3 Parsing an XML Document}} The {{java|DocumentBuilder}} creates an {{Javadoc:SE|package=org.w3c.dom|org/w3c/dom|Document|module=java.xml}} instance - a tree structure containing nodes in the XML Document. Each tree node in the structure implements the {{Javadoc:SE|package=org.w3c.dom|org/w3c/dom|Node|module=java.xml}} interface. Among the many different types of tree nodes, each representing the type of data found in an XML document, the most important include:
* element nodes that may have attributes
* text nodes representing the text found between the start and end tags of a document element.
 
Refer to the [[Javadoc]] documentation of the [[Java package]] {{Javadoc:SE|package=org.w3c.dom|org/w3c/dom}} for a complete list of node types.
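
The factory, builder and node types described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the class name <code>DomSketch</code>, the method <code>listItems</code> and the sample <code>item</code> elements are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name and the <item> vocabulary are illustrative only.
class DomSketch {

    /** Parses the XML text into a DOM tree and returns "name=text" for every item element. */
    static List<String> listItems(String xml) throws Exception {
        // The factory creates the builder; the builder creates the in-memory Document.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        NodeList items = document.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            // Element nodes carry attributes; their text nodes hold the character data.
            Element item = (Element) items.item(i);
            result.add(item.getAttribute("name") + "=" + item.getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item name='a'>one</item><item name='b'>two</item></list>";
        System.out.println(listItems(xml)); // prints [a=one, b=two]
    }
}
```

Because the whole tree is held in memory, the nodes can be visited in any order and as often as needed, at the cost of memory proportional to the document size.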
 
== SAX interface ==
{{main article|Simple API for XML}}
 
The {{Javadoc|module=java.xml|package=javax.xml.parsers|class=SAXParserFactory|monotype=y}} creates the SAX parser, called the {{Javadoc|module=java.xml|package=javax.xml.parsers|class=SAXParser|text=SAXParser|monotype=y}}. Unlike the DOM parser, the SAX parser does not create an in-memory representation of the XML document and so runs faster and uses less memory. Instead, the SAX parser informs clients of the XML document structure by invoking callbacks, that is, by invoking methods on a {{Javadoc|module=java.xml|package=org.xml.sax.helpers|class=DefaultHandler|text=DefaultHandler|monotype=y}} instance provided to the parser. This way of accessing a document is called [[Streaming XML]].
 
The <code>DefaultHandler</code> class implements the {{Javadoc|module=java.xml|package=org.xml.sax|class=ContentHandler|text=ContentHandler|monotype=y}}, the {{Javadoc|module=java.xml|package=org.xml.sax|class=ErrorHandler|text=ErrorHandler|monotype=y}}, the {{Javadoc|module=java.xml|package=org.xml.sax|class=DTDHandler|text=DTDHandler|monotype=y}}, and the {{Javadoc|module=java.xml|package=org.xml.sax|class=EntityResolver|text=EntityResolver|monotype=y}} interfaces. Most clients will be interested in methods defined in the <code>ContentHandler</code> interface that are called when the SAX parser encounters the corresponding elements in the XML document. The most important methods in this interface are:
 
* {{Javadoc|module=java.xml|package=org.xml.sax.helpers|class=DefaultHandler|member=startDocument()|text=startDocument()|monotype=y}} and {{Javadoc|module=java.xml|package=org.xml.sax.helpers|class=DefaultHandler|member=endDocument()|text=endDocument()|monotype=y}} methods that are called at the start and end of an XML document.
* <code>startElement()</code> and <code>endElement()</code> methods that are called at the start and end of a document element.
* <code>characters()</code> method that is called with the text data contents contained between the start and end tags of an XML document element.
Clients provide a subclass of the <code>DefaultHandler</code> that overrides these methods and processes the data. This may involve storing the data into a database or writing it out to a stream.
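
A handler subclass of the kind described above might look like the following sketch. The names <code>SaxSketch</code>, <code>TraceHandler</code> and <code>trace</code> are assumptions made for illustration; the handler simply records each callback instead of writing to a database or stream. Because a parser may deliver one run of text through several <code>characters()</code> calls, the sketch buffers text until the next element boundary.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; all names here are illustrative only.
class SaxSketch {

    /** Records the callbacks it receives, in order, as a semicolon-separated trace. */
    static class TraceHandler extends DefaultHandler {
        final StringBuilder trace = new StringBuilder();
        private final StringBuilder text = new StringBuilder();

        // Emits any buffered, non-blank text before the next structural event.
        private void flushText() {
            String pending = text.toString().trim();
            if (!pending.isEmpty()) {
                trace.append("text:").append(pending).append(';');
            }
            text.setLength(0);
        }

        @Override
        public void startDocument() {
            trace.append("startDocument;");
        }

        @Override
        public void endDocument() {
            trace.append("endDocument;");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            flushText();
            trace.append("start:").append(qName).append(';');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flushText();
            trace.append("end:").append(qName).append(';');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called more than once for a single run of text, so only buffer here.
            text.append(ch, start, length);
        }
    }

    /** Parses the XML text with a SAX parser and returns the recorded callback trace. */
    static String trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TraceHandler handler = new TraceHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.trace.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trace("<greeting><to>world</to></greeting>"));
    }
}
```

No tree is built at any point: the application sees each event once, in document order, and must keep whatever state it needs itself.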
 
During parsing, the parser may need to access external documents. It is possible to store a local cache for frequently used documents using an [[XML Catalog]].
 
This was introduced with Java 1.3 in May of 2000.<ref>Compare the [http://java.sun.com/j2ee/sdk_1.2.1/techdocs/api/index-all.html#_S_ Java 1.2.1 API index] with the [http://java.sun.com/j2ee/sdk_1.3/techdocs/api/index-all.html#_S_ 1.3 index]. The Java Specification Request (JSR) 5, ''XML Parsing Specification'', was finalised on [http://jcp.org/en/jsr/detail?id=5 21 March, 2000].</ref>
 
== StAX interface ==
{{main article|StAX}}
[[StAX]] was designed as a middle ground between the DOM and SAX interfaces. In its metaphor, the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward, "pulling" the information from the parser as it needs it. This differs from an event-based API such as SAX, which "pushes" data to the application and requires the application to maintain state between events as necessary to keep track of its location within the document.
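
The cursor model can be sketched with the <code>XMLStreamReader</code> interface from <code>javax.xml.stream</code>. The class name <code>StaxSketch</code> and the method <code>elementNames</code> are assumptions made for the example; the loop shows the application, not the parser, deciding when to advance.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; the names are illustrative only.
class StaxSketch {

    /** Walks the document with a pull cursor and returns the element names in document order. */
    static List<String> elementNames(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            // The application drives the loop: each next() call moves the cursor one event forward.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<root><node val='hello'/><node val='bye'/></root>"));
        // prints [root, node, node]
    }
}
```

Since the application controls the loop, it can stop reading as soon as it has found what it needs, which a callback-driven SAX handler cannot do as directly.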
 
== XSLT interface ==
{{main article|XSLT}}
 
'''[[XSL Transformations|XSLT]]''' (Extensible Stylesheet Language Transformations) allows for conversion of an XML document into other forms of data. JAXP provides interfaces in package {{Javadoc|module=java.xml|package=javax.xml.transform|monotype=y}} allowing applications to invoke an XSLT transformation. This interface was originally called TrAX (Transformation API for XML), and was developed by an informal collaboration between the developers of a number of Java XSLT processors.
 
Main features of the interface are:
 
* a factory class allowing the application to select dynamically which XSLT processor it wishes to use ({{Javadoc|module=java.xml|package=javax.xml.transform|class=TransformerFactory|text=TransformerFactory|monotype=y}}, {{Javadoc|module=java.xml|package=javax.xml.transform|class=TransformerFactory|member=newInstance()|text=TransformerFactory.newInstance()|monotype=y}}, {{Javadoc|module=java.xml|package=javax.xml.transform|class=TransformerFactory|member=newInstance(java.lang.String,java.lang.ClassLoader)|text=TransformerFactory.newInstance(String factoryClassName, ClassLoader classLoader)|monotype=y}}).
* methods on the factory class to create a {{Javadoc|module=java.xml|package=javax.xml.transform|class=Templates|text=Templates|monotype=y}} object, representing the compiled form of a stylesheet ({{Javadoc|module=java.xml|package=javax.xml.transform|class=TransformerFactory|member=newTemplates(javax.xml.transform.Source)|text=TransformerFactory.newTemplates(Source source)|monotype=y}}). This is a thread-safe object that can be used repeatedly, in series or in parallel, to apply the same stylesheet to multiple source documents (or to the same source document with different parameters). The factory can also create a {{java|Transformer}} directly ({{Javadoc|module=java.xml|package=javax.xml.transform|class=TransformerFactory|member=newTransformer(javax.xml.transform.Source)|text=TransformerFactory.newTransformer(Source source)|monotype=y}}, {{Javadoc|module=java.xml|package=javax.xml.transform|class=TransformerFactory|member=newTransformer()|text=TransformerFactory.newTransformer()|monotype=y}}).
* a method on the {{java|Templates}} object to create a {{Javadoc|module=java.xml|package=javax.xml.transform|class=Transformer|monotype=y}}, representing the executable form of a stylesheet ({{Javadoc|module=java.xml|package=javax.xml.transform|class=Templates|member=newTransformer()|text=Templates.newTransformer()|monotype=y}}). This cannot be shared across threads, though it is serially reusable. The {{java|Transformer}} provides methods to set stylesheet parameters and serialization options (for example, whether output should be indented), and a method to actually run the transformation ({{Javadoc|module=java.xml|package=javax.xml.transform|class=Transformer|member=transform(javax.xml.transform.Source,javax.xml.transform.Result)|text=Transformer.transform(Source xmlSource, Result outputTarget)|monotype=y}}).
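
The division of labour between a compiled, shareable <code>Templates</code> object and short-lived <code>Transformer</code> instances can be sketched as follows. The class name <code>TemplatesSketch</code>, the stylesheet, the <code>greeting</code> parameter and the <code>name</code> element are all assumptions made for the example, and the sketch assumes the name contains no markup characters.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; stylesheet, parameter and element names are illustrative only.
class TemplatesSketch {

    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:param name='greeting'/>"
            + "<xsl:template match='/'>"
            + "<out><xsl:value-of select='$greeting'/>, <xsl:value-of select='/name'/></out>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Compiles the stylesheet once; the resulting Templates object may be shared across threads. */
    static Templates compile() throws Exception {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
    }

    /** Creates a fresh Transformer for one run, sets a parameter and an output option, then transforms. */
    static String greet(Templates templates, String greeting, String name) throws Exception {
        Transformer transformer = templates.newTransformer(); // not thread-safe: one per use
        transformer.setParameter("greeting", greeting);
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // Assumption: name contains no characters that need XML escaping.
        StringWriter out = new StringWriter();
        transformer.transform(
                new StreamSource(new StringReader("<name>" + name + "</name>")),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Templates templates = compile();
        System.out.println(greet(templates, "hello", "world"));
        System.out.println(greet(templates, "goodbye", "moon"));
    }
}
```

Compiling once and calling <code>newTransformer()</code> per run avoids re-parsing the stylesheet for every source document.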
 
Two abstract interfaces {{Javadoc|module=java.xml|package=javax.xml.transform|class=Source|text=Source|monotype=y}} and {{Javadoc|module=java.xml|package=javax.xml.transform|class=Result|text=Result|monotype=y}} are defined to represent the input and output of the transformation. This is a somewhat unconventional use of Java interfaces, since there is no expectation that a processor will accept any class that implements the interface; each processor can choose which kinds of {{java|Source}} or {{java|Result}} it is prepared to handle. In practice all JAXP processors support several standard kinds of {{java|Source}} ({{Javadoc|module=java.xml|package=javax.xml.transform.dom|class=DOMSource|text=DOMSource|monotype=y}}, {{Javadoc|module=java.xml|package=javax.xml.transform.sax|class=SAXSource|text=SAXSource|monotype=y}}, {{Javadoc|module=java.xml|package=javax.xml.transform.stream|class=StreamSource|text=StreamSource|monotype=y}}) and several standard kinds of {{java|Result}} ({{Javadoc|module=java.xml|package=javax.xml.transform.dom|class=DOMResult|text=DOMResult|monotype=y}}, {{Javadoc|module=java.xml|package=javax.xml.transform.sax|class=SAXResult|text=SAXResult|monotype=y}}, {{Javadoc|module=java.xml|package=javax.xml.transform.stream|class=StreamResult|text=StreamResult|monotype=y}}), and possibly other implementations of their own.
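
As an illustration of mixing the standard source and result kinds, the following sketch uses an identity transformer (one created without a stylesheet) to copy a character stream into a DOM tree and then serialize that tree back to text. The class name <code>SourceResultSketch</code> and its method names are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

// Hypothetical demo class; the names are illustrative only.
class SourceResultSketch {

    /** StreamSource in, DOMResult out: builds a DOM tree from XML text via an identity transform. */
    static Document toDom(String xml) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult(); // the transformer creates the Document node
        identity.transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    /** DOMSource in, StreamResult out: serializes a DOM tree back to XML text. */
    static String toText(Document document) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document document = toDom("<root><node val='hello'/></root>");
        System.out.println(document.getDocumentElement().getTagName());
        System.out.println(toText(document));
    }
}
```

The same <code>transform</code> call is used in both directions; only the concrete <code>Source</code> and <code>Result</code> classes change.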
 
=== Example ===
A minimal but complete example of running an XSLT transformation may look like this:<syntaxhighlight lang="java">
/* file src/examples/xslt/XsltDemo.java */
package examples.xslt;

import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltDemo {
    public static void main(String[] args) throws TransformerFactoryConfigurationError, TransformerException {
        // The stylesheet, held in a text block (requires Java 15 or later).
        //language=xslt
        String xsltResource = """
            <?xml version='1.0' encoding='UTF-8'?>
            <xsl:stylesheet version='2.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
            <xsl:output method='xml' indent='no'/>
            <xsl:template match='/'>
            <reRoot><reNode><xsl:value-of select='/root/node/@val' /> world</reNode></reRoot>
            </xsl:template>
            </xsl:stylesheet>
            """;
        // The source document to transform.
        // language=XML
        String xmlSourceResource = """
            <?xml version='1.0' encoding='UTF-8'?>
            <root><node val='hello'/></root>
            """;

        // The transformation output is collected in memory.
        StringWriter xmlResultResource = new StringWriter();

        // Build a Transformer directly from the stylesheet source.
        Transformer xmlTransformer = TransformerFactory.newInstance().newTransformer(
            new StreamSource(new StringReader(xsltResource))
        );

        // Run the transformation from the source document into the writer.
        xmlTransformer.transform(
            new StreamSource(new StringReader(xmlSourceResource)), new StreamResult(xmlResultResource)
        );

        System.out.println(xmlResultResource.getBuffer().toString());
    }
}
</syntaxhighlight>
It applies the following hardcoded <!-- literal --> XSLT transformation:<syntaxhighlight lang="xml">
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version='2.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output method='xml' indent='no'/>
<xsl:template match='/'>
<reRoot><reNode><xsl:value-of select='/root/node/@val' /> world</reNode></reRoot>
</xsl:template>
</xsl:stylesheet>
</syntaxhighlight>
To the following hardcoded <!-- literal --> XML document:<syntaxhighlight lang="xml">
<?xml version='1.0' encoding='UTF-8'?>
<root><node val='hello'/></root>
</syntaxhighlight>
The result of execution will be<syntaxhighlight lang="xml">
<?xml version="1.0" encoding="UTF-8"?><reRoot><reNode>hello world</reNode></reRoot>
</syntaxhighlight>
 
== Citations ==
{{Reflist}}

== References ==
* {{cite book | last=Horstmann | first=Cay | title=Core Java | publisher=Oracle Press Java | date=April 15, 2022 | isbn=978-0-13-787107-0}}

== External links ==
* [http://jaxp.java.net JAXP Reference Implementation Project Home Page] {{Webarchive|url=https://web.archive.org/web/20110812151256/http://jaxp.java.net/ |date=2011-08-12 }}
* [http://java.sun.com/webservices/jaxp/ Sun's JAXP product description]
* [http://www.jcp.org/en/jsr/detail?id=5 JSR 5] (JAXP 1.0)
* [http://www.jcp.org/en/jsr/detail?id=63 JSR 63] (JAXP 1.1 and 1.2)
* [http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113 Document Object Model (DOM) Level 2 Core Specification]
* Sample programs using the DOM and SAX parser: [http://totheriver.com/learn/xml/xmltutorial.html Tutorial: XML with Xerces for Java]
* [http://www.ibm.com/developerworks/xml/library/x-xjavaforum4.html Sun's Java and XML APIs: Helping or hurting?]
* [http://xml.apache.org/xalan-j/trax.html JAXP/TrAX introduction on the Apache XML web site]
 
{{Jakarta EE}}
 
<!-- Categories -->
[[Category:Java API for XML]]
[[Category:Java specification requests|XML Processing]]
[[Category:XML parsers]]
 
<!-- Interwikis -->
[[de:Java API for XML Processing]]
[[es:JAXP]]
[[fr:Java API for XML Processing]]
[[ko:JAXP]]
[[nl:Java API for XML Processing]]
[[ja:Java API for XML Processing]]
[[ru:JAXP]]
[[vi:JAXP]]