XML
XML (Extensible Markup Language) is a markup language derived from SGML (Standard Generalized Markup Language). It is used to store and transport data in a format that is readable by both humans and machines. An XML document is built from elements (delimited by tags) that nest inside one another like Russian dolls, forming a tree.
Example XML Document
<?xml version="1.0" encoding="UTF-8"?>
<user id="1">
  <name>John</name>
  <age>30</age>
  <address>
    <street>123 Main St</street>
    <city>Anytown</city>
  </address>
</user>
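To make the tree structure concrete, the following minimal sketch reads the example document above with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for illustration; the original does not prescribe a language.

```python
import xml.etree.ElementTree as ET

# The example document, as bytes so that the encoding declaration in the
# XML prolog is handled by the parser itself.
XML_DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<user id="1">
  <name>John</name>
  <age>30</age>
  <address>
    <street>123 Main St</street>
    <city>Anytown</city>
  </address>
</user>
"""

root = ET.fromstring(XML_DOC)

user_id = root.get("id")               # attribute on the root element
name = root.findtext("name")           # text of a direct child
city = root.findtext("address/city")   # text of a nested element

# Walk the tree depth-first and collect each element's tag.
tags = [element.tag for element in root.iter()]

print(user_id, name, city)
print(tags)
```

The depth-first walk visits the root first and then each child in document order, which mirrors the nesting of the tags.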
Common Uses
Data exchange between systems
Storage and configuration in web applications
Defining document structures
XSLT (Extensible Stylesheet Language Transformations)
XSLT is a language for transforming XML documents into other formats, such as XML, HTML, or plain text. If an application lets users upload XSLT files, an attacker may be able to manipulate how XML documents are transformed when they are rendered.
DTDs (Document Type Definitions)
DTDs define the structure and format of XML files, including element declarations and attribute rules. They can also define entities, which makes them a potential vector for XXE (XML External Entity) attacks.
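As a rough illustration of how a DTD can define entities, and why that matters for XXE, here is a minimal sketch using Python's standard library. It assumes CPython's xml.etree.ElementTree, which expands entities whose replacement text is declared inline but does not fetch external ones; according to the Python documentation it raises ParseError when such an entity is referenced. Other parsers may behave differently, so this is a way to probe one parser, not a general guarantee. The documents, the helper name parse_info, and the URL are illustrative.

```python
import xml.etree.ElementTree as ET

# A DTD that declares an internal entity: the replacement text is inline.
INTERNAL = """<!DOCTYPE note [
  <!ENTITY inf "This is a test.">
]>
<note><info>&inf;</info></note>"""

# A DTD that declares an external entity: the replacement text would have
# to be fetched from the SYSTEM identifier (illustrative URL, never
# requested by this sketch).
EXTERNAL = """<!DOCTYPE note [
  <!ENTITY ext SYSTEM "http://example.com/external.dtd">
]>
<note><info>&ext;</info></note>"""


def parse_info(document: str) -> str:
    """Return the text of <info>, or a marker if the parser rejects the document."""
    try:
        return ET.fromstring(document).findtext("info")
    except ET.ParseError as error:
        return f"rejected: {error}"


internal_result = parse_info(INTERNAL)
external_result = parse_info(EXTERNAL)

print(internal_result)
print(external_result)
```

A parser that did resolve the external entity would place the fetched content inside the info element, which is the behavior XXE attacks rely on.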
XML Entities
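Entities are named placeholders that the parser replaces with their defined content. Five kinds are commonly distinguished, although the categories overlap (a general or parameter entity is also either internal or external):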
Internal Entities: Defined in the DTD and reference local content. Example:
<!DOCTYPE note [
  <!ENTITY inf "This is a test.">
]>
<note>
  <info>&inf;</info>
</note>
External Entities: Defined in the DTD and reference external URLs or files. Example:
<!DOCTYPE note [
  <!ENTITY ext SYSTEM "http://example.com/external.dtd">
]>
<note>
  <info>&ext;</info>
</note>
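Parameter Entities: Declared with a % sign and usable only within DTDs, to define reusable declarations or include external DTD subsets. Inside the internal subset, a parameter entity reference may only stand in for whole declarations. Example:
<!DOCTYPE note [
  <!ENTITY % nameDecl "<!ELEMENT name (#PCDATA)>">
  %nameDecl;
]>
<note>
  <name>John Doe</name>
</note>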
General Entities: Declare pieces of reusable text that are referenced in the document content. Example:
<!DOCTYPE note [
  <!ENTITY author "John Doe">
]>
<note>
  <writer>&author;</writer>
</note>
Character Entities: Represent special or reserved characters that cannot be used directly in XML. Examples:
&lt; for < (less-than symbol)
&gt; for > (greater-than symbol)
&amp; for & (ampersand)
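The character entities can be produced and reversed programmatically. The sketch below assumes Python and its standard xml.sax.saxutils helpers; the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "price < 10 & size > 2"

# escape() replaces &, < and > with the corresponding entity references.
escaped = escape(raw)

# unescape() reverses the substitution.
restored = unescape(escaped)

# Quotes are only escaped on request, e.g. for use inside attribute values.
attr_safe = escape('say "hi"', {'"': "&quot;"})

# Once escaped, the text can be embedded in a document; the parser turns
# the references back into the literal characters.
parsed_text = ET.fromstring(f"<note>{escaped}</note>").text

print(escaped)
print(attr_safe)
```

Escaping text before embedding it keeps reserved characters from being interpreted as markup.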
Common XML Parsers
Different programming environments use various XML parsers, each with unique characteristics. Some examples include:
DOM (Document Object Model) Parser
Builds the entire XML document into a memory-based tree structure.
Resource-intensive but provides random access to all document parts.
SAX (Simple API for XML) Parser
Parses XML data sequentially without loading the whole document into memory.
Suitable for large XML files but lacks random access flexibility.
StAX (Streaming API for XML) Parser
Parses XML documents in a streaming fashion.
Provides more control over parsing compared to SAX.
XPath Evaluator
Strictly a query language rather than a parser: XPath expressions select nodes from a document that has already been parsed.
Commonly used in conjunction with XSLT.
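The differences between these approaches can be sketched by extracting the same value four ways. This assumes Python's standard library: xml.dom.minidom for DOM, xml.sax for SAX, and xml.etree.ElementTree for the rest. StAX itself is a Java API, so ElementTree's iterparse stands in here as a roughly analogous pull-style interface, and ElementTree supports only a subset of XPath. The CityHandler class and the variable names are illustrative.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom import minidom

XML_DOC = (
    '<user id="1"><name>John</name><age>30</age>'
    "<address><street>123 Main St</street><city>Anytown</city></address></user>"
)

# DOM: the whole document becomes an in-memory tree with random access.
dom = minidom.parseString(XML_DOC)
dom_city = dom.getElementsByTagName("city")[0].firstChild.data


# SAX: the parser pushes events to callbacks; nothing is kept unless the
# handler keeps it.
class CityHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_city = False
        self.city = ""

    def startElement(self, name, attrs):
        self.in_city = name == "city"

    def characters(self, content):
        if self.in_city:
            self.city += content

    def endElement(self, name):
        self.in_city = False


handler = CityHandler()
xml.sax.parseString(XML_DOC.encode("utf-8"), handler)
sax_city = handler.city

# Pull parsing (StAX-like): the caller requests events in a loop and can
# stop as soon as it has what it needs.
pull_city = None
stream = io.BytesIO(XML_DOC.encode("utf-8"))
for event, element in ET.iterparse(stream, events=("end",)):
    if element.tag == "city":
        pull_city = element.text
        break

# XPath: select nodes from an already parsed tree with a path expression.
xpath_city = ET.fromstring(XML_DOC).find("./address/city").text

print(dom_city, sax_city, pull_city, xpath_city)
```

For a small document all four give the same answer; the trade-offs described above (memory use, random access, control over the parsing loop) only become visible with large inputs.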