XML structure

This page provides general tips for handling XML documents during masking, including how to edit XML structures and how to use XPath to target data precisely. When you create or modify XML files, handle the masked data according to your requirements.

Understanding XML structure

An XML document is both human-readable and machine-readable, which allows it to serve as a common medium for information exchange across diverse systems.

Definitions

Prolog (optional): The prolog appears at the beginning of the XML document and contains metadata about the document itself, such as the XML version and the character encoding (e.g., <?xml version="1.0" encoding="UTF-8"?>).
Elements: Elements are the building blocks of XML documents, denoted by tags. An element can contain text, other elements, or a mix of both. Elements are used to encase data points in a document, and typically consist of a start tag, content, and an end tag (e.g., <name>John Doe</name>).
Attributes: Attributes provide additional information about elements. They appear inside the start tag of an element as name="value" pairs (e.g., <postcode id="12345"/>).
Root Element: Every XML document must contain a single root element that encases all other elements. The root element provides a container for all data in the document to enforce a hierarchical structure.
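
The definitions above can be illustrated with a short Python sketch. It is only an illustration, assuming Python's standard-library xml.etree.ElementTree parser; the sample document and the variable names are hypothetical and are not part of any masking product.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a prolog, a root element, a child element with text,
# and an empty child element that carries an attribute.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<Person>
  <name>John Doe</name>
  <postcode id="12345"/>
</Person>"""

root = ET.fromstring(DOC)          # the parser reads the prolog itself
name = root.find("name")           # element with text content
postcode = root.find("postcode")   # element with an attribute and no content

print(root.tag)         # name of the root element
print(name.text)        # content between the start tag and the end tag
print(postcode.attrib)  # attributes as a name/value mapping
```

The parser exposes the root element directly, and every other element is reached through it.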

Hierarchical structure

XML documents are inherently hierarchical, a feature that allows them to represent complex data structures effectively.

Parent and child elements: Elements nested within other elements create parent-child relationships. This structure allows XML to represent complex data relationships naturally (e.g., a Person element might contain FirstName, LastName, and ContactDetails as child elements).
Sibling elements: Elements that are at the same level of the hierarchy and share the same parent are called siblings. Sibling elements often represent similar types of data or repeated elements in a list (e.g., multiple Person elements within a People root element).
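
As a rough sketch of these relationships, the following Python snippet walks a small People document. It assumes the standard-library xml.etree.ElementTree module, and the sample data is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two sibling Person elements under a People root.
DOC = """<People>
  <Person><FirstName>John</FirstName><LastName>Doe</LastName></Person>
  <Person><FirstName>Jane</FirstName><LastName>Roe</LastName></Person>
</People>"""

people = ET.fromstring(DOC)

# Iterating an element yields its direct children; here they are siblings.
siblings = [child.tag for child in people]

# Each Person is in turn the parent of FirstName and LastName.
children = {p.findtext("FirstName"): [c.tag for c in p] for p in people}

print(siblings)
print(children)
```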

Use of XML in data masking

Masking operations on XML files typically involve modifying the content of elements or attributes to obfuscate sensitive data while maintaining the structural integrity of the document. Using XML's hierarchical nature, you can selectively apply masking rules to specific parts of the document without disrupting its overall format, to keep the masked data useful for testing or development purposes.

Because the hierarchy keeps related data grouped, masking rules can target only the elements and attributes that contain sensitive information and leave the rest of the document unchanged.
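
A minimal sketch of this idea in Python follows. It assumes the standard-library xml.etree.ElementTree module; the mask_person helper, the masking rules (fixed-length X characters, year-only dates, a placeholder attribute value), and the sample State and Postcode values are all hypothetical choices made for illustration, not the behavior of any particular masking tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; the State and Postcode values are invented.
DOC = """<Person>
  <First_Name>John</First_Name>
  <Last_Name>Doe</Last_Name>
  <DOB>1968-11-24</DOB>
  <State>NSW</State>
  <Postcode id="2000"/>
</Person>"""


def mask_person(person):
    """Overwrite sensitive values in place; tags and nesting stay untouched."""
    for tag in ("First_Name", "Last_Name"):
        node = person.find(tag)
        if node is not None and node.text:
            node.text = "X" * len(node.text)   # same length, no real data
    dob = person.find("DOB")
    if dob is not None and dob.text:
        dob.text = dob.text[:4] + "-01-01"     # keep only the year
    postcode = person.find("Postcode")
    if postcode is not None and "id" in postcode.attrib:
        postcode.set("id", "0000")             # fixed placeholder value
    return person


root = ET.fromstring(DOC)
before = [el.tag for el in root.iter()]   # structure before masking
mask_person(root)
after = [el.tag for el in root.iter()]    # structure after masking
masked_xml = ET.tostring(root, encoding="unicode")
print(masked_xml)
```

Only text and attribute values change. The list of tags is the same before and after, so anything that consumes the file still finds the structure it expects.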

XML example


<Person>
  <First_Name>John</First_Name>
  <Last_Name>Doe</Last_Name>
  <DOB>1968-11-24</DOB>
  <State></State>
  <Postcode id=""/>
</Person>

Understanding XPath

XPath (XML Path Language) is a query language for selecting nodes from an XML document.

Expressions: XPath uses path expressions to navigate through elements and attributes in an XML document.
Nodes: In XPath, everything is treated as a node, including elements, attributes, and even text.
@: In XPath, this symbol is used to select attributes. For example, @id selects the id attribute of the context node.
- To select the name attribute of an employee element, you would use the XPath expression /employee/@name.

XPath example


 /Person
 /Person/First_Name
 /Person/Last_Name
 /Person/DOB
 /Person/State
 /Person/Postcode
 /Person/Postcode/@id
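
To show what these paths select, here is a small Python sketch run against the Person example from this page. It assumes the standard-library xml.etree.ElementTree module, which supports only a subset of XPath: paths are written relative to the parsed root element, and an attribute cannot be selected as the final path step, so the sketch locates the element and then reads the attribute from it. A full XPath engine would accept the absolute paths directly.

```python
import xml.etree.ElementTree as ET

DOC = """<Person>
  <First_Name>John</First_Name>
  <Last_Name>Doe</Last_Name>
  <DOB>1968-11-24</DOB>
  <State></State>
  <Postcode id=""/>
</Person>"""

root = ET.fromstring(DOC)   # root is the node that /Person selects

# /Person/First_Name, written relative to the root element
first_name = root.find("./First_Name").text
# /Person/DOB
dob = root.findtext("./DOB")
# /Person/State: the element exists but has no text content
state = root.find("./State").text
# Attribute id of /Person/Postcode: find the element, then read the attribute
postcode_id = root.find("./Postcode").get("id")

print(first_name, dob, state, repr(postcode_id))
```

ElementTree reports the text of an empty element such as State as None, while the empty id attribute comes back as an empty string, so code that masks these values should handle both cases.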