stefanspringer1/SwiftXML 1.2.405
A library written in Swift to process XML
iOS macOS watchOS tvOS
.package(url: "https://github.com/stefanspringer1/SwiftXML.git", from: "1.2.405")
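
As an illustration of how the dependency line above might be used, here is a minimal Package.swift sketch. The package and target name MyTool are hypothetical, the product name SwiftXML is an assumption, and the platform versions follow the requirements stated in UPDATE 7 below.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyTool", // hypothetical package name
    // Minimum Apple platform versions as stated in UPDATE 7.
    platforms: [.macOS(.v13), .iOS(.v16), .tvOS(.v16), .watchOS(.v9)],
    dependencies: [
        .package(url: "https://github.com/stefanspringer1/SwiftXML.git", from: "1.2.405"),
    ],
    targets: [
        .executableTarget(
            name: "MyTool", // hypothetical target name
            dependencies: [
                // Assumption: the library product is named "SwiftXML".
                .product(name: "SwiftXML", package: "SwiftXML"),
            ]
        ),
    ]
)
```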

SwiftXML

A library written in Swift to process XML.

This library is published under the Apache License v2.0 with Runtime Library Exception.

let transformation = XTransformation {

    XRule(forElements: "table") { table in
        table.insertNext {
            XElement("caption") {
                "Table: "
                table.children({ $0.name.contains("title") }).content
            }
        }
    }

    XRule(forElements: "tbody", "tfoot") { tablePart in
        tablePart
            .children("tr")
            .children("th")
            .forEach { cell in
                cell.name = "td"
            }
    }

}

NOTE

This library is not in a “final” state yet despite its high version number, i.e. there might still be bugs, some major improvements might still be made, and breaking changes might happen without the major version being incremented. Additionally, there will be more comments in the code. Also, when such a final state is reached, the library might be further developed using a new repository URL (and the version number set back to a lower one). Further notice will be added here. See there for contact information.

We plan for a final release in early 2024. (This library will then already be used in a production environment.) To all who are already interested in this library, thank you for your patience!

UPDATE 1 (May 2023): We changed the API a little bit recently (no more public XSpot, but you can set isolated for XText) and fixed some problems and are currently working on adding more tests to this library and to the SwiftXMLParser.

UPDATE 2 (July 2023): In order to keep the XML tree small we removed the ability to directly access the attributes of a certain name in a document, and accordingly also to formulate rules for attributes (rules for attributes were rarely used in applications). Instead of directly accessing attributes of certain names, you will have to inspect the descendants of a document (if not catching according events during parsing), maybe saving the result. An easier replacement for the lost functionality will be available when we add a validation tool: When using an appropriate schema you will then be able to look up which elements – according to the schema – could have a certain attribute set, and you can then access these elements directly.

UPDATE 3 (July 2023): Renamed havingProperties to conformingTo.

UPDATE 4 (July 2023): The namespace handling is now in a conclusive state, see the new section about limitations of the XML input and the changed section on how to handle XML namespaces.

UPDATE 5 (July 2023): In order to further streamline the library, the functionality for tracking changes (of attributes) was removed. In most cases when you have to track changes you need a better way of setting those attributes, so there was a burden whenever setting attributes, but without much use.

UPDATE 6 (August 2023): Renamed conformingTo to when.

UPDATE 7 (August 2023): In order to conform to some type checks in Swift 5.9, we have to demand macOS 13, iOS 16, tvOS 16, or watchOS 9 for Apple platforms.

UPDATE 8 (August 2023): Renamed applying to with.

UPDATE 9 (September 2023): Renamed with to applying again. Renamed when to fullfilling. Renamed hasProperties to fullfills. Their implementations for single items are now done via protocols.

UPDATE 10 (October 2023): Instead of element(ofName:) use element(_:) to better match the other methods that take names.

UPDATE 11 (October 2023): Instead of XProduction, XProductionTemplate and XActiveProduction are now used, see the updated description below.

UPDATE 11 (October 2023): Dropping the “X” prefix for implementations of XProductionTemplate and XActiveProduction.

UPDATE 12 (October 2023): XNode.write(toFile:) is renamed to XNode.write(toPath:), and XNode.write(toFileHandle:) is renamed to XNode.write(toFile:).

UPDATE 13 (December 2023): texts is renamed to immediateTexts so as not to confuse it with allTexts, and text is renamed to allTextsCollected. immediateTextsCollected and the allTextsReversed variants are added.

UPDATE 14 (December 2023): The subscript notation with integer values for a sequence of XContent, XElement, or XText now starts counting at 1.

UPDATE 15 (December 2023): immediateTextsCollected is removed.

UPDATE 16 (December 2023): The method child(...) is renamed to firstChild(...).

UPDATE 17 (December 2023): Added some tracing capabilities for complex transformations.

UPDATE 18 (January 2024): XContentLike is renamed to XContentConvertible. When using SwiftXML, a new type can conform to XContentConvertible and as such then can be inserted as XML. The asContent property is not necessary any more and is removed, and ... as XContentConvertible (previously ... as XContentLike) should also not be necessary any more.

UPDATE 19 (March 2024): description adds quotation marks for XText.


Related packages

When using SwiftXML in the context of the SwiftWorkflow framework, you might include the WorkflowUtilitiesForSwiftXML.

Properties of the library

The library reads XML from a source into an XML document instance, and provides methods to transform (or manipulate) the document, and others to write the document to a file.

The library should be efficient and applications that use it should be very intelligible.

Limitations of the XML input

  • The encoding of the source must be UTF-8 (ASCII is considered as a subset of it). The parser checks for correct UTF-8 encoding and also checks (according to the data available to the currently used Swift implementation) if a found codepoint is a valid Unicode codepoint.
  • For easier processing, declarations of namespace prefixes via xmlns:... attributes should only be at the root element.

Manipulation of an XML document

Unlike some other libraries for XML, the manipulation of the document as built in memory is “in place”, i.e. no new XML document is built. The goal is to be able to apply many isolated manipulations to an XML document efficiently. But it is always possible to clone a document easily with references to or from the old version.

The following features are important:

  • All iterations over content in the document using the corresponding library functions are lazy by default, i.e. the iteration only looks at one item at a time and does not (!) collect all items in advance.
  • While lazily iterating over content in the document in this manner, the document tree can be changed without negatively affecting the iteration.
  • Elements of a certain name can be efficiently found without having to traverse the whole tree. An according iteration proceeds in the order by which the elements have been added to the document. When iterating in this manner, newly added elements are then also processed as part of the same iteration.
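
The idea behind the last point can be sketched without the library. The following is a self-contained illustration only, not how SwiftXML implements its iterators: the types Item and LazyItems are hypothetical stand-ins. The iterator remembers only the item it returned last and looks up the successor when the next item is requested, so items appended during the iteration are still visited.

```swift
// Hypothetical stand-in for a registered element; not a SwiftXML type.
final class Item {
    let value: Int
    var next: Item?
    init(_ value: Int) { self.value = value }
}

// A lazy sequence: it keeps only the last returned item, never a snapshot.
struct LazyItems: Sequence, IteratorProtocol {
    let head: Item?
    var last: Item? = nil
    var started = false

    mutating func next() -> Item? {
        // The successor is looked up only now, so later additions are seen.
        last = started ? last?.next : head
        started = true
        return last
    }
}

let head = Item(3)
var tail = head
var seen: [Int] = []

for item in LazyItems(head: head) {
    seen.append(item.value)
    if item.value > 1 {
        // Append a new item while the iteration is still running.
        let added = Item(item.value - 1)
        tail.next = added
        tail = added
    }
}

print(seen) // [3, 2, 1]
```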

The following code takes any <item> with an integer value of multiply larger than 1 and additionally inserts an item with a multiply number one less, while removing the multiply value on the existing item (the library will be explained in more detail in subsequent sections):

let document = try parseXML(fromText: """
<a><item multiply="3"/></a>
""")

document.elements("item").forEach { item in
    if let multiply = item["multiply"], let n = Int(multiply), n > 1 {
        item.insertPrevious {
            XElement("item", ["multiply": n > 2 ? String(n-1) : nil])
        }
        item["multiply"] = nil
    }
}

document.echo()

The output is:

<a><item/><item/><item/></a>

Note that in this example – just to show you that it works – each new item is being inserted before the current node but is then still being processed.

The elements returned by an iteration can even be removed without stopping the (lazy!) iteration:

let document = try parseXML(fromText: """
<a><item id="1" remove="true"/><item id="2"/><item id="3" remove="true"/><item id="4"/></a>
""")

document.traverse { content in
    if let element = content as? XElement, element["remove"] == "true" {
        element.remove()
    }
}

document.echo()

The output is:

<a><item id="2"/><item id="4"/></a>

Of course, since those iterations are regular sequences, all according Swift library functions like map and filter can be used. But in many cases, it might be better to use conditions on the content iterators (see the section on finding related content with filters) or chaining of content iterators (see the section on chained iterators).

The user of the library can also provide sets of rules to be applied (see the code at the beginning and a full example in the section about rules). In such a rule, the user defines what to do with an element or attribute with a certain name. A set of rules can then be applied to a document, i.e. the rules are applied in the order of their definition. This is repeated, guaranteeing that a rule is only applied once to the same object (if not fully removed from the document and added again, see the section below on document membership), until no more application takes place. So elements can be added during application of a rule and then later be processed by the same or another rule.
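
The repeated application described above can be pictured as a loop that runs until a full pass applies nothing new. The following self-contained sketch only illustrates that idea under simplifying assumptions; Node, Rule, Application, and apply are hypothetical stand-ins and not the library's XElement, XRule, or XTransformation.

```swift
// Hypothetical stand-ins; they only model "a named object" and "a rule for a name".
final class Node {
    let name: String
    init(_ name: String) { self.name = name }
}

struct Rule {
    let elementName: String
    let action: (Node, inout [Node]) -> Void
}

// Records that a certain rule has already been applied to a certain node.
struct Application: Hashable {
    let ruleIndex: Int
    let node: ObjectIdentifier
}

func apply(_ rules: [Rule], to nodes: inout [Node]) {
    var done = Set<Application>()
    var appliedSomething = true
    while appliedSomething {                          // repeat until a pass applies nothing
        appliedSomething = false
        for (ruleIndex, rule) in rules.enumerated() { // rules in the order of their definition
            var position = 0
            while position < nodes.count {            // also reaches nodes added meanwhile
                let node = nodes[position]
                position += 1
                guard node.name == rule.elementName else { continue }
                let key = Application(ruleIndex: ruleIndex, node: ObjectIdentifier(node))
                guard done.insert(key).inserted else { continue } // once per rule and node
                rule.action(node, &nodes)
                appliedSomething = true
            }
        }
    }
}

var log: [String] = []
var nodes = [Node("th")]
let rules = [
    Rule(elementName: "td") { _, _ in log.append("td rule") },
    Rule(elementName: "th") { _, nodes in
        nodes.append(Node("td")) // added during rule application
        log.append("th rule")
    },
]

apply(rules, to: &nodes)
print(log) // ["th rule", "td rule"]
```

The node added by the second rule is picked up by the first rule in the following pass, and no rule is applied twice to the same node.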

Other properties

The library uses the SwiftXMLParser to parse XML which implements the according protocol from SwiftXMLInterfaces.

Depending on the configuration of the parse process, all parts of the XML source can be retained in the XML document, including all comments and parts of an internal subset e.g. all entity or element definitions. (Element definitions and attribute list definitions are, besides their reported element names, only retained as their original textual representation; they are not parsed into any other representation.)

In the current implementation, the XML library does not implement any validation, i.e. validation against a DTD or other XML schema, telling us e.g. if an element of a certain name can be contained in an element of another certain name. The user has to use other libraries (e.g. Libxml2Validation) for such validation before reading or after writing the document. Besides validating the structure of an XML document, validation is also important for knowing if the occurrence of a whitespace text is significant (i.e. should be kept) or not. (E.g., whitespace text between elements representing paragraphs of a text document is usually considered insignificant.) To compensate for that last issue, the user of the library can provide a function that decides if an instance of whitespace text between elements should be kept or not. Also, possible default values of attributes have to be set by the user if desired once the document tree is built.

This library gives full control of how to handle entities. Named entity references can persist inside the document even if they are not defined. Named entity references are being scored as internal or external entity references during parsing, the external entity references being those which are referenced by external entity definitions in the internal subset inside the document declaration of the document. Replacements of internal entity references by text can be done automatically according to the internal subset and/or controlled by the application.

Automated inclusion of the content of external parsed entities can be configured; the content might then be wrapped by elements carrying the corresponding information about the entities.

Elements or attributes with namespace prefixes are given the full name “prefix:unprefixed”. See the section on handling of namespaces for motivation and about how to handle namespaces.

For any error during parsing an error is thrown and no document is then provided.

An XML tree (e.g. a document) must not be examined or changed concurrently.


NOTE

The description of the library that follows might not include all types and methods. Please see the documentation produced by DocC or use autocompletion in an according integrated development environment (IDE).


Reading XML

The following functions take a source and return an XML document instance (XDocument). The source can either be provided as a URL, a path to a file, a text, or binary data.

Reading from a URL which references a local file:

func parseXML(
    fromURL: URL,
    sourceInfo: String?,
    textAllowedInElementWithName: ((String) -> Bool)?,
    internalEntityAutoResolve: Bool,
    internalEntityResolver: InternalEntityResolver?,
    insertExternalParsedEntities: Bool,
    externalParsedEntitySystemResolver: ((String) -> URL?)?,
    externalParsedEntityGetter: ((String) -> Data?)?,
    externalWrapperElement: String?,
    keepComments: Bool,
    keepCDATASections: Bool,
    eventHandlers: [XEventHandler]?
) throws -> XDocument

And accordingly:

func parseXML(
    fromPath: String,
    ...
) throws -> XDocument
func parseXML(
    fromText: String,
    ...
) throws -> XDocument
func parseXML(
    fromData: Data,
    ...
) throws -> XDocument

If you want to be indifferent about which kind of source to process, use XDocumentSource for the source definition and use:

func parseXML(
    from: XDocumentSource,
    ...
) throws -> XDocument

The optional textAllowedInElementWithName method gets the name of the surrounding element when text is found inside an element and should notify whether text is allowed in the specific context. If not, the text is discarded if it is whitespace. If no text is allowed in the context but the text is not whitespace, an error is thrown. If you need a more specific context than the element name to decide if text is allowed, use an XEventHandler to track more specific context information.
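
A small sketch of how this argument might be used, assuming the SwiftXML package is available. The element names and the set namesOfElementsWithText are made up for illustration; only parseXML(fromText:textAllowedInElementWithName:) and echo() are taken from this description.

```swift
import SwiftXML

// Hypothetical: the names of the elements that may contain text.
let namesOfElementsWithText: Set<String> = ["title", "p"]

let document = try parseXML(fromText: """
<doc>
    <title>Hello</title>
    <p>World</p>
</doc>
""", textAllowedInElementWithName: { namesOfElementsWithText.contains($0) })

// Whitespace directly inside <doc> is not allowed as text and is dropped;
// non-whitespace text directly inside <doc> would make the parse call throw.
document.echo()
```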

All internal entity references in attribute values have to be replaced by text during parsing. In order to achieve this (in case that internal entity references occur at all in attribute values in the source), an InternalEntityResolver can be provided. An InternalEntityResolver has to implement the following method:

func resolve(
    entityWithName: String,
    forAttributeWithName: String?,
    atElementWithName: String?
) -> String?

This method is always called when a named entity reference is encountered (either in text or attribute) which is scored as an internal entity. It returns the textual replacement for the entity or nil. If the method returns nil, then the entity reference is not replaced by a text, but is kept. In the case of a named entity in an attribute value, an error is thrown when no replacement is given. The function arguments forAttributeWithName (name of the attribute) and atElementWithName (name of the element) have according values if and only if the entity is encountered inside an attribute value.
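
To make the contract of this method concrete, here is a self-contained sketch. The protocol EntityResolving is a hypothetical stand-in that only mirrors the method shown above, so the sketch runs without the library; in a real project the type would presumably conform to InternalEntityResolver instead. The fallback marker used for attribute values is an arbitrary choice.

```swift
// Hypothetical stand-in mirroring the method documented above.
protocol EntityResolving {
    func resolve(
        entityWithName: String,
        forAttributeWithName: String?,
        atElementWithName: String?
    ) -> String?
}

struct TableResolver: EntityResolving {
    let replacements: [String: String]

    func resolve(
        entityWithName entityName: String,
        forAttributeWithName attributeName: String?,
        atElementWithName elementName: String?
    ) -> String? {
        if let replacement = replacements[entityName] {
            return replacement
        }
        // Inside an attribute value a missing replacement leads to an error,
        // so a visible marker is returned there; in text, nil keeps the reference.
        return attributeName != nil ? "[\(entityName)]" : nil
    }
}

let resolver = TableResolver(replacements: ["copy": "©"])
let inText = resolver.resolve(entityWithName: "unknown", forAttributeWithName: nil, atElementWithName: nil)
let inAttribute = resolver.resolve(entityWithName: "unknown", forAttributeWithName: "title", atElementWithName: "a")
print(inText ?? "nil", inAttribute ?? "nil") // nil [unknown]
```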

If internalEntityAutoResolve is set to true, the parser first tries to replace the internal entities by using the declarations in the internal subset of the document before calling an InternalEntityResolver.

The content of external parsed entities is not inserted by default, but it is if you set insertExternalParsedEntities to true. You can provide a method in the argument externalParsedEntitySystemResolver to resolve the system identifier of the external parsed entity to a URL. You can also provide a method in the argument externalParsedEntityGetter to get the data for the system identifier (if externalParsedEntitySystemResolver is provided, then externalParsedEntitySystemResolver first has to return nil). Finally, the system identifier is just added as a path component to the source URL (if it exists) and the parser tries to load the entity from there.

When the content of an external parsed entity is inserted, you can declare an element name externalWrapperElement: the inserted content then gets wrapped into an element of that name with the information about the entity in the attributes name, systemID, and path (path being optional, as an external parsed entity might get resolved without an explicit path). (During later processing, you might want to change this representation, e.g. if the external parsed entity reference is the only content of an element, you might replace the wrapper by its content and set the according information as some attachments of the parent element, so validation of the document succeeds.)

One or more event handlers can be given to a parseXML call; they have to implement XEventHandler from XMLInterfaces. This allows the user of the library to catch any event during parsing, like entering or leaving an element. E.g., the resolving of an internal entity reference could depend on the location inside the document (and not only on the name of the element or attribute), so this information can be collected by such an event handler.

keepComments (default: false) decides if a comment should be preserved (as XComment), else they will be discarded without notice. keepCDATASections (default: false) decides if a CDATA section should be preserved (as XCDATASection), else all CDATA sections get resolved as text.

Content of a document

An XML document (XDocument) can contain the following content:

  • XElement: an element
  • XText: a text
  • XInternalEntity: an internal entity reference
  • XExternalEntity: an external entity reference
  • XCDATASection: a CDATA section
  • XProcessingInstruction: a processing instruction
  • XComment: a comment
  • XLiteral: containing text that is meant to be serialized “as is”, i.e. no escaping e.g. of < and & is done, it could contain XML code that is to be serialized literally, hence its name

XLiteral is never the result of parsing XML, but might get added by an application. Subsequent XLiteral content is (just like XText, see the section on handling of text) always automatically combined.

These content items are of type XContent, whereas the more general type XNode might be content or an XDocument.

The following is read from the internal subset:

  • XInternalEntityDeclaration: an internal entity declaration
  • XExternalEntityDeclaration: an external entity declaration
  • XUnparsedEntityDeclaration: a declaration of an unparsed external entity
  • XNotationDeclaration: a notation declaration
  • XParameterEntityDeclaration: a parameter entity declaration
  • XElementDeclaration: an element declaration
  • XAttributeListDeclaration: an attribute list declaration

They can be accessed via property declarationsInInternalSubset.

A document gets the following additional properties from the XML source (some values might be nil):

  • encoding: the encoding from the XML declaration
  • publicID: the public identifier from the document type declaration
  • sourcePath: the path to the source of the XML document
  • standalone: the standalone value from the XML declaration
  • systemID: the system identifier from the document type declaration
  • xmlVersion: the XML version from the XML declaration

When not set explicitly in the XML source, some of those values are set to a sensible value.

Displaying XML

When printing content via print(...), only a top-level representation like the start tag is printed and never the whole tree. When you would like to print the whole tree or document, use:

func echo(pretty: Bool, indentation: String, terminator: String)

pretty defaults to false; if it is set to true, linebreaks and spaces are added for pretty print. indentation defaults to two spaces, terminator defaults to "\n", i.e. a linebreak is then printed after the output.

With more control:

func echo(usingProductionTemplate: XProductionTemplate, terminator: String)

Productions are explained in the next section.

When you want a serialization of a whole tree or document as text (String), use the following method:

func serialized(pretty: Bool) -> String

pretty again defaults to false and has the same effect.

With more control:

func serialized(usingProductionTemplate: XProductionTemplate) -> String

Do not use serialized to print a tree or document, use echo instead, because using echo is more efficient in this case.
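
A short usage sketch of the two variants, assuming the SwiftXML package is available and relying on the default argument values described above; the sample document is made up.

```swift
import SwiftXML

let document = try parseXML(fromText: "<a><b>Hello</b></a>")

// Print the whole tree directly; pretty adds linebreaks and indentation.
document.echo(pretty: true)

// Get the serialization as a String when the text itself is needed,
// e.g. for a comparison in a test (pretty defaults to false).
let text = document.serialized()
print(text.count)
```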

Writing XML

Any XML node (including an XML document) can be written, including the tree of nodes that is started by it, via the following methods.

func write(toURL: URL, usingProductionTemplate: XProductionTemplate) throws
func write(toPath: String, usingProductionTemplate: XProductionTemplate) throws
func write(toFile: FileHandle, usingProductionTemplate: XProductionTemplate) throws
func write(toWriter: Writer, usingProductionTemplate: XProductionTemplate) throws

You can also use the WriteTarget protocol to allow all the above possibilities:

func write(to writeTarget: WriteTarget, usingProductionTemplate: XProductionTemplate) throws

By the argument usingProductionTemplate: you can define a production, i.e. details of the serialization, e.g. if linebreaks are inserted to make the result look pretty. Its value defaults to an instance of DefaultProductionTemplate, which will give a standard output.

The definition of such a production comes in two parts, a template that can be initialized with values for a further configuration of the serialization, and an active production which is to be applied to a certain target. This way the user has the ability to define completely what the serialization should look like, and then apply this definition to one or several serializations. In more detail:

An XProductionTemplate has a method activeProduction(for writer: Writer) -> XActiveProduction which by using the writer initializes an XActiveProduction where the according events trigger a writing to the writer. The configuration for such a production is to be provided via arguments to the initializer of the XProductionTemplate.

So an XActiveProduction defines how each part of the document is written, e.g. if > or " are written literally or as predefined XML entities in text sections. The production in the above function calls defaults to an instance of DefaultProductionTemplate which results in instances of ActiveDefaultProduction. ActiveDefaultProduction should be extended if only some details of how the document is written are to be changed. The productions ActivePrettyPrintProduction (which might be used by defining a PrettyPrintProductionTemplate) and ActiveHTMLProduction (which might be used by defining an HTMLProductionTemplate) already extend ActiveDefaultProduction; they might be used to pretty-print XML or output HTML. (Note that HTMLProductionTemplate can be given a NamespaceReference to consider a possible namespace prefix for the HTML elements.) But you can also extend one of those classes yourself, e.g. you could override func writeText(text: XText) and func writeAttributeValue(name: String, value: String, element: XElement) to again write some characters as named entity references. Or you just provide an instance of DefaultProduction itself and change its linebreak property to define how line breaks should be written (e.g. Unix or Windows style). You might also want to consider func sortAttributeNames(attributeNames: [String], element: XElement) -> [String] to sort the attributes for output.

Example: write a linebreak before all elements:

class MyProduction: DefaultProduction {

    override func writeElementStartBeforeAttributes(element: XElement) throws {
        try write(linebreak)
        try super.writeElementStartBeforeAttributes(element: element)
    }

}

try document.write(toFile: "myFile.xml", usingProduction: MyProduction())

For generality, the following method is provided to apply any XActiveProduction to a node and its contained tree:

func applyProduction(activeProduction: XActiveProduction) throws

Cloning and document versions

Any node (including an XML document) can be cloned, including the tree of nodes that is started by it, using the following method:

func clone() -> XNode

(The result will be more specific if the subject is known to be more specific.)

Any content and the document itself possesses the property backLink that can be used as a relation between a clone and the original node. If you create a clone by using the clone() method, the backLink value of a node in the clone points to the original node. So when working with a clone, you can easily look at the original nodes.

Note that the backLink reference references the original node weakly, i.e. if you do not save a reference to the original node or tree then the original node disappears and the backLink property will be nil.
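
The effect of such a weak reference can be shown with a self-contained sketch. The class Node below is a hypothetical stand-in, not the library's XNode; it only models a clone whose backLink points weakly to its original.

```swift
// Hypothetical stand-in; it only models the weak backLink of a clone.
final class Node {
    let name: String
    weak var backLink: Node?
    init(_ name: String) { self.name = name }

    func clone() -> Node {
        let copy = Node(name)
        copy.backLink = self // weak: does not keep the original alive
        return copy
    }
}

var original: Node? = Node("a")
let copy = original!.clone()
print(copy.backLink === original) // true

// Dropping the only strong reference deallocates the original,
// and the weak backLink of the clone becomes nil.
original = nil
print(copy.backLink == nil) // true
```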

If you would like to use cloning to just save a version of your document to a copy, use its following method:

func makeVersion()

In that case a clone of the document will be created, but with the backLink property of an original node pointing to the clone, and the backLink property of the clone will point to the old backLink value of the original node. I.e. if you apply makeVersion() several times, when following the backLink values starting from a node in your original document, you will go through all versions of this node, from the newer ones to the older ones. The backLinks property gives you exactly that chain of backlinks. Unlike when using clone(), a strong reference to such a document version will be remembered by the document, so the nodes of the clone will be kept. Use forgetVersions(keeping:Int) on the document in order to stop this remembering, just keeping the last number of versions defined by the argument keeping (keeping defaults to 0). In the oldest version then still remembered or, if no remembered version is left, in the document itself, all backLink values will then be set to nil.

The finalBackLink property follows the whole chain of backLink values and gives you the last value in this chain.

Sometimes, only a “shallow” clone is needed, i.e. the node itself without the whole tree of nodes with the node as root. In this case, just use:

func shallowClone(forwardref: Bool) -> XNode

The backLink is then set just like when using clone().

Content properties

Source range

If the parser (as is the case with the SwiftXMLParser) reports where a part of the document is located in the text (i.e. at what line and column it starts and at what line and column it ends), the property sourceRange: XTextRange (using XTextRange from SwiftXMLInterfaces) returns it for the respective node:

Example:

let document = try parseXML(fromText: """
<a>
    <b>Hello</b>
</a>
""", textAllowedInElementWithName: { $0 == "b" })

document.allContent.forEach { content in
    if let sourceRange = content.sourceRange {
        print("\(sourceRange): \(content)")
    }
    else {
        content.echo()
    }
}

Output:

1:1 - 3:4: <a>
2:5 - 2:16: <b>
2:8 - 2:12: Hello

Element names

Element names can be read and set by using the property name of an element. After setting a new name different from the existing one, the element is registered with the new name in the document, if it is part of a document. Setting the same name does not change anything (it is an efficient non-change).

Text

For a text content (XText) its text can be read and set via its property value. So there is no need to replace an XText content by another to change text. Please also see the section below on handling of text.

Changing and reading attributes

The attributes of an element can be read and set via the “index notation”. If an attribute is not set, nil is returned; conversely, setting an attribute to nil results in removing it. Setting an attribute with a new name or removing an attribute changes the registering of attributes in the document, if the element is part of a document. Setting a non-nil value of an attribute that already exists is an efficient non-change concerning the registering of attributes.

Example:

// setting the "id" attribute to "1":
myElement["id"] = "1"

// reading an attribute:
if let id = myElement["id"] {
    print("the ID is \(id)")
}

You can also get a sequence of attribute values (optional Strings) from a sequence of elements.

Example:

let document = try parseXML(fromText: """
    <test>
      <b id="1"/>
      <b id="2"/>
      <b id="3"/>
    </test>
    """)
print(document.children.children["id"].joined(separator: ", "))

Result:

1, 2, 3

If you want to get an attribute value and at the same time remove the attribute, use the method pullAttribute(...) of the element.

To get the names of all attributes of an element, use:

var attributeNames: [String]

Note that you can also get a (lazy) sequence of the attribute values of a certain attribute name of a (lazy) sequence of elements by using the same index notation:

print(myElement.children("myChildName")["myAttributeName"].joined(separator: ", "))

Attachments

All nodes can have “attachments”. Those are objects that can be attached via a textual key. Those attachments are not considered as belonging to the formal XML tree.

Those attachments are realized as a dictionary attached as a member of each node.

You can also set attachments immediately when creating an element or a document by using the argument attached: of the initializer. (Note that in this argument, some values might be nil for convenience.)

XPath

Get the XPath of a node via:

var xPath: String

Traversals

Traversing a tree depth-first starting from a node (including a document) can be done by the following methods:

func traverse(down: (XNode) throws -> (), up: ((XNode) throws -> ())? = nil) rethrows
func traverse(down: (XNode) async throws -> (), up: ((XNode) async throws -> ())? = nil) async rethrows

For a “branch”, i.e. a node that might contain other nodes (like an element, as opposed to e.g. text, which does not contain other nodes), when returning from the traversal of its content (also in the case of an empty branch) the closure given to the optional up: argument is called.

Example:

document.traverse { node in
    if let element = node as? XElement {
        print("entering element \(element.name)")
    }
}
up: { node in
    if let element = node as? XElement {
        print("leaving element \(element.name)")
    }
}

Note that the root of the traversal is not to be removed during the traversal.
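
As a further sketch of how the down and up closures might be combined, the following tracks the nesting depth of elements to build an indented outline. It assumes the SwiftXML package is available; the sample document and the variables depth and outline are made up for illustration.

```swift
import SwiftXML

let document = try parseXML(fromText: "<a><b><c/></b><d/></a>")

var depth = 0
var outline: [String] = []

document.traverse { node in
    // Going down: note the element at the current depth, then go one level deeper.
    if let element = node as? XElement {
        outline.append(String(repeating: "  ", count: depth) + element.name)
        depth += 1
    }
} up: { node in
    // Coming back up from an element (also from an empty one): leave its level.
    if node is XElement {
        depth -= 1
    }
}

print(outline.joined(separator: "\n"))
```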

Direct access to elements

As mentioned in the general description, the library allows you to efficiently find elements of a certain name in a document without having to traverse the whole tree.

Finding the elements of a certain name:

func elements(_: String) -> XElementsOfSameNameSequence

Example:

myDocument.elements("paragraph").forEach { paragraph in
    if let id = paragraph["id"] {
        print("found paragraph with ID \"\(id)\"")
    }
}

Find the elements of several name alternatives by using several names in elements(_:). Note that just like the methods for single names, what you add during the iteration will then also be considered.

Finding related content

Starting from some content, you might want to find related content, e.g. its children. The names chosen for the corresponding methods come from the idea that all content has a natural order, namely the order of a depth-first traversal, which is the same order in which the content of an XML document is stored in a text file. This order gives a meaning to method names such as nextTouching. Note that, unlike for the iterations you get via elements(_:), even nodes that stay in the same document can occur in such an iteration several times if moved accordingly during the iteration.

Sequences returned are always lazy sequences, iterating through them gives items of the obvious type. As mentioned in the general description of the library, manipulating the XML tree during such an iteration is allowed.

Finding the document the node is contained in:

var document: XDocument?

Finding the parent element:

var parent: XElement?

All its ancestor elements:

var ancestors: XElementSequence

Get the first content of a branch:

var firstContent: XContent?

Get the last content of a branch:

var lastContent: XContent?

If there is exactly one node contained, get it, else get nil:

var singleContent: XContent?

The direct content of a document or an element (“direct” means that their parent is this document or element):

var content: XContentSequence

The direct content that is an element, i.e. all the children:

var children: XElementSequence

The direct content that is text:

var immediateTexts: XTextSequence

For the content, children, and immediateTexts sequences, there also exist the sequences contentReversed, childrenReversed, and immediateTextsReversed, which iterate from the last corresponding item to the first.

All content in the tree of nodes that is started by the node itself, without the node itself, in the order of a depth-first traversal:

var allContent: XContentSequence

All content in the tree of nodes that is started by the node, starting with the node itself:

var allContentIncludingSelf: XContentSequence

All texts in the tree:

var allTexts: XTextSequence

The descendants, i.e. all content in the tree of nodes that is started by the node, without the node itself, that is an element:

var descendants: XElementSequence

If a node is an element, the element itself and the descendants, starting with the element itself:

var descendantsIncludingSelf: XElementSequence

All texts in the tree of nodes that is started by the node itself, without the node itself, in the order of a depth-first traversal:

var allTexts: XTextSequence

The same but only for the nodes contained as direct content:

var immediateTexts: XTextSequence

The (direct) content items of a branch (element or document) are “siblings” of each other.

The content item previous to the subject:

var previousTouching: XContent?

The content item next to the subject:

var nextTouching: XContent?

(Note that for autocompletion it might be better to start typing “touch...” instead of “prev...” or “next...”.)

You might also just be interested in whether a previous or next node exists:

var hasPrevious: Bool
var hasNext: Bool

The following very short names previous and next actually mean “the previous content” and “the next content”, respectively. These names were chosen to be so short because they cover such a common use case.

All nodes previous to the node (i.e. the previous siblings) on the same level, i.e. of the same parent, in the order from the node:

var previous: XContentSequence

Of those, the ones that are elements:

var previousElements: XElementSequence

Analogously, the content next to the node:

var next: XContentSequence

Of those, the ones that are elements:

var nextElements: XElementSequence

Example:

myElement.descendants.forEach { descendant in
    print("the name of the descendant is \(descendant.name)")
}

Note that a sequence might be used several times:

let document = try parseXML(fromText: """
<a><c/><d/><e/></a>
""")

let insideA = document.children.children

insideA.echo()
print("again:")
insideA.echo()

Output:

<c/>
<d/>
<e/>
again:
<c/>
<d/>
<e/>

Once you have such a sequence, you can get the first item in the sequence via its property first (which is introduced by this package in addition to the already defined first(where:)).

The usual methods of sequences can be used. E.g., use mySequence.dropFirst(n) to drop the first n items of the sequence mySequence. To get the third item of the sequence, for example, use mySequence.dropFirst(2).first.

Note that there is no property getting you the last item of those sequences, as it would be quite inefficient. Better use contentReversed or childrenReversed in combination with first.
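A small sketch of that advice, with made-up element names:

```swift
import SwiftXML

// Hypothetical element, only for illustration.
let steps = XElement("steps") {
    XElement("first")
    XElement("second")
    XElement("third")
}

// The first item of the reversed sequence is the last child, so there
// is no need to walk through all children to reach it.
if let lastChild = steps.childrenReversed.first {
    print(lastChild.name)
}
```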

Test if something exists in a sequence by using exist:

var exist: Bool

Note that after using exist, you can still iterate normally along the same sequence, without losing an item.

Test if nothing exists in a sequence by using absent:

var absent: Bool

When you test whether certain items exist, in many cases you will then also want to use those items. The property existing of a sequence of content or elements returns the sequence itself if items exist, and nil otherwise:

var existing: XContentSequence?
var existing: XElementSequence?

In the following example, a sequence is first tested for existing items and, if items exist, then used:

let document = try parseXML(fromText: """
<a><c/><b id="1"/><b id="2"/><d/><b id="3"/></a>
""")

if let theBs = document.descendants("b").existing {
    theBs.echo()
}

Note that what you get by using existing still is a lazy sequence, i.e. if you change content between the existing test and using its result, then there might be no more items left to be found.

You may also ask for the previous or next content item in the tree, in the order of a depth-first traversal. E.g. if a node is the last node of a subtree starting at a certain element and the element has a next sibling, this next sibling is “the next node in the tree” for that last node of the subtree. Getting the next or previous node in the tree is very efficient, as the library keeps track of them anyway.

The next content item in the tree:

var nextInTreeTouching: XContent?

The previous content item in the tree:

var previousInTreeTouching: XContent?

Find all texts contained in the tree of a node and compose them into a single String:

var allTextsCollected: String

You may use these text-collecting properties even when you know that there is only one text to be “collected”; this case is efficiently implemented.
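For illustration, a short sketch with a made-up element whose text is spread over two levels:

```swift
import SwiftXML

// Hypothetical element, only for illustration.
let sentence = XElement("paragraph") {
    "Hello "
    XElement("bold") {
        "world"
    }
    "!"
}

// The texts from all levels of the subtree are composed into one String,
// in the order of a depth-first traversal.
print(sentence.allTextsCollected)
```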

You might also turn a single content item or, more specifically, an element into an appropriate sequence using the following methods:

For any content:

var asSequence: XContentSequence

For an element:

var asElementSequence: XElementSequence

(These two methods are used in the tests of the library.)

Finding related nodes with filters

All of the methods in the previous section that return a sequence also allow a condition as a first argument for filtering. We distinguish between the case of all items of the sequence fulfilling a condition, the case of all items while a condition is fulfilled, and the case of all items until a condition is fulfilled (excluding the found item where the condition is fulfilled):

func content(_: (XContent) -> Bool) -> XContentSequence
func content(while: (XContent) -> Bool) -> XContentSequence
func content(until: (XContent) -> Bool) -> XContentSequence
func content(untilAndIncluding: (XContent) -> Bool) -> XContentSequence

The untilAndIncluding version also stops where the condition is fulfilled, but includes the corresponding item.

Sequences of a more specific type are returned in sensible cases.

Example:

let document = try parseXML(fromText: """
<a><b/><c take="true"/><d/><e take="true"/></a>
""")

document
    .descendants({ element in element["take"] == "true" })
    .forEach { descendant in
        print(descendant)
    }

Output:

<c take="true">
<e take="true">

Note that the round parentheses “(...)” around the condition in the example are needed to distinguish it from the while: and until: versions. (There is no where: argument name, because without it the less common case while: – and to a lesser degree until: – is more easily distinguished visually from the more common case, which is then also syntactically the shortest. This plays out well in actual code.)
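The four variants can be compared side by side in a sketch like the following. The element names are made up, and it is assumed here that children accepts the same argument labels as listed above for content:

```swift
import SwiftXML

// Hypothetical element, only for illustration.
let row = XElement("row") {
    XElement("a")
    XElement("b")
    XElement("stop")
    XElement("c")
}

// All children fulfilling the condition (note the round parentheses).
let notStop = row.children({ $0.name != "stop" }).map { $0.name }

// Children as long as the condition holds.
let leading = row.children(while: { $0.name != "stop" }).map { $0.name }

// Children up to, but excluding, the first one fulfilling the condition.
let beforeStop = row.children(until: { $0.name == "stop" }).map { $0.name }

// The same, but including the item that fulfils the condition.
let throughStop = row.children(untilAndIncluding: { $0.name == "stop" }).map { $0.name }

print(notStop, leading, beforeStop, throughStop)
```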

There also exists a shortcut for the common case of filtering elements according to a name:

document
    .descendants("paragraph")
    .forEach { _ in
        print("found a paragraph!")
    }

You can also use multiple names (e.g. descendants("paragraph", "table")). If no name is given, all elements are given in the result regardless of their name, e.g. children() means the same as children.

If you know that there is at most one child element with a certain name, use the following method (it returns the first child with this name if it exists):

func firstChild(_ name: String) -> XElement?

You might then also consider alternative names (giving you the first child where the name matches):

func firstChild(_ names: String...) -> XElement?

If you want to get the first ancestor with a certain name, use one of the following methods:

func ancestor(_ name: String) -> XElement?
func ancestor(_ names: String...) -> XElement?
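A brief sketch of stepping down with firstChild(_:) and up with ancestor(_:); the element names are made up for illustration:

```swift
import SwiftXML

// Hypothetical element tree, only for illustration.
let book = XElement("book") {
    XElement("chapter") {
        XElement("title") {
            "Introduction"
        }
        XElement("paragraph") {
            "Some text."
        }
    }
}

// Step down via firstChild(_:); each step yields nil if there is no match.
let title = book.firstChild("chapter")?.firstChild("title")
print(title?.allTextsCollected ?? "(no title)")

// Step up from a descendant to the first ancestor with the given name.
let someParagraph = book.descendants("paragraph").first
print(someParagraph?.ancestor("book")?.name ?? "(no such ancestor)")
```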

Chained iterators

Iterators can also be chained. The second iterator is executed on each of the nodes encountered by the first iterator. All this iteration is lazy, so the first iterator only searches for the next node once the second iterator is done with the current node found by the first iterator.

Example:

let document = try parseXML(fromText: """
<a>
    <b>
        <c>
            <d/>
        </c>
    </b>
</a>
""")

document.descendants.descendants.forEach { print($0) }

Output:

<b>
<c>
<d>
<c>
<d>
<d>

In such chains, operations that find a single node when applied to a single node, such as parent, also work, and you can use e.g. insertNext (see the section on tree manipulations), with (see the next section on constructing XML), or echo().

When using a String as subscript, you get a sequence of the corresponding attribute values (where set):

for childID in element.children["id"] {
    print("found child ID \(childID)")
}

Note that when using an Int as subscript value for a sequence of content, you get the child of the according index:

if let secondChild = element.children[2] {
    print("second child: \(secondChild)")
}

NOTE

If you use this subscript notation [n] for a sequence of XContent, XElement, or XText, then – despite using integer values – this is not (!) a random access to the elements (each time using such a subscript, the sequence is followed until the according item is found by counting), and the counting starts at 1 as in the XPath language, and not at 0 as e.g. for Swift arrays.

You should see this integer subscript more as a subscript with names, the integer values being the names that the positions are given in the XML, where counting from 1 is common.
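The counting behaviour described in this note can be pictured with a plain Swift sketch. This is not the library's implementation, and the oneBased label is a hypothetical name chosen for illustration; the sketch only shows why such a subscript is a counting walk rather than random access:

```swift
extension Sequence {
    /// Hypothetical helper: returns the item at a 1-based position by
    /// walking the sequence and counting, or nil if there is no such item.
    subscript(oneBased position: Int) -> Element? {
        guard position >= 1 else { return nil }
        var count = 0
        for item in self {
            count += 1
            if count == position { return item }
        }
        return nil
    }
}

let names = ["a", "b", "c"].lazy.map { $0.uppercased() }
print(names[oneBased: 1] ?? "-")  // prints "A"
print(names[oneBased: 4] ?? "-")  // prints "-"
```

Each use of the subscript starts counting from the beginning again, which is why repeated lookups of high positions are better replaced by a single iteration.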


Constructing XML

Constructing an empty element

When constructing an element (without content), the name is given as the first (nameless) argument and the attribute values are given as a (nameless) dictionary.

Example: constructing an empty “paragraph” element with attributes id="1" and style="note":

let myElement = XElement("paragraph", ["id": "1", "style": "note"])

About the insertion of content

We would first like to give some important hints before we explain the corresponding functionalities in detail.

Note that when inserting content into an element or document and that content already exists somewhere else, the inserted content is moved from its original place, and not copied. If you would like to insert a copy, insert the result of the clone() method of the content.
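The difference between moving and copying might be sketched as follows (the element names are made up for illustration):

```swift
import SwiftXML

// Hypothetical elements, only for illustration.
let source = XElement("source") {
    XElement("item")
}
let target = XElement("target")

// Inserting a clone leaves the original item in "source".
if let item = source.firstChild("item") {
    target.add {
        item.clone()
    }
}
print(source.isEmpty)

// Inserting the content itself moves it out of "source".
target.add {
    source.children
}
print(source.isEmpty)
```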

Be “courageous” when formulating your code, more might function than you might have thought. Anticipating the explanations in the following sections, e.g. the following code examples do work:

Moving the “a” children and the “b” children of an element to the beginning of the element:

element.addFirst {
  element.children("a")
  element.children("b")
}

As the content is first constructed and then inserted, there is no infinite loop here.

Note that in the result, the order of the content is just like defined inside the parentheses {...}, so in the example inside the resulting element there are first the “a” children and then the “b” children.

Wrap an element with another element:

element.replace {
   XElement("wrapper") {
      element
   }
}

The content that you define inside parentheses {...} is constructed from the inside to the outside. From the notes above you might then think that element in the example is not at its original place any more when the content of the “wrapper” element has been constructed, before the replacement could actually happen. Yes, this is true, but nevertheless the replace method still knows where to insert this “wrapper” element. The operation does work as you would expect from a naïve perspective.

An instance of any type conforming to XContentConvertible (it has to implement its collectXML(by:) method) can be inserted as XML:

struct MyStruct: XContentConvertible {
    
    let text1: String
    let text2: String
    
    func collectXML(by xmlCollector: inout XMLCollector) {
        xmlCollector.collect(XElement("text1") { text1 })
        xmlCollector.collect(XElement("text2") { text2 })
    }
    
}

let myStruct1 = MyStruct(text1: "hello", text2: "world")
let myStruct2 = MyStruct(text1: "greeting", text2: "you")

let element = XElement("x") {
    myStruct1
    myStruct2
}

element.echo(pretty: true)

Result:

<x>
  <text1>hello</text1>
  <text2>world</text2>
  <text1>greeting</text1>
  <text2>you</text2>
</x>

For XContentConvertible there is also the xml property that returns an according array of XContent.

Defining content

When constructing an element, its contents are given in parentheses {...} (those parentheses are the builder argument of the initializer).

let myElement = XElement("div") {
    XElement("hr")
    XElement("paragraph") {
        "Hello World"
    }
    XElement("hr")
}

(The text "Hello World" could also be given as XText("Hello World"). The text will be converted into such an XML node automatically.)

The content might be given as an array or an appropriate sequence:

let myElement = XElement("div") {
    XElement("hr")
    myOtherElement.content
    XElement("hr")
}

When not defining content, using map might be a sensible option:

let element = XElement("z") {
    XElement("a") {
        XElement("a1")
        XElement("a2")
    }
    XElement("b") {
        XElement("b1")
        XElement("b2")
    }
}

element.children.map{ $0.children.first }.forEach { print($0?.name ?? "-") }

Output:

a1
b1

The same applies to e.g. the filter method, which, besides letting the code look more complex when used instead of the filter options described above, is not a good option when defining content.

The content of elements containing other elements is built from the inside to the outside while their content is being defined. Consider the following example:

let b = XElement("b")

let a = XElement("a") {
    b
    "Hello"
}

a.echo(pretty: true)

print("\n------\n")

b.replace {
    XElement("wrapper1") {
        b
        XElement("wrapper2") {
            b.next
        }
    }
}

a.echo(pretty: true)

First, the element “wrapper2” is built, and at that moment the sequence b.next contains the text "Hello". So we will get as output:

<a><b/>Hello</a>

------

<a>
  <wrapper1>
    <b/>
    <wrapper2>Hello</wrapper2>
  </wrapper1>
</a>

Document membership in constructed elements

Elements that are part of a document (XDocument) are registered in the document. The reason is that this allows fast access to elements and attributes of a certain name via elements(_:) and the exact functioning of rules (see the section below on rules).

At the moment of constructing a new element with its content defined in {...} brackets, the element is not part of any document. The nodes inserted into it leave the document tree, but they are not (!) unregistered from the document. I.e. the iteration elements(_:) will still find them, and the corresponding rules will apply to them. The reason for this behaviour is the common case of the new element getting inserted into the same document. If the content of the new element were first unregistered from the document and then reinserted into the same document, its elements would count as new elements, and the mentioned iterations might iterate over them again.

If you would like the content of a newly built element to be unregistered from the document, use its method adjustDocument(). This method propagates the current document of the element to its content. For a newly built element this document is nil, which unregisters a node from its document. You might also set the argument adjustDocument to true in the initializer of the element to automatically call adjustDocument() when the building of the new element is finished. This call or setting is only necessary at the top-level element; it is propagated through the whole tree.

Note that if you insert an element into another element that is part of a document, the new child gets registered in the document of its new parent if not already registered there (and unregistered from any different document where it was registered before).

Example: a newly constructed element gets added to a document:

let document = try parseXML(fromText: """
<a><b id="1"/><b id="2"/></a>
""")

document.elements("b").forEach { element in
    print("applying the rule to \(element)")
    if element["id"] == "2" {
        element.insertNext {
            XElement("c") {
                element.previous
            }
        }
    }
}

print("\n-----------------\n")

document.echo()

Output:

applying the rule to <b id="1">
applying the rule to <b id="2">

-----------------

<a><b id="2"/><c><b id="1"/></c></a>

As you can see from the print commands in the last example, the element <b id="1"> does not lose its “connection” to the document (although it seems to get added again to it), so it is only iterated over once by the iteration.

Tree manipulations

Besides changing the node properties, an XML tree can be changed by the following methods. Some of them return the subject itself as a discardable result. For the content specified in {...} (the builder) the order is preserved.

Add nodes at the end of the content of an element or a document respectively:

func add(builder: () -> [XContent])

Add nodes to the start of the content of an element or a document respectively:

func addFirst(builder: () -> [XContent])

Add nodes as the nodes previous to the node:

func insertPrevious(_ insertionMode: InsertionMode = .following, builder: () -> [XContent])

Add nodes as the nodes next to the node:

func insertNext(_ insertionMode: InsertionMode = .following, builder: () -> [XContent])

A more precise type is returned from insertPrevious and insertNext if the type of the subject is more precisely known.

By using the next two methods, a node gets removed.

Remove the node from the tree structure and the document:

func remove()

You might also use the method removed() of a node to remove the node but also use the node.

Replace the node by other nodes:

func replace(_ insertionMode: InsertionMode = .following, builder: () -> [XContent])

Note that the content that replaces a node is allowed to contain the node itself.

Clear the contents of an element or a document respectively:

func clear()

Test if an element or a document is empty:

var isEmpty: Bool

Set the contents of an element or a document respectively:

func setContent(builder: () -> [XContent])

Example:

myDocument.elements("table").forEach { table in
    table.insertNext {
        XElement("legend") {
            "this is the table legend"
        }
        XElement("caption") {
            "this is the table caption"
        }
    }
}
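Several of the methods listed above can be combined as in the following sketch (the element and attribute names are made up for illustration):

```swift
import SwiftXML

// Hypothetical element, only for illustration.
let menu = XElement("menu") {
    XElement("item", ["id": "1"])
}

menu.add { XElement("item", ["id": "2"]) }       // appended at the end
menu.addFirst { XElement("item", ["id": "0"]) }  // inserted at the start

let ids = menu.children.map { $0["id"] ?? "?" }
print(ids)

// Remove one node, then replace the whole content.
menu.firstChild("item")?.remove()
menu.setContent {
    XElement("placeholder")
}
let namesAfterSetContent = menu.children.map { $0.name }
print(namesAfterSetContent)

// Finally remove all content.
menu.clear()
print(menu.isEmpty)
```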

Note that by default iterations continue with new nodes inserted by insertPrevious or insertNext also being considered. In the following cases, you have to add the .skipping directive to get the output as noted below (in the second case, you even get an infinite loop if you do not set .skipping):

let element = XElement("top") {
    XElement("a1") {
        XElement("a2")
    }
    XElement("b1") {
        XElement("b2")
    }
    XElement("c1") {
        XElement("c2")
    }
}

element.echo(pretty: true)

print("\n---- 1 ----\n")

element.content.forEach { content in
    content.replace(.skipping) {
        content.content
    }
}

element.echo(pretty: true)

print("\n---- 2 ----\n")

element.contentReversed.forEach { content in
    content.insertPrevious(.skipping) {
        XElement("I" + ((content as? XElement)?.name ?? "?"))
    }
}

element.echo(pretty: true)

Output:

<top>
  <a1>
    <a2/>
  </a1>
  <b1>
    <b2/>
  </b1>
  <c1>
    <c2/>
  </c1>
</top>

---- 1 ----

<top>
  <a2/>
  <b2/>
  <c2/>
</top>

---- 2 ----

<top>
  <Ia2/>
  <a2/>
  <Ib2/>
  <b2/>
  <Ic2/>
  <c2/>
</top>

Note that there is no such mechanism for skipping inserted content when not using insertPrevious, insertNext, or replace, e.g. when using add. Consider the combination descendants.add: there is then no “natural” way to correct the traversal of the tree. (A more common use case would be something like descendants("table").add { XElement("caption") }, so this should not be a problem in common cases, but it is something you should be aware of.)

When using insertNext, replace etc. in chained iterators, the definition of the content in the parentheses {...} gets executed for each item in the sequence. You should use the collect function to build content specifically for the current item instead. E.g. in the last example, you might use the following with the same result:

print("\n---- 1 ----\n")

element.content.replace { content in
    collect {
        content.content
    }
}

element.echo(pretty: true)

print("\n---- 2 ----\n")

element.contentReversed.insertPrevious { content in
    collect {
        XElement("I" + ((content as? XElement)?.name ?? "?"))
    }
}

element.echo(pretty: true)

You can also do without collect:

let e = XElement("a") {
    XElement("b")
    XElement("c")
}

for descendant in e.descendants({ $0.name != "added" }) {
    descendant.add { XElement("added") }
}

e.echo(pretty: true)

Output:

<a>
  <b>
    <added/>
  </b>
  <c>
    <added/>
  </c>
</a>

Note that a new <added/> is created each time. From what has already been said, it should be clear that this “duplication” does not work with existing content (unless you use clone() or shallowClone()):

let myElement = XElement("a") {
    XElement("to-add")
    XElement("b")
    XElement("c")
}

for descendant in myElement.descendants({ $0.name != "to-add" }) {
    descendant.add {
        myElement.descendants("to-add")
    }
}

myElement.echo(pretty: true)

Output:

<a>
  <b/>
  <c>
    <to-add/>
  </c>
</a>

As a general rule, when inserting content that is already part of another element or document, that content does not get duplicated, but removed from its original position.

Use clone() (or shallowClone()) when you actually want content to get duplicated, e.g. using myElement.descendants("to-add").clone() in the last example would then output:

<a>
  <to-add/>
  <b>
    <to-add/>
  </b>
  <c>
    <to-add/>
    <to-add/>
  </c>
</a>

By default, when you insert content, this new content is also followed (insertion mode .following), as this best reflects the dynamic nature of this library. If you do not want this, set .skipping as the first argument of insertPrevious or insertNext. For example, consider the following code:

let myElement = XElement("top") {
    XElement("a")
}

myElement.descendants.forEach { element in
    if element.name == "a" {
        element.insertNext() {
            XElement("b")
        }
    }
    else if element.name == "b" {
        element.insertNext {
            XElement("c")
        }
    }
}

myElement.echo(pretty: true)

Output:

<top>
  <a/>
  <b/>
  <c/>
</top>

When <b/> gets inserted, the traversal also follows this inserted content. When you would like to skip the inserted content, use .skipping as the first argument of insertNext:

    ...
        element.insertNext(.skipping) {
            XElement("b")
        }
    ...

Output:

<top>
  <a/>
  <b/>
</top>

Similarly, if you replace a node, the content that gets inserted in place of the node is by default included in the iteration. Example: Assume you would like to replace every occurrence of some <bold> element by its content:

let document = try parseXML(fromText: """
    <text><bold><bold>Hello</bold></bold></text>
    """)
document.descendants("bold").forEach { b in b.replace { b.content } }
document.echo()

The output is:

<text>Hello</text>

Handling of text

Subsequent text nodes (XText) are always automatically combined, and text nodes with empty text are automatically removed. The same treatment is applied to XLiteral nodes.

This can be very convenient when processing text, e.g. it is then very straightforward to apply regular expressions to the text in a document. But there might be some stumbling blocks involved here, when the different behaviour of text nodes and other nodes affects the result of your manipulations.
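A minimal sketch of the merging behaviour, with a made-up element. The number of text nodes printed first reflects the rule stated above; the collected text is the same either way:

```swift
import SwiftXML

// Hypothetical element, only for illustration.
let greeting = XElement("paragraph") {
    "Hello"
}

// The added text is adjacent to the existing text node, so according to
// the rule above the two are expected to end up as one text node.
greeting.add {
    " world"
}

print(greeting.immediateTexts.map { $0.value })
print(greeting.allTextsCollected)
```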

You can prevent a text from being merged with other texts by setting its isolated property to true (you can also choose to set this value during initialization of an XText). Consider the following example where the occurrences of a search text get a greenish background. In this example, you do not want part to be added to text during the iteration:

let searchText = "world"

document.traverse { node in
    if let text = node as? XText {
        if text.value.contains(searchText) {
            text.isolated = true
            var addSearchText = false
            text.value.components(separatedBy: searchText).forEach { part in
                text.insertPrevious {
                    addSearchText ? XElement("span", ["style": "background:LightGreen"]) {
                        searchText
                    } : nil
                    part
                }
                addSearchText = true
            }
            text.remove()
            text.isolated = false
        }
    }
}

document.echo()

Output:

<a>Hello <span style="background:LightGreen">world</span>, the <span style="background:LightGreen">world</span> is nice.</a>

Note that when e.g. inserting nodes, the XText nodes of them are then treated as being isolated while being moved.

A String can be used where an XText is required, e.g. you can write "Hello" instead of XText("Hello").

XText, as well as XLiteral and XCDATASection, conforms to the XTextualContentRepresentation protocol, i.e. they all have a String property of name value that can be read and set and which represents content as it would be written into the serialized document (with some character escapes necessary in the case of XText when it is being written). Note that XComment does not conform to the XTextualContentRepresentation protocol.

Rules

When you only want to apply a few changes to a document, just go directly to the few relevant elements and apply the changes you want. But if you would like to transform a whole document into “something else”, you need a better tool to organise your manipulations of the document: a “transformation”.

As mentioned in the general description, a set of rules XRule in the form of a transformation instance of type XTransformation can be used as follows.

In a rule, the user defines what to do with elements of certain names. The set of rules can then be applied to a document, i.e. the rules are applied in the order of their definition. This is repeated until no more applications take place, guaranteeing that a rule is applied only once to the same object (unless it is removed from the document and added again). So elements can be added during the application of a rule and then later be processed by the same or another rule.

Example:

let document = try parseXML(fromText: """
<a><formula id="1"/></a>
""")

var count = 1

let transformation = XTransformation {

    XRule(forElements: "formula") { element in
        print("\n----- Rule for element \"formula\" -----\n")
        print("  \(element)")
        if count == 1 {
            count += 1
            print("  add image")
            element.insertPrevious {
                XElement("image", ["id": "\(count)"])
            }

        }
    }

    XRule(forElements: "image") { element in
        print("\n----- Rule for element \"image\" -----\n")
        print("  \(element)")
        if count == 2 {
            count += 1
            print("  add formula")
            element.insertPrevious {
                XElement("formula", ["id": "\(count)"])
            }
        }
    }

}

transformation.execute(inDocument: document)

print("\n----------------------------------------\n")

document.echo()

Output:

----- Rule for element "formula" -----

  <formula id="1">
  add image

----- Rule for element "image" -----

  <image id="2">
  add formula

----- Rule for element "formula" -----

  <formula id="3">

----------------------------------------

<a><formula id="3"/><image id="2"/><formula id="1"/></a>

As a side note, for such an XTransformation the lengths of the element names do not really matter: apart from the initialization of the transformation before the execution and from what happens inside the rules, the application of the rules is not less efficient if the element names are longer.

Instead of using a transformation with a very large number of rules, you should use several transformations, each dedicated to a separate “topic”. E.g. for some document format you might first transform the inline elements and then the block elements. Splitting a transformation into several transformations practically does not hurt performance.
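Such a split might look like the following sketch. The input document and the target names "b" and "p" are made up for illustration:

```swift
import SwiftXML

// Hypothetical input document, only for illustration.
let article = try parseXML(fromText: """
<document><paragraph>Some <bold>text</bold>.</paragraph></document>
""")

// First transformation: inline elements only.
let inlineTransformation = XTransformation {
    XRule(forElements: "bold") { element in
        element.name = "b"
    }
}

// Second transformation: block elements only.
let blockTransformation = XTransformation {
    XRule(forElements: "paragraph") { element in
        element.name = "p"
    }
}

// The transformations are executed one after the other.
inlineTransformation.execute(inDocument: article)
blockTransformation.execute(inDocument: article)

article.echo()
```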

Note that the order of the rules matters: If you need to look up e.g. the parent of the element in a rule, it is important to know if this parent has already been changed by another rule, i.e. if a preceding rule has transformed this element. An example is given in the following section “Transformations with inverse order”. The usage of several transformations as described in the preceding paragraph might help here. Methods to work with better contextual information are described in the sections “Transformations with attachments for context information”, “Transformations with document versions”, and “Transformations with traversals” below.

Also note that using an XTransformation you can only transform a whole document. In the section “Transformations with traversals” below, another option is described for transforming any XML tree.

A transformation can be stopped by calling stop() on the transformation, although that only works indirectly:

var transformationAlias: XTransformation? = nil

let transformation = XTransformation {

    XRule(forElements: "a") { _ in
        transformationAlias?.stop()
    }

}

transformationAlias = transformation

transformation.execute(inDocument: myDocument)

Transformations with inverse order

As noted in the last section, the order of rules is crucial in some transformations, e.g. if the original context is important.

The “inverse order” of rules goes from the inner elements to the outer elements so that the context is still unchanged when a rule applies. Note the lookup of element.parent?.name to differentiate the color of the text:

let document = try parseXML(fromText: """
    <document>
        <section>
            <hint>
                <paragraph>This is a hint.</paragraph>
            </hint>
            <warning>
                <paragraph>This is a warning.</paragraph>
            </warning>
        </section>
    </document>
    """, textAllowedInElementWithName: { $0 == "paragraph" })

let transformation = XTransformation {

    XRule(forElements: "paragraph") { element in
        let style: String? = if element.parent?.name == "warning" {
            "color:Red"
        } else {
            nil
        }
        element.replace {
            XElement("p", ["style": style]) {
                element.content
            }
        }
    }

    XRule(forElements: "hint", "warning") { element in
        element.replace {
            XElement("div") {
                XElement("p", ["style": "bold"]) {
                    element.name.uppercased()
                }
                element.content
            }
        }
    }
}

transformation.execute(inDocument: document)

document.echo(pretty: true)

Result:

<document>
  <section>
    <div>
      <p style="bold">HINT</p>
      <p>This is a hint.</p>
    </div>
    <div>
      <p style="bold">WARNING</p>
      <p style="color:Red">This is a warning.</p>
    </div>
  </section>
</document>

This method might not be fully applicable in some transformations.

Transformations with attachments for context information

To have information about the context of transformed elements in the original document, attachments might be used. See how, in the following code, attached: ["source": element.name] is used in the construction of the div element, and how this information is then used in the rule for the paragraph element (the input document is the same as in the section “Transformations with inverse order” above; note that the inverse order described in that section is not used here):

let transformation = XTransformation {

    XRule(forElements: "hint", "warning") { element in
        element.replace {
            XElement("div", attached: ["source": element.name]) {
                XElement("p", ["style": "bold"]) {
                    element.name.uppercased()
                }
                element.content
            }
        }
    }

    XRule(forElements: "paragraph") { element in
        let style: String? = if element.parent?.attached["source"] as? String == "warning" {
            "color:Red"
        } else {
            nil
        }
        element.replace {
            XElement("p", ["style": style]) {
                element.content
            }
        }
    }
}

transformation.execute(inDocument: document)

document.echo(pretty: true)

The result is the same as in the section “Transformations with inverse order” above.

Transformations with document versions

As explained in the above section about rules, sometimes you need to know the original context of a transformed element. For this you can use document versions, as explained below.

Note that this method comes with a penalty regarding efficiency because it needs to create a (temporary) clone, but it might come in handy for very difficult transformations. The method might be used when you need to examine the original context in a complex way.

You first create a document version (this creates a clone such that your current document contains backlinks to the clone), and in certain rules, you might then copy the backlink from the node to be replaced by using the withBackLinkFrom: argument in the creation of an element (the input document is the same as in the section “Transformations with inverse order” above):

let transformation = XTransformation {

    XRule(forElements: "hint", "warning") { element in
        element.replace {
            XElement("div", withBackLinkFrom: element) {
                XElement("p", ["style": "bold"]) {
                    element.name.uppercased()
                }
                element.content
            }
        }
    }

    XRule(forElements: "paragraph") { element in
        let style: String? = if element.parent?.backLink?.name == "warning" {
            "color:Red"
        } else {
            nil
        }
        element.replace {
            XElement("p", ["style": style]) {
                element.content
            }
        }
    }
}

// make a clone with inverse backlinks,
// pointing from the original document to the clone:
document.makeVersion()

transformation.execute(inDocument: document)

// remove the clone:
document.forgetLastVersion()

document.echo(pretty: true)

The result is the same as in the section “Transformations with inverse order” above.

Transformations with traversals

There is another possibility for formulating transformations, which uses traversals and which can also be applied to parts of a document or to XML trees that are not part of a document.

As the XML tree can be changed during a traversal, you can traverse an XML tree and change the tree during the traversal by e.g. formulating manipulations according to the name of the current element inside a switch statement.

If you then formulate manipulations during the down direction of the traversal, you know that parents or other ancestors of the current node have already been transformed. Conversely, if you formulate manipulations only inside the up: traversal part and never manipulate any ancestors of the current element, you know that the parent and other ancestors are still the original ones (the input document is the same as in the section “Transformations with inverse order” above):

for section in document.elements("section") {
    section.traverse { node in
        // -
    } up: { node in
        if let element = node as? XElement {
            guard node !== section else { return }
            switch element.name {
            case "paragraph":
                let style: String? = if element.parent?.name == "warning" {
                    "color:Red"
                } else {
                    nil
                }
                element.replace {
                    XElement("p", ["style": style]) {
                        element.content
                    }
                }
            case "hint", "warning":
                element.replace {
                    XElement("div") {
                        XElement("p", ["style": "bold"]) {
                            element.name.uppercased()
                        }
                        element.content
                    }
                }
            default:
                break
            }
        }
    }
}

document.echo(pretty: true)

As the root of the traversal must not be removed during the traversal, there is a corresponding guard statement.

The result is the same as in the section “Transformations with inverse order” above.
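
The ordering argument above can be checked with a small stand-in tree. The following is a minimal sketch and not SwiftXML code: the Node class and its traverse method are hypothetical and only mimic the down/up shape of the traversal used above.

```swift
// Hypothetical stand-in for an XML tree node; this is not a SwiftXML type.
final class Node {
    let name: String
    let children: [Node]

    init(_ name: String, _ children: [Node] = []) {
        self.name = name
        self.children = children
    }

    // Calls `down` before visiting the children and `up` after visiting them.
    func traverse(_ down: (Node) -> Void, up: (Node) -> Void) {
        down(self)
        for child in children {
            child.traverse(down, up: up)
        }
        up(self)
    }
}

let tree = Node("section", [Node("warning", [Node("paragraph")])])

var visits = [String]()
tree.traverse { node in
    visits.append("down \(node.name)")
} up: { node in
    visits.append("up \(node.name)")
}

print(visits)
// ["down section", "down warning", "down paragraph", "up paragraph", "up warning", "up section"]
```

In this sketch the paragraph is handled in the up part before its parent warning, which is why a lookup of the parent's name in the up part still sees the original parent.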

Note that when using traversals for transforming an XML tree, using several transformations instead of one does have a negative impact on efficiency.

Handling of namespaces

The library is very strong when it comes to tracking elements of a certain name and formulating corresponding rules. Adding an additional layer by supporting namespaces directly at those points would make the implementation of the library more complicated and less efficient. Let us see how one would then handle XML documents that use namespaces.

First, you can always look up the namespace prefix settings (attributes xmlns:...) in your document. As mentioned in the section about limitations of the XML input, the annotations of namespace prefixes via xmlns:... attributes should only be at the root element of the XML source. There are then the following two helper methods to help you with the task of handling the namespaces:

Read the full prefix for a namespace URL string from the root element:

XDocument.fullPrefix(forNamespace:) -> String

“Full” means that a closing : is added automatically. If no prefix is defined, an empty string is returned.

Get a map from the namespace URL strings to the full prefixes from the root element:

XDocument.fullPrefixesForNamespaces
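
A minimal sketch of how the two helpers might be used, assuming a small sample document (the input text is hypothetical) and assuming that fullPrefixesForNamespaces can be iterated like a dictionary from namespace URL strings to full prefixes, as the description above suggests:

```swift
import SwiftXML

// Hypothetical sample input; the namespace prefix is declared at the root element only.
let sampleDocument = try parseXML(fromText: """
    <article xmlns:math="http://www.w3.org/1998/Math/MathML"><math:math><math:mi>x</math:mi></math:math></article>
    """)

// According to the description above, the "full" prefix includes the closing colon.
let mathMLPrefix = sampleDocument.fullPrefix(forNamespace: "http://www.w3.org/1998/Math/MathML")
print("MathML prefix: '\(mathMLPrefix)'")

// An undeclared namespace should give an empty string according to the description above.
let svgPrefix = sampleDocument.fullPrefix(forNamespace: "http://www.w3.org/2000/svg")
print("SVG prefix: '\(svgPrefix)'")

// Assumption: the property behaves like a dictionary [namespace URL: full prefix].
for (namespace, prefix) in sampleDocument.fullPrefixesForNamespaces {
    print("\(namespace) -> \(prefix)")
}
```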

When you would then like to access or change elements in that namespace, add the corresponding prefix dynamically in your code:

let fullMathMLPrefix = myDocument.fullPrefix(forNamespace: "http://www.w3.org/1998/Math/MathML")

let transformation = XTransformation {

    XRule(forElements: "\(fullMathMLPrefix)a") { a in
        ...
    }

    ...

If you would like to add a namespace declaration at the root element, use the following method:

XDocument.setNamespace(_:withPossiblyFullPrefix:)

Here the prefix might be a “full” prefix, i.e. it could contain a closing :. An existing namespace declaration for the same namespace but with another prefix is not (!) removed.

Note that these three helper methods are also available for an element.

Using async/await

You can use traverse with closures using await. And you can use the async property of the Swift Async Algorithms package (giving an AsyncLazySequence) to apply map etc. with closures using await (e.g. element.children.async.map { await a.f($0) }).

Currently the SwiftXML package defines a forEachAsync method for closure arguments using await, but this method might be removed in future versions of the package if the Swift Async Algorithms package should define it for AsyncLazySequence.
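
To illustrate the idea of a forEach variant that accepts a closure using await, here is a minimal sketch in plain Swift. It is not the SwiftXML implementation: the names forEachAsyncSketch, process, and demo are hypothetical, and plain strings stand in for XML nodes.

```swift
// Hypothetical sketch; the real forEachAsync of SwiftXML might differ in signature and behavior.
extension Sequence {
    /// Calls an asynchronous closure for each element, one element after another.
    func forEachAsyncSketch(_ operation: (Element) async throws -> Void) async rethrows {
        for element in self {
            try await operation(element)
        }
    }
}

// Hypothetical asynchronous worker standing in for some awaited processing step.
func process(_ name: String) async -> String {
    name.uppercased()
}

func demo() async -> [String] {
    var processedNames = [String]()
    await ["hint", "warning"].forEachAsyncSketch { name in
        let processed = await process(name)
        processedNames.append(processed)
    }
    return processedNames
}

let processedNames = await demo()
print(processedNames) // ["HINT", "WARNING"]
```

The elements are processed strictly in sequence here; nothing runs concurrently in this sketch.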

Convenience extensions

XContent has the following extensions that are very convenient when working with XML in a complex manner:

  • applying: apply some changes to an instance and return the instance
  • fullfilling: test a condition for an instance and return the instance if the condition is true, else return nil
  • fullfills: test a condition on an instance and return the result

(fullfilling is, in principle, a variant of the filter method for just one item.)
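
The semantics of the three extensions listed above can be sketched with a stand-in type. This is a minimal sketch under stated assumptions, not the library's code: the Item class is hypothetical, and the three methods are re-implemented here only to show the behavior described in the list.

```swift
// Hypothetical stand-in for an XML element; this is not a SwiftXML type.
final class Item {
    let name: String
    var attributes = [String: String]()
    init(_ name: String) { self.name = name }
}

extension Item {
    /// Applies some changes to the instance and returns the instance.
    func applying(_ changes: (Item) -> Void) -> Item {
        changes(self)
        return self
    }

    /// Returns the instance if the condition is true, else nil.
    func fullfilling(_ condition: (Item) -> Bool) -> Item? {
        condition(self) ? self : nil
    }

    /// Returns the result of the condition.
    func fullfills(_ condition: (Item) -> Bool) -> Bool {
        condition(self)
    }
}

let item = Item("a").applying { $0.attributes["moved"] = "yes" }

let isMoved = item.fullfills { $0.attributes["moved"] == "yes" }
let asB = item.fullfilling { $0.name == "b" }

print(isMoved)    // true
print(asB == nil) // true
```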

It is difficult to show the convenience of those extensions with simple examples, where it is easy to formulate the code without them. But they come in handy if the situation gets more complex.

Example:

let element1 = XElement("a") {
    XElement("child-of-a") {
        XElement("more", ["special": "yes"])
    }
}

let element2 = XElement("b")

if let childOfA = element1.fullfilling({ $0.name == "a" })?.children.first,
   childOfA.children.first?.fullfills({ $0["special"] == "yes" && $0["moved"] != "yes"  }) == true {
    element2.add {
        childOfA.applying { $0["moved"] = "yes" }
    }
}

element2.echo()

Result:

<b><child-of-a moved="yes"><more special="yes"/></child-of-a></b>

applying is also predefined for a content sequence or an element sequence, where it is shorter than using the map method in the general case (where a return statement might have to be included), and you can directly use it to define content (without the asContent property described above):

let myElement = XElement("a") {
    XElement("b", ["inserted": "yes"]) {
        XElement("c", ["inserted": "yes"])
    }
}

print(Array(myElement.descendants.applying{ $0["inserted"] = "yes" }))

Result:

[<b inserted="yes">, <c inserted="yes">]

Tools

copyXStructure

public func copyXStructure(from start: XContent, to end: XContent, upTo: XElement? = nil, correction: ((StructureCopyInfo) -> XContent)?) -> XContent?

Copies the structure from start to end, optionally up to the upTo value. start and end must have a common ancestor. Returns nil if there is no common ancestor. The returned element is a clone of the upTo value if a) it is not nil and b) upTo is an ancestor of the common ancestor or the ancestor itself. Else it is the clone of the common ancestor (but generally with a different content in both cases). The correction can do some corrections.
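
A minimal usage sketch based only on the signature above. The sample tree is hypothetical, no correction is applied, and the exact content of the returned copy is not shown here because it depends on the behavior of the library:

```swift
import SwiftXML

// Keep references to the two nodes that delimit the structure to copy.
let start = XElement("c1")
let end = XElement("c3")

// Hypothetical sample tree: <a><b><c1/><c2/><c3/></b></a>
let root = XElement("a") {
    XElement("b") {
        start
        XElement("c2")
        end
    }
}

// correction has no default value in the signature above, so nil is passed explicitly.
let structureCopy = copyXStructure(from: start, to: end, correction: nil)

if let structureCopy {
    structureCopy.echo()
} else {
    // According to the description above, this is the case without a common ancestor.
    print("no common ancestor")
}
```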

Debugging

If one uses multiple instances of XRule bundled into an XTransformation to transform a whole document, it can be useful to know which actions belonging to which rules "touched" an element. In debug builds, all filenames and line numbers that are executed by a transformation during execution are recorded in the encounteredActionsAt property.
