HTMLKit

An Objective-C framework for your everyday HTML needs.

Quick Overview

HTMLKit is a WHATWG specification-compliant framework for parsing and serializing HTML documents and document fragments for iOS and OSX. HTMLKit parses real-world HTML the same way modern web browsers would.

HTMLKit provides a rich DOM implementation for manipulating and navigating the document tree. It also understands CSS3 selectors, which makes selecting nodes and querying the DOM a piece of cake.

DOM Validation

DOM mutations are validated as described in the WHATWG DOM Standard. Invalid DOM manipulations throw hierarchy-related exceptions. You can disable these validations by defining the HTMLKIT_NO_DOM_CHECKS compiler constant, which also improves performance by about 20-30%.
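
To make "hierarchy-related" concrete: one of the checks the WHATWG DOM Standard requires before an insertion is that the node being inserted is neither the new parent itself nor one of its ancestors, since that would create a cycle. Below is a minimal sketch of that single check in plain C (which also compiles as Objective-C). The SketchNode type and the sketch_ functions are hypothetical names used only for illustration; they are not part of HTMLKit's API and do not claim to mirror its implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal node: only the parent link matters for this check. */
typedef struct SketchNode {
    struct SketchNode *parent;
} SketchNode;

/* Returns true if `ancestor` is `node` itself or one of its ancestors. */
static bool sketch_is_inclusive_ancestor(const SketchNode *ancestor,
                                         const SketchNode *node) {
    for (const SketchNode *cur = node; cur != NULL; cur = cur->parent) {
        if (cur == ancestor) {
            return true;
        }
    }
    return false;
}

/* Appending `child` to `parent` is rejected when `child` is an inclusive
 * ancestor of `parent`; the DOM Standard reports this case as a
 * hierarchy request error. */
static bool sketch_can_append(const SketchNode *parent, const SketchNode *child) {
    return !sketch_is_inclusive_ancestor(child, parent);
}
```

With a chain root > div > p, moving p under root passes this check, while appending root to p, or div to itself, is rejected.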

Tests

HTMLKit passes all of the HTML5Lib Tokenizer and Tree Construction tests. The html5lib-tests repository is configured as a git submodule. If you plan to run the tests, do not forget to pull it too.

The CSS3 Selector implementation is tested with an adapted version of the CSS3 Selectors Test Suite, ignoring the tests that require user interaction, session history, and scripting.

Does it Swift?

Check out the playground!

Installation

Carthage

Carthage is a decentralized dependency manager that builds your dependencies and provides you with binary frameworks.

If you don't have Carthage yet, you can install it with Homebrew using the following command:

$ brew update
$ brew install carthage

To add HTMLKit as a dependency to your project using Carthage, add the following line to your Cartfile:

github "iabudiab/HTMLKit"

Then run the following command to build the framework and drag the built HTMLKit.framework into your Xcode project.

$ carthage update

CocoaPods

CocoaPods is a dependency manager for Cocoa projects.

If you don't have CocoaPods yet, you can install it with the following command:

$ gem install cocoapods

To add HTMLKit as a dependency to your project using CocoaPods, add the following to your Podfile:

target 'MyTarget' do
  pod 'HTMLKit', '~> 3.1'
end

Then, run the following command:

$ pod install

Swift Package Manager

Swift Package Manager is the package manager for the Swift programming language.

Add HTMLKit to your Package.swift dependencies:

.Package(url: "https://github.com/iabudiab/HTMLKit", majorVersion: 3)

Then run:

$ swift build

Manually

1- Add HTMLKit as git submodule

$ git submodule add https://github.com/iabudiab/HTMLKit.git

2- Open the HTMLKit folder and drag'n'drop the HTMLKit.xcodeproj into the Project Navigator in Xcode to add it as a sub-project.

3- In the General panel of your target, add HTMLKit.framework under Embedded Binaries.

Parsing

Parsing Documents

Given some HTML content, you can parse it either via the HTMLParser or by instantiating an HTMLDocument directly:

NSString *htmlString = @"<div><h1>HTMLKit</h1><p>Hello there!</p></div>";

// Via parser
HTMLParser *parser = [[HTMLParser alloc] initWithString:htmlString];
HTMLDocument *document = [parser parseDocument];

// Via static initializer
HTMLDocument *document = [HTMLDocument documentWithString:htmlString];

Parsing Fragments

You can also parse HTML content as a document fragment with a specified context element:

NSString *htmlString = @"<div><h1>HTMLKit</h1><p>Hello there!</p></div>";

HTMLParser *parser = [[HTMLParser alloc] initWithString: htmlString];

HTMLElement *tableContext = [[HTMLElement alloc] initWithTagName:@"table"];
NSArray *nodes = [parser parseFragmentWithContextElement:tableContext];

for (HTMLNode *node in nodes) {
	NSLog(@"%@", node.outerHTML);
}

// The same parser instance can be reused:
HTMLElement *bodyContext = [[HTMLElement alloc] initWithTagName:@"body"];
nodes = [parser parseFragmentWithContextElement:bodyContext];

The DOM

The DOM tree can be manipulated in several ways. Here are just a few:

  • Create new elements and assign attributes
HTMLElement *description = [[HTMLElement alloc] initWithTagName:@"meta" attributes: @{@"name": @"description"}];
description[@"content"] = @"HTMLKit for iOS & OSX";
  • Append nodes to the document
HTMLElement *head = document.head;
[head appendNode:description];

HTMLElement *body = document.body;
NSArray *nodes = @[
	[[HTMLElement alloc] initWithTagName:@"div" attributes: @{@"class": @"red"}],
	[[HTMLElement alloc] initWithTagName:@"div" attributes: @{@"class": @"green"}],
	[[HTMLElement alloc] initWithTagName:@"div" attributes: @{@"class": @"blue"}]
];
[body appendNodes:nodes];
  • Enumerate child elements and perform DOM editing
[body enumerateChildElementsUsingBlock:^(HTMLElement *element, NSUInteger idx, BOOL *stop) {
	if ([element.tagName isEqualToString:@"div"]) {
		HTMLElement *lorem = [[HTMLElement alloc] initWithTagName:@"p"];
		lorem.textContent = [NSString stringWithFormat:@"Lorem ipsum: %lu", (unsigned long)idx];
		[element appendNode:lorem];
	}
}];
  • Remove nodes from the document
[body removeChildNodeAtIndex:1];
[head removeAllChildNodes];
[body.lastChild removeFromParentNode];
  • Navigate to child and sibling nodes
HTMLNode *firstChild = body.firstChild;
HTMLNode *greenDiv = firstChild.nextSibling;
  • Manipulate the HTML directly
greenDiv.innerHTML = @"<ul><li>item 1<li>item 2";
  • Iterate the DOM tree with custom filters
HTMLNodeFilterBlock *filter = [HTMLNodeFilterBlock filterWithBlock:^HTMLNodeFilterValue (HTMLNode *node) {
	if (node.childNodesCount != 1) {
		return HTMLNodeFilterReject;
	}
	return HTMLNodeFilterAccept;
}];

for (HTMLElement *element in [body nodeIteratorWithShowOptions:HTMLNodeFilterShowElement filter:filter]) {
	NSLog(@"%@", element.outerHTML);
}
  • Create and manipulate DOM Ranges
HTMLDocument *document = [HTMLDocument documentWithString:@"<div><h1>HTMLKit</h1><p id='foo'>Hello there!</p></div>"];
HTMLRange *range = [[HTMLRange alloc] initWithDocument:document];

HTMLNode *paragraph = [document querySelector:@"#foo"];
[range selectNode:paragraph];
[range extractContents];

CSS3 Selectors

All CSS3 Selectors are supported except for the pseudo-elements (::first-line, ::first-letter, etc.). You can use them the way you always have:

// Given the document:
NSString *htmlString = @"<div><h1>HTMLKit</h1><p class='greeting'>Hello there!</p><p class='description'>This is a demo of HTMLKit</p></div>";
HTMLDocument *document = [HTMLDocument documentWithString: htmlString];

// Here are some of the supported selectors
NSArray *paragraphs = [document querySelectorAll:@"p"];
NSArray *paragraphsOrHeaders = [document querySelectorAll:@"p, h1"];
NSArray *hasClassAttribute = [document querySelectorAll:@"[class]"];
NSArray *greetings = [document querySelectorAll:@".greeting"];
NSArray *classNameStartsWith_de = [document querySelectorAll:@"[class^='de']"];

NSArray *hasAdjacentHeader = [document querySelectorAll:@"h1 + *"];
NSArray *hasSiblingHeader = [document querySelectorAll:@"h1 ~ *"];
NSArray *hasSiblingParagraph = [document querySelectorAll:@"p ~ *"];

NSArray *nonParagraphChildOfDiv = [document querySelectorAll:@"div :not(p)"];

HTMLKit also provides an API to create selector instances in a type-safe manner without the need to parse them first. The previous examples would look like this:

NSArray *paragraphs = [document elementsMatchingSelector:typeSelector(@"p")];
NSArray *paragraphsOrHeaders = [document elementsMatchingSelector:
	anyOf(@[
		typeSelector(@"p"), typeSelector(@"h1")
	])
];

NSArray *hasClassAttribute = [document elementsMatchingSelector:hasAttributeSelector(@"class")];
NSArray *greetings = [document elementsMatchingSelector:classSelector(@"greeting")];
NSArray *classNameStartsWith_de = [document elementsMatchingSelector:attributeSelector(CSSAttributeSelectorBegins, @"class", @"de")];

NSArray *hasAdjacentHeader = [document elementsMatchingSelector:adjacentSiblingSelector(typeSelector(@"h1"))];
NSArray *hasSiblingHeader = [document elementsMatchingSelector:generalSiblingSelector(typeSelector(@"h1"))];
NSArray *hasSiblingParagraph = [document elementsMatchingSelector:generalSiblingSelector(typeSelector(@"p"))];

NSArray *nonParagraphChildOfDiv = [document elementsMatchingSelector:
	allOf(@[
		childOfElementSelector(typeSelector(@"div")),
		not(typeSelector(@"p"))
	])
];
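
The sibling combinators used above ("h1 + *", "h1 ~ *", "p ~ *") differ only in how far back they look among the preceding siblings. The following is a small sketch of that difference in plain C (which also compiles as Objective-C), operating on an array of sibling tag names instead of real DOM nodes. The sketch_ function names are hypothetical and not part of HTMLKit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* `tags` holds the tag names of sibling elements in document order. */

/* "X + *": true if the element at `index` is immediately preceded by an X. */
static bool sketch_matches_adjacent(const char *const tags[], size_t index,
                                    const char *x) {
    return index > 0 && strcmp(tags[index - 1], x) == 0;
}

/* "X ~ *": true if any earlier sibling of the element at `index` is an X. */
static bool sketch_matches_general(const char *const tags[], size_t index,
                                   const char *x) {
    for (size_t i = 0; i < index; i++) {
        if (strcmp(tags[i], x) == 0) {
            return true;
        }
    }
    return false;
}
```

For the children of the div in the sample document (h1, p, p), "h1 + *" matches only the first paragraph, "h1 ~ *" matches both paragraphs, and "p ~ *" matches only the second paragraph.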

Here are more examples:

HTMLNode *firstDivElement = [document firstElementMatchingSelector:typeSelector(@"div")];

NSArray *secondChildOfDiv = [firstDivElement querySelectorAll:@":nth-child(2)"];
NSArray *secondOfType = [firstDivElement querySelectorAll:@":nth-of-type(2n)"];

secondChildOfDiv = [firstDivElement elementsMatchingSelector:nthChildSelector(CSSNthExpressionMake(0, 2))];
secondOfType = [firstDivElement elementsMatchingSelector:nthOfTypeSelector(CSSNthExpressionMake(2, 0))];

NSArray *notParagraphAndNotDiv = [firstDivElement querySelectorAll:@":not(p):not(div)"];
notParagraphAndNotDiv = [firstDivElement elementsMatchingSelector:
	allOf(@[
		not(typeSelector(@"p")),
		not(typeSelector(@"div"))
	])
];

One more thing! You can also create your own selectors. You can either subclass CSSSelector or use the block-based wrapper. For example, the previous selector can be implemented like this:

CSSSelector *myAwesomeSelector = namedBlockSelector(@"myAwesomeSelector", ^BOOL (HTMLElement *element) {
	return ![element.tagName isEqualToString:@"p"] && ![element.tagName isEqualToString:@"div"];
});
notParagraphAndNotDiv = [firstDivElement elementsMatchingSelector:myAwesomeSelector];
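
A note on the nth expressions used earlier: CSSNthExpressionMake(0, 2) is paired with ":nth-child(2)" and CSSNthExpressionMake(2, 0) with ":nth-of-type(2n)", which suggests that the two arguments are the coefficient a and the offset b of the CSS "an+b" notation. Under that assumption, the matching rule can be sketched in plain C (which also compiles as Objective-C) as follows. The function name sketch_nth_matches is hypothetical and not part of HTMLKit.

```c
#include <stdbool.h>

/* True if the 1-based position `index` equals a*n + b for some integer n >= 0. */
static bool sketch_nth_matches(long a, long b, long index) {
    long diff = index - b;
    if (a == 0) {
        /* No step: only the position b itself matches. */
        return diff == 0;
    }
    /* n = diff / a must be a non-negative integer. */
    return diff % a == 0 && diff / a >= 0;
}
```

With a = 0 and b = 2 only the second element matches, while a = 2 and b = 0 matches every even position (2, 4, 6, and so on).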

Change Log

See the CHANGELOG.md for more info.

License

HTMLKit is available under the MIT license. See the LICENSE file for more info.

Releases

HTMLKit 3.1.0 - 2019-08-20 16:23:12

Released on 2019.08.20

Added

  • HTMLTreeVisitor that walks the DOM in tree order
  • New HTML serialization implementation based on visitor pattern

Fixes

  • HTML serialization for deeply nested DOM trees (issue #33)
  • Occasional Internal Consistency exceptions when deallocating node iterator (issue #36)

HTMLKit 3.0.0 - 2019-03-28 21:22:05

Released on 2019.03.28

Breaking Change

  • Introduce prefix for NSString and NSCharacterSet categories to prevent collision with existing code (issue #35)

HTMLKit 2.1.5 - 2018-07-16 20:17:43

Released on 2018.07.16

Fixes

  • Parser would handle foreign attributes incorrectly (issue #30)

HTMLKit 2.1.4 - 2018-05-01 18:10:05

Released on 2018.05.01

Fixes

  • gt(n), lt(n) and eq(n) selectors would select wrong elements for the zero-index (issue #25)

HTMLKit 2.1.3 - 2018-03-21 21:54:42

Released on 2018.03.21

Fixes

  • HTMLElement clone would return an immutable dictionary for attributes (issue #20)
    • Fixed by @CRivlaldo in PR #24
  • HTMLNodeFilterBlock would behave differently on simulator and device (issue #22)
    • Fixed by @CRivlaldo in PR #23

HTMLKit 2.1.2 - 2017-11-06 21:06:15

Released on 2017.11.06

Fixes

  • HTMLText serialization (issue #16)
  • HTMLElement attribute value serialization (issue #17)

HTMLKit 2.1.1 - 2017-10-13 21:06:44

Released on 2017.10.13

Hotfix

  • Fixed documentation comments
    • Should fix CocoaDocs generation and percentage

HTMLKit 2.1.0 - 2017-10-12 20:45:45

Released on 2017.10.12

Added

Updated

  • Project for Xcode 9
  • Travis config for iOS 11.0, macOS 10.13, tvOS 11.0 and watchOS 4.0
  • Updated HTML5Lib-Tests submodule (cbafeba)

HTMLKit 2.0.6 - 2017-05-02 13:51:48

Released on 2017.05.02

Added

  • Memory consumption improvements (issue #10)
    • Allocate childNodes collection in HTMLNode only when inserting child nodes
    • Replace NSStringFromSelector calls with constants in HTMLNode validations
    • Improve reverseObjectEnumerator usage while parsing HTML
    • Rewrite internal logic of the HTMLStackOfOpenElements to prevent excessive allocations

HTMLKit 2.0.5 - 2017-04-19 12:23:14

Released on 2017.04.19

Fixed

  • Xcode 8.3 issue with modulemaps
    • Temporary workaround (renamed modulemap file)
  • Memory Leaks in CSSInputStream

Added

  • Minor memory consumption improvements
    • Collections for child nodes or attributes of HTML Nodes or Elements are allocated lazily
    • Underlying data string of CharacterData is allocated on first access
    • Autorelease pool for the main HTMLTokenizer loop

HTMLKit 2.0.4 - 2017-04-19 11:12:39

Released on 2017.04.2

Fixed

  • Testing with Swift 3.1
    • Fixed by @tali in PR #8

Deprecated

  • HTMLRange initializers with typo
    • initWithDowcument:startContainer:startOffset:endContainer:endOffset:

HTMLKit 2.0.3 - 2017-03-05 23:32:53

Released on 2017.03.6

Fixed

  • Compilation for Swift 3.1
    • Fixed by @tali in PR #6

HTMLKit 2.0.2 - 2017-02-26 20:47:00

Released on 2017.02.26

Fixed

  • Retain cycles in HTMLNodeIterator (issue #4)
  • Retain cycles in HTMLRange (issue #5)
  • The layout of HTMLKit tests module for Swift Package Manager

HTMLKit 2.0.1 - 2017-02-20 22:05:13

Released on 2017.02.20

Hotfix

  • Set INSTALL_PATH and DYLIB_INSTALL_NAME_BASE to @rpath for macOS target
    • This fixes embedding HTMLKit in a Cocoa application

HTMLKit 2.0.0 - 2017-02-11 18:24:06

Released on 2017.02.11

Spec Change

Updated

  • Updated HTML5Lib-Tests submodule (13f1805)

HTMLKit 1.1.0 - 2017-01-14 23:03:56

Released on 2017.01.14

Added

  • DOM Ranges implementation (spec)
  • HTMLCharacterData as base class for HTMLText & HTMLComment
    • HTMLText and HTMLComment no longer extend HTMLNode directly
  • splitText implementation for HTMLText nodes
  • index property for HTMLNode
  • cloneNodeDeep method for HTMLNode

Deprecated

  • appendString method in HTMLText in favor of appendData from the superclass HTMLCharacterData

HTMLKit 1.0.0 - 2016-09-28 00:11:50

Released on 2016.09.28

Added

  • Jazzy configuration file
  • Example HTMLKit project

Updated

  • Project for Xcode 8
  • Playground syntax for Swift 3
  • Travis config for iOS 10.0, macOS 10.12, tvOS 10.0 and watchOS 3.0
  • Deployment targets to macOS 10.9, iOS 9.0, tvOS 9.0 and watchOS 2.0

Fixed

  • Nullability annotation in CSSSelectorParser class
  • Missing lightweight generics in HTMLParser, HTMLNode & HTMLElement

HTMLKit 0.9.4 - 2016-09-03 15:25:00

Released on 2016.09.03

Added

  • Swift Package Manager support

HTMLKit 0.9.3 - 2016-07-16 13:02:02

Released on 2016.07.16

This release passes all html5lib-tests as of 2016.07.16

Added

  • watchOS and tvOS targets
  • Updated HTML5Lib-Tests submodule (c305da7)

HTMLKit 0.9.2 - 2016-05-18 19:19:00

Released on 2016.05.18

This release passes all tokenizer and tree-construction html5lib-tests as of 2016.05.18

Added

  • Handling for <menu> and <menuitem>
  • Changelog

Changed

  • Updated adoption agency algorithm according to the latest specification, see:
  • <isindex> is completely removed from the spec now, therefore it is dropped from the implementation
  • Tokenizer and Tree-Construction tests are now generated dynamically
  • Test failures are collected by a XCTestObservation for better reporting

Fixed

  • Parser now checks the qualified name instead of the local name when handling elements in the MathML and SVG namespaces

HTMLKit 0.9.1 - 2016-02-01 21:05:55

Released on 2016.01.29

Added

  • Travis-CI integration.
  • CocoaPods spec.

Changed

  • Warnings are treated as errors.

Fixed

  • Warnings related to format specifier and loss of precision due to NS(U)-integer usage.
  • Replaced @returns with @return throughout the documentation to play nicely with Jazzy.
  • Some README examples used Swift syntax.

HTMLKit 0.9.0 - 2016-02-01 21:05:36

Released on 2015.12.23

This is the first public release of HTMLKit.

Added

  • iOS & OSX Frameworks.
  • Source code documentation.
  • CSS Selectors extension (analogous to jQuery selectors).
  • DOMTokenList for manipulating HTMLElement attributes as a list, e.g. class.
  • Handling for <ruby> elements in the Parser implementation.
    • Updated HTML5Lib-Tests submodule (56c435f)
  • Xcode Playground with Swift documentation.

Removed

  • Unused namespaces.
  • Historical node types.

Fixed

  • lt, gt & eq CSS Selectors method declarations.

HTMLKit 0.3.0 - 2016-02-01 21:05:18

Released on 2015.11.29

Added

  • CSS3 Selectors support.
  • Nullability annotations.
  • HTMLNode properties for previous and next sibling elements.
  • HTMLNode methods for accessing child elements (analogous to child nodes).
  • NSCharacterSet category for HTML-related character sets.

Fixed

  • InputStreamReader's reconsume-logic that is required by the CSS Parser.

HTMLKit 0.2.0 - 2016-02-01 21:04:59

Released on 2015.06.06

Added

  • HTMLDocument methods to access root, head & body elements.
  • innerHTML implementation for the HTMLElement.
  • HTMLNode methods to append, prepend, check containment and descendancy of nodes.
  • HTMLNode methods to enumerate child nodes.
  • Implementations for NodeIterator and NodeFilter
  • Implementation for TreeWalker
  • Validation for DOM manipulations.
  • Tests for the DOM implementation.

Changed

  • type property renamed to nodeType in HTMLNode.
  • firstChildNode and lastChildNode renamed to firstChild and lastChild in HTMLNode.

Removed

  • baseURI property from HTMLNode
  • HTMLNodeTreeEnumerator is superseded by the HTMLNodeIterator.

HTMLKit 0.1.0 - 2016-02-01 21:04:35

Released on 2015.04.20

Added

  • Initial release.
  • Initial DOM implementation.
  • Tokenizer and Parser pass all HTML5Lib tokenizer and tree construction tests except for <ruby> elements.