
SWCompression


A framework with (de)compression algorithms and functions for processing various archives and containers.

What is this?

SWCompression is a framework with a collection of functions for:

  1. Decompression (and sometimes compression) using different algorithms.
  2. Reading (and sometimes writing) archives of different formats.
  3. Reading (and sometimes writing) containers such as ZIP, TAR and 7-Zip.

It works on both Apple platforms and Linux.

All features are listed in the tables below. "TBD" means that the feature is planned but not implemented (yet).

|               | Deflate | BZip2 | LZMA/LZMA2 |
| ------------- | ------- | ----- | ---------- |
| Decompression | ✅      | ✅    | ✅         |
| Compression   | ✅      | ✅    | TBD        |

|       | Zlib | GZip | XZ  | ZIP | TAR | 7-Zip |
| ----- | ---- | ---- | --- | --- | --- | ----- |
| Read  | ✅   | ✅   | ✅  | ✅  | ✅  | ✅    |
| Write | ✅   | ✅   | TBD | TBD | ✅  | TBD   |

Also, SWCompression is written entirely in Swift.

Installation

SWCompression can be integrated into your project using Swift Package Manager, CocoaPods or Carthage.

Swift Package Manager

Add SWCompression to your package dependencies and specify it as a dependency for your target, e.g.:

// swift-tools-version:4.0
import PackageDescription

let package = Package(
    name: "PackageName",
    dependencies: [
        .package(url: "https://github.com/tsolomko/SWCompression.git",
                 from: "4.5.0")
    ],
    targets: [
        .target(
            name: "TargetName",
            dependencies: ["SWCompression"]
        )
    ]
)

You can find more details in Swift Package Manager's documentation.

CocoaPods

Add pod 'SWCompression', '~> 4.5' and use_frameworks! to your Podfile.

To complete installation, run pod install.

If you need only some parts of the framework, you can install just those parts using subspecs. Available subspecs:

  • SWCompression/BZip2
  • SWCompression/Deflate
  • SWCompression/Gzip
  • SWCompression/LZMA
  • SWCompression/LZMA2
  • SWCompression/SevenZip
  • SWCompression/TAR
  • SWCompression/XZ
  • SWCompression/Zlib
  • SWCompression/ZIP

"Optional Dependencies"

Both ZIP and 7-Zip have a most commonly used compression method: Deflate for ZIP and LZMA/LZMA2 for 7-Zip. Thus, the SWCompression/ZIP subspec depends on the SWCompression/Deflate subspec, and the SWCompression/SevenZip subspec depends on the SWCompression/LZMA subspec.

However, both of these formats support other compression methods as well, and some of them are implemented in SWCompression. For CocoaPods configurations, support for such compression methods is provided through a kind of "optional dependencies".

"Optional dependency" in this context means that SWCompression/ZIP or SWCompression/SevenZip will support a particular compression method only if the corresponding subspec is explicitly specified in your Podfile and installed.

List of "optional dependencies":

  • For SWCompression/ZIP:
    • SWCompression/BZip2
    • SWCompression/LZMA
  • For SWCompression/SevenZip:
    • SWCompression/BZip2
    • SWCompression/Deflate

Note: If you use Swift Package Manager or Carthage, you always get everything (ZIP and 7-Zip are built with Deflate, BZip2 and LZMA/LZMA2 support).

Carthage

Add to your Cartfile github "tsolomko/SWCompression" ~> 4.5.

Then run carthage update.

Finally, drag and drop SWCompression.framework from Carthage/Build folder into the "Embedded Binaries" section on your targets' "General" tab in Xcode.

SWCompression uses the BitByteData framework, so Carthage will also download it, and you should drag and drop BitByteData.framework into the "Embedded Binaries" section as well.

Usage

Basic Example

If you'd like to decompress "deflated" data, just use:

// let data = <Your compressed data>
let decompressedData = try? Deflate.decompress(data: data)

However, it is unlikely that you will encounter deflated data outside of an archive. So, in the case of a GZip archive, you should use:

let decompressedData = try? GzipArchive.unarchive(archive: data)

Handling Errors

Most SWCompression functions can throw an error, and you are responsible for handling them. If you look at the list of available error types and their cases, you may be frightened by their number. However, most of these cases (such as XZError.wrongMagic) exist for diagnostic purposes.

Thus, you only need to handle the most common type of error for your archive/algorithm. For example:

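import Foundation
import SWCompression

do {
    // Replace "archive.xz" with the location of your compressed data.
    let data = try Data(contentsOf: URL(fileURLWithPath: "archive.xz"))
    let decompressedData = try XZArchive.unarchive(archive: data)
    print("Decompressed \(decompressedData.count) bytes")
} catch let error as XZError {
    // Handle XZ-related errors here (e.g. XZError.wrongMagic).
    print("XZ error: \(error)")
} catch {
    // Handle all other errors here (including file-reading errors).
    print("Other error: \(error)")
}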

Or, if you don't care about errors at all, use try?.

Documentation

Every function and type in SWCompression's public API is documented. This documentation can be found on its own website.

Sophisticated example

There is a small command-line program, "swcomp", which is included in this repository in "Sources/swcomp". To build it, you need to uncomment several lines in "Package.swift" and run swift build -c release.

Contributing

Whether you have found a bug or have a suggestion, an idea or something else, please create an issue on GitHub.

If you have encountered a bug, it would be especially helpful if you attached the file (archive, etc.) that caused the bug to happen.

If you'd like to contribute code, please create a pull request on GitHub.

Note: If you are considering working on SWCompression, please note that the Xcode project (SWCompression.xcodeproj) was created manually, so you shouldn't use the swift package generate-xcodeproj command.

Executing tests locally

If you'd like to run tests on your computer, you need to do an additional step after cloning this repository:

git submodule update --init --recursive

This command downloads the files which are used for testing. These files are stored with Git LFS in a separate repository. They are kept separate because Swift Package Manager has some problems with Git LFS-enabled repositories (installing git-lfs locally with the --skip-smudge option is required to solve these problems).

Note: You can also use "Utils/prepare-workspace-macos.sh" script from the repository, which not only downloads test files but also downloads dependencies.

Performance

Using whole-module optimization is recommended for best performance. It is enabled by default for Release configurations.

The Tests Results document contains benchmark results for various functions.

Why?

First of all, existing solutions for working with compression, archives and containers have certain disadvantages. They might not support a particular compression algorithm or archive format, and they all have different APIs, which can sometimes be slightly confusing for users. This project attempts to provide missing (and sometimes existing) functionality through a unified API which is easy to use and remember.

Secondly, it may be important to have a compression framework written completely in Swift, without relying on either system libraries or solutions implemented in other languages. Additionally, since SWCompression is written entirely in Swift without Objective-C, it can also be used on Linux.

Future plans

See 5.0 Update Project for the list of planned API changes and new features.

  • Performance...
  • Better Deflate compression.
  • Something else...

Support Financially

If you would like to support this project or me financially you can do so via PayPal using this link.

License

MIT licensed


Releases

4.5.0 - Sep 11, 2018

  • Added LZMAProperties struct with simple member-wise initializer.
  • Added LZMA.decompress(data:properties:uncompressedSize:) function (with the uncompressedSize argument being optional), which allows specifying LZMA properties.
    • Useful in situations where properties are known from some external source instead of being encoded at the beginning of data.
    • Note that these new APIs are intended to be used by expert users, and as such no validation is performed on LZMA property values.
  • Added support for Delta "filter" in both XZ archives and 7-Zip containers.
  • Added support for SHA-256 check type in XZ archives.
    • As a result, XZError.checkTypeSHA256 is now never thrown and will be removed in the next major update.
  • Added ZipEntryInfo.crc property.
  • Fixed a problem where the XZArchive.unarchive and XZArchive.splitUnarchive functions would produce an incorrect result when more than one "filter" was used (though this issue was practically impossible to encounter, since LZMA2 was the only supported filter until this update).
  • Reduced in-memory size of ZipEntryInfo instances.
    • Some rough estimates indicate that the reduction is up to 68%.
  • Clarified documentation for LZMA.decompress(data:) to explain expectations about the data argument.
    • In particular, it now explains that the function expects LZMA properties, encoded with the standard LZMA encoding scheme, at the beginning of data.
  • swcomp changes:
    • zip -i command now also prints CRC32 for all entries.
    • -v is now accepted as an alias for --verbose option.
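For context on the "standard LZMA encoding scheme" mentioned above: in the classic .lzma format, the properties are one byte packing lc, lp and pb as (pb * 5 + lp) * 9 + lc, followed by a 32-bit little-endian dictionary size. The sketch below decodes those five bytes. `DecodedLZMAHeader`, `LZMAHeaderError` and `decodeLZMAHeader` are hypothetical names, not SWCompression APIs, and the 64-bit uncompressed size that follows in .lzma files is deliberately not handled.

```swift
// Hypothetical helper types; not part of SWCompression's API.
struct DecodedLZMAHeader {
    let lc: Int                 // literal context bits (0...8)
    let lp: Int                 // literal position bits (0...4)
    let pb: Int                 // position bits (0...4)
    let dictionarySize: UInt32
}

enum LZMAHeaderError: Error {
    case tooShort
    case invalidPropertiesByte(UInt8)
}

/// Decodes the properties byte and the little-endian dictionary size
/// from the first five bytes of a classic .lzma header.
func decodeLZMAHeader(_ bytes: [UInt8]) throws -> DecodedLZMAHeader {
    guard bytes.count >= 5 else { throw LZMAHeaderError.tooShort }
    var d = Int(bytes[0])
    // Largest valid value is (4 * 5 + 4) * 9 + 8 = 224.
    guard d <= 224 else { throw LZMAHeaderError.invalidPropertiesByte(bytes[0]) }
    let lc = d % 9
    d /= 9
    let lp = d % 5
    let pb = d / 5
    var dictionarySize: UInt32 = 0
    for i in 0..<4 {
        // Byte i contributes bits 8*i ..< 8*(i+1).
        dictionarySize |= UInt32(bytes[1 + i]) << UInt32(8 * i)
    }
    return DecodedLZMAHeader(lc: lc, lp: lp, pb: pb, dictionarySize: dictionarySize)
}
```

Values like these are the kind of information LZMAProperties is meant to carry when properties come from an external source; consult its documentation for the exact initializer labels rather than relying on this sketch.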

4.5.0-test - Sep 6, 2018

This is the first and only test release for the upcoming 4.5.0 update. It includes the new LZMAProperties APIs, support for the SHA-256 check in XZ archives and support for the Delta filter in 7-Zip and XZ, as well as a couple of fixes.
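The Delta filter stores each byte as the difference (modulo 256) from the byte `distance` positions earlier, which tends to make regular data such as multi-channel audio more compressible; a decoder reverses it after LZMA/LZMA2 decompression. A minimal sketch of both directions, assuming the 1...256 distance range used by XZ and a zero-filled history before the first byte. `deltaEncode` and `deltaDecode` are illustrative names, not SWCompression's implementation.

```swift
/// Hypothetical Delta filter encoder (distance 1...256).
/// Bytes before the start of the buffer are treated as zero.
func deltaEncode(_ input: [UInt8], distance: Int) -> [UInt8] {
    precondition((1...256).contains(distance), "distance must be in 1...256")
    var output = input
    for i in input.indices where i >= distance {
        // Store the wrapping difference from the byte `distance` positions earlier.
        output[i] = input[i] &- input[i - distance]
    }
    return output
}

/// Hypothetical Delta filter decoder: the inverse of deltaEncode.
func deltaDecode(_ input: [UInt8], distance: Int) -> [UInt8] {
    precondition((1...256).contains(distance), "distance must be in 1...256")
    var output = input
    for i in output.indices where i >= distance {
        // Add back the already-decoded byte `distance` positions earlier.
        output[i] = output[i] &+ output[i - distance]
    }
    return output
}
```

Decoding runs front to back because each output byte depends on an earlier, already-restored byte.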

Known issue: no documentation for new APIs.

4.4.0 - Aug 9, 2018

A couple of side notes before diving into release notes:

  1. I've started a GitHub project board where I am going to track and plan changes and additions for the 5.0 Update.
  2. If you have ever wanted to financially support either this project or me, you can now do so using this link.

Creating TAR containers

The main addition in this update is a set of APIs which allow creating new TAR containers.

  • Added TarContainer.create(from:) function which creates a new TAR container with provided array of TarEntry objects as its content and generates container's Data.
  • Added TarCreateError error type with a single case, utf8NonEncodable. Comment: This enum is planned to be merged with TarError in 5.0. A new enum had to be created because adding a new case to an already existing enum would be a breaking change.

To enable reasonable usage scenarios for these new APIs, additional changes to existing APIs have been made:

  • TarEntry.info and TarEntry.data are now var-properties (instead of let).
  • Setting TarEntry.data now automatically updates TarEntry.info.size with the data.count value (or 0 if data is nil). Comment: Maintaining consistency between these two properties is extremely important for producing correct and valid containers.
  • Added (or, rather, made public) TarEntry.init(info:data:) initializer.
  • Most public properties of TarEntryInfo are now var-properties (instead of let). Exceptions: size and type. Comment: The size property is kept read-only for the reasons mentioned above. The reason for not allowing the type property to be mutated is vaguer: it is hard to imagine a usage scenario where changing the type of an entry makes sense. Moreover, there are some concerns about (potential future) behavior in a more generic context with type-erased ContainerEntryInfo objects, etc.
  • Added TarEntryInfo.init(name:type:) initializer.
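The invariant described above (setting data refreshes size, while size itself cannot be set directly) can be modelled with a `didSet` observer. This sketch uses hypothetical `SketchEntryInfo` and `SketchEntry` types, not SWCompression's TarEntryInfo and TarEntry, and mirrors only the size behaviour.

```swift
import Foundation

// Hypothetical stand-ins for TarEntryInfo and TarEntry; they only mirror
// the size-synchronisation behaviour described in the release notes.
struct SketchEntryInfo {
    var name: String
    // Readable everywhere, but settable only from code in this source file.
    fileprivate(set) var size: Int

    init(name: String) {
        self.name = name
        self.size = 0
    }
}

struct SketchEntry {
    var info: SketchEntryInfo

    var data: Data? {
        // Every assignment to `data` refreshes `info.size`.
        didSet { info.size = data?.count ?? 0 }
    }

    init(info: SketchEntryInfo, data: Data?) {
        self.info = info
        self.data = data
        // Property observers do not fire during initialization, so set size explicitly.
        self.info.size = data?.count ?? 0
    }
}
```

The explicit assignment in the initializer matters: Swift does not call `didSet` for assignments made inside the type's own initializer.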

I do realize that this set of APIs is somewhat limited. For example, it is not easy to convert a ZIP container (an array of ZipEntry objects) to TAR using these new additions. But rest assured, there are plans to provide more generic functionality for creating new containers in the future (something like TarContainer.create(from entries: [ContainerEntry]) throws -> Data).

Other Changes

  • Improved compatibility with other TAR implementations:
    • All string fields of TAR headers are now treated as UTF-8 strings. Comment: This is compatible with previous behavior since ASCII strings are UTF-8 strings.
    • Non-well-formed numeric fields of TAR headers no longer cause TarError.wrongField to be thrown and instead result in nil values of the corresponding properties of TarEntryInfo (exception: the size field). Comment: This change was mainly made to accommodate situations where a TAR header field is absent (i.e. filled with NULLs). An absent size field is still not accepted, since its value impacts the structure of the container. This particular behavior is consistent with other implementations.
    • Base-256 encoding of numeric fields, which is sometimes used for very big or negative values, is now supported.
    • Leading NULLs and whitespaces in numeric fields are now correctly skipped.
    • Sun Extended Headers are now processed as local PAX extended headers instead of being considered entries with .unknown type.
    • GNU TAR format features for incremental backups are now partially supported (access and creation time).
  • TarContainer.formatOf now correctly returns TarFormat.gnu when GNU format "magic" field is encountered.
  • A new (copy) Data object is now created for the TarEntry.data property instead of using a slice of the input container data. Comment: This change makes indices of TarEntry.data zero-based, which is consistent with other containers. It should also prevent the Data of the entire container from being kept in memory until the TarEntry object is destroyed.
  • Fixed incorrect file name of TAR entries from containers with GNU TAR format-specific features being used.
  • Fixed TarError.wrongPaxHeaderEntry error being thrown when header with multi-byte UTF-8 characters is encountered.
  • Fixed incorrect values of TarEntryInfo.ownerID, groupID, deviceMajorNumber and deviceMinorNumber properties (previously, they were assumed to be encoded as decimal numbers).
  • Slightly improved performance of LZMA/LZMA2 operations by making internal classes declared as final.
  • swcomp changes:
    • Added -c, --create option to tar command which creates a new TAR container.
    • Output of benchmark commands is now properly flushed on non-Linux platforms.
    • Results for omitted iterations of benchmark commands are now also printed.
    • Iteration number in benchmark commands is now printed with leading zeros.
    • Fixed compilation error on Linux platforms due to ObjCBool no longer being an alias for Bool.
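The numeric-field rules listed above (absent fields become nil, leading NULs and spaces are skipped, base-256 is accepted) can be sketched as a single parsing function. `parseTarNumber` is a hypothetical helper, not SWCompression's implementation; the base-256 branch assumes the GNU convention where a set high bit in the first byte marks binary encoding and bit 0x40 marks a negative two's-complement value.

```swift
/// Hypothetical helper: parses a numeric TAR header field (mode, uid, size, mtime, ...).
/// Returns nil for absent or malformed fields.
func parseTarNumber(_ field: [UInt8]) -> Int64? {
    guard let first = field.first else { return nil }

    // Base-256 (GNU extension): the high bit of the first byte is set.
    if first & 0x80 != 0 {
        let inv: UInt8 = (first & 0x40 != 0) ? 0xFF : 0x00  // 0xFF means negative
        var x: UInt64 = 0
        for (i, byte) in field.enumerated() {
            var c = byte ^ inv
            if i == 0 { c &= 0x7F }                  // drop the base-256 marker bit
            guard x >> 56 == 0 else { return nil }   // next shift would overflow
            x = (x << 8) | UInt64(c)
        }
        guard x >> 63 == 0 else { return nil }
        return inv == 0xFF ? ~Int64(x) : Int64(x)
    }

    // Octal ASCII: skip leading NULs and spaces, stop at the first NUL or space.
    var index = 0
    while index < field.count && (field[index] == 0 || field[index] == 0x20) {
        index += 1
    }
    var value: Int64 = 0
    var sawDigit = false
    while index < field.count {
        let byte = field[index]
        if byte == 0 || byte == 0x20 { break }
        guard byte >= 0x30 && byte <= 0x37 else { return nil }  // not '0'...'7'
        guard value <= Int64.max / 8 else { return nil }        // overflow guard
        value = value * 8 + Int64(byte - 0x30)
        sawDigit = true
        index += 1
    }
    return sawDigit ? value : nil  // only NULs/spaces means the field is absent
}
```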

4.4.0-test - Aug 5, 2018

This is the first and only test release for the upcoming 4.4.0 update. It includes functionality for creating new TAR containers as well as numerous fixes for TAR open/info functions.

Known issue: no documentation for new APIs.

4.3.0 - Apr 29, 2018

ZIP Custom Extra Fields

The ZIP format allows third parties to define their own extra fields, so it is impossible for SWCompression to support all possible extra fields. This update adds several APIs which allow users to define their own extra fields (aka "custom extra fields") and make SWCompression recognize them. All extra fields previously supported by SWCompression (aka "standard extra fields") are still supported.

  • Added ZipExtraField protocol.
  • Added ZipExtraFieldLocation enum.
  • Added ZipContainer.customExtraFields property.
  • Added ZipEntryInfo.customExtraFields property.

To add support for a custom extra field, one must first create a type which conforms to the ZipExtraField protocol. Then this type must be added to the ZipContainer.customExtraFields dictionary with the key equal to the id property of the type being added. If a custom extra field is found during execution of the open(container:) or info(container:) functions, it will be processed using the initializer of the provided type and stored in the ZipEntryInfo.customExtraFields property of the entry where this extra field was found.

Note: It is impossible to define a custom extra field with the same ID as any of the standard extra fields and make SWCompression use the user-defined extra field instead of the standard one (i.e. SWCompression first checks whether the ID is one of the standard IDs and only then tries to find it in the ZipContainer.customExtraFields dictionary).
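An extra-field area is a sequence of records, each made of a 2-byte header ID and a 2-byte data size (both little-endian) followed by the data. The sketch below splits such an area and applies the "standard IDs first" precedence described in the note. All names are hypothetical, the custom dictionary maps IDs to plain names instead of ZipExtraField types, and the set of standard IDs is only an illustrative subset, not SWCompression's list.

```swift
/// One raw ZIP extra-field record (hypothetical type).
struct RawExtraField {
    let id: UInt16
    let payload: [UInt8]
}

/// Splits an extra-field area into records; returns nil if a record is truncated.
func splitExtraFields(_ bytes: [UInt8]) -> [RawExtraField]? {
    var fields: [RawExtraField] = []
    var offset = 0
    while offset < bytes.count {
        guard offset + 4 <= bytes.count else { return nil }  // partial record header
        let id = UInt16(bytes[offset]) | (UInt16(bytes[offset + 1]) << 8)
        let size = Int(UInt16(bytes[offset + 2]) | (UInt16(bytes[offset + 3]) << 8))
        offset += 4
        guard offset + size <= bytes.count else { return nil }  // payload runs past the end
        fields.append(RawExtraField(id: id, payload: Array(bytes[offset..<(offset + size)])))
        offset += size
    }
    return fields
}

enum ExtraFieldKind: Equatable {
    case standard(UInt16)
    case custom(String)
    case unknown(UInt16)
}

/// Standard IDs are checked first, so a custom entry can never shadow them.
func classify(_ field: RawExtraField, customFields: [UInt16: String]) -> ExtraFieldKind {
    // Illustrative subset: Zip64 (0x0001), extended timestamp (0x5455),
    // Info-ZIP Unix UID/GID (0x7875).
    let standardIDs: Set<UInt16> = [0x0001, 0x5455, 0x7875]
    if standardIDs.contains(field.id) { return .standard(field.id) }
    if let name = customFields[field.id] { return .custom(name) }
    return .unknown(field.id)
}
```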

TAR Formats

  • Added TarContainer.Format enum which represents various formats of TAR containers.
  • Added TarContainer.formatOf(container:) function which returns format of the TAR container.
  • Added -f, --format option to swcomp's tar command which prints format of the TAR container.

Comment: In the context of TAR containers "format" means a set of extensions to the basic TAR container layout which must be supported to successfully process given container.
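As a rough illustration of what "format" means here, POSIX ustar and GNU headers can be told apart by the magic and version fields at offsets 257..<265 of a 512-byte header block: "ustar\0" + "00" for ustar and "ustar " + " \0" for GNU. This is a hypothetical sketch, not TarContainer.formatOf(container:); the real function presumably has to look beyond a single header (for example, to detect PAX extended headers), while this one inspects only one block.

```swift
/// Hypothetical classification based only on a single header's magic/version.
enum SketchTarFormat: Equatable {
    case prePosix   // no recognised magic (also what a non-TAR block yields)
    case ustar      // POSIX ustar: "ustar\0" + "00"
    case gnu        // GNU: "ustar " + " \0"
}

/// Looks at bytes 257..<265 (magic + version) of a 512-byte TAR header block.
func sketchFormat(ofHeader header: [UInt8]) -> SketchTarFormat? {
    guard header.count >= 512 else { return nil }
    let magicAndVersion = Array(header[257..<265])
    let ustarMagic: [UInt8] = Array("ustar".utf8) + [0] + Array("00".utf8)
    let gnuMagic: [UInt8] = Array("ustar  ".utf8) + [0]
    switch magicAndVersion {
    case ustarMagic: return .ustar
    case gnuMagic: return .gnu
    default: return .prePosix
    }
}
```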

Benchmark changes

  • Number of benchmark iterations increased from 6 to 10.
  • Benchmarks now have a zeroth iteration which is excluded from averages.

Comment: For some reason, benchmarked functions perform significantly worse the first time they are executed than in any of the following iterations. So it was decided to drop this "zeroth" iteration from the calculation of averages. This change, of course, artificially improves benchmark results but, hopefully, makes them more reliable. On the other hand, the increased number of iterations aims to improve the accuracy of benchmarks in general.
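The averaging scheme above (one warm-up run plus 10 measured iterations, with the warm-up excluded) can be sketched as follows. `measure` and `averageExcludingWarmUp` are hypothetical helpers, not swcomp's actual benchmark code; timings use `ProcessInfo.processInfo.systemUptime`.

```swift
import Foundation

/// Runs `body` once as a warm-up ("zeroth" iteration) plus `iterations` measured
/// times and returns all timings, warm-up first.
func measure(iterations: Int, _ body: () -> Void) -> [TimeInterval] {
    precondition(iterations >= 0)
    var timings: [TimeInterval] = []
    for _ in 0...iterations {
        let start = ProcessInfo.processInfo.systemUptime
        body()
        timings.append(ProcessInfo.processInfo.systemUptime - start)
    }
    return timings
}

/// Averages the timings while ignoring the zeroth (warm-up) run.
/// Returns nil if nothing is left to average.
func averageExcludingWarmUp(_ timings: [TimeInterval]) -> TimeInterval? {
    let measured = timings.dropFirst()
    guard !measured.isEmpty else { return nil }
    return measured.reduce(0, +) / Double(measured.count)
}
```

With `iterations: 10`, this performs 11 runs and averages the last 10.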

Other changes

  • Updated to support Swift 4.1.
  • Minimum required version of BitByteData is now 1.2.0.
  • Added TarEntryInfo.compressionMethod property which is always equal to .copy.
  • Added documentation for Container.Entry and ContainerEntry.Info associated types.
  • Reverted the "disable symbol stripping" change from the 4.2.0 update, since the underlying problem was fixed in Carthage.