SWCompression

A framework with (de)compression algorithms and functions for processing various archives and containers.

What is this?

SWCompression is a framework with a collection of functions for:

  1. Decompression (and sometimes compression) using different algorithms.
  2. Reading (and sometimes writing) archives of different formats.
  3. Reading containers such as ZIP, TAR and 7-Zip.

It works on both Apple platforms and Linux.

All features are listed in the tables below. "TBD" means that the feature is planned but not yet implemented.

|               | Deflate | BZip2 | LZMA/LZMA2 |
| ------------- | ------- | ----- | ---------- |
| Decompression | ✅      | ✅    | ✅         |
| Compression   | ✅      | ✅    | TBD        |

|       | Zlib | GZip | XZ  |
| ----- | ---- | ---- | --- |
| Read  | ✅   | ✅   | ✅  |
| Write | ✅   | ✅   | TBD |

|       | ZIP | TAR | 7-Zip |
| ----- | --- | --- | ----- |
| Read  | ✅  | ✅  | ✅    |
| Write | TBD | TBD | TBD   |

Also, SWCompression is written entirely in Swift.

Installation

SWCompression can be integrated into your project using Swift Package Manager, CocoaPods or Carthage.

Swift Package Manager

Add SWCompression to your package dependencies and also specify it as a dependency for your target, e.g.:

import PackageDescription

let package = Package(
    name: "PackageName",
    dependencies: [
        .package(url: "https://github.com/tsolomko/SWCompression.git",
                 from: "4.0.0")
    ],
    targets: [
        .target(
            name: "TargetName",
            dependencies: ["SWCompression"]
        )
    ]
)

You can find more details in the Swift Package Manager documentation.

CocoaPods

Add pod 'SWCompression' to your Podfile.

If you need only some parts of the framework, you can install just those parts using subspecs. Available subspecs:

  • SWCompression/BZip2
  • SWCompression/Deflate
  • SWCompression/Gzip
  • SWCompression/LZMA
  • SWCompression/LZMA2
  • SWCompression/SevenZip
  • SWCompression/TAR
  • SWCompression/XZ
  • SWCompression/Zlib
  • SWCompression/ZIP

Also, do not forget to include the use_frameworks! line in your Podfile.

To complete installation, run pod install.

"Optional Dependencies"

Both ZIP and 7-Zip containers have a single compression method that is most likely to be used for the data inside them: Deflate for ZIP and LZMA/LZMA2 for 7-Zip. Thus, the SWCompression/ZIP subspec has the SWCompression/Deflate subspec as a dependency, and the SWCompression/SevenZip subspec has the SWCompression/LZMA subspec as a dependency.

However, both of these formats support other compression methods, and some of them are implemented in SWCompression. For CocoaPods configurations, there is a sort of "optional dependency" for such compression methods.

"Optional dependency" in this context means that SWCompression/ZIP or SWCompression/SevenZip will support a particular compression method only if the corresponding subspec is explicitly specified in your Podfile and installed.

List of "optional dependencies":

  • For SWCompression/ZIP:
    • SWCompression/BZip2
    • SWCompression/LZMA
  • For SWCompression/SevenZip:
    • SWCompression/BZip2
    • SWCompression/Deflate

Note: If you use Carthage or Swift Package Manager you always have the full package and ZIP and 7-Zip are built with Deflate, BZip2 and LZMA/LZMA2 support.

Carthage

Add github "tsolomko/SWCompression" to your Cartfile.

Then run carthage update.

Finally, drag and drop SWCompression.framework from the Carthage/Build folder into the "Embedded Binaries" section on your target's "General" tab in Xcode.

Usage

Basic Example

If you'd like to decompress "deflated" data just use:

// let data = <Your compressed data>
let decompressedData = try? Deflate.decompress(data: data)

However, it is unlikely that you will encounter deflated data outside of an archive. So, for a GZip archive, you should use:

let decompressedData = try? GzipArchive.unarchive(archive: data)
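
When the format of incoming data is not known in advance, one way to decide which function to call is to look at the leading "magic" bytes. The sketch below is only an illustration: `SniffedFormat` and `sniffFormat(of:)` are hypothetical names that are not part of SWCompression, and the check is a heuristic (raw Deflate data has no signature, and an empty ZIP container starts with different bytes).

```swift
import Foundation

// Hypothetical helper, NOT part of SWCompression: guesses the format of
// `data` from the well-known signature bytes at its start.
enum SniffedFormat {
    case gzip, bzip2, xz, zip, sevenZip, unknown
}

func sniffFormat(of data: Data) -> SniffedFormat {
    let signatures: [(SniffedFormat, [UInt8])] = [
        (.gzip, [0x1F, 0x8B]),
        (.bzip2, [0x42, 0x5A, 0x68]),                      // "BZh"
        (.xz, [0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00]),
        (.zip, [0x50, 0x4B, 0x03, 0x04]),                  // "PK" 03 04
        (.sevenZip, [0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C])
    ]
    // The longest signature is six bytes, so a six-byte prefix is enough.
    let bytes = [UInt8](data.prefix(6))
    for (format, magic) in signatures where bytes.starts(with: magic) {
        return format
    }
    return .unknown
}
```

A caller could switch over the result and pass the data to GzipArchive, XZArchive and so on, falling back to an error for `.unknown`.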

Handling Errors

Most SWCompression functions can throw an error, and you are responsible for handling it. If you look at the list of available error types and their cases, you may be frightened by their number. However, most of these cases (such as XZError.wrongMagic) exist for diagnostic purposes.

Thus, you only need to handle the most common type of error for your archive/algorithm. For example:

do {
    // let data = <Your compressed data>
    let decompressedData = try XZArchive.unarchive(archive: data)
    // Use decompressedData here.
} catch let error as XZError {
    // Handle XZ-related errors here.
    print("XZ error: \(error)")
} catch {
    // Handle all other errors here.
    print("Unexpected error: \(error)")
}

Or, if you don't care about errors at all, use try?.

Documentation

Every function and type in SWCompression's public API is documented. This documentation can be found on its own website.

Sophisticated example

There is a small command-line program, "swcomp", which is included in this repository in "Sources/swcomp". To build it you need to uncomment several lines in "Package.swift" and run swift build -c release.

Contributing

Whether you find a bug, have a suggestion, idea or something else, please create an issue on GitHub.

If you have encountered a bug, it would be especially helpful if you attached the file (archive, etc.) that caused it.

If you'd like to contribute code, please create a pull request on GitHub.

Executing tests locally

If you'd like to run tests on your computer, you need to do some additional steps after cloning this repository:

git submodule update --init --recursive
cd Tests/Test\ Files
git lfs pull

These commands fetch example archives and other files used for testing. These files are stored in a separate repository because they are tracked with Git LFS, and Swift Package Manager has some problems with Git LFS-enabled repositories (to work around them, git-lfs must be installed locally with the --skip-smudge option).

Performance

Using whole module optimization is recommended for best performance. This optimization is enabled by default for Release configurations.

The Tests Results document contains the results of performance testing of various functions.

Why?

First of all, existing solutions for working with compression, archives and containers have some problems. They might not support particular compression algorithms or archive formats, and they all have different APIs, which can sometimes be slightly "unfriendly" to users. This project attempts to provide missing (and sometimes existing) functionality through a unified API that is easy to use and remember.

Secondly, it may be important to have a compression framework written completely in Swift, without relying on either system libraries or solutions implemented in other languages. Additionally, since SWCompression is written fully in Swift without Objective-C, it can also be compiled on Linux.

Future plans

  • Performance...
  • Better Deflate compression.
  • Something else...

Releases

4.0.1 - Nov 25, 2017

  • Starting with this update, git tags for releases no longer have "v" prefix, since absence of such prefix is a common practice among Swift developers.
  • Fixed incorrectly thrown XZError.wrongDataSize without actually trying to decompress anything.
  • Fixed crash when opening 7-Zip containers with more than 255 entries with empty streams.
  • Reduced memory usage by Deflate compression.
  • Improved performance in some extreme cases (e.g. containers with an enormous amount of entries).
  • No longer verify if ZIP string field needs UTF-8 encoding, if language encoding flag is set for the entry.
  • Added "perf-test" command to swcomp, which is used for measuring performance. See Results document for the new tests results.

v4.0.0 - Nov 18, 2017

Reworked "Container" API

There are several ideas behind this rework:

  1. Enforce in API the idea that one can get information about entries without acquiring their data, but not vice versa.
  2. Provide a unified set of API for all three formats: ZIP, TAR and 7-Zip.
  3. Try to fix mistakes that were made in the development of previous versions.

Implementation of these ideas led to a lot of changes and here I will try to highlight most of them.

  • New protocol: ContainerEntryInfo.
    • It contains some informational properties from the previous version of ContainerEntry.
    • There are also new properties such as access/creation/modificationTime and type.

Comment: I would like to encourage you to check out this protocol's documentation as well as types that implement it: SevenZipEntryInfo, TarEntryInfo and ZipEntryInfo. These types not only have protocol's properties, but also contain their own format-specific members which may be useful in certain cases.

  • ContainerEntry now has only two members: info and data properties.

Comment: All properties that existed in previous versions were either removed or moved to ContainerEntryInfo. One may also note that an entry's data is now acquired through a constant property rather than a function. This means that it is no longer possible to asynchronously unpack ZIP containers, but it is worth mentioning that this was never intended in the first place.

  • ContainerEntry now has an associated type Info: ContainerEntryInfo.
  • ContainerEntry.entryAttributes removed without replacement.

Comment: One may note that it is not SWCompression's job to prepare an entry's properties for the file system. Besides that, its existence was causing duplication of the entry's information, so, in the end, it is for the best that this property is gone.

  • Container now has an associated type Entry: ContainerEntry.
  • open function now returns an array of associated type Entry.
  • Added new function info which returns an array of Entry.Info.

Comment: One of the useful consequences of these changes is that it is no longer necessary to cast the result of open(container:) to specific entry type (such as ZipEntry).

  • All existing ZIP, TAR and 7-Zip types conform to these protocols.
  • Added missing types for ZIP, TAR and 7-Zip with conformance to these protocols.
  • ZipEntry, TarEntry and SevenZipEntry now have only two members: info and data (in accordance with ContainerEntry protocol).
  • Standardised behavior when ContainerEntry.data can be nil:
    • If entry is a directory.
    • If data is unavailable, but error wasn't thrown for some reason.
    • 7-Zip only: if entry is an anti-file.
  • Removed SevenZipEntryInfo.isDirectory. Use type property instead.
  • SevenZipEntryInfo.name is no longer Optional.
    • Now throws SevenZipError.internalStructureError when file names cannot be properly processed.
    • Entries now have empty strings as names when no names were found in container.
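
The relationship between the three protocols described above can be pictured with a small self-contained sketch. The declarations below are simplified stand-ins written for illustration only; they are not SWCompression's real declarations, which have more members. `EntryType`, `ToyEntryInfo`, `ToyEntry` and `ToyContainer` are hypothetical names, and the "toy" format (newline-separated names, with a trailing "/" marking a directory) is invented for the example.

```swift
import Foundation

// Simplified stand-ins that mirror the shape described in these notes.
// They are NOT the library's actual declarations.
enum EntryType { case regular, directory }

protocol ContainerEntryInfo {
    var name: String { get }
    var type: EntryType { get }
}

protocol ContainerEntry {
    associatedtype Info: ContainerEntryInfo
    var info: Info { get }
    var data: Data? { get }   // nil, for example, when the entry is a directory
}

protocol Container {
    associatedtype Entry: ContainerEntry
    static func open(container: Data) throws -> [Entry]
    static func info(container: Data) throws -> [Entry.Info]
}

// Hypothetical toy format used only to exercise the protocols.
struct ToyEntryInfo: ContainerEntryInfo {
    let name: String
    let type: EntryType
}

struct ToyEntry: ContainerEntry {
    typealias Info = ToyEntryInfo
    let info: ToyEntryInfo
    let data: Data?
}

struct ToyContainer: Container {
    typealias Entry = ToyEntry

    // Information about entries is available without touching their data.
    static func info(container: Data) throws -> [ToyEntryInfo] {
        let text = String(decoding: container, as: UTF8.self)
        return text.split(separator: "\n").map { line -> ToyEntryInfo in
            let name = String(line)
            return ToyEntryInfo(name: name, type: name.hasSuffix("/") ? .directory : .regular)
        }
    }

    // Entries pair the info with (optional) data; here the "data" is just the name's bytes.
    static func open(container: Data) throws -> [ToyEntry] {
        return try info(container: container).map { entryInfo -> ToyEntry in
            let data: Data? = entryInfo.type == .directory ? nil : Data(entryInfo.name.utf8)
            return ToyEntry(info: entryInfo, data: data)
        }
    }
}
```

Because open is declared in terms of the associated type, the result of ToyContainer.open(container:) is already typed as [ToyEntry], so no cast to a concrete entry type is needed.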

Common Types

Added several new common types which are used across the framework:

  • CompressionMethod.
    • Used as a type of compressionMethod property of GzipHeader, ZlibHeader and ZipEntryInfo.
    • Removed GzipHeader.CompressionMethod.
  • ContainerEntryType.
    • Used as a type of type property of ContainerEntryInfo.
    • Removed TarEntry.EntryType.
  • DosAttributes.
    • It is the same as SevenZipEntryInfo.DosAttributes type from previous version.
    • Used as a type of dosAttributes property of SevenZipEntryInfo and ZipEntryInfo.
  • Permissions.
    • It is the same as SevenZipEntryInfo.Permissions type from previous version.
    • Used as a type of permissions property of ContainerEntryInfo.
  • FileSystemType.
    • Used as a type of GzipHeader.osType and ZipEntryInfo.fileSystemType properties.
    • Removed GzipHeader.FileSystemType.

Errors

  • Removed following errors:
    • SevenZipError.dataIsUnavailable
    • LZMAError.decoderIsNotInitialised
    • LZMA2Error.wrongProperties (LZMA2Error.wrongDictionarySize is thrown instead).
    • TarError.wrongUstarVersion.
    • TarError.notAsciiString (TarError.wrongField is thrown instead).
    • XZError.wrongFieldValue (XZError.wrongField is thrown instead).
  • Renamed following errors:
    • BZip2Error.wrongHuffmanLengthCode to BZip2Error.wrongHuffmanCodeLength.
    • BZip2Error.wrongCompressionMethod to BZip2Error.wrongVersion.
    • TarError.fieldIsNotNumber to TarError.wrongField.
    • XZError.reservedFieldValue to XZError.wrongField.
  • Standardised behavior for errors with names similar to wrongCRC:
    • These errors mean that everything went well, except for comparing the checksum.
    • Their associated values now contain all "successfully" unpacked data, including the one which caused checksum error.
      • This change affects BZip2.decompress, GzipArchive.multiUnarchive, XZArchive.unarchive, XZArchive.splitUnarchive, ZipContainer.open.
    • Some of these errors now have arrays as associated values to account for the situations with unpacked data split.
      • This change affects GzipArchive.multiUnarchive, XZArchive.unarchive, XZArchive.splitUnarchive, ZipContainer.open.
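
The idea of a checksum error that still carries the unpacked data can be sketched as follows. This is a self-contained illustration under stated assumptions: `DemoError`, `demoUnpack` and `unpackKeepingData` are hypothetical names, the "checksum" is a toy sum of bytes, and SWCompression's real error types and their associated values differ in detail.

```swift
import Foundation

// Hypothetical stand-in for an error type whose checksum case carries data.
enum DemoError: Error {
    case wrongCRC(Data)   // the data that was unpacked before the check failed
    case wrongMagic
}

// Pretends to unpack `payload` and verifies a toy checksum (sum of bytes modulo 256).
func demoUnpack(_ payload: Data, expectedChecksum: UInt8) throws -> Data {
    let checksum = payload.reduce(UInt8(0)) { $0 &+ $1 }
    guard checksum == expectedChecksum else {
        throw DemoError.wrongCRC(payload)
    }
    return payload
}

// Returns the unpacked data together with a flag telling whether the checksum matched.
func unpackKeepingData(_ payload: Data, expectedChecksum: UInt8) -> (data: Data?, verified: Bool) {
    do {
        let data = try demoUnpack(payload, expectedChecksum: expectedChecksum)
        return (data, true)
    } catch DemoError.wrongCRC(let recovered) {
        // Everything was unpacked; only the checksum comparison failed.
        return (recovered, false)
    } catch {
        // Any other error means no usable data.
        return (nil, false)
    }
}
```

Whether to use data recovered this way is up to the caller; the flag makes it explicit that the data did not pass verification.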

General changes

  • Renamed XZArchive.multiUnarchive to XZArchive.splitUnarchive.
  • XZArchive.unarchive now processes all XZ streams similar to splitUnarchive, but combines them into one output Data.

Comment: These two changes are made to fix intended but apparently incorrect behaviour from previous versions, which was causing inconveniences with some archives.

  • Fixed "bitReader is not aligned" precondition crash in Zlib.
  • Fixed potential incorrect behavior when opening ZIP containers with size bigger than 4 GB.
  • Updated to use Swift 4.
  • Various improvements to documentation.
  • "swcomp" is now included as part of this repository.

Comment: This was done to make it easier to synchronise changes between SWCompression and swcomp, if necessary. "swcomp" is not built by default.

v4.0.0-test.2 - Nov 11, 2017

  • Changes to Errors:
    • Removed:
      • SevenZipError.dataIsUnavailable
      • LZMAError.decoderIsNotInitialised
      • LZMA2Error.wrongProperties (LZMA2Error.wrongDictionarySize is thrown instead.)
    • Renamed BZip2Error.wrongHuffmanLengthCode to BZip2Error.wrongHuffmanCodeLength.
  • ContainerEntryInfo.name is no longer Optional.
    • 7z now throws internalStructureError when it is unable to process file names.
  • Various FileSystemType variables are no longer Optional.
    • They have .other value where it was nil previously.
  • TarEntryInfo.linkName is no longer Optional.
  • Fixed "bitReader is not aligned" precondition crash in Zlib.
  • ZlibHeader.compressionMethod now uses common CompressionMethod enum instead of its own.

v4.0.0-test.1 - Nov 5, 2017

The upcoming 4.0.0 update will include a major rework of the "Container API" as well as changes here and there. It will also have some internal changes, which are tested by this test release.

v3.4.0 - Oct 3, 2017

  • Implementation of BZip2 compression algorithm.
    • There is also an auxiliary function BZip2.compress(data:blockSize:) if you need to specify size of input data block.
    • Default block size is 100 Kilobytes.
  • Added CompressionAlgorithm protocol.
  • Deflate now conforms to CompressionAlgorithm protocol (as well as BZip2).
  • Deflate.compress(data:) and ZlibArchive.archive(data:) no longer throw, but can crash with fatalError(). Comment: Though unlikely to happen, it seems more logical to crash with fatalError() when problems occur during compression instead of error throwing because it indicates there is a problem with the code itself, not with the input data.
  • Fixed crash in some rare cases for corrupted BZip2 archives. It now throws BZip2Error instead.