SWCompression
A framework with (de)compression algorithms and functions for processing various archives and containers.
What is this?
SWCompression is a framework with a collection of functions for:
- Decompression (and sometimes compression) using different algorithms.
- Reading (and sometimes writing) archives of different formats.
- Reading (and sometimes writing) containers such as ZIP, TAR and 7-Zip.
It works on both Apple platforms and Linux.
All features are listed in the tables below. "TBD" means that the feature is planned but not yet implemented.
|               | Deflate | BZip2 | LZMA/LZMA2 |
| ------------- | ------- | ----- | ---------- |
| Decompression | ✅      | ✅    | ✅         |
| Compression   | ✅      | ✅    | TBD        |

|       | Zlib | GZip | XZ  | ZIP | TAR | 7-Zip |
| ----- | ---- | ---- | --- | --- | --- | ----- |
| Read  | ✅   | ✅   | ✅  | ✅  | ✅  | ✅    |
| Write | ✅   | ✅   | TBD | TBD | ✅  | TBD   |
SWCompression is also written entirely in Swift.
Installation
SWCompression can be integrated into your project using Swift Package Manager, CocoaPods or Carthage.
Swift Package Manager
Add SWCompression to your package dependencies and specify it as a dependency for your target, e.g.:
// swift-tools-version:4.0
import PackageDescription

let package = Package(
    name: "PackageName",
    dependencies: [
        .package(url: "https://github.com/tsolomko/SWCompression.git",
                 from: "4.5.0")
    ],
    targets: [
        .target(
            name: "TargetName",
            dependencies: ["SWCompression"]
        )
    ]
)
You can find more details in the Swift Package Manager documentation.
CocoaPods
Add pod 'SWCompression', '~> 4.5' and use_frameworks! to your Podfile.
To complete installation, run pod install.
If you need only some parts of the framework, you can install just those parts using subspecs. Available subspecs:
- SWCompression/BZip2
- SWCompression/Deflate
- SWCompression/Gzip
- SWCompression/LZMA
- SWCompression/LZMA2
- SWCompression/SevenZip
- SWCompression/TAR
- SWCompression/XZ
- SWCompression/Zlib
- SWCompression/ZIP
"Optional Dependencies"
For both ZIP and 7-Zip there is a most commonly used compression method. This is Deflate for ZIP and LZMA/LZMA2 for 7-Zip. Thus, SWCompression/ZIP subspec has SWCompression/Deflate subspec as a dependency and SWCompression/LZMA subspec is a dependency for SWCompression/SevenZip.
But both of these formats support other compression methods as well, and some of them are implemented in SWCompression. For CocoaPods configurations there is a sort of "optional dependency" for each such compression method.
"Optional dependency" in this context means that SWCompression/ZIP or SWCompression/SevenZip will support a particular compression method only if the corresponding subspec is explicitly specified in your Podfile and installed.
List of "optional dependencies":
- For SWCompression/ZIP:
  - SWCompression/BZip2
  - SWCompression/LZMA
- For SWCompression/SevenZip:
  - SWCompression/BZip2
  - SWCompression/Deflate
Note: If you use Swift Package Manager or Carthage you always have everything (ZIP and 7-Zip are built with Deflate, BZip2 and LZMA/LZMA2 support).
Carthage
Add the following line to your Cartfile:
github "tsolomko/SWCompression" ~> 4.5
Then run carthage update.
Finally, drag and drop SWCompression.framework from the Carthage/Build folder into the "Embedded Binaries" section on your target's "General" tab in Xcode.
SWCompression uses the BitByteData framework, so Carthage will also download it, and you should drag and drop the BitByteData.framework file into "Embedded Binaries" as well.
Usage
Basic Example
If you'd like to decompress "deflated" data just use:
// let data = <Your compressed data>
let decompressedData = try? Deflate.decompress(data: data)
However, it is unlikely that you will encounter deflated data outside of an archive. So, in the case of a GZip archive you should use:
let decompressedData = try? GzipArchive.unarchive(archive: data)
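A slightly fuller sketch of the same idea round-trips a buffer through Deflate, so there is something concrete to decompress. It assumes that Deflate.compress(data:) does not throw (as in the 4.x releases); the sample string is made up for illustration.

```swift
import Foundation
import SWCompression

// Sample input, invented for this example.
let original = Data("Hello, SWCompression! Hello, SWCompression!".utf8)

// Assumption: in 4.x `Deflate.compress(data:)` is a non-throwing function.
let compressed = Deflate.compress(data: original)

do {
    let restored = try Deflate.decompress(data: compressed)
    print(restored == original ? "Round trip succeeded" : "Round trip produced different data")
} catch {
    print("Decompression failed: \(error)")
}
```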
Handling Errors
Most SWCompression functions can throw errors, and you are responsible for handling them.
If you look at the list of available error types and their cases, you may be frightened by their number.
However, most of these cases (such as XZError.wrongMagic) exist for diagnostic purposes.
Thus, you only need to handle the most common type of error for your archive/algorithm. For example:
do {
    // let data = <Your compressed data>
    let decompressedData = try XZArchive.unarchive(archive: data)
} catch let error as XZError {
    // Handle XZ-related errors here.
} catch let error {
    // Handle all other errors here.
}
Or, if you don't care about errors at all, use try?.
Documentation
Every function or type of SWCompression's public API is documented. This documentation can be found at its own website.
Sophisticated example
There is a small command-line program, "swcomp", which is included in this repository in "Sources/swcomp".
To build it you need to uncomment several lines in "Package.swift" and run swift build -c release.
Contributing
Whether you find a bug, have a suggestion, idea or something else, please create an issue on GitHub.
In case you have encountered a bug, it would be especially helpful if you attach a file (archive, etc.) that caused the bug to happen.
If you'd like to contribute code, please create a pull request on GitHub.
Note: If you are considering working on SWCompression, please note that the Xcode project (SWCompression.xcodeproj) was created manually and you shouldn't use the swift package generate-xcodeproj command.
Executing tests locally
If you'd like to run tests on your computer, you need to do an additional step after cloning this repository:
git submodule update --init --recursive
This command downloads files which are used for testing. These files are stored in a separate repository.
Git LFS is used for storing them, which is the reason for keeping them in a separate repository: Swift Package Manager has some problems with Git LFS-enabled repositories (installing git-lfs locally with the --skip-smudge option is required to solve these problems).
Note: You can also use "Utils/prepare-workspace-macos.sh" script from the repository, which not only downloads test files but also downloads dependencies.
Performance
Usage of whole module optimizations is recommended for best performance. These optimizations are enabled by default for Release configurations.
The Tests Results document contains benchmark results for various functions.
Why?
First of all, existing solutions for working with compression, archives and containers have certain disadvantages. They might not support a particular compression algorithm or archive format, and they all have different APIs, which can sometimes be slightly confusing for users. This project attempts to provide missing (and sometimes existing) functionality through a unified API which is easy to use and remember.
Secondly, it may be important to have a compression framework written completely in Swift, without relying on either system libraries or solutions implemented in different languages. Additionally, since SWCompression is written fully in Swift without Objective-C, it can also be used on Linux.
Future plans
See 5.0 Update Project for the list of planned API changes and new features.
- Performance...
- Better Deflate compression.
- Something else...
Support Financially
If you would like to support this project or me financially you can do so via PayPal using this link.
License
References
- pyflate
- Deflate specification
- GZip specification
- Zlib specification
- LZMA SDK and specification
- XZ specification
- Wikipedia article about LZMA
- .ZIP Application Note
- ISO/IEC 21320-1
- List of defined ZIP extra fields
- Wikipedia article about TAR
- Pax specification
- Basic TAR specification
- star man pages
- Apache Commons Compress
- A walk through the SA-IS Suffix Array Construction Algorithm
- Wikipedia article about BZip2
Releases
4.5.0 - Sep 11, 2018
- Added LZMAProperties struct with simple member-wise initializer.
- Added LZMA.decompress(data:properties:uncompressedSize:) function (with uncompressedSize argument being optional) which allows to specify LZMA properties.
  - Useful in situations when properties are known from some external source instead of being encoded at the beginning of data.
  - Note, that these new APIs are intended to be used by expert users and as such no validation is performed on LZMA properties values.
- Added support for Delta "filter" in both XZ archives and 7-Zip containers.
- Added support for SHA-256 check type in XZ archives.
  - As a result XZError.checkTypeSHA256 is now never thrown and will be removed in the next major update.
- Added ZipEntryInfo.crc property.
- Fixed a problem where XZArchive.unarchive and XZArchive.splitUnarchive functions would produce incorrect result when more than one "filter" was used (though it was practically impossible to encounter this issue since only one filter was supported (LZMA2) until this update).
- Reduced in-memory size of ZipEntryInfo instances.
  - Some rough estimates indicate that the reduction is up to 68%.
- Clarified documentation for LZMA.decompress(data:) to explain expectation about data argument.
  - Particularly, it is explained that it expects LZMA properties encoded with standard LZMA encoding scheme at the beginning of data.
- swcomp changes:
  - zip -i command now also prints CRC32 for all entries.
  - -v is now accepted as an alias for --verbose option.
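As a rough illustration of the new LZMA entry point, the sketch below decodes a raw LZMA stream whose properties are known from an external source. The initializer labels (lc, lp, pb, dictionarySize) and the chosen values are assumptions based on common LZMA defaults, not something these notes spell out, and decodeRawLZMA is a hypothetical helper name.

```swift
import Foundation
import SWCompression

// Assumed labels for the member-wise initializer; the values are the
// usual LZMA defaults (lc=3, lp=0, pb=2) with a 16 MiB dictionary.
let properties = LZMAProperties(lc: 3, lp: 0, pb: 2, dictionarySize: 1 << 24)

/// Hypothetical helper: decodes `rawStream`, which must NOT begin with
/// encoded LZMA properties, using the externally supplied `properties`.
func decodeRawLZMA(_ rawStream: Data, uncompressedSize: Int? = nil) -> Data? {
    do {
        return try LZMA.decompress(data: rawStream,
                                   properties: properties,
                                   uncompressedSize: uncompressedSize)
    } catch {
        // The properties are not validated, so wrong values may lead to
        // a thrown error here or to incorrect output.
        print("LZMA decoding failed: \(error)")
        return nil
    }
}
```

Since no validation is performed on the property values, it is probably safest to use this entry point only when the properties come from a trusted container header.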
4.5.0-test - Sep 6, 2018
This is the first and only test release for the upcoming 4.5.0 update. It includes new LZMAProperties APIs, support for SHA-256 check for XZ archives and support for delta filter in 7-Zip and XZ, as well as a couple of fixes.
Known issue: no documentation for new APIs.
4.4.0 - Aug 9, 2018
A couple of side notes before diving into release notes:
- I've started a github project board where I am going to track and plan changes and additions for 5.0 Update.
- If you ever wanted to financially support either this project or me you can now do so using this link.
Creating TAR containers
The main addition in this update is a set of APIs for creating new TAR containers.
- Added TarContainer.create(from:) function which creates a new TAR container with provided array of TarEntry objects as its content and generates container's Data.
- Added TarCreateError error type with a single case utf8NonEncodable. Comment: This enum is planned to be merged with TarError in 5.0. A new enum had to be created since otherwise it would be a breaking change to introduce a new case to already existing enum.
To enable reasonable usage scenarios for these new APIs, additional changes to existing APIs have been made:
- TarEntry.info and TarEntry.data are now var-properties (instead of let).
- Accessing setter of TarEntry.data now automatically updates TarEntry.info.size with data.count value (or 0 if data is nil). Comment: Maintaining consistency between these two properties is extremely important for producing correct and valid containers.
- Added (or, rather, made public) TarEntry.init(info:data:) initializer.
- Most public properties of TarEntryInfo are now var-properties (instead of let). Exceptions: size and type. Comment: Property size is kept read-only for reasons mentioned above. The reason for not allowing mutating type property is more vague: it is hard to imagine usage scenario where changing the type of an entry makes sense. Moreover, there are some concerns about (potential future) behavior in more generic context with type-erased ContainerEntryInfo objects, etc.
- Added TarEntryInfo.init(name:type:) initializer.
I do realize that this set of APIs is somewhat limited. For example, it is not easy to convert ZIP container (array of ZipEntry objects) to TAR using these new additions. But rest assured, there are plans to provide more generic functionality for creating new containers in the future (something like TarContainer.create(from entries: [ContainerEntry]) throws -> Data).
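The APIs listed above might be combined roughly as follows. This is a minimal sketch, not an excerpt from the project: the entry name and contents are invented, and it assumes that the data parameter of TarEntry.init(info:data:) is optional and that ContainerEntryType has a .regular case.

```swift
import Foundation
import SWCompression

// Entry name and contents are invented for this example.
let info = TarEntryInfo(name: "hello.txt", type: .regular)
var entry = TarEntry(info: info, data: nil)

// Setting `data` is expected to update `info.size` automatically.
entry.data = Data("Hello, TAR!".utf8)
print("Entry size: \(entry.info.size ?? 0)")

do {
    let containerData = try TarContainer.create(from: [entry])
    // Read the container back to check that it round-trips.
    let names = try TarContainer.open(container: containerData).map { $0.info.name }
    print("Entries: \(names)")
} catch {
    print("TAR operation failed: \(error)")
}
```

Because size is read-only and derived from data, there is no way in this sketch to produce an entry whose header disagrees with its payload, which is the consistency guarantee the notes describe.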
Other Changes
- Improved compatibility with other TAR implementations:
  - All string fields of TAR headers are now treated as UTF-8 strings. Comment: This is compatible with previous behavior since ASCII strings are UTF-8 strings.
  - Non-well-formed numeric fields of TAR headers no longer cause TarError.wrongField to be thrown and instead result in nil values of corresponding properties of TarEntryInfo (exception: size field). Comment: Mainly, this change was made to accommodate situations when a TAR header field is absent (i.e. filled with NULLs). Absent size field is still not accepted since its value impacts the structure of the container. This particular behavior is consistent with other implementations.
  - Base-256 encoding of numeric fields, which is sometimes used for very big or negative values, is now supported.
  - Leading NULLs and whitespaces in numeric fields are now correctly skipped.
  - Sun Extended Headers are now processed as local PAX extended headers instead of being considered entries with .unknown type.
  - GNU TAR format features for incremental backups are now partially supported (access and creation time).
- TarContainer.formatOf now correctly returns TarFormat.gnu when GNU format "magic" field is encountered.
- A new (copy) Data object is now created for TarEntry.data property instead of using a slice of input container data. Comment: This change makes indices of TarEntry.data zero-based which is consistent with other containers. This should also prevent keeping in memory Data for the entire container until the TarEntry object is destroyed.
- Fixed incorrect file name of TAR entries from containers with GNU TAR format-specific features being used.
- Fixed TarError.wrongPaxHeaderEntry error being thrown when header with multi-byte UTF-8 characters is encountered.
- Fixed incorrect values of TarEntryInfo.ownerID, groupID, deviceMajorNumber and deviceMinorNumber properties (previously, they were assumed to be encoded as decimal numbers).
- Slightly improved performance of LZMA/LZMA2 operations by making internal classes declared as final.
- swcomp changes:
  - Added -c, --create option to tar command which creates a new TAR container.
  - Output of benchmark commands is now properly flushed on non-Linux platforms.
  - Results for omitted iterations of benchmark commands are now also printed.
  - Iteration number in benchmark commands is now printed with leading zeros.
  - Fixed compilation error on Linux platforms due to ObjCBool no longer being an alias for Bool.
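The numeric-field rules from the TAR compatibility items above (octal digits with leading NULLs and spaces skipped, base-256 for large values, nil for absent fields) can be sketched in a few lines. This is an independent illustration and not SWCompression's implementation: parseTarNumber is a hypothetical name, the base-256 layout follows the commonly described GNU convention (high bit of the first byte as marker, next bit as sign), and negative values are simply rejected.

```swift
import Foundation

/// Hypothetical helper: parses a numeric TAR header field.
/// Returns nil for absent (all-NUL), malformed, negative or overflowing fields.
func parseTarNumber(_ field: [UInt8]) -> Int? {
    guard let first = field.first else { return nil }

    if first & 0x80 != 0 {
        // Base-256: big-endian integer in the remaining bits.
        // Bit 0x40 is the sign bit; negative values are not handled here.
        guard first & 0x40 == 0 else { return nil }
        var value = Int(first & 0x3F)
        for byte in field.dropFirst() {
            let (shifted, overflow) = value.multipliedReportingOverflow(by: 256)
            guard !overflow else { return nil }
            value = shifted + Int(byte)
        }
        return value
    }

    // Octal ASCII: skip leading NULs and spaces, then read digits
    // until a NUL or space terminator (or the end of the field).
    var value = 0
    var sawDigit = false
    for byte in field.drop(while: { $0 == 0 || $0 == 0x20 }) {
        if byte == 0 || byte == 0x20 { break }
        guard byte >= 0x30 && byte <= 0x37 else { return nil }
        let (shifted, overflow) = value.multipliedReportingOverflow(by: 8)
        guard !overflow else { return nil }
        value = shifted + Int(byte - 0x30)
        sawDigit = true
    }
    return sawDigit ? value : nil
}

let octalMode = parseTarNumber(Array("0000644\0".utf8))
let base256 = parseTarNumber([0x80, 0, 0, 0, 0, 0x01, 0x00, 0x00])
let absent = parseTarNumber([UInt8](repeating: 0, count: 8))
print(octalMode as Any, base256 as Any, absent as Any)
```

Returning nil instead of throwing mirrors the behavior described for every field except size, where a caller would still need to treat nil as a hard error.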
4.4.0-test - Aug 5, 2018
This is the first and only test release for the upcoming 4.4.0 update. It includes functionality for creating new TAR containers as well as numerous fixes for TAR open/info functions.
Known issue: no documentation for new APIs.
4.3.0 - Apr 29, 2018
ZIP Custom Extra Fields
ZIP format provides capabilities to define third-party extra fields, so it is impossible for SWCompression to support all possible extra fields. In this update several APIs were added which allow users to define their own extra fields (aka "custom extra fields") and make SWCompression recognize them. All extra fields previously supported by SWCompression (aka "standard extra fields") are still supported.
- Added ZipExtraField protocol.
- Added ZipExtraFieldLocation enum.
- Added ZipContainer.customExtraFields property.
- Added ZipEntryInfo.customExtraFields property.
To add support of a custom extra field one must first create a type which conforms to ZipExtraField protocol. Then it must be added to ZipContainer.customExtraFields dictionary with the key equal to the id property of the type being added. If during execution of open(container:) or info(container:) functions custom extra field is found it will be processed using initializer of the provided type and stored in ZipEntryInfo.customExtraFields property of entry where this extra field was found.
Note: It is impossible to define custom extra field with the same ID as any of the standard extra fields and make SWCompression use user-defined extra field instead of the standard one (i.e. SWCompression first checks if ID is one of the standard IDs and then tries to find it in ZipContainer.customExtraFields dictionary).
TAR Formats
- Added TarContainer.Format enum which represents various formats of TAR containers.
- Added TarContainer.formatOf(container:) function which returns format of the TAR container.
- Added -f, --format option to swcomp's tar command which prints format of the TAR container.
Comment: In the context of TAR containers "format" means a set of extensions to the basic TAR container layout which must be supported to successfully process given container.
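A short sketch of how the format query might be used is shown below. describeTarFormat is a hypothetical helper and "archive.tar" is a placeholder path; the sketch only prints whatever value formatOf(container:) returns, without assuming the names of the enum cases.

```swift
import Foundation
import SWCompression

/// Hypothetical helper: returns a printable description of the TAR format,
/// or of the error if the data cannot be processed.
func describeTarFormat(of container: Data) -> String {
    do {
        let format = try TarContainer.formatOf(container: container)
        return "TAR format: \(format)"
    } catch {
        return "Not a readable TAR container: \(error)"
    }
}

// "archive.tar" is a placeholder path.
if let data = try? Data(contentsOf: URL(fileURLWithPath: "archive.tar")) {
    print(describeTarFormat(of: data))
}
```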
Benchmark changes
- Number of benchmark iterations increased from 6 to 10.
- Benchmarks now have a zeroth iteration which is excluded from averages.
Comment: For some reason when benchmarked functions are being executed for the first time they perform significantly worse than any of the following iterations. So it was decided to drop this "zeroth" iteration from calculating of averages. This change, of course, artificially improves benchmark results, but, hopefully, makes them more reliable. On the other hand, the increase in number of iterations aims to improve accuracy of benchmarks in general.
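The measurement scheme described above can be sketched as follows. This is not swcomp's benchmark code: averageTime is a hypothetical helper, and the timed closure is a stand-in workload chosen only so the example runs on its own.

```swift
import Foundation

/// Runs `body` once as a discarded "zeroth" iteration, then `iterations`
/// more times, and returns the average duration in seconds of the
/// measured iterations only.
func averageTime(iterations: Int = 10, _ body: () -> Void) -> Double {
    precondition(iterations > 0, "At least one measured iteration is required")
    var total = 0.0
    for i in 0...iterations {
        let start = Date()
        body()
        let elapsed = Date().timeIntervalSince(start)
        if i == 0 { continue } // zeroth iteration: excluded from the average
        total += elapsed
    }
    return total / Double(iterations)
}

// Stand-in workload; a real benchmark would call a (de)compression function.
var calls = 0
let average = averageTime {
    calls += 1
    _ = (0..<10_000).reduce(0, &+)
}
print("calls: \(calls), average: \(average) s")
```

The closure runs eleven times in total (one warm-up plus ten measured iterations), but only the last ten contribute to the reported average.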
Other changes
- Updated to support Swift 4.1.
- Minimum required version of BitByteData is now 1.2.0.
- Added TarEntryInfo.compressionMethod property which is always equal to .copy.
- Added documentation for Container.Entry and ContainerEntry.Info associated types.
- Reverted "disable symbol stripping" change from 4.2.0 update, since underlying problem was fixed in Carthage.