Swiftpack.co - Package - CoreOffice/CoreXLSX

CoreXLSX

Excel spreadsheet (XLSX) format parser written in pure Swift

Build Status Version License Platform Coverage

CoreXLSX is a library focused on representing the low-level structure of the XML-based XLSX spreadsheet format. It allows you to open a spreadsheet archive with .xlsx extension and map its internal structure into model types expressed directly in Swift.

Important to note that this library provides read-only support only for the .xlsx format. As the older legacy .xls spreadsheet format has completely different internals, please refer to other libraries if you need to work with files of that type.

If your .xlsx files use ECMA-376 agile encryption (which seems to be the most popular variety), have a look the CryptoOffice library.

Example

To run the example project, clone the repo, and run pod install from the Example directory first.

Model types in CoreXLSX directly map internal structure of XLSX format with more sensible naming applied to a few attributes. The API is pretty simple:

import CoreXLSX

let filepath = "./categories.xlsx"
guard let file = XLSXFile(filepath: filepath) else {
  fatalError("XLSX file at \(filepath) is corrupted or does not exist")
}

for (worksheetName, path) in try file.parseWorksheetPathsAndNames() {
  if let worksheetName = worksheetName {
    print("This worksheet has a name: \(worksheetName)")
  }

  let worksheet = try file.parseWorksheet(at: path)
  for row in worksheet.data?.rows ?? [] {
    for c in row.cells {
      print(c)
    }
  }
}

This prints raw cell data from every worksheet in the given XLSX file. Please refer to the Worksheet model for more atttributes you might need to read from a parsed file.

Strings in spreadsheet internals are frequently represented as strings shared between multiple worksheets, thus to parse a string value from a cell use of stringValue(_: SharedStrings) function is recommended together with try file.parseSharedString().

Here's how you can get all strings in column "C" for example:

let sharedStrings = try file.parseSharedStrings()
let columnCStrings = worksheet.cells(atColumns: [ColumnReference("C")!])
  .compactMap { $0.stringValue(sharedStrings) }

To parse a date value from a cell, use dateValue property on the Cell type:

let columnCDates = worksheet.cells(atColumns: [ColumnReference("C")!])
  .compactMap { $0.dateValue }

Similarly, to parse rich strings, use the richStringValue function:

let richStrings = try file.parseSharedStrings()
let columnCRichStrings = worksheet.cells(atColumns: [ColumnReference("C")!])
  .compactMap { $0.richStringValue(sharedStrings) }

Styles

Since version 0.5.0 you can parse style information from the archive with the new parseStyles() function. Please refer to the Styles model for more details. You should also note that not all XLSX files contain style information, so you should be prepared to handle the errors thrown from parseStyles() function in that case.

Here's a short example that fetches a list of fonts used:

let styles = try file.parseStyles()
let fonts = styles.fonts?.items.compactMap { $0.name?.value }

Reporting compatibility issues

If you stumble upon a file that can't be parsed, please file an issue posting the exact error message. Thanks to use of standard Swift Codable protocol, detailed errors are generated listing a missing attribute, so it can be easily added to the model enabling broader format support. Attaching a file that can't be parsed would also greatly help in diagnosing issues. If these files contain any sensitive data, we suggest obfuscating or generating fake data with same tools that generated original files, assuming the issue can still be reproduced this way.

If the whole file can't be attached, try passing a sufficiently large value (between 10 and 20 usually works well) to errorContextLength argument of XLSXFile initializer. This will bundle the failing XML snippet with the debug description of thrown errors. Please also attach the full debug description if possible when reporting issues.

How does it work?

Since every XLSX file is a zip archive of XML files, CoreXLSX uses XMLCoder library and standard Codable protocols to map XML nodes and atrributes into plain Swift structs. ZIPFoundation is used for in-memory decompression of zip archives. A detailed description is available here.

Requirements

Apple Platforms

  • Xcode 10.0 or later
  • Swift 4.2 or later
  • iOS 9.0 / watchOS 2.0 / tvOS 9.0 / macOS 10.11 or later deployment targets

Linux

  • Ubuntu 16.04 or later
  • Swift 5.1 or later

Installation

Swift Package Manager

Swift Package Manager is a tool for managing the distribution of Swift code. It’s integrated with the Swift build system to automate the process of downloading, compiling, and linking dependencies on all platforms.

Once you have your Swift package set up, adding CoreXLSX as a dependency is as easy as adding it to the dependencies value of your Package.swift.

dependencies: [
  .package(url: "https://github.com/CoreOffice/CoreXLSX.git",
           .upToNextMinor(from: "0.11.0"))
]

If you're using CoreXLSX in an app built with Xcode, you can also add it as a direct dependency using Xcode's GUI.

CocoaPods

CoreXLSX is available through CocoaPods on Apple's platforms. To install it, simply add pod 'CoreXLSX', '~> 0.11.0' to your Podfile like shown here:

source 'https://github.com/CocoaPods/Specs.git'
# Uncomment the next line to define a global platform for your project
# platform :ios, '9.0'
use_frameworks!
target '<Your Target Name>' do
  pod 'CoreXLSX', '~> 0.11.0'
end

Carthage

Carthage is a dependency manager that builds your dependencies and provides you with binary frameworks on Apple's platforms.

Carthage can be installed with Homebrew using the following command:

$ brew update
$ brew install carthage

Inside of your Cartfile, add GitHub path to CoreXLSX and its latest version:

github "CoreOffice/CoreXLSX" ~> 0.11.0

Then, run the following command to build the framework:

$ carthage update

Drag the built frameworks (including the subdependencies XMLCoder and ZIPFoundation) into your Xcode project.

Contributing

Sponsorship

If this library saved you any amount of time or money, please consider sponsoring the work of its maintainer. While some of the sponsorship tiers give you priority support or even consulting time, any amount is appreciated and helps in maintaining the project.

Development Workflow

On macOS the easiest way to start working on the project is to open the Package.swift file in Xcode 11. There is an extensive test suite that both tests files end-to-end and isolated snippets against their corresponding model values.

If Xcode 11 is not available for you and you'd like to develop the library or to run the test suite in Xcode 10, you need to run carthage bootstrap in the root directory of the cloned repository first. Please refer to the Carthage installation instructions described in the section above if you don't have Carthage installed. Then you can open the CoreXLSX.xcodeproj from the same directory and select the CoreXLSXmacOS scheme. This is the only scheme that has the tests set up, but you can also build any other scheme (e.g. CoreXLSXiOS) to make sure it builds on other platforms.

If you prefer not to work with Xcode, the project fully supports SwiftPM and the usual workflow with swift build and swift test should work, otherwise please report this as a bug.

Coding Style

This project uses SwiftFormat and SwiftLint to enforce formatting and coding style. We encourage you to run SwiftFormat within a local clone of the repository in whatever way works best for you either manually or automatically via an Xcode extension, build phase or git pre-commit hook etc.

To guarantee that these tools run before you commit your changes on macOS, you're encouraged to run this once to set up the pre-commit hook:

brew bundle # installs SwiftLint, SwiftFormat and pre-commit
pre-commit install # installs pre-commit hook to run checks before you commit

Refer to the pre-commit documentation page for more details and installation instructions for other platforms.

SwiftFormat and SwiftLint also run on CI for every PR and thus a CI build can fail with incosistent formatting or style. We require CI builds to pass for all PRs before merging.

Code of Conduct

This project adheres to the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to conduct@coreoffice.org.

Maintainers

Max Desiatov, Matvii Hodovaniuk

License

CoreXLSX is available under the Apache 2.0 license. See the LICENSE file for more info.

Github

link
Stars: 385

Used By

Total: 0

Releases

0.11.0 - 2020-05-30 20:52:58

This is a feature release that enables compatibility with CryptoOffice for decrypting spreadsheets. Additionally, with 0.11.0 you can easily get worksheet names with a new parseWorksheetPathsAndNames function on XLSXFile and get rich text values from cells with a new richStringValue function on Cell.

Due to technical issues, Swift 5.0 CI job for Linux has been removed, so compatibility with Swift 5.0 on Linux can no longer be guaranteed. While CoreXLSX may continue to work with Swift 5.0 on Linux, please update to Swift 5.1 or later to avoid unexpected issues.

Thanks to @kobylyanets and @duodo2412 for their contributions to this release!

New APIs:

  • XLSXFile now provides a new initializer that takes an argument of Data type. This allows opening in-memory documents, the primary example being spreadsheets decrypted with CryptoOffice.

  • XLSXFile now has a new parseWorksheetPathsAndNames function that returns an array of worksheet names and their paths in a given workbook, while previously you had to use parseWorksheetPaths and match paths manually with results of the parseWorkbooks function.

  • Cell now has a richStringValue function that takes a result of the XLSXFile.parseSharedStrings function and produces an array of RichText values. This makes it easier to query rich text content from cells, while previously you had to match cell values against SharedStrings manually.

Breaking change:

Due to the introduction of the new XLSXFile.init(data:) initializer, the filepath property on XLSXFile no longer makes sense. This property was not used internally in any way and in-memory files don't have any filepaths. If you need to refer to a filepath of an .xlsx file after you've parsed from your filesystem, you should retain it manually and process it separately as you see fit.

Closed issues:

  • API for matching sheet names to sheet paths (#105)

Merged pull requests:

0.10.0 - 2020-04-06 22:29:37

This is a release with bugfixes and a few improvements to usability of the spreadsheet cell values API. Thanks to all contributors and users, you provide an invaluable amount of feedback and help!

New API:

The library now provides a simplified API to fetch string and date values from cells, which is much easier to use than the previous version (which is still available).

Here's how you can get all strings (including shared strings) in column "C" for example:

let sharedStrings = try file.parseSharedStrings()
let columnCStrings = worksheet.cells(atColumns: [ColumnReference("C")!])
  .compactMap { $0.stringValue(sharedStrings) }

To parse a date value from a cell, use dateValue property on the Cell type:

let columnCDates = worksheet.cells(atColumns: [ColumnReference("C")!])
  .compactMap { $0.dateValue }

Breaking change:

The type property on Cell is no longer of String type. It was previously used to check if cell's type is equal to "s", which denoted a shared string. You should use enum values for that since this release, which for shared strings now is (unsurprisingly) .sharedString.

Closed issues:

  • Xcode 11 installation and build (#90)
  • Reading Date values from cell (#89)
  • Can't open xml (#82)
  • Not able to read Numeric data from Sheet (#81)
  • Getting the value of a cell with number format? (#71)
  • Opening xlsx file Document Directory, Crashes (#52)

Merged pull requests:

- 2019-11-08 15:19:51

This release adds a new value to the Relationship.SchemaType enum, which fixes compatibility with some spreadsheet files. Thanks to @mxcl for the bug report!

Fixed bugs:

  • Cannot initialize SchemaType from invalid String value #87

Merged pull requests:

- 2019-10-19 13:25:02

This release adds Linux support and improves compatibility with .xlsx files that contain shared strings. Thanks to @CloseServer, @funnel20 and @LiewLi for bug reports and contributions!

Implemented enhancements:

  • Bump XMLCoder to 0.9.0, add CI jobs for Linux, Xcode 11 #86 (MaxDesiatov)

Fixed bugs:

  • Multi-line text in an Excel cell is parsed into single line in the SharedStrings property text #83

Closed issues:

  • I crashed while calling try file.parsesharedstrings() with an error #79

Merged pull requests:

- 2019-07-12 10:38:19

Feature and bugfix release that makes the library compatible with more spreadsheet types. It also adds support for Comments structure, which can be parsed with the new parseComments API.

Many thanks to @grin, @GoldenJoe and @LiewLi for reporting and fixing issues in this release.

Closed issues:

  • parseDocumentPaths has internal protection, but is needed by parseDocumentRelationships #74 (GoldenJoe)
  • Missing Documentation #73 (GoldenJoe)

Merged pull requests:

- 2019-05-25 13:42:00

Bugfix release that improves compatibility with different spreadsheet types.

Thanks to @grin for reporting and fixing issues in this release.

Breaking changes

All properties on struct Format except fontId and numberFormatId are now optional.

Additions

New borderId and fillId properties on struct Format.

Fixed bugs

  • Can't get cell string #58
  • Can't load basic spreadsheets created in Google Docs #64
  • fillId and borderId attributes missing from CoreXLSX.Format #65

Merged pull requests

- 2019-05-09 09:17:14

Bugfix release that adds case externalLink to Relationship.SchemaType improving .xlsx compatibility.

- 2019-05-02 21:20:37

This is a bugfix release with changes to the model API that improve compatibility with files containing formulas and varied shared strings formats.

Specifically:

  • new struct Formula added with a corresponding property on struct Cell
  • property color on struct Properties became optional
  • properties on struct RichText became optional
  • new chartsheet case added to enum Relationship
  • richText on struct SharedStrings became an array, not optional

Closed issues

  • Error Domain=NSCocoaErrorDomain Code=4865 "Expected String but found null instead." #59
  • Importing XLSX file #56
  • Error ParseCellContent #51
  • error parseWorksheet #50
  • Couldn't find end of Start Tag c #37

Merged pull requests

- 2019-04-18 17:28:31

This is a release with API additions and bug fixes.

This release of CoreXLSX can be integrated as a Swift 5 module if you're using Xcode 10.2, but support for Swift 4.2 and earlier Xcode 10 versions is also maintained.

Compatibility is improved for big files and files that internally contain namespaced XML. A few other previously reported compatibility issues are now fixed. Many thanks to everyone who reported the issues, the improvements in this release wouldn't be possible without your contribution!

Breaking changes

Several properties on the model types became optional when there's no guarantee they are always available in files generated by different apps and tools.

Additions

Now you can parse style information from the archive with the new parseStyles() function. Please refer to the Styles model for more details. Please note that not all XLSX files contain style information, so you should be prepared to handle the errors thrown from parseStyles() function in that case.

Merged pull requests

0.4.0 - 2019-02-07 11:18:01

This is a release with API improvements and bug fixes. A big thank you to everyone who provided bug reports and contributions that made this release possible!

Breaking changes

  • A few properties on the model types were added with cleaner names and better fitting types. Most of the old versions of those properties were kept as deprecated, but you might get some breakage with optionality, where we couldn't find a good deprecation path.

Additions

  • New parseSharedStrings function on XLSXFile allows you get values of cells with shared string value. Quite frequently those strings are unavailable and are only referenced in the original model types you get with parseWorksheet.

  • Previously when addressing cells and columns you had to use a stringly-typed API. It was also not very convenient for specifying a range of columns. This is now fixed with the new type-safe ColumnReference struct, which conforms to Comparable and Strideable.

  • Following the addition of an error context to XMLCoder, which is the main dependency of CoreXLSX, it is now exposed on struct XLSXFile. Pass a non-zero value to errorContextLength argument (default is 0) of XLSXFile initializer and you'll get a snippet of XML that failed to parse in the debug description of the error value.

  • Additional optional argument bufferSize was added to XLSXFile initializer as a response to previous reports about problems with zip file extraction. The default value is 10 MiB, which seems to be enough in most cases, but you can still try passing a larger value for bigger files if you see that an XML file stops abruptly in the middle of the file. Unfortunately, we haven't found a good way to adjust this value dynamically based on the file size, but please let us know if you did.

  • Support for Carthage was added as well as support for tvOS and watchOS.

Bugfixes

Some files that couldn't be previously parsed should now be handled better thanks to fixes in optionality and more properties added to the model types.

All changes

- 2018-11-13 10:34:12

  • Improve Worksheet model property naming (#2). Some properties on Worksheet and its descendants had obscure names, most of that is fixed now with old names marked as deprecated.

- 2018-11-12 09:09:39

Refine README.md to include implementation details.

- 2018-11-11 20:06:44

Refine code comments and links in README.md

- 2018-11-11 14:34:00

  • Update README.md with instructions for Swift Package Manager.

- 2018-11-11 12:17:23

  • Cell by row/column filtering API with worksheetCache (#1) This new API allows users to filter all cells by a row or column reference. To avoid re-parsing of worksheets, a new private worksheetCache property is added on XLSXFile.

- 2018-11-10 19:55:59

Added macOS 10.11 deployment target to the podspec

- 2018-11-10 19:28:33

Improved README, fixed podspec

- 2018-11-10 19:22:10

First release with reading support for basic .xlsx files