SwiftDeepNeuralNetwork is a Swift library built on top of the SwiftMatrix library. It aims to provide a step-by-step guide to Deep Neural Network implementation, and it is largely based on code developed in the Neural Networks and Deep Learning course by Andrew Ng on Coursera.
SwiftDeepNeuralNetwork has been developed to show all the steps necessary to write Deep Learning code in the easiest possible way.
Using Xcode 12 or later, add a package dependency by selecting File > Swift Packages > Add Package Dependency... and entering the following URL: https://github.com/theolternative/SwiftDeepNeuralNetwork.git
SwiftDeepNeuralNetwork uses Swift 5 and the Accelerate framework, and depends on SwiftMatrix.
SwiftDeepNeuralNetwork is available under the MIT license. See the LICENSE file for more info.
You can check Tests for examples. Currently DNN supports binary, multiclass and multitask classification. Basically:
- Define a train set X, Y where X is a `n x m` input matrix containing `m` examples of `n` features each, and Y is a `r x m` output matrix of labels. All values must be `Double`.
- Create an instance of `DeepNeuralNetwork` passing an array of `Int` where the first element is the number of features `n`, the last one is always `r` (the number of rows of the Y output matrix), while the others specify the number of units per layer. `[2000, 20, 5, 1]` means that there are 1 input layer of 2000 features (which must match the number of rows of X), 2 hidden layers of 20 and 5 units each, and an output layer of 1 unit.
- Train your model by calling function `L_layer_model_train`.
- Make predictions on the test set with `L_layer_model_test( X_test, Y_test )` where X_test must have size `n x p` and Y_test size `1 x p`, where p >= 1.
```swift
// XOR example
let X = Matrix([ [0,0], [0,1], [1,0], [1,1] ])′ // transposed: 2 features x 4 examples
let Y = Matrix([[ 0, 1, 1, 0 ]])
let dnn = DeepNeuralNetwork(layerDimensions: [2, 2, 1], X: X, Y: Y)
dnn.learning_rate = 0.15
dnn.num_iterations = 1000
let (accuracy, costs) = dnn.L_layer_model_train(nil)
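```

After training you can evaluate the model on a held-out set. A minimal sketch with a hypothetical 2-example XOR test set follows; the result is assumed here to be the test accuracy, by analogy with `L_layer_model_train`:

```swift
// Hypothetical test set: X_test is n x p (2 x 2), Y_test is 1 x p
let X_test = Matrix([ [0,1], [1,1] ])′ // examples (0,1) and (1,1)
let Y_test = Matrix([[ 1, 0 ]])        // XOR labels for the two examples
let testAccuracy = dnn.L_layer_model_test(X_test, Y_test)
```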
You can initialize a new model as follows:

```swift
let dnn = DeepNeuralNetwork( layerDimensions: [Int], X: Matrix, Y: Matrix, type: Classification )
```

where layerDimensions is an array of `Int` specifying the number of units per layer, X is the features matrix, Y is the labels matrix and type must be one of `.binary | .multiclass | .multitask`; if omitted it falls back to `.binary`.
Let's suppose you're building a model to identify whether a set of images contains food.
In `.binary` classification you train your model to identify one kind of food (e.g. pizza) in each image. In this case you build a separate model for each food you want to identify. The Y matrix has only 1 row.
- Model 1 for "Pizza" -> Y[0,i] = 1.0 if the image identified by features X[:,i] is a pizza, 0.0 otherwise
- Model 2 for "Lasagna" -> Y[0,i] = 1.0 if the image identified by features X[:,i] is a lasagna, 0.0 otherwise
etc.
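For example, a binary Y for the "Pizza" model would look like this (same pseudo-notation as the multiclass and multitask examples below):

```swift
// Binary classification: Y has a single row
Y[0,i]   = 1.0 // i-th image is a pizza
Y[0,i+1] = 0.0 // (i+1)-th image is not a pizza
```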
In `.multiclass` classification you train your model to identify one kind of food in each image among a set of different foods (e.g. pizza, lasagna, hamburger). In this case you build a single model for all foods you want to identify. The Y matrix has as many rows as the number of foods to be identified ("classes"), so e.g. the first row can be labeled "Pizza", the second row "Lasagna", the third row "Hamburger". Only one row per column ("example") can be 1.0 (food is present in the image); all others must be 0.0 (food is not present in the image).
```swift
Y[:,i]   = [[1.0], [0.0], [0.0]] // i-th image is a pizza
Y[:,i+1] = [[0.0], [0.0], [1.0]] // (i+1)-th image is a hamburger
```
In `.multitask` classification you train your model to identify multiple kinds of food (e.g. pizza, lasagna, hamburger) in each image. In this case you build a single model for all foods you want to identify. The Y matrix has as many rows as the number of foods to be identified ("classes"), so e.g. the first row can be labeled "Pizza", the second row "Lasagna", the third row "Hamburger". Possible values are 1.0 (food is present), 0.0 (food is not present), and -1.0 (food was not labeled, meaning that we don't know whether it's present or not). Please note that if all foods are present in the same image, all rows will be 1.0.
```swift
Y[:,i]   = [[1.0], [0.0], [0.0]]   // i-th image contains only pizza
Y[:,i+1] = [[1.0], [0.0], [1.0]]   // (i+1)-th image contains pizza and hamburger
Y[:,i+2] = [[-1.0], [1.0], [-1.0]] // (i+2)-th image contains lasagna; pizza and hamburger were not labeled
```
```swift
// Binary classification
let dnn1 = DeepNeuralNetwork( layerDimensions: [10000,20,5,1], X: X_train, Y: Y_train )
// Multiclass classification
let dnn2 = DeepNeuralNetwork( layerDimensions: [10000,200,100,20,4], X: X_train1, Y: Y_train1, type: .multiclass )
// Multitask classification
let dnn3 = DeepNeuralNetwork( layerDimensions: [10000,200,100,20,4], X: X_train1, Y: Y_train1, type: .multitask )
```
Setting up training
The following hyperparameters can be set before training.
Learning rate
Default value: 0.0075
The learning rate determines the overall speed of gradient descent. Low values (<0.001) can lead to slow training but better accuracy, while high values can speed up training at the cost of worse accuracy or instability.
```swift
dnn.learning_rate = 0.15
```
Hint: start with a low value (0.0001) and increase it by an order of magnitude at a time if training is too slow.
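A minimal sketch of that hint, reusing the XOR training data from the example above (the sweep loop itself is not part of the library):

```swift
// Try learning rates an order of magnitude apart and compare accuracies
for lr in [0.0001, 0.001, 0.01, 0.1] {
    let dnn = DeepNeuralNetwork(layerDimensions: [2, 2, 1], X: X, Y: Y)
    dnn.learning_rate = lr
    dnn.num_iterations = 1000
    let (accuracy, _) = dnn.L_layer_model_train(nil)
    print("learning rate \(lr): accuracy \(accuracy)")
}
```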
Number of epochs
Default value: 2500
This hyperparameter controls how many iterations of gradient descent are performed. Low values train faster but can yield worse accuracy, while high values lead to better accuracy with longer training time as a trade-off.
```swift
dnn.num_iterations = 2500
```
Weight initialization
Weight matrices must be initialized with random values in order for training to work. The following types are available:
| Type | Description |
|---|---|
| `.zeros` | Values set to 0 (only for debugging) |
| `.random` | Random values in range -1.0...1.0 |
| `.he` | Random values in -1.0...1.0 multiplied by factor √(2/n⁽ˡ⁻¹⁾) (He initialization) |
```swift
dnn.weigth_init_type = .he
```
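For reference, He initialization scales the random values by √(2/n⁽ˡ⁻¹⁾), where n⁽ˡ⁻¹⁾ is the number of units in the previous layer. A conceptual sketch (not the library's internal code):

```swift
// Conceptual He initialization: uniform values in -1.0...1.0
// scaled by sqrt(2 / nPrev), where nPrev is the size of the previous layer.
func heInitialized(rows: Int, cols: Int, nPrev: Int) -> [[Double]] {
    let scale = (2.0 / Double(nPrev)).squareRoot()
    return (0..<rows).map { _ in
        (0..<cols).map { _ in Double.random(in: -1.0...1.0) * scale }
    }
}
```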
Two types of regularization are provided in order to reduce variance between train and test sets: L2 and Dropout.
L2 regularization
Default value: 0.0
If a value greater than 0.0 is set for `λ`, L2 regularization will be applied.
```swift
dnn.λ = 0.7
```
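For reference, L2 regularization adds a penalty term λ/(2m)·Σ‖W‖² to the cost. A conceptual sketch of the penalty for a single layer's weight matrix (not the library's internal code):

```swift
// L2 penalty for one layer's weights, with m training examples
func l2Penalty(weights: [[Double]], λ: Double, m: Int) -> Double {
    let sumOfSquares = weights.flatMap { $0 }.reduce(0.0) { $0 + $1 * $1 }
    return λ / (2.0 * Double(m)) * sumOfSquares
}
```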
Dropout regularization
Default value: 1.0
Dropout regularization randomly shuts off elements in the weight matrices so that only a fraction `keep_prob` of them remains non-zero. If a value less than 1.0 is set for `keep_prob`, dropout regularization will be applied.
```swift
dnn.keep_prob = 0.85 // 15% of elements of the weight matrices will be set to 0
```
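Conceptually, each element is kept with probability `keep_prob` and zeroed otherwise. A minimal sketch (not the library's internal code, which may also rescale the kept values):

```swift
// Randomly zero elements, keeping each with probability keepProb
func applyDropout(_ values: [Double], keepProb: Double) -> [Double] {
    values.map { Double.random(in: 0..<1) < keepProb ? $0 : 0.0 }
}
```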
Optimization
The following choices are available:

| Type | Description |
|---|---|
| `.momentum` | Gradient descent with momentum |
| `.adam` | Adam optimization method |
In the `.momentum` optimization method, the `β` factor can be set (default is 0.1).
In the `.adam` optimization method, the `β1` and `β2` factors can be set (defaults are 0.9 and 0.999).
```swift
dnn.optimization_type = .adam
dnn.β1 = 0.91
dnn.β2 = 0.98
```
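The momentum variant is configured the same way; a sketch assuming the property is named `β`, per the β factor mentioned above:

```swift
dnn.optimization_type = .momentum
dnn.β = 0.9 // hypothetical property name, by analogy with β1/β2
```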
Batch type
The batch type determines how many examples are processed at a time.

| Type | Description |
|---|---|
| `.batch` | All examples at a time |
| `.stochastic` | 1 example at a time |
| `.minibatch` | `mini_batch_size` examples at a time |
```swift
dnn.batch_type = .minibatch
dnn.mini_batch_size = 64
```
Hint: stochastic gradient descent is very slow but can reach better accuracy after fewer epochs. When the number of examples grows large (>1000), training times get longer and the minibatch approach should be the default choice.
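Putting it all together, a sketch of a full configuration using the hyperparameters described above (X_train1 and Y_train1 as in the earlier multiclass example):

```swift
// Hypothetical end-to-end setup for a multiclass model
let dnn = DeepNeuralNetwork(layerDimensions: [10000, 200, 100, 20, 4],
                            X: X_train1, Y: Y_train1, type: .multiclass)
dnn.learning_rate = 0.01
dnn.num_iterations = 2500
dnn.weigth_init_type = .he
dnn.λ = 0.7
dnn.optimization_type = .adam
dnn.batch_type = .minibatch
dnn.mini_batch_size = 64
let (accuracy, costs) = dnn.L_layer_model_train(nil)
```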