
This tutorial is aimed at the student level; you can find expanded information on Apple's website and in the related documentation they provide.
It walks you through the creation of a simple ML app that should be sufficient for basic tasks on common data sets.
A typical ML pipeline involves the following steps:

- Gathering. Sourcing data sets that will provide the input and output data for your model.
- Selecting. Choosing a model architecture that suits your purposes and data type.
- Training. Letting the software process the data to 'train' its ability to identify and differentiate.
- Integrating (deployment). Bringing the model you've trained into your app itself.

For the purposes of this tutorial, we'll use an established data set with our model. You can find it >here<. Download it, extract it and place it somewhere locally. This is sufficient for training the classifier.
Sadly, CreateML can't read arbitrary dataset formats, so you'll have to mark the data up manually using a pre-existing tool (ML Highlight) or convert it to the CreateML format (convert_dataset.py).

*NOTE: the x, y coordinates are the centre of your bounding box.*
A CreateML Annotations.json file looks like the below (label is the object's class name):
[
    {
        "image": "66179910a09e.jpg",
        "annotations": [
            {
                "label": "Araneae",
                "coordinates": {"x": 787, "y": 789, "width": 530, "height": 592}
            }
        ]
    },
    {...}
]
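The conversion step can be sketched as follows. This is a minimal sketch, not the actual convert_dataset.py: it assumes your source annotations use top-left-origin boxes, which need to be re-expressed as centre coordinates for CreateML, and the record layout is made up for illustration.

```python
import json

def to_createml(records):
    """Convert (image, label, x_min, y_min, width, height) records,
    where (x_min, y_min) is the box's top-left corner, into the
    CreateML annotation structure (which uses the box centre)."""
    by_image = {}
    for image, label, x_min, y_min, w, h in records:
        entry = by_image.setdefault(image, {"image": image, "annotations": []})
        entry["annotations"].append({
            "label": label,
            "coordinates": {
                "x": x_min + w / 2,   # CreateML x is the box centre
                "y": y_min + h / 2,   # CreateML y is the box centre
                "width": w,
                "height": h,
            },
        })
    return list(by_image.values())

# Reproduces the example entry above.
records = [("66179910a09e.jpg", "Araneae", 522, 493, 530, 592)]
print(json.dumps(to_createml(records), indent=2))
```

Grouping by image matters because one image can contain several annotated objects.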
To simplify the process we'll use a standard Apple model integrated into the OS, and we'll train only the top-level layers, using the existing lower layers as a feature extractor. This saves time and disk space.
New document -> Image Classification
Training Data (though splitting the data into training / validation / testing sets is preferable)

Data Sources

Set the desired number of iterations (the default is 25) and hit Start training.

You can also add augmentation. Augmentation applies additional distortions to the input images, artificially extending the training dataset with varying noise, sizes, etc.; as a result it makes the model more robust and reduces overfitting.
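To illustrate the idea (not what CreateML does internally), here is a minimal sketch of two simple augmentations on a tiny image represented as a 2D list of pixel values:

```python
def flip_horizontal(image):
    """Mirror an image (a list of rows) left-to-right."""
    return [list(reversed(row)) for row in image]

def rotate_90(image):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

image = [[1, 2],
         [3, 4]]

# Each augmented copy is an extra training example carrying the same label.
augmented = [image, flip_horizontal(image), rotate_90(image)]
```

Real augmentation pipelines also vary exposure, blur and noise, as the CreateML options below show.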
As your model is trained on top of a standard Apple model, it uses the low-level layers as a feature extractor and trains only the high-level layers for your task. You'll first see Extracting Features, then Training.
Training a model can typically take weeks even utilising high-performance GPU(s), but our task is relatively simple and should only take a couple of hours.
Check the Preview tab. You'll likely find that the model isn't working very well. There can be several reasons for this, such as noisy or incoherent input data, a limited dataset, an inappropriate architecture, or task-related issues.

Export the .mlmodel file and integrate it into your Xcode project (Chapter 3).
Detection works essentially the same way, but your source data will need annotation, and the process can take longer. Note that transfer learning (training only the top-level layers) is only supported on iOS 14 and above.
Take a break and let the task run for a couple of days. I would recommend making snapshots intermittently in case the training process fails. Also, be sure to check in Preview that you're actually training what you need to train.

*NOTE: the snapshot with the smallest training loss can simply be overfitted and have worse precision on non-training data.*
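The note above suggests a simple selection rule, sketched here in Python with made-up loss values: prefer the snapshot with the lowest validation loss rather than the lowest training loss.

```python
# (iteration, training_loss, validation_loss) for hypothetical snapshots
snapshots = [
    (100, 0.90, 0.95),
    (500, 0.40, 0.52),
    (900, 0.10, 0.61),  # lowest training loss, but validation got worse: overfitting
]

best_by_train = min(snapshots, key=lambda s: s[1])  # picks iteration 900
best_by_val = min(snapshots, key=lambda s: s[2])    # picks iteration 500

print(best_by_train[0], best_by_val[0])
```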
If you like, you can also train your model manually in Playground:
Use a macOS Playground, as the others don't support CreateML. You can train via a session or train immediately. A session is more controllable and stable, but training immediately is simpler, so we'll opt for that.
let dataSource = MLImageClassifier.DataSource.labeledDirectories(at: URL(fileURLWithPath: "DATASET_PATH/ArTaxOr"))
let splitData = try! dataSource.stratifiedSplit(proportions: [0.8, 0.2])
let trainData = splitData[0]
let testData = splitData[1]
let augmentation: MLImageClassifier.ImageAugmentationOptions = [
    //.blur,
    //.exposure,
    //.flip,
    //.noise,
    .rotation
]
let trainParams = MLImageClassifier.ModelParameters(validation: .split(strategy: .automatic),
                                                    maxIterations: 1000,
                                                    augmentation: augmentation)
let classifier = try! MLImageClassifier(trainingData: trainData, parameters: trainParams)
/// Classifier training accuracy as a percentage
let trainingError = classifier.trainingMetrics.classificationError
let trainingAccuracy = (1.0 - trainingError) * 100
let validationError = classifier.validationMetrics.classificationError
let validationAccuracy = (1.0 - validationError) * 100
/// Evaluate the classifier
let classifierEvaluation = classifier.evaluation(on: testData)
let evaluationError = classifierEvaluation.classificationError
let evaluationAccuracy = (1.0 - evaluationError) * 100
// Save model
let workPath = "OUTPUT_PATH" // directory where the model will be saved
let homePath = URL(fileURLWithPath: workPath)
let classifierMetadata = MLModelMetadata(author: "George Ostrobrod",
                                         shortDescription: "Predicts order of insect.",
                                         version: "1.0")
try classifier.write(to: homePath.appendingPathComponent("InsectOrder.mlmodel"),
                     metadata: classifierMetadata)
Drag the generated .mlmodel file into your project.

Link the CoreML, ImageIO and Vision frameworks. Add to Info.plist:
Privacy - Camera Usage Description
Privacy - Photo Library Usage Description
Pick an image and call your classifier for it (part of the UIImagePickerControllerDelegate implementation):
func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
    picker.dismiss(animated: true)
    let image = info[UIImagePickerController.InfoKey.originalImage] as! UIImage
    imageView.image = image
    updateClassifications(for: image)
}
Set up classifier (called in the next step). Here we load our model and set up a callback for processing the results of its running:
lazy var classificationRequest: VNCoreMLRequest = {
    do {
        let model = try VNCoreMLModel(for: ArTaxOrders(configuration: MLModelConfiguration()).model)
        let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
            self?.processClassifications(for: request, error: error)
        })
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("Failed to load Vision ML model: \(error)")
    }
}()
Set up request (called from image picking method). Here we call our classifier for an image:
func updateClassifications(for image: UIImage) {
    classificationLabel.text = "Classifying..."
    let orientation = CGImagePropertyOrientation(image.imageOrientation)
    guard let ciImage = CIImage(image: image) else { fatalError("Unable to create \(CIImage.self) from \(image).") }
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(ciImage: ciImage, orientation: orientation)
        do {
            try handler.perform([self.classificationRequest])
        } catch {
            print("Failed to perform classification.\n\(error.localizedDescription)")
        }
    }
}
Set up processing of the classification result (the callback in the request). Here we receive the result of classification and process it; in this example we simply show it in our label. As we use a classification model, our result is VNClassificationObservation; for regression it would be VNCoreMLFeatureValueObservation, and for segmentation or another image-to-image model, VNPixelBufferObservation:
func processClassifications(for request: VNRequest, error: Error?) {
    DispatchQueue.main.async {
        guard let results = request.results else {
            self.classificationLabel.text = "Unable to classify image.\n\(error!.localizedDescription)"
            return
        }
        let classifications = results as! [VNClassificationObservation]
        if classifications.isEmpty {
            self.classificationLabel.text = "Nothing recognized."
        } else {
            // Display top classifications ranked by confidence in the UI.
            let topClassifications = classifications.prefix(2)
            let descriptions = topClassifications.map { classification in
                return String(format: " (%.2f) %@", classification.confidence, classification.identifier)
            }
            self.classificationLabel.text = "Classification:\n" + descriptions.joined(separator: "\n")
        }
    }
}
Set up the detector:
lazy var detectionRequest: VNCoreMLRequest = {
    do {
        let model = try VNCoreMLModel(for: InsectDetector(configuration: MLModelConfiguration()).model)
        let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
            self?.processDetection(for: request, error: error)
        })
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("Failed to load Vision ML model: \(error)")
    }
}()
Set up request:
func updateDetector(for image: UIImage) {
    let orientation = CGImagePropertyOrientation(image.imageOrientation)
    guard let ciImage = CIImage(image: image) else { fatalError("Unable to create \(CIImage.self) from \(image).") }
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(ciImage: ciImage, orientation: orientation)
        do {
            try handler.perform([self.detectionRequest])
        } catch {
            print("Failed to perform detection.\n\(error.localizedDescription)")
        }
    }
}
Process detecting result and draw overlay:
var overlays = [UIView]()

/// Updates the UI with the results of the detection.
func processDetection(for request: VNRequest, error: Error?) {
    DispatchQueue.main.async {
        guard let results = request.results else {
            return
        }
        // Remove overlays from the previous detection.
        for view in self.overlays {
            view.removeFromSuperview()
        }
        let detections = results as! [VNRecognizedObjectObservation]
        for object in detections {
            print(object.labels[0])
            // detectedRectToView converts Vision's normalized bounding box
            // (origin bottom-left) into this view's coordinate space.
            let objectBounds = self.detectedRectToView(object.boundingBox)
            let view = UIView(frame: objectBounds)
            view.backgroundColor = UIColor(displayP3Red: 1.0, green: 0.0, blue: 1.0, alpha: 0.25)
            self.overlays.append(view)
            self.imageView.addSubview(view)
        }
    }
}
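The detectedRectToView helper isn't shown above. Vision reports boundingBox in normalized coordinates (0...1) with the origin in the bottom-left corner, while UIKit's origin is in the top-left, so the rect must be scaled to the view's size and its y axis flipped. The math can be sketched as follows (a minimal sketch in Python; it assumes the image fills the view exactly, ignoring aspect-fit insets):

```python
def detected_rect_to_view(bounding_box, view_width, view_height):
    """Convert a Vision normalized rect (x, y, w, h; origin bottom-left)
    into top-left-origin view coordinates."""
    nx, ny, nw, nh = bounding_box
    x = nx * view_width
    w = nw * view_width
    h = nh * view_height
    # Flip the y axis: Vision's y counts up from the bottom edge,
    # so the box's top edge sits at (1 - y - h) from the view's top.
    y = (1.0 - ny - nh) * view_height
    return (x, y, w, h)
```

A full-frame box maps onto the whole view, while a box anchored at Vision's bottom-left corner lands at the view's bottom-left.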

Here you can find a project for the demo app (for iOS and macOS), the CreateML projects, some Python scripts for converting datasets into the CreateML format, and the above tutorial in Markdown.
George Ostrobrod, 2021