
This tutorial is aimed at the student level; you can find expanded information on Apple's website and in the related documentation they provide.
It walks you through the creation of a simple ML app that should be sufficient for basic tasks on common data sets.
A typical ML pipeline involves the following steps:

- Gathering. Sourcing data sets that will provide the input and output data for your model.
- Selecting. Choosing a model architecture that suits your purposes and data type.
- Training. Letting the software process the data to 'train' its ability to identify and differentiate.
- Integrating (deployment). Bringing the model you've trained into your app itself.

For the purposes of this tutorial, we'll use an established data set with our model. You can find it >here<. Download it, extract it and place it somewhere locally. This is sufficient for training the classifier.
Sadly, CreateML can't read arbitrary dataset formats, so you'll have to mark the data up manually using a pre-existing tool (ML Highlight) or convert it to the CreateML format (convert_dataset.py).

*NOTE: the x, y coordinates are the centre of your bounding box.*
A CreateML Annotations.json file looks like the below (label is the object's class name):
[
    {
        "image": "66179910a09e.jpg",
        "annotations": [
            {
                "label": "Araneae",
                "coordinates": {"x": 787, "y": 789, "width": 530, "height": 592}
            }
        ]
    },
    {...}
]
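The conversion step can be sketched as follows. This is a minimal sketch, not the actual convert_dataset.py: it assumes your source annotations use top-left-origin boxes, which need to be re-expressed as centre coordinates for CreateML, and the record layout is made up for illustration.

```python
import json

def to_createml(records):
    """Convert (image, label, x_min, y_min, width, height) records,
    where (x_min, y_min) is the box's top-left corner, into the
    CreateML annotation structure (which uses the box centre)."""
    by_image = {}
    for image, label, x_min, y_min, w, h in records:
        entry = by_image.setdefault(image, {"image": image, "annotations": []})
        entry["annotations"].append({
            "label": label,
            "coordinates": {
                "x": x_min + w / 2,   # CreateML x is the box centre
                "y": y_min + h / 2,   # CreateML y is the box centre
                "width": w,
                "height": h,
            },
        })
    return list(by_image.values())

# Reproduces the example entry above.
records = [("66179910a09e.jpg", "Araneae", 522, 493, 530, 592)]
print(json.dumps(to_createml(records), indent=2))
```

Grouping by image matters because one image can contain several annotated objects.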
To simplify the process we'll use a standard Apple model integrated into the OS, and we'll train only the top-level layers, using the existing lower layers as a feature extractor. This saves time and disk space.
New document -> Image Classification
Training Data (though splitting the data into training / validation / testing sets is preferable)

Data Sources

Set the desired number of iterations (the default is 25) and hit Start training.

You can also add augmentation. Augmentation applies additional distortions to the input images, artificially extending the training dataset with varying noise, sizes, etc.; as a result it makes the model more robust and reduces overfitting.
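To illustrate the idea (not what CreateML does internally), here is a minimal sketch of two simple augmentations on a tiny image represented as a 2D list of pixel values:

```python
def flip_horizontal(image):
    """Mirror an image (a list of rows) left-to-right."""
    return [list(reversed(row)) for row in image]

def rotate_90(image):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

image = [[1, 2],
         [3, 4]]

# Each augmented copy is an extra training example carrying the same label.
augmented = [image, flip_horizontal(image), rotate_90(image)]
```

Real augmentation pipelines also vary exposure, blur and noise, as the CreateML options below show.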
As your model is trained on top of a standard Apple model, it uses the low-level layers as a feature extractor and trains only the high-level layers for your task. You'll first see Extracting Features, then Training.
Training a model can typically take weeks even utilising high-performance GPU(s), but our task is relatively simple and should only take a couple of hours.
Check the Preview tab. You'll likely find that the model isn't working very well. There can be several reasons for this, such as noisy or incoherent input data, a limited dataset, an inappropriate architecture, or task-related issues.

Export the .mlmodel file and integrate it into your Xcode project (Chapter 3).
Detection works essentially the same way, but your source data will need annotation, and the process can take longer. Note that transfer learning (training only the top-level layers) is only supported on iOS 14 and above.
Take a break and let the task run for a couple of days. I would recommend making snapshots intermittently in case the training process fails. Also, be sure to check in Preview that you're actually training what you need to train.

*NOTE: the snapshot with the smallest training loss can simply be overfitted and have worse precision on non-training data.*
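The note above suggests a simple selection rule, sketched here in Python with made-up loss values: prefer the snapshot with the lowest validation loss rather than the lowest training loss.

```python
# (iteration, training_loss, validation_loss) for hypothetical snapshots
snapshots = [
    (100, 0.90, 0.95),
    (500, 0.40, 0.52),
    (900, 0.10, 0.61),  # lowest training loss, but validation got worse: overfitting
]

best_by_train = min(snapshots, key=lambda s: s[1])  # picks iteration 900
best_by_val = min(snapshots, key=lambda s: s[2])    # picks iteration 500

print(best_by_train[0], best_by_val[0])
```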
If you like, you can also train your model manually in Playground:
Use a macOS Playground, as the others don't support CreateML. You can train via a session or train immediately. A session is more controllable and stable, but training immediately is simpler, so we'll opt for that.
let dataSource = MLImageClassifier.DataSource.labeledDirectories(at: URL(fileURLWithPath: "DATASET_PATH/ArTaxOr"))
let splitData = try! dataSource.stratifiedSplit(proportions: [0.8, 0.2])
let trainData = splitData[0]
let testData = splitData[1]
let augmentation: MLImageClassifier.ImageAugmentationOptions = [
    //.blur,
    //.exposure,
    //.flip,
    //.noise,
    .rotation
]
let trainParams = MLImageClassifier.ModelParameters(validation: .split(strategy: .automatic),
                                                    maxIterations: 1000,
                                                    augmentation: augmentation)
let classifier = try! MLImageClassifier(trainingData: trainData, parameters: trainParams)
/// Classifier training accuracy as a percentage
let trainingError = classifier.trainingMetrics.classificationError
let trainingAccuracy = (1.0 - trainingError) * 100
let validationError = classifier.validationMetrics.classificationError
let validationAccuracy = (1.0 - validationError) * 100
/// Evaluate the classifier
let classifierEvaluation = classifier.evaluation(on: testData)
let evaluationError = classifierEvaluation.classificationError
let evaluationAccuracy = (1.0 - evaluationError) * 100
// Save model
let workPath = "OUTPUT_PATH" // directory where the model will be saved
let homePath = URL(fileURLWithPath: workPath)
let classifierMetadata = MLModelMetadata(author: "George Ostrobrod",
                                         shortDescription: "Predicts order of insect.",
                                         version: "1.0")
try classifier.write(to: homePath.appendingPathComponent("InsectOrder.mlmodel"),
                     metadata: classifierMetadata)
Drag the generated .mlmodel file into your project.

Link the CoreML, ImageIO and Vision frameworks. Add to Info.plist:
Privacy - Camera Usage Description
Privacy - Photo Library Usage Description
Pick an image and call your classifier for it (part of the UIImagePickerControllerDelegate implementation):
func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
    picker.dismiss(animated: true)
    let image = info[UIImagePickerController.InfoKey.originalImage] as! UIImage
    imageView.image = image
    updateClassifications(for: image)
}
Set up classifier (called in the next step). Here we load our model and set up a callback for processing the results of its running:
lazy var classificationRequest: VNCoreMLRequest = {
    do {
        let model = try VNCoreMLModel(for: ArTaxOrders(configuration: MLModelConfiguration()).model)
        let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
            self?.processClassifications(for: request, error: error)
        })
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("Failed to load Vision ML model: \(error)")
    }
}()
Set up request (called from image picking method). Here we call our classifier for an image:
func updateClassifications(for image: UIImage) {
    classificationLabel.text = "Classifying..."
    let orientation = CGImagePropertyOrientation(image.imageOrientation)
    guard let ciImage = CIImage(image: image) else { fatalError("Unable to create \(CIImage.self) from \(image).") }
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(ciImage: ciImage, orientation: orientation)
        do {
            try handler.perform([self.classificationRequest])
        } catch {
            print("Failed to perform classification.\n\(error.localizedDescription)")
        }
    }
}
Set up processing of the classification result (the callback in the request). Here we receive the result of classification and process it; in this example we simply show it in our label. As we use a classification model, our result is VNClassificationObservation; for regression it would be VNCoreMLFeatureValueObservation, and for segmentation or another image-to-image model, VNPixelBufferObservation:
func processClassifications(for request: VNRequest, error: Error?) {
    DispatchQueue.main.async {
        guard let results = request.results else {
            self.classificationLabel.text = "Unable to classify image.\n\(error!.localizedDescription)"
            return
        }
        let classifications = results as! [VNClassificationObservation]
        if classifications.isEmpty {
            self.classificationLabel.text = "Nothing recognized."
        } else {
            // Display top classifications ranked by confidence in the UI.
            let topClassifications = classifications.prefix(2)
            let descriptions = topClassifications.map { classification in
                return String(format: " (%.2f) %@", classification.confidence, classification.identifier)
            }
            self.classificationLabel.text = "Classification:\n" + descriptions.joined(separator: "\n")
        }
    }
}
Set up the detector:
lazy var detectionRequest: VNCoreMLRequest = {
    do {
        let model = try VNCoreMLModel(for: InsectDetector(configuration: MLModelConfiguration()).model)
        let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
            self?.processDetection(for: request, error: error)
        })
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("Failed to load Vision ML model: \(error)")
    }
}()
Set up request:
func updateDetector(for image: UIImage) {
    let orientation = CGImagePropertyOrientation(image.imageOrientation)
    guard let ciImage = CIImage(image: image) else { fatalError("Unable to create \(CIImage.self) from \(image).") }
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(ciImage: ciImage, orientation: orientation)
        do {
            try handler.perform([self.detectionRequest])
        } catch {
            print("Failed to perform detection.\n\(error.localizedDescription)")
        }
    }
}
Process detecting result and draw overlay:
var overlays = [UIView]()

/// Updates the UI with the results of the detection.
func processDetection(for request: VNRequest, error: Error?) {
    DispatchQueue.main.async {
        guard let results = request.results else {
            return
        }
        // Remove overlays from the previous detection.
        for view in self.overlays {
            view.removeFromSuperview()
        }
        let detections = results as! [VNRecognizedObjectObservation]
        for object in detections {
            print(object.labels[0])
            // detectedRectToView converts Vision's normalized bounding box
            // (origin bottom-left) into this view's coordinate space.
            let objectBounds = self.detectedRectToView(object.boundingBox)
            let view = UIView(frame: objectBounds)
            view.backgroundColor = UIColor(displayP3Red: 1.0, green: 0.0, blue: 1.0, alpha: 0.25)
            self.overlays.append(view)
            self.imageView.addSubview(view)
        }
    }
}
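The detectedRectToView helper isn't shown above. Vision reports boundingBox in normalized coordinates (0...1) with the origin in the bottom-left corner, while UIKit's origin is in the top-left, so the rect must be scaled to the view's size and its y axis flipped. The math can be sketched as follows (a minimal sketch in Python; it assumes the image fills the view exactly, ignoring aspect-fit insets):

```python
def detected_rect_to_view(bounding_box, view_width, view_height):
    """Convert a Vision normalized rect (x, y, w, h; origin bottom-left)
    into top-left-origin view coordinates."""
    nx, ny, nw, nh = bounding_box
    x = nx * view_width
    w = nw * view_width
    h = nh * view_height
    # Flip the y axis: Vision's y counts up from the bottom edge,
    # so the box's top edge sits at (1 - y - h) from the view's top.
    y = (1.0 - ny - nh) * view_height
    return (x, y, w, h)
```

A full-frame box maps onto the whole view, while a box anchored at Vision's bottom-left corner lands at the view's bottom-left.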

Here you can find a project for the demo app (for iOS and macOS), the CreateML projects, some Python scripts for converting datasets into the CreateML format, and the above tutorial in Markdown.
George Ostrobrod, 2021