If you have used the Photos app or Google Photos on the iPhone, you might have noticed that they generate videos for you automatically. They group similar photos and videos and turn them into a personalized video. How do they do it? Let's try building something similar using the Core ML framework on iOS.
Core ML is a framework provided by Apple for integrating machine learning models into your app. It supports Vision for analyzing images, Natural Language for processing text, Speech for converting audio to text, and Sound Analysis for identifying sounds in audio. Core ML itself builds on top of low-level primitives such as Accelerate and BNNS, as well as Metal Performance Shaders. You can read more about Core ML here: https://developer.apple.com/documentation/coreml
To start using Core ML, you need a Core ML model. Don't worry, this is easy: you can build and train one with the Create ML app bundled with Xcode. Models trained using Create ML are in the Core ML model format and are ready to use in your app. Alternatively, you can use a wide variety of other machine learning libraries and then use Core ML Tools to convert the model into the Core ML format. Once a model is on a user's device, you can use Core ML to retrain or fine-tune it on-device with that user's data.
Creating the Machine Learning Model using the Create ML app
Open the Create ML app from Xcode via Xcode > Open Developer Tool > Create ML.
Go ahead and create an Image Classifier project as shown below.
Give it an appropriate name. The next steps are straightforward. For my example, I created a folder called boards. Inside the boards folder I added subfolders, and in each subfolder I placed multiple photos of longboards.
Now go ahead and add the boards folder as the training data in the Create ML app.
Click on the Preview tab in Create ML; there you can test the trained model with your own images.
After testing, export the trained model from the Output tab.
Let’s name this model boardclassifier.
Take this model and drop it into the Xcode project you created. Xcode will automatically generate a model class for you; this class is what we will use to run predictions.
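To sanity-check that the generated class is available, you can instantiate it directly. This is just an illustrative snippet; it assumes the model file was named boardclassifier.mlmodel, so the generated class is boardclassifier.

import CoreML

// Quick check: instantiate the class Xcode generated from boardclassifier.mlmodel.
// The generated class name matches the .mlmodel file name.
do {
    let classifier = try boardclassifier(configuration: MLModelConfiguration())
    print("Loaded model: \(classifier.model.modelDescription)")
} catch {
    print("Failed to load boardclassifier: \(error)")
}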
// MARK: COREML
lazy var classificationRequest: VNCoreMLRequest = {
    do {
        /*
         Use the Swift class `boardclassifier` that Core ML generates from the model.
         To use a different Core ML classifier model, add it to the project
         and replace `boardclassifier` with that model's generated Swift class.
         */
        let model = try VNCoreMLModel(for: boardclassifier().model)
        let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
            self?.processClassifications(for: request, error: error)
        })
        // Center-crop and scale each frame to the model's expected input size.
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("Failed to load Vision ML model: \(error)")
    }
}()
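The request's completion handler, processClassifications(for:error:), is where the VNClassificationObservation results arrive. The full version is in the repository; a minimal sketch of what it needs to do, assuming the same AssetDataRequestHandler.classifications accumulator used later in this post, could look like this:

func processClassifications(for request: VNRequest, error: Error?) {
    // Vision returns VNClassificationObservation objects, sorted by confidence.
    guard let results = request.results as? [VNClassificationObservation] else {
        print("Unable to classify image: \(error?.localizedDescription ?? "unknown error")")
        return
    }
    // Accumulate the observations for the frame that was just processed;
    // AssetDataRequestHandler.classifications is drained once all frames of a video are done.
    AssetDataRequestHandler.classifications.append(contentsOf: results)
}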
Here is the plan: read through the media available on the phone, push each video through the Core ML model, and get a confidence score for each one. If a video scores high enough, we trim it, and finally we merge the trimmed clips into a single personalized video. A sketch of how these steps chain together is shown in the Architecture section below.
Architecture
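At a high level, the pipeline chains the methods described in the rest of this post. The sketch below is only illustrative: it assumes all of these methods live on the same view controller and glosses over error handling and the bookkeeping the real implementation in the repository does.

// Illustrative flow for a single asset; the real app loops over every video it finds.
func processAsset(_ assetData: AssetData) {
    // 1. Extract one frame per second from the video.
    getImagesForAssetAsynchronously(assetData: assetData) { assetData, _ in
        // 2. Run each frame through the Core ML classifier.
        self.updateClassificationsForAssetData(for: assetData) { assetData in
            // 3. If the video scored high enough for a label, trim a short clip out of it.
            do {
                try self.trimVideos(assetData: assetData) { _, assetData, trimmedURL in
                    // 4. Once every selected video has a trimmed clip, merge the clips
                    //    (plus a soundtrack) into one personalized video.
                }
            } catch {
                print("Failed to trim \(assetData.name): \(error)")
            }
        }
    }
}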
Code / Implementation Details
This is the data structure I use to store data for processing in this app.
class AssetData: Hashable {
    var name: String = ""                                    // display name of the source video
    var path: String = ""                                    // file path of the source video
    var avasset: AVAsset?                                    // the loaded AVAsset
    var classificationTotalScore = 0.0                       // accumulated confidence for the winning label
    var classificationIdentifier = ""
    var images: [UIImage]?                                   // frames extracted from the video, one per second
    var documentDirectoryPath: URL?
    var classifications: [VNClassificationObservation] = []  // per-frame observations from the classifier
    var maxLabel: String = ""                                // label with the highest accumulated confidence
    var trimmedPath: URL?                                    // location of the trimmed clip, once exported

    static func == (lhs: AssetData, rhs: AssetData) -> Bool {
        return lhs.name == rhs.name && lhs.path == rhs.path
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(path)
    }
}
Go through all the videos in the bundle folder and generate images from each one asynchronously at a regular interval; here I generate one image every second. For the implementation, refer to this method in the code:
func getImagesForAssetAsynchronously(assetData: AssetData, completionHandler: @escaping (AssetData, Bool) -> Void)
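The repository has the full implementation. A minimal sketch of the idea, using AVAssetImageGenerator to pull one frame per second, might look like this (the structure here is simplified and not thread-safe, so treat it as an illustration rather than the actual method):

func getImagesForAssetAsynchronously(assetData: AssetData, completionHandler: @escaping (AssetData, Bool) -> Void) {
    guard let asset = assetData.avasset else {
        completionHandler(assetData, false)
        return
    }

    // Request one frame per second of video.
    let durationInSeconds = Int(CMTimeGetSeconds(asset.duration))
    let times = (0..<max(durationInSeconds, 1)).map {
        NSValue(time: CMTimeMake(value: Int64($0), timescale: 1))
    }

    let generator = AVAssetImageGenerator(asset: asset)
    generator.appliesPreferredTrackTransform = true

    var images: [UIImage] = []
    var processed = 0
    generator.generateCGImagesAsynchronously(forTimes: times) { _, cgImage, _, _, _ in
        if let cgImage = cgImage {
            images.append(UIImage(cgImage: cgImage))
        }
        processed += 1
        // Once every requested time has been handled, hand the frames back.
        if processed == times.count {
            assetData.images = images
            completionHandler(assetData, true)
        }
    }
}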
Once the images are generated, pass them through the classifier to get a confidence level for each frame.
func updateClassificationsForAssetData(for assetData: AssetData, completionHandler: @escaping (AssetData) -> Void) {
    classificationText = "Classifying..."
    DispatchQueue.global(qos: .userInitiated).async {
        guard let images = assetData.images else { return }
        for pos in 0..<images.count {
            let image = images[pos]
            let orientation = CGImagePropertyOrientation(image.imageOrientation)
            guard let ciImage = CIImage(image: image) else { fatalError("Unable to create \(CIImage.self) from \(image).") }
            let handler = VNImageRequestHandler(ciImage: ciImage, orientation: orientation)
            do {
                // perform(_:) runs synchronously, so the frames are classified in order.
                try handler.perform([self.classificationRequest])
                // After the last frame, hand the accumulated observations back to the caller.
                if pos >= images.count - 1 {
                    assetData.classifications = AssetDataRequestHandler.classifications
                    AssetDataRequestHandler.classifications.removeAll()
                    completionHandler(assetData)
                }
            } catch {
                /*
                 This handler catches general image processing errors. The `classificationRequest`'s
                 completion handler `processClassifications(for:error:)` catches errors specific
                 to processing that request.
                 */
                print("Failed to perform classification.\n\(error.localizedDescription)")
            }
        }
    }
}
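The repository then turns these per-frame observations into a per-video score. A simplified sketch of that idea, summing confidences per label and keeping the label that dominates (the exact scoring and thresholds in the repository may differ), could look like this:

// Aggregate per-frame observations into a score per label for one video.
func scoreClassifications(for assetData: AssetData) {
    var totals: [String: Double] = [:]
    for observation in assetData.classifications {
        totals[observation.identifier, default: 0] += Double(observation.confidence)
    }
    // Keep the label with the highest accumulated confidence.
    if let best = totals.max(by: { $0.value < $1.value }) {
        assetData.maxLabel = best.key
        assetData.classificationTotalScore = best.value
    }
}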
Once we know the confidence level of each video for each label, we can trim the videos with the highest confidence. This is done as shown below.
// MARK: VIDEO HANDLERS
func trimVideos(assetData: AssetData, completionBlock: @escaping (Bool, AssetData?, URL?) -> ()) throws {
    let exportSession = AVAssetExportSession(asset: assetData.avasset!, presetName: AVAssetExportPresetHighestQuality)
    let outputURL = try FileManager.createDirectoryForTrimmedFiles()?.appendingPathComponent(assetData.name + ".mov")
    exportSession?.outputURL = outputURL
    exportSession?.shouldOptimizeForNetworkUse = true
    exportSession?.outputFileType = AVFileType.mov
    // Export only the 5s-8s range of the source video to keep a short clip.
    let startTime = CMTimeMake(value: 5, timescale: 1)
    let stopTime = CMTimeMake(value: 8, timescale: 1)
    exportSession?.timeRange = CMTimeRangeFromTimeToTime(start: startTime, end: stopTime)
    exportSession?.exportAsynchronously(completionHandler: {
        switch exportSession?.status {
        case .failed:
            print("Export failed: \(exportSession?.error?.localizedDescription ?? "No error info")")
            completionBlock(false, nil, nil)
        case .cancelled:
            print("Export canceled")
            completionBlock(false, nil, nil)
        case .completed:
            assetData.trimmedPath = outputURL
            NotificationCenter.default.post(name: Notification.Name("SendUpdatesToUser"),
                                            object: nil,
                                            userInfo: ["text": "completed trimming video \(assetData.name)"])
            completionBlock(true, assetData, outputURL)
        default:
            break
        }
    })
}
Merge the trimmed clips into a single video with the code below.
func merge(mlmodelName: String, arrayVideos: [AVAsset], filename: String, completion: @escaping (_ exporter: AVAssetExportSession, _ mlmodelName: String) -> ()) {
    let mainComposition = AVMutableComposition()
    let compositionVideoTrack = mainComposition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid)
    var insertTime = CMTime.zero
    // Append each trimmed clip's video track back to back.
    for videoAsset in arrayVideos {
        try! compositionVideoTrack?.insertTimeRange(CMTimeRangeMake(start: .zero, duration: videoAsset.duration), of: videoAsset.tracks(withMediaType: .video)[0], at: insertTime)
        insertTime = CMTimeAdd(insertTime, videoAsset.duration)
    }
    // Use the first clip's track transform so the merged video keeps its orientation.
    compositionVideoTrack?.preferredTransform = arrayVideos[0].tracks(withMediaType: .video)[0].preferredTransform

    // Lay a bundled soundtrack under the whole composition.
    let soundtrackTrack = mainComposition.addMutableTrack(withMediaType: .audio, preferredTrackID: kCMPersistentTrackID_Invalid)
    do {
        if let audioURL = Bundle.main.url(forResource: "Toybox", withExtension: "m4a") {
            let audioAsset = AVAsset(url: audioURL)
            try soundtrackTrack?.insertTimeRange(
                CMTimeRangeMake(start: .zero, duration: insertTime),
                of: audioAsset.tracks(withMediaType: .audio)[0],
                at: .zero)
        }
    } catch {
        print("Failed to load audio track")
    }

    do {
        let outputFileURL = try FileManager.createDirectoryForTrimmedFiles()?.appendingPathComponent(filename + ".mp4")
        let fileManager = FileManager.default
        if fileManager.fileExists(atPath: outputFileURL!.path) {
            try fileManager.removeItem(at: outputFileURL!)
        }
        let exporter = AVAssetExportSession(asset: mainComposition, presetName: AVAssetExportPresetHighestQuality)
        exporter?.outputURL = outputFileURL
        exporter?.outputFileType = AVFileType.mp4
        exporter?.shouldOptimizeForNetworkUse = true
        exporter?.exportAsynchronously {
            DispatchQueue.main.async {
                completion(exporter!, mlmodelName)
            }
        }
    } catch {
        print("error merging \(error)")
    }
}
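A hypothetical call site, assuming the selected clips' trimmedPath URLs have already been populated (the model name and output filename here are just placeholders):

// Illustrative only: build AVAssets from the trimmed clips and merge them.
let trimmedAssets = personalizedAssets
    .compactMap { $0.trimmedPath }
    .map { AVAsset(url: $0) }

merge(mlmodelName: "boardclassifier", arrayVideos: trimmedAssets, filename: "personalized") { exporter, modelName in
    if exporter.status == .completed, let url = exporter.outputURL {
        print("Merged video for \(modelName) written to \(url)")
    }
}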
As a bonus step, once you have the merged video you can add labels and text to it using the method below.
func addGraphicsToVideos() {
    let videoEditor = VideoEditor()
    var counter = 0
    let notificationCenter = NotificationCenter.default
    for assetData in personalizedAssets {
        videoEditor.makeBirthdayCard(fromVideoAt: URL(fileURLWithPath: assetData.path), forName: assetData.name) { [self] graphicsVideoURL in
            notificationCenter.post(name: Notification.Name("SendUpdatesToUser"), object: nil, userInfo: ["text": "Generated video using \(assetData.name)"])
            counter += 1
            print("video url \(graphicsVideoURL?.absoluteString ?? "")")
            assetData.path = graphicsVideoURL!.path
            if counter == personalizedAssets.count {
                generatePersonalizedVideos.isHidden = false
                activityIndicator.stopAnimating()
                activityIndicator.isHidden = true
                let alert = UIAlertController(title: "Information", message: "Finished generating personalized videos", preferredStyle: .alert)
                alert.addAction(UIAlertAction(title: "OK", style: .default, handler: { _ in
                    // Optionally play the generated video here:
                    // let player = AVPlayer(url: graphicsVideoURL!)
                    // let vc = AVPlayerViewController()
                    // vc.player = player
                    // present(vc, animated: true) { vc.player?.play() }
                }))
                self.present(alert, animated: true)
            }
        }
    }
}
I hope this explains how it's done. If you would rather skip straight to the code, the entire codebase is here: https://github.com/kmdarshan/personalizedVideoCoreML