Introduction to Deep Learning #2

Object detection

Use case: object detection on heritage images

Goals

Automatic annotation of objects in heritage images has uses in the fields of information retrieval and digital humanities. Depending on the scenario considered, this may involve obtaining a new source of textual metadata ("this image contains a cat, a child and a sofa") or locating every object class of interest within the image ("in this image, there is a car at position x,y,w,h").

These goals can be satisfied with "out-of-the-box" services or customized solutions.

Object detection on engraving material

Vogue magazine, French edition, 1922

Hands-on session

"Out-of-the-box" services

YOLO

YOLO performs object detection with a model trained on 80 object classes. YOLO is well known for being fast and accurate.

Hands-on This Python 3 script uses a YOLO v4 model that can be easily downloaded from the web. The images of a Gallica document are first loaded through the IIIF protocol. Detection is then performed, and annotated images are generated as well as the CSV data. A minimal sketch of the detection step is shown below.

Display the Jupyter notebook with nbviewer

Launch the notebook with Binder: Binder
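
As an illustration only (not the notebook's actual code), here is a minimal sketch of such a detection step, assuming a YOLO v4 model loaded with OpenCV's DNN module and one page of a Gallica document fetched through the IIIF Image API; the file names yolov4.cfg, yolov4.weights and coco.names, as well as the exact IIIF URL, are assumptions:

# Fetch one page of a Gallica document through the IIIF Image API (URL pattern assumed)
import urllib.request
import cv2

iiif_url = "https://gallica.bnf.fr/iiif/ark:/12148/bpt6k46000341/f1/full/full/0/native.jpg"
urllib.request.urlretrieve(iiif_url, "page.jpg")
image = cv2.imread("page.jpg")

# Load a YOLO v4 model trained on the 80 COCO classes (config and weights downloaded beforehand)
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)

# Run the detection and print one CSV-like line per detected object: class, confidence, bounding box
classes = open("coco.names").read().splitlines()
class_ids, confidences, boxes = model.detect(image, confThreshold=0.25, nmsThreshold=0.4)
for class_id, confidence, (x, y, w, h) in zip(class_ids.flatten(), confidences.flatten(), boxes):
    print(f"{classes[class_id]},{confidence:.2f},{x},{y},{w},{h}")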

Object detection on engraving material

ark:/12148/bpt6k46000341

Google Cloud Vision, IBM Watson Visual Recognition, Clarifai...

These APIs may be used to perform object detection. They are trained on huge datasets covering thousands of object classes (like ImageNet) and may be useful for 20th-century heritage content. These datasets are primarily aimed at photographs, but the generalizability of artificial neural networks means that they can produce acceptable results on drawings and prints.

Hands-on The Perl script described here calls the Google or IBM APIs.

> perl toolbox.pl -CC datafile -google

Note: IBM Watson Visual Recognition is discontinued. Existing instances are supported until 1 December 2021.

The API endpoint is simply called with a curl command, sending the request as a JSON fragment that includes the image data and the features expected in return:

> curl --insecure -v -s -H "Content-Type: application/json" "https://vision.googleapis.com/v1/images:annotate?key=yourKey" --data-binary @/tmp/request.json
  ...
  "features": [
      { "type": "LABEL_DETECTION" },
      { "type": "CROP_HINTS" },
      { "type": "IMAGE_PROPERTIES" }
  ], ...
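
The same REST call can also be written in a few lines of Python. This is only a sketch: the API key and the image path are placeholders, and the request body follows the Cloud Vision v1 images:annotate format with the three features shown above.

import base64
import json
import urllib.request

API_KEY = "yourKey"  # placeholder: a valid Cloud Vision API key
with open("image.jpg", "rb") as f:  # placeholder image file
    content = base64.b64encode(f.read()).decode("utf-8")

# Build the JSON request: the image data plus the features expected in return
request_body = {
    "requests": [{
        "image": {"content": content},
        "features": [
            {"type": "LABEL_DETECTION"},
            {"type": "CROP_HINTS"},
            {"type": "IMAGE_PROPERTIES"},
        ],
    }]
}

# Send it to the images:annotate endpoint and print the returned labels
req = urllib.request.Request(
    f"https://vision.googleapis.com/v1/images:annotate?key={API_KEY}",
    data=json.dumps(request_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as response:
    labels = json.loads(response.read())["responses"][0].get("labelAnnotations", [])
    for label in labels:
        print(label["description"], label["score"])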

Hands-on See also this Recipe, which makes use of the IBM Watson API to call a model previously trained with Watson Studio.

Cost, difficulties: Analyzing an image with such APIs costs a fraction of a cent per image. Processing can be done entirely on the web platform or with a minimal amount of coding.

Customized solutions

Transfer learning

Out-of-the-box solutions use pretrained models. Transfer learning consists of cutting off the last classification layer of such a model and transferring the "model's knowledge" to a local problem, i.e. the set of images and objects one needs to work with.

Transfer learning and domain adaptation refer to the situation where what has been learned in one setting … is exploited to improve generalization in another setting. (Deep Learning, Ian Goodfellow et al., 2016)

Google Cloud Vision and other commercial frameworks can be used to train a specific object detector on custom data. Training can be done on the web platform (e.g. AutoML Vision) or through APIs. The trained models can then be deployed in the cloud or locally.

The same is true for YOLO, using a commercial web app like Roboflow or local code.

Hands-on Open-source AI platforms all offer APIs for applying transfer learning. This Google Colab Jupyter script from the MODOAP project uses tf.keras (the high-level API of TensorFlow) to train a classification model. The training images must be stored on a Google Drive.
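
To give an idea of what such a script does, here is a minimal transfer-learning sketch with tf.keras (not the MODOAP notebook itself): a pretrained MobileNetV2 backbone is frozen, its original ImageNet head is cut off, and a new classification head is trained on a local image folder. The directory path and the choice of backbone are assumptions.

import tensorflow as tf

# Training images stored in one sub-folder per class (path is a placeholder)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "drive/MyDrive/training_images", image_size=(224, 224), batch_size=32)
num_classes = len(train_ds.class_names)

# Pretrained backbone, without its original 1000-class ImageNet head
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the "model's knowledge" frozen

# New classification head for the local problem
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects inputs in [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)

Fine-tuning can then go further by unfreezing the last layers of the backbone and training again with a low learning rate.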

Cost, difficulties: Training requires annotated images, which implies some preliminary work, as well as some computing power to train the model. Depending on the context and the expected performance, tens or hundreds of annotated images may be needed. For commercial products, pricing is higher when using a custom-trained model.

Training from scratch

There is almost no reason to start from complete scratch, as pretrained models tend to generalize well to other tasks and reduce overfitting when starting from a small dataset of images.

Use cases

Object detection on patterns: lines

Object detection on newspaper illustrations

Resources
