Introduction to Deep Learning #2

Object detection

Use case: object detection on heritage images

Goals

Automatic annotation of objects in heritage images has uses in the fields of information retrieval and digital humanities. Depending on the scenario considered, this may involve obtaining a new source of textual metadata ("this image contains a cat, a child and a sofa") or locating every object class of interest within the image ("in this image, there is a car at position x,y,w,h").

These goals can be satisfied with "out-of-the-box" services or customized solutions.

Object detection on engraving material

Vogue magazine, French edition, 1922

Hands-on session

"Out-of-the-box" services

YOLO

YOLO performs object detection with a model trained on 80 object classes. YOLO is well known for being fast and accurate.

Hands-on This Python 3 script uses a YOLO v4 model that can be easily downloaded from the web. The images of a Gallica document are first loaded through the IIIF protocol. Detection is then performed, and annotated images are generated as well as the CSV data. A minimal sketch of the detection step is shown below.

Display the Jupyter notebook with nbviewer

Launch the notebook with Binder: Binder
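
As an illustration only (not the notebook's actual code), here is a minimal sketch of such a detection step, assuming a YOLO v4 model loaded with OpenCV's DNN module and one page of a Gallica document fetched through the IIIF Image API; the file names yolov4.cfg, yolov4.weights and coco.names, as well as the exact IIIF URL, are assumptions:

# Fetch one page of a Gallica document through the IIIF Image API (URL pattern assumed)
import urllib.request
import cv2

iiif_url = "https://gallica.bnf.fr/iiif/ark:/12148/bpt6k46000341/f1/full/full/0/native.jpg"
urllib.request.urlretrieve(iiif_url, "page.jpg")
image = cv2.imread("page.jpg")

# Load a YOLO v4 model trained on the 80 COCO classes (config and weights downloaded beforehand)
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)

# Run the detection and print one CSV-like line per detected object: class, confidence, bounding box
classes = open("coco.names").read().splitlines()
class_ids, confidences, boxes = model.detect(image, confThreshold=0.25, nmsThreshold=0.4)
for class_id, confidence, (x, y, w, h) in zip(class_ids.flatten(), confidences.flatten(), boxes):
    print(f"{classes[class_id]},{confidence:.2f},{x},{y},{w},{h}")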

Object detection on engraving material

ark:/12148/bpt6k46000341

Google Cloud Vision, IBM Watson Visual Recognition, Clarifai...

These APIs may be used to perform object detection. They are trained on huge datasets covering thousands of object classes (like ImageNet) and may be useful for 20th-century heritage content. These datasets are primarily aimed at photographs, but the generalizability of artificial neural networks means that they can produce acceptable results on drawings and prints.

Hands-on The Perl script described here calls the Google or IBM APIs.

> perl toolbox.pl -CC datafile -google

Note: IBM Watson Visual Recognition is discontinued. Existing instances are supported until 1 December 2021.

The API endpoint is simply called with a curl command, sending the request as a JSON fragment that includes the image data and the features expected in return:

> curl --insecure -v -s -H "Content-Type: application/json" "https://vision.googleapis.com/v1/images:annotate?key=yourKey" --data-binary @/tmp/request.json
  ...
  "features": [
      { "type": "LABEL_DETECTION" },
      { "type": "CROP_HINTS" },
      { "type": "IMAGE_PROPERTIES" }
  ], ...
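
The same REST call can also be written in a few lines of Python. This is only a sketch: the API key and the image path are placeholders, and the request body follows the Cloud Vision v1 images:annotate format with the three features shown above.

import base64
import json
import urllib.request

API_KEY = "yourKey"  # placeholder: a valid Cloud Vision API key
with open("image.jpg", "rb") as f:  # placeholder image file
    content = base64.b64encode(f.read()).decode("utf-8")

# Build the JSON request: the image data plus the features expected in return
request_body = {
    "requests": [{
        "image": {"content": content},
        "features": [
            {"type": "LABEL_DETECTION"},
            {"type": "CROP_HINTS"},
            {"type": "IMAGE_PROPERTIES"},
        ],
    }]
}

# Send it to the images:annotate endpoint and print the returned labels
req = urllib.request.Request(
    f"https://vision.googleapis.com/v1/images:annotate?key={API_KEY}",
    data=json.dumps(request_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as response:
    labels = json.loads(response.read())["responses"][0].get("labelAnnotations", [])
    for label in labels:
        print(label["description"], label["score"])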

Hands-on See also this Recipe, which makes use of the IBM Watson API to call a model previously trained with Watson Studio.

Cost, difficulties: Analyzing an image with such APIs costs a fraction of a cent per image. Processing can be done entirely on the web platform or with a minimal amount of coding.

Customized solutions

Transfer learning

Out-of-the-box solutions use pretrained models. Transfer learning consists of cutting off the last classification layer of such a model and transferring the "model's knowledge" to a local problem, i.e. the set of images and objects one needs to work with.

Transfer learning and domain adaptation refer to the situation where what has been learned in one setting … is exploited to improve generalization in another setting. (Deep Learning, Ian Goodfellow et al., 2016)

Google Cloud Vision and other commercial frameworks can be used to train a specific object detector on custom data. Training can be done on the web platform (e.g. AutoML Vision) or through APIs. The trained models can then be deployed in the cloud or locally.

The same is true for YOLO, using a commercial web app like Roboflow or local code.

Hands-on Open-source AI platforms all offer APIs for applying transfer learning. This Google Colab Jupyter script from the MODOAP project uses tf.keras (the high-level API of TensorFlow) to train a classification model. The training images must be stored on a Google Drive.
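
To give an idea of what such a script does, here is a minimal transfer-learning sketch with tf.keras (not the MODOAP notebook itself): a pretrained MobileNetV2 backbone is frozen, its original ImageNet head is cut off, and a new classification head is trained on a local image folder. The directory path and the choice of backbone are assumptions.

import tensorflow as tf

# Training images stored in one sub-folder per class (path is a placeholder)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "drive/MyDrive/training_images", image_size=(224, 224), batch_size=32)
num_classes = len(train_ds.class_names)

# Pretrained backbone, without its original 1000-class ImageNet head
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the "model's knowledge" frozen

# New classification head for the local problem
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects inputs in [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)

Fine-tuning can then go further by unfreezing the last layers of the backbone and training again with a low learning rate.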

Cost, difficulties: Training requires annotated images, which implies some preliminary work, as well as some computing power to train the model. Depending on the context and the expected performance, tens or hundreds of annotated images may be needed. For commercial products, pricing is higher when using a custom-trained model.

Training from scratch

There is almost no reason to start from complete scratch, as pretrained models tend to generalize well to other tasks and reduce overfitting when starting from a small dataset of images.

Use cases

Object detection on patterns: lines

Object detection on newspaper illustrations

Resources
