todd-cook/ML-You-Can-Use

ML-You-Can-Use

Practical Machine Learning and Natural Language Processing with examples.

Featuring

  • Interesting applications of ML, NLP, and Computer Vision
  • Practical demonstration notebooks
  • Reproducible experiments
  • Illustrated best practices:
    • Code extracted from notebooks for:
      • Automatic formatting with Black
      • Type checking via MyPy annotations
      • Linting via Pylint
      • Doctests whenever possible
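
As a rough illustration of the practices listed above, here is a minimal sketch of a helper in the style one might extract from a notebook: it carries type annotations that MyPy can check and a doctest that documents its behavior. The function name `count_words` is hypothetical and does not come from this repo.

```python
import doctest


def count_words(text: str) -> int:
    """Count the whitespace-separated tokens in a string.

    >>> count_words("arma virumque cano")
    3
    >>> count_words("")
    0
    """
    # str.split() with no arguments collapses runs of whitespace
    # and returns an empty list for an empty string.
    return len(text.split())


if __name__ == "__main__":
    # Run every doctest found in this module.
    doctest.testmod()
```

Keeping the example inside the docstring means the documentation is exercised each time the doctests run.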

Setup

Download this repo using git, including its submodules. In an existing clone, for example, the following updates the checkout along with its submodules:

git pull --recurse-submodules

The submodules pull in some data, along with the external data processing utilities used to preprocess some of it.

Install Python 3

Create Virtual Environment

mkdir p3
`which python3` -m venv ./p3
source setPythonHashSeed.sh
source p3/bin/activate
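
To confirm that the steps above took effect, a small check like the following may help. It is only a sketch: it assumes, going by the script's name, that setPythonHashSeed.sh exports the standard PYTHONHASHSEED environment variable, and the function name `environment_report` is hypothetical.

```python
import os
import sys
from typing import Dict, Optional, Union


def environment_report() -> Dict[str, Union[bool, Optional[str]]]:
    """Report whether a virtual environment is active and the hash seed is set."""
    return {
        # Inside a venv, sys.prefix points at the environment while
        # sys.base_prefix still points at the base Python installation.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # None means PYTHONHASHSEED is unset, so string hashing is
        # randomized per process and hash-dependent ordering may vary.
        "python_hash_seed": os.environ.get("PYTHONHASHSEED"),
    }


if __name__ == "__main__":
    print(environment_report())
```

A fixed hash seed matters for reproducible experiments because the iteration order of sets of strings can otherwise differ between runs.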

Install Requirements

pip install -r requirements.txt

For running all notebook examples

pip install -r requirements-dev.txt

Note: some examples will have a conda environment.yaml file that you will want to use.

Installing Test Corpora

Many notebooks use data that must be installed first; do so by running the install script:

./install_corpora.sh

  • installs Python SSL certificates
  • installs CLTK data for Latin and Greek
  • installs NLTK data
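
After the script finishes, a quick sanity check along these lines can show whether the data directories exist. This is a sketch under assumptions: it supposes the data lands in `nltk_data` and `cltk_data` folders under the home directory, which are the usual defaults for NLTK and CLTK but may not match what the install script does. The function name `corpora_dirs_present` is hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def corpora_dirs_present(home: Optional[Path] = None) -> Dict[str, bool]:
    """Report which of the assumed corpus directories exist under `home`."""
    root = home if home is not None else Path.home()
    # Assumed default locations; adjust if the data was installed elsewhere.
    candidates = {
        "nltk_data": root / "nltk_data",
        "cltk_data": root / "cltk_data",
    }
    return {name: path.is_dir() for name, path in candidates.items()}


if __name__ == "__main__":
    for name, present in corpora_dirs_present().items():
        print(f"{name}: {'found' if present else 'missing'}")
```

A missing directory does not necessarily mean the install failed, since both libraries can be configured to look in other locations.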

Testing

./runUnitTests.sh

Interactivity

jupyter notebook

Notebooks

Getting data

Labeling Data

Modeling Language

Detecting Duplicate Documents

Classifying Texts

Detecting Loanwords

Wikipedia Corpus Processing

Quality Embeddings

Computer Vision - Object Detection

Summarizing Texts

Searching and Search Relevance

References and Acknowledgements