scrapy-monkeylearn

A Scrapy pipeline to categorize items using MonkeyLearn.

Settings

MONKEYLEARN_BATCH_SIZE

The size of the item batches sent to MonkeyLearn.

Default: 200

Example:

MONKEYLEARN_BATCH_SIZE = 200
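
To illustrate what the batch size controls, here is a minimal sketch of grouping items into batches before they are sent. The helper name `iter_batches` is hypothetical and is not part of scrapy-monkeylearn; the pipeline's own batching code may differ.

```python
from itertools import islice


def iter_batches(items, batch_size=200):
    """Yield lists of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(items)
    while True:
        # Take the next batch_size items; the last batch may be smaller.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 450 items with the default batch size give batches of 200, 200 and 50.
sizes = [len(batch) for batch in iter_batches(range(450))]
```

A larger batch size presumably means fewer requests per crawl, at the cost of holding more items before each request is made.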

MONKEYLEARN_MODULE

The ID of the MonkeyLearn module.

Example:

MONKEYLEARN_MODULE = 'cl_oFKL5wft'

MONKEYLEARN_USE_SANDBOX

Whether to use the sandbox version of the module. Applies only when the module is a classifier.

Default: False

Example:

MONKEYLEARN_USE_SANDBOX = True

MONKEYLEARN_TOKEN

The MonkeyLearn API authentication token.

Example:

MONKEYLEARN_TOKEN = 'TWFuIGlzIGRp...'

MONKEYLEARN_FIELD_TO_PROCESS

The Item text field, or list of fields, to use for classification. A comma-separated string of field names is also supported.

Examples:

MONKEYLEARN_FIELD_TO_PROCESS = 'title'

MONKEYLEARN_FIELD_TO_PROCESS = ['title', 'description']

MONKEYLEARN_FIELD_TO_PROCESS = 'title,description'
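
The three accepted forms can be reduced to a single list of field names. The following is a minimal sketch of that normalization; `normalize_fields` is a hypothetical name used for illustration, not a function exported by scrapy-monkeylearn.

```python
def normalize_fields(value):
    """Return a list of field names from any of the three accepted forms."""
    if isinstance(value, str):
        # Covers both 'title' and 'title,description'; stray spaces are dropped.
        return [name.strip() for name in value.split(",") if name.strip()]
    # Any other iterable (list, tuple) is copied into a new list.
    return list(value)


single = normalize_fields("title")
as_list = normalize_fields(["title", "description"])
as_string = normalize_fields("title,description")
```

Under this sketch, the list form and the comma-separated form yield the same result, so either can be used interchangeably in settings.py.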

MONKEYLEARN_FIELD_OUTPUT

The field where the MonkeyLearn output will be stored.

Example:

MONKEYLEARN_FIELD_OUTPUT = 'categories'

After classification, the field named by MONKEYLEARN_FIELD_OUTPUT holds a value such as:

[{'label': 'English', 'probability': 0.321}]
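
A later pipeline stage or exporter could reduce this value to a single label. The sketch below assumes the output is a list of dicts with `label` and `probability` keys, as in the example value; the `top_label` helper and the plain dict standing in for a scraped item are illustrative only.

```python
def top_label(categories):
    """Return the label with the highest probability, or None if empty."""
    if not categories:
        return None
    best = max(categories, key=lambda entry: entry["probability"])
    return best["label"]


# A plain dict stands in for a scraped item here.
item = {"categories": [{"label": "English", "probability": 0.321}]}
label = top_label(item["categories"])
print(label)  # English
```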

Usage

In your settings.py file, add the settings described above and add MonkeyLearnPipeline to ITEM_PIPELINES, e.g.:

ITEM_PIPELINES = {
    'scrapy_monkeylearn.pipelines.MonkeyLearnPipeline': 100,
}

License

Released under the MIT license.
