scrapy-plugins/scrapy-monkeylearn

scrapy-monkeylearn

A Scrapy pipeline to categorize items using MonkeyLearn.

Settings

MONKEYLEARN_BATCH_SIZE

The size of the item batches sent to MonkeyLearn.

Default: 200

Example:

MONKEYLEARN_BATCH_SIZE = 200
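The following is a minimal sketch of what this setting controls. It assumes that items are buffered and sent to MonkeyLearn in groups of at most MONKEYLEARN_BATCH_SIZE. The `batched` helper is hypothetical and is not taken from the pipeline's source.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


# Hypothetical helper: groups items into batches of at most batch_size,
# mirroring the assumed role of MONKEYLEARN_BATCH_SIZE. Not part of
# scrapy-monkeylearn.
def batched(items: Iterable[T], batch_size: int = 200) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # The final batch may be smaller than batch_size.
        yield batch


sizes = [len(b) for b in batched(range(450), batch_size=200)]
print(sizes)  # [200, 200, 50]
```

Under this assumption, 450 items with the default size would take three requests. A larger batch size means fewer requests, but each request carries more text.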

MONKEYLEARN_MODULE

The ID of the MonkeyLearn module to use.

Example:

MONKEYLEARN_MODULE = 'cl_oFKL5wft'

MONKEYLEARN_USE_SANDBOX

When the module is a classifier, whether its sandbox version should be used.

Default: False

Example:

MONKEYLEARN_USE_SANDBOX = True

MONKEYLEARN_TOKEN

Your MonkeyLearn API authentication token.

Example:

MONKEYLEARN_TOKEN = 'TWFuIGlzIGRp...'

MONKEYLEARN_FIELD_TO_PROCESS

The item text field, or list of item text fields, to use for classification. A comma-separated string of field names is also supported.

Example:

MONKEYLEARN_FIELD_TO_PROCESS = 'title'  # a single field
MONKEYLEARN_FIELD_TO_PROCESS = ['title', 'description']  # a list of fields
MONKEYLEARN_FIELD_TO_PROCESS = 'title,description'  # a comma-separated string
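The sketch below shows one way the three accepted forms could be reduced to the same list of field names. It assumes that the text of the selected fields is joined with a space before classification. Both `normalize_fields` and `item_text` are hypothetical helpers, not the plugin's own code.

```python
from typing import Iterable, List, Mapping, Union

FieldSetting = Union[str, Iterable[str]]


# Hypothetical helper: turns any accepted form of MONKEYLEARN_FIELD_TO_PROCESS
# into a list of field names.
def normalize_fields(setting: FieldSetting) -> List[str]:
    if isinstance(setting, str):
        # Handles both 'title' and 'title,description' (whitespace is stripped).
        return [name.strip() for name in setting.split(",") if name.strip()]
    return list(setting)


# Hypothetical helper: joins the selected fields of one item into a single
# string, skipping missing or empty fields. The space separator is an assumption.
def item_text(item: Mapping[str, str], fields: List[str]) -> str:
    return " ".join(str(item[f]) for f in fields if item.get(f))


item = {"title": "Scrapy 2.0 released", "description": "Async support lands"}
for setting in ("title", ["title", "description"], "title,description"):
    fields = normalize_fields(setting)
    print(fields, "->", item_text(item, fields))
```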

MONKEYLEARN_FIELD_OUTPUT

The field where the MonkeyLearn output will be stored.

Example:

MONKEYLEARN_FIELD_OUTPUT = 'categories'

After classification, the field named by MONKEYLEARN_FIELD_OUTPUT holds a value such as:

[{'label': 'English', 'probability': 0.321}]
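Downstream code, such as a later pipeline, can then read that field. The following is a minimal sketch. It assumes that the output field is named `categories` (as in the example above) and holds a list of dicts with `label` and `probability` keys. The `top_label` helper is hypothetical.

```python
from typing import List, Mapping, Optional


# Hypothetical helper: returns the label with the highest probability from a
# value shaped like the MONKEYLEARN_FIELD_OUTPUT example, or None if empty.
def top_label(categories: List[Mapping[str, object]]) -> Optional[str]:
    if not categories:
        return None
    best = max(categories, key=lambda c: float(c["probability"]))
    return str(best["label"])


item = {
    "title": "Hello world",
    "categories": [{"label": "English", "probability": 0.321}],
}
print(top_label(item["categories"]))  # English
```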

Usage

In your settings.py file, add the settings described above and enable MonkeyLearnPipeline in ITEM_PIPELINES, e.g.:

ITEM_PIPELINES = {
    'scrapy_monkeylearn.pipelines.MonkeyLearnPipeline': 100,
}
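Combining all of the settings, a complete settings.py fragment might look like the one below. It is a sketch that reuses the example values above. Reading the token from an environment variable is an assumption, not a requirement of the plugin, and the variable name is hypothetical.

```python
# settings.py (sketch): reuses the README's example values; replace the module
# ID and token with your own.
import os

ITEM_PIPELINES = {
    'scrapy_monkeylearn.pipelines.MonkeyLearnPipeline': 100,
}

MONKEYLEARN_MODULE = 'cl_oFKL5wft'
# Assumption: the token comes from a (hypothetically named) environment
# variable so it is not committed with the project.
MONKEYLEARN_TOKEN = os.environ.get('MONKEYLEARN_TOKEN', '')
MONKEYLEARN_FIELD_TO_PROCESS = ['title', 'description']
MONKEYLEARN_FIELD_OUTPUT = 'categories'
MONKEYLEARN_BATCH_SIZE = 200
MONKEYLEARN_USE_SANDBOX = False
```

The value 100 sets the pipeline's order. Scrapy runs pipelines with lower values first, so any pipeline that reads the categories field should be given a higher number.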

License

Copyright (c) 2015 MonkeyLearn.

Released under the MIT license.
