Callbacks for LDAMultiCore #3481

maciejskorski · 2023-06-20T22:17:48Z

This PR upgrades the multi-core implementation of LDA to use callbacks 💪.

Callbacks are critical for model evaluation in general, and have been requested in the past for Gensim's models in particular 🙏.

A usage example on the News20 dataset:

from gensim.models import LdaMulticore
from gensim.models.callbacks import CoherenceMetric

# mm_corpus, dictionary and docs_tokenized are assumed to be prepared beforehand (see the notebook)
# one callback per coherence measure; `title` becomes the key under which values are stored in lda.metrics
callback1 = CoherenceMetric(corpus=mm_corpus, dictionary=dictionary, coherence='u_mass', title='u_mass')
callback2 = CoherenceMetric(corpus=mm_corpus, texts=docs_tokenized, dictionary=dictionary, coherence='c_v', title='c_v')
lda = LdaMulticore(mm_corpus, id2word=dictionary, num_topics=20, passes=20, batch=False, callbacks=[callback1, callback2])

# evaluation: plot the per-epoch metric values collected by the callbacks

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# one row per epoch, one column per callback title
metrics = pd.DataFrame(lda.metrics)
metrics.reset_index(names=['epoch'], inplace=True)
metrics['epoch'] = metrics['epoch'] + 1  # epochs are 1-based in the plot

# the two coherence measures live on different scales, so use twin y-axes
fig, ax1 = plt.subplots()
ln1 = ax1.plot(metrics['epoch'], metrics['u_mass'], label='$U_{mass}$', color='tab:red')
ax1.set_xlabel('epoch')
ax1.set_ylabel('$U_{mass}$')
ax2 = ax1.twinx()
ln2 = ax2.plot(metrics['epoch'], metrics['c_v'], label='$C_v$', color='tab:blue')
ax2.set_ylabel('$C_v$')
lines = ln1 + ln2
labels = [line.get_label() for line in lines]
ax2.legend(lines, labels, loc=0)
plt.show()

This illustrates the point of using callbacks: the per-epoch metric curves show how many epochs are sufficient for the model to converge 🆒
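Reading the convergence point off the plot can also be done programmatically. The following is only a minimal sketch, not part of this PR: the helper name `first_converged_epoch`, the tolerance `tol`, and the `patience` window are all hypothetical choices, and the input values are made-up toy numbers rather than results from News20. It assumes the per-epoch values of one metric are available as a plain list, for example `lda.metrics['u_mass']`.

```python
def first_converged_epoch(values, tol=1e-2, patience=2):
    """Return the 1-based epoch where the metric starts to plateau, or None.

    A plateau means `patience` consecutive epoch-to-epoch changes
    that are each no larger than `tol` in absolute value.
    (Hypothetical helper; tol and patience are illustrative defaults.)
    """
    stable = 0  # number of consecutive small changes seen so far
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) <= tol:
            stable += 1
            if stable >= patience:
                # the plateau began `patience` steps back; convert to 1-based epoch
                return i - patience + 1
        else:
            stable = 0  # a large change resets the streak
    return None


# toy values for illustration only, not measured results
toy_u_mass = [-3.0, -2.0, -1.5, -1.495, -1.493, -1.492]
converged_at = first_converged_epoch(toy_u_mass)
```

With the toy list above, the changes after the third epoch all stay below the tolerance, so the helper reports epoch 3. Suitable values for `tol` and `patience` depend on the metric's scale, so they would need tuning per coherence measure.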

Also, the docstring has been made more accurate:

        callbacks : list of :class:`~gensim.models.callbacks.Callback`
            Metric callbacks to log evaluation metrics of the model at every training epoch.

For a full example see this Kaggle notebook.

DISCLAIMER: this is a byproduct of the implementation for the purpose of a research paper.

mpenkov · 2024-04-08T03:26:04Z

@maciejskorski Looks like some tests in your PR are failing. Are you able to fix them?

piskvorky · 2024-06-11T12:16:18Z

Closing as stale. Are you still using LDA in 2024? What is your use-case here? Thanks.

cleanup

afa2a05

mpenkov added this to the Spring 2024 release milestone Apr 8, 2024

mpenkov modified the milestone: Summer 2024 release Jun 11, 2024

mpenkov added the stale label (Waiting for author to complete contribution, no recent effort) Jun 11, 2024

fix flake8 issues

a817bb5

piskvorky closed this Jun 11, 2024
