Parameter shardsize ignored on queries #3407

Open
MaickelHubner opened this issue Nov 30, 2022 · 0 comments

Problem description

When I pass the shardsize parameter to the similarities.Similarity constructor, the same shard size does not appear to be used when the index is queried, which causes errors:

self._similarity_index = similarities.Similarity(MODELS_PATH + f'/{model}', sim_vectors, num_features=len(self._dictionary), shardsize=50000)

sims = self._similarity_index[doc_vector]

[image: screenshot of the error]

PS: If I omit the shardsize parameter, the error occurs earlier, in the similarities.Similarity call itself.
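To make the expected behaviour concrete, here is a minimal pure-Python sketch of a sharded cosine-similarity index: the shard size only bounds how many documents are stored per shard at build time, and a query visits every shard and concatenates the results. This is not gensim's implementation. `ShardedIndex`, `_normalize`, and the `shards` attribute are hypothetical names introduced for illustration, and dense list vectors are assumed.

```python
import math
from typing import List, Sequence


def _normalize(vec: Sequence[float]) -> List[float]:
    """Return an L2-normalized copy of vec (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class ShardedIndex:
    """Toy stand-in for a sharded similarity index (hypothetical, not gensim's code)."""

    def __init__(self, vectors: Sequence[Sequence[float]], num_features: int, shardsize: int):
        if shardsize <= 0:
            raise ValueError("shardsize must be positive")
        self.num_features = num_features
        self.shards: List[List[List[float]]] = []
        for vec in vectors:
            if len(vec) != num_features:
                raise ValueError("vector length does not match num_features")
            # Start a new shard once the current one holds shardsize documents.
            if not self.shards or len(self.shards[-1]) >= shardsize:
                self.shards.append([])
            self.shards[-1].append(_normalize(vec))

    def __getitem__(self, query: Sequence[float]) -> List[float]:
        if len(query) != self.num_features:
            raise ValueError("query length does not match num_features")
        q = _normalize(query)
        sims: List[float] = []
        # The query walks the shards one at a time, so only one shard
        # needs to be held in working memory per step.
        for shard in self.shards:
            sims.extend(sum(a * b for a, b in zip(doc, q)) for doc in shard)
        return sims


# Three documents with shardsize=2 give two shards; the query still
# returns one similarity per document.
index = ShardedIndex([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], num_features=2, shardsize=2)
sims = index[[1.0, 0.0]]
```

In this sketch the query result has the same length regardless of the shard size chosen at build time, which is the behaviour I expected from similarities.Similarity.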

Steps/code/corpus to reproduce

Save the .py files in the pruvo folder (a package) and the .parquet file in the data folder, then run this script:

import pandas as pd

from pruvo.embedding import Corpus

df = pd.read_parquet('data/preprocess.parquet')

corpus = Corpus()
corpus.add(list(df['bookingRoomType'].unique()), pre_processed=True)
corpus.add(list(df['mappedRoomType'].unique()), pre_processed=True)

w2v = corpus.train(model='word2vec')

w2v_similars = corpus.get_similars('apartment 1 king bed in neverland')
w2v_similars.head(10)

Versions

Please provide the output of:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import struct; print("Bits", 8 * struct.calcsize("P"))
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import gensim; print("gensim", gensim.__version__)
from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)

[image: screenshot of the version output]

files.zip
