Integrate VectorStore from Elasticsearch client #13291

maxjakob · 2024-05-06T14:55:09Z

Description

We recently added a VectorStore abstraction to the Elasticsearch client library (elastic/elasticsearch-py#2528, see module) in order to centralize the development and ensure all GenAI library integrations work the same.

This PR integrates the new module into LlamaIndex. The LlamaIndex class keeps its existing interface; it is only extended with the option to specify more retrieval strategies.

Type of Change

New feature
- non-breaking change in default usage of the vector store
- breaking change when using query modes that do not match the retrieval mode specified at init time
- I would suggest making a breaking release with version 0.2.0
This change requires a documentation update
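To illustrate the breaking case above, here is a minimal sketch of a check that rejects a query mode that contradicts the retrieval strategy chosen at init time. The enum and strategy classes are hypothetical stand-ins written for this example; they are not the actual LlamaIndex or elasticsearch-py types, and the real check in this PR may differ.

```python
from dataclasses import dataclass
from enum import Enum


class QueryMode(str, Enum):
    """Hypothetical stand-in for the vector store query mode."""

    DEFAULT = "default"
    TEXT_SEARCH = "text_search"
    SPARSE = "sparse"
    HYBRID = "hybrid"


@dataclass
class DenseStrategy:
    """Hypothetical dense-vector strategy; hybrid=True adds text matching."""

    hybrid: bool = False


@dataclass
class SparseStrategy:
    """Hypothetical sparse-vector strategy backed by a deployed model."""

    model_id: str = ".elser_model_2"


@dataclass
class BM25Strategy:
    """Hypothetical pure text-search strategy."""


def mode_must_match_strategy(mode: QueryMode, strategy: object) -> None:
    """Raise ValueError if `mode` contradicts the init-time `strategy`."""
    if mode == QueryMode.DEFAULT:
        return  # the default mode defers to the configured strategy

    compatible = {
        QueryMode.TEXT_SEARCH: isinstance(strategy, BM25Strategy),
        QueryMode.SPARSE: isinstance(strategy, SparseStrategy),
        QueryMode.HYBRID: isinstance(strategy, DenseStrategy) and strategy.hybrid,
    }
    if not compatible[mode]:
        raise ValueError(
            f"query mode {mode.value!r} does not match "
            f"retrieval strategy {type(strategy).__name__}"
        )
```

Under these assumptions, default-mode queries keep working with any strategy, while a call such as `mode_must_match_strategy(QueryMode.SPARSE, DenseStrategy())` raises a `ValueError`.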

How Has This Been Tested?

Adapted existing unit and integration tests. Note that the abstract library already tests its own functionality with respect to the different retrieval strategies.
I stared at the code and made sure it makes sense

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
~~I have added Google Colab support for the newly added notebooks.~~
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran make format; make lint to appease the lint gods

review-notebook-app · 2024-05-13T13:14:02Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

logan-markewich

This looks good to me. I tried running ElasticsearchIndexDemo though and got this error:

NotFoundError: NotFoundError(404, 'resource_not_found_exception', 'Could not find trained model [.elser_model_2]')

Did I miss a step?

maxjakob · 2024-05-14T12:17:32Z

@logan-markewich The ELSER model needs to be deployed first. I added a comment to this section.
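For readers hitting the same 404, the following is a rough sketch of how the model could be registered and deployed from Python. It assumes `client` is an `elasticsearch.Elasticsearch` instance connected to a cluster that is licensed for ELSER; the helper name `deploy_elser` is made up for this example, and the notebook may describe a different route (for instance, deploying through Kibana).

```python
def deploy_elser(client, model_id: str = ".elser_model_2") -> None:
    """Register the ELSER model and start a deployment for it.

    `client` is assumed to be an `elasticsearch.Elasticsearch` instance.
    """
    # Registering the built-in model makes the cluster download it.
    client.ml.put_trained_model(
        model_id=model_id,
        input={"field_names": ["text_field"]},
    )
    # Starting a deployment makes the model available for inference.
    # On a real cluster the download has to finish first, so this call
    # may need to be retried until the model is fully defined.
    client.ml.start_trained_model_deployment(
        model_id=model_id,
        wait_for="fully_allocated",
    )
```

The function only needs an object that exposes the `ml` namespace, so it can be exercised against a stub before pointing it at a real cluster.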

logan-markewich · 2024-05-14T17:28:47Z

...ores/llama-index-vector-stores-elasticsearch/llama_index/vector_stores/elasticsearch/base.py

-        query_embedding = cast(List[float], query.query_embedding)
-
-        es_query = {}
+        _mode_must_match_retrieval_strategy(query.mode, self.retrieval_strategy)

        if query.filters is not None and len(query.filters.legacy_filters()) > 0:


FYI, this is why fewer filters are supported: the Elasticsearch integration only uses the legacy exact-match filters. If updated, it could support many more filter types (less than, greater than, contains, etc.) and operators (AND, OR).
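As a rough illustration of the difference, the sketch below builds Elasticsearch query DSL fragments for both cases. The function names, the tuple format for conditions, and the `metadata.<key>` field layout are assumptions made for this example; only the `term`, `range`, and `bool` clauses are standard query DSL.

```python
from typing import Any, Dict, List, Tuple

# Comparison symbols mapped to Elasticsearch range keywords.
_RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


def exact_match_clauses(filters: Dict[str, Any]) -> List[dict]:
    """Legacy-style filters: one exact-match `term` clause per key."""
    # Assumes string metadata is indexed with a `.keyword` sub-field.
    return [
        {"term": {f"metadata.{key}.keyword": value}}
        for key, value in filters.items()
    ]


def rich_filter(
    conditions: List[Tuple[str, str, Any]], operator: str = "and"
) -> dict:
    """Richer filters: comparisons combined with AND or OR."""
    clauses: List[dict] = []
    for key, op, value in conditions:
        if op == "==":
            clauses.append({"term": {f"metadata.{key}": value}})
        elif op in _RANGE_OPS:
            clauses.append(
                {"range": {f"metadata.{key}": {_RANGE_OPS[op]: value}}}
            )
        else:
            raise ValueError(f"unsupported operator: {op!r}")

    if operator == "and":
        # Every clause must match; `filter` context skips scoring.
        return {"bool": {"filter": clauses}}
    if operator == "or":
        # At least one clause must match.
        return {"bool": {"should": clauses, "minimum_should_match": 1}}
    raise ValueError(f"unsupported combinator: {operator!r}")
```

For example, `rich_filter([("year", ">=", 2020), ("lang", "==", "en")])` yields a `bool` query whose `filter` list holds one `range` clause and one `term` clause.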

logan-markewich · 2024-05-14T21:37:01Z

Seems like aiohttp needs to be an explicit dependency -- adding that now
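For context, the async Elasticsearch client relies on aiohttp as its HTTP transport, so an environment without it fails when the async client is first used. A small sketch of an early check follows; the helper names are made up for this example and are not part of the package.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def require_async_transport() -> None:
    """Fail early with a clear message when aiohttp is not installed."""
    if not has_module("aiohttp"):
        raise ImportError(
            "aiohttp is required for the async Elasticsearch client; "
            "install it with `pip install aiohttp`"
        )
```

Declaring the dependency explicitly in the package metadata, as this thread settles on, avoids needing such a runtime guard.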

maxjakob · 2024-05-15T10:06:43Z

How odd, CI ran successfully before, then you changed some text in a notebook and from then on it's failing. Let me trigger CI again (without explicit aiohttp installation).

This reverts commit 4268874.

maxjakob · 2024-05-15T14:55:36Z

Rebased on main.

maxjakob · 2024-05-16T07:51:18Z

Thank you so much for pushing this over the finish line @logan-markewich!

maxjakob force-pushed the es-use-orchestration-lib branch 2 times, most recently from 260132f to 6ca21d0 Compare May 13, 2024 13:13

maxjakob force-pushed the es-use-orchestration-lib branch 4 times, most recently from 85d1b09 to 99b61bd Compare May 13, 2024 13:41

maxjakob marked this pull request as ready for review May 13, 2024 13:51

dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 13, 2024

logan-markewich reviewed May 13, 2024

maxjakob requested a review from logan-markewich May 14, 2024 12:35

logan-markewich reviewed May 14, 2024

logan-markewich approved these changes May 14, 2024

dosubot bot added the lgtm This PR has been approved by a maintainer label May 14, 2024

logan-markewich enabled auto-merge (squash) May 14, 2024 17:32

auto-merge was automatically disabled May 15, 2024 10:06
Head branch was pushed to by a user without write access

maxjakob and others added 9 commits May 15, 2024 16:55

Integrate VectorStore from Elasticsearch client

1ebad1e

update Elasticsearch Vector Store doc notebook

76cd88e

update Elasticsearch doc notebook with link

9c95558

vbump

4bd9cdc

add comments

d94a45b

nit: wording

5747632

add aiohttp

ecc046b

Revert "add aiohttp"

c903bad

This reverts commit 4268874.

add aiohttp (again)

d2be0c7

maxjakob force-pushed the es-use-orchestration-lib branch from fcf88ae to d2be0c7 Compare May 15, 2024 14:55

maxjakob and others added 2 commits May 15, 2024 17:36

revert Makefile change

e4e5679

force dependency in pants

678a6f9

logan-markewich merged commit 07a0c66 into run-llama:main May 15, 2024
8 checks passed
