# [VLDB'24 UNDER REVIEW] Query Hardness Measurement and Unbiased Workload Generation for Graph-Based ANN Index Evaluation
This repo provides methods to measure the hardness of queries on graph-based ANN indexes (e.g., HNSW, NSG, KGraph). This hardness measure is what we proposed as Steiner-hardness.
Graph-based indexes have been widely employed to accelerate approximate similarity search of high-dimensional vectors in various research and industrial fields. However, we observe that evaluations of graph indexes do not pay attention to the query workload distribution, leading to results that are over-optimistic, due to a bias for simple queries. In such cases, even though the average query performance is good, users may suffer from an inconsistent result quality, that is, high-precision results for simple queries, but rather low-precision results for hard queries. To provide an objective and comprehensive evaluation of graph indexes, in this paper, we propose a new approach for building unbiased workloads consisting of queries with different hardness. In order to measure the hardness of queries, we first propose a theoretical framework to estimate the query answering effort in a given graph index. A novel query hardness measure, Steiner-hardness, is then defined by calculating the proposed query effort on a representative MRNG (Monotonic Relative Neighborhood Graph) graph structure. Extensive experiments verify that the proposed query effort estimations accurately profile the real query effort. High correlations between Steiner-hardness and real effort across five graph indexes and six datasets demonstrate its effectiveness as a hardness measure. We also evaluate advanced graph indexes with new unbiased workloads. The new evaluation results can help users not only better understand the performance of graph indexes, but also obtain insights useful for the further development of graph-based indexing methods.
- Eigen == 3.4.0
  - Download the Eigen library from https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.gz.
  - Unzip it and move the `Eigen` folder to `./src/`.
- OpenMP
There are multiple entry points in this project. You can build them directly with g++ 9 or higher; build scripts are provided in the `script/` folder.
We've prepared several unbiased workloads for common public ANN datasets. They can be directly downloaded from the `workloads/` folder.
We provide the code to build NSW, HNSW, MRNG, and SSG indexes. Use `script/index_nsw.sh`, `script/index_hnsw.sh`, and `script/index_mrng.sh` to build them.
We use `python-script/ground_truth.py` to compute the exact kNN ground truth of the queries. Other EXACT kNN methods are also OK.
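For small datasets, the exact kNN ground truth can also be computed by a brute-force scan. A minimal numpy sketch (array layout and names are illustrative, not the repo's file format):

```python
import numpy as np

def exact_knn(base: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
    """Brute-force exact kNN: for each query, return the indices of the
    k nearest base vectors (Euclidean distance), sorted by distance."""
    # Squared distances via the expansion ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2
    d2 = (
        (queries ** 2).sum(1)[:, None]
        - 2.0 * queries @ base.T
        + (base ** 2).sum(1)[None, :]
    )
    # argpartition selects the k smallest; then sort those k by distance
    part = np.argpartition(d2, k - 1, axis=1)[:, :k]
    order = np.take_along_axis(d2, part, axis=1).argsort(axis=1)
    return np.take_along_axis(part, order, axis=1)
```

This is O(n·m·d) and only practical for verification on small data; for the full datasets, use the provided script.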
To get the reversed graph, use `python-script/get_rev_graph.py`.
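Reversing a graph just flips every edge of the adjacency list; conceptually (the on-disk graph format used by the repo is not assumed here):

```python
def reverse_graph(adj: list[list[int]]) -> list[list[int]]:
    """Given an adjacency list adj[u] = out-neighbors of u, return the
    reversed graph: rev[v] contains u if and only if adj[u] contains v."""
    rev = [[] for _ in adj]
    for u, neighbors in enumerate(adj):
        for v in neighbors:
            rev[v].append(u)
    return rev
```

For example, `reverse_graph([[1, 2], [2], []])` returns `[[], [0], [0, 1]]`.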
You can then use `script/get_me.sh` to compute the ME (minimum effort) of the queries.
If you want to compare the ME with the actual query effort (or reproduce the results of Figures 8 and 9 in the paper), you need to obtain the actual effort of queries at a given recall: run `script/search_hnsw.sh`, `script/search_mrng.sh`, etc. with `purpose=1`.
Use the python script `python-script/minimum_effor_greedy_test.py` to plot the figure and get the correlation coefficient.
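The agreement between estimated ME and measured effort can be summarized with Pearson and Spearman coefficients; a generic sketch (not necessarily the exact metric the script reports):

```python
import numpy as np

def correlations(estimated_me: np.ndarray, actual_effort: np.ndarray):
    """Return (pearson, spearman) correlation in [-1, 1].
    Spearman is computed as Pearson on the ranks; ties are broken
    arbitrarily, which is fine for a sketch on continuous values."""
    pearson = np.corrcoef(estimated_me, actual_effort)[0, 1]
    ranks_a = estimated_me.argsort().argsort()
    ranks_b = actual_effort.argsort().argsort()
    spearman = np.corrcoef(ranks_a, ranks_b)[0, 1]
    return pearson, spearman
```

A high Spearman coefficient means the hardness measure orders queries the same way the real effort does, even when the relationship is non-linear.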
Our Steiner-hardness is defined by computing the ME on a representative MRNG graph:
Use `script/index_mrng.sh` to build the MRNG index, like in 1.1.
Use `script/get_me.sh` to compute the ME of queries on the MRNG index, like in 1.2. These ME values are the Steiner-hardness of the queries.
If you want to compare the Steiner-hardness with the actual query effort, use `python-script/shuffle_dataset2.py` to get the shuffled datasets (i.e., different insertion orders), and build indexes on these datasets.
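Shuffling permutes the insertion order of the base vectors, which changes the resulting graph. A sketch of what such a shuffle does (the repo's script also has to remap ground-truth ids, handled here via the inverse permutation):

```python
import numpy as np

def shuffle_dataset(base: np.ndarray, seed: int):
    """Permute the base vectors with a fixed seed. Returns the shuffled
    data, the permutation, and its inverse, so an old ground-truth id i
    maps to the new id inverse_perm[i]."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(base))
    inverse_perm = np.empty_like(perm)
    inverse_perm[perm] = np.arange(len(base))
    return base[perm], perm, inverse_perm
```

Building the same index type on several such shuffles gives several graphs whose per-query efforts can then be averaged.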
The same as 1.3: use `script/search_hnsw.sh`, `script/search_mrng.sh`, etc. with `purpose=1` to compute the actual effort of queries.
Take the average of the actual effort of queries on different insertion orders of the same graph index. We view this average value as the ground truth of the hardness of queries on this index.
2.4 (optional) Plot the density-scatter figure of Steiner-hardness and average actual query effort
Use the python script `python-script/hardness_test.py` to plot the figure.
Use `python-script/augment_GMM.py` to generate new data with a Gaussian Mixture Model.
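Conceptually, the augmentation fits a Gaussian Mixture Model to existing queries and samples new ones from it. A minimal sampling sketch with hand-set mixture parameters (the actual script fits the model to the data; the parameter names here are illustrative):

```python
import numpy as np

def sample_gmm(weights, means, covs, n, seed=0):
    """Draw n vectors from a Gaussian Mixture Model: pick a component
    per sample according to its weight, then draw from that component's
    multivariate normal distribution."""
    rng = np.random.default_rng(seed)
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.stack([
        rng.multivariate_normal(means[c], covs[c]) for c in comps
    ])
```

Sampling from a fitted mixture produces synthetic queries that follow the overall query distribution while filling in hardness levels that are rare among the original queries.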
Follow the steps in 2. to compute the hardness of the new queries.
Use `python-script/build_workloads.py` to build workloads.
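The idea of an unbiased workload is to draw queries evenly across hardness levels rather than at random. A simplified stratified-sampling sketch (the bucketing scheme is an assumption for illustration, not the paper's exact algorithm):

```python
import numpy as np

def build_unbiased_workload(hardness, n_buckets, per_bucket, seed=0):
    """Split queries into equal-width hardness buckets and sample the
    same number of query indices from each non-empty bucket."""
    rng = np.random.default_rng(seed)
    hardness = np.asarray(hardness)
    edges = np.linspace(hardness.min(), hardness.max(), n_buckets + 1)
    # digitize against the inner edges yields bucket ids 0..n_buckets-1
    bucket = np.digitize(hardness, edges[1:-1])
    picked = []
    for b in range(n_buckets):
        in_bucket = np.flatnonzero(bucket == b)
        if len(in_bucket):
            take = min(per_bucket, len(in_bucket))
            picked.append(rng.choice(in_bucket, size=take, replace=False))
    return np.concatenate(picked)
```

Compared with a random sample (which mirrors the skew of the candidate pool), this keeps hard queries from being drowned out by easy ones.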
Use `python-script/plot_box_plot_workload_hardness.py` to plot the distribution of the hardness of the new queries.
Use `script/search_hnsw.sh`, `script/search_mrng.sh`, etc. with `purpose=2` to evaluate indexes with the new unbiased workloads.
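Evaluation on a workload is typically reported as recall@k against the exact ground truth; a generic sketch of that metric (the scripts' actual output format is not assumed):

```python
import numpy as np

def recall_at_k(result_ids: np.ndarray, gt_ids: np.ndarray) -> float:
    """Average fraction of the exact k nearest neighbors recovered per
    query. Both arrays have shape (n_queries, k)."""
    hits = [
        len(set(res) & set(gt)) for res, gt in zip(result_ids, gt_ids)
    ]
    return float(np.mean(hits)) / result_ids.shape[1]
```

On an unbiased workload, it is informative to report this per hardness bucket as well as on average, since the average alone can hide poor recall on the hard queries.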