
pypi version throws ValueError #607

Open
FinnHuelsbusch opened this issue Aug 1, 2023 · 27 comments

Comments

@FinnHuelsbusch

FinnHuelsbusch commented Aug 1, 2023

To reproduce the bug:

  1. Create a new python 3.11.x environment (tested with python 3.11.4)
  2. Install the following dependencies:
  • scipy 1.11.1
  • scikit-learn 1.3.0
  • cython 0.29.36
  • hdbscan 0.8.33
  3. Create a minimal example:
from sklearn.datasets import make_blobs
import hdbscan
blobs, labels = make_blobs(n_samples=2000, n_features=10)
clusterer = hdbscan.HDBSCAN()
clusterer.fit(blobs)
print(clusterer.labels_)
  4. Execute it and get the following error:
Traceback (most recent call last):
File "/home/***/Desktop/hdbscan_test.py", line 5, in <module>
    clusterer.fit(blobs)
  File "/home/***/micromamba/envs/hdbscan3/lib/python3.11/site-packages/hdbscan/hdbscan_.py", line 1205, in fit
    ) = hdbscan(clean_data, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/***/micromamba/envs/hdbscan3/lib/python3.11/site-packages/hdbscan/hdbscan_.py", line 884, in hdbscan
    _tree_to_labels(
  File "/home/***/micromamba/envs/hdbscan3/lib/python3.11/site-packages/hdbscan/hdbscan_.py", line 80, in _tree_to_labels
    labels, probabilities, stabilities = get_clusters(
                                         ^^^^^^^^^^^^^
  File "hdbscan/_hdbscan_tree.pyx", line 659, in hdbscan._hdbscan_tree.get_clusters
  File "hdbscan/_hdbscan_tree.pyx", line 733, in hdbscan._hdbscan_tree.get_clusters
TypeError: 'numpy.float64' object cannot be interpreted as an integer
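The failing behaviour can be reproduced in isolation, independent of hdbscan: passing a numpy.float64 where Python expects an integer raises exactly this TypeError, because np.float64 (like Python's float) defines no __index__. A minimal sketch, not hdbscan's actual code path:

```python
import numpy as np

# A cluster key that has been promoted to float, as in the stability dict.
key = np.float64(384.0)

try:
    # Any int-expecting call (range, indexing, etc.) rejects np.float64.
    range(key)
except TypeError as e:
    print(e)  # 'numpy.float64' object cannot be interpreted as an integer
```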

Workaround:

  1. Clone the repo
  2. Uninstall hdbscan from the environment
  3. Execute python setup.py install while the environment is active
  4. Execute the minimal example again.
  5. It works.

This was also tested with commit 813636b (the commit for version 0.8.33).

It would be nice to get instructions on how to fix this (if the error is on my side), or a general fix.

Tested on Windows and Linux. This error only occurs under python 3.11.x.

@FinnHuelsbusch
Author

The error message seems similar to an error mentioned in the comments of #600 and its fix in #602, though both concern the condense_tree function.

@empowerVictor

I have the same error with both 0.8.29 and 0.8.33.

@LoveFishoO

Absolutely, my version of Python is also 3.11.x. I have the same error, but after trying this method I get another error: ModuleNotFoundError: No module named 'hdbscan._hdbscan_linkage'.

Replacing python setup.py install with python setup.py develop solved this problem for me.

@FinnHuelsbusch
Author

Maybe #606 helps with this error.

@jkmackie

jkmackie commented Aug 10, 2023

I also replicated the bug on Windows. Packages were installed from PyPI, in a base virtual environment created with miniconda.

Bug occurs:

  • Python 3.11.x
  • scikit-learn 1.3.0
  • hdbscan 0.8.33
  • numpy 1.24.4
from sklearn.datasets import make_blobs
import hdbscan
blobs, labels = make_blobs(n_samples=2000, n_features=10)
clusterer = hdbscan.HDBSCAN()
clusterer.fit(blobs)
print(clusterer.labels_)

Error:

File hdbscan\\_hdbscan_tree.pyx:733, in hdbscan._hdbscan_tree.get_clusters()

TypeError: 'numpy.float64' object cannot be interpreted as an integer

Avoid the bug by switching to slower Python 3.10.x and downgrading scikit-learn. Keep the hdbscan and numpy versions.

No errors:

  • Python 3.10.x
  • scikit-learn 1.2.1
  • hdbscan 0.8.33
  • numpy 1.24.4

Revised 15 August, 2023

@RichieHakim

I am also getting this error on windows builds. This seems like a pretty urgent issue. @lmcinnes or @gclendenning, forgive the @, but you may want to take a look at this.

@johnlees

So this line:
https://github.com/scikit-learn-contrib/hdbscan/blob/master/hdbscan/_hdbscan_tree.pyx#L733

is_cluster = {cluster: True for cluster in node_list}

node_list is constructed above:

    if allow_single_cluster:
        node_list = sorted(stability.keys(), reverse=True)
    else:
        node_list = sorted(stability.keys(), reverse=True)[:-1]
        # (exclude root)

and stability is from https://github.com/scikit-learn-contrib/hdbscan/blob/master/hdbscan/_hdbscan_tree.pyx#L164, see return https://github.com/scikit-learn-contrib/hdbscan/blob/master/hdbscan/_hdbscan_tree.pyx#L237-L241

    result_pre_dict = np.vstack((np.arange(smallest_cluster,
                                           condensed_tree['parent'].max() + 1),
                                 result_arr)).T

    return dict(result_pre_dict)

np.arange should have an integer dtype I think; result_arr has type dtype=np.double.

I am not sure whether the np.vstack might be casting the integer keys to floats due to the result_arr type (I might check this later); I can't see anything obvious in numpy that would have changed this behaviour.
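That suspicion is easy to check outside hdbscan: np.vstack promotes the integer np.arange row to float64 to match result_arr, so the dict built from it ends up with numpy.float64 keys. A minimal sketch with made-up cluster ids and stability values:

```python
import numpy as np

smallest_cluster = 378                  # hypothetical smallest cluster id
result_arr = np.array([1.5, 0.7, 2.1])  # hypothetical stabilities, dtype float64

# Mirrors the return at the end of compute_stability():
result_pre_dict = np.vstack((np.arange(smallest_cluster, smallest_cluster + 3),
                             result_arr)).T

stability = dict(result_pre_dict)

print(result_pre_dict.dtype)  # float64 -- the integer keys were promoted
# The dict keys are now numpy.float64 (378.0, 379.0, 380.0), not ints.
print(sorted(float(k) for k in stability.keys()))
```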

@JanElbertMDavid

@jkmackie thanks for the solution mate! appreciate it.

@lmcinnes
Collaborator

At least some of the issues seem to be related to the wheel built for windows (and python 3.11). I have deleted that from PyPI. The downside is that installing on windows will require you to build from source; the upside is that hopefully installing from PyPI might work now.

@johnlees

Just to confirm, I am also seeing this on an Ubuntu 22.04 CI with:

  • hdbscan 0.8.33
  • python 3.10.12
  • scikit-learn 1.3.0
  • numpy 1.22.4

@johnlees

b .../lib/python3.10/site-packages/hdbscan/hdbscan_.py:80
p stability_dict.keys()
dict_keys([378.0, 379.0, 380.0, 381.0, 382.0, 383.0, 384.0, 385.0, 386.0, 387.0, 388.0, 389.0, 390.0, 391.0, 392.0, 393.0, 394.0])

Not sure whether those keys being floats is the problem here.

@jkmackie

jkmackie commented Aug 16, 2023

@johnlees I suspect downgrading scikit-learn below 1.3 would fix on Ubuntu. Numpy 1.22.4 is used in the successful Windows configuration below:

#Successful configuration - Windows 10.

(myvirtualenv) 
me@mypc MINGW64 ~/embedding_clustering
$ conda list | grep -w '^python\s\|scikit\|hdbscan\|numpy'
hdbscan                   0.8.33                   pypi_0    pypi
numpy                     1.24.4                   pypi_0    pypi
python                    3.10.9          h4de0772_0_cpython    conda-forge
scikit-learn              1.2.1                    pypi_0    pypi

Note hdbscan is imported separately from scikit-learn. I wonder why it isn't imported as a module like KMeans?

#from package.subpackage import module
from sklearn.cluster import KMeans

#in contrast, hdbscan cluster algo is imported directly
import hdbscan

@johnlees

Same issue with scikit-learn 1.2.2 and 1.2.1, and other packages as above.
I'm guessing this is a cython issue with the pyx files?

@lmcinnes
Collaborator

This is really quirky, and I am having a great deal of trouble reproducing it in a way that I can actually debug it myself.

@RichieHakim

Removing the pre-built wheel for windows on pypi was sufficient to get it working on my github actions windows runners.

If it is helpful, here is an example of when it was failing: https://github.com/RichieHakim/ROICaT/actions/runs/5861440405/job/15891513454

Thank you for the quick fix.

@alxfgh

alxfgh commented Aug 16, 2023

Removing the pre-built wheels and building from source didn't solve the bug for me

@jkmackie

Removing the pre-built wheels and building from source didn't solve the bug for me

Did you try a fresh environment?

conda create -n testenv python=3.11

pip install hdbscan==0.8.33 numpy==1.24.4 notebook==7.0.2 scikit-learn==1.3.0

Cython should be something like 0.29.36, not 3.0.

If there's a hdbscan error, try:

pip install --upgrade git+https://github.com/scikit-learn-contrib/hdbscan.git#egg=hdbscan

@johnlees

This is really quirky, and I am having a great deal of trouble reproducing it in a way that I can actually debug it myself.

Likewise – doing the install from source (rebuilding the cython-generated .so libraries) makes the issue go away. I have floats in the line reported by the backtrace, and am not sure that's the correct erroring line anyway. I might try rebuilding the conda-forge version and see if that helps.

@lmcinnes
Collaborator

We have a new azure-pipelines CI system that will automatically build wheels and publish them to PyPI, thanks to @gclendenning, so hopefully things will work a little better the next time we make a release. It is definitely just quirks in exactly how things build on different platforms etc., but the fine details of that are ... hard to sort out.

@johnlees

Ah, maybe I should have been clearer: I am having issues with the conda version, not PyPI.
Unfortunately, the rebuild on conda-forge didn't sort out the CI issue; still the same error.

@lmcinnes
Collaborator

The conda forge recipe might need to be changed. Potentially adding a version restriction to Cython in the recipe itself (since it may not use the build isolation that pip install does) might help.

chasemc added a commit to KwanLab/Autometa that referenced this issue Aug 23, 2023
@johnlees

The conda forge recipe might need to be changed. Potentially adding a version restriction to Cython in the recipe itself (since it may not use the build isolation that pip install does) might help.

Thanks for the pointer, this seems to have fixed it! It looks like we can pin cython<3 at build time while leaving the version unconstrained at run time, and it works. I also added a run test to the recipe, which I hope will flag such an issue in future releases.

@Gr4dient

Hi all, having trouble understanding what to do here (I installed HDBSCAN 2 days ago through Conda and I'm currently experiencing this issue). Can I remove and reinstall HDBSCAN through Conda at this point to solve the problem? If so, do I also need to remove and reinstall anything else? Cython? Thank you.

@johnlees

@Gr4dient I would reinstall HDBSCAN in that environment, or even just try a fresh conda environment. I hope to have fixed it in the 0.8.33 _3 builds (when you run conda list, the hdbscan version should end in _3).

@Gr4dient

Hi John, thanks for clarifying - it took several hours for Conda to find a solution to remove Cython and HDBSCAN from my NLP environment last night... not sure why it got so hung up. I'm not seeing '_3' on conda-forge; will that be available at some point soon? Thanks

@johnlees

The new builds are on conda forge, e.g. in my working environment conda list shows:

hdbscan                    0.8.33        py310h1f7b6fc_3          conda-forge

If you are having trouble with time taken to resolve environments I would recommend using mamba instead of conda, or just starting over with a new environment, or both.

@benmwebb

I can also reproduce this with a from-source build on Fedora 39:

# dnf install python3-devel python3-Cython python3-numpy python3-scipy python3-scikit-learn python3-setuptools gcc
# curl -LO https://files.pythonhosted.org/packages/44/2c/b6bb84999f1c82cf0abd28595ff8aff2e495e18f8718b6b18bb11a012de4/hdbscan-0.8.33.tar.gz
# tar -xvzf hdbscan-0.8.33.tar.gz 
# (cd hdbscan-0.8.33 && python3 setup.py build -j8)
# cat <<END > test.py
import hdbscan
from sklearn.datasets import make_blobs
data, _ = make_blobs(1000)
clusterer = hdbscan.HDBSCAN(min_cluster_size=10)
cluster_labels = clusterer.fit_predict(data)
assert len(cluster_labels) == 1000
END
# PYTHONPATH=hdbscan-0.8.33/build/lib.linux-x86_64-cpython-312/ python3 test.py
...
  File "//hdbscan-0.8.33/build/lib.linux-x86_64-cpython-312/hdbscan/hdbscan_.py", line 80, in _tree_to_labels
    labels, probabilities, stabilities = get_clusters(
                                         ^^^^^^^^^^^^^
  File "hdbscan/_hdbscan_tree.pyx", line 659, in hdbscan._hdbscan_tree.get_clusters
  File "hdbscan/_hdbscan_tree.pyx", line 733, in hdbscan._hdbscan_tree.get_clusters
TypeError: 'numpy.float64' object cannot be interpreted as an integer

A hacky fix which works for me is to replace https://github.com/scikit-learn-contrib/hdbscan/blob/0.8.33/hdbscan/_hdbscan_tree.pyx#L726-L729 with

    if allow_single_cluster:
        node_list = sorted([int(x) for x in stability.keys()], reverse=True)
    else:
        node_list = sorted([int(x) for x in stability.keys()], reverse=True)[:-1]
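As a sanity check of that cast, using made-up float keys like those observed in the debugger earlier in this thread:

```python
import numpy as np

# Stability dict with float keys, as produced by dict(result_pre_dict).
stability = {np.float64(378.0): 0.9, np.float64(379.0): 1.2, np.float64(380.0): 0.4}

# Patched node_list construction (allow_single_cluster=False branch):
# sorting in reverse puts the root (the smallest id) last, and [:-1] drops it.
node_list = sorted([int(x) for x in stability.keys()], reverse=True)[:-1]

print(node_list)  # [380, 379] -- plain Python ints, root (378) excluded
is_cluster = {cluster: True for cluster in node_list}
```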
