daft_validation - AttributeError: module 'statistics' has no attribute 'correlation' #200

Closed
maltzsama opened this issue Apr 10, 2024 · 4 comments
Labels
enhancement New feature or request

@maltzsama
Contributor

Describe the bug
When the correlation function is used inside a custom function decorated with @daft.udf, an AttributeError is raised because the statistics module has no correlation attribute. The function was introduced in Python 3.10, while the minimum Python version currently supported by cuallee is 3.8, where the error can be reproduced.

To Reproduce
To reproduce the behavior:

  1. Use the correlation function within a custom defined function decorated with @daft.udf.
  2. Execute the code.
    @daft.udf(return_dtype=daft.DataType.float64())
    def correlation(x, y):
>       return [statistics.correlation(x.to_pylist(), y.to_pylist())]
E       AttributeError: module 'statistics' has no attribute 'correlation'

cuallee/daft_validation.py:211: AttributeError

Expected behavior
The correlation function should be successfully called within the custom defined function decorated with @daft.udf.

Additional context
This bug arises because the correlation function was introduced in Python 3.10, and the environment where the code is executed is likely running an older version of Python in which the function does not exist.

@canimus
Owner

canimus commented Apr 25, 2024

Thank you @maltzsama. I think the quick fix will be to raise cuallee's minimum version to 3.10 now, as we would be compromising development if we stayed on 3.8.
The original idea of keeping compatibility with 3.8 was because of the Snowpark API, but now that they have stepped up and are compatible with newer versions of Python, I think it is worth pushing for newer, more secure and faster versions of Python. What do you think?

@maltzsama
Contributor Author

maltzsama commented Apr 30, 2024

@canimus, that would create problems in other validations: PySpark 3.2.x does not support Python 3.10, as you can see in this table, so I don't think it's the best way to solve this problem. Look:

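| Spark Version | Python Min Supported Version | Python Max Supported Version | Python 2.7 | Python 3.4 | Python 3.5 | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 | Python 3.11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3.5.1 | 3.8 | 3.11 | No | No | No | No | No | Yes | Yes | Yes | Yes |
| 3.5.0 | 3.8 | 3.11 | No | No | No | No | No | Yes | Yes | Yes | Yes |
| 3.4.2 | 3.7 | 3.11 | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
| 3.4.1 | 3.7 | 3.11 | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
| 3.4.0 | 3.7 | 3.11 | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
| 3.3.3 | 3.7 | 3.10 | No | No | No | No | Yes | Yes | Yes | Yes | No |
| 3.3.2 | 3.7 | 3.10 | No | No | No | No | Yes | Yes | Yes | Yes | No |
| 3.3.1 | 3.7 | 3.10 | No | No | No | No | Yes | Yes | Yes | Yes | No |
| 3.3.0 | 3.7 | 3.10 | No | No | No | No | Yes | Yes | Yes | Yes | No |
| 3.2.4 | 3.6 | 3.9 | No | No | No | Yes | Yes | Yes | Yes | No | No |
| 3.2.3 | 3.6 | 3.9 | No | No | No | Yes | Yes | Yes | Yes | No | No |
| 3.2.2 | 3.6 | 3.9 | No | No | No | Yes | Yes | Yes | Yes | No | No |
| 3.2.1 | 3.6 | 3.9 | No | No | No | Yes | Yes | Yes | Yes | No | No |
| 3.2.0 | 3.6 | 3.9 | No | No | No | Yes | Yes | Yes | Yes | No | No |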

Maintaining this compatibility matters for running this library on GCP Dataproc with images such as 2.0.96-debian10.
I checked the daft library, and it supports Python 3.8 too.
The same problem may occur in other validations, but I'm not sure.
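
To make the constraint in the table above concrete, here is a small illustrative lookup. It is only a sketch: the ranges are copied from the table in this comment, and the names `SPARK_PYTHON_RANGE` and `spark_supports_python` are hypothetical, not part of PySpark or cuallee.

```python
# Supported (min, max) Python versions per Spark minor release,
# copied from the compatibility table in this comment.
SPARK_PYTHON_RANGE = {
    "3.5": ((3, 8), (3, 11)),
    "3.4": ((3, 7), (3, 11)),
    "3.3": ((3, 7), (3, 10)),
    "3.2": ((3, 6), (3, 9)),
}


def spark_supports_python(spark_version, python_version):
    """Return True if the Python (major, minor) falls in the Spark release's range."""
    # Reduce e.g. "3.2.4" to its minor release key "3.2".
    minor = ".".join(spark_version.split(".")[:2])
    low, high = SPARK_PYTHON_RANGE[minor]
    return low <= tuple(python_version[:2]) <= high


# PySpark 3.2.x tops out at Python 3.9, so a 3.10 floor would exclude it.
print(spark_supports_python("3.2.4", (3, 10)))
print(spark_supports_python("3.3.0", (3, 10)))
```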

@maltzsama
Contributor Author

@canimus, I was looking for information about the Python versions available in some environments. GCP works as I said in my previous comment, but I found this:

| Version | Last Updated | Released On | Supported Until | Available Until |
|---|---|---|---|---|
| 2.0-debian10 | 2024/05/06 | 2021/01/22 | 2024/07/31 | 2026/07/31 |

So, I agree with you. Raising cuallee's minimum version to 3.10 now can be done with minimal side effects (I hope so).

@maltzsama maltzsama reopened this May 7, 2024
@canimus
Owner

canimus commented May 11, 2024

Hi @maltzsama, after looking at the statistics here, it seems that 20% of users are still downloading from python>=3.8.
Eventually, users on legacy versions of cuallee can pin their pipelines to a specific version that supports the daft minimum requirement, which is python>=3.10.
In the meantime, do you think we can close this issue, as it does not affect the core of cuallee, only the installation of the daft extension?
Agree?
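
As an illustration of limiting the requirement to the daft extension while the core keeps working on 3.8, a runtime guard along these lines could be placed at the top of the daft validation module. This is a hedged sketch only: `DAFT_MIN_PYTHON` and `check_daft_python` are hypothetical names, and the 3.10 floor is taken from this discussion.

```python
import sys

# Assumption from this thread: the daft extension needs Python >= 3.10.
DAFT_MIN_PYTHON = (3, 10)


def check_daft_python(version_info=None):
    """Raise a clear error if the interpreter is too old for the daft extension."""
    # Accept an explicit version tuple so the check can be exercised in tests.
    current = tuple((version_info or sys.version_info)[:2])
    if current < DAFT_MIN_PYTHON:
        raise RuntimeError(
            "the daft extension requires Python >= %d.%d, found %d.%d"
            % (DAFT_MIN_PYTHON + current)
        )
    return current
```

With a guard like this, users on older interpreters would see an explicit message instead of the AttributeError reported above, and the remaining validations would stay unaffected.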

@canimus canimus added the enhancement New feature or request label May 19, 2024