
return_unexpected_index_query returning broken query, escaping double quotes in result for SparkDFExecutionEngine #9942

Open
jyoti-thakkar opened this issue May 17, 2024 · 0 comments

Describe the bug
We have set result_format to "COMPLETE" and return_unexpected_index_query to true; we want to use return_unexpected_index_query to retrieve the error records from the dataframe. However, the returned query is broken: the double quotes around the filter expression are missing/escaped, so the string is not valid PySpark.
Example: unexpected index query returned by GX: df.filter(F.expr(NOT(city IS NOT NULL)))
Working query: df.filter(F.expr("NOT(city IS NOT NULL)"))
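Until this is fixed upstream, a possible client-side workaround is to re-insert the missing quotes before executing the query string. This is purely an illustrative sketch; quote_expr is a hypothetical helper (not part of Great Expectations), and the regex assumes the single-expression query shape shown above:

```python
import re

def quote_expr(query: str) -> str:
    # Hypothetical fix-up: wrap the bare expression inside F.expr(...)
    # in double quotes so the query string becomes valid PySpark.
    # Leaves already-quoted expressions untouched.
    return re.sub(r'F\.expr\((?!")(.*)\)\)', r'F.expr("\1"))', query)

broken = 'df.filter(F.expr(NOT(city IS NOT NULL)))'
print(quote_expr(broken))  # df.filter(F.expr("NOT(city IS NOT NULL)"))
```

The repaired string can then be evaluated against the dataframe (e.g. with eval, in a context where df and pyspark.sql.functions as F are in scope).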

To Reproduce
import great_expectations as ge
from great_expectations.core import ExpectationSuite
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import (
    DataContextConfig,
    DatasourceConfig,
    FilesystemStoreBackendDefaults,
)

expectation_suite_config = {
    "expectation_suite_name": "my_expectation_suite",
    "expectations": [  # List of expectations
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {
                "column": "my_column",
                "result_format": {"result_format": "COMPLETE"},
            },
        }
    ],
}

my_expectation_suite = ExpectationSuite(**expectation_suite_config)

# Define DataContext configuration
data_context_config = DataContextConfig(
    plugins_directory=None,
    config_variables_file_path=None,
    datasources={
        "my_spark_datasource": DatasourceConfig(
            class_name="Datasource",
            execution_engine={
                "class_name": "SparkDFExecutionEngine",
                "force_reuse_spark_context": True,
            },
            data_connectors={
                "spark_runtime_dataconnector": {
                    "class_name": "RuntimeDataConnector",
                    "module_name": "great_expectations.datasource.data_connector",
                    "batch_identifiers": ["batch_name"],
                },
            },
        )
    },
    store_backend_defaults=FilesystemStoreBackendDefaults(root_directory="/"),
)

batch_request = RuntimeBatchRequest(
    datasource_name="my_spark_datasource",
    data_connector_name="spark_runtime_dataconnector",
    data_asset_name="my_asset",
    runtime_parameters={"batch_data": df},  # df: an existing Spark DataFrame
    batch_identifiers={"batch_name": "batch_run"},
)

context = ge.get_context(project_config=data_context_config)
batch_validator = context.get_validator(
    batch_request=batch_request, expectation_suite=my_expectation_suite
)
validation_result = batch_validator.validate()
print(validation_result)
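Note that the repro above only sets result_format; per the description, return_unexpected_index_query was also enabled. A hedged sketch of how that flag might be passed (the key name is taken from the issue text; the exact shape may vary across GX versions):

```python
# Assumed shape, based on the issue description; verify against your GX version.
kwargs = {
    "column": "my_column",
    "result_format": {
        "result_format": "COMPLETE",
        "return_unexpected_index_query": True,
    },
}
```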

validation_result contains an unexpected_index_query value of df.filter(F.expr(NOT(city IS NOT NULL)))

When I execute this query it raises "SyntaxError: invalid syntax", because the expression passed to F.expr is not quoted.

Expected behavior
Executing the returned query should yield the error records from the dataframe, i.e. the query string should be valid PySpark with the expression quoted: df.filter(F.expr("NOT(city IS NOT NULL)"))

Environment (please complete the following information):

  • Operating System: Windows
  • Great Expectations Version: 0.18.13
  • Data Source: Spark dataframe
  • Cloud environment: Databricks

