enforce_privacy does not work? #1145
Comments
Facing the same issue. enforce_privacy works up to v2.0.28.
I think it's due to how pandasai/helpers/dataframe_serializer.py -> convert_df_to_csv() doesn't care at all about the privacy setting; it happily just adds the details:

    # Add dataframe details
    dataframe_info += f"\ndfs[{extras['index']}]:{df.rows_count}x{df.columns_count}\n{df.to_csv()}"

Until this gets properly fixed, I replaced the above code with:

    # TEMP FIX: Do not add dataframe details
    df_without_sample_data = pd.DataFrame(columns=df.pandas_df.columns)
    dataframe_info += f"\ndfs[{extras['index']}]:{df.rows_count}x{df.columns_count}\n{df_without_sample_data.to_csv()}"

In contrast, pandasai/helpers/dataframe_serializer.py -> convert_df_to_json() properly checks for it.

Related: #1147
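The temporary fix above can be sketched outside the library as a small pandas-only function. This is a minimal sketch, not the PandasAI implementation: `describe_dataframe` and its `enforce_privacy` parameter are hypothetical names, and `index=False` is an assumption chosen to match the index-free CSV seen in the logged prompt.

```python
import pandas as pd


def describe_dataframe(df: pd.DataFrame, index: int, enforce_privacy: bool) -> str:
    """Hypothetical helper: build the CSV description of a dataframe for a prompt."""
    rows, cols = df.shape
    # head(0) keeps the column names but drops every row, so no sample data leaks.
    visible = df.head(0) if enforce_privacy else df
    return f"\ndfs[{index}]:{rows}x{cols}\n{visible.to_csv(index=False)}"


# Sample rows taken from the prompt printed in this issue.
sample = pd.DataFrame(
    {
        "country": ["Spain", "Japan", "China"],
        "gdp": [19294482071552, 14631844184064, 3435817336832],
        "happiness_index": [6.38, 7.23, 7.22],
    }
)

private_text = describe_dataframe(sample, 0, enforce_privacy=True)
full_text = describe_dataframe(sample, 0, enforce_privacy=False)
```

With `enforce_privacy=True` the description still carries the shape and column names, which is usually enough for code generation, while the row values stay out of the prompt.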
After some more digging, it seems you can get it to work if you add field descriptions:

    # If field descriptions are added always use YML. Other formats don't support field descriptions yet
    if self.field_descriptions or self.connector_relations:
        serializer = DataframeSerializerType.YML

..and then...

    def serialize(
        self,
        df: pd.DataFrame,
        extras: dict = None,
        type_: DataframeSerializerType = DataframeSerializerType.YML,
    ) -> str:
        if type_ == DataframeSerializerType.YML:
            return self.convert_df_to_yml(df, extras)
        elif type_ == DataframeSerializerType.JSON:
            return self.convert_df_to_json_str(df, extras)
        elif type_ == DataframeSerializerType.SQL:
            return self.convert_df_sql_connector_to_str(df, extras)
        else:
            return self.convert_df_to_csv(df, extras)
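The selection logic quoted above can be illustrated in isolation. This is a rough sketch under stated assumptions: `SerializerKind` and `pick_serializer` are hypothetical stand-ins rather than library names, and the `CSV` member is assumed from the fall-through `else` branch.

```python
from enum import Enum


class SerializerKind(Enum):
    """Illustrative stand-in for the library's DataframeSerializerType."""

    CSV = "csv"
    YML = "yml"
    JSON = "json"
    SQL = "sql"


def pick_serializer(configured, field_descriptions=None, connector_relations=None):
    """Hypothetical helper mirroring the quoted check."""
    # Field descriptions or connector relations force the YML serializer.
    if field_descriptions or connector_relations:
        return SerializerKind.YML
    # Otherwise the configured serializer is used unchanged.
    return configured


without_descriptions = pick_serializer(SerializerKind.CSV)
with_descriptions = pick_serializer(
    SerializerKind.CSV, field_descriptions={"gdp": "Gross domestic product"}
)
```

In this sketch a dataframe without descriptions stays on the CSV path, which is the path reported above as ignoring the privacy setting, while adding a description switches it to YML.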
System Info
OS version: win11
Python version: 3.11
The current version of pandasai being used: 2.0.36
🐛 Describe the bug
The sample data appears in the prompt even when enforce_privacy is set to True.
The code below:
The printed prompt shows that the dataframe still contains data:
### QUERY
Get the top 3 GDP countries.
Variable
dfs: list[pd.DataFrame]
is already declared.At the end, declare "result" variable as a dictionary of type and value.
If you are asked to plot a chart, use "matplotlib" for charts, save as png.
Generate python code and return full updated code:
2024-05-04 15:08:46 [INFO] Executing Step 3: CodeGenerator
2024-05-04 15:08:49 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-04 15:08:49 [INFO] Prompt used:
dfs[0]:10x3
country,gdp,happiness_index
Spain,19294482071552,6.38
Japan,14631844184064,7.23
China,3435817336832,7.22
Update this initial code:
QUERY
Get the top 3 GDP countries.
Variable
dfs: list[pd.DataFrame]
is already declared.At the end, declare "result" variable as a dictionary of type and value.
If you are asked to plot a chart, use "matplotlib" for charts, save as png.
Generate python code and return full updated code:
2024-05-04 15:08:49 [INFO] Code generated:
```
# TODO: import the required dependencies
import pandas as pd

# Write code here
top_3_gdp_countries = dfs[0].nlargest(3, 'gdp')

# Declare result var
result = {
    "type": "dataframe",
    "value": top_3_gdp_countries
}
```
2024-05-04 15:08:49 [INFO] Executing Step 4: CachePopulation
2024-05-04 15:08:49 [INFO] Executing Step 5: CodeCleaning
2024-05-04 15:08:49 [INFO]
Code running: