
Pygwalker cannot render too much data #546

Closed

heqi201255 opened this issue May 13, 2024 · 1 comment

heqi201255 commented May 13, 2024

I was trying to plot my data using Pygwalker. The data is a CSV file of about 467 MB with shape (3682080, 12). My code looks like this:

from pygwalker.api.streamlit import StreamlitRenderer
import pandas as pd
import streamlit as st

# Adjust the width of the Streamlit page
st.set_page_config(
    page_title="Use Pygwalker In Streamlit",
    layout="wide"
)

# Add Title
st.title("Use Pygwalker In Streamlit")

# Cache your pygwalker renderer if you don't want your memory usage to explode
@st.cache_resource
def get_pyg_renderer() -> "StreamlitRenderer":
    df = pd.read_csv("/data.csv")
    # If you want to use the chart-config saving feature, set `spec_io_mode="rw"`
    return StreamlitRenderer(df, kernel_computation=True)


renderer = get_pyg_renderer()

renderer.explorer()

I tried to use pygwalker both inside Jupyter and via Streamlit; both gave me the error "The query returned too many data entries, making it difficult for the frontend to render. Please adjust your chart configuration and try again."

[Screenshot, May 13, 2024: the error message shown over the stuck visualization]

The visualization is stuck at loading, and I got a timeout message afterwards. Is there any workaround to render my data? What chart configuration should I adjust?
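One general-purpose workaround (not from this thread, just a sketch assuming the bottleneck is raw row count) is to downsample the DataFrame before handing it to any renderer; the helper name `downsample` below is illustrative, not a pygwalker API:

```python
import pandas as pd


def downsample(df: pd.DataFrame, max_rows: int = 1_000_000, seed: int = 0) -> pd.DataFrame:
    """Randomly sample the frame down to max_rows so the frontend has less to render."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


# Synthetic data standing in for the large CSV:
df = pd.DataFrame({"x": range(2_000_000)})
small = downsample(df, max_rows=500_000)
```

Random sampling keeps the overall distribution roughly intact, which is usually enough for exploratory charting, though it can hide rare outliers.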

longxiaofei (Member) commented May 13, 2024

Hi @heqi201255

Thank you for bringing up this issue with pygwalker. By default, pygwalker enforces a fixed limit on query result size to keep memory usage in the frontend browser safe.

When the number of distinct data points in a query (count(distinct t)) exceeds 1,000,000 (1 million), it becomes challenging for the frontend to render that much data into a chart efficiently.
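Another way to stay under a distinct-count limit like this, sketched here with plain pandas (the function name is illustrative, not part of pygwalker), is to pre-aggregate before plotting so that many raw rows collapse into one point per category:

```python
import pandas as pd


def aggregate_for_chart(df: pd.DataFrame, by: str, value: str) -> pd.DataFrame:
    """Collapse raw rows into one row per category: far fewer distinct points to draw."""
    return df.groupby(by, as_index=False)[value].mean()


# 1,000,000 raw rows spread over only 24 hourly buckets -> 24 rows to render.
raw = pd.DataFrame({
    "hour": [i % 24 for i in range(1_000_000)],
    "load": [float(i % 100) for i in range(1_000_000)],
})
chart_df = aggregate_for_chart(raw, by="hour", value="load")
```

This trades per-row detail for a chart the frontend can render instantly; pick the aggregation (mean, sum, count) that matches the question being asked.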

To address this issue, we are considering adding a new parameter that allows users to control the maximum data size for rendering. This parameter will provide flexibility and allow users to adjust the size according to their specific needs.

One possible solution is to introduce the following code snippet, which raises the maximum data length to 10,000,000 (10 million):

import pygwalker as pyg

pyg.GlobalVarManager.set_max_data_length(10 * 1000 * 1000)

We would appreciate your thoughts and feedback on this proposed solution. Please let us know if you have any suggestions or concerns.
