ENH: Writing a DataFrame to Excel with `XlsxWriter` in constant_memory mode requires row-by-row writes #34710

idantene · 2020-06-11T09:54:33Z

Is your feature request related to a problem?

When writing large DataFrames to an Excel file with XlsxWriter, one can pass `options={'constant_memory': True}` to reduce memory usage.
However, per the XlsxWriter documentation, once this mode is active, data should be written in sequential row order.

At the moment, pandas writes cells one Series at a time, i.e. column by column. In constant_memory mode this means that only the first column and the last row are written completely (along with the column names, which are written first as a single row); the remaining cells are lost.
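The data loss can be reproduced without Excel using a toy model of the documented rule. The sketch below assumes a simplification (it is not XlsxWriter's real code): only the current row is buffered, writing to a later row flushes it, and writes to an already-flushed row are lost. `ConstantMemorySheet` and `write_all` are hypothetical names used only for illustration.

```python
# Simplified model (an assumption, not XlsxWriter's real code) of constant_memory:
# only the current row is kept in memory; writing to a later row flushes it,
# and writes to rows that were already flushed are lost.
class ConstantMemorySheet:
    def __init__(self):
        self.flushed = {}  # row index -> {col index: value}
        self.current_row = None
        self.buffer = {}

    def write(self, row, col, value):
        if self.current_row is not None and row < self.current_row:
            return  # row already flushed: the value never reaches the file
        if self.current_row is not None and row > self.current_row:
            self.flushed[self.current_row] = self.buffer
            self.buffer = {}
        self.current_row = row
        self.buffer[col] = value

    def close(self):
        if self.current_row is not None:
            self.flushed[self.current_row] = self.buffer
        return self.flushed


def write_all(cells):
    """Write (row, col, value) cells in the given order and return what survives."""
    sheet = ConstantMemorySheet()
    for row, col, value in cells:
        sheet.write(row, col, value)
    return sheet.close()


header = [(0, c, name) for c, name in enumerate("abc")]
data = [[0, 3, 6], [1, 4, 7], [2, 5, 8]]  # the example frame, row by row
# Column-major order (current pandas behavior) vs. row-major order.
by_columns = header + [(r + 1, c, data[r][c]) for c in range(3) for r in range(3)]
by_rows = header + [(r + 1, c, data[r][c]) for r in range(3) for c in range(3)]

print(write_all(by_columns)[1], write_all(by_columns)[3])  # {0: 0} {0: 2, 1: 5, 2: 8}
print(write_all(by_rows)[1])  # {0: 0, 1: 3, 2: 6}
```

In the model, the column-major order keeps only the first column plus the last row, which matches the `NaN` pattern in the reproducible example further down.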

Describe the solution you'd like

It would be great to add an axis-like argument to `to_excel` that controls the order in which data is written to the file: by columns (Series) or by rows.
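A minimal sketch of what such an argument could control, assuming the body is available as a list of rows. The function `iter_body_cells` and its `by` parameter are hypothetical and not part of pandas; the default `"columns"` reproduces the current order, so existing callers would see no change.

```python
from typing import Any, Iterator, Sequence, Tuple

Cell = Tuple[int, int, Any]  # (row, col, value)


def iter_body_cells(
    rows: Sequence[Sequence[Any]],
    row_offset: int = 0,
    col_offset: int = 0,
    by: str = "columns",
) -> Iterator[Cell]:
    """Hypothetical: yield body cells column-by-column (default) or row-by-row."""
    if by == "columns":
        # Current pandas behavior: finish one Series before starting the next.
        n_cols = len(rows[0]) if rows else 0
        for c in range(n_cols):
            for r, row in enumerate(rows):
                yield (row_offset + r, col_offset + c, row[c])
    elif by == "rows":
        # Row-major order, as XlsxWriter's constant_memory mode requires.
        for r, row in enumerate(rows):
            for c, value in enumerate(row):
                yield (row_offset + r, col_offset + c, value)
    else:
        # Raised when the generator is first iterated, not when it is created.
        raise ValueError(f"by must be 'columns' or 'rows', got {by!r}")


data = [[0, 3, 6], [1, 4, 7]]
print(list(iter_body_cells(data, row_offset=1, by="rows"))[:3])
# [(1, 0, 0), (1, 1, 3), (1, 2, 6)]
```

Both orders emit the same set of cells; only the sequence differs, which is why the choice can stay engine-agnostic.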

API breaking implications

There should be no breaking changes: the new argument can default to the current (column-by-column) behavior.

Describe alternatives you've considered

Monkeypatching `ExcelFormatter` as follows works:

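```python
from pandas.io.formats.excel import ExcelFormatter, ExcelCell


def write_excel_by_rows(self, coloffset: int):
    """Row-major replacement for ExcelFormatter._generate_body (private pandas API)."""
    # Resolve per-cell CSS styles from an attached Styler, if any.
    if self.styler is None:
        styles = None
    else:
        styles = self.styler._compute().ctx
        if not styles:
            styles = None
    xlstyle = None
    # Emit every cell of a row before moving on to the next row, so that
    # constant_memory mode never receives a write to an already-flushed row.
    for rowidx in range(self.df.shape[0]):
        for colidx in range(len(self.columns)):
            if styles is not None:
                xlstyle = self.style_converter(";".join(styles[rowidx, colidx]))
            yield ExcelCell(
                self.rowcounter + rowidx,
                colidx + coloffset,
                self.df.iloc[rowidx, colidx],
                xlstyle,
            )


# Replaces the method on the class: affects every engine and every later to_excel call.
ExcelFormatter._generate_body = write_excel_by_rows
```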
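The assignment above patches `ExcelFormatter` for the rest of the session. A sketch of limiting the patch to a single write with the standard library's `unittest.mock.patch.object` is shown below against a stand-in class (`Formatter` is hypothetical, used so the example runs without pandas); the original method is restored when the `with` block exits.

```python
from unittest import mock


class Formatter:
    """Stand-in for pandas' ExcelFormatter (illustrative only)."""

    def _generate_body(self, coloffset):
        return "columns"


def write_excel_by_rows(self, coloffset):
    return "rows"


fmt = Formatter()
# Patch the class attribute only for the duration of the block.
with mock.patch.object(Formatter, "_generate_body", write_excel_by_rows):
    inside = fmt._generate_body(0)
outside = fmt._generate_body(0)
print(inside, outside)  # rows columns
```

With pandas, the same pattern would wrap the write, e.g. `with mock.patch.object(ExcelFormatter, "_generate_body", write_excel_by_rows): df.to_excel(xl, index=False)`. Because `_generate_body` is private, this may break between pandas versions.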

Additional context

Minimal reproducible example:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5], 'c': [6, 7, 8]})
# pandas >= 1.3: pass engine_kwargs={'options': {'constant_memory': True}} instead of options=...
with pd.ExcelWriter('foo.xlsx', engine='xlsxwriter', options={'constant_memory': True}) as xl:
    df.to_excel(xl, index=False)
print(pd.read_excel('foo.xlsx'))
```

Output:

```
   a    b    c
0  0  NaN  NaN
1  1  NaN  NaN
2  2  5.0  8.0
```


TomAugspurger · 2020-06-12T21:32:27Z

Do we have any other engine-specific formatters that affect how we write cells? Or is ExcelFormatter supposed to be engine-agnostic?

idantene · 2020-06-14T15:34:57Z

I believe ExcelFormatter is intended to be agnostic to the engine.

idantene · 2021-02-18T19:23:11Z

Bringing this back up again.

davidemerritt · 2022-01-13T21:12:18Z

+1 on this being very useful; even 10k rows requires a workaround (several hundred MB of memory are used otherwise).

kemalmutlu · 2024-04-29T08:18:36Z

+1, it's very useful.

idantene added the Enhancement and Needs Triage labels on Jun 11, 2020

TomAugspurger added the IO Excel and Needs Discussion labels and removed Needs Triage on Jun 12, 2020
