Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export CSV on all assets/logs chooses header columns from the first asset/log in the list #805

Open
Fosten opened this issue Mar 19, 2024 · 10 comments · May be fixed by #842
Open

Export CSV on all assets/logs chooses header columns from the first asset/log in the list #805

Fosten opened this issue Mar 19, 2024 · 10 comments · May be fixed by #842
Labels

Comments

@Fosten
Copy link
Contributor

Fosten commented Mar 19, 2024

When using the Export CSV button from all /assets or all /logs page, only the header columns from the first asset in the list are chosen.

From 3.1.0 release blog post... "In previous versions, only the columns that are visible on the page were included in the CSV. Now, the columns include all text, reference, timestamp, and boolean data by default."

This works fine for exporting similar asset/log types. But not for "all" pages that include multiple types.

Expected behavior

  • Perhaps "all" pages could only export the columns visible on the all /assets and all /logs pages.
    • assets: id, asset_type, status
    • logs: status, id, timestamp, name, log_type, owner
  • Or "all" pages could specify all fields that are common to all assets/logs
    • assets: id,uuid,type,uid,name,status,created,changed,archived,notes,flag,location,is_location,is_fixed,owner,parent,quick
    • logs: id,uuid,type,uid,name,timestamp,status,created,changed,notes,flag,location,is_movement,asset,category,owner,equipment,quick
  • Or "all" pages could hide the option to export from the dropdown altogether.

Example assets export
id,uuid,type,uid,name,status,created,changed,archived,notes,flag,location,is_location,is_fixed,owner,parent,quick,animal_type,birthdate,is_castrated,nickname,sex
2,952f0dac-a9ba-4ecb-b042-269d298a797e,animal,admin,"test animal",active,2024-03-19T20:57:35+00:00,2024-03-19T20:57:35+00:00,,,,,,,,,,"test breed",,,,
3,1cdcfa41-6f0b-4bf6-8981-7cf694a529df,equipment,admin,"test equipment",active,2024-03-19T20:58:02+00:00,2024-03-19T20:58:02+00:00,,,,,,,,,,,,
4,44e55b63-8fa4-477f-b26b-0f815946ddb3,land,admin,"test land",active,2024-03-19T20:58:20+00:00,2024-03-19T20:58:20+00:00,,,,,1,1,,,,bed
5,310948da-b398-4efe-8516-38da54bed309,structure,admin,"test structure",active,2024-03-19T20:58:38+00:00,2024-03-19T20:58:38+00:00,,,,,1,1,,,,building
1,e79d294b-2de2-4ee2-b848-482323a52a01,plant,admin,"test tomato",active,2024-03-01T23:01:54+00:00,2024-03-01T23:01:54+00:00,,,,,,,,,,tomato,
6,5d34f390-4d1d-4dcb-85ef-ab9e2fb410c6,water,admin,"test water",active,2024-03-19T20:58:50+00:00,2024-03-19T20:58:50+00:00,,,,,1,1,,,

Example logs export
id,uuid,type,uid,name,timestamp,status,created,changed,notes,flag,location,is_movement,asset,category,owner,equipment,quick,lot_number,purchase_date,source
6,bf552e33-40fe-44ca-9244-c48e6874e6e5,seeding,admin,"test seeding",2024-03-19T21:37:14+00:00,pending,2024-03-19T21:37:21+00:00,2024-03-19T21:37:21+00:00,,,,1,,,admin,,,,,
5,f7ed4abe-4578-4f2d-a527-f963ffa7eee1,observation,admin,"test observation",2024-03-19T21:37:02+00:00,pending,2024-03-19T21:37:08+00:00,2024-03-19T21:37:08+00:00,,,,,,,admin,,
4,21e6178b-7219-48f8-a79b-045eb3d9fea0,maintenance,admin,"test maintenance",2024-03-19T21:36:49+00:00,pending,2024-03-19T21:36:54+00:00,2024-03-19T21:36:54+00:00,,,,,,,admin,,
3,0fee77e9-0002-453b-95ef-d0cb3bf2df69,input,admin,"test input",2024-03-19T21:36:22+00:00,pending,2024-03-19T21:36:27+00:00,2024-03-19T21:36:27+00:00,,,,,,,admin,,,,,,
2,e5b9db31-a703-4d64-8406-f7e391ac48fe,harvest,admin,"test harvest",2024-03-19T21:36:08+00:00,pending,2024-03-19T21:36:13+00:00,2024-03-19T21:36:13+00:00,,,,,,,admin,,,
1,e1c82931-28d6-4bed-b060-eb58493788fd,activity,admin,"test activity",2024-03-19T21:35:45+00:00,pending,2024-03-19T21:35:54+00:00,2024-03-19T21:35:54+00:00,,,,,,,admin,,

@Fosten Fosten added the bug label Mar 19, 2024
@mstenta
Copy link
Member

mstenta commented Mar 20, 2024

Ah good catch @Fosten. This is due to this logic (in the upstream csv_serialization module we use for serialization):

https://git.drupalcode.org/project/csv_serialization/-/blob/600cee7e604401faf06fdd84d68f8c0a0a528686/src/Encoder/CsvEncoder.php#L177

The extractHeaders() method is used to figure out which headers to include in the CSV, and it only looks at the first row. I knew this, but overlooked how it would affect exports from Views that include all asset/log types.

  • Perhaps "all" pages could only export the columns visible on the all /assets and all /logs pages.
  • Or "all" pages could specify all fields that are common to all assets/logs
  • Or "all" pages could hide the option to export from the dropdown altogether.

These all make sense as options to consider.

A fourth possible option might be to override the extractHeaders() class and assemble a list of headers from all the rows included in the output.

There are some costs and benefits to this approach. The obvious benefit is it would fix this bug and include more data in the CSV exports. Some costs that come to mind include:

  • Performance cost to iterate through all the data rows to find unique headers.
  • More empty cells on asset/log types that don't have the associated column's field in their data model. This also feels a bit weird to include a column for a non-existent field in general.
  • It might make "batch" exports harder in the future.*

* More details on that last point: Right now the export happens all in one "page request", which has memory limits. We do not currently allow exporting "all" records for this reason. It will only include the results displayed on the page. @paul121 and I talked about how we might enable exporting "all" records using Drupal's Batch API (which splits up the job across multiple page requests, with a progress bar shown to the user). However, in a batch context, we would only be able to see the current "set" of rows being processed (I think), so finding all unique headers might not be possible. 🤔

Setting all technical considerations aside, perhaps the first question we should ask is: how would you expect a CSV export like this to work? Our goal should be to make this work intuitively. Although this is a bit of a strange case because we're offering an option to create a CSV that combines different "types" of assets/logs. If we take that line of thinking to the extreme: you would never expect to be able to export a CSV that combines vastly different records, like assets AND logs in the same CSV. So in that way, there may be an argument for removing the option to export from these "combined" lists altogether.

@mstenta
Copy link
Member

mstenta commented Mar 20, 2024

Another option that just came to mind... combining another "next step" idea @paul121 and I discussed...

We talked about adding the ability to choose which columns are included in the exports during the "confirm" step, via a set of checkboxes. Perhaps implementing that would open up some possibilities with this issue...

@Fosten
Copy link
Contributor Author

Fosten commented Mar 27, 2024

Having previously exported in 2.x, I was familiar with these header columns:

  • Assets—all: "Bulk update",image_target_id,ID,"Asset name","Asset type",Flags,Parents,Owner,Location,Status
  • Logs—all: "Bulk update",Status,id,Timestamp,"Log name","Log type",Assets,Location,Quantity,Flags,"Log category",Owner
  • Quantities—all: ID,"Log ID","Log status","Log timestamp","Log type","Log name","Log assets","Quantity measure","Quantity value","Quantity units","Quantity label","Quantity type"

Knowing the export logic had changed in 3.x, I wasn’t sure what to expect.

  • Since there isn’t an option to “Import all”, it would make sense there wouldn't be an option to “Export all”
  • But I suppose I was thinking these pages would export all fields common to all assets/logs (option 2 above).

Logically, I wouldn’t expect to be able to combine header columns from different types in the same CSV. But it might be useful to export “A yearly report of all logs” in the same CSV for example. Especially since there is already a prefaced warning stating "CSV exports do not include all data."

@paul121
Copy link
Member

paul121 commented Apr 22, 2024

This has come up as an issue for Rothamsted now too. We are leveraging this general log exporter on a custom view that include all logs (of different types) that are associated with a research experiment. This results in CSV exports with headers that do not match the data exported in some of the rows.

Our goal should be to make this work intuitively. Although this is a bit of a strange case because we're offering an option to create a CSV that combines different "types" of assets/logs. If we take that line of thinking to the extreme: you would never expect to be able to export a CSV that combines vastly different records, like assets AND logs in the same CSV. So in that way, there may be an argument for removing the option to export from these "combined" lists altogether.

The issue with a general log/asset exporter action that others may use in contrib is that it is challenging to know what a "combined" list is... these other views could choose to not include this export action, but by default views will pull in all available bulk actions, so it would be nice if this could still provide some base functionality.

Logically, I wouldn’t expect to be able to combine header columns from different types in the same CSV. But it might be useful to export “A yearly report of all logs” in the same CSV for example. Especially since there is already a prefaced warning stating "CSV exports do not include all data."

Thanks for the feedback @Fosten . I agree this seems reasonable, especially for a short-term "fix" to this problem.

SO if we try for a short-term fix....

I think we just need some way of identifying if all entities are of a single bundle or if we have multiple bundles. As @mstenta mentioned there is some complexity in doing this and it could have a performance cost.

But I wonder if there is an elegant way.... during the EntityCsv::executeMultiple step of the actual action plugin. We could save a list of the bundles to the tempStore/state in addition to the selected entities. The export form could then check the pre-processed list of bundles and when there are multiple notify the user that the export will be simple/include less data. I think we could adapt our existing logic in EntityCsvActionForm::getIncludeColumns to only include the entity base fields in this scenario.

We talked about adding the ability to choose which columns are included in the exports during the "confirm" step, via a set of checkboxes. Perhaps implementing that would open up some possibilities with this issue...

Once we have a list of bundles it would become easier to build this type of functionality... but ultimately it seems that indeed it would require a custom CsvEncoder::extractHeaders method so we could specify multiple columns for all rows of the csv export. But this could be a nice follow up after an initial fix for this bug?

@paul121
Copy link
Member

paul121 commented Apr 22, 2024

But I wonder if there is an elegant way.... during the EntityCsv::executeMultiple step of the actual action plugin

Thinking more, maybe it is just as easy to do this check when building the form initially 🤷

This all needs a bit more thought regarding batch operation exports but wonder if we can find a short-term fix for now...

@mstenta
Copy link
Member

mstenta commented May 7, 2024

Just throwing this out there for discussion... I replaced extractHeaders() with this very simple one, and it fixes the issue:

  protected function extractHeaders(array $data, array $context = []) {
    $headers = [];
    foreach ($data as $row) {
      $headers = array_merge($headers, array_diff(array_keys($row), $headers));
    }
    return $headers;
  }

Tl;dr: Iterates over all the rows, and adds any headers that don't already exist to the list of headers.

Is it crazy? Probably.
Is it performant? Doubtful.
Will it work with batch API? Nope.
Is it worth considering as a quick fix until we come up with something better? ... 🤔

@mstenta
Copy link
Member

mstenta commented May 7, 2024

(Note: the exact extractHeaders() method logic we end up with would need to be a little more than that... since the original also has support for using the labels from Views headers, which we still depend on in our Quantity and Data Stream export displays.)

@mstenta
Copy link
Member

mstenta commented May 7, 2024

It was easy enough to package this up into a PR, so I did: #842

@mstenta
Copy link
Member

mstenta commented May 10, 2024

Just throwing this out there for discussion... I replaced extractHeaders() with this very simple one, and it fixes the issue:

Ah bad news... in testing #842 more I realized it only solves part of the problem. :-(

There are two issues:

only the header columns from the first asset in the list are chosen

and

This results in CSV exports with headers that do not match the data exported in some of the rows.

#842 ensures all the header columns are included, but it does not fix the mis-matched header/cells issue. I didn't catch this in my initial testing because the test data I used wasn't susceptible to it. But it is still happening.

To reproduce, create a seeding, harvest, and lab test log and fill in the bundle fields on each. Try to export all three and you'll see that the bundle field data is all jumbled up.

This happens because the CsvEncoder::formatRow() method takes an associative array and returns a non-associative array, and that is what gets added to the CSV as it is built.

Drat. There doesn't seem to be any clear solution to this, if we want to include all combined bundle columns.

So... we may have no choice but to remove bundle columns from the export if multiple bundles are included. I was hoping we would be able to do it, but looks like we don't have a choice.

On the bright side, some of the other work done in PR #842 will still be reusable. I can adjustment the warning text that was added to make it clear that it will only include columns that are shared by all rows (base fields). And we'll still be able to keep the new option of choosing which columns are included in the export - it will just be a narrower list of checkboxes when multiple bundles are included. It also means we won't need to override the extractHeaders() method in our own class, which keeps things simple.

Oh well. 🤷

@mstenta
Copy link
Member

mstenta commented May 10, 2024

I updated PR #842 to omit bundle-specific columns when multiple bundles are included in an export.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

3 participants