Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ls_recursive in Azure. #4981

Merged
merged 8 commits into from
May 22, 2024
Merged

Support ls_recursive in Azure. #4981

merged 8 commits into from
May 22, 2024

Conversation

teo-tsirpanis
Copy link
Member

@teo-tsirpanis teo-tsirpanis commented May 15, 2024

SC-44943

This PR implements ls_recursive in the Azure VFS. The design is based on the S3 implementation, but unlike S3, the Azure SDK calls are cleanly separated, which prevents the Azure SDK headers from leaking to the VFS header.


TYPE: C_API
DESC: Support VFS ls_recursive API for Azure filesystem.

@teo-tsirpanis teo-tsirpanis marked this pull request as ready for review May 16, 2024 17:26
@teo-tsirpanis
Copy link
Member Author

I was having trouble debugging this in my Windows machine, because of asserts caused by MSVC's checked iterators feature. Opened SC-47581 to track this.

Copy link
Contributor

@shaunrd0 shaunrd0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, the only change I would like to see before giving a 👍 is C++ / C API tests using filters that won't accept all entries. I left a separate comment that points to existing tests for S3 that could be reused. We should also list Azure as supported for external docs in C++ (please update comment to include local FS here too) and C APIs.

Otherwise just left some suggestions for comments. Nice work!

* iterators returned by a previous request. To be able to detect this, we
* can track the batch number and compare it to the batch number associated
* with the iterator returned by the previous request. Batch number can be
* tracked by the total number of times we submit a ListObjectsV2 request
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* tracked by the total number of times we submit a ListObjectsV2 request
* tracked by the total number of times we submit a ListBlobs request

* exit, will hold the continuation token to pass to the next listing
* operation, or nullopt if there are no more results.
*
* @return A vector with the blobs and directories found.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return A vector with the blobs and directories found.
* @return Vector of results with each entry being a pair of the string URI
* and object size.

// Currently only S3 is supported for VFS::ls_recursive.
using TestBackends = std::tuple<S3Test>;
// Currently only S3 and Azure is supported for VFS::ls_recursive.
using TestBackends = std::tuple<S3Test, AzureTest>;
TEMPLATE_LIST_TEST_CASE(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tests using custom filters we could update these tests

https://github.com/TileDB-Inc/TileDB/blob/dev/test/src/unit-cppapi-vfs.cc#L510-L649

It looks like only one of them is using TEMPLATE_LIST_TEST_CASE at the moment but if it's not too much trouble I think all three of them could be updated to do this for local, s3, and azure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The local filesystem seems to fail. Investigating…

This comment was marked as outdated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I figured out how to fix the local filesystem test, but will make the fix in a separate PR. For now I reverted enabling it.

@@ -581,6 +581,28 @@ class VFS : private VFSBase, protected S3_within_VFS {
return results;
}

/**
* Lists objects and object information that start with `prefix`, invoking
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit to update comment, even though the name for the function documents itself.

Suggested change
* Lists objects and object information that start with `prefix`, invoking
* Recursively lists objects and object information that start with `prefix`, invoking

@teo-tsirpanis
Copy link
Member Author

Feedback addressed.

@teo-tsirpanis
Copy link
Member Author

CI is green @shaunrd0.

Copy link
Contributor

@shaunrd0 shaunrd0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for digging into the local FS failures

@KiterLuc KiterLuc merged commit 9b26fd9 into dev May 22, 2024
60 checks passed
@KiterLuc KiterLuc deleted the teo/azure-ls-recursive branch May 22, 2024 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants