Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix return type specification of asset loader helper functions in load_assets_from_modules #21808

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

emirkmo
Copy link

@emirkmo emirkmo commented May 13, 2024

Summary & Motivation

Solves: #21807

Basically Functions/Classes should take in generics like Sequence, but they should not return said generic if they actually return a concrete implementation. In this case, all of these functions actually return a list. So we shouldn't mistype.

How I Tested These Changes

Ran the docs build for latest. Ran the dev test suite from where it is actually passing on master (checked out the latest green commit). My changes are minor. No failures found before or after.

Concrete functions should specify the actual type
of what they return. In this case, these functions
all returns lists, NOT a generic Sequence.

In general, functions and classes in a library should
be as generic as possible in what they accept as input
but be as specific as possible in what they return
as output.

These returns lists. If I try to append or manipulate
the returns, they have the incorrect type of Sequence.
A sequence does not have .append for instance. It
cannot be concatenated with a list of AssetDefinitions,
etc. If using strict type checking, which we do, I
cannot even easily combine the output of these functions
with assets imported "manually". I would get the error
 list[AssetsDefinition] is incompatiable with Sequence[...].

It is quite annoying to have to `typing.cast` the output
of library code from Dagster into its actual type just to
get my own type checking to work. It is made harder by the
fact that CacheableAssetsDefintion is not exported by default,
so I have to import it, and cast the output to
`list[AssetsDefinition | SourceAsset | Cacheable...]]`.

This fixes the return of these functions to the actual type,
meaning that they are still `Sequence[Union[A, B, C]`, and will
naturally pass the type check for any functions that works on a
generic Sequence[Union[A, B ,C]`, but
they are now also more concretely `List[Union[A, B, C]` so
one can actually use the returned objects in their own user code.

Change-Id: I13a06494c69c622dd65b4349d9538f94f1b3fc1a
Return types should be specific, not generic, (unless
you take in a generic, work on it, and then return it).
But these functions do not do that. They always return
a list, even if they take in a Sequnce.

Change-Id: I374139e50de4bfeaf908fb18699fd81fadb14475
@gibsondan gibsondan requested a review from sryza May 15, 2024 18:46
Copy link
Contributor

@sryza sryza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally return Sequence in these cases because it gives us future flexibility to do things like caching the results. E.g. with the current typing, we could later decorate one of these functions with functools.lru_cache. With List, we wouldn't be able to do this, because we couldn't expect that the caller wouldn't modify the returned collection.

@emirkmo
Copy link
Author

emirkmo commented May 17, 2024

But Sequence is wrong. It is returning a list. I fail to see how wrongly typing the output effects runtime caching (unimplemented). The returned list is not hashable so your cache wouldn't work correctly..

As far as I can tell, you don't want to LRU cache these functions, at least in user code? Their point is to, given the same input, pull in all assets even if there are new ones defined in those locations. LRU caching is for functions that return the same thing for the same input, which these functions do not..

If you do want to cache, then you should return an object type that can be cached. Such as a tuple. (See the second answer for details as its possible to add wrappers around list returning functions to cache them by converting them to something hashable of course)

If nothing, then please do not actively mistype the return. You could leave it untyped which then at least allows users to set some settings to ignore the untyped third party type or have the type checker/LSP figure out the correct type. Right now the type is wrong and has to be manually cast to correct type to actually use the output. Please see the linked issue for the problems this causes.

@sryza
Copy link
Contributor

sryza commented May 17, 2024

If nothing, then please do not actively mistype the return

@emirkmo - all lists are sequences, so the type is accurate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants