Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide index_urls via config to override the default. #2065

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

amjith
Copy link
Contributor

@amjith amjith commented May 19, 2024

Description

Add a config option to optionally provide index_urls that can override the default python package index url.

Changes

A new config option called index_urls that can provide a list of URLs that can act as an alternate source of python packages.

I haven't been able to get the tests passing locally. I'm unable to figure out a how to debug this locally.

I would like some help with the PR. I spoke with @FabioRosado at PyCon about the feature and he offered to help.

@ntoll
Copy link
Member

ntoll commented May 20, 2024

This PR will allow folks to install packages from third party packaging indexes (often inside organisations). A further refinement will be to enable/configure authentication for such third party packaging indexes.

@amjith
Copy link
Contributor Author

amjith commented May 20, 2024

As you can see here the url is still pointing to the default PyPI https://pypi.org/simple.

I can't figure out how to debug this.

@FabioRosado
Copy link
Contributor

Hello the failure is because /simple doesn't contain test-foo if you go to https://pypi.org/simple/ you will see the package names, I'd recommend changing the pyscript file to use one of the existing packages there and making sure that we can import them with pyscript following the necessary requirements (being a pure python wheel for example)

@amjith
Copy link
Contributor Author

amjith commented May 21, 2024

The intent of this PR is to use a different Python Package Index than the default (pypi.org). So I chose a package test-foo which is only available in test.pypi.org but not in pypi.org. The alternate package index is supplied using the index_urls config option.

Also, I just realized that I spoke to a different Fabio at PyCon. My apologies for the confusion. cc @fpliger

@WebReflection
Copy link
Contributor

I don't mind adding this config property but I have the following questions / concerns:

  • AFAIK there's no MicroPython (mip) equivalent for this, we do allow github: published packages though ... should we think about porting this in there too?
  • the name is index_urls but if it's always the same foreign repository, I find the idea counter-intuitive ... what if I have more than a single package? what are the entries then? how do these map to the packages list? i.e. 3 packages, only the one in the middle is custom ... how do we tackle that?

Beside these questions, I think, as a hint, that adding a package within the project for testing purpose and use that index urls to point at such local package should do the trick ... the test can be trivial, like just importing pyscript.document and add a class name to the document.documentElement (aka: HTML) to then check in there everything loaded just fine. It doesn't matter the package name.

@WebReflection
Copy link
Contributor

P.S. for @ntoll we don't want users to think they can add credentials in the py-config so the authentication story is actually not trivial at all to me and requires proper UX workflow and it will be likely undesired from a user landing perspective.

@amjith
Copy link
Contributor Author

amjith commented May 22, 2024

@WebReflection I was using the micropip.install() as a reference to implement the feature and the config option.

https://micropip.pyodide.org/en/stable/project/api.html#micropip.install

The documentation outlines how it will handle a list of index_urls when presented with more than one entry:

If a list of URLs is provided, micropip will try each URL in order until it finds a package. If no package is found, an error will be raised.

@fpliger
Copy link
Contributor

fpliger commented May 24, 2024

Sorry for chiming in late, really busy week with traveling, etc...

The index_urls option makes sense in the optics of actually matching config option names with the underlying tooling (micropip) options, but I do agree it's probably better to change our config with something that is easier to understand as "this is the repository/server where things are coming from".

I agree with @amjith in that we should not try to change too much around "how python packaging works". In most of python packaging scenarios you have that: a list of servers (or channels) that define a priority order in where you should install the packages from. If a package is not in the first one on the list it tries on the next one, etc... but you can always define the full path if you a specific package from a specific repo/channel.

@WebReflection
Copy link
Contributor

@fpliger @amjith I am not familiar with this micropip dance but people already complain Pyodide starts slow, I can't imagine just randomly checking HEAD around to find the right thing via fetch could make things better but hey, if that's OK for developers using custom packages, it's OK to me as well.

The mip though has nothing like this and we're currently discussing if mip should be embedded in MicroPython offer instead so that we only need to orchestrate github namespaces or stuff like this, but that might also then end up failing at mip level ... so: are we OK landing this for Pyodide only or are we sure we need this at all for MicroPython?

@hoodmane
Copy link
Contributor

hoodmane commented May 27, 2024

Ideally you would lock your dependencies ahead of time and use a custom lockfile. micropip should only be used in a situation like a repl where you really want to install anything, and even then its kind of questionable. micropip.freeze() generates the custom lockfile and loadPyodide({lockFileURL: "some_url"}) lets you load it. Then ideally you should be able to load packages from your custom index with loadPackage.

Frankly I'm surprised that PyScript hasn't been pushing in this direction. Solving a dependency set is NP-hard so it doesn't make sense to do it at run time, but people will need custom packages. Anaconda is a packaging company so it should be good at this sort of thing. With some help we could get lock files working very smoothly. @jezdez

We've been actively working on this in the last few days, but it's a bit tricky. pyodide/micropip#107
I think to support locking from arbitrary custom indexes we need a packaging PEP to add an INDEX file to the dist-metadata so that it's possible to work out where a wheel was installed from. Unless we just want to ask all of the indexes if they have the wheel again in order as part of micropip.freeze(), which feels pretty suboptimal.

@WebReflection
Copy link
Contributor

WebReflection commented May 27, 2024

@hoodmane thanks for chiming in, both appreciated and revealing ... I agree with you doing this at runtime feels weird and the lockFileURL could be an awesome config.toml thing to enable and pass along instead, created out of our pyscript-cli instead of runtime or we could use localStorage to do the runtime dance once and use the previously stored lock file every other time ... although this plays well with a single user re-landing on the same project, but it still bother the internet every single time (new user) so that I'd be happier if we could orchestrate lock file separately and offer it to any user out of a config flag instead.

@fpliger
Copy link
Contributor

fpliger commented May 28, 2024

@hoodmane @WebReflection we've been promoting the idea of frozen environments for a while but it's tricky as there are so many communities involved and we want to try to not reinvent the wheel...

I think we made some good progress on talking with the packaging community and folks interested in WASM during PyCon. I'm hoping we can actually start doing more in this soon...

@hoodmane
Copy link
Contributor

hoodmane commented May 28, 2024

Well Pyodide has most of a solution for locking dependencies. If you would collaborate a bit more with the upstream here you could move a lot faster. If you aren't satisfied with Pyodide's lock files you can help improve it or at least identify what its deficiencies are before ditching it rather than going off trying to invent your own thing without first finding out what already exists.

@amjith
Copy link
Contributor Author

amjith commented May 28, 2024

I feel like this discussion has enlarged in scope. Let me reset the context. I saw the talk given by @JeffersGlass and thought it was awesome. I wanted to build interactive doc sites for internal libraries that we build in our company. Since the internal libraries are not available on public PyPI but only in our internal index, I decided to make this PR to enabled my use case.

I'm not familiar with mip and I couldn't find any references of mip in the pyscript repo.

I fully understand that expanding in this direction might detract from pursuing a more efficient solution. If that is the case, I'm happy to withdraw this PR and follow along if someone wants to file an issue.

@hoodmane
Copy link
Contributor

@amjith Sorry for having this conversation on your PR, I actually kind of forgot that this was a PR discussion.

For what it's worth, the PR looks great to me. It is a nice improvement and I think it should be merged. But PyScript should work towards using a lockfile rather than resolving dependencies slowly and inaccurately at runtime on every page load.

@WebReflection
Copy link
Contributor

WebReflection commented May 28, 2024

to whom it might concern, this PR only tackles the SW side of affair, it doesn't change PyScript resolutions and it doesn't propagate the intent to pass along index_url ... that's actually not even in PyScript repository scope, it'd be eventually in polyscript which is in charge of understanding the package and do the right thing ... that being said, I hope nobody is offended by the discussion but I personally do appreciate it because I didn't know Pyodide already solved that problem, I didn't know there was a lock file already around, but I do like both things I've learned in here, and I just want to do the best we can for PyScript users.

I am OK enabling this for Pyodide interpreter only, and see how it goes, and if that's successful, I would be keen to think about a similar solution for MicroPython.

This means this PR will be closed either ways, only because it's the wrong place, but it was a wonderful starter for this conversation, and we hopefully can have more of these sooner than later for related, or unrelated, topics.

TL;DR thank you all, I'll play around in polyscript and see how it goes!

@fpliger
Copy link
Contributor

fpliger commented May 29, 2024

Well Pyodide has most of a solution for locking dependencies. If you would collaborate a bit more with the upstream here you could move a lot faster. If you aren't satisfied with Pyodide's lock files you can help improve it or at least identify what its deficiencies are before ditching it rather than going off trying to invent your own thing without first finding out what already exists.

@hoodmane is this answer related to my post or something else? If it's related to what I wrote, I think I probably miscommunicated here. Totally agree with you about contributing upstream instead of reinventing the wheel.

@WebReflection
Copy link
Contributor

WebReflection commented May 30, 2024

the way I see this working is by doing the following:

  • we add an experimental_lock_dependencies boolean flag that defaults to False
  • when that is True, we check if there is a config stored (localStorage or IndexedDB)
    • if not, we resolve everything once, we store that config hashed, we freeze in vFS and we move that into IndexedDB (or localStorage, not sure how big are these things, but IDB supports blobs/buffers which is better, imho).
    • if yes, and the config is the same as before, we load pyodide with a lockfile through an URL.createObjectURL(blob) and we should be good for the next time.
    • if yes, and the config is different, we repeat the dance at point one of this sub-list
  • if the flag is False, we silently clean up both config and lock file entries ... free space, happy users

@hoodmane does this logic sounds good to you? Is there any point in there that could be facilitated upstream instead? (or even all of them?)

/cc @fpliger

@ntoll
Copy link
Member

ntoll commented May 30, 2024

👋 Hi folks, just catching up (eventually) after post-PyCon email Armageddon. 😉

@amjith first of all, thank you so much for your contribution. I think everyone would like to be able to use a custom package index. It certainly makes sense when folks are using PyScript inside a corporate walled garden with a curated source of packages.

@hoodmane thank you for highlighting the suggestion of using a lock file. If there is a suggested process from upstream Pyodide, we should ensure PyScript makes this the easiest route for folks to use.

@WebReflection oh boy, I totally appreciate that auth is a VERY tricky nut to crack. I just mentioned it here since the need for authentication was a part of the face-to-face discussion about this issue at PyCon.

I also want to acknowledge that this PR has turned into a discussion... this to me is evidence that the PR is a step in the right direction, but we have not yet fully discovered all the possible options available to us (lock files etc...) nor achieved a consensus around what needs to be implemented and how. But, this is also the very best of open source collaboration: we have a PyScript user submitting code, and a contributor to an upstream project, all in the context of making stuff better. Let's keep up this sort of energy.

My intuition is we should take this PR as a starting point for further refinement (@WebReflection 's list of how it could work looks promising to me), then agree on the scope of the change[s], and modify this PR.

Sound like a plan?

@WebReflection
Copy link
Contributor

@ntoll this PR is not a (technical) starting point because it modifies something fully unrelated with PyScript core ... it changes a file into pyscript.sw that won't benefit anyone.

However, this PR is definitively a hint we want that feature available to our users and we should decide what/how we want to implement it, so I think we could move it as discussion and work in polyscript after, as everything discussed in here also won't directly land in pyscript/core because the config is part of the interpreter bootstrap which is not within PyScript repo scope.

@hoodmane
Copy link
Contributor

the way I see this working is by doing the following:
...

Yes this sounds generally right. The first thing that should be upstreamed is a lockFile argument as an alternative to lockFileURL to save the createObjectURL step. The second thing that I think should reasonably go upstream that would help you is support for loading a lockfile from the IDBFS. That way, you don't have to manually futz with the IDB, you can just do something like:

Path("/idbfs-mount/pyodide-lock.json").write_text(micropip.freeze())
FS.sync()

and then pass the appropriate options to loadPyodide() on startup to reuse the same lock.

Thoughts on this @ryanking13?

@fpliger
Copy link
Contributor

fpliger commented May 30, 2024

Agree with @amjith , the PR scope has definitely change and lock files wasn't necessarily a must have part of it (more on it below...). The original goal with this one is really to just be able to install dependencies from somewhere else.

In regards to lock_files, honestly, I kind of disagree that it's "the final goal" (unless we mean different things when we say lock files). Ideally, I'd personally want a way to build a "freeze" an "environment" (== pyodide + all dependencies you need for you project) that is already baked and don't need to install anything in runtime. But the question here is, is it a must to figure it our for this PR? I may be missing something but doesn't seem like it is a must to me...

Finally, I agree with @WebReflection that it actually has to have changes at the polyscript

@WebReflection
Copy link
Contributor

@fpliger yes, there are 2 different topics in here:

  1. being able to use the index_urls feature
  2. being able to improve next bootstrap via lock file

neither looks mutually exclusive to me and we can move forward with the index_urls story in polyscript and we cna just add it to the config things that pyodide understands, ignoring it for the time being in MicroPython, as the mip story is still in the refining/making stage there.

@hoodmane wait a minute ... are you saying that /idbfs-mount/ is user persistent out of the box? I am not sure I understand, if that's not the case, how would that work, because if I have to mount it, I need to loadPyodide first, then I can mount the IDB FS thingy ... am I missing something special automatically mounted in Pyodide vFS that would persist? 'cause that would be awesome to use already and also various users asked for a persistent FS and that might be the answer ... yet I think last time I've tried with that I had to bootstrap pyodide first, then mount stuff ... although the moment I import the module maybe the emscripten FS is already available before I bootstrap pyodide and I can mount things in there aleady? never thought about it, to be honest 🤔

@ryanking13
Copy link

Thoughts on this @ryanking13?

Well, when it comes to packaging, if there are 100 people, they all have different needs, so it's hard to say what's the right way to go. As discussed in this PR, I think there are generally 3 requests:

  1. Dynamic dependency resolution: micropip.install with custom indices.
  2. Using a lockfile to skip the dependency resolution
  3. Baking everything into the environment or binary so that single network requests does all the job.

In the end, I think they are mutually exclusive (as Andrea mentioned) and we need to provide all of those options to make everyone happy.

Anyway, about lockfiles, I think creating a lock file and storing it in some persistent DB to reuse it is a good idea. But probably we would need some API improvements to make it easier. For example, this draft PR (pyodide/pyodide#3664) suggests adding a hook in the Pyodide initialization stage so that users do some custom stuff (e.g. mounting indexedDB to the file system). I think we would need to develop something like that.

@hoodmane
Copy link
Contributor

this draft PR suggests adding a hook in the Pyodide initialization stage

Yeah I think we should go ahead with adding that. The Pyodide CLI runner needs it, but I just put in a special case hack. This would let us use a public mechanism.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants