
Support easier feature serving and model serving with KServe #4139

Open
franciscojavierarceo opened this issue Apr 23, 2024 · 12 comments

@franciscojavierarceo
Member

franciscojavierarceo commented Apr 23, 2024

Is your feature request related to a problem? Please describe.
At the moment, Model serving (via KServe) and Feature Serving (via Feast—the Feature Store) are separate components without any guidance on how to best serve production machine learning features and models.

It would be beneficial to the community to outline (or support) how these two components could best fit together, so that users could launch production machine learning applications with a better understanding of best practices. At the moment, this is outlined by KServe, but the approach has some limitations [1]. It would be beneficial to offer some form of integration that outlines how users could launch these two services in a way that maximizes the strengths of each component.

[1] The limitations will be discussed more thoroughly in the RFC but their approach is to treat the online feature store as a Transformer.

Describe the solution you'd like
I will be drafting a proposal in this document.

Describe alternatives you've considered
There are pros and cons to different approaches and I would like to solicit feedback from the community to understand what would result in the best tradeoffs.

Additional context

[image]

kubeflow/kubeflow#7564

@franciscojavierarceo franciscojavierarceo added the kind/feature New feature or request label Apr 23, 2024
@franciscojavierarceo franciscojavierarceo self-assigned this Apr 23, 2024
@tokoko
Collaborator

tokoko commented Apr 23, 2024

@franciscojavierarceo thanks for kickstarting this. As I said on the call, I have some concrete ideas about changes in feast that could make this possible, but before we go there, let me say a couple of things about the differences between the approaches (preprocessor vs transformer, if I'm getting it right). Although I probably lean towards your point of view, I don't really think we can recommend either approach. The way I think about it, the approach taken usually depends on the user's existing ml serving practices and infrastructure. 1) When infra mostly consists of one-off independent model services, it makes sense to simply add another layer in front that will take care of feature store communication. 2) Alternatively if you're already heavily invested into a web of composable transformers and models depending on other models (kserve, seldon and so on) it makes a lot of sense to treat it as just another transformer. I think we should try to create a toolbox in feast that would apply to both scenarios.

@franciscojavierarceo
Member Author

I'm not particularly familiar with the second scenario you outlined, but I do agree that Feast should be a toolbox that can support both scenarios. That said, and given my previous experience, I'm going to focus my contributions to the RFC on case (1), since it is not outlined today and it is the one I've encountered most frequently (and my suspicion is that it's the most common pattern most practitioners actually need).

@tokoko
Collaborator

tokoko commented Apr 23, 2024

Sure, I'm right there with you; that's my experience as well. Another point is that I think we should aim to integrate with the Open Inference Protocol (OIP) rather than any particular inference server. My understanding is that it's based on the KServe v2 protocol and is the closest thing to an industry standard there is. It's also pretty well-defined (both HTTP and gRPC), with a number of client implementations we could use; Triton has a good set of HTTP and gRPC Python clients for it, for example.
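
For illustration, here's a minimal sketch of what calling an OIP (KServe v2) server with tritonclient's HTTP client could look like; the URL, model name, and tensor names below are placeholders that depend on the deployed model:

```python
import numpy as np
import tritonclient.http as httpclient

# Any OIP / KServe v2-compatible server (Triton, MLServer, ...) can be targeted.
client = httpclient.InferenceServerClient(url="localhost:8000")  # placeholder address

# Build the request tensor; "input__0"/"output__0" and the shape are placeholders
# that depend on the deployed model's signature.
features = np.array([[0.1, 0.2, 0.3, 0.4]], dtype=np.float32)
infer_input = httpclient.InferInput("input__0", list(features.shape), "FP32")
infer_input.set_data_from_numpy(features)

response = client.infer(
    model_name="my_model",  # placeholder model name
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output__0")],
)
print(response.as_numpy("output__0"))
```

Because the protocol is the same across servers, the same request works against Triton, MLServer, or any other OIP-compatible backend, which is what makes a protocol-level integration attractive.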

@tokoko
Collaborator

tokoko commented May 1, 2024

@franciscojavierarceo I have been working on this internally and came up with a draft middleware implementation. Let me know if you're thinking along those lines as well.

For simpler model deployments (without a model mesh) we should have a way for users to deploy a middleware service that wraps both Feast and the actual model server. I think we can make the assumption that the model server needs to be OIP-compatible, and we can go a step further and have our middleware expose an OIP interface as well. The difference is that the OIP model server will expect actual features as inputs, while our middleware will only expect entity values and required request features (if any). Here's my very detailed UML for it :). Making the middleware server expose OIP will also simplify things for clients, since they won't need to use Feast and can rely on standard protocols or existing client libraries instead (tritonclient is the best one for OIP, I think).

For the Python version, we can even base our own server on existing frameworks; MLServer has a very easy way of implementing custom OIP services without worrying too much about the actual server plumbing.

Another point to note here is that users will be able to make these services part of the KServe (or probably Seldon as well) model mesh, but I don't know if that's such a good idea. Since we will be making an OIP call to the underlying model server ourselves from the middleware, instead of relying on the mesh to do it for us, the mesh itself won't be able to track that bit of communication.
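
For illustration, a rough sketch of such a middleware as an MLServer custom runtime: it accepts entity values over OIP, looks the features up in Feast, and forwards an OIP call to the underlying model server. All feature references, tensor names, model names, and addresses below are placeholders, not a settled design:

```python
import numpy as np
import tritonclient.http as httpclient
from feast import FeatureStore
from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse


class FeastOIPMiddleware(MLModel):
    """Exposes OIP to clients, resolves features via Feast, then calls the model server."""

    async def load(self) -> bool:
        self._store = FeatureStore(repo_path=".")  # placeholder feature repo path
        self._model = httpclient.InferenceServerClient(url="model-server:8000")  # placeholder
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Clients send only entity ids; "driver_id" is a placeholder entity key.
        entity_ids = NumpyCodec.decode_input(payload.inputs[0]).flatten().tolist()

        # Look up online features for those entities in Feast.
        features = self._store.get_online_features(
            features=["driver_stats:conv_rate", "driver_stats:acc_rate"],  # placeholders
            entity_rows=[{"driver_id": e} for e in entity_ids],
        ).to_dict()

        # Assemble the feature matrix the model server expects (placeholder layout).
        matrix = np.column_stack(
            [features["conv_rate"], features["acc_rate"]]
        ).astype(np.float32)
        infer_input = httpclient.InferInput("input__0", list(matrix.shape), "FP32")
        infer_input.set_data_from_numpy(matrix)
        result = self._model.infer(model_name="my_model", inputs=[infer_input])

        # Wrap the model's output in an OIP response back to the client.
        output = NumpyCodec.encode_output(name="predictions", payload=result.as_numpy("output__0"))
        return InferenceResponse(model_name=self.name, outputs=[output])
```

Clients would then talk to the middleware with any standard OIP client, passing only entity values, while feature retrieval and the actual model call happen behind it.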

@franciscojavierarceo
Member Author

Awesome! I'll review this in more detail later. Do you want to collaborate on the doc? Feel free to add your name and suggest some changes.

@franciscojavierarceo
Member Author

Let's discuss more in the doc. I think what you're calling out completely makes sense.

I think exposing an OIP interface makes sense; the only detail is that (as I outlined in my doc) sometimes users will want to retrieve just features, or (in the future) retrieve features while inference is being generated (to be discussed more later, probably).

@tokoko
Collaborator

tokoko commented May 2, 2024

I started the solution section, please look through it when you get the chance, especially the part at the end under //TODO.

@franciscojavierarceo
Member Author

MLServer/Seldon seems really promising, as it supports PyTorch, TensorFlow, and XGBoost...

@tokoko
Collaborator

tokoko commented May 4, 2024

One problem there right now is that they are on pydantic<2 (we are on pydantic>2), so we can't add it to the project yet. They are planning an upgrade in the next release.

@shuchu
Collaborator

shuchu commented May 4, 2024

@tokoko
Collaborator

tokoko commented May 4, 2024

@shuchu MLServer is an open source project (and KServe supports deploying it as well, afaik). You should probably take a look at the proposal so far; I'm only advocating for using MLServer (Apache 2.0) and tritonclient (BSD-3) as utilities that would help us build a server exposing an OIP interface.

As to the integration with the model server, I agree. I don't think we should have an integration that's coupled to either one of them. We should have a vendor-neutral integration with an abstract OIP model server, which users can then deploy and manage with either KServe or Seldon.

@shuchu
Collaborator

shuchu commented May 4, 2024

Thank you, @tokoko. I take back what I said before. I agree with you that it's better for us to keep Feast vendor-neutral.

Recently, I've been working with Kubeflow for other purposes. It seems the Feast documentation on Kubeflow's website is quite old. Let me see if I can update it. I will let you and @franciscojavierarceo know what the situation is with Kubeflow.
