Security Model #4198

Open
tokoko opened this issue May 13, 2024 · 31 comments
Labels
kind/feature New feature or request

Comments

@tokoko
Collaborator

tokoko commented May 13, 2024

Is your feature request related to a problem? Please describe.
This is an offshoot ticket from #4032. Feast is slowly approaching a state in which all major components can be deployed as remote servers. This enables us to start thinking about a comprehensive security model for access to each of them (registry, online store, offline store).

Describe the solution you'd like
Here's a high-level overview of what I'm expecting from the security model:

  1. I'd avoid incorporating user management into feast as much as possible. We should probably have a pluggable authentication module (LDAP, OIDC, etc.) that takes a user/password (or token), validates it, and returns the roles that have been assigned to that particular user. Each server will have to integrate with this module separately: the HTTP feature server will get user/pass from basic auth, while the gRPC and Flight servers will get them according to their own standard conventions, and each will pass the credentials to the module to get the list of assigned roles. (Somewhat inspired by the Airflow security model.)

  2. (Option 1) We enrich the Feature Registry to also contain information about the roles available in the system, and each feast object is annotated with permissions. In other words, the user would run feast apply with something like this:

admin_role = FeastUserRole(name='admin')
reader_role = FeastUserRole(name='reader')

FeatureView(
	name=...
	schema=...
	...
	permissions={
		'read': [reader_role],
		'write': [admin_role]
	}
)
  2. (Option 2) Another option is to try to mimic AWS IAM and brush up on our regexes. In this case, instead of annotating objects with permissions, you annotate roles with policies.
risk_role = FeastUserRole(
	name='team_risk_role',
	permissions=[
		FeastPermission(
			action='read',  # or read_online, read_offline, write_online, write_offline
			conditions=[{
				'name': 'very_important_*',
				'team': 'risk'
			}]
		)
	]
)

FeatureView(
	name='very_important_fv',
	schema=...
	...
	tags={
		'team': 'risk'
	}
)

The upside of the second approach is that it's a lot less invasive than the first one. You could potentially end up with a setup where permissions and objects are managed with some level of separation between them. I think I'm more in favor of this.

  3. Once a server gets hold of the user's roles and the permission information from the registry, all components will apply the same "rules engine" to authorize the requested action.
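A minimal sketch of what such a shared rules engine could look like for Option 2. It assumes each condition is a dict of attribute name to glob pattern in which every key must match, and that any one matching condition is enough. The class names mirror the example above but are hypothetical, and it uses shell-style globs (`fnmatch.fnmatchcase`) rather than full regexes, which is only one possible reading of "brush up on our regexes".

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, List


@dataclass
class FeastPermission:
    # Hypothetical: "read", "read_online", "read_offline", "write_online", ...
    action: str
    # Each condition maps an attribute ("name" or a tag key) to a glob pattern.
    conditions: List[Dict[str, str]] = field(default_factory=list)


@dataclass
class FeastUserRole:
    name: str
    permissions: List[FeastPermission] = field(default_factory=list)


def _matches(condition: Dict[str, str], attributes: Dict[str, str]) -> bool:
    # Every key of the condition must exist on the object and match its pattern.
    return all(
        key in attributes and fnmatchcase(attributes[key], pattern)
        for key, pattern in condition.items()
    )


def is_authorized(
    roles: List[FeastUserRole], action: str, name: str, tags: Dict[str, str]
) -> bool:
    """Return True if any role grants `action` on the object (name + tags)."""
    attributes = {**tags, "name": name}  # the object's real name always wins
    return any(
        perm.action == action and any(_matches(c, attributes) for c in perm.conditions)
        for role in roles
        for perm in role.permissions
    )
```

With the `team_risk_role` from the example, `is_authorized([risk_role], "read", "very_important_fv", {"team": "risk"})` is true, while the same call with `{"team": "fraud"}` or with action `"write"` is false. Because the engine only looks at names and tags, the same function could be reused unchanged by the registry, online and offline servers.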
@tokoko tokoko added the kind/feature New feature or request label May 13, 2024
@dmartinol
Contributor

@tokoko we started thinking about a possible solution, that we can share after completing the internal review, but first we'd like to ask a few questions to verify our understanding of the initial requirements (*):

  • We assume that the protected resources are all instances of FeatureView (including OnDemandFeatureView), FeatureService and the FeatureStore, correct?
  • Is the intention to also prevent unauthorized access from clients using the Feast SDK directly, instead of going through the servers? The reason for asking is that we are also thinking of adding a similar requirement to prevent undesired updates to the data stores by non-admin personnel. However, we are aware that this may be a corner case in most production deployments, as the feature store definition (e.g. the feature_repo.yaml used to feast apply) should not be accessible to all users (e.g., in a RHOAI setup, it should live in a separate "admin notebook" or in a well-protected git repo and branch), and maybe it's not a real concern at all.
  • On the implementation side, are you thinking of leveraging an existing library like PyCasbin (which may look like overkill for now) or a simplified in-house solution implementing the security model? (Personally, we'd avoid the external library to avoid any vendor lock-in.)
  • Should we also add a feast feast-permissions list [--verbose] to list the existing permissions together (in verbose mode) with the matching resources? (and maybe list the unprotected resources, which can help troubleshooting permission errors)

(*) We also like option 2 for the reasons you mentioned above. Additionally, we can share a reviewed definition to better align with the usual RBAC permission models coming from our previous work with Keycloak permission features.

@tokoko
Collaborator Author

tokoko commented May 15, 2024

  • We assume that the protected resources are all instances of FeatureView (including OnDemandFeatureView), FeatureService and the FeatureStore, correct?

I'm not sure I'll be able to list everything comprehensively here, but I think there are two major sets of permissions (and protected resources as a result) to consider. The first is CRUD-like permissions for manipulating objects in the feast registry: Entity, DataSource, FeatureView, StreamFeatureView, OnDemandFeatureView, FeatureService, SavedDataset, ValidationReference (TAL at the registry server proto for reference). These will probably need can_read and can_edit actions (but note that read here refers to reading the object definition, not the data).

Another aspect is managing access to the underlying data. FeatureService and all variations of FeatureViews should probably have can_query action. DataSource and SavedDataset will require can_query and can_write actions and so on. I'm sure I'll miss something here.

  • Is the intention to also prevent unauthorized access from clients using the Feast SDK directly, instead of going through the servers? The reason for asking is that we are also thinking of adding a similar requirement to prevent undesired updates to the data stores by non-admin personnel. However, we are aware that this may be a corner case in most production deployments, as the feature store definition (e.g. the feature_repo.yaml used to feast apply) should not be accessible to all users (e.g., in a RHOAI setup, it should live in a separate "admin notebook" or in a well-protected git repo and branch), and maybe it's not a real concern at all.

You mean SDK usage without setting individual components as remote, right? No, I don't think that's the intention, simply because it would be way too hard to accomplish. In such a case the client application needs direct access to the underlying resources, so we would have to somehow inject ourselves into that, also considering that different implementations of components will have completely different permissions. tbh, I don't think that's even possible. I think we should enforce permissions only on the servers. If someone has access to the underlying resources themselves and configures feature_store.yaml with the necessary credentials, they will be able to circumvent the security model, and that might be completely fine for some use cases; for example, materialization may be orchestrated only by a central ML Platform team that doesn't really need to care about permissions.

  • On the implementation side, are you thinking of leveraging an existing library like PyCasbin (which may look like overkill for now) or a simplified in-house solution implementing the security model? (Personally, we'd avoid the external library to avoid any vendor lock-in.)

I'm with you on that one. I think we should start by agreeing on some sort of FeastSecurityManager abstract interface (a class with a method that takes user/pass and outputs a list of roles for example) w/o any external dependency. We could then use some off-the-shelf functionality for each auth method implementation.

  • Should we also add a feast feast-permissions list [--verbose] to list the existing permissions together (in verbose mode) with the matching resources? (and maybe list the unprotected resources, which can help troubleshooting permission errors)

Maybe, I'm not sure what that would look like though. Do you mean listing defined roles or permissions that can be specified in those roles?

(*) We also like option 2 for the reasons you mentioned above. Additionally, we can share a reviewed definition to better align with the usual RBAC permission models coming from our previous work with Keycloak permission features.

Cool, we can start there then.

@tokoko
Collaborator Author

tokoko commented May 15, 2024

@dmartinol Sorry, I just took a look at pycasbin. I guess it's a rules engine, so disregard my answer above. It looks promising, but I'm fine with a home-grown option as well; it depends on how complicated our version will be to maintain, I guess.

@dmartinol
Contributor

Maybe, I'm not sure what that would look like though. Do you mean listing defined roles or permissions that can be specified in those roles?

Yep, something like

feast feast-permissions list --verbose
permissions
├── feast-admin [feast-admin]
│   └── FeatureStore
├── read-online-stores [role1, role2]
│   ├── FeatureService:fs1
│   ├── FeatureView:fv1
│   └── FeatureView:fv2
└── write-offline-stores [writer]
    └── FeatureView:fv3
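Since the command above is only a proposal, here is a minimal sketch of how its verbose output could be rendered, assuming each permission is available as a (name, roles, resources) tuple with resources already formatted as `Type:name`. The function name and input shape are hypothetical.

```python
from typing import List, Sequence, Tuple

# (permission name, roles, matching resources) -- hypothetical input shape
Permission = Tuple[str, Sequence[str], Sequence[str]]


def render_permission_tree(permissions: List[Permission]) -> str:
    """Render permissions and their matching resources as a text tree."""
    lines = ["permissions"]
    for i, (name, roles, resources) in enumerate(permissions):
        last_perm = i == len(permissions) - 1
        lines.append(f"{'└── ' if last_perm else '├── '}{name} [{', '.join(roles)}]")
        # Children of the last permission are indented without a vertical bar.
        child_prefix = "    " if last_perm else "│   "
        for j, resource in enumerate(resources):
            branch = "└── " if j == len(resources) - 1 else "├── "
            lines.append(child_prefix + branch + resource)
    return "\n".join(lines)
```

Feeding it the three permissions from the mock-up reproduces the tree above line for line; listing unprotected resources could be a second, flat section appended to the same output.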

@dmartinol
Contributor

...manipulating objects in feast registry: Entity, DataSource, FeatureView, StreamFeatureView, OnDemandFeatureView, FeatureService, SavedDataset, ValidationReference. (TAL at registry server proto for reference)

So you mean the FeastObjectType enum (to be extended to also map the SavedDataset and ValidationReference types).

BTW: what about the Registry type instead? e.g., how can we model the permissions to execute feast apply otherwise?

@tokoko
Collaborator Author

tokoko commented May 15, 2024

Yes, that sounds about right. not sure what you mean about Registry type, can you elaborate? When a user runs feast apply it almost exclusively boils down to crud operations on the above mentioned list of resources applied to the registry. So those are the protected resources, Registry type is just an interface where crud of these objects are applied from. Maybe I'm missing something?

@dmartinol
Contributor

Yes, that sounds about right. not sure what you mean about Registry type, can you elaborate? When a user runs feast apply it almost exclusively boils down to crud operations on the above mentioned list of resources applied to the registry. So those are the protected resources, Registry type is just an interface where crud of these objects are applied from. Maybe I'm missing something?

🤔 yes, seeing it from this perspective, this is fine. So, for completeness we probably need all the CRUD actions like:
create, read, update, delete plus query, query_online, query_offline and write, write_online, write_offline
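One way to encode this list, sketched under the assumption that only the fine-grained actions become enum members while the umbrella names (query, write, and a possible single edit) are modelled as groups. The `AuthzedAction` members and the group constants here are hypothetical.

```python
from enum import Enum
from typing import FrozenSet


class AuthzedAction(Enum):
    """Hypothetical fine-grained actions, following the list above."""

    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    DELETE = "delete"
    QUERY_ONLINE = "query_online"
    QUERY_OFFLINE = "query_offline"
    WRITE_ONLINE = "write_online"
    WRITE_OFFLINE = "write_offline"


# Umbrella names are groups of leaf actions rather than extra enum members,
# so an authorization check only ever compares leaf actions.
EDIT: FrozenSet[AuthzedAction] = frozenset(
    {AuthzedAction.CREATE, AuthzedAction.UPDATE, AuthzedAction.DELETE}
)
CRUD: FrozenSet[AuthzedAction] = EDIT | {AuthzedAction.READ}
QUERY: FrozenSet[AuthzedAction] = frozenset(
    {AuthzedAction.QUERY_ONLINE, AuthzedAction.QUERY_OFFLINE}
)
WRITE: FrozenSet[AuthzedAction] = frozenset(
    {AuthzedAction.WRITE_ONLINE, AuthzedAction.WRITE_OFFLINE}
)
ALL_ACTIONS: FrozenSet[AuthzedAction] = frozenset(AuthzedAction)
```

With this shape, the "create, update, delete vs a single edit" question becomes a matter of which constant a permission refers to, without changing the enforcement code.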

@tokoko
Collaborator Author

tokoko commented May 15, 2024

I'm undecided between having create, update, delete vs a single edit action.

@dmartinol
Contributor

dmartinol commented May 16, 2024

@tokoko on the implementation side, do you think a programmatic solution is mandatory (e.g., like the PyCasbin enforcer), or can we avoid changing the code and instead use decorators to enforce permission policies?
BTW, in my opinion, we cannot use decorators because some affected functions manipulate multiple protected resources (e.g., FeatureViews) at the same time. Additionally, the code may not be structured to support such granular configuration at the individual API level. However, I'd like to hear your feedback on this.
+@tmihalac who raised the question

@tokoko
Collaborator Author

tokoko commented May 16, 2024

@dmartinol @tmihalac I think the most logical points for permission enforcement are the methods of the major feast abstract classes (OfflineStore, OnlineStore, BaseRegistry). For the registry, where the CRUD-like operations live, granularity shouldn't be a problem because of how BaseRegistry is designed. For OnlineStore and OfflineStore, yes, sometimes you might get a request for multiple feature views at once, but I don't really see why that would be a problem for a decorator, tbh. Decorators are just function preprocessors, right? Maybe I'm missing something, but I don't really see the difference between those options other than that decorators will probably look better...
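To make the "function preprocessor" view concrete, here is a minimal sketch of a decorator that handles a multi-feature-view request. It assumes feature references use the `feature_view:feature` string form and that some `checker(resource, action)` callable is available; the decorator, the checker and the toy `get_historical_features` are hypothetical, not Feast API. Binding the arguments with `inspect.signature` lets it find `features` whether it is passed positionally or by keyword.

```python
import functools
import inspect
from typing import Callable, Iterable


def require_permissions(
    action: str,
    resources_from: Callable[[inspect.BoundArguments], Iterable[str]],
    checker: Callable[[str, str], bool],
):
    """Hypothetical decorator: check `action` on every resource found in the call."""

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # Preprocess: reject the whole call if any requested resource is denied.
            denied = sorted(r for r in resources_from(bound) if not checker(r, action))
            if denied:
                raise PermissionError(f"{action} denied on {denied}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Usage with a toy allow-list and feature references like "fv1:feature_a".
ALLOWED = {("fv1", "query_offline")}


def feature_view_names(bound: inspect.BoundArguments) -> set:
    return {ref.split(":")[0] for ref in bound.arguments["features"]}


@require_permissions("query_offline", feature_view_names, lambda r, a: (r, a) in ALLOWED)
def get_historical_features(entity_df, features):
    return f"{len(features)} features"
```

The catch is that each decorated method needs its own `resources_from` extractor that knows which argument holds the protected resources.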

@dmartinol
Contributor

@tokoko we want to share with you a gist describing a proposal to implement this functionality.

The modelling part follows your initial requirements but tries to adapt them to some standards that we've found in Keycloak.
Apart from that, we also propose a possible architecture for the security management solution and some examples of usage in the Feast code (both programmatic and decoration options).

@tokoko
Collaborator Author

tokoko commented May 17, 2024

@dmartinol Can you clarify what RoleBasedPolicy is exactly? It looks like it's a list of roles (extracted from Keycloak, for example) that have this permission assigned to them. If that's the case:

  • I'm not sure I like the name 😄 Can't this just be a roles parameter that takes a list of strings?
  • Is it a good idea to have a list of roles scattered around with permission objects? Wouldn't it be better to have another FeastRole(name: str, permissions:List[FeastPermission]) resource? The downside is introducing another resource type that needs to be managed, of course.. just a suggestion. wdyt?

@tokoko
Collaborator Author

tokoko commented May 17, 2024

Also, what's add_roles_for_user method in RoleManager? Does that mean there should be a way to assign roles to the user from feast itself or maybe I'm misunderstanding the class? If we have auth backends like LDAP or OIDC, I was thinking we could delegate role management to them fully, so that assigning roles to the user would happen in AD, Keycloak or somewhere like that.

To me something like this makes more sense instead of RoleManager:

from abc import ABC, abstractmethod
from typing import List


class AuthManager(ABC):
    """auth management"""

    @abstractmethod
    def authenticate(self, user: str, password: str) -> List[str]:
        """
        Returns a list of roles if authentication successful, empty if auth failed.
        """

And then we would have concrete classes like LdapAuthManager, OidcAuthManager and so on.
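As an illustration of the contract, here is a minimal sketch of one concrete backend, assuming an in-memory user table that might be useful for tests or local development. `StaticAuthManager` is a hypothetical name; LDAP or OIDC implementations would delegate the password check and the role lookup to the provider instead.

```python
import hashlib
import hmac
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class AuthManager(ABC):
    @abstractmethod
    def authenticate(self, user: str, password: str) -> List[str]:
        """Return the user's roles, or an empty list if authentication fails."""


class StaticAuthManager(AuthManager):
    """Hypothetical in-memory backend: {user: (password, roles)}."""

    def __init__(self, users: Dict[str, Tuple[str, List[str]]]):
        # Keep only digests in memory. Illustration only: a real backend
        # should use a salted, slow KDF (e.g. hashlib.scrypt), not plain SHA-256.
        self._users = {
            name: (hashlib.sha256(pw.encode()).digest(), list(roles))
            for name, (pw, roles) in users.items()
        }

    def authenticate(self, user: str, password: str) -> List[str]:
        entry = self._users.get(user)
        if entry is None:
            return []
        digest, roles = entry
        # Constant-time comparison avoids leaking timing information.
        if not hmac.compare_digest(digest, hashlib.sha256(password.encode()).digest()):
            return []
        return list(roles)  # copy, so callers cannot mutate the stored roles
```

Returning roles (rather than a bare bool) keeps role assignment entirely on the auth-backend side, which matches the idea of not managing users inside feast.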

@dmartinol
Contributor

@dmartinol Can you clarify what RoleBasedPolicy is exactly? It looks like it's a list of roles (extracted from Keycloak, for example) that have this permission assigned to them. If that's the case:

  • I'm not sure I like the name 😄 Can't this just be a roles parameter that takes a list of strings?

Quoting the Keycloak docs, policies define the conditions that must be satisfied before granting access to an object, and we could have policies based on different criteria, e.g. (again speaking Keycloak-ish):

  • User-based policy: match the configured user against the user in the client request
  • Role-based policy: match the configured roles against the roles of the user in the client request
  • Attribute-based policy: ...

So, the reason for having RoleBasedPolicy was to make room for a Policy interface that could later be extended with a type field (one of role, user) or even with an enforce method to apply the policy verification.
These policy entities can be shared by multiple permissions, as is likely to be the case:

read_policy = RoleBasedPolicy(['reader', 'viewer'])
admin_policy = RoleBasedPolicy(['admin'])

permission1 = FeastPermission(...,policies=[read_policy, admin_policy],...)
permission2 = FeastPermission(...,policies=[read_policy],...)
permission3 = FeastPermission(...,policies=[admin_policy],...)
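A minimal sketch of how such a Policy interface could be shaped, assuming a `validate_user` method and that a permission is granted when any one of its policies matches (similar in spirit to Keycloak's "affirmative" decision strategy). `User`, `UserBasedPolicy`, `validate_user` and the OR semantics are all assumptions for illustration, not the POC's actual code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass(frozen=True)
class User:
    username: str
    roles: Sequence[str] = ()


class Policy(ABC):
    @abstractmethod
    def validate_user(self, user: User) -> bool:
        """Return True if the user satisfies this policy."""


class RoleBasedPolicy(Policy):
    def __init__(self, roles: List[str]):
        self.roles = list(roles)

    def validate_user(self, user: User) -> bool:
        # Matches if the user holds at least one of the configured roles.
        return any(role in user.roles for role in self.roles)


class UserBasedPolicy(Policy):
    def __init__(self, usernames: List[str]):
        self.usernames = set(usernames)

    def validate_user(self, user: User) -> bool:
        return user.username in self.usernames


@dataclass
class FeastPermission:
    name: str
    policies: List[Policy] = field(default_factory=list)

    def allows(self, user: User) -> bool:
        # Assumed semantics: policies are OR-ed, so any matching policy grants access.
        return any(p.validate_user(user) for p in self.policies)
```

New policy kinds (user-based, attribute-based) then only need a new `validate_user`, while `FeastPermission` stays unchanged; this is the extensibility argument for not collapsing policies into a plain list of role strings.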
  • Is it a good idea to have a list of roles scattered around with permission objects?

It is not a list of roles scattered around with permission objects, but a list of policies to be applied to allow the execution of the protected actions 😉.

Wouldn't it be better to have another FeastRole(name: str, permissions:List[FeastPermission]) resource? The downside is introducing another resource type that needs to be managed, of course.. just a suggestion. wdyt?

The model that you are proposing describes the permissions allowed for any given role instead of the roles/policies allowed for any given permission. Since the cardinality is N-N, it probably doesn't change much, but please consider that roles are not policies, and in the future we could extend the concept and manage policies that are not based on roles.

@jeremyary do you have any modelling preferences here?

@dmartinol
Contributor

Also, what's add_roles_for_user method in RoleManager? Does that mean there should be a way to assign roles to the user from feast itself or maybe I'm misunderstanding the class? If we have auth backends like LDAP or OIDC, I was thinking we could delegate role management to them fully, so that assigning roles to the user would happen in AD, Keycloak or somewhere like that.
...
And then we would have concrete classes like LdapAuthManager, OidcAuthManager and so on.

The idea here is that another entity, like the AuthManager you define next, takes care of handling authentication and authorization and populates the roles in the RoleManager, so the policy enforcement can use this information without being impacted by the actual auth provider.
But yes, probably this flow can be reviewed a bit and surely a POC can help to clarify the design.

Maybe LdapAuthManager and OidcAuthManager are not so relevant in the K8s context, where there is a service-to-service interaction and there are no external users to be authenticated? Of course, if we define an open interface, anyone can contribute their own implementation.

@tokoko
Collaborator Author

tokoko commented May 17, 2024

@dmartinol thanks, that makes sense, and it looks like Keycloak uses those terms, so it probably isn't confusing for everyone. I'm probably influenced by AWS terms, where policies and permissions aren't really separated in any way; policies are more or less just a collection of permissions. My only concern is that if we end up sticking with just RoleBasedPolicy, it will just be a slightly complicated way of providing a list of strings, but we can worry about that later...

@tokoko
Copy link
Collaborator Author

tokoko commented May 17, 2024

Maybe LdapAuthManager and OidcAuthManager are not so relevant in the K8s context, where there is a service-to-service interaction and there are no external users to be authenticated? Of course, if we define an open interface, anyone can contribute their own implementation.

I may be the one missing some context here, sure. The way I think about it, when a user authenticates in some way, the auth provider is also usually the one that can provide additional info regarding roles and attributes. mlflow has something similar, where a successful authenticate call returns an Authorization object that contains all the relevant information. For LDAP and OIDC, a separate role manager seems odd, but I'm not sure how some k8s service mesh magic changes this flow. What's the source of user-related information in that case? Is it possible to make RoleManager an internal implementation detail of K8sAuthProvider(?) instead of an external interface? Just an idea...

@franciscojavierarceo
Member

I'm undecided between having create, update, delete vs a single edit action.

@tokoko I would suggest each, as the flexibility will be great for users

@franciscojavierarceo
Member

I am also pro Option 2 at the moment

@dmartinol
Contributor

@tokoko we prepared a POC to prove our initial design.
Please take a look and share your feedback 👍

@franciscojavierarceo
Member

@tokoko we prepared a POC to prove our initial design. Please take a look and share your feedback 👍

Very nice!

@tokoko
Collaborator Author

tokoko commented May 29, 2024

@dmartinol Looks really good overall, thanks. Obviously still going through it, but I'll leave remarks here along the way:

  • KeycloakAuthManager should probably be called OidcAuthManager, right? Is there anything specific to keycloak in the implementation?
  • I'm not sure if something like require_permissions decorator is applicable in our scenario. If I'm reading it correctly, methods need to be attached to some Resource type to be marked with it. Is that possible in feast? Our resources (FeatureView for example) don't have protected methods associated with them directly, in other words there's no query_online or query_offline methods in class FeatureView. Instead policy enforcement should happen at the start of methods like get_historical_features which aren't directly attached to protected resources.

@dmartinol
Contributor

  • KeycloakAuthManager should probably be called OidcAuthManager, right? Is there anything specific to keycloak in the implementation?

Agreed, there are no Keycloak-specific details, and the required URLs can be retrieved from a discovery URL like http://localhost:8080/realms/poc/.well-known/openid-configuration, which we can add to the external configuration.
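A minimal stdlib-only sketch of that lookup. Per OpenID Connect Discovery 1.0, the document at `<issuer>/.well-known/openid-configuration` is JSON containing keys such as `issuer`, `authorization_endpoint`, `token_endpoint` and `jwks_uri`; which subset an OidcAuthManager would actually need is an assumption here, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed subset of discovery keys a token-validating server would need.
REQUIRED_KEYS = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")


def endpoints_from_discovery(document: Dict[str, Any]) -> Dict[str, str]:
    """Pick the endpoints we need from a parsed discovery document."""
    missing = [key for key in REQUIRED_KEYS if key not in document]
    if missing:
        raise ValueError(f"discovery document is missing {missing}")
    return {key: document[key] for key in REQUIRED_KEYS}


def fetch_oidc_endpoints(discovery_url: str, timeout: float = 10.0) -> Dict[str, str]:
    """Download the discovery document (e.g. the Keycloak URL above) and parse it."""
    with urllib.request.urlopen(discovery_url, timeout=timeout) as response:
        return endpoints_from_discovery(json.load(response))
```

Because only the discovery URL is configured, the same code works for Keycloak or any other compliant provider, which supports naming the class OidcAuthManager.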

@dmartinol
Contributor

  • I'm not sure if something like require_permissions decorator is applicable in our scenario. If I'm reading it correctly, methods need to be attached to some Resource type to be marked with it. Is that possible in feast? Our resources (FeatureView for example) don't have protected methods associated with them directly, in other words there's no query_online or query_offline methods in class FeatureView. Instead policy enforcement should happen at the start of methods like get_historical_features which aren't directly attached to protected resources.

This was also my initial concern (see my previous comment), and you suggested using decorators, which instead may be difficult to apply to

the start of methods like get_historical_features

because the decorator, IIUC, needs to identify the input arguments containing the protected resources (e.g. features in FeatureStore.get_historical_features), in order to hide the authorization process and apply the permission rules.

For this reason, we can adopt a programmatic approach (mentioned in the POC) at all identified entry points, allowing us to apply precise validation, such as:

   a: ResourceA = ...
   _get_security_manager().assert_permissions(a, AuthzedAction.EDIT)

I've updated the POC with an Orchestrator class that runs the permission checks via API, please have a look.
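A self-contained sketch of that programmatic style at an entry point like `get_historical_features`. It assumes the user's grants have already been resolved into (resource, action) pairs and that feature references use the `feature_view:feature` form; the trimmed-down `AuthzedAction`, `Resource` and `SecurityManager` here are hypothetical stand-ins, and the manager is passed explicitly instead of via `_get_security_manager()` only to keep the example standalone.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set, Tuple, Union


class AuthzedAction(Enum):
    READ = "read"
    EDIT = "edit"
    QUERY_OFFLINE = "query_offline"


@dataclass(frozen=True)
class Resource:
    kind: str  # e.g. "FeatureView"
    name: str


class SecurityManager:
    """Hypothetical: holds the (resource, action) grants of the current user."""

    def __init__(self, grants: Set[Tuple[Resource, AuthzedAction]]):
        self._grants = grants

    def assert_permissions(
        self, resource: Resource, actions: Union[AuthzedAction, List[AuthzedAction]]
    ) -> None:
        if isinstance(actions, AuthzedAction):
            actions = [actions]
        missing = [a.value for a in actions if (resource, a) not in self._grants]
        if missing:
            raise PermissionError(f"{missing} not allowed on {resource.kind}:{resource.name}")


def get_historical_features(
    sm: SecurityManager, feature_views: Dict[str, Resource], features: List[str]
) -> List[str]:
    # Resolve each feature reference to its FeatureView and check it once,
    # before anything is sent to the offline store.
    for fv_name in sorted({ref.split(":")[0] for ref in features}):
        sm.assert_permissions(feature_views[fv_name], AuthzedAction.QUERY_OFFLINE)
    return features  # placeholder for the actual offline store call
```

The explicit loop is exactly the part a generic decorator would have to guess: the entry point knows how to turn its arguments into protected resources.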

BTW, this approach works if we agree on the above assumption that we:

enforce permissions only on the servers

We are aware that client code can bypass these start methods and directly access the inner objects; however, this is outside the scope of our investigation.

@tokoko
Collaborator Author

tokoko commented May 30, 2024

I guess I was referring to a more complicated decorator, something that would know how to inspect values passed in features param in get_historical_features for example, but you're right, we should probably abandon that approach. Orchestrator looks good to me. We will probably also need some sort of logic in all the places where permission enforcement happens, so that it's skipped when a user hasn't configured auth (usually on client-side).

@dmartinol
Contributor

We will probably also need some sort of logic in all the places where permission enforcement happens, so that it's skipped when a user hasn't configured auth (usually on client-side).

There's a DefaultSecurityManager for that, with:

    def assert_permissions(
        self,
        resource: Resource,
        actions: Union[AuthzedAction, List[AuthzedAction]],
    ):
        return True

Of course, it's a POC, and we can elaborate on it further
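A minimal sketch of how the no-op default could be selected from configuration, assuming a hypothetical `{"type": "no_auth"}` style auth section and an externally supplied `is_allowed` callable. It uses the null-object pattern so call sites never branch on whether auth is enabled, and it raises instead of returning a bool so that a forgotten return-value check cannot silently grant access. All class and function names are illustrative.

```python
from typing import Any, Callable, Dict, Optional


class SecurityManager:
    def __init__(self, is_allowed: Callable[[Any, Any], bool]):
        self._is_allowed = is_allowed

    def assert_permissions(self, resource: Any, actions: Any) -> None:
        # Raise rather than return False: callers cannot forget to check.
        if not self._is_allowed(resource, actions):
            raise PermissionError(f"{actions!r} not allowed on {resource!r}")


class NoAuthSecurityManager(SecurityManager):
    """Null object: every check passes (client side, or auth not configured)."""

    def __init__(self) -> None:
        super().__init__(lambda resource, actions: True)


def build_security_manager(
    auth_config: Optional[Dict[str, Any]], is_allowed: Callable[[Any, Any], bool]
) -> SecurityManager:
    # Hypothetical config shape: None / {"type": "no_auth"} / {"type": "oidc", ...}.
    if not auth_config or auth_config.get("type", "no_auth") == "no_auth":
        return NoAuthSecurityManager()
    return SecurityManager(is_allowed)
```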

@tokoko
Collaborator Author

tokoko commented May 30, 2024

thanks, overlooked that part. Another question, I understand that role manager and policy enforcer should be global objects, but isn't SecurityManager being global a problem when it contains user context specific info (current user)?

@dmartinol
Contributor

dmartinol commented May 30, 2024

isn't SecurityManager being global a problem when it contains user context specific info (current user)?

🤔 Of course it is, sorry for missing this point! I think we can use the contextvars module to define a per-thread (and per-async-task) current user, WDYT?
I updated the POC and added a test with concurrent requests.
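A minimal sketch of the idea, assuming the server sets the authenticated user at the start of each request handler. A `ContextVar` gives each thread (and each asyncio task) its own value, so it is not limited to async frameworks. One caveat: `ThreadPoolExecutor` workers are reused and keep their context between tasks, which is why the handler resets the variable with the token returned by `set()`. `User`, `handle_request` and `do_work` are hypothetical names.

```python
import contextvars
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class User:
    username: str
    roles: Tuple[str, ...] = ()


# One process-wide variable; each thread/task sees its own value.
_current_user: "contextvars.ContextVar[Optional[User]]" = contextvars.ContextVar(
    "current_user", default=None
)


def get_current_user() -> Optional[User]:
    return _current_user.get()


def handle_request(user: User) -> str:
    """Hypothetical server handler, called after authentication."""
    token = _current_user.set(user)
    try:
        return do_work()
    finally:
        # Worker threads are reused, so always restore the previous value.
        _current_user.reset(token)


def do_work() -> str:
    # Deep in the call stack, e.g. inside the security manager: no user argument needed.
    user = get_current_user()
    if user is None:
        raise RuntimeError("no authenticated user in context")
    time.sleep(0.01)  # let other requests run concurrently
    if get_current_user() is not user:
        raise RuntimeError("user context leaked between requests")
    return f"{user.username}:{','.join(user.roles)}"


if __name__ == "__main__":
    users = [User(f"user{i}", (f"role{i}",)) for i in range(8)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(handle_request, users)))
```

This lets the security manager stay global while the user context is per request.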

(not sure you got the reply, so adding a mention to @tokoko)

@tokoko
Collaborator Author

tokoko commented Jun 3, 2024

@dmartinol sorry, forgot to reply here. I'm not too familiar with contextvars. I guess they should work well with fastapi, but not too sure about the flight server (or registry server) as they aren't async.

@dmartinol
Contributor

@tokoko by async, do you mean that the parameters are first sent with do_put and then the API is executed with do_get?
Even if this is the case, we can check permissions at the do_get endpoint, when the execution is forwarded to the actual OfflineStore. It's probably worth extending the POC and validating both gRPC use cases.

@dmartinol
Contributor

@tokoko we added a section in the POC for the ArrowFlight server use case, using the middleware function to handle the authentication token, extract the user details and pass them to the context of the do_get API.
For regular grpc services, I think we could use server interceptors for the same purpose.
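The transport-specific hooks differ, but the credential parsing they need can be shared. As we understand the APIs, a `pyarrow.flight.ServerMiddlewareFactory.start_call(info, headers)` receives headers as a dict of lowercased names to lists of values (and `do_get` can later fetch the middleware via `context.get_middleware(key)`), while a `grpc.ServerInterceptor` sees `handler_call_details.invocation_metadata` as (key, value) pairs that would first be grouped into the same shape. The sketch below covers only that shared step; the function names are hypothetical, and the HTTP feature server could reuse the basic-auth helper mentioned at the start of the thread.

```python
import base64
import binascii
from typing import Dict, List, Optional, Tuple

Headers = Dict[str, List[str]]  # lowercased header name -> values


def _authorization_values(headers: Headers) -> List[str]:
    return headers.get("authorization", [])


def extract_bearer_token(headers: Headers) -> Optional[str]:
    """Return the first bearer token, or None if there is none."""
    for value in _authorization_values(headers):
        scheme, _, credentials = value.partition(" ")
        if scheme.lower() == "bearer" and credentials.strip():
            return credentials.strip()
    return None


def extract_basic_credentials(headers: Headers) -> Optional[Tuple[str, str]]:
    """Return (user, password) from a Basic auth header, or None."""
    for value in _authorization_values(headers):
        scheme, _, credentials = value.partition(" ")
        if scheme.lower() != "basic":
            continue
        try:
            decoded = base64.b64decode(credentials.strip(), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return None  # malformed header: treat as unauthenticated
        user, sep, password = decoded.partition(":")
        return (user, password) if sep else None
    return None
```

The resulting token or credentials would then go to the configured AuthManager, and the returned user would be stored in the request context before the protected call runs.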
