-
Notifications
You must be signed in to change notification settings - Fork 957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: introduce max partition policy to oneDNN Graph API #1694
base: rfcs
Are you sure you want to change the base?
rfc: introduce max partition policy to oneDNN Graph API #1694
Conversation
- If not specified, the default partition policy remains to be fusion policy. | ||
- Max policy returns partitions max in partition size. The criteria of partitioning | ||
is based on pre-defined ops list and pre-defined rules. Connected ops | ||
that meet the conditions of pre-defined ops list and pre-defined rules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this sentence may be ambiguous, meet the condistions of pre-defined ops list and don't hit the pre-defined excluding rules will be selected into one partition
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meet the conditions of pre-defined ops list and pre-defined rules will be selected into one partition
means both pre-defined ops list and pre-defined rules need to be met to be selected into one partition, violate either will not be selected into one partition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but i remember the rules are negative list. If op met one of rules, the op will be remove from the partition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I got your point. But since the pre-defined rules are backend specific, we cannot say it's always a negative list. So here I just keep general words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the pre-defined ops list and pre-defined rules be expose through API or API documentation? If not, how does the user know when to use the max policy? Or are you suggesting users to always specify max policy for their models?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the pre-defined ops list and pre-defined rules be expose through API or API documentation?
Fusion policy doesn't expose fusion patterns through API, so I think max policy follows similar strategy. For document, maybe we can provide a page like pre-defined fusion patterns, if it helps.
If not, how does the user know when to use the max policy? Or are you suggesting users to always specify max policy for their models?
I feel this discussion can be merged with https://github.com/oneapi-src/oneDNN/pull/1694/files#r1283768894
The expected behaviors when using max policy: | ||
|
||
- If not specified, the default partition policy remains to be fusion policy. | ||
- Max policy returns partitions max in partition size. The criteria of partitioning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is motivation for differentiation between max and fusion? How user can decide which one should be used (not for a single model, but for overall integration)? Maybe "max" should be renamed to something like "experimental" ("experimental" is really bad word, we already have semantics for experimental features) to indicate that user shouldn't use it until they know it brings performance for their particular case? Another option to call it "all" if the intend is to take all graph and merge it into a single partition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is motivation for differentiation between max and fusion?
From my understanding, fusion defines fixed number of patterns, while max's optimizations are not fixed, it will generate different optimization for different models based on model architecture, which might be good for OOB models.
How user can decide which one should be used (not for a single model, but for overall integration)?
It depends on how much control user wants to give to the library. Compared with fusion, max gives the library more control to optimize the graph.
But I can understand user wants to get better performance when they give the library more control, which is not guaranteed for max policy since we are not able to test all the models in the world (Please correct me if I'm wrong @ZhennanQin @niuxiaog).
Maybe "max" should be renamed to something like "experimental" ("experimental" is really bad word, we already have semantics for experimental features) to indicate that user shouldn't use it until they know it brings performance for their particular case? Another option to call it "all" if the intend is to take all graph and merge it into a single partition.
all
may not be true because take all graph and merge it into a single partition
is not guaranteed. experimental
sounds a doable option, what do you think? @ZhennanQin @TaoLv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max
policy is designed to support direct programming with oneDNN graph API to implement custom ops or custom fusion patterns. For example, end-user can build RMS_norm with small ops and pass it to onednn graph with max policy, and compiler backend can generate a fused kernel for this. This is frequent request for newly introduced activation or normalization. I think experimental
is not a good name since it can mean anything and doesn't reflect actual behavior. But we can use other term than max
to make it more clear.
- If not specified, the default partition policy remains to be fusion policy. | ||
- Max policy returns partitions max in partition size. The criteria of partitioning | ||
is based on pre-defined ops list and pre-defined rules. Connected ops | ||
that meet the conditions of pre-defined ops list and pre-defined rules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the pre-defined ops list and pre-defined rules be expose through API or API documentation? If not, how does the user know when to use the max policy? Or are you suggesting users to always specify max policy for their models?
|
|
That's internal. Users don't know which ops are supported by graph compiler from API.
I believe framework integration is for general usage. How can frameworks or framework users know which model is from model zoo and which is not? @chunyuan-w @sanchitintel @yucai-intel @retonym @wenzhe-nrv Please share how you plan to integrate and expose this feature. Thanks. |
|
||
The expected behaviors when using max policy: | ||
|
||
- Max policy relies on predefined ops list to do partitioning. Connected ops |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Choice between max
and fusion
policy falls on the shoulders of the framework user. How this is expected to be exposed in Pytorch and Tensorflow?
How users are supposed to decide whether use fusion or max? The explanation here talks about implementation details mainly, but what are the consequences visible to framework user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is,
Max
policy is not expected to be exposed by framework, instead, it's used to support framework developer to implement their custom fusion kernels.
Fusion
policy is the recommended way to use onednn graph automatically, as we will do graph partition according to pre-defined fusion pattern and deliver good performance. While max
policy is to support direct programming with onednn graph API, and framework developers need to program the graph they want to fuse and pass it to onednn graph with max
policy, just like how they do C program and compile it with GCC.
max
policy can be used in below scenarios:
- The default
fusion
policy doesn't meet framework's requirement and developer wants to change the fusion behavior, for example, for a 3-layer MLP,fusion
policy will take them all, but if the developer only wants oneDNN graph to take the first two layers, and leave the last layer to be un-partitioned, then they can program withmax
policy and define a graph only contains first two layers. - Define new fusion patterns. For example, if framework developer want to implement RMSNorm op, they can build a graph with small ops and pass it to oneDNN graph with
max
policy, then a fused kernel will be returned.
In general, max
policy is not designed for framework users, but for framework developers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My observation is that framework side also like to develop patterns and sometime they use custom op to support these patterns like flash_attention, sparse computation, int4, and etc. oneDNN Graph 's max policy would help framework develop to use oneDNN Graph instead of writing their own custom op. This gives them a chance to use oneDNN Graph and compare with their custom op, if oneDNN Graph does better, than they just call oneDNN Graph with max policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jianhui-Li, adding to your point (custom ops vs. patterns created by max partitioning), I feel like there's another aspect as well - a pattern added manually by backend teams may sometimes perform better than the pattern automatically chosen by the max partitioning approach, and sometimes, the opposite may be true.
So at runtime, should a framework (transparently to framework users) benchmark both approaches, and eventually use the approach that delivers better performance? In case a framework can't do this sort of benchmarking, then the framework may have to expose an API to the users to manually check if max partitioning approach (if it's non-default) may deliver better performance.
For example, DNNL backend will not support max partition policy in oneDNN v3.3 | ||
release, so when running with a combination of max partition policy and DNNL | ||
backend, max partition policy will fallback to fusion partition policy. | ||
1. The validation for max partition policy will not be landed in oneDNN v3.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not landing features without validation in oneDNN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the wording needs to be more precise, I'll refactor.
We have limited validation for max partition policy in v3.3 release:
- Basic functionality test and correctness check will be covered by gtests.
- Due to validation resource limitation, benchdnn graph plans to provide performance test in v3.3 as a stretch goal if the feature and test cases ready. (@chuanqi129 @thuang6 for more details)
@vpirogov Do you think this validation scope looks good to you?
Hi @ElaineBao, can framework developers tell if custom patterns that don't already exist with In other words, would |
Hi, @sanchitintel, I guess you are asking how a FW could determine when to use max and when to use fusion policy. The general rule is that the default recommended policy is still fusion policy. But for a specific workload that have customed patterns that have not been covered in fusion policy, FW can try to use max policy to see if it achieves better performance. |
Description
This RFC is to introduce max partition policy to oneDNN Graph API.
Rendered version: link