
[DISCUSS] hyperscan support ARM #197

Open
bzhaoopenstack opened this issue Nov 7, 2019 · 34 comments

@bzhaoopenstack

Hi hyperscan team,

I'm a newbie to the hyperscan project, and I'm excited to have a conversation with you.

We plan to make hyperscan support ARM64, and we will propose a series of PRs to make this happen, including hardware platform detection code, ARM NEON instruction set support, and so on. We won't propose intrusive changes to the existing code. The detailed design is still uncertain, just a draft, and we hope the community can take part in the detailed feature design from the beginning.

But before the whole work begins, we want to know the community's attitude towards this. We hope for kind feedback from your side.

Thanks very much.

@bzhaoopenstack
Author

Hi team, @xiangwang1, @fatchanghao, @Nor7th
Are you around? Please take a look at our proposal. Any ideas are welcome.

Thanks

@xiangwang1
Contributor

Current Hyperscan is specifically designed and optimized for Intel CPUs, including the selection of algorithms and the utilization of SIMD instructions. I think there could be a potential performance hit if the work is only about porting x86 instructions to the corresponding ones in ARM NEON.

From Intel's perspective, we are not in a position to port Hyperscan to ARM.

We may consider it if there is common interest in the community, where developers other than us could push this forward and prove it to be a viable path to take.

@bzhaoopenstack
Author

Hi @xiangwang1,
Thanks for the reply. ;-)

It would be great if you could consider it. I will explain more based on your feedback.

  1. First, our plan is to introduce completely new support for ARM; I mean we won't simply port the existing x86 instructions to ARM. We plan to rewrite the algorithms and their implementations based on ARM NEON, make some performance improvements, and split off a new "branch" that supports ARM. We will also introduce hardware platform detection so the library installs and runs according to the underlying device (x86 or ARM); a rough sketch of that detection idea follows at the end of this comment. So the plan is to introduce another code path in hyperscan with no effect at all on the existing x86 code/functionality, because on ARM it will call NEON instructions and the rewritten algorithms. We just want to add a new platform and make hyperscan run on ARM with high performance.

  2. From Intel's side, I really understand that. But this might be a good chance to extend the applications of hyperscan. Let's make hyperscan better. ;-)

  3. I found several issues [1][2][3] requesting that hyperscan work on ARM, so users and developers have been asking for this for a while. That's exactly what we want to deliver, if it can be done, and it would be great if those folks could come here and say something. ;-)

So we need the community to help review the plan (design) and give us good suggestions on how to get this done, including the small platform-detection script modifications, anything else we haven't realized, and so on.

Thanks.

[1] #187
[2] #159
[3] #34
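
As a rough sketch of the platform-detection idea from point 1 above (everything below is purely illustrative; the vec128_t and vec_load names are not existing hyperscan identifiers), the compile-time dispatch could look something like this:

```c
/* Hypothetical compile-time platform dispatch; vec128_t and vec_load are
 * illustrative names, not existing hyperscan identifiers. */
#include <stdint.h>

#if defined(__aarch64__) || defined(__arm__)
#  include <arm_neon.h>
typedef uint8x16_t vec128_t;                      /* NEON 128-bit vector */
#  define vec_load(p) vld1q_u8((const uint8_t *)(p))
#elif defined(__SSE2__)
#  include <emmintrin.h>
typedef __m128i vec128_t;                         /* SSE2 128-bit vector */
#  define vec_load(p) _mm_loadu_si128((const __m128i *)(p))
#else
#  error "no supported SIMD target detected"
#endif
```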

@bzhaoopenstack
Author

Hi @xiangwang1,
What do you think about it? We would appreciate your suggestions. If you find it valuable, we can present a more detailed plan, and it would be great if you could help review it.

Thanks

@codecat007

We have successfully ported hyperscan to the ARM platform (also MIPS; without SIMD instruction support, the performance improvement is not high), and it turns out that this is not difficult. But we didn't do much optimization work; you guys can go deeper.

@zzqcn

zzqcn commented Nov 12, 2019

@codecat007 Hi, I'm interested in the hyperscan porting and have made some attempts over the past few days.
Could you share something about it?

@bzhaoopenstack
Author

bzhaoopenstack commented Nov 13, 2019

Thanks for the interest here. ;-)

@codecat007

codecat007 commented Nov 15, 2019

@zzqcn You can use a SIMD library (such as simde: https://github.com/nemequ/simde) to implement a middle layer for SIMD function calls.
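
A minimal sketch of that middle-layer idea with SIMDe, assuming the SIMDe headers are available (the helper below is illustrative, not hyperscan code):

```c
/* Minimal SIMDe sketch: with SIMDE_ENABLE_NATIVE_ALIASES the original _mm_*
 * spellings compile natively on x86 and are lowered to NEON (or scalar)
 * equivalents on ARM. Illustrative only. */
#define SIMDE_ENABLE_NATIVE_ALIASES
#include <simde/x86/sse2.h>
#include <stdint.h>

/* Returns 1 if two 16-byte blocks are identical. */
static int blocks_equal16(const uint8_t *a, const uint8_t *b) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    return _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) == 0xFFFF;
}
```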

@zzqcn

zzqcn commented Nov 15, 2019

@codecat007 Thanks for your reply. I converted SSE to NEON intrinsics via sse2neon, but hyperscan compiled this way has runtime bugs on ARM. I will try simde instead.
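
For reference, a minimal sketch of what the sse2neon approach looks like, assuming sse2neon.h is on the include path (the helper below is illustrative, not the actual hyperscan change):

```c
/* Illustrative sse2neon usage: on ARM, sse2neon.h maps SSE intrinsics to NEON,
 * so existing SSE-style code keeps compiling. */
#if defined(__aarch64__) || defined(__arm__)
#  include "sse2neon.h"
#else
#  include <emmintrin.h>   /* native SSE2 on x86 */
#endif
#include <stdint.h>

/* Index of the first occurrence of byte c in the 16 bytes at buf, or -1. */
static int find_byte16(const uint8_t *buf, uint8_t c) {
    __m128i chunk  = _mm_loadu_si128((const __m128i *)buf);
    __m128i needle = _mm_set1_epi8((char)c);
    int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
    return mask ? __builtin_ctz(mask) : -1;
}
```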

@bzhaoopenstack
Author

bzhaoopenstack commented Nov 19, 2019

I think we need to wait for the maintainer team to consider this and reply about the next steps. I hope a hyperscan team member could give some good advice. Maybe @xiangwang1?

@zzqcn

zzqcn commented Nov 20, 2019

@codecat007 With sse2neon and simde's help, I ported hyperscan 4.6.0 to ARM. It's basically working, with some bugs.

I built and ran the unit tests (just those in unit/hyperscan/): 3476 test cases PASSED and 169 FAILED. The failed cases: hyperscan_test_result.txt

Did you run any tests on your port? Thanks for any suggestions.

@zzqcn

zzqcn commented Nov 21, 2019

I have ported hyperscan v5.2.0 to ARMv7 with simde. All 3746 unit test cases PASSED (built and tested with qemu-arm).

My fork: https://github.com/zzqcn/hyperscan, and my commit: zzqcn@249178a

I don't know much about SSE, NEON, etc., so any suggestions or code review would be helpful to me.

@bzhaoopenstack
Author

Hi guys, it seems there is a post, #212, in hyperscan that is willing to support both x86 and aarch64.

@zzqcn @codecat007

@xiangwang1 @fatchanghao @Nor7th
I hope the hyperscan team could review it and leave some comments.

@bzhaoopenstack
Author

cc'ing the author to join this discussion: @tqltech

@daveMmd

daveMmd commented May 12, 2020

@zzqcn Hi, I'm wondering about the performance of the ported hyperscan. Does the added middle layer (simde) have a heavy impact on performance? Hoping for your reply, thanks.

@mr-c

mr-c commented May 12, 2020

@daveMmd On native x86 processors with AVX (for example) the usage of the SIMD Everywhere header-only library is optimized out by the compiler into the existing direct calls to the AVX intrinsics. On non-X86 platforms, the SIMD Everywhere headers enable the code to run where it wouldn't before, and often using the SIMD intrinsics of that non-X86 processor.

@daveMmd

daveMmd commented May 16, 2020


@mr-c Thanks! Though the code is enabled to run with SIMD intrinsics on both x86 and non-x86 processors, the data structures are tailored to x86 processors, so I think there can be a performance penalty on non-x86 processors. I just wonder how big the penalty is.

@mr-c

mr-c commented May 16, 2020

the data structure is tailored for x86 processor. Thus I think there can be performance penalty on non-x86 processors. I just wonder how big the penalty is.

Good point. "SIMD Everywhere" doesn't prevent the addition of architecture-specific variations later, but it means you get a functional version today, which is nice for applications that have a hard dependency on hyperscan.

@tqltech

tqltech commented May 19, 2020


@mr-c @daveMmd We have modified hyperscan for ARMv8 processors, improving performance by using NEON instructions, inline assembly, data alignment, instruction alignment, memory data prefetching, static branch prediction, code structure optimization, etc. The optimized hyperscan reaches about 80% of x86 performance. The repository: https://github.com/kunpengcompute/hyperscan
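
To make that list concrete, here is a small illustrative sketch (not code from the Kunpeng repository) combining NEON intrinsics, software prefetching, and a static branch-prediction hint on aarch64:

```c
/* Illustrative aarch64 sketch of the listed techniques; not from the Kunpeng
 * fork. Counts occurrences of byte c in buf (len assumed a multiple of 16). */
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

#define LIKELY(x) __builtin_expect(!!(x), 1)        /* static branch prediction hint */

static size_t count_byte(const uint8_t *buf, size_t len, uint8_t c) {
    size_t hits = 0;
    uint8x16_t needle = vdupq_n_u8(c);
    for (size_t i = 0; LIKELY(i + 16 <= len); i += 16) {
        __builtin_prefetch(buf + i + 256);          /* prefetch data well ahead */
        uint8x16_t eq = vceqq_u8(vld1q_u8(buf + i), needle); /* 0xFF per matching lane */
        hits += vaddvq_u8(vshrq_n_u8(eq, 7));       /* 0xFF -> 0x01, then sum lanes */
    }
    return hits;
}
```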

@daveMmd

daveMmd commented May 19, 2020

@tqltech Awesome! That must have been a lot of work!

@hulksmaaash


@tqltech I am curious to know if you have measured your optimizations against what was done in the Marvell port for aarch64?
https://github.com/MarvellEmbeddedProcessors/hyperscan

@tqltech

tqltech commented Jun 18, 2020


@hulksmaaash I used the performance test tool hsbench that comes with hyperscan to measure the optimization results.

@Yikun

Yikun commented Aug 25, 2020

Hi team, @xiangwang1, @fatchanghao, @Nor7th

For now, does the team have any plans for aarch64 support in upstream hyperscan?

@hulksmaaash

@Yikun I believe the answer is still the same as last year (#197 (comment)). I am curious, what is your interest in having aarch64 support for hyperscan? If there is enough external interest, then I may be able to gather internal engineering support to justify the work and ongoing maintenance.

@eliaslevy

For our use case, we use Hyperscan on Linux, macOS, and Windows. With Macs beginning the transition to Apple silicon based on ARM, we are obviously interested in support for the architecture so we can continue our cross platform work using Hyperscan.

It is understandable why Intel may not be interested in supporting the architecture, but I would counter that adding support will ensure that the project continues to be a viable option for people that must support multiple platforms, instead of having them look for alternatives that they can use across the platforms they must support.

@Yikun

Yikun commented Aug 26, 2020

@hulksmaaash Thanks for the reply.

I got some info from our product team: some users are running Hyperscan on Linux on Kunpeng servers (which are aarch64-based servers). We also know of some cases on Amazon EC2 A1 instances.

So we think aarch64 support is really necessary.

@hulksmaaash

hulksmaaash commented Sep 17, 2020

FYI, there is an Arm-sponsored effort (see below) underway to port and optimize hyperscan for Arm. The work has only just begun, but the end goal is to work with the maintainers to have the updates merged, and then to continue providing support for the aarch64 architecture as both the project and the architecture progress.

https://github.com/VectorCamp/hyperscan

@hulksmaaash

For those interested, the first PR has been submitted; it separates the architecture-specific code to pave the way for adding aarch64 support... and any other architecture-specific code in the future.

#272

@hulksmaaash

hulksmaaash commented Dec 7, 2020

FYI - the aarch64 port has been completed here:

with further NEON SIMD optimizations to come. A PR will be submitted soon.

@hulksmaaash

hulksmaaash commented Dec 8, 2020

PR for ARMv8 support submitted here: #287

@hulksmaaash

hulksmaaash commented Dec 16, 2020

FYI, we have been informed that the project maintainers have

"no plan to give multi-arch support for Hyperscan"

and will

"keep Hyperscan as x86 only and deliver continuous designs and optimizations based on instruction-set from current and future Intel CPUs"

We will consider the best path forward to ensure Hyperscan will work for users who desire support for non-x86 architectures, and update those who express interest.

@evrial

evrial commented Jan 21, 2021

Oh well, what a surprise. Progress train moves forward, RIP Intel.

@edsiper

edsiper commented Jan 17, 2022

I was hoping this would get ported to ARM too :/

@hulksmaaash

I was hoping this would get ported to ARM too :/

It did ;-) https://github.com/VectorCamp/vectorscan
