homework #10

@Mwsxy commented Jan 23, 2022

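Hint: please state in the PR description how many cores and threads your CPU has :)
If you would rather not write a PR description, you can put it in a code comment instead; that is fine too.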

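CPU: i7-6500U 2C4T (2 cores, 4 threads)
The loops are parallelized with TBB (I also experimented with OpenMP along the way).
For saxpy I tried hand-written SIMD intrinsics, but they gave no measurable speedup; presumably the compiler was already auto-vectorizing the loop?
The output is as follows: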

Original

fill: 1.61105s
fill: 1.56055s
saxpy: 0.0412753s
sqrtdot: 0.104043s
5165.4
minvalue: 0.101617s
-1.11803
magicfilter: 0.51515s
55924034
scanner: 0.101111s
5.28566e+07

After TBB

fill: 0.6906s
fill: 0.753506s
saxpy: 0.040185s
sqrtdot: 0.024997s
5792.62
minvalue: 0.012356s
-1.11803
magicfilter: 0.256035s
55924034
scanner: 0.075566s
6.18781e+07

TOCK(fill);
return arr;
}

template <class T>
- void saxpy(T a, std::vector<T> &x, std::vector<T> const &y) {
+ void saxpy(T a, std::vector<T> __restrict &x, std::vector<T> const __restrict &y) {
Contributor:

See lesson 4: this has no effect! You need to take x.data() and store it in a __restrict pointer for it to work.
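A minimal sketch of what the reviewer seems to be suggesting. The loop body `x[i] = a * x[i] + y[i]` is an assumption, since the diff does not show it, and `__restrict` is a compiler extension (accepted by GCC, Clang and MSVC), not standard C++. The qualifier is applied to raw pointers taken from `data()`, which promises the compiler that the two arrays do not overlap.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical rewrite of saxpy; the loop body is assumed, not taken from the PR.
template <class T>
void saxpy(T a, std::vector<T> &x, std::vector<T> const &y) {
    // Guard against mismatched sizes by iterating over the shorter vector.
    std::size_t const n = x.size() < y.size() ? x.size() : y.size();
    // __restrict goes on the raw pointers, not on the std::vector references.
    T *__restrict xp = x.data();
    T const *__restrict yp = y.data();
    for (std::size_t i = 0; i < n; i++) {
        xp[i] = a * xp[i] + yp[i];
    }
}
```

Calling `saxpy(2.0f, x, y)` with `x = {1, 2, 3}` and `y = {10, 20, 30}` leaves `x` holding `{12, 24, 36}`. Whether this changes the generated code depends on the compiler and flags, so it is worth checking the timing before and after.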

res.reserve(n*3);
tbb::parallel_for(tbb::blocked_range<size_t>(0, n),
[&](tbb::blocked_range<size_t> r) {
static thread_local std::vector<T> local;
Contributor:

Very creative! This is indeed a valid way to optimize it.
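The idea being praised appears to be: each thread collects its results in a private buffer and touches the shared vector only once per chunk. The sketch below illustrates that pattern with plain `std::thread` instead of `tbb::parallel_for`, so it is not the PR's code. The function `filter_even` and its predicate are hypothetical, and it uses an ordinary per-thread local vector where the PR uses a `static thread_local` one that is reused across chunks.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical filter that keeps even values. Each worker fills its own
// buffer and takes the lock only once, when appending to the shared result.
std::vector<int> filter_even(std::vector<int> const &in, unsigned num_threads) {
    std::vector<int> res;
    res.reserve(in.size());
    std::mutex mtx;
    std::vector<std::thread> workers;
    std::size_t const n = in.size();
    if (num_threads == 0) num_threads = 1;
    std::size_t const chunk = (n + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; t++) {
        std::size_t const begin = std::min(n, t * chunk);
        std::size_t const end = std::min(n, begin + chunk);
        workers.emplace_back([&, begin, end] {
            std::vector<int> local;  // private to this thread, no locking needed
            local.reserve(end - begin);
            for (std::size_t i = begin; i < end; i++) {
                if (in[i] % 2 == 0) local.push_back(in[i]);
            }
            std::lock_guard<std::mutex> lock(mtx);  // one lock per chunk
            res.insert(res.end(), local.begin(), local.end());
        });
    }
    for (auto &w : workers) w.join();
    return res;
}
```

The order of elements in the result depends on which thread finishes first, so callers that need a stable order would have to sort afterwards or merge the buffers by chunk index.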
