
nc::linalg::lstsq very slow #85

Open
66Ton99 opened this issue Nov 27, 2020 · 10 comments
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments


66Ton99 commented Nov 27, 2020

Describe the bug
It takes forever to do SVD::decompose()

To Reproduce
Run nc::linalg::lstsq with a big NdArray; mine had 8905 rows and 316 columns.

Expected behavior
It should be much faster.
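(For context, a rough reference point for what "much faster" means on a problem of this shape: the sketch below times NumPy's lstsq on a random matrix of the reported dimensions. The random data is purely illustrative; the OP's actual data and its conditioning may differ.)

```python
# Hypothetical baseline: time NumPy's lstsq on a matrix of the reported
# shape (8905 x 316). Random data is an assumption, not the OP's data.
import time
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8905, 316))
b = rng.standard_normal((8905, 1))

t0 = time.perf_counter()
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
elapsed = time.perf_counter() - t0

print(f"solution shape: {x.shape}, rank: {rank}, time: {elapsed:.3f}s")
```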

@66Ton99 66Ton99 changed the title linalg.lstsq very slow nc::linalg::lstsq very slow Nov 27, 2020
@dpilger26 dpilger26 self-assigned this Dec 2, 2020
@dpilger26 dpilger26 added enhancement New feature or request help wanted Extra attention is needed labels Dec 2, 2020

mu578 commented Dec 16, 2020

hello, please feel free to scavenge things there; it might help you get rid of bottlenecks. I suspect the tolerance setting is too low.

https://github.com/moe123/macadam/tree/master/macadam/details/numa

@dpilger26 (Owner)

Thank you @moe123, I'll take a look. I assume you are referring to your SVD implementation here? Do you have any benchmarks for this code, for instance on an 8905 x 316 matrix like the one above?


mu578 commented Dec 16, 2020

@dpilger26 Not really for that one at the moment. Also, the parameter description is missing; I should add it. Check your implementation first: you might be trying to converge with too low a tolerance, or with too high expectations, which would explain why the reduction is very slow and runs through a very high number of iterations for nothing. By the way, by any standard an 8905 x 316 matrix is something very big; and beyond the dimensions, the very nature of the data must be known: is it well conditioned? The OP should reconsider his mathematical approach to the problem; he may not be using the right tools here.
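(The "is it well conditioned?" question can be checked directly: the condition number is the ratio of the largest to the smallest singular value, and a large value signals that SVD-based solvers will struggle. A sketch, using a random matrix of the same shape as a stand-in for the OP's data:)

```python
# Conditioning check: cond(A) = sigma_max / sigma_min.
# A random Gaussian 8905 x 316 matrix is well conditioned; real data may not be.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8905, 316))

cond = np.linalg.cond(A)  # 2-norm condition number via SVD
print(f"condition number: {cond:.2f}")
```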

@Aditya-11

NumPy's lstsq function is faster. Benchmark details: lstsq of A (201, 200) and B (201, 1) takes 6 seconds in NumPy (Python) and about 98 seconds in NumCpp.


mu578 commented Feb 12, 2021

@Aditya-11 That does not mean anything by itself; the two underlying implementations differ by a lot. What are your calling arguments? In particular, if double inTolerance = 1e-12 was left at its default, I am not surprised; to me that is one of the culprits, as I briefly mentioned earlier, and it is surely compounded by the conditioning of your data. Until you decide to be rigorous about all of this, this very issue will never be addressed. Also: how do you compile this project? Which compiler? Vectorization, and so on.


Aditya-11 commented Feb 12, 2021

Thanks @moe123, I have to check this inTolerance parameter. The compiler used is clang. If the preconditioning were bad, then I would guess the NumPy answer should also come out slower.


mu578 commented Feb 12, 2021

@Aditya-11 No, the NumPy implementation differs by a lot (it is a ~20-year-old project, so older than you). It also takes rcond into account, selects different strategies for the decompositions, applies residual refinement steps, and so on. Preconditioning is never "good" or "bad" in the abstract; numerical analysis is about facts, not pseudo-moralistic labels; it is what it is contextually, so your guess is just a guess. What we are evidently talking about here is the logical relationship between a tolerance threshold and the nature of the input data for a given algorithm using an SVD solver. The implied tolerance should not exceed something around ~1e-7 in the most optimistic scenario within the context of this implementation. What are your flags? loop-vectorize? And is it too much to ask that you upload your actual benchmarks? Until then, the discussion is all unverifiable assertions.
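(The interaction between the tolerance/rcond cutoff and the data can be seen in NumPy itself: the cutoff decides how many singular values count toward the effective rank. A sketch with a deliberately ill-conditioned synthetic matrix; the decay profile is an assumption chosen to make the effect visible:)

```python
# How the rcond cutoff interacts with the data: singular values below
# rcond * sigma_max are treated as zero, lowering the effective rank.
import numpy as np

rng = np.random.default_rng(2)
# Build a 200 x 200 matrix with singular values decaying from 1 to 1e-10.
U, _ = np.linalg.qr(rng.standard_normal((200, 200)))
V, _ = np.linalg.qr(rng.standard_normal((200, 200)))
s = np.logspace(0, -10, 200)
A = (U * s) @ V.T
b = rng.standard_normal((200, 1))

# Loose cutoff: small singular values are discarded (lower effective rank).
_, _, rank_loose, _ = np.linalg.lstsq(A, b, rcond=1e-6)
# Tight cutoff: everything above 1e-12 * sigma_max is kept.
_, _, rank_tight, _ = np.linalg.lstsq(A, b, rcond=1e-12)
print(f"rank with rcond=1e-6: {rank_loose}, with rcond=1e-12: {rank_tight}")
```

A tolerance that is too tight forces the solver to chase singular values that are numerically noise, which is consistent with the iteration-count complaint earlier in the thread.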

@Aditya-11

@moe123, I have changed the argument inTolerance to 1e-6; it is still much slower than the Python implementation.

Benchmark details: n = 200, a_1 (200 x 200), b_1 (200 x 1).

In C++:

```cpp
for (int i = 0; i < n; i++)
{
    auto c1 = nc::linalg::lstsq(a_1, b_1, inTolerance);
}
```

time: 100-160 s

In Python:

```python
for i in range(n):
    c_ = num.linalg.lstsq(a_1, b_1)
```

time: 6-8 s

Is there anything else that should be changed? I had previously implemented the GMRES algorithm in Python using NumPy and then converted it to C++ using NumCpp, and it was slower; that is why I suspected lstsq might be slow.


mu578 commented Feb 15, 2021

@Aditya-11 Do you know the definition of science? The very definition is a set of methods that, given to one person, can be reproduced by another (*). What is your Valgrind output, or similar? Where is the CPU time spent?

(*) So would you share the content of your data, so that an apples-to-apples comparison can take place? The same remark goes for the OP. To address an issue, you need to understand it, meaning literally circumscribing it; this is true of anything in this world, but it requires embracing the idea of making an effort, and we are not there yet.

-1 Is the fact that NumPy runs faster of any help? No; it is a given reference point which we would very much like to reproduce or approach.

-2 Does the complexity of this issue reside in repeating, over and over again, that one is faster and the other slower, without exposing the entire problem properly and openly, giving it a context that can be shared and that anyone can reproduce? No. The real work begins once that simple basic step is achieved (for now, you have failed at it, and so has the OP): what are the rational options we have for improving the actual algorithm?
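(The reproducibility being asked for can be packaged in a few lines: a fixed seed, inputs saved to disk so both implementations consume identical data, and a median over repeated runs. A sketch of the NumPy side; the file names are hypothetical, and the C++ side would load the same .npy files:)

```python
# Reproducible benchmark skeleton: fixed seed, shared inputs, median timing.
import time
import numpy as np

rng = np.random.default_rng(42)
a_1 = rng.standard_normal((200, 200))
b_1 = rng.standard_normal((200, 1))
np.save("a_1.npy", a_1)  # share these files so both sides see identical data
np.save("b_1.npy", b_1)

times = []
for _ in range(5):
    t0 = time.perf_counter()
    np.linalg.lstsq(a_1, b_1, rcond=None)
    times.append(time.perf_counter() - t0)

print(f"median lstsq time: {sorted(times)[len(times) // 2]:.4f}s")
```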

@cbenitez81

Hi @Aditya-11, can you provide the flags used in CMake? It might be something related to the compilation of the library/executable itself, threading, etc. Both NumPy and this package rely on LAPACK, which as others mentioned is quite old and stable.
