Suggestion - Fast floats #1013

Open
ProgrammerIn-wonderland opened this issue Apr 15, 2024 · 6 comments
Comments

@ProgrammerIn-wonderland

Currently, floating-point numbers are handled by Berkeley SoftFloat. As stated in the readme, this is precise but slow.

Could there be an optional setting in libv86 for faster floats, using 64-bit JS or wasm floats in place of 80-bit x87 floats (with the main downside being precision)? A hack like the one box86 uses for floats may help performance.

@copy
Owner

copy commented Apr 15, 2024

Yes, that would be reasonable. That said:

  • Back in the day, v86 used JS floats. The switch to softfloats happened after I found that some code breaks without 80-bit floats (iirc, the printf implementation of Haiku's libc). So this would need to stay opt-in.
  • I suspect modern compilers generate SSE instructions (which use 64-bit floats). Those aren't optimised in v86 yet (the JIT generates a call for each instruction, except for some memory moves and integer arithmetic).
  • For softfloats, the JIT generates calls into the Berkeley library. Afaik some operations have fast paths that the JIT could inline into the generated code to recover some performance.
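
To make the first point concrete, here is a minimal C sketch of what is lost when 80-bit x87 values are stored as 64-bit doubles. It is only an illustration, not the Haiku code path: the function names are hypothetical, and it assumes a compiler where `long double` maps to the x87 extended format (typical for GCC and Clang on x86, not guaranteed elsewhere). A 64-bit double has a 53-bit mantissa, so the integer 2^53 + 1 cannot be stored exactly, while a 64-bit mantissa can hold it.

```c
#include <float.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if v survives a round trip through a 64-bit double. */
int survives_double(uint64_t v) {
    /* volatile forces the value to be rounded to 64-bit storage,
       even if the compiler evaluates in x87 registers. */
    volatile double d = (double)v;
    return (uint64_t)d == v;
}

/* Returns 1 if v survives a round trip through long double.
   Whether it does depends on the platform's long double format. */
int survives_long_double(uint64_t v) {
    volatile long double ld = (long double)v;
    return (uint64_t)ld == v;
}

/* Prints mantissa widths and the round-trip results for 2^53 + 1. */
void report_precision(void) {
    const uint64_t v = (UINT64_C(1) << 53) + 1;
    printf("double mantissa bits:      %d\n", DBL_MANT_DIG);
    printf("long double mantissa bits: %d\n", LDBL_MANT_DIG);
    printf("2^53+1 via double:      %s\n",
           survives_double(v) ? "exact" : "rounded");
    printf("2^53+1 via long double: %s\n",
           survives_long_double(v) ? "exact" : "rounded");
}
```

Code that relies on the wider mantissa, for example to carry 64-bit integers through floating-point arithmetic, would see the "rounded" result under a 64-bit float mode, which is one reason such a mode would have to stay opt-in.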

@ProgrammerIn-wonderland
Author

I decided to see what compilers do with the following code:

float addTen(float num) {
    return 10 + num;
}

on https://godbolt.org/

For 64-bit targets (compiled with -O3), gcc seems to use SSE add instructions (specifically addss), while for 32-bit targets (compiled with -m32 and -O3) gcc seems to just use x87 instructions like fadd.

In contrast, clang uses addss for both 32-bit and 64-bit targets.

Not sure if this information is helpful, but I hope it provides some insight.

@SuperMaxusa
Contributor

while for 32bit targets (compiled with -m32 and -O3) gcc seems to just use x87 instructions like fadd.

Can you try adding -mfpmath=sse to the compiler parameters (see https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/x86-Options.html)? With this option, gcc uses movss and addss instead of fadd: https://godbolt.org/z/3M6YW7cYK
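
To check which path a given set of flags selects without reading the disassembly, a small sketch like the following may help. It assumes the predefined macros `__SSE_MATH__` and `__SSE2_MATH__`, which GCC documents as set when SSE is used for floating-point math (Clang defines similarly named macros, though its exact conditions may differ); the function names are hypothetical.

```c
#include <float.h>

/* Names the unit the compiler targets for scalar floating-point math,
   based on predefined macros. Assumes GCC or Clang on x86. */
const char *fp_math_unit(void) {
#if defined(__SSE2_MATH__)
    return "sse2";      /* float and double arithmetic use SSE */
#elif defined(__SSE_MATH__)
    return "sse";       /* float uses SSE; double may still use x87 */
#elif defined(__i386__) || defined(__x86_64__)
    return "x87";       /* no SSE math macro defined on an x86 target */
#else
    return "non-x86";
#endif
}

/* C99 FLT_EVAL_METHOD: 0 means expressions are evaluated in their
   declared type, 2 means they are evaluated in long double
   (typical for x87 code generation). */
int fp_eval_method(void) {
    return (int)FLT_EVAL_METHOD;
}
```

Printing both values from a test program built with, say, `-m32 -O3` and again with `-m32 -O3 -msse2 -mfpmath=sse` should show the switch from x87 to SSE math described above.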

@ProgrammerIn-wonderland
Author

ProgrammerIn-wonderland commented Apr 17, 2024

Yeah, it uses SSE for floating-point addition for me now.
Is SSE math currently slower than x87 math in v86, or has this not been benchmarked?

Also, compiling with clang and -march=i686 yields plain x87 instructions. I believe most 32-bit distros target i686, so SSE shouldn't be present in distro packages, since i686/Pentium Pro didn't have SSE anyway.

@ProgrammerIn-wonderland
Author

ProgrammerIn-wonderland commented Apr 17, 2024

[Screenshot: Screenshot_20240416_232217]
My rather unscientific test, run on my laptop (Intel Core i7-1165G7) with a compiled version of https://www.netlib.org/benchmark/linpackc.new, suggests that SSE is faster than x87 inside v86.
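
For anyone wanting to repeat the comparison with something smaller than LINPACK, a rough sketch of a timing kernel follows. It is not the benchmark used above: `daxpy` mirrors the inner loop that LINPACK-style benchmarks spend most of their time in, and `time_daxpy` is a hypothetical helper that times it with the standard `clock()` function.

```c
#include <stddef.h>
#include <time.h>

/* y[i] += a * x[i]: a simple floating-point multiply-add loop. */
void daxpy(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}

/* Runs daxpy `reps` times over n elements and returns the elapsed
   CPU time in seconds. The caller supplies both buffers. */
double time_daxpy(size_t n, int reps, const double *x, double *y) {
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        daxpy(n, 0.5, x, y);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Building the same source twice, once with `gcc -m32 -O3 -mfpmath=387` and once with `gcc -m32 -O3 -msse2 -mfpmath=sse`, and running both binaries inside v86 would give a like-for-like x87 versus SSE number. Timer resolution inside an emulator can be coarse, so large `n` and `reps` values are advisable.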

@SuperMaxusa
Contributor

I believe most 32bit distros target i686 so it should mean sse shouldn't be present in distro packages since i686/pentiumpro didn't have sse either way

In ArchLinux32, only the i486 architecture doesn't use SSE: https://archlinux32.org/architecture/.
Additionally, some Linux distros and software classify i386 or i686 as covering all 32-bit CPUs up to the Pentium 4, without distinguishing by supported instructions such as CMOV and SSE1-3: https://gitlab.alpinelinux.org/alpine/tsc/-/issues/20, https://lists.debian.org/debian-devel/2015/09/msg00595.html
