safe vs. unsafe foreign imports #34

iustin · 2014-12-31T22:08:50Z

Hi,

I'm a bit surprised that various I/O (in the filesystem sense, not Haskell IO) functions are imported as unsafe. While they won't call back to Haskell, depeding on which filesystem they talk to, they can take arbitrary amounts of time, blocking an execution context for this entire duration.

Would it make sense to also have "safe" version of the functions, for the case when performance is less of an issue compared to safety? Would it make sense to at least document that they are imported unsafe?

For the record, I've run a simple benchmark comparing "chmod" performance on various local and remote filesystems and the difference between unsafe and safe is between 10% slower (nfs) to 25% slower (local real filesystems) to 30% (tmpfs). So there is a performance hit indeed.

The text was updated successfully, but these errors were encountered:

cartazio · 2015-01-01T00:23:11Z

this sounds like a valid/good idea! What should be the default is a more complex question, but that aside, both styles should be available!

simonmar · 2015-01-05T09:14:09Z

Unsafe foreign imports which can block for unpredictable amounts of time cause performance problems that only emerge when scaling to multiple cores, because they delay the GC sync. This is a really annoying problem if it happens to you, because it's almost impossible to diagnose, and if it happens due to an unsafe call in a library then it's also really hard to fix.

So the safe versions should be the default, and if we have them at all, the unsafe versions should be named unsafeFoo or put in a separate .Unsafe module, with clear documentation about the possible implications.

gregorycollins · 2015-01-05T16:41:17Z

Agreed, if the call would ever block (and that includes most filesystem functions) that means you want "safe".

iustin · 2015-01-05T16:53:38Z

Glad to see people agree :)

I'll try and prepare a patch then.

redneb · 2015-01-05T17:18:49Z

Even worse, open(2) (which is used by openFd) is marked unsafe despite the fact the it can block forever, eg when opening a named pipe with O_RDONLY and if no one has opened it for writing.

Let me also point out that for many of the functions in unix, the import actually happens in System.Posix.Internals from base.

hvr · 2015-01-09T15:06:32Z

@Lustin are you going to provide such a patch soon? I think we should try to get this as unix-2.7.1.1 (if we can consider this a patch-level bump bugfix) into GHC 7.10.1RC2 ...

iustin · 2015-01-10T08:28:21Z

@hvr - yep, now that the weekend has arrived I will - I don't quite have time during the week, sorry :)

iustin · 2015-01-11T23:09:09Z

So, question time. I've looked at the unix library code and also a bit at base (thanks @redneb for the hint). 'unix' alone has around 180 unsafe 'foreign imports'. We have a couple of ways to change things.

Provide only safe variants for the imports that can block or take a long time. This would be simplest and cleanest solution, but has the drawback of not giving any choice to the user. It's debatable whether a choice is useful here, but who knows.
Provide the safe and unsafe alternatives. Because in many cases, the C imports are not used as is, but instead are wrapped in a layer that makes them easier to use in Haskell, it means that we would have to either duplicate the code, or add a 'Bool' parameter that denotes the safe/unsafe version. This is ugly, so even if we only presented to the user the two functions (the safe and unsafe alternatives, partially applied), I don't quite like this solution. Actually implementing it would mean one of the following:
- separate "Unsafe" modules; this is unwieldy due to the number of modules that would be needd (we'd go from System.Posix.Files, System.Posix.Files.ByteString to Files, Files.Unsafe, Files.ByteString, Files.ByteString.Unsafe, urgh)
- functions in the same module, prefixed with unsafe; this would be cleaner since at least all the code is in the same place, and keeps module count sane
Provide a cabal flag to switch between the safe and unsafe alternatives; while this could work for unix, I don't see it working for base, so I just mention it as a possibility, but don't actually suggest it.

Given these options, I would propose going with the first one - no choice, but rather an informed switch from unsafe to safe, case-by-case. Howver, I've run some more benchmarks and, on the relatively slow laptop that I tested, criterion gives me:

a "noop" C call (void noop() { }) runs in around 20ns unsafe, and 240ns safe; that means, the worst slowdown is 12×
a getpid call, which is the most trivial after noop, is 26ns vs 250ns (<10×)
a getenv call, less trivial but still cheap, is 153ns vs 365ns (2.4×)
a getgrgid call, which is expensive and can block (when nsswitch contains remote service), takes (with local files only) 24.6µs vs 26µs, so only 5% slower

These kind of slowdowns range from close to noise to very significant, so I'm not sure anymore if we can afford not to give the option of switching, if we find a function call that is fast but could still block.

Thoughts on which way to go? So far I've narrowed down the list of potential "dangerous" ffi imports in 'unix' to around 80, mostly related to files, but not only. But still it's a lot of functions that would need to be duplicated…

cartazio · 2015-01-12T02:34:15Z

I'd vote for providing .UnSafe modules personally, if only becomes some of these system calls have different performance characteristics depending on the underlying unix system, so being overly prescriptive might be a bad idea.

simonmar · 2015-01-12T08:49:29Z

One other option:

Default to safe for all the dangerous imports
Provide a compile-time cabal flag to switch to unsafe
Upload unix-unsafe to Hackage which is the same as unix but with the imports defaulting to unsafe.

hvr · 2015-01-12T09:05:13Z

@cartazio I'm against .UnSafe modules, since as it was pointed out, this would blow up the hierarchy to contain stuff like .ByteString.UnSafe (or .UnSafe.ByteString);

(on a related note, at some point we might want to add ByteString-I/O based (i.e. ByteString for the payload) operations, that may bring yet another .ByteString... suffix to the hierachy)

@simonmar the main problem I see with unix-unsafe package is that you'd have to depend on unix, as otherwise you'd end up with divergent types, if you need to pass values of types defined in unix to operations of unix-unsafe; this may be quite a maintainance overhead of keeping unix-unsafe in sync w/ unix

Fwiw, I like the idea of a manual cabal flag to switch to the old unsafe semantics, but defaulting everything that's dangerously blocking to safe by default. I'd rather err on the safe-side and give people the option to turn off all safety-guards locally if they really want to. Let somebody else create an unix-unsafe package if there's really demand for it.

hvr · 2015-01-12T16:22:55Z

Fyi, as this isn't a new regression we'll probably not target GHC 7.10.1 for this fix

cartazio · 2015-01-12T20:51:38Z

will making all the current calls use safe be part of 7.10?

iustin · 2015-01-12T22:40:51Z

@hvr: makes sense; also, we shouldn't hurry to fix this and introduce an API that is painful to correct later.
@cartazio: as I understand, no.

The only thing I don't like about the cabal flag is that this will be a bit cumbersome to fix in "base" (which also has this kind of unsafe imports). I mean that I'd be surprised if anyone actually ever rebuilds base in their install, but maybe it does happen. On the other hand, the flag keeps the API clean and simple.

In any case, I'll write the cabal patch - it might take a bit to make sure I mark the correct imports safe; we'll see how it goes from there. Thanks for the feeback!

hvr · 2015-01-12T22:45:23Z

@cartazio while making all (potentially) blocking calls safe is the morally-right-thing-to-do(tm), we have the problem of not knowing how bad the performance impact will be for real-world code. And it's a bit late in the release cycle for 7.10.1 to introduce such a risk, while OTOH we have had those unsafe FFI calls for ages and nobody complained till now...

Maybe we can consider it for GHC 7.10.2 if we have some benchmark data for syscall-heavy real-world (test)programs by then...

cartazio · 2015-01-13T01:49:12Z

@iustin @hvr hrmm, i'm not sure how the flag based approach makes sense, how would I mix using the safe and unsafe calls in a single program then?

@hvr yeah, that makes sense, we'll need some testing / verification.

hvr · 2015-01-13T07:18:32Z

@cartazio the flag approach wouldn't allow that obviously, it's more about being able to turn it off globally and revert to the old unsafe behavior for special use cases (like benchmarking, comparing old/new behaviour, and/or programs where you are confident it won't make any problems)

simonmar · 2015-01-13T14:08:57Z

The point of unix-unsafe is that it lets you express it as a dependency, and it can co-exist with unix. I don't think there would be a problem with mixing types from the two packages, because unix is used primarily for its operations, not its types. I doubt we ever have types from unix appearing in APIs.

hvr · 2015-01-13T17:41:49Z

@simonmar but if we have a separate package anyway why the hassle w/ the flag?

simonmar · 2015-01-14T21:42:51Z

@hvr the flag means that the difference between the unix and unix-unsafe packages is a single line in the unix.cabal file, so it's easy to automatically generate unix-unsafe and to check that it's consistent with unix.

hvr · 2015-01-14T22:09:45Z

@simonmar ...actually two lines (assuming you don't use {;}-syntax, and have a very long single-line .cabal file =)

simonmar · 2015-01-14T22:11:57Z

Surely just a default: False turns into default: True? Anyway if it's two lines I wouldn't call it a problem.

simonmar · 2015-01-14T22:12:45Z

Ah, the package name of course :)

cartazio · 2015-01-14T23:34:24Z

Would someone wanting to use both need to use package import syntax?
On Jan 14, 2015 5:13 PM, "Simon Marlow" notifications@github.com wrote:

Ah, the package name of course :)

—
Reply to this email directly or view it on GitHub
#34 (comment).

hvr · 2015-01-15T07:10:17Z

Would someone wanting to use both need to use package import syntax?

yes, as they'd have conflicting module names if the only change is a different .cabal file

glguy · 2015-10-13T07:16:22Z

Would it be possible to come to a consensus on this issue? Is the way forward to make a new unix-unsafe with the two line change in the cabal file?

iustin · 2015-10-14T19:32:59Z

@glguy - not quite. Not all calls need to be marked safe. My plan, before I dropped the ball on this one, was:

investigate and annotate all FFI calls that can block with CANBLOCK
the default unix would #define CANBLOCK safe
unix-unsafe would do the opposite

Thanks for bumping this up!

rwbarton · 2016-01-27T17:30:55Z

I noticed this too while preparing a patch to provide tcdrain() on Android—a function whose only purpose is to block!

The plan above sounds reasonable to me.

Since connectNamedPipe is basically a wait/sleep function, we need to use *safe* foreign call, because they delay the GC sync. For example see Simon Marlow's comment: haskell/unix#34 (comment)

nh2 · 2018-05-16T21:48:19Z

iustin · 2018-05-17T19:50:22Z

Fun. So even GHC itself has issues, not just the unix library?

I had back in 2015 an in-progress patch, but that's rotten code by now, and I hadn't had the need/motivation to push this further so far.

nh2 · 2018-05-17T19:53:27Z

Hey Iustin, do you have the old patch somewhere? I may be able to work on this in the future an it would be interesting to see what you did.

iustin · 2019-02-11T20:43:38Z

Hey, so inbox cleanup time: I looked for my development tree and I cannot find it anywere, not even in backups. So I think you (or anyone else) should start from scratch…

simonmar · 2019-11-19T10:10:03Z

Anyone feel like fixing this? We just ran into a real-world case where unlink() was blocking for long periods (sometimes up to 2 minutes!) which caused our server to be unresponsive during that time due to the unsafe FFI call. These things can be very difficult to diagnose when they happen in production.

nh2 · 2019-11-19T11:03:34Z

I'm personally very busy currently but FP Complete would be happy to do it for Facebook to fix this (in unix, GHC and across the library ecosystem) if it need be quick.

iustin · 2019-11-19T12:24:37Z

For the record, I found my old patch in a laptop backup :), I can post it for reference (it doesn't merge at all, of course).

nh2 · 2019-11-19T12:28:59Z

@simonmar On https://gitlab.haskell.org/ghc/ghc/issues/13296#note_132146 you mentioned work on making safe syscalls faster (https://phabricator.haskell.org/D1466 - RFC: foreign import ccall nonblocking).

Is this something that should still be done along with changes in here and GHC to address the performance reservations you had back then, or do you think we should switch unix (or more) to safe even if that work remains for later?

cartazio · 2019-11-19T14:16:48Z

What’s stopping us from moving the known problematic calls to safe? Their overhead will be on the order of a quarter microsecond. But that’s fine

…

On Tue, Nov 19, 2019 at 7:29 AM Niklas Hambüchen ***@***.***> wrote: @simonmar <https://github.com/simonmar> On https://gitlab.haskell.org/ghc/ghc/issues/13296#note_132146 you mentioned work on making safe syscalls faster (https://phabricator.haskell.org/D1466 - *RFC: foreign import ccall nonblocking*). Is this something that should still be done along with changes in here and GHC to address the performance reservations you had back then, or do you think we should switch unix (or more) to safe even if that work remains for later? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#34?email_source=notifications&email_token=AAABBQSFBSGFWE6LXY3AZKLQUPLYZA5CNFSM4AZWIHN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEOAXOQ#issuecomment-555486138>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAABBQTCMQYB53LVXBQBXXLQUPLYZANCNFSM4AZWIHNQ> .

iustin · 2019-11-19T20:02:10Z

@cartazio - the conclusion back in 2015 was that the slowdown was 10× between safe and unsafe calls; for some workloads, this might not be a valid trade-off.

In any case, I'm attaching the WIP patch I had back then. The work is not very complex, but rather tedious - look at all call sites, and decide (by reading man pages and such) whether that's a safe, unsafe, or interruptible call, and annotate it as such.

0001-WIP.patch.gz

simonmar · 2019-11-20T09:23:26Z

@simonmar On https://gitlab.haskell.org/ghc/ghc/issues/13296#note_132146 you mentioned work on making safe syscalls faster (https://phabricator.haskell.org/D1466 - RFC: foreign import ccall nonblocking).

Is this something that should still be done along with changes in here and GHC to address the performance reservations you had back then, or do you think we should switch unix (or more) to safe even if that work remains for later?

I suspect for the majority of calls in unix the extra overhead of safe won't matter. At least from my perspective it would be much better to avoid the unpredictable GC stalls than to save a bit of time on every call.

I think it would be nice to work on that patch some more and see if it's worthwhile, but it's probably not something I'll get to unless the performance of safe calls ends up on my critical path somehow.

Carter said:

What’s stopping us from moving the known problematic calls to safe? Their overhead will be on the order of a quarter microsecond. But that’s fine

If it is 200ns or so that would clearly be fine, I'm not sure I completely believe that figure though. A safe call can trigger a context switch, but multiple safe calls will still only trigger a single context switch so it doesn't give you a genuine idea of the overhead to time 10^6 safe calls and then divide the time by 10^6.

iustin · 2019-11-20T10:26:13Z

The mistake I made back when I did try to fix this was to try to send a single big patch. If flag at build time is (still) OK, then I can send small patches, converting sets of functions over.

Another problem is that I have no idea how this could be tested for correctness at all.

iustin · 2019-11-20T13:15:47Z

@simonmar - regarding context switches - since many of these calls end up as syscalls, they will involve a context switch anyway, no?

I wonder, for example, if it's ever worth for an open() call to be marked unsafe, since the slowdown will be userspace/kernel switch, and not just haskell thread switch?

simonmar · 2019-11-22T07:57:15Z

@simonmar - regarding context switches - since many of these calls end up as syscalls, they will involve a context switch anyway, no?

I believe system calls are more efficient than context switches.

I wonder, for example, if it's ever worth for an open() call to be marked unsafe, since the slowdown will be userspace/kernel switch, and not just haskell thread switch?

The answer here is to measure it :)

nh2 · 2019-11-22T13:47:09Z

I haven't measured it myself yet, but this answer https://stackoverflow.com/questions/32748530/on-linux-is-access-faster-than-stat/32749760#32749760 suggests that cached syscalls of the stat family take 1-4 microseconds. But I don't know on which machine or Linux version, so we should still verify it.

If the overhead turns out negligible as we hope for, my favourite outcome would be to use safe unconditionally, without a flag.

iustin · 2019-11-22T17:52:23Z

I don't think that's a good idea. First, there is the safe vs interruptible difference.

Even with that aside, for things which don't cause a syscall but instead are pure userspace, using unsafe seems wasteful, no?

That's assuming we can determine what a given libc function does in a reliable way.

If not, I agree, safe is the right answer. Hmm, more convinced now :)

iustin · 2019-11-22T17:59:45Z

Put another way, I expect proudly probably 60% interruptible calls, 35% safe calls, 5% unsafe (e.g getpid* family).

nh2 · 2019-11-22T18:03:48Z

First, there is the safe vs interruptible difference.

By all means interruptible is even better where supported, but that seems like a follow-up improvement to me.

If we find it's easy to do it in the same go, I'm for it (I just found in the past that checking whether interruptible will work is more time intensive than switching from unsafe to safe, so it may make sense to first do the latter to get rid fo the hangs that arise indpendent of interruptibility).

Even with that aside, for things which don't cause a syscall but instead are pure userspace, using unsafe seems wasteful, no?

Sure, we're still talking about your scope from the issue description:

various I/O (in the filesystem sense, not Haskell IO)

nh2 · 2019-11-22T18:07:51Z

Ah I also just noticed I can make a plug here:

Regarding interruptible, hatrace could be used to test that the interruption signal is indeed delivered to the syscall.

If we don't want to add tests in unix, we could also add them to hatrace's test suite like I did for a GHC bug.

See haskell/unix#34.

iustin mentioned this issue Jan 5, 2015

safe vs. unsafe FFI calls redneb/disk-free-space#1

Closed

dbeacham mentioned this issue May 22, 2017

Addition of unsafe calls ndmitchell/hexml#10

Closed

nh2 mentioned this issue Nov 19, 2018

Use unsafe for most of the FFI thumphries/foundationdb#18

Open

iustin mentioned this issue Nov 20, 2019

Introduce flag for overall FFI calls safe/unsafe behaviour #140

Closed

WJWH mentioned this issue Sep 9, 2020

Make safe/unsafe foreign calls size dependent haskell/bytestring#282

Open

nh2 added a commit to JohnLato/posix-paths that referenced this issue Mar 5, 2021

Use safe calls for FS IO to not block a capability.

e036f49

See haskell/unix#34.

nh2 added a commit to JohnLato/posix-paths that referenced this issue May 15, 2021

Use safe calls for FS IO to not block a capability.

fd74e05

See haskell/unix#34.

hasufell added type: enhancement help wanted labels Jul 10, 2022

redneb mentioned this issue Oct 11, 2022

Use capi for syscalls that break under musl's handling of 64-bit time_t #252

Merged

hsyl20 mentioned this issue Dec 1, 2023

waitToSetLock can block but is an unsafe FFI import #310

Open

safe vs. unsafe foreign imports #34

safe vs. unsafe foreign imports #34

Comments

iustin commented Dec 31, 2014

cartazio commented Jan 1, 2015

simonmar commented Jan 5, 2015

gregorycollins commented Jan 5, 2015

iustin commented Jan 5, 2015

redneb commented Jan 5, 2015

hvr commented Jan 9, 2015

iustin commented Jan 10, 2015

iustin commented Jan 11, 2015

cartazio commented Jan 12, 2015

simonmar commented Jan 12, 2015

hvr commented Jan 12, 2015

hvr commented Jan 12, 2015

cartazio commented Jan 12, 2015

iustin commented Jan 12, 2015

hvr commented Jan 12, 2015

cartazio commented Jan 13, 2015

hvr commented Jan 13, 2015

simonmar commented Jan 13, 2015

hvr commented Jan 13, 2015

simonmar commented Jan 14, 2015

hvr commented Jan 14, 2015

simonmar commented Jan 14, 2015

simonmar commented Jan 14, 2015

cartazio commented Jan 14, 2015

hvr commented Jan 15, 2015

glguy commented Oct 13, 2015

iustin commented Oct 14, 2015

rwbarton commented Jan 27, 2016

nh2 commented May 16, 2018

iustin commented May 17, 2018

nh2 commented May 17, 2018

iustin commented Feb 11, 2019

simonmar commented Nov 19, 2019

nh2 commented Nov 19, 2019

iustin commented Nov 19, 2019

nh2 commented Nov 19, 2019

cartazio commented Nov 19, 2019 via email

iustin commented Nov 19, 2019

simonmar commented Nov 20, 2019

iustin commented Nov 20, 2019

iustin commented Nov 20, 2019 • edited

simonmar commented Nov 22, 2019

nh2 commented Nov 22, 2019

iustin commented Nov 22, 2019 • edited

iustin commented Nov 22, 2019

nh2 commented Nov 22, 2019

nh2 commented Nov 22, 2019

iustin commented Nov 20, 2019 •

edited

iustin commented Nov 22, 2019 •

edited