Add portable support for file open with data caching suppressed/eliminated. #322

dcoutts opened this issue May 14, 2024 · 6 comments
@dcoutts
Contributor

dcoutts commented May 14, 2024

Is your feature request related to a problem? Please describe.

The problem is using modern SSDs at their maximum performance for random I/O (particularly random reads) on normal files (not raw block devices), across multiple cores/capabilities. This requires two things: good async I/O APIs, and opening files in a mode that bypasses the page cache. Bypassing the page cache is needed to achieve the maximum IOPS, especially when submitting I/O operations from many OS threads at once (and so from many RTS capabilities). Good async I/O APIs are out of scope for this feature request.

A similar problem is wanting to do lots of random I/O while optimising the memory of the host system by not polluting the page cache with disk pages that will only be used once (to make best use of the page cache for other files that are used). Again for this use case one wants to open a file in a mode that bypasses or suppresses the page cache.

Another similar problem is wanting to do disk I/O performance benchmarking, and one needs to work around the caching that the OS does: either by dropping caches before a run and avoiding re-reading the same page twice, or avoiding caching altogether.

Describe the solution you'd like

The solution is to allow opening a file in a mode that attempts to suppress or eliminate the use of disk/page caching for this use of this file. This is a feature that all widely used unix-like OSs support, but it is not standardised by POSIX. Linux and FreeBSD provide it as the O_DIRECT flag to open, and OSX provides it as the F_NOCACHE command of fcntl.

For platforms that do not support any of these methods, the fallback should simply be to do nothing. Continuing to cache is permitted by the semantics of "try to avoid caching"; only the performance characteristics differ.

Note also that since we will document the semantics as trying to do less or no caching, we need not worry about the slight differences in page cache behaviour between OSX, FreeBSD and Linux. (OSX will use cached pages for the file if they are already present, while Linux will ignore them. This difference is only relevant for I/O benchmarks, and such programs already need to be aware of many platform-specific details.)

The feature should be implemented as an extra boolean flag in the OpenFileFlags. The name of this field should be descriptive since there is no POSIX name to follow (and different platforms call it different things, so e.g. direct would be inappropriate). Suggestions include noCache :: Bool, since that's simply descriptive (though it happens to be what OSX uses too).

Additionally (and this is a matter of API design tastes where reasonable people may differ) one may wish to provide some feature flag that one can test to see if support is present (since no exception will be thrown if it is not present).

The documentation for the feature should also clearly describe that when using this feature, some platforms impose additional constraints on the alignment of file reads/writes and the memory buffers used for reads/writes. Optionally it may also make sense to provide some constants to give the most portable values for disk and memory alignment, or an action to obtain these alignment hints. Feedback on this aspect of the API is welcome.
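To make the alignment point concrete, here is a minimal sketch of what a caller might have to do. It assumes an alignment of 4096 bytes for both file offsets and memory buffers; that value is an assumption chosen for illustration, not something this proposal specifies, and the real requirement depends on the platform, filesystem and device. The names `assumedAlignment` and `alignedRange` are hypothetical.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (IntPtr (..), Ptr, ptrToIntPtr)

-- Assumed alignment for illustration only; not a value from the proposal.
assumedAlignment :: Int
assumedAlignment = 4096

-- Round down / up to a multiple of the alignment.
alignDown, alignUp :: Int -> Int -> Int
alignDown a n = n - n `mod` a
alignUp a n = alignDown a (n + a - 1)

-- Widen a requested (offset, length) to the smallest aligned range covering it.
alignedRange :: Int -> Int -> Int -> (Int, Int)
alignedRange a off len = (start, end - start)
  where
    start = alignDown a off
    end = alignUp a (off + len)

main :: IO ()
main = do
  -- The caller wants 100 bytes at offset 5000; the aligned read is wider.
  let (off, len) = alignedRange assumedAlignment 5000 100
  -- Allocate a buffer whose address is also a multiple of the alignment.
  allocaBytesAligned len assumedAlignment $ \buf -> do
    let IntPtr addr = ptrToIntPtr (buf :: Ptr Word8)
    print (off, len, addr `mod` assumedAlignment == 0)
```

The caller reads the widened range into the aligned buffer and then picks out the bytes it actually wanted. An action returning alignment hints, as suggested above, would replace the hard-coded constant.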

Describe alternatives you've considered

The alternative is an extension package, unix-odirect or something, with just the file open support and nothing else.

Additional context

My colleagues and I are happy to implement this feature, including docs etc and shepherd it through PR review.

Related older tickets: #48 and #6. But these propose just using and exposing the non-portable O_DIRECT rather than trying to provide portable support.

API breaking changes

It would be an extra member of the OpenFileFlags record, with a default (normal caching behaviour) in the defaultFileFlags value. So this should not break most existing library users, which create the OpenFileFlags record value by overriding defaultFileFlags rather than using the raw constructor.
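A minimal sketch of why the record-update style keeps compiling. The record below is a reduced stand-in for the real OpenFileFlags, not the library's definition, and `noCache` is the hypothetical new field suggested earlier in this issue.

```haskell
-- Reduced stand-in for the record in System.Posix.IO; only a few fields shown.
data OpenFileFlags = OpenFileFlags
  { append :: Bool
  , nonBlock :: Bool
  , trunc :: Bool
  , noCache :: Bool -- hypothetical new field proposed in this issue
  }
  deriving (Eq, Show)

-- The default keeps normal caching behaviour.
defaultFileFlags :: OpenFileFlags
defaultFileFlags =
  OpenFileFlags {append = False, nonBlock = False, trunc = False, noCache = False}

-- Code written before the field existed: still compiles, and gets the default.
existingCaller :: OpenFileFlags
existingCaller = defaultFileFlags {append = True}

-- Code that opts in to suppressed caching.
newCaller :: OpenFileFlags
newCaller = defaultFileFlags {noCache = True}

main :: IO ()
main = do
  print (noCache existingCaller)
  print (noCache newCaller)
```

Only callers that apply the constructor positionally, as in `OpenFileFlags False False False`, would fail to compile after the field is added.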

Posix compliance

This is a feature available in all major POSIX-compatible OSs (even Windows) but it is not standardised by POSIX.

Relevant excerpts from man pages (linked above):

  • Linux open O_DIRECT:

Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT. See NOTES below for further discussion.

A semantically similar (but deprecated) interface for block devices is described in raw(8).

  • FreeBSD open O_DIRECT:

O_DIRECT may be used to minimize or eliminate the cache effects of reading and writing. The system will attempt to avoid caching the data you read or write. If it cannot avoid caching the data, it will minimize the impact the data has on the cache. Use of this flag can drastically reduce performance if not used with care.

  • OSX fcntl F_NOCACHE:

Turns data caching off/on. A non-zero value in arg turns data caching off. A value of zero in arg turns data caching on.

@hasufell
Member

Sounds reasonable to me. Were you preparing a PR?

@dcoutts
Contributor Author

dcoutts commented May 16, 2024

Yes, we would intend to prepare a PR.

@dcoutts
Contributor Author

dcoutts commented May 24, 2024

New information: it turns out that the platforms that do support direct I/O or an equivalent can all set it via fcntl. It doesn't have to be set at file open time.

In particular, Linux, FreeBSD and NetBSD all support setting O_DIRECT via fcntl, and as noted above, OSX only supports it via fcntl and not file open.

So this may well be the better way to go, to add something like this to the System.Posix.Fcntl module:

fileCaching :: Fd -> Bool -> IO ()

Again, it would be a no-op on platforms that do not support such hints (e.g. Solaris, OpenBSD).
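The per-platform behaviour described in this comment can be summarised as a small dispatch table. This is only a model of the semantics, not an implementation: a real version in the library would presumably select the mechanism at compile time with CPP and make the fcntl call, whereas this sketch inspects `System.Info.os` at runtime. The type `CachingMechanism`, the function `mechanismFor`, and the OS name strings matched here are assumptions made for illustration.

```haskell
import System.Info (os)

-- How a caching hint could be applied to an already-open descriptor.
data CachingMechanism
  = FcntlODirect -- set or clear O_DIRECT through fcntl (Linux, FreeBSD, NetBSD)
  | FcntlNoCache -- fcntl F_NOCACHE (OSX)
  | NoMechanism  -- no such hint available: the call is a no-op
  deriving (Eq, Show)

-- Map an OS name, in the style reported by System.Info.os, to a mechanism.
mechanismFor :: String -> CachingMechanism
mechanismFor "linux" = FcntlODirect
mechanismFor "freebsd" = FcntlODirect
mechanismFor "netbsd" = FcntlODirect
mechanismFor "darwin" = FcntlNoCache
mechanismFor _ = NoMechanism -- e.g. Solaris, OpenBSD

main :: IO ()
main = putStrLn (os ++ ": " ++ show (mechanismFor os))
```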

@dcoutts
Contributor Author

dcoutts commented May 24, 2024

Or maybe fileSetCaching / fileGetCaching. Names are hard.

The general pattern of fcntl is get/set. The two existing functions in the module don't follow the get/set pattern, but they also do not actually use fcntl.

Advice and opinions welcome.

@dcoutts
Contributor Author

dcoutts commented May 24, 2024

Further update: OSX actually does not support a way to get the caching mode, only a way to set it.

So a portable API would be just fileSetCaching with no fileGetCaching. Or other alternative names (fileSetNoCaching, fileNoCaching).

And the CI test would be just: does the call not throw an exception (due to the syscall not returning -1). So no (portable) ability to test that what we get is the value we set.

This arguably makes sense for a portable API anyway, given that it's supposed to be a no-op on platforms where it's not supported, and on non-supporting platforms there is no such state to get.

@hasufell
Member

> So a portable API would be just fileSetCaching with no fileGetCaching.

We do have a number of APIs that are sort of platform specific. The pattern is:

#if !defined(HAVE_EXECV)
{-# WARNING executeFile
"operation will throw 'IOError' \"unsupported operation\" (CPP guard: @#if HAVE_EXECV@)" #-}
executeFile _ _ _ _ = ioError (ioeSetLocation unsupportedOperation "executeFile")
#else

So I don't see a problem with adding fileGetCaching too.
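Applied to this proposal, the same pattern might look like the sketch below. It shows only the unsupported-platform branch. The macro name `HAVE_FILE_GET_CACHING` and the signature `Fd -> IO Bool` are assumptions made for illustration, and the `#else` branch holding a real fcntl-based implementation is omitted.

```haskell
{-# LANGUAGE CPP #-}
module Main where

import Control.Exception (try)
import GHC.IO.Exception (unsupportedOperation)
import System.IO.Error (ioeSetLocation)
import System.Posix.Types (Fd (..))

-- HAVE_FILE_GET_CACHING is a hypothetical configure-time macro.
-- It is not defined here, so the fallback below is what gets compiled.
#if !defined(HAVE_FILE_GET_CACHING)
{-# WARNING fileGetCaching
    "operation will throw IOError \"unsupported operation\" (CPP guard: @#if HAVE_FILE_GET_CACHING@)" #-}
fileGetCaching :: Fd -> IO Bool
fileGetCaching _ = ioError (ioeSetLocation unsupportedOperation "fileGetCaching")
#endif

main :: IO ()
main = do
  -- Callers on an unsupported platform see an IOError instead of a value.
  r <- try (fileGetCaching (Fd 0)) :: IO (Either IOError Bool)
  putStrLn (either (const "fileGetCaching: not supported on this platform") show r)
```

With this shape, fileSetCaching can stay a silent no-op where unsupported, while fileGetCaching reports clearly that there is no state to read.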
