Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sys_get_temp_dir() instead of vendor-dir as temp dir for downloading and extracting #10361

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Sarke
Copy link

@Sarke Sarke commented Dec 13, 2021

So this is because I scratched my head several times regarding #9627, #9945, etc. This is a small merge that solved the problem.

I'm using Docker, inside Virtualbox, on a Windows host, with Dropbox syncing the shared folder going across all. Yes, not ideal, but the issue of the shared folder not keeping up, be it a VM shared folder, a network drive, or whatever.

The above mentioned bugs are marked fixed, but still some people are having issues (incl. myself). The fix was to add a short delay for the filesystem operations to allow it to catch up. This is quite arbitrary because the length of the required delay will vary from system to system.

But the problem is one layer deeper: why is Composer using the vendor dir as a temp dir? This is really what causes the problem: it will download a temp file, extract the temp file to a temp dir, delete the temp file, and move the temp dir to the actual install path.

Since many people are having issues due to the above mentioned scenarios (VMs, NFS drives, etc), this is not ideal. Composer uses sys_get_temp_dir() in other situations, so I am unsure of why it wasn't used for the downloader.

So, I put sys_get_temp_dir() in two places that fixes this. I am new to the Composer code so I'm sure this merge can use some tweaks.

One thought is why does FileDownloader not use the cache dir?

Another is that I only made the changes that affect me, but I see there are several different archives supported so some of them could be changed to use the system tmp dir as well. Perhaps a central temp dir helper would be useful, and this would also allow this to more easily be a configurable option (if so desired).

Unrelated, but I found setting up the test environment a bit of a hassle with the 100+ warning about deprecations in PHPUnit 10 on PHP 8.1. Perhaps a Docker test environment could be made available to ease the entry for potential contributors? Looking at the github CI files it looks like PHP 7.3 is the preferred version to test on?

@stof
Copy link
Contributor

stof commented Dec 13, 2021

This caused issues in the past, when the temp directory and the vendor directory are not on the same drive, when attempting to move the directory.

Unrelated, but I found setting up the test environment a bit of a hassle with the 100+ warning about deprecations in PHPUnit 10 on PHP 8.1.

PHPUnit 10 is not released yet, so I doubt that this is the case.

@GromNaN
Copy link
Contributor

GromNaN commented Dec 13, 2021

If place for the temp dir is different from one system to an other, and that can't be detected automatically, what about a new config?

@Sarke
Copy link
Author

Sarke commented Dec 13, 2021

This caused issues in the past, when the temp directory and the vendor directory are not on the same drive, when attempting to move the directory.

What kind of issues? I didn't see any tests for this.

Also, why not extract to the desired path right away, instead of extracting to a temp dir and then moving?

Unrelated, but I found setting up the test environment a bit of a hassle with the 100+ warning about deprecations in PHPUnit 10 on PHP 8.1.

PHPUnit 10 is not released yet, so I doubt that this is the case.

This is on the latest stable (v9.5), and that's why they are warnings and not fatal errors:

117) Composer\Test\Util\PerforceTest::testCheckServerClientError
The at() matcher has been deprecated. It will be removed in PHPUnit 10. Please refactor your test to not rely on the order in which methods are invoked.

118) Composer\Test\Util\RemoteFilesystemTest::testCopyWithSuccessOnRetry
The at() matcher has been deprecated. It will be removed in PHPUnit 10. Please refactor your test to not rely on the order in which methods are invoked.

119) Composer\Test\Util\RemoteFilesystemTest::testBitBucketPublicDownloadWithAuthConfigured with data set #0 ('https://bitbucket.org/seldaek...me.txt', '1234')
The at() matcher has been deprecated. It will be removed in PHPUnit 10. Please refactor your test to not rely on the order in which methods are invoked.

@Sarke
Copy link
Author

Sarke commented Dec 13, 2021

If place for the temp dir is different from one system to an other, and that can't be detected automatically, what about a new config?

But this is set in the PHP config, so Composer doesn't need to determine where this is.

@Sarke
Copy link
Author

Sarke commented Dec 14, 2021

One thought is why does FileDownloader not use the cache dir?

To expand on this, we could just download the file straight to the cache dir and extract it straight to the final dir in the vendor dir (without moving). This would remove the need for a temp dir in both cases, and fix the filesystem delay issues.

@stof
Copy link
Contributor

stof commented Dec 14, 2021

We cannot extract directly in the right location. For instance, the zip archive generated by GitHub (which are the majority of zips downloaded by composer, as the majority of packages registered on packagist.org are on GitHub) have a top-level folder wrapping the content of the repo. So we need to extract the archive into a temp location and then move the subfolder to the final location.

@ktomk
Copy link
Contributor

ktomk commented Dec 14, 2021

This caused issues in the past, when the temp directory and the vendor directory are not on the same drive, when attempting to move the directory.

Yes, on the level of move this makes sense (it may not be possible to move files across different file-systems). But on the level of copy (not move) it should not cause any issues. As the directory is temporary, a copy operation looks more correct in my eyes. The temporary store can be lightened up during the copy transaction or afterwards. I'm aware this might not be as simple as it sounds on paper.

We cannot extract directly in the right location. For instance, the zip archive generated by GitHub (which are the majority of zips downloaded by composer, as the majority of packages registered on packagist.org are on GitHub) have a top-level folder wrapping the content of the repo. So we need to extract the archive into a temp location and then move the subfolder to the final location.

Sounds reasonable but still thought most unzipping utility should have that functionality without dumping into a temporary directory. If it helps I'd say perhaps here is one point where an improvement can be achieved: if it helps removing the temporary location from the equation overall. I can imagine this depends on different zip adapters (unzip(1), ext-zip etc.) so again might not be as simple as it sounds, just saying.

@Sarke
Copy link
Author

Sarke commented Dec 17, 2021

So then, how about an option to set the tmp folder used for extracting?

@Seldaek
Copy link
Member

Seldaek commented Dec 19, 2021

But the problem is one layer deeper: why is Composer using the vendor dir as a temp dir? This is really what causes the problem: it will download a temp file, extract the temp file to a temp dir, delete the temp file, and move the temp dir to the actual install path.

Just to correct this statement, what really causes this problem is VirtualBox and co shipping broken filesystem drivers which do not allow you to read a file right after writing it. Composer is merely trying to work around these issues.

Sounds reasonable but still thought most unzipping utility should have that functionality without dumping into a temporary directory. If it helps I'd say perhaps here is one point where an improvement can be achieved: if it helps removing the temporary location from the equation overall. I can imagine this depends on different zip adapters (unzip(1), ext-zip etc.) so again might not be as simple as it sounds, just saying.

It seems like unzip does not allow you to do this at least. Using ZipArchive would work but that's also much slower.


I am overall not so sure what to do here. What we have works as long as the filesystem isn't borked.. @Sarke have you tried setting COMPOSER_RUNTIME_ENV=virtualbox in your env to make sure the virtualbox workarounds kick-in?

@ktomk
Copy link
Contributor

ktomk commented Dec 19, 2021

I'm also a bit puzzled and still wondering.

And if it helps and this patch with sys_get_tempdir() works, then having the complete install outside of the project tree should do as well. Just move after install, working example with existing options:

$ COMPOSER_VENDOR_DIR="$HOME/tmp/composer-vendor" composer install
Installing dependencies from lock file (including require-dev)
...
$ rm -rf vendor
$ mv -vi -- "$HOME/tmp/composer-vendor" vendor

Just tried it with composer in a development setup, this is a working example on linux, e.g. you should be able to test it within the guest.

I could imagine such a procedure can be made more portable with a composer script and then you have it also directly in the project configuration (thanks to @putenv support in scripts). See as well COMPOSER_VENDOR_DIR and <vendor-dir>.

Same procedure can be applied for COMPOSER_RUNTIME_ENV=virtualbox, HTH.

@Seldaek
Copy link
Member

Seldaek commented Dec 19, 2021

Installing outside the project tree isn't as simple as it looks, with things like composer/installers, packages get installed outside the vendor dir, blended with existing project files etc. It's easy to come up with happy path solutions but those don't cover all cases.

@Sarke
Copy link
Author

Sarke commented Dec 19, 2021

Just to correct this statement, what really causes this problem is VirtualBox and co shipping broken filesystem drivers which do not allow you to read a file right after writing it. Composer is merely trying to work around these issues.

Someone mentioned they had trouble with a NFS volume as well. Isn't the issue that any filesystem with enough lag will have problems?

Also, the cause can be out of your hands, but the reason for using the vendor dir as a temp dir is not.

What we have works as long as the filesystem isn't borked.. @Sarke have you tried setting COMPOSER_RUNTIME_ENV=virtualbox in your env to make sure the virtualbox workarounds kick-in?

Yes, and I linked the issue detailing that solution. I also mentioned that the delay is arbitrary and will change from system to system; so how do we know if the delay is long enough?

It seems like unzip does not allow you to do this at least. Using ZipArchive would work but that's also much slower.

Perhaps an option to set the desired unziper would be enough?

@Seldaek
Copy link
Member

Seldaek commented Dec 19, 2021

One question is whether forcing ZipArchive usage isn't enough to fix it. My understanding (which is really just stabbing-in-the-dark understanding) is that it is not only a timing issue but one of forking a new unzip process doing the extraction of the files, and then the files aren't available to the parent process for a while.. But I would imagine if the PHP process does the file writing it might be directly available to it. Otherwise I can't see how these filesystems could be used in any real world conditions.

As I don't have a setup to repro this, it'd be great if you can try it @Sarke. Essentially a patch would look like:

if (!function_exists('proc_open')) {

Turned into:

        if (!function_exists('proc_open') || Platform::getEnv('COMPOSER_RUNTIME_ENV') === 'virtualbox') {

Then set the env and see if it helps?

@Sarke
Copy link
Author

Sarke commented Dec 19, 2021

Sure, I will try in a bit.

Of note though, it's not just the files being available after reading, but lag when deleting as well. If trying to delete a file that was just written, it will error.

@Seldaek
Copy link
Member

Seldaek commented Dec 19, 2021

Right, the Composer\Util\Filesystem::removeDirectory has a removeDirectoryPhp fallback that could be used too if using external processes is an issue, this is similar as it uses rm to do the job faster.

@ktomk
Copy link
Contributor

ktomk commented Dec 19, 2021

Installing outside the project tree isn't as simple as it looks, with things like composer/installers, packages get installed outside the vendor dir, blended with existing project files etc. It's easy to come up with happy path solutions but those don't cover all cases.

Yes, the working directory remains unchanged. Therefore those installers and plugins that blend into it run into the same issue as composer install before when extracting into the projects' vendor subdirectory. So just changing the temporary in-vendor download and extraction directory won't help for those, it's just shifting the problem a bit around.

Someone mentioned they had trouble with a NFS volume as well. Isn't the issue that any filesystem with enough lag will have problems?

Yes, the general problem could be described as lag or that the file-system is out of sync. These configurations easily cause problems with various utilities, process managers and (application) servers. Sometimes reloading the configuration already helps, sometimes you need to restart the VM because the system can not recover on its own. Depends on the file-system activity and host/guest operating system. I once had a throughput optimized file system removing some barriers on a Linux and when doing a composer install with a magento or spryker project using virtual box, the host systems kernel panicked due to the file-system activity during install. lessons learned ;). So just saying, perhaps seeing lag sometimes is not that bad compared with trashing the VM host.

And if this thread finds some conclusions turning such setups into slower but correct is perhaps the best way to deal with it.

Thinking about this and as the file-system activity is high for download and extracting an option <tmp-vendor-dir> with a default of <vendor-dir>/composer/ that a user can set to $TMP/composer.XXXX/ or similar for an install could remove the i/o pressure from such shared folders setups - both on host or guest systems.

If the moving fails - or anything from the reasons to have it inside vendor in the first place - you can at least decide which death you'd like to die ;)

@Sarke
Copy link
Author

Sarke commented Dec 19, 2021

It's a bit finicky to test because it doesn't always error.

When I made the suggested change, it errored on

public function ensureDirectoryExists($directory)
{
if (!is_dir($directory)) {
if (file_exists($directory)) {
throw new \RuntimeException(
$directory.' exists and is not a directory.'
);
}
if (!@mkdir($directory, 0777, true)) {
throw new \RuntimeException(
$directory.' does not exist and could not be created.'

Exception trace:
 () at composer/src/Composer/Util/Filesystem.php:270
 Composer\Util\Filesystem->ensureDirectoryExists() at composer/src/Composer/Util/Filesystem.php:377
 Composer\Util\Filesystem->copy() at composer/src/Composer/Util/Filesystem.php:352
 Composer\Util\Filesystem->copyThenRemove() at composer/src/Composer/Util/Filesystem.php:438
 Composer\Util\Filesystem->rename() at composer/src/Composer/Downloader/ArchiveDownloader.php:193
 Composer\Downloader\ArchiveDownloader->Composer\Downloader\{closure}() at n/a:n/a
 call_user_func() at composer/vendor/react/promise/src/React/Promise/FulfilledPromise.php:20
 React\Promise\FulfilledPromise->then() at composer/src/Composer/Downloader/ArchiveDownloader.php:214
 Composer\Downloader\ArchiveDownloader->install() at composer/src/Composer/Downloader/DownloadManager.php:283

I also noticed that the unmodified Composer code errored on

// retry after a bit on windows since it tends to be touchy with mass removals
if (Platform::isWindows()) {
usleep(350000);
$deleted = @rmdir($path);
}

Since I'm using a Linux VM on a Windows host, this might be changed to this? Since it's actually a Windows filesystem, but it's not detected as such.

if (Platform::isWindows() || Platform::getEnv('COMPOSER_RUNTIME_ENV') === 'virtualbox') {

I will continue testing.

@ktomk
Copy link
Contributor

ktomk commented Dec 19, 2021

When I made the suggested change, it errored on [..]

is_dir() etc are affected by stat cache. you could try adding a call to clearstatcache() in front and see if it helps. The PHP manual page also has some more information which functions are affected etc..

@Sarke
Copy link
Author

Sarke commented Dec 19, 2021

Yes, the general problem could be described as lag or that the file-system is out of sync. These configurations easily cause problems with various utilities, process managers and (application) servers. Sometimes reloading the configuration already helps, sometimes you need to restart the VM because the system can not recover on its own. Depends on the file-system activity and host/guest operating system. I once had a throughput optimized file system removing some barriers on a Linux and when doing a composer install with a magento or spryker project using virtual box, the host systems kernel panicked due to the file-system activity during install. lessons learned ;). So just saying, perhaps seeing lag sometimes is not that bad compared with trashing the VM host.

Yeah, I was just don't want the "It's VirtualBox's fault, not on us" to be the conclusion, since filesystem lag can happen for a multitude of reasons.

And if this thread finds some conclusions turning such setups into slower but correct is perhaps the best way to deal with it.

Since there seem to be several instances where a delay and/or retry is added (like the Windows delete issue above), perhaps a more robust filesystem retry could be used in all situations? If it fails, it failed, doesn't always have to be on Windows or on VirtualBox. Detect the failure and retry after a 10ms, then 250ms, then 1s. It will only slow down if it has an issue, so why not?

Thinking about this and as the file-system activity is high for download and extracting an option <tmp-vendor-dir> with a default of <vendor-dir>/composer/ that a user can set to $TMP/composer.XXXX/ or similar for an install could remove the i/o pressure from such shared folders setups - both on host or guest systems.

If the moving fails - or anything from the reasons to have it inside vendor in the first place - you can at least decide which death you'd like to die ;)

100% agree.

@ktomk
Copy link
Contributor

ktomk commented Dec 19, 2021

Detect the failure and retry after a 10ms, then 250ms, then 1s. It will only slow down if it has an issue, so why not?

Because it is a cascading workaround, those normally cause more (and then more) problems. They also often look nice on paper but are a nightmare in practice.

I mean I'm in for all the fun but you normally don't want that in production, this needs real testing during development and continuously in the application life-cycle. Just saying, composer comes with no strings attached, you can do with it whatever you want, so feel free to wrap up an implementation ;)

Jokes aside, you see yourself how hard these timings problems are to deal with just reproducing them is a game of luck and patience. I think the pointers Jordi gave are good and as you actually can reproduce sometimes you're most likely the person who is in the position to get the best insights. If I were you I'd go on with testing and learning about what that lag is in the first place and why the processes are out of sync. Or if not shelling out is the solution in your case as well etc. .

@Sarke
Copy link
Author

Sarke commented Dec 19, 2021

I was just thinking out loud. Using the temp folder like this PR is the ideal solution for me, but it wasn't acceptable, so...

@ktomk
Copy link
Contributor

ktomk commented Dec 19, 2021

It seems like unzip does not allow you to do this at least. (ref)

Double checked and can confirm there is no such command line argument/switch for unzip(1).

However: On a system that supports symbolic links I was able to extract a zip package from Github (1.10.24.zip) into a dedicated composer/composer folder (e.g. like in the vendor dir) by creating a symbolic link to that folder with the name of the first pathname component, composer-1.10.24, pointing to a composer/composer (vendor/package) folder.

Standard unzip operation worked like a breeze, I also tested on a dated alpine container (3.3). This is how I tested.

Instead of moving/renaming - if I understood the original problem right - this is creating a symbolic link before unzip and removing it afterwards. Unzipping then directly goes into the packages subfolder in vendor.

This might not solve any fs latency issues directly but could spare to go into a temporary extract location and moving/copying from there. Does this sound right?

@Seldaek
Copy link
Member

Seldaek commented Dec 19, 2021

@Sarke I am not strictly against changing the temp dir we use, but it has to keep working for other scenarios and not break things to fix the virtualbox use case. What @stof mentioned about different windows drive is one issue, which might be detectable. I wonder if there are other cases where things could go bad. open_basedir restrictions could be a problem.. I am not sure if permissions could be problematic on linux where permissions are inherited from the project dir right now as we extract in vendor, but if extracted in /tmp and then moved over the permissions might suddenly differ.

It's all a rather complex issue touching many different environments, so it's hard to test for everything or even think of all the possible cases.

@ktomk that's an interesting hack, might be workable to do this only on linux, and keep the current extract-then-move for other platforms.

@stof
Copy link
Contributor

stof commented Dec 20, 2021

Be careful when using symlinks: this will change the relative location of the source code compare to the vendor folder, which will break cases where the code does things like require __DIR__. '/../../autoload.php' to require the autoloader (because the actual source code location will no longer be vendor/namespace/package/)

@Seldaek
Copy link
Member

Seldaek commented Dec 20, 2021

The point is to use a symlink to trick unzip into the right path, then you remove the symlink and are left with the real files in vendor/foo/bar, so this is not a concern.

@stof
Copy link
Contributor

stof commented Dec 20, 2021

That trick still requires knowing the name of the root folder, which is currently discovered by composer after unzipping.

@ktomk
Copy link
Contributor

ktomk commented Dec 20, 2021

That trick still requires knowing the name of the root folder, which is currently discovered by composer after unzipping.

I have not yet looked much into the current implementation. Albeit for that it is certainly to extend my test case with an example how to do it on command-line beforehand (I experimented with infozip mode yesterday). From what I've seen, the test-case should also handle at least one of the other unzip utilities (e.g. 7z) as the unpackager entity as far as I've seen is looking for multiple and I'd like to better understand what this means in context of the hack.

@Sarke
Copy link
Author

Sarke commented Dec 21, 2021

That trick still requires knowing the name of the root folder, which is currently discovered by composer after unzipping.

We could use ZipArchive to inspect the zip and get the root folder.

A few other options are to prefer using the 7z binary if it's installed, or prefer tar.gz or tar.bz2 archives, since they all have the "trim root folder" option.

@alikhanich
Copy link

Any news on that?

@ktomk
Copy link
Contributor

ktomk commented Feb 2, 2022

@djirik Not from my end, sorry. I could not even update the test so far, but if, you could have seen that in my fork. Thanks for your interest though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants