Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Route requests for missing static files using remote asset metadata #1417

Merged
merged 18 commits into from
May 30, 2024

Conversation

brandonpayton
Copy link
Member

@brandonpayton brandonpayton commented May 17, 2024

What is this PR doing?

This PR:

  • Includes a list of remote asset paths with minified WP builds
  • Updates request handling to only handle missing files as static when they are explicitly listed as remote assets
    (perhaps this should be split into multiple PRs)
  • Updates request routing to behave more like a regular web server

Related to #1365 in that we need better information to judge whether to handle a request for a missing static file as PHP.

What problem is it solving?

Prior to this PR, we could not easily tell whether we should request a missing static asset from the web or delegate the request to PHP.

How is the problem addressed?

By including a list of remote asset paths with minified WP builds, we can judge which missing static file paths represent remote assets and which can be delegated to PHP.

Testing Instructions

  • CI tests
  • Examine a minified WP zip and the contents of its playground-remote-asset-paths file
  • Test manually with npm run dev and observe that Playground loads normally with no unexpected errors in the console

@brandonpayton brandonpayton added [Type] Bug An existing feature does not function as intended [Feature] PHP.wasm [Package][@php-wasm] Universal labels May 17, 2024
@brandonpayton brandonpayton self-assigned this May 17, 2024
@brandonpayton
Copy link
Member Author

Apparently, a commit hook auto-committed a bunch of other changes that I had left outstanding. I will fix this and force push shortly.

@brandonpayton
Copy link
Member Author

Apparently, a commit hook auto-committed a bunch of other changes that I had left outstanding. I will fix this and force push shortly.

No, I think that was me with an errant git add -u. I thought I'd seen some automated behavior like this in the past, but maybe I was wrong then as well. 🙃

@brandonpayton
Copy link
Member Author

The minified WP builds need to be updated as part of this PR. I had some trouble rebuilding them locally because the command in the README didn't produce updated zips.

I ended up running the following commands to rebuilds the zips:

  • npx nx run playground-wordpress-builds:bundle-wordpress:major-and-beta
  • npx nx run playground-wordpress-builds:bundle-wordpress:nightly

This didn't update the zip for WP 6.1. It looks like we're still showing that as an option on playground.wordpress.net, but we only build the 3 previous major versions. Do we need to manually remove WP 6.1?

}

#seemsLikeAPHPFile(path: string) {
return path.endsWith('.php') || path.includes('.php/');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should .php/ also check for endsWith? This would prevent paths like /test.php/index.js from being executed.

Suggested change
return path.endsWith('.php') || path.includes('.php/');
return path.endsWith('.php') || path.endsWith('.php/');

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that check is wrong, but I was leaving it for a follow up PR because it is fixing a different problem from this PR. Originally, I'd actually removed that check altogether, which is also the wrong thing to do.

I have a note to make a different PR for this and will make sure the something.php/ case is tested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a note to make a different PR for this and will make sure the something.php/ case is tested.

Nevermind, we can go ahead and refactor everything here.

Comment on lines 187 to 206
// TODO: Move file to /internal?
// TODO: This feels like some kind of metadata we mentioned when discussing boot protocol
const remoteAssetListPath = `${requestHandler.documentRoot}/playground-remote-asset-paths`;
if (primaryPhp.fileExists(remoteAssetListPath)) {
const remoteAssetPaths = primaryPhp
.readFileAsText(remoteAssetListPath)
.split('\n');
requestHandler.addRemoteAssetPaths(remoteAssetPaths);
// TODO: Delete the listing file?
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes to all TODOs. It's a custom file that's not part of WP and we should keep it in internal.

Deleting the file should help with preventing issues when the WP structure is updated, for example after a WP update, or version switch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes to all TODOs. It's a custom file that's not part of WP and we should keep it in internal

Sounds good!

# List assets that will be available to remotely so Playground can judge whether
# to request a remote asset or delegate the request for a missing file to PHP.
# TODO: What is a better name for this file?
RUN find wordpress-static -type f | sed 's#^wordpress-static/##'> playground-remote-asset-paths
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe wordpress-remote-asset-paths?

Suggested change
RUN find wordpress-static -type f | sed 's#^wordpress-static/##'> playground-remote-asset-paths
RUN find wordpress-static -type f | sed 's#^wordpress-static/##'> wordpress-remote-asset-paths

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But playground-remote-asset-paths also works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 The main point was to communicate that this is a special setup for minified WP builds, and it seemed a playground-specific thing to me. If I think about someone listing the files in one of the zips, I think it is clearer to see "playground-" than "wordpress-" when they ask "What is this thing?".

I was on the fence about this, but after talking it out, I think I'll leave it alone unless someone else feels strongly otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I save this comment and then think the opposite. Within playground, I would rather read /internal/wordpress-remote-asset-paths, so let's go with that.

Copy link
Collaborator

@adamziel adamziel May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this is only useful on the web and only for some WordPress builds but not for others. Also, the "active" WordPress build may change in runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so it sounds like it belongs with some kind of Wordpress-associated metadata.

I'll start by acting as if WP cannot be swapped out and address the request handling behavior and then address the possibility of WP switching, all within the scope of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "active" WordPress build may change in runtime.

@adamziel it looks like the web interface reloads the page every time the config changes. Where can the "active" WordPress build change anytime during runtime? Is this something that can happen with wp-now without reloading everything? (Planning to look further but wanted to get the question out there)

@@ -313,7 +316,7 @@ export class PHPRequestHandler<PHP extends BasePHP> {
this.rewriteRules
);
const fsPath = joinPaths(this.#DOCROOT, normalizedRequestedPath);
if (!seemsLikeAPHPRequestHandlerPath(fsPath)) {
if (!(await this.#seemsLikeAPHPRequestHandlerPath(fsPath))) {
Copy link
Collaborator

@adamziel adamziel May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point I'd refactor the routing logic completely. There's no more need for "seems" functions – we can resolve the URL to something like { "filePath": string, "isRemote": boolean } with 100% certainty. Then we'd test whether filePath is a file with .php extension. A requested path like /index.php/a/b/c could mean different things, e.g. I'm requesting a static file called "c" that's inside a directory tree like $DOCROOT/index.php/a/b/, or I'm requesting index.php and /a/b/c is my "nice URL". Let's do what a web server would do here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do a full refactor let's also take this use case into account. , I started working on it, but lacked the metadata to complete it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do what a web server would do here.

Right. Let's get it done. I'll work on that.

If we do a full refactor let's also take this use case into account.

Got it. Hopefully that will just naturally work after the refactor, but we can test it.

@brandonpayton
Copy link
Member Author

Here is a rough outline of how I think the handling should work:

let fsPath = joinPaths(this.#DOCROOT, normalizedRequestedPath);

// Simply handle PHP as PHP
if (fsPath.endsWith('.php')) {
	if (primaryPhp.isDir(fsPath)) {
		fsPath = joinPaths(fsPath, 'index.php');
	}

	if (primaryPhp.fileExists(fsPath)) {
		// TODO: Serve as PHP
	} else {
		// TODO: 404
	}
}

// Try as regular file
if (primaryPhp.fileExists(fsPath) && primaryPhp.isFile(fsPath)) {
	// TODO: Serve static file
} else if (this.#remoteAssetPaths.has(fsPath)) {
	// TODO: Request and serve static file
}

// Try as directory with default index.php
const localDefaultFile = joinPaths(fsPath, 'index.php');
if (
	primaryPhp.isDir(fsPath) &&
	primaryPhp.fileExists(localDefaultFile) &&
	primaryPhp.isFile(localDefaultFile)
) {
	// TODO: Serve as PHP
}

// TODO: Delegate request to WordPress /index.php, retaining query args and relaying original request URI

I still need to work through:

  • how to actually dispatch PHP, delegating to WordPress /index.php while relaying original request URI and query args
  • whether we want a different interface for identifying remote assets in PHPRequestHandler
  • flagging remote assets for retrieval

@adamziel
Copy link
Collaborator

It may require going left to right segment by segment until you reach a file, e.g. in /wp-content/index.php/a/b it would enter wp-content, find the index.php file, and stop resolving there. However, if index.php was a directory, it would go in and look for a node called a.

@brandonpayton
Copy link
Member Author

It may require going left to right segment by segment until you reach a file, e.g. in /wp-content/index.php/a/b it would enter wp-content, find the index.php file, and stop resolving there. However, if index.php was a directory, it would go in and look for a node called a.

I have been working based on this bit of Nginx config which I do not believe does anything as complicated as that:
https://github.com/WordPress/Advanced-administration-handbook/blob/0f491e44c719abdf73989b278fc2bec8e4a4d03a/server/nginx.md#L162-L173

    location / {
        # This is cool because no php is touched for static content.
        # include the "?$args" part so non-default permalinks doesn't break when using query string
        try_files $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        #NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini
        include fastcgi.conf;
        fastcgi_intercept_errors on;
        fastcgi_pass php;
    }

That is not to say we cannot do something more involved. But, as an example, WP Cloud doesn't directly run /test.php for the path /test.php/a/b. But it does delegate the request for /test.php/a/b to WordPress, which is what I believe yesterday's sample logic would do.

@brandonpayton
Copy link
Member Author

Hmm, unlike yesterday's sample logic, WP Cloud does delegate requests for missing PHP files to WordPress.

I don't think the WP.org's example nginx config would do that though for .php files. I plan to test this later.

@brandonpayton
Copy link
Member Author

brandonpayton commented May 21, 2024

I updated the routing logic in 3f69fdc to do roughly the following:

if dir URI:
    set PATH to default PHP file within that dir

if file has .php extension
    if file exists at PATH
        return PHP response
else
    if file exists at PATH
        return static file response
    if PATH is known remote asset
        return indication that path describes a remote asset

delegate all unhandled requests to WordPress /index.php

This behavior is close to this bit of the WordPress.org nginx example

    location / {
        # This is cool because no php is touched for static content.
        # include the "?$args" part so non-default permalinks doesn't break when using query string
        try_files $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        #NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini
        include fastcgi.conf;
        fastcgi_intercept_errors on;
        fastcgi_pass php;
    }

But the implementation in this PR differs from the above config, unless I'm mistaken, because non-existent PHP files are delegated to WordPress. (In the config above, the location ~\.php block is given priority by nginx because it uses a regex, so missing.php should match that location and not match the location block where it tries paths and falls back to /index.php.) In my tests, WP Cloud does delegate missing PHP files to WordPress. This is more flexible behavior than failing outright, so I've adopted it here.

@adamziel I am not familiar with serving URLs like /something/index.php/a/b by running /something/index.php, at least not for WordPress. Is this needed?

So far, there are two main things left to do in this PR:

  • Review existing tests and explicitly test all cases of this routing logic.
  • Fix PR to handle switching WordPress versions.

@brandonpayton
Copy link
Member Author

One more note:
With this updated routing logic, all existing tests pass, and Playground seems to load without any issues.

@brandonpayton
Copy link
Member Author

Regarding sitemap.xml:

  • Pretty permalinks must be enabled for things like sitemap.xml to be provided by WordPress
  • Prior to these changes, requesting sitemap.xml through the top-level Playground web UI would 404 and show a blank page with an error logged to the console
  • With these changes and pretty permalinks enabled, it delegates the request to WordPress and renders the response

Screenshot 2024-05-21 at 1 56 14 PM

@brandonpayton
Copy link
Member Author

Regarding sitemap.xml:

Pretty permalinks must be enabled for things like sitemap.xml to be provided by WordPress

One thing I'm wondering:
AFAICT, the only way that WordPress actually returns a 404 status when handling a random, nonexistent request URI is to have pretty permalinks enabled. When those are enabled, WP responds with a pretty 404 page and an actual 404 status.

When plain permalinks are selected and WordPress is handling a random, non-existent request URI, it responds with the home page and a 200, all without a redirect.

@adamziel @bgrgicak would it make sense to default WordPress within Playground to use pretty permalinks? Otherwise, all requests for non-existent files that we dedicate to WordPress will receive default homepage HTML and a 200 response status. It seems safer and clearer to actually have WordPress respond with 404s.

@brandonpayton
Copy link
Member Author

Given #1454 "Add missing trailing slash on landingPage wp-admin path", I think we should probably be responding with a redirect when a URI points to a directory without a trailing slash.

It is no longer needed because request handler auto-redirects
to trailing slash paths when the path points to an existing
directory.
@brandonpayton
Copy link
Member Author

Given #1454 "Add missing trailing slash on landingPage wp-admin path", I think we should probably be responding with a redirect when a URI points to a directory without a trailing slash.

I added logic to redirect to add a trailing slash when an existing directory is requested without one. And the tests are now updated to cover the refactored routing logic.

The main thing remaining is to understand how the active WordPress build can change at runtime (as @adamziel mentioned) and accommodate that in how we maintain the collection of known remote asset paths.

I guess the active WordPress build can change as simply as the filesystem can be changed over time. But are there other ways the entire WordPress version can change? Can multiple scopes use the same PHP request handler? These are the remaining questions for this PR.

cc @bgrgicak

@bgrgicak
Copy link
Collaborator

@adamziel @bgrgicak would it make sense to default WordPress within Playground to use pretty permalinks? Otherwise, all requests for non-existent files that we dedicate to WordPress will receive default homepage HTML and a 200 response status. It seems safer and clearer to actually have WordPress respond with 404s.

I would prefer to stick to the default WP behavior for permalink structures.

Does WP on Apache/NGINX also return 200?

@brandonpayton
Copy link
Member Author

brandonpayton commented May 23, 2024

Does WP on Apache/NGINX also return 200?

Yes, this is my experience on WP Cloud as well when no permalink structure ("plain" permalinks) is selected.

I would prefer to stick to the default WP behavior for permalink structures.

It turns out pretty permalinks are the default during WP install:
https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-admin/includes/upgrade.php#L471

That is good because features like sitemap.xml generation appear to depend on them being enabled.

Unfortunately, WP in Playground does not default to pretty permalinks, and I don't yet know why.

WP install does issue a request against the site to check whether pretty permalinks can be enabled by default:
https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-admin/includes/upgrade.php#L516-L522

I think that would require the changes in this PR to work. But it doesn't default to pretty permalinks with these changes either. Perhaps we're not running WP install yet or it runs differently than usual.

I made a note to file an issue for this if it isn't resolved in this PR.

@@ -24,7 +24,7 @@ describe('Blueprint step installPlugin', () => {
it('should log the user in', async () => {
await login(php, {});
const response = await requestHandler.request({
url: '/wp-admin',
url: '/wp-admin/',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the test because the handler now returns a redirect when the request URI identifies an existing dir without including a trailing slash.

@@ -49,14 +49,6 @@ export async function setupPlatformLevelMuPlugins(php: UniversalPHP) {
await php.writeFile(
'/internal/shared/mu-plugins/0-playground.php',
`<?php
// Redirect /wp-admin to /wp-admin/
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should no longer be necessary since the request handler now redirects to add a trailing slash for existing dirs.

@brandonpayton
Copy link
Member Author

I merged the latest from trunk and adapted these changes to work within the new bootWordPress() function. If I understand that function correctly, we get a new PHPRequestHandler each time it is called, so it seems like that should be enough to update the list of remote assets each time a new WordPress build is selected. If a build doesn't include a list of remote assets, we don't attempt to read it and assume all assets are included in the zip.

I believe this is ready for review.

@brandonpayton brandonpayton marked this pull request as ready for review May 24, 2024 04:37
@brandonpayton brandonpayton requested a review from a team as a code owner May 24, 2024 04:37
Copy link
Collaborator

@bgrgicak bgrgicak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I try to open an unexisting static file like /sitemap.xml this PR will redirect to /sitemap.xml/ while trunk will return 404.

Ideally static files that don't exist should execute/index.php, but the path should stay unchanged.

Comment on lines 334 to 336
if (primaryPhp.isFile(localDefaultPath)) {
fsPath = localDefaultPath;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for index.html if index.php doesn't exist?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a good idea. I don't know of a reason we wouldn't want to serve index.html when there is no index.php.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in f655598

@brandonpayton
Copy link
Member Author

If I try to open an unexisting static file like /sitemap.xml this PR will redirect to /sitemap.xml/ while trunk will return 404.

Ideally static files that don't exist should execute/index.php, but the path should stay unchanged.

This appears to be default WP behavior when plain permalinks are selected. The same occurs when testing from the WP codebase using npm run env:start.

Copy link
Collaborator

@adamziel adamziel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a nitpick that can be either addressed here or in a follow-up PR alongside removing the /wp-admin redirect from platform mu-plugins. Otherwise this looks good. Thank you!


// We can only satisfy requests for directories with a default file
// so let's first resolve to a default path when available.
if (primaryPhp.isDir(fsPath)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a more extensive note explaining how this is default WP behavior - it's not obvious this logic is the right one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a more extensive note explaining how this is default WP behavior - it's not obvious this logic is the right one.

Good point! Added this:

// Ensure directory URIs have a trailing slash. Otherwise,
// relative URIs in index.php or index.html files are relative
// to the next directory up.
//
// Example:
// For a request to "/wp-admin", the relative link "edit.php"
// resolves to "/edit.php" rather than "/wp-admin/edit.php".
//
// This is correct behavior for the browser:
// https://www.rfc-editor.org/rfc/rfc3986#section-5.2.3
//
// But the intent for `/wp-admin/index.php` is that its relative
// URIs are relative to `/wp-admin/`.
//
// In fact, WordPress also redirects like this when given a chance.
// - https://github.com/WordPress/wordpress-develop/blob/b6a3b9c7d1ce33cbeca6f95871a26c48141e524b/src/wp-includes/canonical.php#L696
// - https://github.com/WordPress/wordpress-develop/blob/b6a3b9c7d1ce33cbeca6f95871a26c48141e524b/src/wp-includes/canonical.php#L1036-L1045
// - https://github.com/WordPress/wordpress-develop/blob/b6a3b9c7d1ce33cbeca6f95871a26c48141e524b/src/wp-includes/link-template.php#L3558

@brandonpayton brandonpayton merged commit 36b126e into trunk May 30, 2024
5 checks passed
@brandonpayton brandonpayton deleted the list-remote-static-files branch May 30, 2024 22:47
brandonpayton added a commit that referenced this pull request May 30, 2024
brandonpayton added a commit that referenced this pull request May 30, 2024
…tadata (#1417)" (#1474)

## What is this PR doing?

This reverts commit 36b126e.

## What problem is it solving?

When using browser storage and reloading Playground after an initial
load, WordPress renders as if it cannot find remote CSS assets.

![Screenshot 2024-05-31 at 1 16
58 AM](https://github.com/WordPress/wordpress-playground/assets/530877/50517355-9e80-4379-8593-00ef48a218a9)

The issue should be something small, but I am reverting until we can fix
it.
brandonpayton added a commit that referenced this pull request Jun 6, 2024
…1417)

This PR:
- Includes a list of remote asset paths with minified WP builds
- Updates request handling to only handle missing files as static when
they are explicitly listed as remote assets
(perhaps this should be split into multiple PRs)
- Updates request routing to behave more like a regular web server

Related to #1365 in that we need better information to judge whether to
handle a request for a missing static file as PHP.

Prior to this PR, we could not easily tell whether we should request a
missing static asset from the web or delegate the request to PHP.

By including a list of remote asset paths with minified WP builds, we
can judge which missing static file paths represent remote assets and
which can be delegated to PHP.

- CI tests
- Examine a minified WP zip and the contents of its
wordpress-remote-asset-paths file
- Test manually with `npm run dev` and observe that Playground loads
normally with no unexpected errors in the console
brandonpayton added a commit that referenced this pull request Jun 6, 2024
…1417)

This PR:
- Includes a list of remote asset paths with minified WP builds
- Updates request handling to only handle missing files as static when
they are explicitly listed as remote assets
(perhaps this should be split into multiple PRs)
- Updates request routing to behave more like a regular web server

Related to #1365 in that we need better information to judge whether to
handle a request for a missing static file as PHP.

Prior to this PR, we could not easily tell whether we should request a
missing static asset from the web or delegate the request to PHP.

By including a list of remote asset paths with minified WP builds, we
can judge which missing static file paths represent remote assets and
which can be delegated to PHP.

- CI tests
- Examine a minified WP zip and the contents of its
wordpress-remote-asset-paths file
- Test manually with `npm run dev` and observe that Playground loads
normally with no unexpected errors in the console
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] PHP.wasm [Package][@php-wasm] Universal [Type] Bug An existing feature does not function as intended
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants