-
Notifications
You must be signed in to change notification settings - Fork 368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add *consistent* wildcard support #270
base: master
Are you sure you want to change the base?
Conversation
…ards to match paths with empty path segments
Let me fix the codebase, it's my bad for merging a dependabot PR (which I also had to disable due to sheer volume of noise) which bumped prettier without the concurrent re-formatting. I'll fix that now and you can rebase the changes. Aside from that, I'm less sure about the actual behavior of |
The other thing to keep in mind with Overall the most unfortunate decision in consistent behavior has been the inclusion of magic prefixes which makes all this behavior harder to follow. If it was explicit the confusion would not exist, e.g. |
Also forgot to mention in my initial review but the summary table you created is really awesome regardless of the actual output of the PR 💯 |
@blakeembrey thanks for the responses and follow-up questions.
No, I would say definitely not. In the absence of a wildcard character, a variable should always have a non-empty value, IMO. The reason it makes sense to me to allow empty path segments when a wildcard is specified is that in this case, the intention is clear: devs want to match for anything. So any valid path should match... regardless of whether or not it contains empty path segments. (This is particularly useful, fyi, in routing libraries. See, e.g., react-router or wouter for examples of how they are using the same syntax. It is quite unexpected behavior to specify a 404 for anything that doesn't match the previous routes, only to have a blank page instead of your 404 page show up because the actual url typed in by the user happened to contain an empty path segment.)
But in this case my PR is consistent with v2 thru v5, right? So wouldn't that be fixing a regression?
True, however... the behavior of the WICG PR will do a similar thing by allowing EDIT: and honestly I'm not sure the subdomain issue is much of a problem in the first place, because even though EDIT Thoughts? |
Isn’t that why you’d want the asterisk or (.*) group to make it clear? I don’t completely disagree, but it does make for some changes in other places. E.g. does path matching that makes arrays today leave in the empty holes or remove them? Does the optional delimiter also allow empty? |
Also something worth keeping in mind is whether we actually care about all these edge cases. In theory if you’re allowing repeated delimiters in one place shouldn’t we just allow it everywhere? But that makes for a more complex regex. And I’ve been of the opinion that the simple solution there is to fix it by normalizing paths at the router level before path matching. Arguably even the trailing slash should be at the router level to normalize it first. |
This could be done, however, there are a couple reasons why this would pose a major burden: (1) Most front-end routing libraries (e.g. wouter, react-router) only expose a subset of the path-to-regexp syntax, and that subset does not include custom regexps at all! 99.999% of the routing architecture use-case can be accomplished without custom regexps, so it's a bit awkward to require one solely for the most common use-case in routing.
This is the part I don't really care about too much since I'm doing my own splitting logic when necessary... and only using the path-to-regexp part of the library (usually splitting is not necessary at all since the remaining path represented by the wildcard is either delegated to a subrouter, or sent to a 404 page). I do believe that the question of "how regexp match results are split" should be an entirely separate & independent question from the standardization of the actual generated regexp. That being said, it might make sense to leave in the holes here. Or take them out, either way. My gut tells me to leave them in, that way the original path can be reconstructed by joining the match with the "/" delimiter, rather than being lossy. Since the dev did not specify an overriding pattern, we're simply making the default pattern be
No. The use-case for the optional quantifier is pretty much 100% across the board either an optional path segment or an optional affix of a single path segment. Therefore, the optional quantifier should not imply a different default pattern than that of a mandatory path segment variable. The use-cases where a dev would want to match a variable or empty string in the former case are basically non-existent, since nobody writes their routes that way (for a very good reason). And if they did, they could simply use 2 patterns: 1 for the non-empty case (e.g.
That's not a robust solution though, because sometimes empty path segments actually mean something. (E.g. https://en.wikipedia.org/wiki///). If you normalize prior to putting the URL through the router, then you are arbitrarily deleting information from the URL that might indeed be used by subrouters. (If only to simply display the correct error message to the user, e.g. "404 path suffix foo////bar was not found in the directory!") This poses a particular problem for generic routing libraries, who cannot make the assumption that the user's routes can be normalized without changing the semantics. |
@blakeembrey P.S. -- if you get a chance and haven't already, please do check out the original issue I filed in URLPattern: whatwg/urlpattern#163. In that issue, I go a lot more in-depth into all the pros & cons that I see here, as well as listing out tables of possible different behaviors. (The behavior I've implemented in this PR is equivalent to "Proposed Behavior 1" in that issue.) |
I have added the tests from this PR description to the library in cb27d37 to be released in v7. Unfortunately I haven't supported The approach I took to get as close as possible was to allow a set of characters to repeated indefinitely. In this case just |
@blakeembrey Cool stuff! I made sure all the tests passed whenever I wrote this PR (if I recall correctly). Which test(s) was failing that you wanted to succeed with the standalone |
It's mostly the decoding of repeat parameters, I had to make it a regex match to get each sub-section due to the split now being regex-like, and I'm more in the camp of treating On the other hand, I'm trying to recall conversations with @wanderview on this and IIRC the Finally, on that note, I'm also wondering out loud whether trying to make these two things reconcile is a waste and instead I should just support the basics expected of glob matching such as |
@blakeembrey gotcha! Am I correct in stating then, that the current behavior you've implemented should nearly* match my "Proposed Behavior 2"? If that's the case, obviously that works for me since I proposed it as the minimal necessary change to ensure basic consistency. My perspective would be, it makes perfect sense to go with "Proposed Behavior 2" if you are planning on adding glob-matching-consistent support for (In either direction you go, I believe
|
So currently all I've done is add an option to make So I believe it currently matches "Proposed Behavior 1" as a result of that. I do want to make sure encoding/decoding works with whatever I do, so having But if they are going to behave differently, I am wondering if it should just match glob semantics that people already know, allowing for it to be used as |
@blakeembrey ahhh, after running through all my tests with the new behavior, I see what you are saying. First of all, amazing work!!! Except for that one test case ( Second thing: it's worth noting that, for that one failing test case, the results in the current behavior are consistent between named and anonymous wildcard groups. I.e.: However, I think it would be easy enough to fix this in both cases (should you want to) simply by prefixing that component of the regex by something like the following:
i.e., throwing away any preceding delimiters if they occur at the beginning of the string (in which case they are not handled by the previous path segment's regexp simply because there is no previous path segment regexp to strip them away). Thoughts? EDIT: Or, on further inspection: I took a look at the generated output for
This is very similar to the regex for
So, it seems it would suffice to simply change the first capturing group in the list to be just like the others, except with
or, even just preceding the whole expression with an optional slash, similar to how the expression is suffixed with an optional slash, i.e:
which can be simplified, by the way, to:
|
I considered this but it's a little odd to me. No strong opinions, but it does tend to feel like To you, is there any difference at the point of the above between |
@blakeembrey I edited my above comment to add a few more suggestions. I nearly agree with you: Shouldn't My strong opinion is that if My weaker opinion is that they should both give consistent matching results. But if in this one case, (Note: I think this choice would also apply to all paths of the form |
No good reason. It just feels kind of janky for that to behave differently to |
I think I'm seeing |
@blakeembrey Those two, unlike
Having said all that, I do not have a strong opinion on this point after all, since I realized that this distinction will never come up for me anyways in real life... given that all URL paths always start with a forward slash (and therefore I can always pre-normalize path patterns to include a preceding forward-slash if they are missing one prior to generating the regexp). Making the corresponding root-level forward slash optional in the path pattern would be convenient since I could then omit all the first forward slashes from patterns and they'd still match the right paths, and it would boost that consistency between (Actually, probably the biggest tangible bonus I can think of to doing that would be to simplify the documentation: you could just say that But I think either way would work as both solve all of the actual, tangible use-cases here. |
@blakeembrey to illustrate what I mean above: const url = new URL('https://example.com');
url.pathname = 'any'; // No forward slash!
console.log(url.pathname); prints:
To summarize: since a pathname without a leading forward slash is illegal, one behavior that would make sense (and bring us to 100% consistency) would be that any path pattern without a leading forward slash would assume that it should be lenient towards the leading forward slash that will (necessarily) be present, since that's the only possible semantics it could have other than "match no path at all". |
Good call, I'll think about it today. There are use-cases outside of paths but it could be a config option. On a final note, if we did have full consistency with that change, how do you feel about |
@blakeembrey I'm not sure what Or are you talking about as they occur inside the generated regexp? What is the context in which those would be used? |
Oh, the simple change in my PR was that all modifiers "just worked" as is. So it's the same behavior as if it's |
PR example: 246ec9f I've since reverted trying to think over how |
@blakeembrey I think I would need a specific example of a path with one of those syntaxes in it to say for sure, but if you're talking something like:
(Not to mention, then |
Maybe it's not clear, it's just the same as |
Ahhhhh, I see what you mean. I thought you were going for That makes perfect sense to me. For example, currently I would accomplish the same thing via:
There are probably not as many use-cases for these versions though... but if it's simpler to implement, that is certainly compelling... simplicity is often the best strategy. One thing I would mention is that |
On further thought, if you did go that route, I would advise only doing it for That is in part due to the potential confusion I mentioned above, but also in part due to a different inconsistency: we'd have unnamed wildcards, unnamed optional segments, but no unnamed required segments. So you'd almost have to add a standalone In general, I think the only one this is actually important for is |
Thanks, I agree with your assessment, it’s part of why I’ve been thinking the asterisk should just follow the URLPattern route, although everything can pass your tests. It’s also why I was thinking about other existing matching schemes, like globs. |
@blakeembrey on further further thought (sorry for my many iterations here), I would recommend not doing it for anything except
That being said, still think it's important that And if you decide to make that one further tweak above concerning leniency to the root leading slash (bringing the other operators into alignment with |
I realize that there is already a PR to add wildcard support. However, that PR is going to add some pretty big inconsistencies between named and anonymous wildcard groups, in a very confusing way. See: whatwg/urlpattern#163
The response over there so far has been "too bad, so sad, who cares."
So I'm taking the initiative to create this PR to at least demonstrate that there is a way forward with wildcard support that actually makes sense.
Why do I care? Because once URLPattern comes out of experimental status, we will be stuck with the choices we make today, FOREVER. If the
path-to-regexp
library itself chooses the right way forward, then maybe URLPattern will follow suit, since it seems to care a lot about compatibility with this library.How did I go about making the two behaviors compatible with each other? Simply by not disallowing empty path segments in named wildcard groups, and not requiring a trailing slash in anonymous wildcard groups.
Hopefully my efforts here at least give some people food for thought.
P.S. the pre-commit hook that added 600 lines to this PR is really annoying.
P.P.S. Fixes #214. Also fixes or re-fixes the previously closed issues #228, #212, #196, #194 (for the case where it makes sense), #103, #87, and #37.
Detailed breakdown:
/files/:path*\.:ext*
/:everything*
abc/:everything*
#/*
*
/foo/:bar*
/entity/:id/*
/entity/:id/*
/test/*
/some-path/*
(✅ = match; ❌ = no match; 💀 = parse error; all tests run in strict mode)