Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New 'completions' command, and various tab-completion fixes and improvements #649

Open
wants to merge 45 commits into
base: main
Choose a base branch
from

Conversation

phil-s
Copy link

@phil-s phil-s commented Aug 15, 2020

This branch implements the new command completions for issue #555,
and also fixes most of the pre-existing problems I encountered while testing
the command and my wrapper. The fixes may well resolve some of the other
tab-completion issues which have been raised previously.

The command is not enabled by default; I'm enabling it in my config file by setting:

  // Make the 'completions' command available.
  'commands' => class_exists('\Psy\Command\CompletionsCommand')
    ? [ new \Psy\Command\CompletionsCommand, ]
    : [],

There are plenty of changes here, but I've kept the individual commits
very focused, so hopefully it's all fairly easy to review.


The Psy\TabCompletion\Matcher systems are doing smart things with PHP
tokens, but are also regularly getting tripped up by them and failing to
offer any completions at all (while sometimes simultaneously treating huge
numbers of irrelevant thing as completion options in the background).

The completion process starts with:

$tokens = \token_get_all('<?php ' . $line);

And then generally the last token is popped from the array and checked to
see whether it could be a valid token for that Matcher's interests, and the
problem is that these tests consider a very limited set of tokens, which
then excludes the large number of PHP keywords which (a) have their own
token, and (b) are completely valid prefixes for an as-yet incomplete
identifier.

So for instance, if you defined a function abstraction() and then try to
use tab completion to input that name, you can complete from "abstrac" and
"abstracti", but "abstract" gives you no results, because that's the token
T_ABSTRACT, and FunctionsMatcher::hasMatched(), like most of the
matchers, ignores text which wasn't parsed as T_STRING.

I think the following would be the current list of un-completable
identifiers, but obviously it's subject to change as the language evolves,
so maintaining a big whitelist doesn't seem like the way forward.

T_ABSTRACT                      abstract
T_LOGICAL_AND                   and
T_ARRAY                         array
T_AS                            as
T_BREAK                         break
T_CALLABLE                      callable
T_CASE                          case
T_CATCH                         catch
T_CLASS                         class
T_CLONE                         clone
T_CONST                         const
T_CONTINUE                      continue
T_DECLARE                       declare
T_DEFAULT                       default
T_EXIT                          die
T_DO                            do
T_ECHO                          echo
T_ELSE                          else
T_ELSEIF                        elseif
T_EMPTY                         empty
T_ENDDECLARE                    enddeclare
T_ENDFOR                        endfor
T_ENDFOREACH                    endforeach
T_ENDIF                         endif
T_ENDSWITCH                     endswitch
T_ENDWHILE                      endwhile
T_EVAL                          eval
T_EXIT                          exit
T_EXTENDS                       extends
T_FINAL                         final
T_FINALLY                       finally
T_FN                            fn
T_FOR                           for
T_FOREACH                       foreach
T_FUNCTION                      function
T_GLOBAL                        global
T_GOTO                          goto
T_IF                            if
T_IMPLEMENTS                    implements
T_INCLUDE                       include
T_INCLUDE_ONCE                  include_once
T_INSTANCEOF                    instanceof
T_INSTEADOF                     insteadof
T_INTERFACE                     interface
T_ISSET                         isset
T_LIST                          list
T_NAMESPACE                     namespace
T_NEW                           new
T_LOGICAL_OR                    or
T_PRINT                         print
T_PRIVATE                       private
T_PROTECTED                     protected
T_PUBLIC                        public
T_REQUIRE                       require
T_REQUIRE_ONCE                  require_once
T_RETURN                        return
T_STATIC                        static
T_SWITCH                        switch
T_THROW                         throw
T_TRAIT                         trait
T_TRY                           try
T_UNSET                         unset
T_USE                           use
T_VAR                           var
T_WHILE                         while
T_LOGICAL_XOR                   xor
T_YIELD                         yield

That's 67 tokens (which is nearly 50%). The remaining 72 are:

T_AND_EQUAL                     &=
T_ARRAY_CAST                    (array)
T_BAD_CHARACTER
T_BOOLEAN_AND                   &&
T_BOOLEAN_OR                    ||
T_BOOL_CAST                     (bool) or (boolean)
T_CHARACTER
T_CLASS_C                       __CLASS__
T_CLOSE_TAG                     ?> or %>
T_COALESCE                      ??
T_COALESCE_EQUAL                ??=
T_COMMENT                       // or #, and /* */
T_CONCAT_EQUAL                  .=
T_CONSTANT_ENCAPSED_STRING      "foo" or 'bar'
T_CURLY_OPEN                    {$
T_DEC                           --
T_DIR                           __DIR__
T_DIV_EQUAL                     /=
T_DNUMBER                       0.12, etc.
T_DOC_COMMENT                   /** */
T_DOLLAR_OPEN_CURLY_BRACES      ${
T_DOUBLE_ARROW                  =>
T_DOUBLE_CAST                   (real), (double) or (float)
T_DOUBLE_COLON                  ::
T_ELLIPSIS                      ...
T_ENCAPSED_AND_WHITESPACE       " $a"
T_END_HEREDOC
T_FILE                          __FILE__
T_FUNC_C                        __FUNCTION__
T_HALT_COMPILER                 __halt_compiler
T_INC                           ++
T_INLINE_HTML
T_INT_CAST                      (int) or (integer)
T_IS_EQUAL                      ==
T_IS_GREATER_OR_EQUAL           >=
T_IS_IDENTICAL                  ===
T_IS_NOT_EQUAL                  != or <>
T_IS_NOT_IDENTICAL              !==
T_IS_SMALLER_OR_EQUAL           <=
T_LINE                          __LINE__
T_LNUMBER                       123, 012, 0x1ac, etc.
T_METHOD_C                      __METHOD__
T_MINUS_EQUAL                   -=
T_MOD_EQUAL                     %=
T_MUL_EQUAL                     *=
T_NS_C                          __NAMESPACE__
T_NS_SEPARATOR                  \
T_NUM_STRING                    "$a[0]"
T_OBJECT_CAST                   (object)
T_OBJECT_OPERATOR               ->
T_OPEN_TAG                      <?php, <? or <%
T_OPEN_TAG_WITH_ECHO            <?= or <%=
T_OR_EQUAL                      |=
T_PAAMAYIM_NEKUDOTAYIM          ::
T_PLUS_EQUAL                    +=
T_POW                           **
T_POW_EQUAL                     **=
T_SL                            <<
T_SL_EQUAL                      <<=
T_SPACESHIP                     <=>
T_SR                            >>
T_SR_EQUAL                      >>=
T_START_HEREDOC                 <<<
T_STRING                        parent, self, etc.
T_STRING_CAST                   (string)
T_STRING_VARNAME                "${a
T_TRAIT_C                       __TRAIT__
T_UNSET_CAST                    (unset)
T_VARIABLE                      $foo
T_WHITESPACE                    \t \r\n
T_XOR_EQUAL                     ^=
T_YIELD_FROM                    yield from

We need all the Matchers to stop caring whether or not the last token in the
list was parsed as T_STRING (in particular), because implicitly that token
is not yet complete, and therefore the parser can't know what it's
supposed to be. They should ignore the token type, and instead just check
that text to see whether it would be valid as the prefix of an identifier,
which we can do by matching against the CONSTANT_SYNTAX regexp[1].


In addition to the bug where no completions are supplied, the token
behaviour can also do something akin to the opposite, and cause certain
matchers to return everything they know about as a completion, whether or
not it matches the initial input.

This happens when a matcher's getMatches() method calls
AbstractMatcher::getInput() which firstly sets $var to an empty string,
but then (on account of tokens) never gets to change it to anything else.

getMatches() then 'filters' All Of The Things It Knows About using
AbstractMatcher::startsWith(), to find which of those things begins with
an empty string. That method happily agrees that everything starts with
an empty string, and so All Of The Things are merged into the set of
completions.

You can observe this by adding print_r($matches); (or other logging) to
AutoCompleter::processCallback() right before it returns, and then
attempting completion. var or $var for example.

GNU Readline is masking this bug by not presenting any of the candidates
which do not actually start with the original incomplete word/prefix[2],
making it seem as if the right thing is happening behind the scenes; but the
new 'completions' command would list all of them, which is undesirable.

(We could make the command do its own additional filtering step, but it's
better to produce the correct set of candidates in the first place.)


Finally, there's a lot of complexity (and consequently some bugs) caused by
not normalising the token sequence to ensure that the Thing To Be Completed
is always the final token. At present, if the user effectively
tab-completes at an empty string, then we end up with a token sequence
ending with the previous (and already-complete) token. Some of the
matchers are attempting to handle this by looking for tokens-of-interest in
both the current and the previous token; but it greatly simplifies the
logic if we instead ensure that the token sequence is more consistent, so
that if we're completing an empty string then the token sequence simply ends
with an empty string 'token'.


[1] I do have one question about CONSTANT_SYNTAX and VAR_SYNTAX:

https://www.php.net/manual/en/language.variables.basics.php gives a
near-identical pattern (and suggests that this applies to all PHP
identifiers, rather than just variables); but your pattern includes the
character 0x7f (DEL), whereas the one in the manual starts that range from
0x80. I'm not sure whether this indicates a change/fix to the manual since
the code was written, or if you've intentionally added that DEL char to your
regexps (but I can't think why that would be).

const CONSTANT_SYNTAX = '^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$';
const VAR_SYNTAX = '^\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$';

The manual says:

Variable names follow the same rules as other labels in PHP. A valid
variable name starts with a letter or underscore, followed by any number
of letters, numbers, or underscores. As a regular expression, it would be
expressed thus: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$

Note: For our purposes here, a letter is a-z, A-Z, and the bytes from 128
through 255 (0x80-0xff).

So this may be a bug in the current regexps?


[2] What the incomplete word/prefix will actually be is another source of
confusion. AFAICS, PHP's readline support provides no control over the set
of characters which act as word breaks for the purpose of completing the
current 'word'; and the defaults we're stuck with produce some fairly
inconsistent behaviours for completing PHP code.

The current code was handling things appropriately, but there was absolutely
no documentation of the reasons why certain things were the way they were.
I ended up spending a bunch of time trying to make things more consistent
for the new 'completions' command, only to find that those changes didn't
work in readline itself.

This was a documentation problem in large part (and the new documentation
will be beneficial to anyone looking to hack on this code in the future),
but we also need to define those word break characters in the code so that
the 'completions' command will supply the same arguments that Readline would
supply to the callback function.

Phil Sainty added 24 commits August 15, 2020 11:45
We establish the readline-equivalent $input word for consistency, so
that the $input value that we pass to processCallback() will be
equivalent to the value passed by GNU Readline in the same situation.

This is not strictly necessary for the present code, but may prove to
be useful for subsequent enhancements.
ClassNamesMatcher was the only matcher that was using preg_quote();
all of the others mis-handled regexp special characters.
This will be used in place of tests for the T_STRING token, which is
inherently unreliable for completion purposes, because an incomplete
identifier can be any of ~70 different tokens.
Unless there are multiple token variables being acted upon, name the
completion candidate $input, for consistency with the other Matchers.
We rejected plain '$' as $token, so don't allow self::T_VARIABLE.

T_OPEN_TAG is fine, but was already handled.
A T_VARIABLE cannot be completed to a Keyword.
Don't assume that one command isn't the prefix of another.

Do the cheap empty() test first.
…tched() methods

tokenIsValidIdentifier($token, true) allows the empty string to match.
- Optimisation for ClassAttributesMatcher and ClassMethodsMatcher.
- Also remove additional redundant type-checking, already catered for
  by '===' equality test.
This ensures that a $token which is valid neither as an identifier nor
as a variable will only appear as a 'previous' token, and never as the
token being actively completed.  This reduces the variety of cases
which the matchers need to care about.

This also ensures that completing against whitespace does not attempt
to complete the preceding token (which was already complete, and not
what the user asked for).
Doing so prevents a CodeArgument from seeing that trailing whitespace,
which in turn meant that the 'completions' command was unable to tell
whether the user wanted to complete the previous token, or (after a
space) an empty string.
Both indicate that whatever comes next is a new expression.
This includes blacklisting T_NEW and T_NS_SEPARATOR when not completing
classes; and blacklisting T_OBJECT_OPERATOR and T_DOUBLE_COLON when not
completing attributes/methods for objects and classes (respectively).
If we encounter a needCompleteClass() token, don't continue to prepend
the tokens which preceded that.
ClassNamesMatcher::getMatches() passes the incomplete class, so we
need to support an incomplete (potentially empty) final component.
@phil-s
Copy link
Author

phil-s commented Aug 15, 2020

Some Style CI issues to resolve there. I'm done for tonight, but I'll try to fix those tomorrow.

Copy link
Owner

@bobthecow bobthecow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. Sorry about the code style thrashing 😬

I'm a littttle bit worried that all these changes and bug fixes didn't break any tests.

On master, it looks like fully a third of matchers have no test coverage at all :(

https://codecov.io/gh/bobthecow/psysh/tree/master/src/TabCompletion/Matcher

The remainder are in a decent place, but the ones that are < 100% tend to drop coverage for edge cases, so it wouldn't surprise me if we would have found more of the bugs you've uncovered if we'd tested empty states, etc.

For a change this size, I'd feel a lot better about things if (1) we made sure existing functionality was covered and passing on master before this pull request, and (2) we added edge case tests for all the bugs fixed, and made sure they fail on master and work after this pull request.

* @return array
*/
protected function getVariables()
protected function getVariables($dollarPrefix = false)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like an implementation detail for the VariablesMatcher. Will anything else ever need it? Should that matcher do the prefixing itself?

Copy link
Author

@phil-s phil-s Aug 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether anything else will ever use it; but if they do, I think it would be nicer if it was as simple as passing an argument to the existing function, so I'd be inclined to leave it where it is. (I'm happy to change it if you disagree, though.)

src/TabCompletion/AutoCompleter.php Outdated Show resolved Hide resolved
src/TabCompletion/AutoCompleter.php Show resolved Hide resolved
src/TabCompletion/Matcher/AbstractMatcher.php Show resolved Hide resolved
src/TabCompletion/Matcher/AbstractMatcher.php Show resolved Hide resolved
src/Shell.php Outdated Show resolved Hide resolved
src/TabCompletion/Matcher/AbstractMatcher.php Outdated Show resolved Hide resolved
@phil-s
Copy link
Author

phil-s commented Aug 17, 2020

I've pushed some fixup commits (to be squashed later) to deal with the initial style issues.

@codecov
Copy link

codecov bot commented Aug 17, 2020

Codecov Report

Merging #649 into master will decrease coverage by 0.02%.
The diff coverage is 77.31%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #649      +/-   ##
============================================
- Coverage     68.34%   68.32%   -0.03%     
- Complexity     2287     2302      +15     
============================================
  Files           131      132       +1     
  Lines          5447     5566     +119     
============================================
+ Hits           3723     3803      +80     
- Misses         1724     1763      +39     
Impacted Files Coverage Δ Complexity Δ
src/Command/CompletionsCommand.php 0.00% <0.00%> (ø) 3.00 <3.00> (?)
src/Input/CodeArgument.php 100.00% <ø> (ø) 2.00 <0.00> (ø)
...on/Matcher/ClassMethodDefaultParametersMatcher.php 0.00% <0.00%> (ø) 8.00 <0.00> (ø)
...etion/Matcher/FunctionDefaultParametersMatcher.php 0.00% <0.00%> (ø) 6.00 <0.00> (ø)
src/TabCompletion/Matcher/MongoClientMatcher.php 0.00% <0.00%> (ø) 6.00 <0.00> (ø)
src/TabCompletion/Matcher/MongoDatabaseMatcher.php 0.00% <0.00%> (ø) 6.00 <0.00> (ø)
...n/Matcher/ObjectMethodDefaultParametersMatcher.php 0.00% <0.00%> (ø) 10.00 <0.00> (ø)
src/Shell.php 73.95% <11.11%> (-1.05%) 196.00 <2.00> (+2.00) ⬇️
.../TabCompletion/Matcher/ObjectAttributesMatcher.php 80.76% <75.00%> (ø) 7.00 <0.00> (-1.00)
src/TabCompletion/Matcher/ObjectMethodsMatcher.php 85.18% <75.00%> (ø) 8.00 <0.00> (-1.00)
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6d42021...695f0d5. Read the comment docs.

@phil-s
Copy link
Author

phil-s commented Aug 17, 2020

Oh good grief. The "resolve conversation" button silently ignores the comment you entered for the purpose of closing the conversation? I'd actually commented on everything I closed :(

@bobthecow
Copy link
Owner

@phil-s what's the status on this PR?

@phil-s
Copy link
Author

phil-s commented Oct 30, 2020

You were right in that more tests are needed. I started looking at that (which highlighted some bugs), before a combination of work getting busy and myself getting slightly injured stalled my progress. I'll have to check where I got to -- probably some more WIP commits I could push.

Base automatically changed from master to main January 17, 2021 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants