New 'completions' command, and various tab-completion fixes and improvements #649

phil-s · 2020-08-15T09:45:21Z

This branch implements the new 'completions' command for issue #555,
and also fixes most of the pre-existing problems I encountered while testing
the command and my wrapper. The fixes may well resolve some of the other
tab-completion issues which have been raised previously.

The command is not enabled by default; I'm enabling it in my config file by setting:

  // Make the 'completions' command available.
  'commands' => class_exists('\Psy\Command\CompletionsCommand')
    ? [ new \Psy\Command\CompletionsCommand, ]
    : [],

There are plenty of changes here, but I've kept the individual commits
very focused, so hopefully it's all fairly easy to review.

The Psy\TabCompletion\Matcher systems are doing smart things with PHP
tokens, but are also regularly getting tripped up by them and failing to
offer any completions at all (while sometimes simultaneously treating huge
numbers of irrelevant things as completion options in the background).

The completion process starts with:

$tokens = \token_get_all('<?php ' . $line);

Then, generally, the last token is popped from the array and checked to
see whether it is a token that the Matcher is interested in. The problem is
that these checks consider a very limited set of tokens, which excludes the
large number of PHP keywords that (a) have their own token, and (b) are
perfectly valid prefixes for an as-yet incomplete identifier.

So, for instance, if you define a function abstraction() and then try to
use tab completion to input that name, you can complete from "abstrac" and
"abstracti", but "abstract" gives you no results, because that's the token
T_ABSTRACT, and FunctionsMatcher::hasMatched(), like most of the
matchers, ignores text which wasn't parsed as T_STRING.
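The behaviour is easy to reproduce with PHP's own tokenizer. The following minimal sketch tokenises each prefix the same way the matchers do (by prepending '<?php ') and prints the type of the final token. The helper name lastTokenName() is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper (not part of PsySH): tokenise a line the way the
// matchers do and return the name of the final token's type.
function lastTokenName(string $line): string
{
    $tokens = \token_get_all('<?php ' . $line);
    $last = \end($tokens);

    // Single-character tokens such as ';' come back as bare strings.
    return \is_array($last) ? \token_name($last[0]) : $last;
}

foreach (['abstrac', 'abstract', 'abstracti'] as $prefix) {
    echo \str_pad($prefix, 10), lastTokenName($prefix), PHP_EOL;
}
```

Only the middle prefix is reported as T_ABSTRACT. The other two are T_STRING, which is the only type the matchers accept.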

I think the following would be the current list of un-completable
identifiers, but obviously it's subject to change as the language evolves,
so maintaining a big whitelist doesn't seem like the way forward.

T_ABSTRACT                      abstract
T_LOGICAL_AND                   and
T_ARRAY                         array
T_AS                            as
T_BREAK                         break
T_CALLABLE                      callable
T_CASE                          case
T_CATCH                         catch
T_CLASS                         class
T_CLONE                         clone
T_CONST                         const
T_CONTINUE                      continue
T_DECLARE                       declare
T_DEFAULT                       default
T_EXIT                          die
T_DO                            do
T_ECHO                          echo
T_ELSE                          else
T_ELSEIF                        elseif
T_EMPTY                         empty
T_ENDDECLARE                    enddeclare
T_ENDFOR                        endfor
T_ENDFOREACH                    endforeach
T_ENDIF                         endif
T_ENDSWITCH                     endswitch
T_ENDWHILE                      endwhile
T_EVAL                          eval
T_EXIT                          exit
T_EXTENDS                       extends
T_FINAL                         final
T_FINALLY                       finally
T_FN                            fn
T_FOR                           for
T_FOREACH                       foreach
T_FUNCTION                      function
T_GLOBAL                        global
T_GOTO                          goto
T_IF                            if
T_IMPLEMENTS                    implements
T_INCLUDE                       include
T_INCLUDE_ONCE                  include_once
T_INSTANCEOF                    instanceof
T_INSTEADOF                     insteadof
T_INTERFACE                     interface
T_ISSET                         isset
T_LIST                          list
T_NAMESPACE                     namespace
T_NEW                           new
T_LOGICAL_OR                    or
T_PRINT                         print
T_PRIVATE                       private
T_PROTECTED                     protected
T_PUBLIC                        public
T_REQUIRE                       require
T_REQUIRE_ONCE                  require_once
T_RETURN                        return
T_STATIC                        static
T_SWITCH                        switch
T_THROW                         throw
T_TRAIT                         trait
T_TRY                           try
T_UNSET                         unset
T_USE                           use
T_VAR                           var
T_WHILE                         while
T_LOGICAL_XOR                   xor
T_YIELD                         yield

That's 67 keywords (66 distinct tokens, since die and exit both map to
T_EXIT), which is nearly 50% of the total. The remaining 72 are:

T_AND_EQUAL                     &=
T_ARRAY_CAST                    (array)
T_BAD_CHARACTER
T_BOOLEAN_AND                   &&
T_BOOLEAN_OR                    ||
T_BOOL_CAST                     (bool) or (boolean)
T_CHARACTER
T_CLASS_C                       __CLASS__
T_CLOSE_TAG                     ?> or %>
T_COALESCE                      ??
T_COALESCE_EQUAL                ??=
T_COMMENT                       // or #, and /* */
T_CONCAT_EQUAL                  .=
T_CONSTANT_ENCAPSED_STRING      "foo" or 'bar'
T_CURLY_OPEN                    {$
T_DEC                           --
T_DIR                           __DIR__
T_DIV_EQUAL                     /=
T_DNUMBER                       0.12, etc.
T_DOC_COMMENT                   /** */
T_DOLLAR_OPEN_CURLY_BRACES      ${
T_DOUBLE_ARROW                  =>
T_DOUBLE_CAST                   (real), (double) or (float)
T_DOUBLE_COLON                  ::
T_ELLIPSIS                      ...
T_ENCAPSED_AND_WHITESPACE       " $a"
T_END_HEREDOC
T_FILE                          __FILE__
T_FUNC_C                        __FUNCTION__
T_HALT_COMPILER                 __halt_compiler
T_INC                           ++
T_INLINE_HTML
T_INT_CAST                      (int) or (integer)
T_IS_EQUAL                      ==
T_IS_GREATER_OR_EQUAL           >=
T_IS_IDENTICAL                  ===
T_IS_NOT_EQUAL                  != or <>
T_IS_NOT_IDENTICAL              !==
T_IS_SMALLER_OR_EQUAL           <=
T_LINE                          __LINE__
T_LNUMBER                       123, 012, 0x1ac, etc.
T_METHOD_C                      __METHOD__
T_MINUS_EQUAL                   -=
T_MOD_EQUAL                     %=
T_MUL_EQUAL                     *=
T_NS_C                          __NAMESPACE__
T_NS_SEPARATOR                  \
T_NUM_STRING                    "$a[0]"
T_OBJECT_CAST                   (object)
T_OBJECT_OPERATOR               ->
T_OPEN_TAG                      <?php, <? or <%
T_OPEN_TAG_WITH_ECHO            <?= or <%=
T_OR_EQUAL                      |=
T_PAAMAYIM_NEKUDOTAYIM          ::
T_PLUS_EQUAL                    +=
T_POW                           **
T_POW_EQUAL                     **=
T_SL                            <<
T_SL_EQUAL                      <<=
T_SPACESHIP                     <=>
T_SR                            >>
T_SR_EQUAL                      >>=
T_START_HEREDOC                 <<<
T_STRING                        parent, self, etc.
T_STRING_CAST                   (string)
T_STRING_VARNAME                "${a
T_TRAIT_C                       __TRAIT__
T_UNSET_CAST                    (unset)
T_VARIABLE                      $foo
T_WHITESPACE                    \t \r\n
T_XOR_EQUAL                     ^=
T_YIELD_FROM                    yield from

We need all the Matchers to stop caring whether or not the last token in the
list was parsed as T_STRING (in particular), because implicitly that token
is not yet complete, and therefore the parser can't know what it's
supposed to be. They should ignore the token type and instead check the
token's text to see whether it would be valid as the prefix of an identifier,
which we can do by matching against the CONSTANT_SYNTAX regexp[1].
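A minimal sketch of such a check follows. The name tokenIsValidIdentifier() and the meaning of its second argument (allow the empty string) are taken from this PR's commit messages. The body is an assumption and is not the PR's actual code. It reuses CONSTANT_SYNTAX exactly as quoted in footnote [1].

```php
<?php
// CONSTANT_SYNTAX as quoted in footnote [1].
const CONSTANT_SYNTAX = '^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$';

/**
 * Sketch: could this token's text be (a prefix of) an identifier,
 * regardless of which token type the lexer assigned to it?
 *
 * @param array|string $token      An element of token_get_all() output
 * @param bool         $allowEmpty Treat '' as valid (completing from nothing)
 */
function tokenIsValidIdentifier($token, bool $allowEmpty = false): bool
{
    // token_get_all() yields [id, text, line] arrays or bare strings.
    $text = \is_array($token) ? $token[1] : $token;

    if ($text === '') {
        return $allowEmpty;
    }

    // Every non-empty prefix of a valid identifier is itself a valid
    // identifier, so the full-identifier pattern also works as a prefix test.
    return \preg_match('/' . CONSTANT_SYNTAX . '/', $text) === 1;
}

$tokens = \token_get_all('<?php abstract');
\var_dump(tokenIsValidIdentifier(\end($tokens)));
```

The final token here is T_ABSTRACT, yet it is accepted. A T_STRING-only test rejects exactly this case.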

In addition to the bug where no completions are supplied, the token
behaviour can also do something akin to the opposite, and cause certain
matchers to return everything they know about as a completion, whether or
not it matches the initial input.

This happens when a matcher's getMatches() method calls
AbstractMatcher::getInput(), which first sets $var to an empty string
but then, because of the token-type checks, never gets to change it.

getMatches() then 'filters' All Of The Things It Knows About using
AbstractMatcher::startsWith(), to find which of those things begins with
an empty string. That method happily agrees that everything starts with
an empty string, and so All Of The Things are merged into the set of
completions.

You can observe this by adding print_r($matches); (or other logging) to
AutoCompleter::processCallback() right before it returns, and then
attempting to complete var or $var, for example.

GNU Readline is masking this bug by not presenting any of the candidates
which do not actually start with the original incomplete word/prefix[2],
making it seem as if the right thing is happening behind the scenes; but the
new 'completions' command would list all of them, which is undesirable.

(We could make the command do its own additional filtering step, but it's
better to produce the correct set of candidates in the first place.)
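The sketch below shows both halves of the fix under stated assumptions. The function names and the use of null to mean "no valid input was obtained" are illustrative and are not PsySH's API. The sketch separates two cases. If no input could be obtained, it offers nothing. If the prefix is genuinely empty, it offers everything. It also compares prefixes verbatim rather than through a regexp, in line with the commit that makes startsWith() treat $prefix as a verbatim string.

```php
<?php
// Hypothetical stand-ins for AbstractMatcher::startsWith() and a
// matcher's getMatches(); not PsySH's actual code.

// Verbatim prefix test: no regexp is built, so characters such as '.'
// or '*' in $prefix have no special meaning.
function startsWith(string $prefix, string $word): bool
{
    return \strncmp($word, $prefix, \strlen($prefix)) === 0;
}

/**
 * @param string|null $input null when no valid input could be obtained
 *                           from the tokens; '' when the user really is
 *                           completing from an empty string
 */
function filterCandidates(?string $input, array $candidates): array
{
    // Without this guard an unobtained input would fall back to '', and
    // since every string starts with '', every candidate would be returned.
    if ($input === null) {
        return [];
    }

    return \array_values(\array_filter(
        $candidates,
        function (string $candidate) use ($input): bool {
            return startsWith($input, $candidate);
        }
    ));
}
```

With a regexp-based comparison such as '/^a.b/', a word like axbc would wrongly count as starting with a.b. The verbatim comparison avoids that.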

Finally, there's a lot of complexity (and consequently some bugs) caused by
not normalising the token sequence to ensure that the Thing To Be Completed
is always the final token. At present, if the user effectively
tab-completes at an empty string, then we end up with a token sequence
ending with the previous (and already-complete) token. Some of the
matchers are attempting to handle this by looking for tokens-of-interest in
both the current and the previous token; but it greatly simplifies the
logic if we instead ensure that the token sequence is more consistent, so
that if we're completing an empty string then the token sequence simply ends
with an empty string 'token'.
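The normalisation can be sketched as follows. It rests on an assumption: "completable" here means the text matches CONSTANT_SYNTAX or VAR_SYNTAX as quoted in footnote [1]. The function name normaliseTokens() is hypothetical. Whenever the final token is not completable, the sketch appends an explicit '' token. The real last token then becomes the 'previous' token.

```php
<?php
// Patterns as quoted in footnote [1].
const CONSTANT_SYNTAX = '^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$';
const VAR_SYNTAX = '^\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$';

// Hypothetical sketch: tokenise the line and guarantee that the final
// element is always the text being completed.
function normaliseTokens(string $line): array
{
    $tokens = \token_get_all('<?php ' . $line);
    $last = \end($tokens);
    $text = \is_array($last) ? $last[1] : $last;

    $completable = \preg_match('/' . CONSTANT_SYNTAX . '/', $text) === 1
        || \preg_match('/' . VAR_SYNTAX . '/', $text) === 1;

    if (!$completable) {
        // The last real token (whitespace, '->', '(', the open tag...) is
        // already complete; we are completing an empty string after it.
        $tokens[] = '';
    }

    return $tokens;
}
```

Every matcher can then assume that end($tokens) is the thing to complete, possibly ''. Matchers no longer need to inspect both the current and the previous token to work out which case they are in.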

[1] I do have one question about CONSTANT_SYNTAX and VAR_SYNTAX:

https://www.php.net/manual/en/language.variables.basics.php gives a
near-identical pattern (and suggests that this applies to all PHP
identifiers, rather than just variables); but your pattern includes the
character 0x7f (DEL), whereas the one in the manual starts that range from
0x80. I'm not sure whether this indicates a change/fix to the manual since
the code was written, or if you've intentionally added that DEL char to your
regexps (but I can't think why that would be).

const CONSTANT_SYNTAX = '^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$';
const VAR_SYNTAX = '^\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$';

The manual says:

Variable names follow the same rules as other labels in PHP. A valid
variable name starts with a letter or underscore, followed by any number
of letters, numbers, or underscores. As a regular expression, it would be
expressed thus: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$

Note: For our purposes here, a letter is a-z, A-Z, and the bytes from 128
through 255 (0x80-0xff).

So this may be a bug in the current regexps?
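One quick way to see the practical difference is to give both patterns a name that begins with the DEL byte. The sketch also asks PHP's own tokenizer what it makes of the same bytes, since the lexer is what the regexps are trying to mirror. The tokenizer's output may vary between PHP versions, so no expected result is shown for that part.

```php
<?php
// The two patterns from footnote [1]: PsySH's, starting the high range at
// 0x7f, and the manual's, starting it at 0x80.
$psyshPattern  = '/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$/';
$manualPattern = '/^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/';

// A name whose first byte is DEL (0x7f).
$delName = "\x7f" . 'foo';

$psyshAccepts  = \preg_match($psyshPattern, $delName) === 1;
$manualAccepts = \preg_match($manualPattern, $delName) === 1;

echo 'PsySH pattern accepts it:  ', \var_export($psyshAccepts, true), PHP_EOL;
echo 'Manual pattern accepts it: ', \var_export($manualAccepts, true), PHP_EOL;

// How does the running PHP's lexer tokenise the same bytes?
foreach (\token_get_all('<?php ' . $delName) as $token) {
    echo \is_array($token) ? \token_name($token[0]) : $token, PHP_EOL;
}
```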

[2] What the incomplete word/prefix will actually be is another source of
confusion. AFAICS, PHP's readline support provides no control over the set
of characters which act as word breaks for the purpose of completing the
current 'word'; and the defaults we're stuck with produce some fairly
inconsistent behaviours for completing PHP code.

The current code was handling things appropriately, but there was absolutely
no documentation of the reasons why certain things were the way they were.
I ended up spending a bunch of time trying to make things more consistent
for the new 'completions' command, only to find that those changes didn't
work in readline itself.

This was a documentation problem in large part (and the new documentation
will be beneficial to anyone looking to hack on this code in the future),
but we also need to define those word break characters in the code so that
the 'completions' command will supply the same arguments that Readline would
supply to the callback function.
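Defining the word-break characters in code could look like the sketch below. The break string is GNU Readline's documented default for rl_basic_word_break_characters. Whether PHP's readline extension, or a libedit build, uses exactly this set is an assumption you should verify. The helper name readlineWord() is hypothetical.

```php
<?php
// Assumption: GNU Readline's documented default for
// rl_basic_word_break_characters.
const READLINE_WORD_BREAKS = " \t\n\"\\'`@$><=;|&{(";

/**
 * Hypothetical helper: return the word Readline would pass to the
 * completion callback, i.e. the text after the last word-break character.
 */
function readlineWord(string $line): string
{
    for ($i = \strlen($line) - 1; $i >= 0; $i--) {
        if (\strpos(READLINE_WORD_BREAKS, $line[$i]) !== false) {
            return (string) \substr($line, $i + 1);
        }
    }

    return $line;
}
```

With these defaults, $fo reaches the callback as fo and $foo->ba reaches it as ba, because $ and > are break characters. Foo::ba arrives whole because ':' is not a break character.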

We establish the readline-equivalent $input word for consistency, so that the $input value that we pass to processCallback() will be equivalent to the value passed by GNU Readline in the same situation. This is not strictly necessary for the present code, but may prove to be useful for subsequent enhancements.

ClassNamesMatcher was the only matcher that was using preg_quote(); all of the others mis-handled regexp special characters.

This will be used in place of tests for the T_STRING token, which is inherently unreliable for completion purposes, because an incomplete identifier can be any of ~70 different tokens.

Unless there are multiple token variables being acted upon, name the completion candidate $input, for consistency with the other Matchers.

We rejected plain '$' as $token, so don't allow self::T_VARIABLE. T_OPEN_TAG is fine, but was already handled.

A T_VARIABLE cannot be completed to a Keyword.

Don't assume that one command isn't the prefix of another. Do the cheap empty() test first.

…ed() methods

…tched() methods tokenIsValidIdentifier($token, true) allows the empty string to match.

- Optimisation for ClassAttributesMatcher and ClassMethodsMatcher. - Also remove additional redundant type-checking, already catered for by '===' equality test.

This ensures that a $token which is valid neither as an identifier nor as a variable will only appear as a 'previous' token, and never as the token being actively completed. This reduces the variety of cases which the matchers need to care about. This also ensures that completing against whitespace does not attempt to complete the preceding token (which was already complete, and not what the user asked for).

Doing so prevents a CodeArgument from seeing that trailing whitespace, which in turn meant that the 'completions' command was unable to tell whether the user wanted to complete the previous token, or (after a space) an empty string.

phil-s · 2020-08-15T09:49:09Z

Some Style CI issues to resolve there. I'm done for tonight, but I'll try to fix those tomorrow.

bobthecow

Heh. Sorry about the code style thrashing 😬

I'm a littttle bit worried that all these changes and bug fixes didn't break any tests.

On master, it looks like fully a third of matchers have no test coverage at all :(

https://codecov.io/gh/bobthecow/psysh/tree/master/src/TabCompletion/Matcher

The remainder are in a decent place, but the ones that are < 100% tend to drop coverage for edge cases, so it wouldn't surprise me if we would have found more of the bugs you've uncovered if we'd tested empty states, etc.

For a change this size, I'd feel a lot better about things if (1) we made sure existing functionality was covered and passing on master before this pull request, and (2) we added edge case tests for all the bugs fixed, and made sure they fail on master and work after this pull request.

bobthecow · 2020-08-15T14:58:49Z

src/TabCompletion/Matcher/AbstractContextAwareMatcher.php

     * @return array
     */
-    protected function getVariables()
+    protected function getVariables($dollarPrefix = false)


This feels like an implementation detail for the VariablesMatcher. Will anything else ever need it? Should that matcher do the prefixing itself?

I'm not sure whether anything else will ever use it; but if they do, I think it would be nicer if it was as simple as passing an argument to the existing function, so I'd be inclined to leave it where it is. (I'm happy to change it if you disagree, though.)
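To illustrate the trade-off being discussed, here is a hypothetical sketch of an optional-argument signature. It assumes the variables are a name => value array, which this example represents as $scope. With the optional argument, existing callers are unaffected, and a caller such as VariablesMatcher can ask for '$'-prefixed names.

```php
<?php
// Hypothetical sketch; $scope stands in for the shell's scope variables
// (name => value) and is an assumption of this example.
function getVariables(array $scope, bool $dollarPrefix = false): array
{
    if (!$dollarPrefix) {
        // Existing callers see exactly what they saw before.
        return $scope;
    }

    $prefixed = [];
    foreach ($scope as $name => $value) {
        $prefixed['$' . $name] = $value;
    }

    return $prefixed;
}
```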

phil-s · 2020-08-17T01:15:20Z

I've pushed some fixup commits (to be squashed later) to deal with the initial style issues.

Travis says: Fatal error: Default value for parameters with a class type hint can only be NULL in /home/travis/build/bobthecow/psysh/src/TabCompletion/Matcher/AbstractMatcher.php on line 216

codecov · 2020-08-17T02:51:51Z

Codecov Report

Merging #649 into master will decrease coverage by 0.02%.
The diff coverage is 77.31%.

@@             Coverage Diff              @@
##             master     #649      +/-   ##
============================================
- Coverage     68.34%   68.32%   -0.03%     
- Complexity     2287     2302      +15     
============================================
  Files           131      132       +1     
  Lines          5447     5566     +119     
============================================
+ Hits           3723     3803      +80     
- Misses         1724     1763      +39

Impacted Files                                          Coverage Δ                 Complexity Δ
src/Command/CompletionsCommand.php                      0.00% <0.00%> (ø)          3.00 <3.00> (?)
src/Input/CodeArgument.php                              100.00% <ø> (ø)            2.00 <0.00> (ø)
...on/Matcher/ClassMethodDefaultParametersMatcher.php   0.00% <0.00%> (ø)          8.00 <0.00> (ø)
...etion/Matcher/FunctionDefaultParametersMatcher.php   0.00% <0.00%> (ø)          6.00 <0.00> (ø)
src/TabCompletion/Matcher/MongoClientMatcher.php        0.00% <0.00%> (ø)          6.00 <0.00> (ø)
src/TabCompletion/Matcher/MongoDatabaseMatcher.php      0.00% <0.00%> (ø)          6.00 <0.00> (ø)
...n/Matcher/ObjectMethodDefaultParametersMatcher.php   0.00% <0.00%> (ø)          10.00 <0.00> (ø)
src/Shell.php                                           73.95% <11.11%> (-1.05%)   196.00 <2.00> (+2.00)  ⬇️
.../TabCompletion/Matcher/ObjectAttributesMatcher.php   80.76% <75.00%> (ø)        7.00 <0.00> (-1.00)
src/TabCompletion/Matcher/ObjectMethodsMatcher.php      85.18% <75.00%> (ø)        8.00 <0.00> (-1.00)
... and 15 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6d42021...695f0d5.

phil-s · 2020-08-17T05:28:39Z

Oh good grief. The "resolve conversation" button silently ignores the comment you entered for the purpose of closing the conversation? I'd actually commented on everything I closed :(

bobthecow · 2020-10-30T05:06:24Z

@phil-s what's the status on this PR?

phil-s · 2020-10-30T06:08:32Z

You were right in that more tests are needed. I started looking at that (which highlighted some bugs), before a combination of work getting busy and myself getting slightly injured stalled my progress. I'll have to check where I got to -- probably some more WIP commits I could push.

Phil Sainty added 24 commits August 15, 2020 11:45

Document the word-break limitations of the Readline completion

9b33ce3

Add 'completions' to the needCompleteClass() list

05afb93

Make AbstractMatcher::startsWith() treat $prefix as a verbatim string

d177094

ClassNamesMatcher was the only matcher that was using preg_quote(); all of the others mis-handled regexp special characters.

Add tokenIsValidIdentifier() for generic tab-completion tests

025784a

This will be used in place of tests for the T_STRING token, which is inherently unreliable for completion purposes, because an incomplete identifier can be any of ~70 different tokens.

Use tokenIsValidIdentifier() in getInput() instead of matching T_STRING

6b8a9d1

Renaming variables for consistency

69739d1

Unless there are multiple token variables being acted upon, name the completion candidate $input, for consistency with the other Matchers.

Do not return completions if valid input was not obtained

4bb2898

Add missing documentation

f40b749

Bug fix for ClassNamesMatcher::hasMatched()

20a5c0a

We rejected plain '$' as $token, so don't allow self::T_VARIABLE. T_OPEN_TAG is fine, but was already handled.

Bug fix for KeywordsMatcher::hasMatched()

3c92986

A T_VARIABLE cannot be completed to a Keyword.

Bug fix for CommandsMatcher::hasMatched()

17d1c4b

Don't assume that one command isn't the prefix of another. Do the cheap empty() test first.

Use tokenIsValidIdentifier() instead of T_STRING matching in hasMatched() methods

e0c0b2a

Use tokenIsValidIdentifier() with T_OBJECT_OPERATOR matching in hasMatched() methods

f206480

tokenIsValidIdentifier($token, true) allows the empty string to match.

Comments

3cd36d0

Minor refactoring

63ae75d

- Optimisation for ClassAttributesMatcher and ClassMethodsMatcher. - Also remove additional redundant type-checking, already catered for by '===' equality test.

Treat T_OPEN_TAG and ';' the same way for the purposes of completion

1655584

Both indicate that whatever comes next is a new expression.
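A hypothetical sketch of treating the two alike follows. token_get_all() returns both array tokens (such as T_OPEN_TAG) and bare string tokens (such as ';'), so the helper has to accept either shape. The names tokenIs() and startsNewExpression() are illustrative, and their signatures are assumptions rather than PsySH's code.

```php
<?php
// Hypothetical helper: does $token match $which, where $which is either a
// token id (for array tokens) or a literal string (for bare tokens)?
function tokenIs($token, $which): bool
{
    if (\is_array($token)) {
        return \is_int($which) && $token[0] === $which;
    }

    return $token === $which;
}

// After either the open tag or ';', whatever comes next starts a new
// expression, so matchers can treat both previous tokens identically.
function startsNewExpression($previousToken): bool
{
    return tokenIs($previousToken, T_OPEN_TAG) || tokenIs($previousToken, ';');
}
```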

Support string tokens in AbstractMatcher::hasToken() and tokenIs()

0611ead

Fix inconsistencies with completion of variables

3a829c7

Improve the 'previous token' blacklisting in hasMatched() methods

2b566fe

This includes blacklisting T_NEW and T_NS_SEPARATOR when not completing classes; and blacklisting T_OBJECT_OPERATOR and T_DOUBLE_COLON when not completing attributes/methods for objects and classes (respectively).
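For a matcher that completes plain names (functions, say), all four of those previous tokens rule it out. After T_NEW or a namespace separator the user is completing a class, and after '->' or '::' the user is completing a member. The sketch below shows this. The function name is hypothetical.

```php
<?php
// Hypothetical sketch of a 'previous token' blacklist for a matcher that
// completes plain function names.
function previousTokenAllowsPlainName($previous): bool
{
    // Bare string tokens such as '(' or ',' never rule a plain name out.
    if (!\is_array($previous)) {
        return true;
    }

    $blacklist = [T_NEW, T_NS_SEPARATOR, T_OBJECT_OPERATOR, T_DOUBLE_COLON];

    return !\in_array($previous[0], $blacklist, true);
}
```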

Bug fix for getNamespaceAndClass()

c66e4aa

If we encounter a needCompleteClass() token, don't continue to prepend the tokens which preceded that.

Allow an incomplete class name in getNamespaceAndClass()

4c7a285

ClassNamesMatcher::getMatches() passes the incomplete class, so we need to support an incomplete (potentially empty) final component.
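Supporting an incomplete (possibly empty) final component can be sketched on the name's text alone. The function splitNamespaceAndClass() is hypothetical, and its [namespace, class] return shape is an assumption. Working on text rather than on T_STRING/T_NS_SEPARATOR tokens also avoids a tokenizer difference: PHP 8 lexes a qualified name like Foo\Bar as a single T_NAME_QUALIFIED token.

```php
<?php
// Hypothetical sketch: split a possibly incomplete, possibly qualified class
// name into [namespace, class]. The class part may be incomplete or empty
// (e.g. the user typed 'Foo\Bar\' and pressed TAB).
function splitNamespaceAndClass(string $name): array
{
    $pos = \strrpos($name, '\\');
    if ($pos === false) {
        return ['', $name];
    }

    // (string) keeps the result a string even when nothing follows the
    // final separator.
    return [\substr($name, 0, $pos), (string) \substr($name, $pos + 1)];
}
```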

bobthecow reviewed Aug 15, 2020

bobthecow mentioned this pull request Aug 15, 2020

Tab completion doubles method name #506

Open

Phil Sainty added 3 commits August 17, 2020 11:21

fixup! Add 'completions' command

80717d0

fixup! Add tokenIsValidIdentifier() for generic tab-completion tests

72a0924

fixup! Use tokenIsValidIdentifier() in getInput() instead of matching T_STRING

4ae9f12

Phil Sainty added 5 commits August 17, 2020 12:10

fixup! Use tokenIsValidIdentifier() instead of T_STRING matching in hasMatched() methods

bfaeb46

fixup! If not completing a valid prefix, consider it an empty string

61fd4d2

fixup! Support string tokens in AbstractMatcher::hasToken() and tokenIs()

09e7422

fixup! Fix inconsistencies with completion of variables

91c485f

fixup! Allow an incomplete class name in getNamespaceAndClass()

e585663

Phil Sainty added 11 commits August 17, 2020 13:17

fixup! Fix inconsistencies with completion of variables

4d0041b

fixup! Add 'completions' command

a16f51c

fixup! If not completing a valid prefix, consider it an empty string

0e0b3a1

fixup! Use tokenIsValidIdentifier() in getInput() instead of matching T_STRING

454e3db

fixup! Add tokenIsValidIdentifier() for generic tab-completion tests

f00e8c2

fixup! Add tokenIsValidIdentifier() for generic tab-completion tests

cdd69c2

fixup! Fix inconsistencies with completion of variables

5a22d52

fixup! Document the word-break limitations of the Readline completion

51f66e1

fixup! Add 'completions' command

cf8a487

fixup! Support string tokens in AbstractMatcher::hasToken() and tokenIs()

57be706

fixup! Add tokenIsValidIdentifier() for generic tab-completion tests

bb27322

Phil Sainty added 2 commits August 17, 2020 14:54

Update the list of commands in AbstractMatcher::needCompleteClass()

b89af21

fixup! Update the list of commands in AbstractMatcher::needCompleteClass()

695f0d5

Base automatically changed from master to main January 17, 2021 15:49
