Pattern groups #313

Open: wants to merge 13 commits into master

@frnhr commented Mar 13, 2016

Implementation of Pattern Groups

... a solution for medium-sized interfaces, where subparsers are overkill but the interface is still getting too large for comfort.

Problem

Consider this not-so-contrived example:

prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_file <source_path> (--dest_file=OUT_FILE | (--dest_db=OUTDB [--dest_user=DEST_USER --dest_pass=DEST_PASS]))
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] (--out_file=OUTFILE | (--out_db=OUTDB [-u OUTUSER -p OUTPASS]))

... or, the same interface specified like so:

prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_file <source_path> --dest_file=OUT_FILE 
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_file <source_path> --dest_db=OUTDB [--dest_user=DEST_USER --dest_pass=DEST_PASS]
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_file=OUTFILE
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_db=OUTDB [-u OUTUSER -p OUTPASS]

Not very readable. Granted, it could have been shortened somewhat by using one-letter options, and "IN" instead of "SOURCE", but that is not really the direction we should be going.

Analysis

There are several problems with this particular example.

Long common subpattern

Common subpatterns can be put in front of the other subpatterns, or after them. Putting them in front helps readability because the remaining subpatterns line up visually, but it also hurts readability by pushing the important subpatterns to the right, possibly off the screen or onto the next line.

prog -c --ccc cc  aaa --aaa=AAA B
prog -c --ccc cc  aaa bbb -b -bbbb=BBBBB (-f | -g)

vs.:

prog aaa --aaa=AAA B -c --ccc cc
prog aaa bbb -b -bbbb=BBBBB (-f | -g) -c --ccc cc

Multiple axes of patterns

If there are multiple sets of mutually exclusive subpatterns, the pattern inevitably gets longer and more complex. Each set (pair or larger) of mutually exclusive subpatterns can be considered an axis in the mathematical space of accepted usages.

Those spaces can, in principle, have multiple dimensions, while usage patterns are by their nature well suited to describing one-dimensional spaces.

For clarity, let us suppose the possible subpatterns are A1 | A2 | A3 and B1 | B2 | B3. The space of accepted usages is:

A1B3 A2B3 A3B3
A1B2 A2B2 A3B2
A1B1 A2B1 A3B1

Possibility 1 - list all combinations explicitly:
prog C A1 B1
prog C A1 B2
prog C A1 B3
prog C A2 B1
prog C A2 B2
prog C A2 B3
prog C A3 B1
prog C A3 B2
prog C A3 B3

This might seem readable at first, but if the subpatterns are of different lengths (and they usually are), it quickly starts to look like the Matrix.

Possibility 2 - extract one element
prog C A1 (B1 | B2 | B3)
prog C A2 (B1 | B2 | B3)
prog C A3 (B1 | B2 | B3)

Or similarly:

prog C (A1 | A2 | A3) B1
prog C (A1 | A2 | A3) B2
prog C (A1 | A2 | A3) B3

This is often the least problematic approach, but the patterns tend to get too long, again reducing readability.

Possibility 3 - onebignastyonelinerofpatterns
prog C (A1 | A2 | A3) (B1 | B2 | B3)
No limits

Nothing restricts an axis to 3 points, or the space to 2 axes. An example could just as well be contrived with subpatterns A1-A5, B1-B2, C and D1-D6.

Readability decreases with each addition, arguably more than in proportion to the increase in actual interface complexity.
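The cost of possibility 1 is multiplicative: one usage line per combination, i.e. the product of the axis sizes. A short Python sketch using the contrived axes above (A1-A5, B1-B2, a fixed C, D1-D6) enumerates the explicit lines and compares them with possibility 2, where one axis (here, arbitrarily, the largest) stays as an alternation. The axis names are only the illustrative ones from the text.

```python
from itertools import product
from math import prod

# Hypothetical axes from the contrived example: A1-A5, B1-B2, D1-D6 (C is fixed).
axes = {
    "A": [f"A{i}" for i in range(1, 6)],
    "B": [f"B{i}" for i in range(1, 3)],
    "D": [f"D{i}" for i in range(1, 7)],
}

# Possibility 1: one usage line per element of the Cartesian product.
explicit = ["prog C " + " ".join(combo) for combo in product(*axes.values())]

# Possibility 2: keep one axis as an alternation, enumerate the others.
kept = max(axes, key=lambda name: len(axes[name]))  # the largest axis, "D"
extracted = prod(len(values) for name, values in axes.items() if name != kept)

print(len(explicit))  # 60 (5 * 2 * 6)
print(explicit[0])    # prog C A1 B1 D1
print(extracted)      # 10 (5 * 2)
```

Possibility 3 stays at a single usage line whatever the axis sizes, but that one line grows with every element added.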

Solution - Pattern groups

The proposed solution is to extract frequently repeated subpatterns and represent each of them with a named group.

Keeping with the notation from the previous discussion, this interface:

Usage: prog C (A1 | A2 | A3) (B1 | B2 | B3)

... becomes:

Usage: prog C -groupA- -groupB-

GroupA: A1 | A2 | A3
GroupB: B1 | B2 | B3

Back to the example from the beginning, now implemented with pattern groups:

prog [-common_options-] -source- -destination-

Common Options:
  -l LOGFILE
  -v VERBOSITY
  -z TIMEZONE

Source:
  read_file <source_path> |
  (read_db <source_db>
    [--source_user=SOURCE_USER
     --source_pass=SOURCE_PASS] )

Destination:
  --dest_file=OUT_FILE |
  (--dest_db=OUTDB
    [--dest_user=DEST_USER --dest_pass=DEST_PASS] )

This type of specification is significantly more readable, and actually a bit shorter (in this example only about 30 characters shorter, but that is somewhat beside the point).
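The example implies a naming convention: the section titled "Common Options:" is referenced as -common_options-. The exact rule is not spelled out here, so the sketch below assumes the title is lowercased with spaces turned into underscores, and that a group's body is every indented line under its title. parse_groups and SECTION_TITLE are hypothetical names, not part of docopt's API.

```python
from __future__ import annotations

import re

# A section title is a whole unindented line ending in ":", e.g. "Common Options:".
SECTION_TITLE = re.compile(r"^(\S[^:]*):\s*$")


def parse_groups(doc: str) -> dict[str, str]:
    """Map each normalized section title to its indented body, joined on one line."""
    groups: dict[str, list[str]] = {}
    current = None
    for line in doc.splitlines():
        title = SECTION_TITLE.match(line)
        if title:
            # Assumed naming rule: "Common Options" -> "common_options".
            current = title.group(1).strip().lower().replace(" ", "_")
            groups[current] = []
        elif current is not None and line[:1].isspace() and line.strip():
            groups[current].append(line.strip())
        else:
            current = None  # a blank or unindented line ends the section
    return {name: " ".join(body) for name, body in groups.items()}


EXAMPLE = """\
prog [-common_options-] -source- -destination-

Common Options:
  -l LOGFILE
  -v VERBOSITY
  -z TIMEZONE

Destination:
  --dest_file=OUT_FILE |
  (--dest_db=OUTDB
    [--dest_user=DEST_USER --dest_pass=DEST_PASS] )
"""

for name, pattern in parse_groups(EXAMPLE).items():
    print(f"-{name}-  =>  {pattern}")
```

A fuller version would also need to skip docopt's own sections, such as a standalone "Usage:" or "Options:" header, which this sketch would otherwise pick up as groups.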

Mix and match

There is no obligation to put every repeated subpattern in a group. Groups can be combined with other pattern elements in any way.

For example, important subpatterns can be kept in the usage pattern (even if they are repeated), while other subpatterns are extracted into groups. This example is equivalent to the one above, and possibly even more readable:

prog read_file <source_path> -destination- [-common_options-]
prog read_db <source_db> [-db_options-] -destination- [-common_options-]

Common Options:
  -l LOGFILE 
  -v VERBOSITY 
  -z TIMEZONE

DB Options:
  --source_user=SOURCE_USER 
  --source_pass=SOURCE_PASS

Destination:
  --dest_file=OUT_FILE | 
  (--dest_db=OUTDB 
    [--dest_user=DEST_USER --dest_pass=DEST_PASS] )

Note how only part of the second repeated subpattern (the source database credentials) is now extracted into a group.

Implementation

Pattern groups (like -source- in the first example above) are purely syntactic sugar. Before Docopt does any real parsing, the group elements (e.g. -source-) are replaced by their respective pattern definitions (e.g. read_file <source_path> | (read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] )).
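Because expansion happens before parsing, it can be sketched as plain text substitution. The helper below, expand_groups, is hypothetical (not docopt's API, and not necessarily how this PR does it): it takes a mapping from group name to definition, collapses each definition onto one line and replaces every -name- element. Wrapping definitions that contain | in parentheses is an assumption of this sketch, meant to keep an alternation scoped to its group instead of binding more loosely than the elements around it.

```python
from __future__ import annotations


def expand_groups(usage: str, groups: dict[str, str]) -> str:
    """Replace every -name- element in `usage` with the pattern defined for `name`.

    Hypothetical helper: plain text substitution, run before any parsing.
    """
    for name, definition in groups.items():
        # Collapse a multi-line definition onto a single line.
        pattern = " ".join(definition.split())
        # Assumption: scope an alternation to the group with parentheses.
        if "|" in pattern:
            pattern = f"({pattern})"
        usage = usage.replace(f"-{name}-", pattern)
    return usage


print(expand_groups("prog C -groupA- -groupB-",
                    {"groupA": "A1 | A2 | A3", "groupB": "B1 | B2 | B3"}))
# prog C (A1 | A2 | A3) (B1 | B2 | B3)
```

Parenthesizing only alternations is deliberate: as we read docopt's semantics, [-l LOGFILE -v VERBOSITY] makes each element optional, whereas [(-l LOGFILE -v VERBOSITY)] would mean both or neither, so wrapping every group would change the meaning of [-common_options-]. A stricter version would find group elements with a regular expression rather than str.replace.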

Regular expression for group elements

Since elements like -something- are of no use in a CLI specification, that form is repurposed here to mark group elements.

The regex matches a name made of letters, digits, underscores and dashes, enclosed between a dash preceded by whitespace (or similar) and a dash followed by whitespace (or similar).
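The exact regex is not quoted here, so the following is one possible reading of it, sketched in Python. GROUP_ELEMENT is a hypothetical name, and "whitespace (or similar)" is interpreted as "any character that is not a word character or a dash", which also lets an element sit directly inside brackets or parentheses.

```python
import re

# Hypothetical regex: "-", a name of letters/digits/underscores/dashes that
# starts and ends with a word character, then "-". The lookarounds forbid the
# delimiters from touching other word characters or dashes, so "-l",
# "--dest_file=..." and a bare "--" are left alone, while "[-common_options-]"
# still matches.
GROUP_ELEMENT = re.compile(r"(?<![\w-])-(\w(?:[\w-]*\w)?)-(?![\w-])")

usage = "prog [-common_options-] -source- -l LOGFILE --dest_file=OUT_FILE -- -destination-"
print(GROUP_ELEMENT.findall(usage))  # ['common_options', 'source', 'destination']
```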

Alternative group delimiters

Similar implementations are possible with other delimiters, as in {my_group}, ~my_group~, $my_group$, etc. However, introducing new special characters was deemed unnecessary, and possibly (though unlikely) a source of collisions with other systems (e.g. {} with Python's str.format, $ and ~ with Bash).
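To illustrate the {} collision: if an application builds its usage docstring with str.format (the docstrings below are hypothetical), a {destination} group element is taken as a replacement field and fails, while -destination- passes through untouched.

```python
doc = "Usage: {prog} read_file <source_path> {destination}"

try:
    doc.format(prog="prog")
except KeyError as missing:
    # prints: str.format consumed the group: 'destination'
    print("str.format consumed the group:", missing)

dash_doc = "Usage: {prog} read_file <source_path> -destination-"
print(dash_doc.format(prog="prog"))  # Usage: prog read_file <source_path> -destination-
```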

Why Merge?

  • The implementation adds only 20 lines of code,
    not counting tests, comments and blank lines.
    Oh, and the total line count of docopt.py is still below 400! (398 by my count)
  • Backward compatible.
  • Solves my problems :)
    Interfaces tend to grow. Pattern groups keep them readable while adding very little complexity for both the application developer and Docopt.

@frnhr mentioned this pull request Mar 13, 2016

@keleshev (Member) commented Apr 21, 2017

I don't see why the following:

prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_file <source_path> --dest_file=OUT_FILE 
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_file <source_path> --dest_db=OUTDB [--dest_user=DEST_USER --dest_pass=DEST_PASS]
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_file=OUTFILE
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_db=OUTDB [-u OUTUSER -p OUTPASS]

Couldn't be rewritten as:

usage:
  prog [options] read_file <source_path> --dest_file=OUT_FILE 
  prog [options] read_file <source_path> --dest_db=OUTDB [--dest_user=DEST_USER --dest_pass=DEST_PASS]
  prog [options] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_file=OUTFILE
  prog [options] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_db=OUTDB [-u OUTUSER -p OUTPASS]

options:
  ...

Anyway, Docopt incorporates only the POSIX standard plus widely used conventions, so new syntax or semantics have no chance of being incorporated.

@noraj commented Jun 9, 2021

#459 #110 #294
