Pattern groups #313

frnhr · 2016-03-13T05:19:33Z

Implementation of Pattern Groups

...a solution for medium-sized interfaces, where subparsers are overkill but the interface is still getting too large for comfort.

Problem

Consider this not-so-contrived example:

prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_file <source_path> (--dest_file=OUT_FILE | (--dest_db=OUTDB [--dest_user=DEST_USER --dest_pass=DEST_PASS]))
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] (--out_file=OUTFILE | (--out_db=OUTDB [-u OUTUSER -p OUTPASS]))

... or, the same interface specified like so:

prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_file <source_path> --dest_file=OUT_FILE 
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_file <source_path> --dest_db=OUTDB [--dest_user=DEST_USER --dest_pass=DEST_PASS]
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_file=OUTFILE
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_db=OUTDB [-u OUTUSER -p OUTPASS]

Not very readable. Granted, it could have been shortened somewhat by using one-letter options and "IN" instead of "SOURCE", but that is not really the direction we should be going in.

Analysis

There are several problems with this particular example.

Long common subpattern

Common subpatterns can be put either in front of the other subpatterns or after them. Putting them in front helps readability because the remaining subpatterns are visually aligned, but it also hurts readability by pushing the important subpatterns to the right, possibly off the screen or onto the next line.

prog -c --ccc cc  aaa --aaa=AAA B
prog -c --ccc cc  aaa bbb -b -bbbb=BBBBB (-f | -g)

versus:

prog aaa --aaa=AAA B -c --ccc cc
prog aaa bbb -b -bbbb=BBBBB (-f | -g) -c --ccc cc

Multiple axes of patterns

If there are multiple groups of mutually exclusive subpatterns, the pattern inevitably gets longer and its complexity increases. Each group (a pair or larger) of mutually exclusive subpatterns can be considered an axis in the mathematical space of accepted usages.

Such spaces can, in principle, have any number of dimensions, while usage patterns, by their nature, are well suited to describing only one-dimensional spaces.

For clarity, suppose the possible subpatterns are A1 | A2 | A3 and B1 | B2 | B3, and that C stands for the subpattern common to every usage. The space of accepted usages is:

A1B3 A2B3 A3B3
A1B2 A2B2 A3B2
A1B1 A2B1 A3B1

Possibility 1 - list all combinations explicitly:

prog C A1 B1
prog C A1 B2
prog C A1 B3
prog C A2 B1
prog C A2 B2
prog C A2 B3
prog C A3 B1
prog C A3 B2
prog C A3 B3

This might seem readable at first, but the number of lines is the product of the axis sizes, and if the sections differ in length (they usually do), the listing quickly starts to look like the Matrix.
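The explicit listing is simply the Cartesian product of the two axes. A small Python sketch (the lists A and B are just the placeholder subpatterns from this discussion) generates the same nine lines, in the same order as the listing above:

```python
from itertools import product

A = ["A1", "A2", "A3"]
B = ["B1", "B2", "B3"]

# Possibility 1: one usage line per point of the A x B space,
# in the same order as the explicit listing.
explicit = [f"prog C {a} {b}" for a, b in product(A, B)]

print("\n".join(explicit))
print(len(explicit), "lines")  # 9 lines
```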

Possibility 2 - extract one element

prog C A1 (B1 | B2 | B3)
prog C A2 (B1 | B2 | B3)
prog C A3 (B1 | B2 | B3)

Or similarly:

prog C (A1 | A2 | A3) B1
prog C (A1 | A2 | A3) B2
prog C (A1 | A2 | A3) B3

This is often the least problematic approach, but the patterns tend to get too long, again reducing readability.
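Possibility 2 keeps one axis listed line by line and folds the other into a single alternation. A minimal sketch, with a hypothetical helper `factor_out` that is not part of docopt, reproduces both variants shown above:

```python
def factor_out(common, listed, alternated, alternation_first=False):
    """Possibility 2: one usage line per element of `listed`, with the whole
    `alternated` axis folded into a single (X | Y | Z) group."""
    alternation = "(" + " | ".join(alternated) + ")"
    lines = []
    for item in listed:
        parts = [alternation, item] if alternation_first else [item, alternation]
        lines.append(" ".join(["prog", common] + parts))
    return lines


A = ["A1", "A2", "A3"]
B = ["B1", "B2", "B3"]

print("\n".join(factor_out("C", A, B)))
print("\n".join(factor_out("C", B, A, alternation_first=True)))
```

Every generated line still carries a full copy of the alternated axis, which is exactly why these lines tend to get long.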

Possibility 3 - onebignastyonelinerofpatterns

prog C (A1 | A2 | A3) (B1 | B2 | B3)

No limits

Nothing limits an axis to 3 points, or the space to 2 axes. An example could just as well be contrived with subpatterns A1-A5, B1-B2, C and D1-D6.

Readability decreases with each addition, arguably more than proportionally to the increase in actual interface complexity.
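A quick count for the A1-A5, B1-B2, C, D1-D6 example shows how fast possibility 1 grows: the explicit listing needs one line per point of the space, so each extra axis multiplies the line count by its size.

```python
from math import prod

# Axis sizes in the contrived example: A1-A5, B1-B2 and D1-D6.
# C is common to every usage, so it adds no axis.
axis_sizes = {"A": 5, "B": 2, "D": 6}

# Possibility 1 needs one usage line per point of the space.
explicit_lines = prod(axis_sizes.values())

# Without the D axis the same listing would need only 5 * 2 lines.
without_d = prod(size for name, size in axis_sizes.items() if name != "D")

print(explicit_lines, without_d)  # 60 10
```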

Solution - Pattern groups

The proposed solution is to extract frequently repeated subpatterns and represent each of them with a named group.

Keeping with the notation from the previous discussion, this interface:

Usage: prog C (A1 | A2 | A3) (B1 | B2 | B3)

... becomes:

Usage: prog C -groupA- -groupB-

GroupA: A1 | A2 | A3
GroupB: B1 | B2 | B3

Back to the example from the beginning, now implemented with pattern groups:

prog [-common_options-] -source- -destination-

Common Options:
  -l LOGFILE 
  -v VERBOSITY 
  -z TIMEZONE

Source:
   read_file <source_path> |
   (read_db <source_db> 
      [--source_user=SOURCE_USER 
       --source_pass=SOURCE_PASS] )

Destination:
  --dest_file=OUT_FILE | 
 (--dest_db=OUTDB 
    [--dest_user=DEST_USER --dest_pass=DEST_PASS] )

This type of specification is significantly more readable. It is also a bit shorter (in this example only by about 30 characters, but that is somewhat beside the point).

Mix and match

There is no obligation to put every repeated subpattern in a group. Groups can be combined with other pattern elements in any way.

For example, important subpatterns can be kept in the usage pattern (even if they are repeated), while other subpatterns are extracted into groups. This example is equivalent to the one above, and possibly even more readable:

prog  read_file <source_path> -destination- [-common_options-]
prog  read_db <source_db> [-db_options-]  -destination- [-common_options-]

Common Options:
  -l LOGFILE 
  -v VERBOSITY 
  -z TIMEZONE

DB Options:
  --source_user=SOURCE_USER 
  --source_pass=SOURCE_PASS

Destination:
  --dest_file=OUT_FILE | 
  (--dest_db=OUTDB 
    [--dest_user=DEST_USER --dest_pass=DEST_PASS] )

Note how only part of the second repeated subpattern (the source database credentials) is now extracted into a group.

Implementation

Pattern groups (like -source- in the first grouped example) are only "syntactic sugar". Before Docopt does any real parsing, each group element (e.g. -source-) is replaced by its pattern definition (e.g. read_file <source_path> | (read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] )).
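The substitution step can be sketched in a few dozen lines of Python. This is not the PR's code: the names `GROUP_REF`, `normalize`, `parse_groups` and `expand_groups` are hypothetical, the case-insensitive "spaces become underscores" naming rule is inferred from "Common Options:" being referenced as -common_options-, and wrapping each expansion in parentheses is a choice made by this sketch.

```python
import re

# Assumed marker syntax (hypothetical): -name- delimited by whitespace,
# brackets, parentheses, "|" or the start/end of the text.
GROUP_REF = re.compile(r"(?<![^\s\[(|])-(\w[\w-]*)-(?![^\s\])|])")


def normalize(name):
    """Assumed naming rule: 'Common Options' and 'common_options' are the same group."""
    return re.sub(r"\s+", "_", name.strip()).lower()


def parse_groups(doc):
    """Collect 'Name: body' sections; indented lines continue the current body."""
    groups, current = {}, None
    for line in doc.splitlines():
        header = re.match(r"(\S[^:]*):(.*)$", line)
        if header:
            current = normalize(header.group(1))
            groups[current] = [header.group(2)]
        elif current is not None and line[:1].isspace():
            groups[current].append(line)
        else:
            current = None  # a blank or unindented line ends the section
    # Collapse each body onto a single line with single spaces.
    return {name: " ".join(" ".join(body).split()) for name, body in groups.items()}


def expand_groups(usage, groups):
    """Replace every -name- reference with its definition, before any real parsing."""
    def substitute(match):
        body = groups.get(normalize(match.group(1)))
        if body is None:
            return match.group(0)  # unknown name: leave the text untouched
        # Parentheses stop a top-level "|" inside the group from splitting
        # the surrounding usage line into unintended alternatives.
        return "(" + body + ")"
    return GROUP_REF.sub(substitute, usage)


TOY_DOC = """Usage: prog C -groupA- -groupB-

GroupA: A1 | A2 | A3
GroupB: B1 | B2 | B3
"""

groups = parse_groups(TOY_DOC)
print(expand_groups("prog C -groupA- -groupB-", groups))
# prog C (A1 | A2 | A3) (B1 | B2 | B3)
```

The parentheses matter for a group like Source: substituted bare, its top-level | would split "prog -source- -destination-" into alternatives spanning the whole usage line.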

Regular expression for group elements

Since elements like -something- have no use in a CLI specification, that form is used here to mark groups.

The regex matches a name made of letters, numbers, underscores and dashes that sits between a dash preceded by whitespace (or a similar delimiter) and a dash followed by whitespace (or a similar delimiter).
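One possible Python reading of that rule, as a hypothetical sketch rather than the PR's actual regex: "whitespace or a similar delimiter" is assumed to mean whitespace, "[", "]", "(", ")", "|" or the start/end of the text, and the name is assumed to start with a letter, digit or underscore so that "--" and ordinary long options never match.

```python
import re

# Hypothetical pattern: -name- where the opening dash is not preceded, and the
# closing dash is not followed, by anything other than a delimiter.
GROUP_REF = re.compile(r"(?<![^\s\[(|])-(\w[\w-]*)-(?![^\s\])|])")

usage = "prog [-common_options-] -source- -destination- -l LOGFILE --dest_file=X"
print(GROUP_REF.findall(usage))  # ['common_options', 'source', 'destination']

# Ordinary options and the "--" separator are not mistaken for groups.
print(GROUP_REF.findall("prog -l LOGFILE --dest_file=OUT_FILE -- <path>"))  # []

# Groups may sit directly inside parentheses and alternations.
print(GROUP_REF.findall("prog (-groupA-|-groupB-)"))  # ['groupA', 'groupB']
```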

Alternative group delimiters

Similar implementations are possible with other delimiters, as in {my_group}, ~my_group~, $my_group$, etc. However, introducing new special characters was deemed unnecessary, and possibly (though not likely) a source of collisions with other systems (e.g. {} with Python's str.format, $ and ~ with Bash).
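To make the collision argument concrete, here is a small Python illustration. The scenario of filling in a version string with str.format is hypothetical, and Python's string.Template stands in for the shell-style `$name` expansion mentioned above.

```python
from string import Template

# With {name} as a group marker, a usage text that is also passed through
# str.format (hypothetically, to insert a version string) fails on the marker.
braces = "Usage: prog {my_group} --version={version}"
try:
    braces.format(version="1.0")
except KeyError as err:
    print("str.format treats {my_group} as a field:", err)

# string.Template substitutes the $name part of a $name$ marker.
print(Template("Usage: prog $my_group$").safe_substitute(my_group="X"))  # Usage: prog X$

# The -name- form passes through str.format unchanged.
dashes = "Usage: prog -my_group- --version={version}"
print(dashes.format(version="1.0"))  # Usage: prog -my_group- --version=1.0
```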

Why Merge?

- The implementation adds only 20 lines of code
  - not counting tests, comments, blank lines...
  - oh, and the total line count of docopt.py is still below 400! (398 by my count)
- Backward compatible
- Solves my problems :)
- Interfaces tend to grow. This keeps them readable while adding very little complexity for both the application developer and Docopt.

keleshev · 2017-04-21T10:14:37Z

I don't see why the following

prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_file <source_path> --dest_file=OUT_FILE
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_file <source_path> --dest_db=OUTDB [--dest_user=DEST_USER --dest_pass=DEST_PASS]
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_file=OUTFILE
prog [-l LOGFILE -v VERBOSITY -z TIMEZONE] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_db=OUTDB [-u OUTUSER -p OUTPASS]

couldn't be rewritten as:

usage:
  prog [options] read_file <source_path> --dest_file=OUT_FILE 
  prog [options] read_file <source_path> --dest_db=OUTDB [--dest_user=DEST_USER --dest_pass=DEST_PASS]
  prog [options] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_file=OUTFILE
  prog [options] read_db <source_db> [--source_user=SOURCE_USER --source_pass=SOURCE_PASS] --out_db=OUTDB [-u OUTUSER -p OUTPASS]

options:
  ...

Anyway, Docopt incorporates only the POSIX standard plus widely used conventions, so new syntax or semantics have no chance of being incorporated.

noraj · 2021-06-09T09:12:15Z

#459 #110 #294

frnhr added 13 commits March 12, 2016 21:47

878e17b test
2320e9b quickndirty implementation
37adaf3 single function
6a4b274 added groups to readme
ab92f12 Readme updates for groups
36373f2 commends on group lines; tests
44e00ac Allow space in group name
b58f5bb added example for pattern groups
1d678c6 Allow dash in group name
f3cd5e9 tests for fails
73235c0 rst fixes
40d6fdc enable names like "Some Options"
ca06876 clean a tmp test

frnhr mentioned this pull request Mar 13, 2016: Group options #294 (Closed)

eaaltonen mentioned this pull request Oct 19, 2020: [RFC] Add sketch of specifying option groups #486 (Open)

eaaltonen mentioned this pull request Jun 9, 2021: Some problems with optional arguments #459 (Open)
