Can 'doc' use the terminal width for formatting output? #595

phil-s · 2019-12-23T01:33:07Z

At present, regardless of the terminal width, doc output seems to wrap at ~100 columns. I'm not sure whether this formatting is under psysh's control, but if so then it would be nice if it used the actual terminal width (or else used 80 columns, which is a much safer assumption for a hard-coded value).

bobthecow · 2020-04-06T05:30:49Z

Unfortunately it can't with the way it's currently written (the line breaks, indents, etc are hardcoded into the manual DB during compilation).

phil-s · 2020-04-06T09:26:01Z

Ah, ok. Is that an upstream issue, or something you potentially have control over? (I'm not quite sure where the php_manual.sqlite files at https://github.com/bobthecow/psysh/wiki/PHP-manual actually come from.)

If you have control, and wrapping at 80 columns is an option, I think that would be a safer choice for hard-coding.

Failing that, it would be interesting to know what the actual rules are for formatting (as that would potentially allow detection and re-formatting -- even if I'm doubtful that I'll actually try to do that :)

phil-s · 2020-04-07T04:56:25Z

After a look at some example output, I decided to give it a whirl after all, and got it working pretty well! I'm assuming that more than one space always indicates an indent/tab-stop, and that seems to hold in my initial tests.

I'm expecting I'll find edge-cases where it wraps things it shouldn't, but the only example I've seen so far turns out to be an issue by default:

doc empty produces the following 'list':

Return:
  bool  Returns FALSE if $var exists and has a non-empty, non-zero value. Otherwise returns TRUE.
        
        The following values are considered to be empty: "" (an empty string)0 (0 as an integer)0.0
        (0 as a float)"0" (0 as a string)NULLFALSEarray() (an empty array)

The HTML docs have it as:

The following values are considered to be empty: 

* "" (an empty string)
* 0 (0 as an integer)
* 0.0 (0 as a float)
* "0" (0 as a string)
* NULL
* FALSE
* array() (an empty array)

In theory, could I expect all list items to have the asterisk+space prefix? Or is there a fixed set of such characters?

bobthecow · 2020-04-07T05:54:43Z

We do have control over the wrapping.

The most straightforward thing to do here would be to wrap to 80 characters at build time. We're not, because those extra 20 characters mean a lot more things don't get awkward when they're wrapped. But we definitely could.

We're wrapping at build time because proper line wrapping at runtime would require a lot more complexity, e.g. storing structured data rather than flat strings.

We could wrap nothing at build time, and attempt to wrap at runtime. But it's not completely straightforward. For example:

>>> doc explode
function explode($separator, $str, $limit = unknown)

Description:
  Split a string by a string

  Returns an array of strings, each of which is a substring of $string formed by splitting it on boundaries formed by the string $delimiter.

Param:
  string  $delimiter  The boundary string.
  string  $string     The input string.
  int     $limit      If $limit is set and positive, the returned array will contain a maximum of $limit elements with the last element containing the rest of $string.

                      If the $limit parameter is negative, all components except the last -$limit are returned.

                      If the $limit parameter is zero, then this is treated as 1.

Return:
  array  Returns an array of strings created by splitting the $string parameter on boundaries formed by the $delimiter.

         If $delimiter is an empty string (""), explode() will return FALSE. If $delimiter contains a value that is not contained in $string and a negative $limit is used, then an empty array will be returned, otherwise an array containing $string will be returned.

See Also:
   * preg_split()
   * str_split()
   * mb_split()
   * str_word_count()
   * strtok()
   * implode()

In the explode docs, the first line of $limit needs to wrap at the third column, e.g.:

>>> doc explode
function explode($separator, $str, $limit = unknown)

Description:
  Split a string by a string

  Returns an array of strings, each of which is a substring of $string formed by
  splitting it on boundaries formed by the string $delimiter.

Param:
  string  $delimiter  The boundary string.
  string  $string     The input string.
  int     $limit      If $limit is set and positive, the returned array will
                      contain a maximum of $limit elements with the last element
                      containing the rest of $string.

                      If the $limit parameter is negative, all components except
                      the last -$limit are returned.

                      If the $limit parameter is zero, then this is treated as
                      1.

Return:
  array  Returns an array of strings created by splitting the $string parameter
         on boundaries formed by the $delimiter.

         If $delimiter is an empty string (""), explode() will return FALSE. If
         $delimiter contains a value that is not contained in $string and a
         negative $limit is used, then an empty array will be returned,
         otherwise an array containing $string will be returned.

See Also:
   * preg_split()
   * str_split()
   * mb_split()
   * str_word_count()
   * strtok()
   * implode()

But even if we could come up with a good heuristic for this, it's hard, because we're not wrapping plain text. We're wrapping formatted text, which is stored with markup that changes colors, makes things bold, etc, so string lengths, column boundaries, etc are a lot more complicated. So to do this right, we're back to needing to store structured data rather than flat strings :-/
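To illustrate what "structured data rather than flat strings" could look like, here is a minimal sketch in Python. It is not psysh's manual format: the `Param` class and `render_params` function are hypothetical names, and the sketch ignores colour markup entirely. It only shows that once the type, name, and unwrapped description paragraphs are stored separately, the hanging indent can be computed at render time for any width.

```python
import textwrap
from dataclasses import dataclass
from typing import List


@dataclass
class Param:
    """One row of a hypothetical structured 'Param:' section."""
    type: str
    name: str
    paragraphs: List[str]  # unwrapped description paragraphs


def render_params(params: List[Param], width: int) -> str:
    """Render rows as aligned columns, wrapping descriptions to `width`."""
    type_w = max(len(p.type) for p in params)
    name_w = max(len(p.name) for p in params)
    # Two-space indent, then two spaces after each column, as in the doc output.
    indent = 2 + type_w + 2 + name_w + 2
    pad = " " * indent
    lines = []
    for p in params:
        head = f"  {p.type:<{type_w}}  {p.name:<{name_w}}  "
        for i, para in enumerate(p.paragraphs):
            if i > 0:
                lines.append("")  # blank line between paragraphs
            wrapped = textwrap.wrap(para, width=width - indent) or [""]
            for j, text in enumerate(wrapped):
                # Only the very first line of a row carries the type and name.
                lines.append((head if i == 0 and j == 0 else pad) + text)
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [
        Param("string", "$delimiter", ["The boundary string."]),
        Param("int", "$limit", [
            "If $limit is set and positive, the returned array will contain "
            "a maximum of $limit elements with the last element containing "
            "the rest of $string.",
            "If the $limit parameter is zero, then this is treated as 1.",
        ]),
    ]
    print(render_params(rows, 80))
```

The same rows can be rendered at 60 or 120 columns by changing only the `width` argument, which is the property a flat pre-wrapped string lacks.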

phil-s · 2020-04-07T07:38:19Z

Yes, non-plain-text will cause issues -- as I found when I first tried to plug in my filter, as it was acting before the ansi colour escape codes had been dealt with, so the end result, once the escape characters had been removed, was indented all over the show :)

Here's that example after I've made my window quite narrow:

>>> doc explode
function explode($separator, $str, $limit = unknown)

Description:
  Split a string by a string

  Returns an array of strings, each of which is a
  substring of $string formed by splitting it on
  boundaries formed by the string $delimiter.

Param:
  string  $delimiter  The boundary string.
  string  $string     The input string.
  int     $limit      If $limit is set and positive,
                      the returned array will contain a
                      maximum of $limit elements with
                      the last element containing the
                      rest of $string.

                      If the $limit parameter is
                      negative, all components except
                      the last -$limit are returned.

                      If the $limit parameter is zero,
                      then this is treated as 1.

Return:
  array  Returns an array of strings created by
         splitting the $string parameter on boundaries
         formed by the $delimiter.

         If $delimiter is an empty string (""),
         explode() will return FALSE. If $delimiter
         contains a value that is not contained in
         $string and a negative $limit is used, then an
         empty array will be returned, otherwise an
         array containing $string will be returned.

See Also:
   * preg_split()
   * str_split()
   * mb_split()
   * str_word_count()
   * strtok()
   * implode()

I'd show code, but I'm guessing lisp won't help you, and the actual wrapping is a hand-off to a library function. The custom logic is essentially:

1. Find the 'most indented' column in the current line, where any sequence of 2+ spaces indicates a new level of indentation.
2. Remember that position in the text as the start point for reformatting.
3. Find all the immediately-following lines (if any) which are directly indented to that same column, and which do not appear to be a list item (based on * prefix).
4. Remember the end of the final line in that group as the end point for reformatting.
5. Generate an indentation/padding string of spaces matching the indent level.
6. Set the wrapping column based on the window width.
7. Pass the text from the start point to the end point to the re-formatter, along with the wrap column and the padding string to prefix to each new line when wrapping.
8. Move to the next line, and loop until finished.
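The steps above can be sketched roughly as follows. This is a Python approximation rather than the lisp actually used, it operates on plain text with no escape codes, and the names `content_column` and `reflow` are made up for illustration. The hand-off to a library re-formatter is played here by `textwrap.fill`.

```python
import re
import textwrap
from typing import List

LIST_ITEM = re.compile(r"^\s*\* ")


def content_column(line: str) -> int:
    """Column where the last run of 2+ spaces ends (0 if there is none)."""
    runs = list(re.finditer(r" {2,}", line))
    return runs[-1].end() if runs else 0


def reflow(text: str, width: int) -> str:
    lines = text.split("\n")
    out: List[str] = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        # Blank lines and list items are passed through untouched.
        if not line.strip() or LIST_ITEM.match(line):
            out.append(line)
            continue
        col = content_column(line)
        prefix, chunks = line[:col], [line[col:]]
        # Gather following lines indented to exactly the same column.
        while i < len(lines):
            nxt = lines[i]
            indent = len(nxt) - len(nxt.lstrip(" "))
            if not nxt.strip() or indent != col or LIST_ITEM.match(nxt):
                break
            chunks.append(nxt[col:])
            i += 1
        filled = textwrap.fill(
            " ".join(chunks),
            width=max(width, col + 10),  # guard against absurdly narrow widths
            initial_indent=prefix,       # keeps e.g. "  int     $limit      "
            subsequent_indent=" " * col,
        )
        out.extend(filled.split("\n"))
    return "\n".join(out)
```

Calling `reflow(doc_text, shutil.get_terminal_size().columns)` would then re-wrap a captured `doc` page to the current window, subject to the same "2+ spaces" assumption.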

I'd figured that the "2+ spaces" heuristic might be a fragile thing to rely on, but it seems to apply in all the cases I've checked so far.

phil-s · 2020-04-07T08:05:20Z

> But even if we could come up with a good heuristic for this, it's hard, because we're not wrapping plain text. We're wrapping formatted text, which is stored with markup that changes colors, makes things bold, etc, so string lengths, column boundaries, etc are a lot more complicated

Yes, I guess it's trickier on your side, as I do get to process plain text (without the escape codes -- the colour properties are stored differently by that point). I presume there must be code handling this for the current build system in order to wrap at the current 100 columns, but I've no idea whether it would be easily re-usable or adaptable for dynamic wrapping.

phil-s · 2020-04-07T10:34:05Z

> I'd figured that the "2+ spaces" heuristic might be a fragile thing to rely on, but it seems to apply in all the cases I've checked so far.

Well, I've spotted some counter-examples in doc var_export:

  var_export() gets structured information about the  given variable. It is similar to var_dump()
  with one exception: the returned representation is valid PHP code.
[...]
  mixed  Returns the variable representation when the $return  parameter is used and evaluates to
         TRUE. Otherwise, this function will return NULL.

Those rogue double-spaces mean I end up with the likes of:

  var_export() gets structured information about the  given variable. It is similar
                                                      to var_dump()
  with one exception: the returned representation is valid PHP code.
[...]
  mixed  Returns the variable representation when the $return  parameter is used
                                                               and evaluates to
         TRUE. Otherwise, this function will return NULL.

I believe those are both errors in the original data, but it does prove that my approach is brittle.

I'd happily live with these occasional minor issues, though -- the pros greatly outweigh the cons.
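The misfire is easy to reproduce in isolation. The snippet below is a small Python sketch, with `content_column` a hypothetical helper standing in for the "2+ spaces" rule. It shows that the stray double space in the var_export text is indistinguishable from a genuine column boundary.

```python
import re


def content_column(line: str) -> int:
    """Column where the last run of 2+ spaces ends (0 if there is none)."""
    runs = list(re.finditer(r" {2,}", line))
    return runs[-1].end() if runs else 0


# A genuine three-column row: the description starts at column 22.
good = "  int     $limit      If $limit is set and positive"
# The var_export description: "the  given" contains a rogue double space.
bad = ("  var_export() gets structured information about the  given "
       "variable. It is similar to var_dump()")

print(content_column(good))  # 22
print(content_column(bad))   # 54, so wrapped text is indented to column 54
```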

earboxer · 2021-05-27T15:19:58Z

Since this hasn't seen much attention in the last year, I thought I'd post my kludgy workaround.

I like to read the docs with a single command, cheatp, defined like this:

#!/bin/sh
# Install PHP manual from https://github.com/bobthecow/psysh/wiki/PHP-manual
echo "doc $1" | psysh $2

(Used on its own, this keeps the wrapping baked into the manual files.)

# unfold is this awk script, which tries to undo the effects of fold.
# printf "%s" keeps a literal % in the docs from being read as a format specifier.
alias unfold='awk '"'"'(length($0) < 70){print $0}(length($0)>=70){printf "%s", $0}'"'"
# Custom build of fold accounts for escape sequences
cheatpp () { cheatp "$1" --color | sed 's/^  */ /g' | unfold | fold -sw "$COLUMNS"; }

* cheatp $1 --color (equivalent to echo "doc $1" | psysh --color): force color output in psysh.
* sed 's/^  */ /g': replace leading indentation with a single space.
* unfold: make it as if there were no wrapping.
* awk '(length($0) < 70){print $0}(length($0)>=70){printf "%s", $0}': if the line is shorter than 70 characters, print it with a newline; if it is 70 or longer, print it without a newline.
* fold -sw $COLUMNS: fold at spaces for the current terminal width.

The custom build of fold is a modification of fold from ctools, with a hack which tries to account for escape sequences (just by subtracting a fixed amount).

diff --git a/src/fold.c b/src/fold.c
index eb48a2d..f6d97cf 100644
--- a/src/fold.c
+++ b/src/fold.c
@@ -166,6 +166,8 @@ fold(char * const path, const long w, const unsigned int mode)
 					cp = 0;
 				} else if (buf[i] == '\t') {
 					cstep = 9 - cp % 8;
+				} else if (buf[i] == '\e') {
+				    cstep = -3;// zach hack!
 				} else {
 					cstep = 1;
 				}

The result is colored output that fits the current terminal width (whether narrow or wide), but with leading indentation replaced by a single space.
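For comparison, here is a rough Python sketch of the same folding idea that measures each escape sequence instead of subtracting a fixed amount. It assumes the only escape sequences in the stream are SGR colour/style codes of the form ESC [ ... m; whether that holds for all psysh output is an assumption, and `visible_len` and `fold_visible` are illustrative names, not part of any existing tool.

```python
import re
from typing import List

# Matches SGR colour/style sequences such as "\x1b[32m" or "\x1b[0m".
SGR = re.compile(r"\x1b\[[0-9;]*m")


def visible_len(s: str) -> int:
    """Number of characters left once SGR sequences are removed."""
    return len(SGR.sub("", s))


def fold_visible(line: str, width: int) -> List[str]:
    """Break `line` at spaces so each piece shows at most `width` columns."""
    words = line.split(" ")
    out: List[str] = []
    cur = words[0]
    for word in words[1:]:
        candidate = cur + " " + word
        # Only break if the current piece already holds some text.
        if visible_len(candidate) > width and cur.strip():
            out.append(cur)
            cur = word
        else:
            cur = candidate
    out.append(cur)
    return out
```

A single word wider than `width` is left unbroken, and colour state is not reset at the breaks; terminals carry it across newlines, so the output still renders the same colours.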

bobthecow added the enhancement label Mar 15, 2020

bobthecow added the docs label May 3, 2020
