Commit
Regex in parse_section() is aligned with proposal for docopt.
The proposed changes for docopt can be found in the pull request
[339](docopt/docopt#339).
EVGVir committed Aug 4, 2016
1 parent d997d40 commit fb64b5c
Showing 1 changed file with 11 additions and 7 deletions.
18 changes: 11 additions & 7 deletions docopt.cpp
@@ -161,16 +161,20 @@ std::vector<T*> flat_filter(Pattern& pattern) {
}

static std::vector<std::string> parse_section(std::string const& name, std::string const& source) {
// std::regex has no concept of multiline strings, so the symbols `^` and `$` match only once,
// at the start and at the end of the whole string, even if the string contains newline
// characters. For this reason, the following constructs are used instead:
// (?:^|\\n) - start of a line;
// (?=\\n|$) - end of a line.
// ECMAScript regex only has "?=" for a non-matching lookahead. In order to make sure we always have
// a newline to anchor our matching, we have to avoid matching the final newline of each grouping.
// Therefore, our regex is adjusted from the docopt Python one to use ?= to match the newlines before
// the following lines, rather than after.
std::regex const re_section_pattern {
"(?:^|\\n)" // anchored at a linebreak (or start of string)
"("
"[^\\n]*" + name + "[^\\n]*(?=\\n?)" // a line that contains the section name
"(?:\\n+[ \\t].*?(?=\\n|$))*" // followed by any number of indented or empty lines
")",
"(?:^|\\n)(" // A section begins at start of a line and consists of:
".*" + name + ".*" // - a line that contains the section's name; and
"(?:" // - several
"\\n+[ \\t].*" // indented lines possibly separated by empty lines.
")*"
")(?=\\n|$)", // The section ends at the end of a line.
std::regex::icase
};
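
To see what the pattern made of `(?:^|\\n)`, `.*` and `(?=\\n|$)` captures, here is a minimal sketch that rebuilds it outside of docopt.cpp. The helper name `find_sections`, the iteration over matches, and the `assert` on the section name are assumptions made for illustration; only the regex itself is taken from the diff above. As in the diff, `name` is pasted into the pattern unescaped, so it is assumed to contain no regex metacharacters.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for parse_section(): returns every section of `source`
// whose first line contains `name` (case-insensitive).
std::vector<std::string> find_sections(std::string const& name, std::string const& source) {
    // Assumption of this sketch: an empty name would make every line a section header.
    assert(!name.empty());

    std::string const pattern =
        "(?:^|\\n)("           // a section begins at the start of a line and consists of:
        ".*" + name + ".*"     // - a line that contains the section's name; and
        "(?:"
        "\\n+[ \\t].*"         // - indented lines, possibly separated by empty lines.
        ")*"
        ")(?=\\n|$)";          // the section ends at the end of a line.

    std::regex const re_section_pattern(pattern, std::regex::icase);

    std::vector<std::string> sections;
    std::sregex_iterator const end;
    for (std::sregex_iterator it(source.begin(), source.end(), re_section_pattern); it != end; ++it) {
        sections.push_back((*it)[1].str());  // group 1 excludes the leading newline
    }
    return sections;
}
```

Called with the name `options:` on a help text whose `Options:` header is followed by indented option lines, one of them preceded by an empty line, the sketch should return a single section that spans the header and all indented lines and stops before the first non-indented line. Because `.` does not match a newline in the ECMAScript grammar, the header line cannot run into the next line, which is what the older `[^\\n]*` spelled out explicitly.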
