=pod

=for advent_year 2010

=for advent_day 01

=for advent_title Tangled Tidings

=for advent_author Jerrad Pierce

We begin this year's calendar with a tool to help the adept Perl hacker cope with laziness, be it the laziness of selves past or of someone else. M<YAPE::Regex::Explain> is a package in the YAPE family which can untangle the Christmas lights of Perl… regular expressions. YAPE::Regex::Explain turns line noise into English explanations. The unstyled output from:

=begin pre

% perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr%<([^\s>]+)(?:\s+[^>]*?)?(?:/|>.*?</\1)>%s)->explain'

=end pre

looks like the following:

=begin pre

The regular expression:

(?s-imx:<([^\s>]+)(?:\s+[^>]*?)?(?:/|>.*?</\1)>)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?s-imx:                 group, but do not capture (with . matching
                         \n) (case-sensitive) (with ^ and $ matching
                         normally) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  <                        '<'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^\s>]+                  any character except: whitespace (\n,
                             \r, \t, \f, and " "), '>' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [^>]*?                   any character except: '>' (0 or more
                             times (matching the least amount
                             possible))
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    /                        '/'
----------------------------------------------------------------------
   |                         OR
----------------------------------------------------------------------
    >                        '>'
----------------------------------------------------------------------
    .*?                      any character (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
    </                       '</'
----------------------------------------------------------------------
    \1                       what was matched by capture \1
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  >                        '>'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

=end pre

Pretty nifty, but arguably a somewhat redundant and unfortunate default, given the more useful I<regex> mode, which you can use to create a skeleton C</x>-style commented regexp like the one at the end of this document. There's also a misleadingly named I<silent> mode, which is a sort of regular expression pretty printer:

=begin pre

% perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr%<([^\s>]+)(?:\s+[^>]*?)?(?:/|>.*?</\1)>%s)->explain("silent")'

(?sx-im:
  <
  (
    [^\s>]+
  )
  (?x:
    \s+
    [^>]*?
  )?
  (?x:
    /
   |
    >
    .*?
    </
    \1
  )
  >
)

=end pre

Although the POD notes that the module can parse some expressions passed as strings, this can fail, so you are better off passing everything through C<qr//> first. Perhaps the largest drawback to the current version of the module, for those who can read its English output at least, is that it does not include support for syntax added since 5.6, such as:

=over 4

=item * Named captures, C<< (?<yada>…) >>, instead of counting parentheses

=item * P<2005-12-13|Regexp::Keep>, C<\K…>, which is now core

=item * Recursive patterns, C<(?№…)>, which are like back references but for the pattern rather than the match. Both might benefit from having the relevant pattern included in the explanatory comments instead of a simple C<\№>.

=back

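
If the 5.10 syntax in the list above is unfamiliar, a quick look at it in action may help. Below is a minimal sketch exercising all three features in plain Perl (5.10 or later). It does not involve YAPE::Regex::Explain at all, and the sample strings and the variable names C<$tag>, C<$price> and C<%verdict> are illustrative choices of ours, not anything taken from the module.

=begin code

use strict;
use warnings;
use 5.010;    # requires Perl 5.10+ and enables say()

# Named capture: name the tag instead of counting parentheses,
# refer back to it with \k<tag>, and read it from %+ afterwards.
my $tag = '';
if ('<b>bold</b>' =~ m{<(?<tag>[^\s>]+)>.*?</\k<tag>>}) {
    $tag = $+{tag};
}
say "tag: $tag";                 # prints "tag: b"

# \K: whatever matched before \K is kept, so s/// replaces only what follows.
(my $price = 'cost: 42') =~ s/cost:\s*\K\d+/99/;
say $price;                      # prints "cost: 99"

# Recursion: (?1) re-runs the *pattern* of group 1, not the text it captured,
# which lets a single regex match arbitrarily nested parentheses.
my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;
my %verdict;
for my $s ('(a(b)c)', '(a(b c)') {
    $verdict{$s} = $s =~ $balanced ? 'balanced' : 'unbalanced';
    say "$s: $verdict{$s}";
}

=end code
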

Thankfully, Explain is aware of the easily confused positive/negative look-ahead/behind assertions, for example C<< /(?<!foo)bar(?=quz)/ >>:

=begin code

(?x-ims:                 # group, but do not capture (disregarding
                         # whitespace and comments) (case-sensitive)
                         # (with ^ and $ matching normally) (with . not
                         # matching \n):
  (?

=end code

=cut