{"id":14860,"date":"2019-05-28T19:22:29","date_gmt":"2019-05-28T17:22:29","guid":{"rendered":"https:\/\/www.roumazeilles.net\/news\/en\/wordpress\/?page_id=14860"},"modified":"2020-11-24T21:27:11","modified_gmt":"2020-11-24T19:27:11","slug":"regular-expressions","status":"publish","type":"page","link":"https:\/\/www.roumazeilles.net\/news\/en\/wordpress\/shareware-home\/regular-expressions\/","title":{"rendered":"Regular expressions"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\">Search regular expressions<\/h3>\n\n\n\n<p>The regular expression routines of the YGrep Search Engine support a full range of Unix regular expressions as defined in ed(1) and grep(1). They are also very similar to the regular expressions provided by the Emacs and MicroEmacs text editors and by the Borland GREP.COM utility.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Specification<\/h4>\n\n\n\n<figure class=\"wp-block-table\">      <TABLE WIDTH=75% align=\"center\" class=\"std_box\">\n        <TR> \n          <TD>| <\/TD>\n          <TD>A vertical bar between two expressions matches either the \n            first expression OR the second expression. Up to 10 of these can \n            be combined. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>&#038; <\/TD>\n          <TD>An ampersand between two expressions requires both the first \n            expression AND the second expression to match. Up to 10 of these can be combined. \n          <\/TD>\n        <\/TR>\n        <TR> \n          <TD>^ <\/TD>\n          <TD>A circumflex as the first character of the pattern anchors the match \n            to the beginning of a line. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>$ <\/TD>\n          <TD>A dollar sign as the last character of the pattern anchors the match to \n            the end of a line. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>. <\/TD>\n          <TD>A period anywhere in the pattern matches any single character. 
<\/TD>\n        <\/TR>\n        <TR> \n          <TD>* <\/TD>\n          <TD>An expression followed by an asterisk matches zero or more occurrences \n            of that expression. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>+ <\/TD>\n          <TD>An expression followed by a plus sign matches one or more occurrences \n            of that expression. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>- <\/TD>\n          <TD>An expression followed by a minus sign optionally matches that expression \n            (matches zero or one occurrence of that expression). <\/TD>\n        <\/TR>\n        <TR> \n          <TD>{N} <\/TD>\n          <TD>An expression followed by a number N in curly braces matches the \n            expression exactly N times.<\/TD>\n        <\/TR>\n        <TR> \n          <TD>{M,N} <\/TD>\n          <TD>An expression followed by two numbers M and N separated by a comma \n            in curly braces matches the expression M to N times. See the examples.<\/TD>\n        <\/TR>\n        <TR> \n          <TD>[] <\/TD>\n          <TD>A string enclosed in square brackets matches any single character in that \n            string, but no others. If the first character of the string is a circumflex, \n            the expression matches any character except the characters in the \n            string. A range of characters may be specified by two characters separated \n            by a -. These are known as character classes. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\&lt; <\/TD>\n          <TD>A backslash followed by an opening &lt; matches the beginning of a \n            word. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\&gt; <\/TD>\n          <TD>A backslash followed by a closing &gt; matches the end of a word. 
<\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\( <\/TD>\n          <TD>A backslash followed by an opening ( marks the beginning of \n            a tagged sub-expression (see Search-and-Replace regular expressions; \n            it has no effect on search-only expressions). No more than 9 sub-expressions \n            are allowed by the YGrep Search Engine. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\) <\/TD>\n          <TD>A backslash followed by a closing ) marks the end of a tagged \n            sub-expression (see Search-and-Replace regular expressions; it has \n            no effect on search-only expressions). No more than 9 sub-expressions \n            are allowed by the YGrep Search Engine. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\b <\/TD>\n          <TD>A backslash followed by a letter &#8216;b&#8217; matches the backspace character \n            (ASCII code 8). <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\n <\/TD>\n          <TD>A backslash followed by a letter &#8216;n&#8217; matches the newline character \n            (ASCII code 10). <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\f <\/TD>\n          <TD>A backslash followed by a letter &#8216;f&#8217; matches the form-feed character \n            (ASCII code 12). <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\r <\/TD>\n          <TD>A backslash followed by a letter &#8216;r&#8217; matches the carriage return \n            character (ASCII code 13). <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\t <\/TD>\n          <TD>A backslash followed by a letter &#8216;t&#8217; matches the horizontal tab \n            character (ASCII code 9). <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\x00 <\/TD>\n          <TD>A backslash followed by a letter &#8216;x&#8217; and a hexadecimal code matches \n            the character with that hexadecimal ASCII code. 
<\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\ <\/TD>\n          <TD>A backslash followed by any other character quotes that character. \n            This allows a search for a character that is usually a regular expression \n            specifier. <\/TD>\n        <\/TR>\n      <\/TABLE>\n   <\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Important note to the users of the C language:<\/h4>\n\n\n\n<p>C programmers should remember that the backslash is a special character in C strings and must be &#8220;doubled&#8221; to appear in a C string constant. The end user types the backslash exactly as described here. For example, in a C source program the expression &#8220;\\&lt;hello\\&gt;&#8221; should be written as the C string &#8220;\\\\&lt;hello\\\\&gt;&#8221;. This mistake is very common (I make it myself at regular intervals) and may leave you scratching your head in front of a bizarre bug.<\/p>\n\n\n\n<p>If a character class must contain either the dash (&#8216;-&#8217;) or the closing bracket (&#8216;]&#8217;), these characters must appear at the beginning of the bracketed list, as in &#8220;[]-]&#8221;. 
Please note that &#8216;]&#8217; must appear before &#8216;-&#8217;.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Examples<\/h4>\n\n\n\n<figure class=\"wp-block-table\">      <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH=75% BORDER=1 align=\"center\">\n        <TR> \n          <TD>foo|bar|toto <\/TD>\n          <TD>matches one of the words: foo, bar, or toto <\/TD>\n        <\/TR>\n        <TR> \n          <TD>foo&#038;bar&#038;toto <\/TD>\n          <TD>matches all of the words: foo, bar, and toto <\/TD>\n        <\/TR>\n        <TR> \n          <TD>^Windows <\/TD>\n          <TD>matches all lines starting with Windows <\/TD>\n        <\/TR>\n        <TR> \n          <TD>Grep$ <\/TD>\n          <TD>matches all lines ending with Grep <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\$ <\/TD>\n          <TD>matches a dollar sign <\/TD>\n        <\/TR>\n        <TR> \n          <TD>H..p <\/TD>\n          <TD>matches Help, Hoop, Harp, etc. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>H.*p <\/TD>\n          <TD>matches Help, Hoop, Harp, etc., but also fragments beginning with \n            H and ending with p, like Heeellllp, Holy crop, Halt and stop, etc. \n          <\/TD>\n        <\/TR>\n        <TR> \n          <TD>^W.n <\/TD>\n          <TD>matches all lines starting with Win, Wan, Won, etc. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>fo* <\/TD>\n          <TD>matches f, fo, foo, etc. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>fo+ <\/TD>\n          <TD>matches fo, foo, etc. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>fo-<\/TD>\n          <TD>matches f and fo, but not foo, fooo, etc. 
<\/TD>\n        <\/TR>\n        <TR> \n          <TD>[xyz] <\/TD>\n          <TD>matches x, y or z <\/TD>\n        <\/TR>\n        <TR> \n          <TD>a[^xyz]c <\/TD>\n          <TD>matches abc, arc and aXc but not axc <\/TD>\n        <\/TR>\n        <TR> \n          <TD>([0-9]) <\/TD>\n          <TD>matches (0), (1), (2), (3), (4), (5), (6), (7), (8) and (9) <\/TD>\n        <\/TR>\n        <TR> \n          <TD>([0-9]*) <\/TD>\n          <TD>matches (), (0), (123), (2512), etc. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\&lt;[Aa].*\\&gt; <\/TD>\n          <TD>matches any non-empty word beginning with either a or A, like A, \n            Ab, Abc, a, abC, etc. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>a{2} <\/TD>\n          <TD>matches exactly two (2) consecutive a characters <\/TD>\n        <\/TR>\n        <TR> \n          <TD>a{2,4} <\/TD>\n          <TD>matches 2, 3 or 4 consecutive a characters <\/TD>\n        <\/TR>\n      <\/TABLE><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">About the order of precedence of the | and &amp; operators:<\/h4>\n\n\n\n<p>You should be aware of the precedence that is implied by the Boolean operators inside a regular expression. The expression is &#8220;explored&#8221; from the beginning (from the left). The precedence is easy to remember: a YGrep Search Engine regular expression containing Boolean operators is implicitly parenthesized toward the end (to the right). 
For example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Expr1 | Expr2 &amp; Expr3 &amp; Expr4 | Expr5<\/pre>\n\n\n\n<p>is equivalent to the grouped theoretical expression:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Expr1 | ( Expr2 &amp; ( Expr3 &amp; ( Expr4 | ( Expr5 ))))<\/pre>\n\n\n\n<p>Consequently, the exact order of the sub-expressions can matter a great deal in getting the result the programmer intends.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">What is matched with the &amp; operator:<\/h4>\n\n\n\n<p>The matched string (as returned by the TagStart[0] and TagEnd[0] of the RGrep() RGREPINFO returned value) is somewhat special when you use the AND-operator (&amp;) in a regular expression. In that case, the matched string is the shortest string containing all of the strings matched by the individual searches.<\/p>\n\n\n\n<p>For example, foo&amp;bar will match on the following line:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Test1 bar schwoop foo test2<\/pre>\n\n\n\n<p>The matched string (as returned by the TagStart[0] and TagEnd[0] of the RGrep() RGREPINFO returned value) is &#8220;bar schwoop foo&#8221;.<\/p>\n\n\n\n<p>Although this says nothing about how it is implemented, in this simple example the expression foo&amp;bar is equivalent to bar.*foo|foo.*bar, and both match the same string.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Search-and-Replace regular expressions<\/h3>\n\n\n\n<p>The YGrep Search Engine regular expression substitution routines support a small set of expressions to define how the substitution will be performed.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Specification<\/h4>\n\n\n\n<figure class=\"wp-block-table\">      <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH=75% BORDER=1 align=\"center\">\n        <TR> \n          <TD>&#038; <\/TD>\n          <TD>An ampersand in the substituted string forces insertion of the full \n    
        matched pattern. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\number <\/TD>\n          <TD>A backslash followed by a number (between 1 and 9) inserts the text \n            matched by the tagged sub-expression with that number in the search pattern. <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\0 <\/TD>\n          <TD>A backslash followed by a 0 forces the insertion of the full \n            matched pattern (like &#038;). <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\&#038; <\/TD>\n          <TD>A backslash followed by an ampersand inserts a literal &#038; character \n            (removing its matched-pattern meaning). <\/TD>\n        <\/TR>\n      <\/TABLE><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Examples<\/h4>\n\n\n\n<figure class=\"wp-block-table\">      <TABLE CELLPADDING=0 CELLSPACING=0 WIDTH=75% BORDER=1 align=\"center\">\n        <TR> \n          <TD><strong>Search Pattern<\/strong> <\/TD>\n          <TD><strong>Replace Pattern<\/strong> <\/TD>\n          <TD><strong>Substitution<\/strong> <\/TD>\n        <\/TR>\n        <TR> \n          <TD>Windows <\/TD>\n          <TD>MS-&#038; <\/TD>\n          <TD>replaces all occurrences of the simple word Windows with MS-Windows <\/TD>\n        <\/TR>\n        <TR> \n          <TD>\\(dows\\)\\([Ww]in\\) <\/TD>\n          <TD>\\2\\1 <\/TD>\n          <TD>reorders the pattern dowsWin into the normal Windows, regardless of the case of the W at the beginning of the word <\/TD>\n        <\/TR>\n      <\/TABLE><\/figure>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Other interesting links<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">General links<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>The original page describing the&nbsp;<a href=\"https:\/\/www.roumazeilles.net\/ygrep\/ygrep.php\">YGrep Search Engine<\/a>.<\/li><li><strong>Syntax<\/strong>&nbsp;of YGrep Search Engine&nbsp;<a href=\"https:\/\/www.roumazeilles.net\/ygrep\/regex.php\">regular expressions<\/a>.<\/li><li><strong><a href=\"https:\/\/www.roumazeilles.net\/ygrep\/ygrepapp.php\">Applications<\/a><\/strong>&nbsp;of the YGrep Search Engine.<\/li><li><a href=\"https:\/\/www.roumazeilles.net\/ygrep\/sh_clusterv.php\">ClusterView<\/a>&nbsp;is an application using the YGrep Search Engine to search in Windows files.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Internationalization-related 
links<\/h3>\n\n\n\n<p>MSDN resources:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/www.amazon.com\/exec\/obidos\/ASIN\/1556158408\/yvesroumazeilles\">Developing International Software for Windows 95 and Windows NT<\/a><\/li><\/ul>\n\n\n\n<p>Non-MSDN resources:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/www.lisa.unige.ch\/\">Localisation Industry Standard Association<\/a>&nbsp;&#8211; of uncertain quality<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Books of potential interest to the reader<\/h2>\n\n\n\n<p>I have found a few books you may want to buy from&nbsp;<a href=\"https:\/\/www.amazon.com\/exec\/obidos\/redirect-home\/yvesroumazeilles\">Amazon.com<\/a>.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/www.amazon.com\/exec\/obidos\/ASIN\/0471578053\/yvesroumazeilles\">Obfuscated C and Other Mysteries\/Book-Disk (Wiley Professional Computing)<\/a>&nbsp;by Don Libes, is a nice book for all those who wish they could be good C programmers and can enjoy a good dose of fun with bizarre C code written by others. I love it!<\/li><li><img decoding=\"async\" src=\"https:\/\/www.amazon.com\/images\/P\/157231995X.01.MZZZZZZZ.gif\" alt=\"Programming Windows\"><a href=\"https:\/\/www.amazon.com\/exec\/obidos\/ASIN\/157231995X\/yvesroumazeilles\">Programming Windows<\/a>&nbsp;by Charles Petzold, is the essential book for anyone wanting to program for the Microsoft operating system. You really can&#8217;t avoid reading it.<\/li><li><a href=\"https:\/\/www.amazon.com\/exec\/obidos\/ASIN\/0471551724\/yvesroumazeilles\">Advanced Windows Programming (Wiley Professional Computing)<\/a>&nbsp;by Martin Heller, is &#8211; by far &#8211; my favorite book about writing real Windows applications. 
It may be a little old, but it still covers all the&nbsp;<strong>really<\/strong>&nbsp;important issues.<\/li><li><img decoding=\"async\" src=\"https:\/\/www.amazon.com\/images\/P\/1565922573.01.MZZZZZZZ.gif\" alt=\"Mastering regexp\"><a href=\"https:\/\/www.amazon.com\/exec\/obidos\/ASIN\/1565922573\/yvesroumazeilles\">Mastering Regular Expressions : Powerful Techniques for Perl and Other Tools (Nutshell Handbook)<\/a>&nbsp;by Jeffrey E. F. Friedl, Andy Oram (Editor), is a very useful book for all users of&nbsp;<a href=\"https:\/\/www.roumazeilles.net\/ygrep\/regex.php\">regular expressions<\/a>.<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Search regular expressions The regular expression routines of the YGrep Search Engine support a full range of Unix regular expressions as defined in ed(1) and grep(1). They are also very similar to the regular expressions provided by the Emacs and MicroEmacs text editors and by the Borland GREP.COM utility. Specification | A vertical bar between 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":14866,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-14860","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.roumazeilles.net\/news\/en\/wordpress\/wp-json\/wp\/v2\/pages\/14860","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.roumazeilles.net\/news\/en\/wordpress\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.roumazeilles.net\/news\/en\/wordpress\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.roumazeilles.net\/news\/en\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.roumazeilles.net\/news\/en\/wordpress\/wp-json\/wp\/v2\/comments?post=14860"}],"version-history":[{"count":0,"href":"https:\/\/www.roumazeilles.net\/news\/en\/wordpress\/wp-json\/wp\/v2\/pages\/14860\/revisions"}],"up":[{"embeddable":true,"href":"https:\/\/www.roumazeilles.net\/news\/en\/wordpress\/wp-json\/wp\/v2\/pages\/14866"}],"wp:attachment":[{"href":"https:\/\/www.roumazeilles.net\/news\/en\/wordpress\/wp-json\/wp\/v2\/media?parent=14860"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}