Regular Expressions

Regular Expressions in Unix/Linux/Cygwin
CS 162, UC-Irvine
Some slides from Reva Freedman, Marty Stepp, Jessica Miller, and Ruth Anderson

Regular Expression (RE) Formal Definition
Basis: a single character $a$ is an RE, signifying the language $\{a\}$; $\varepsilon$ is an RE, signifying the language $\{\varepsilon\}$; $\emptyset$ is an RE, signifying the empty language $\emptyset$.
If $E_1$ and $E_2$ are REs, then $E_1|E_2$ is an RE, signifying $L(E_1) \cup L(E_2)$.
If $E_1$ and $E_2$ are REs, then $E_1E_2$ is an RE, signifying $L(E_1)L(E_2)$, that is, concatenation.
If $E$ is an RE, then $E^*$ is an RE, signifying $L(E)^*$, that is, Kleene closure: the concatenation of zero or more strings from $L(E)$.
Precedence, from highest to lowest: Kleene closure, concatenation, union.
Parentheses can be used for grouping and don't count as characters.

egrep and regexes

    command   description
    egrep     extended grep; uses regexes in its search patterns; equivalent to grep -E

    egrep "[0-9]{3}-[0-9]{3}-[0-9]{4}" contact.html

egrep searches for a regular expression pattern in a file (or group of files); the command above looks for three digits, a dash, three digits, a dash, and four digits.
grep uses basic regular expressions instead of extended ones; extended REs have some minor differences and additional metacharacters.
The -i option before the regex signifies a case-insensitive match: egrep -i "mart" matches "Marty S", "smartie", "WALMART", ...

Metacharacters

    RE metacharacter   Matches
    .                  Any one character, except newline
    [a-z]              Any one of the enclosed characters (e.g. a-z)
    *                  Zero or more of the preceding character
    ? or \?            Zero or one of the preceding character
    + or \+            One or more of the preceding character

Any non-metacharacter matches itself.
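A minimal sketch for trying the egrep examples and the metacharacters above, assuming GNU grep (the usual grep on Linux and Cygwin), where grep -E behaves like egrep. The input lines and the variable names are made up for illustration, since contact.html is not available. The last command also checks the precedence rule from the formal definition, using -x so that the whole line must match.

```shell
#!/bin/sh
# Made-up sample lines (contact.html from the slide is not available).
lines='Call 949-555-0123 today
Fax: 949 555 0199
Marty S
smartie
WALMART
Mark'

# Three digits, dash, three digits, dash, four digits (grep -E behaves like egrep).
phones=$(printf '%s\n' "$lines" | grep -E '[0-9]{3}-[0-9]{3}-[0-9]{4}')

# -i ignores case, so "mart" also matches "Marty S" and "WALMART" (but not "Mark").
marts=$(printf '%s\n' "$lines" | grep -i 'mart')

# ? = zero or one of the preceding character; basic grep spells it \? (GNU extension).
ere=$(printf '%s\n' color colour colouur | grep -E 'colou?r')
bre=$(printf '%s\n' color colour colouur | grep 'colou\?r')

# Precedence: * before concatenation before |, so a|bc* means a|(b(c*)).
# -x requires the whole line to match the pattern.
prec=$(printf '%s\n' a ac bc bcc bcbc | grep -Ex 'a|bc*')

printf '%s\n' "$phones" "$marts" "$ere" "$bre" "$prec"
```

The lines ac and bcbc are rejected by the last command: neither is exactly a, nor a b followed only by c's.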

More metacharacters

    RE metacharacter   Matches
    ^                  Beginning of line
    $                  End of line
    \char              Escapes the special meaning of the char that follows
    [^...]             One character not in the set
    \<                 Beginning-of-word anchor
    \>                 End-of-word anchor
    ( ) or \( \)       Tags the matched characters so they can be reused later as \1 to \9 (max = 9)
    | or \|            Or (alternation)
    x\{m\}             Repetition of character x, exactly m times (m = integer)
    x\{m,\}            Repetition of character x, at least m times
    x\{m,n\}           Repetition of character x, between m and n times

Wildcards and anchors
. (a dot) matches any character except \n: ".oo.y" matches "Doocy", "goofy", "LooPy", ...
Use \. to match a literal dot (.) character.
^ matches the beginning of a line; $ the end: "^fi$" matches lines that consist entirely of fi.
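The dot and anchor examples above can be checked with a short script. This sketch assumes GNU grep; the extra inputs (hooray, axb, fig, "if fi") are made up to show which lines each pattern rejects.

```shell
#!/bin/sh
# . matches any one character: ".oo.y" selects Doocy, goofy and LooPy, not hooray.
dot=$(printf '%s\n' Doocy goofy LooPy hooray | grep '.oo.y')

# Unescaped, the dot also matches the x in "axb"; \. matches only a real dot.
any=$(printf '%s\n' 'a.b' axb | grep -c 'a.b')
lit=$(printf '%s\n' 'a.b' axb | grep 'a\.b')

# ^ and $ tie the pattern to the start and end of the line.
whole=$(printf '%s\n' fi fig 'if fi' | grep '^fi$')

printf '%s\n' "$dot" "$any" "$lit" "$whole"
```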

\< demands that the pattern be at the beginning of a word; \> demands that it be at the end of a word: "\<for\>" matches lines that contain the word "for".
Words are made up of letters, digits, and _ (underscore).

Special characters
| means OR: "abc|def|g" matches lines with "abc", "def", or "g".
Mind the precedence: "^(Subject|Date):" matches lines that start with "Subject:" or "Date:", while "^Subject|Date:" matches lines that start with "Subject" or that contain "Date:" anywhere.
There's no AND symbol.
() are for grouping: "(Homer|Marge) Simpson" matches lines containing "Homer Simpson" or "Marge Simpson".
\ starts an escape sequence; many characters must be escaped to match them literally: \ $ . [ ] ( ) ^ * + ?
"\.\\n" matches lines containing ".\n" (a dot, a backslash, and the letter n).

Quantifiers: * + ?
* means 0 or more occurrences:
"abc*" matches "ab", "abc", "abcc", "abccc", ...
"a(bc)*" matches "a", "abc", "abcbc", "abcbcbc", ...
"a.*a" matches "aa", "aba", "a8qa", "a!?_a", ...
+ means 1 or more occurrences:
"a(bc)+" matches "abc", "abcbc", "abcbcbc", ...
"Goo+gle" matches "Google", "Gooogle", "Goooogle", ...
? means 0 or 1 occurrences:
"Martina?" matches lines with "Martin" or "Martina"
"Dan(iel)?" matches lines with "Dan" or "Daniel"

More quantifiers
{min,max} means between min and max occurrences: "a(bc){2,4}" matches "abcbc", "abcbcbc", or "abcbcbcbc".
min or max may be omitted:
"{2,}" means 2 or more
"{,6}" means up to 6 (a GNU extension)
"{3}" means exactly 3

Character sets
[ ] group characters into a character set, which matches any single character from the set.
"[bcd]art" matches strings containing "bart", "cart", or "dart"; it is equivalent to "(b|c|d)art" but shorter.
Inside [ ], most metacharacters act as normal characters: "what[.!*?]*" matches "what", "what.", "what!", "what?**!", ...

Character ranges
Inside a character set, specify a range of characters with -:
"[a-z]" matches any lowercase letter
"[a-zA-Z0-9]" matches any lower- or uppercase letter or digit
An initial ^ inside a character set negates it: "[^abcd]" matches any character other than a, b, c, or d.

Inside a character set, - can be tricky to match: place it first or last in the brackets. Escaping it with \ is not needed and has a side effect, because inside a POSIX bracket expression the backslash is an ordinary character, so [+\-] matches +, \, or -.
"[+-]?[0-9]+" matches an optional + or -, followed by one or more digits.

POSIX Character Sets
POSIX added newer, portable ways to describe character sets, called character classes, such as [:alpha:] (letters), [:digit:] (digits), and [:space:] (whitespace).
Note that a class used on its own is written [[:alpha:]]: the inner [:alpha:] is the class name, and the outer '[...]' specifies the character set that contains it.

Anchors
Anchors tell where the next character in the pattern must be located in the text data.

Concatenation Operator
Concatenation has no operator symbol: when a series of atoms appears in a regular expression, they are written one after another, and the text must match them in that order.

Alternation Operator: | or \|
The alternation operator (| or \|) separates two or more alternatives.
Note: egrep uses |, while grep uses \|.
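The word-anchor, grouping, quantifier, and character-set examples, together with the points about -, POSIX classes, and the two spellings of alternation, can be tried in one script. This is a sketch assuming GNU grep (\<, \> and \| are GNU extensions, not POSIX); all input strings and variable names are made up to show which lines each pattern accepts and rejects.

```shell
#!/bin/sh
# --- Word anchors, grouping, quantifiers, character sets ---
# \< and \> make "for" match only as a whole word ("format", "before" are rejected).
word=$(printf '%s\n' 'for loop' format before 'wait for it' | grep '\<for\>')
# Parentheses limit the alternation to the first names.
simpson=$(printf '%s\n' 'Homer Simpson' 'Marge Simpson' 'Bart Simpson' | grep -E '(Homer|Marge) Simpson')
# Goo+gle needs at least two o's between G and gle.
google=$(printf '%s\n' Gogle Google Gooogle | grep -E 'Goo+gle')
# {2,4} applies to the group (bc); -x demands a whole-line match, otherwise
# "abcbcbcbcbc" would also be selected because it contains "abcbc".
reps=$(printf '%s\n' abc abcbc abcbcbc abcbcbcbc abcbcbcbcbc | grep -Ex 'a(bc){2,4}')
# A character set matches exactly one character from the set.
art=$(printf '%s\n' bart cart dart mart | grep '[bcd]art')

# --- Hyphens in sets, POSIX classes, alternation ---
nums='42
-7
+15
3-4
\5'
# Hyphen placed last: optional + or -, then one or more digits.
signed=$(printf '%s\n' "$nums" | grep -Ex '[+-]?[0-9]+')
# The backslash is an ordinary character inside brackets, so [+\-] also accepts "\5".
escaped=$(printf '%s\n' "$nums" | grep -Ex '[+\-]?[0-9]+')
# A POSIX class goes inside a bracket expression: [[:upper:]] is one uppercase letter.
upper=$(printf '%s\n' Main main MAIN | grep -x '[[:upper:]][[:lower:]]*')
# Alternation: | with grep -E (egrep), \| in basic grep;
# in basic grep a plain | is an ordinary character.
ere=$(printf '%s\n' NW EA 'NW|EA' | grep -E 'NW|EA')
bre=$(printf '%s\n' NW EA 'NW|EA' | grep 'NW\|EA')
plain=$(printf '%s\n' NW EA 'NW|EA' | grep 'NW|EA')

printf '%s\n' "$word" "$simpson" "$google" "$reps" "$art" \
  "$signed" "$escaped" "$upper" "$ere" "$bre" "$plain"
```

Only the second numeric command accepts the line \5, which is why placing - last is the safer way to include it in a set.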

Repetition Operator: { } or \{ \}
The repetition operator specifies that the atom or expression immediately before it may be repeated.

Basic Repetition Forms
{m}: exactly m times; {m,}: at least m times; {m,n}: between m and n times (grep writes these with backslashes: \{m\}, \{m,\}, \{m,n\}).

Short Form Repetition Operators: * + ?
* is short for {0,}, + for {1,}, and ? for {0,1}.

Group Operator
In the group operator, when a group of characters is enclosed in parentheses, the next operator applies to the whole group, not only to the previous character.
Note: this depends on egrep or grep; grep uses \( and \).

Grep detail and examples
grep is a family of commands:
grep (global regular expression print): the common version

egrep (extended grep): understands extended REs, in which | + ? ( ) don't need a backslash; equivalent to grep -E
fgrep (fast grep): understands only fixed strings (no metacharacters), which can make it faster; equivalent to grep -F

Commonly used grep options:
-c  Print only a count of matched lines.
-i  Ignore uppercase and lowercase distinctions.
-l  List only the names of files that contain the specified pattern.
-n  Print matched lines with their line numbers.
-s  In older UNIX greps: work silently, displaying nothing except error messages (useful for checking the exit status). In GNU grep (Linux, Cygwin), -s only suppresses error messages about nonexistent or unreadable files; use -q to suppress normal output and rely on the exit status.
-v  Print lines that do not match the pattern.

Example: grep with pipe
Pipe the output of the ls -l command to grep and select only directory entries.

    % ls -l | grep '^d'
    drwxr-xr-x 2 krush csci 512 Feb  8 22:12 assignments
    drwxr-xr-x 2 krush csci 512 Feb  5 07:43 feb3
    drwxr-xr-x 2 krush csci 512 Feb  5 14:48 feb5
    drwxr-xr-x 2 krush csci 512 Dec 18 14:29 grades
    drwxr-xr-x 2 krush csci 512 Jan 18 13:41 jan13
    drwxr-xr-x 2 krush csci 512 Jan 18 13:17 jan15
    drwxr-xr-x 2 krush csci 512 Jan 18 13:43 jan20
    drwxr-xr-x 2 krush csci 512 Jan 24 19:37 jan22
    drwxr-xr-x 4 krush csci 512 Jan 30 17:00 jan27
    drwxr-xr-x 2 krush csci 512 Jan 29 15:03 jan29

With -c, grep displays the number of lines where the pattern was found. This is not the number of occurrences of the pattern.

    % ls -l | grep -c '^d'
    10

Example: grep with \< \>

    % cat grep-datafile
    northwest   NW  Charles Main      300000.00
    western     WE  Sharon Gray        53000.89
    southwest   SW  Lewis Dalsass     290000.73
    southern    SO  Suan Chin          54500.10
    southeast   SE  Patricia Hemenway 400000.00
    eastern     EA  TB Savage         440500.45
    northeast   NE  AM Main Jr.        57800.10
    north       NO  Ann Stephens      455000.50
    central     CT  KRush             575500.70
    Extra [A-Z]****[0-9]..$5.00

Print the line if it contains the word north.

    % grep '\<north\>' grep-datafile
    north       NO  Ann Stephens      455000.50

Example: grep with a\|b

Print the lines that contain either the expression NW or the expression EA.

    % grep 'NW\|EA' grep-datafile
    northwest   NW  Charles Main      300000.00
    eastern     EA  TB Savage         440500.45

Note: egrep works with | (no backslash).

Example: egrep with +

Print all lines containing one or more 3's.

    % egrep '3+' grep-datafile
    northwest   NW  Charles Main      300000.00
    western     WE  Sharon Gray        53000.89
    southwest   SW  Lewis Dalsass     290000.73

Note: grep works with \+.

Example: egrep with RE: ?

Print all lines containing a 2, followed by zero or one period, followed by a digit.

    % egrep '2\.?[0-9]' grep-datafile
    southwest   SW  Lewis Dalsass     290000.73

Note: grep works with \?.

Example: egrep with ( )

Print all lines containing one or more consecutive occurrences of the pattern no.

    % egrep '(no)+' grep-datafile
    northwest   NW  Charles Main      300000.00
    northeast   NE  AM Main Jr.        57800.10
    north       NO  Ann Stephens      455000.50

Note: grep works with \( \) \+.

Example: egrep with (a|b)

Print all lines containing the uppercase letter S, followed by either h or u.

    % egrep 'S(h|u)' grep-datafile
    western     WE  Sharon Gray        53000.89
    southern    SO  Suan Chin          54500.10

Note: grep works with \( \) \|.

Example: fgrep

Find all lines in the file containing the literal string [A-Z]****[0-9]..$5.00. All characters are treated as themselves; there are no special characters.

    % fgrep '[A-Z]****[0-9]..$5.00' grep-datafile
    Extra [A-Z]****[0-9]..$5.00

Example: grep with ^

Print all lines beginning with the letter n.

    % grep '^n' grep-datafile
    northwest   NW  Charles Main      300000.00
    northeast   NE  AM Main Jr.        57800.10
    north       NO  Ann Stephens      455000.50

Example: grep with $

Print all lines ending with a period followed by exactly two zeros.

    % grep '\.00$' grep-datafile
    northwest   NW  Charles Main      300000.00
    southeast   SE  Patricia Hemenway 400000.00
    Extra [A-Z]****[0-9]..$5.00

Example: grep with \char

Print all lines containing the number 5, followed by a literal period and any single character.

    % grep '5\..' grep-datafile
    Extra [A-Z]****[0-9]..$5.00

Example: grep with [ ]

Print all lines beginning with either a w or an e.

    % grep '^[we]' grep-datafile
    western     WE  Sharon Gray        53000.89
    eastern     EA  TB Savage         440500.45

Example: grep with [^]

Print all lines ending with a period followed by exactly two characters that are not zero.

    % grep '\.[^0][^0]$' grep-datafile
    western     WE  Sharon Gray        53000.89
    southwest   SW  Lewis Dalsass     290000.73
    eastern     EA  TB Savage         440500.45

Example: grep with x\{m\}

Print all lines where there are at least six consecutive digits followed by a period.

    % grep '[0-9]\{6\}\.' grep-datafile
    northwest   NW  Charles Main      300000.00
    southwest   SW  Lewis Dalsass     290000.73
    southeast   SE  Patricia Hemenway 400000.00
    eastern     EA  TB Savage         440500.45
    north       NO  Ann Stephens      455000.50
    central     CT  KRush             575500.70

Example: grep with \<

Print all lines containing a word starting with north.

    % grep '\<north' grep-datafile
    northwest   NW  Charles Main      300000.00
    northeast   NE  AM Main Jr.        57800.10
    north       NO  Ann Stephens      455000.50

Example: egrep with linux.words (/usr/share/dict)
Print all words that begin and end with x (-i ignores case).

    % egrep -i '^x.*x$' linux.words
    Xerox
    xerox
    xix
    xx
    xxx
    xylanthrax

Example: egrep with linux.words (/usr/share/dict)
Count all words that have sex as a substring (wc reports lines, words, and characters).

    % egrep '.*sex.*' linux.words | wc
    325 325 3564

Some of the 325 words: Essexville, misexample, misexecute, misexecution, misexpectation, misexpend, misexpenditure, misexplain, misexplained, misexplanation

Example: egrep with linux.words (/usr/share/dict)
List words that have at least four b's in them.

    % egrep '.*b.*b.*b.*b.*' linux.words

Some of the 25 words: beerbibber, bibble-babble, blood-bedabbled, bubble-bow, bubblebow, bubbybush, bumblebomb, double-bubble, flibbertigibbet, flibbertigibbets, flibbertigibbety, gibble-gabble, gibblegabble, gibble-gabbler, gibblegabbler, hubble-bubble
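The three dictionary searches can be reproduced without /usr/share/dict. This sketch assumes GNU grep and substitutes a tiny made-up word list for linux.words, so its counts are not the 325 and 25 reported above. It also shows that the leading and trailing .* in '.*sex.*' are redundant, since grep selects a line when the pattern matches anywhere in it, and that grep -c gives the line count that wc printed in its first column.

```shell
#!/bin/sh
# Tiny made-up word list standing in for /usr/share/dict/linux.words.
words='Xerox
xerox
xylophone
x
misexample
essex
beerbibber
bubble
hubble-bubble'

# -i with ^ and $: words that begin and end with x or X (a lone "x" needs two x's, so it fails).
xx=$(printf '%s\n' "$words" | grep -iE '^x.*x$')

# The surrounding .* add nothing: both commands count the same lines.
with_dots=$(printf '%s\n' "$words" | grep -cE '.*sex.*')
plain=$(printf '%s\n' "$words" | grep -c 'sex')

# At least four b's anywhere in the word ("bubble" has only three).
bbbb=$(printf '%s\n' "$words" | grep 'b.*b.*b.*b')

printf '%s\n' "$xx" "$with_dots" "$plain" "$bbbb"
```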

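Finally, a sketch that rebuilds grep-datafile from the listing shown earlier and exercises the commonly used options (-c, -n, -v, -l, -i) plus -q. It assumes GNU grep and mktemp; the second input file, other-file, is hypothetical and exists only so that -l has a file to leave out. The last two counts should agree with the slide outputs for [0-9]\{6\}\. and \.[^0][^0]$.

```shell
#!/bin/sh
# Rebuild grep-datafile (as listed on the slides) in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > grep-datafile <<'EOF'
northwest   NW  Charles Main      300000.00
western     WE  Sharon Gray        53000.89
southwest   SW  Lewis Dalsass     290000.73
southern    SO  Suan Chin          54500.10
southeast   SE  Patricia Hemenway 400000.00
eastern     EA  TB Savage         440500.45
northeast   NE  AM Main Jr.        57800.10
north       NO  Ann Stephens      455000.50
central     CT  KRush             575500.70
Extra [A-Z]****[0-9]..$5.00
EOF
printf 'nothing to see here\n' > other-file   # hypothetical second file for -l

count=$(grep -c '^n' grep-datafile)              # -c: number of matching lines
numbered=$(grep -n '\<north\>' grep-datafile)    # -n: "line-number:line"
nondot00=$(grep -vc '\.00$' grep-datafile)       # -v with -c: lines NOT ending in .00
files=$(grep -l 'NW' grep-datafile other-file)   # -l: names of files with a match
mains=$(grep -ic 'main' grep-datafile)           # -i: "main" also matches "Main"
if grep -q 'Savage' grep-datafile; then found=yes; else found=no; fi  # -q: exit status only

# Counts that should agree with the outputs shown on the slides.
six=$(grep -c '[0-9]\{6\}\.' grep-datafile)
twonz=$(grep -c '\.[^0][^0]$' grep-datafile)

cd / && rm -rf "$dir"
printf '%s\n' "$count" "$numbered" "$nondot00" "$files" "$mains" "$found" "$six" "$twonz"
```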
