Softpanorama
May the source be with you, but remember the KISS principle ;-)
This chapter has been a long one primarily because 1) regular expressions are powerful, 2) regular expressions are heavily integrated into the rest of the language, and 3) people get confused about regular expressions and seldom use them correctly. If you learn their secrets, they make your life a lot easier, especially if you deal with large amounts of data.
| Meta Character | Description |
|---|---|
| ^ | This meta-character - the caret - will match the beginning of a string or if the /m option is used, matches the beginning of a line. It is one of two pattern anchors - the other anchor is the $. |
| . | This meta-character will match any character except for the newline unless the /s option is specified. If the /s option is specified, then the newline will also be matched. |
| $ | This meta-character will match the end of a string (or just before a newline that ends the string) or, if the /m option is used, the end of a line. It is one of two pattern anchors - the other anchor is the ^. |
| \| | This meta-character - called alternation - lets you specify alternative patterns, any one of which can cause the match to succeed. For instance, m/a\|b/ means that the $_ variable must contain the "a" or "b" character for the match to succeed. |
| * | This meta-character indicates that the "thing" immediately to the left should be matched 0 or more times in order to be evaluated as true. |
| + | This meta-character indicates that the "thing" immediately to the left should be matched 1 or more times in order to be evaluated as true. |
| ? | This meta-character indicates that the "thing" immediately to the left should be matched 0 or 1 times in order to be evaluated as true. When it follows the *, +, ?, or {n,m} quantifiers, it means that the quantifier should be non-greedy and match the smallest possible string. |
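
The anchors, alternation, and quantifiers from the table above can be tried out with a short script. This is only a minimal sketch: the sample string and the variable names are made up for illustration and do not come from the chapter.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $text = "cat <b>one</b> and <b>two</b>";

# ^ anchors the pattern to the beginning of the string.
my $starts_with_cat = ($text =~ m/^cat/) ? 1 : 0;      # 1

# $ anchors the pattern to the end of the string.
my $ends_with_tag = ($text =~ m/<\/b>$/) ? 1 : 0;      # 1

# | (alternation) succeeds if either alternative matches.
my $has_cat_or_dog = ($text =~ m/cat|dog/) ? 1 : 0;    # 1

# ? makes the preceding "thing" optional (0 or 1 times).
# grep in scalar context returns the number of matching items.
my $spellings = grep { m/^colou?r$/ } qw(color colour colouur);   # 2

# Greedy .* takes as much as it can: up to the LAST </b>.
my ($greedy) = $text =~ m{<b>(.*)</b>};

# Non-greedy .*? takes as little as it can: up to the FIRST </b>.
my ($lazy) = $text =~ m{<b>(.*?)</b>};

print "greedy: $greedy\n";   # greedy: one</b> and <b>two
print "lazy:   $lazy\n";     # lazy:   one
```

The greedy versus non-greedy difference is the one that most often surprises beginners, so it is worth running both patterns against your own data before relying on either.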
| Meta Brackets | Description |
|---|---|
| () | The parentheses let you affect the order of pattern evaluation and act as a form of pattern memory. See the section "Pattern Memory" later in this chapter for more information. |
| (?...) | If a question mark immediately follows the left parentheses, it indicates that an extended mode component is being specified. |
| {n,m} | The curly braces let you specify how many times the "thing" immediately to the left should be matched. {n} means that it should be matched exactly n times. {n,} means it must be matched at least n times. {n,m} means that it must be matched at least n times and not more than m times. |
| [] | The square brackets let you create a character class. For instance, m/[abc]/ will evaluate to true if any of "a", "b", or "c" is contained in $_. When every alternative is a single character, the square brackets are a more readable alternative to the alternation meta-character. |
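
The brackets in the table above are easiest to understand in combination. The sketch below uses hypothetical sample values (a date string and a file name) that are not from the chapter. It shows capturing parentheses, counted repetition, a character class, and one extended-mode component, the non-capturing group (?:...).

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $date = "2009-09-05";

# () remember what they matched; {n} repeats exactly n times;
# [0-9] is a character class matching any one digit.
my ($year, $month, $day) =
    $date =~ m/^([0-9]{4})-([0-9]{2})-([0-9]{2})$/;

# {n,m} repeats at least n and at most m times.
my @short_words = grep { m/^[a-z]{2,4}$/ } qw(a ab abcd abcde);   # ab, abcd

# (?:...) groups alternatives without storing them in pattern memory.
my $kind = ("photo.jpeg" =~ m/\.(?:jpe?g|png)$/) ? "image" : "other";

print "$year / $month / $day\n";   # 2009 / 09 / 05
print "@short_words\n";            # ab abcd
print "$kind\n";                   # image
```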
| Meta Sequences | Description |
|---|---|
| \ | This meta-character "escapes" the following character. This means that any special meaning normally attached to that character is ignored. For instance, if you need to include a dollar sign in a pattern, you must use \$ to avoid Perl's variable interpolation. Use \\ to specify the backslash character in your pattern. |
| \0nnn | Any Octal byte. |
| \A | This meta-sequence represents the beginning of the string. Its meaning is not affected by the /m option. |
| \b | This meta-sequence represents the backspace character inside a character class; otherwise, it represents a word boundary. A word boundary is the spot between a word (\w) character and a non-word (\W) character. For this purpose, Perl treats the imaginary characters off the ends of the string as if they matched \W. |
| \B | Match a non-word boundary. |
| \cn | Any control character. |
| \d | Match a single digit character. |
| \D | Match a single non-digit character. |
| \e | Escape. |
| \E | Terminate the \L or \U sequence. |
| \f | Form Feed. |
| \G | Match only where the previous m//g left off. |
| \l | Change the next character to lowercase. |
| \L | Change the following characters to lowercase until a \E sequence is encountered. |
| \n | Newline. |
| \Q | Quote Regular Expression meta-characters literally until the \E sequence is encountered. |
| \r | Carriage Return. |
| \s | Match a single whitespace character. |
| \S | Match a single non-whitespace character. |
| \t | Tab. |
| \u | Change the next character to uppercase. |
| \U | Change the following characters to uppercase until a \E sequence is encountered. |
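| \v | Vertical Tab. In Perl 5.10 and later, \v in a pattern matches any vertical whitespace character, not only the vertical tab. |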
| \w | Match a single word character. Word characters are the alphanumeric and underscore characters. |
| \W | Match a single non-word character. |
| \xnn | Any Hexadecimal byte. |
| \Z | This meta-sequence represents the end of the string, or the position just before a newline that ends the string. Its meaning is not affected by the /m option. |
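
A few of the meta-sequences above, shown in action. This is an illustrative sketch only: the sample strings and variable names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical sample input; \$ keeps Perl from interpolating a variable.
my $line = "Order 42 costs \$3.50 (approx.)";

# \d matches a single digit; \d+ matches a run of digits.
my ($number) = $line =~ m/(\d+)/;                        # 42

# \b matches at a word boundary, so "cat" inside "concatenate" is skipped.
my @hits = grep { m/\bcat\b/ } ("cat", "concatenate", "the cat sat");

# \Q...\E quotes meta-characters, so "(" , "." and ")" are taken literally.
my $literal = "(approx.)";
my $found = ($line =~ m/\Q$literal\E/) ? 1 : 0;          # 1

# \s matches a single whitespace character (space, tab, newline, ...).
my @fields = split /\s+/, "a  b\tc";                     # a, b, c

# \U...\E uppercases a run of characters; \u uppercases one character.
(my $shout = "hello world") =~ s/(\w+)/\U$1\E/;          # first word only
(my $title = "hello world") =~ s/(\w+)/\u$1/g;           # every word

print "$number\n";            # 42
print scalar(@hits), "\n";    # 2
print "$shout\n";             # HELLO world
print "$title\n";             # Hello World
```

Note that \U, \L, \u, \l, \Q and \E are processed when the string or pattern is interpolated, which is why they also work in the replacement part of s///.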
Regular expressions form almost a 'language within a language' in Perl. As you can see above, they can be fairly involved, and (let's face it) if you are not familiar with them now, you are not going to learn them without practice. Therefore, after the summary of pattern options below, we suggest a path for learning regular expressions.
| Option | Description |
|---|---|
| g | This option finds all occurrences of the pattern in the string. You can iterate over the matches using a loop statement or put the results into an array. |
| i | This option ignores the case of characters in the string. |
| m | This option treats the string as multiple lines: ^ and $ match at the beginning and end of every line within the string, not only at the beginning and end of the string itself. Use it when you know that the string contains embedded newline characters. |
| o | This option compiles the pattern only once. You can achieve some small performance gains with this option. It should be used with variable interpolation only when the value of the variable will not change during the lifetime of the program. |
| s | This option treats the string as a single line: the . meta-character will also match the newline character. |
| x | This option lets you use extended regular expressions. Basically, this means that Perl will ignore whitespace that's not escaped with a backslash or within a character class. I highly recommend this option so you can use spaces to make your regular expressions more readable. See the section "Example: Extension Syntax" later in this chapter for more information. |
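
The options in the table can be compared side by side on one multi-line string. The sketch below is illustrative only, and the sample text and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical three-line sample text.
my $text = "Perl is fun.\nperl is terse.\nPython is fine too.";

# /g in list context returns every match; /i ignores case.
my @perls = $text =~ m/perl/gi;                       # Perl, perl

# /g in scalar context finds one match per call, so it can drive a loop.
my $count = 0;
$count++ while $text =~ m/\bis\b/g;                   # 3

# /m lets ^ match at the beginning of every line.
my @first_words = $text =~ m/^(\w+)/mg;               # Perl, perl, Python

# Without /s the . does not match a newline; with /s it does.
my $dot_plain  = ($text =~ m/fun\..perl/)  ? 1 : 0;   # 0
my $dot_single = ($text =~ m/fun\..perl/s) ? 1 : 0;   # 1

# /x ignores unescaped whitespace and allows comments inside the pattern.
my ($hours, $minutes) = "12:34" =~ m/
    ^ (\d{2})     # hours
    :             # separator
    (\d{2}) $     # minutes
/x;

print "@perls\n";             # Perl perl
print "$count\n";             # 3
print "@first_words\n";       # Perl perl Python
print "$hours $minutes\n";    # 12 34
```

Options can be combined freely, as /gi and /mg show above.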
Learn well the principles in this chapter. In order to use regular expressions effectively, you first need to learn to construct simple expressions, or you will be spinning your wheels quite a lot. Then try to modify more complex expressions from books and scripts on the Net, and only then begin to construct complex regular expressions yourself. You really need to work from simple to complicated: let the complexity of your regular expressions grow naturally as your knowledge grows.
Be sure to check out the documentation in perlre. This chapter is pretty superficial and contains only the material that will get you started and give you some skill in regular expression manipulation. There is much more material there than we can cover in this introduction.
Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). The site uses AdSense, so you need to be aware of Google's privacy policy. Copyright of original materials belongs to the respective owners. Quotes are made for educational purposes only, in compliance with the fair use doctrine.
Last modified: September 05, 2009