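Ruby Regular Expressions
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.
A regular expression literal is a pattern between slashes, or between arbitrary delimiters following %r, as follows:
Syntax: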
    /pattern/
    /pattern/im      # options can be specified
    %r!/usr/local!   # general delimited regular expression
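Example:
    #!/usr/bin/ruby

    line1 = "Cats are smarter than dogs"
    line2 = "Dogs also like meat"

    # =~ returns the index of the match, or nil when nothing matches.
    # \A anchors the pattern to the start of the string.
    if line1 =~ /\ACats(.*)/
      puts "Line1 starts with Cats"
    end

    if line2 =~ /\ACats(.*)/
      puts "Line2 starts with Cats"
    end
This will produce the following result: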
Line1 starts with Cats
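The example only tests whether a match exists. As a minimal sketch of also reading the text captured by the parentheses, the helper below uses String#match, which returns a MatchData object or nil. The name rest_after_cats is hypothetical and only serves this illustration.

```ruby
# Hypothetical helper: returns the text captured after a leading "Cats",
# or nil when the line does not start with "Cats".
def rest_after_cats(line)
  match = line.match(/\ACats(.*)/)  # MatchData on success, nil otherwise
  match && match[1]                 # group 1 is the text matched by (.*)
end

p rest_after_cats("Cats are smarter than dogs")  # " are smarter than dogs"
p rest_after_cats("Dogs also like meat")         # nil

# =~ returns the index where the match begins, or nil.
p("Cats are smarter than dogs" =~ /Cats/)        # 0
p("Dogs also like meat" =~ /Cats/)               # nil
```

Because both =~ and String#match return nil on failure, either can be used directly as the condition of an if statement.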
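Regular-expression modifiers:
Regular expression literals may include an optional modifier to control various aspects of matching. The modifier is specified after the second slash character, as shown previously, and may be one of these characters:
Modifier | Description |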
---|---|
i | Ignore case when matching text. |
o | Perform #{} interpolations only once, the first time the regexp literal is evaluated. |
x | Ignores whitespace and allows comments in regular expressions. |
m | Multiline mode: lets . match newline characters as well, so a match can span several lines. |
u,e,s,n | Interpret the regexp as UTF-8, EUC-JP, Windows-31J (Shift_JIS), or ASCII-8BIT, respectively. If none of these modifiers is specified, the regular expression is assumed to use the source encoding. |
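The effect of the i, m, and x modifiers can be sketched as follows. The helper name parse_date and the sample strings are assumptions made for this illustration, not part of the original text.

```ruby
# i: ignore case
p("RUBY" =~ /ruby/i)   # 0

# m: let . match newline characters too
p("a\nb" =~ /a.b/)     # nil, because . stops at the newline
p("a\nb" =~ /a.b/m)    # 0

# x: whitespace and comments inside the pattern are ignored,
# which allows a long pattern to be documented in place.
# (parse_date is a hypothetical helper for this sketch.)
def parse_date(text)
  date_re = /
    (\d{4}) -   # year
    (\d{2}) -   # month
    (\d{2})     # day
  /x
  m = date_re.match(text)
  m && m.captures
end

p parse_date("released 2024-01-15")  # ["2024", "01", "15"]
p parse_date("no date here")         # nil
```

Modifiers can be combined, as in /pattern/im.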
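Like string literals delimited with %Q, Ruby allows you to begin a regular expression with %r followed by a delimiter of your choice. This is very useful when the pattern you are describing contains forward slash characters that you don't want to escape: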
    # The following matches a single slash character, no escape required
    %r|/|

    # Flag characters are allowed with this syntax, too
    %r[</(.*)>]i
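To make the difference concrete, here is a small sketch that writes the same path pattern in both literal forms. The method names slashed_pattern, percent_pattern, and local_dir are hypothetical and chosen only for this example.

```ruby
# The same pattern in both literal forms; only the escaping differs.
def slashed_pattern
  /\/usr\/local\/(\w+)/
end

def percent_pattern
  %r{/usr/local/(\w+)}
end

# Hypothetical helper: returns the first directory name under /usr/local,
# or nil when the path does not contain /usr/local/.
def local_dir(path, pattern = percent_pattern)
  m = pattern.match(path)
  m && m[1]
end

p local_dir("/usr/local/bin/ruby", slashed_pattern)  # "bin"
p local_dir("/usr/local/bin/ruby", percent_pattern)  # "bin"
p local_dir("/opt/bin")                              # nil

# Flags follow the closing delimiter, just as with slashes.
p %r[</(.*)>]i.match("</BODY>")[1]                   # "BODY"
```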
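Regular-expression patterns:
Except for the metacharacters (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a metacharacter by preceding it with a backslash.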
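A short sketch of escaping: a backslash makes a metacharacter match itself, and Regexp.escape does the same for a whole string. The helper name literal_pattern is hypothetical.

```ruby
# A backslash makes a metacharacter match itself.
p("1+1=2" =~ /1\+1/)  # 0: \+ matches a literal plus sign
p("1+1=2" =~ /1+1/)   # nil: 1+ means one or more "1", followed by another "1"

# Regexp.escape adds the backslashes for you.
p Regexp.escape("a.b*c")  # "a\\.b\\*c"

# Hypothetical helper: builds a pattern that matches text literally.
def literal_pattern(text)
  Regexp.new(Regexp.escape(text))
end

p literal_pattern("a.b").match?("a.b")  # true
p literal_pattern("a.b").match?("axb")  # false, the dot is no longer a wildcard
```

Escaping with Regexp.escape is a sensible default when part of a pattern comes from user input.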
The following table lists the regular expression syntax that is available in Ruby.
Pattern | Description |
---|---|
^ | Matches beginning of line. |
$ | Matches end of line. |
. | Matches any single character except newline. Using m option allows it to match newline as well. |
[...] | Matches any single character in brackets. |
[^...] | Matches any single character not in brackets. |
re* | Matches 0 or more occurrences of preceding expression. |
re+ | Matches 1 or more occurrences of preceding expression. |
re? | Matches 0 or 1 occurrence of preceding expression. |
re{n} | Matches exactly n occurrences of preceding expression. |
re{n,} | Matches n or more occurrences of preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of preceding expression. |
a|b | Matches either a or b. |
(re) | Groups regular expressions and remembers matched text. |
(?imx) | Temporarily toggles on i, m, or x options within a regular expression. If in parentheses, only that area is affected. |
(?-imx) | Temporarily toggles off i, m, or x options within a regular expression. If in parentheses, only that area is affected. |
(?:re) | Groups regular expressions without remembering matched text. |
(?imx:re) | Temporarily toggles on i, m, or x options within parentheses. |
(?-imx:re) | Temporarily toggles off i, m, or x options within parentheses. |
(?#...) | Comment. |
(?=re) | Positive lookahead: matches a position that is followed by re, without consuming any characters. |
(?!re) | Negative lookahead: matches a position that is not followed by re, without consuming any characters. |
(?>re) | Atomic group: matches re as an independent pattern, without backtracking into it. |
\w | Matches word characters. |
\W | Matches nonword characters. |
\s | Matches whitespace. Equivalent to [ \t\r\n\f\v]. |
\S | Matches nonwhitespace. |
\d | Matches digits. Equivalent to [0-9]. |
\D | Matches nondigits. |
\A | Matches beginning of string. |