Difference between revisions of "Regular expressions"
From TED Notepad
Line 1: | Line 1: | ||
<noinclude>{{manversion|6.0.0.17}}__NOTOC__</noinclude> | <noinclude>{{manversion|6.0.0.17}}__NOTOC__</noinclude> | ||
− | ===== | + | =====Basics and escape sequences===== |
− | + | Following constructs match specific characters at positions at which they are encountered. | |
− | |||
− | |||
− | |||
− | |||
− | |||
:; {{string|.}} | :; {{string|.}} | ||
Line 36: | Line 31: | ||
:; {{string|\Q}} ''...'' {{string|\E}} | :; {{string|\Q}} ''...'' {{string|\E}} | ||
:: Quoted string. Anything between {{string|\Q}} and {{string|\E}} is treated as plain-text string and is matched exactly as it appears in the pattern. | :: Quoted string. Anything between {{string|\Q}} and {{string|\E}} is treated as plain-text string and is matched exactly as it appears in the pattern. | ||
+ | |||
+ | =====Zero-length assertions===== | ||
+ | |||
+ | Following zero-length pattern conditions do not match any specific characters, they only assert that a specific condition is met at the position at which they are encountered. | ||
+ | |||
+ | :; {{string|^}} | ||
+ | :: Matches only at line beginnings. | ||
+ | :; {{string|$}} | ||
+ | :: Matches only at line ends. | ||
+ | |||
+ | :; {{string|\p}} | ||
+ | :: Matches only at paragraph beginnings. | ||
+ | :; {{string|\P}} | ||
+ | :: Matches only at paragraph ends. | ||
+ | |||
+ | :; {{string|\A}} | ||
+ | :: Matches only at document beginning. | ||
+ | :; {{string|\Z}} | ||
+ | :: Matches only at document end. | ||
+ | |||
+ | :; {{string|\b}} | ||
+ | :: Matches only at word boundary, i.e. one of the characters around the current matching position must be a {{defined|word character}} and the other may not. | ||
+ | :; {{string|\B}} | ||
+ | :: Matches only inside a word, i.e. both characters around the current matching position must be {{defined|word characters}}. | ||
+ | :; {{string|\y}} | ||
+ | :: Matches only at word beginning, i.e. the second of the characters around the current matching position must be a {{defined|word character}} and the first one may not. | ||
+ | :; {{string|\Y}} | ||
+ | :: Matches only at word end, i.e. the first of the characters around the current matching position must be a {{defined|word character}} and the second one may not. | ||
+ | |||
+ | :; {{string|\G}} | ||
+ | :: Matches only at the original starting position. Guarantees that only the position within the document, where the search started, is matched at by this construct. Starting position is usually the one with the caret before the search, also indicated by the {{feature|Status Bar}}. | ||
=====Capture groups===== | =====Capture groups===== | ||
Line 66: | Line 92: | ||
/* \D any non-digit character */ | /* \D any non-digit character */ | ||
/* */ | /* */ | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
/* */ | /* */ | ||
/* \K removes the match left of the current position from \0 */ | /* \K removes the match left of the current position from \0 */ | ||
Line 165: | Line 179: | ||
--> | --> | ||
− | ===== | + | =====More escape sequences===== |
Since many characters have special meanings in regular expressions, escapes are provided to allow using these characters in searches. | Since many characters have special meanings in regular expressions, escapes are provided to allow using these characters in searches. |
Revision as of 13:15, 1 August 2011
This section is up to date for TED Notepad version 6.3.1.0.
Basics and escape sequences
The following constructs match specific characters at the positions at which they are encountered.
- .
- Matches any single character (except for newline).
- \n
- Matches one newline sequence (CR, NL or CR/NL).
- \t
- Matches a horizontal tab character (TAB).
- \f
- Matches a form feed character (FF).
- \a
- Matches a bell character (BEL).
- \v
- Matches a vertical tab (VT).
- \e
- Matches an escape character (ESC).
- \0
- Matches a null character (NUL).
- \xhh
- Matches a character in hex notation.
- \uhhhh
- Matches a character in Unicode notation (Unicode version only).
- \cA
- Matches a character in control notation.
- \Q ... \E
- Quoted string. Anything between \Q and \E is treated as plain-text string and is matched exactly as it appears in the pattern.
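The constructs above can be tried out in a rough sketch using Python's re module. This is only a stand-in, not TED Notepad's own engine, and the two may differ in details. In particular, Python has no \Q ... \E, so re.escape() is assumed here as the nearest equivalent for quoting a plain-text string. The sample text is made up for illustration.

```python
import re

# Hypothetical sample text: 'a', TAB, 'b.c', space, 'A'.
text = "a\tb.c A"

# '.' matches any single character, here the TAB between 'a' and 'b'.
dot_matches = re.findall(r"a.b", text)

# '.' does not match a newline.
dot_skips_newline = re.search(r"a.b", "a\nb") is None

# \t matches a horizontal tab; \x41 is 'A' in hex notation.
tab_found = re.search(r"\t", text) is not None
hex_matches = re.findall(r"\x41", text)

# Stand-in for \Q...\E: re.escape() makes every character literal,
# so the '.' in "b.c" no longer means "any character".
quoted = re.escape("b.c")
literal_matches = re.findall(quoted, text)
not_literal = re.findall(quoted, "bxc")
```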
Zero-length assertions
The following zero-length pattern conditions do not match any specific characters; they only assert that a specific condition is met at the position at which they are encountered.
- ^
- Matches only at line beginnings.
- $
- Matches only at line ends.
- \p
- Matches only at paragraph beginnings.
- \P
- Matches only at paragraph ends.
- \A
- Matches only at document beginning.
- \Z
- Matches only at document end.
- \b
- Matches only at word boundary, i.e. one of the characters around the current matching position must be a word character and the other may not.
- \B
- Matches only inside a word, i.e. both characters around the current matching position must be word characters.
- \y
- Matches only at word beginning, i.e. the second of the characters around the current matching position must be a word character and the first one may not.
- \Y
- Matches only at word end, i.e. the first of the characters around the current matching position must be a word character and the second one may not.
- \G
- Matches only at the original starting position, i.e. the position within the document where the search started. No other position is matched by this construct. The starting position is usually the caret position before the search, which is also indicated by the Status Bar.
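The following sketch shows where such assertions hold, again using Python's re module as a stand-in rather than TED Notepad's engine. Several spellings are assumptions. Python needs re.MULTILINE for line-based ^ and $. It has no \y, \Y or \G, so these are emulated with look-arounds and with a start offset. Python's own \B is looser than the definition above, so "inside a word" is also written with look-arounds. The \p and \P paragraph assertions have no direct counterpart and are left out.

```python
import re

# Hypothetical two-line sample; the newline sits at index 7.
text = "one two\nthree"

def positions(pattern, s, flags=0):
    """Return the offsets at which a zero-length pattern matches."""
    return [m.start() for m in re.finditer(pattern, s, flags)]

# ^ and $: line beginnings and line ends.
line_starts = positions(r"^", text, re.MULTILINE)
line_ends = positions(r"$", text, re.MULTILINE)

# \A and \Z: document beginning and document end.
first_word = re.findall(r"\A[a-z]+", text)
last_word = re.findall(r"[a-z]+\Z", text)

# Emulated \y (word beginning) and \Y (word end).
word_starts = positions(r"\b(?=\w)", text)
word_ends = positions(r"\b(?<=\w)", text)

# Emulated \B as defined above: word characters on both sides.
inside = positions(r"(?<=\w)(?=\w)", "ab c")

# Emulated \G: match() with an offset only succeeds exactly at that offset.
caret = 4  # assumed caret position, just before "two"
at_caret = re.compile(r"two").match(text, caret) is not None
not_at_start = re.compile(r"two").match(text, 0) is None
```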
Capture groups
- (
- Begins a new capture group. Capture groups are useful for back-references in both search and replace patterns.
- )
- Ends current capture group. Note: Capture groups can be nested.
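A minimal sketch of nested capture groups and a back-reference inside a search pattern, written with Python's re module as a stand-in (TED Notepad's engine may number or report groups differently). The sample strings are hypothetical.

```python
import re

# Group 1 wraps the whole "number-number" pair; groups 2 and 3 are nested in it.
m = re.search(r"(([0-9]+)-([0-9]+))", "released 2011-08")
groups = m.groups()

# Back-reference in the search pattern: \1 must repeat what group 1 captured,
# which finds the accidentally doubled word.
doubled = re.search(r"([a-z]+) \1", "it is is fine").group(0)
```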
Alternations
- |
- Divides pattern alternations. As long as any of the alternations matches, the entire pattern matches.
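As a small illustration of alternation, sketched with Python's re module (assumed here to behave like TED Notepad for this simple case), the pattern matches wherever any one of its alternatives matches:

```python
import re

# Three alternatives; any one of them is enough for a match.
pattern = re.compile(r"cat|dog|bird")

found = pattern.findall("a dog chased a cat")
no_match = pattern.search("a fish") is None
```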
Character classes
- [
- Opens character class definition. See character class definition below.
More escape sequences
Since many characters have special meanings in regular expressions, escapes are provided to allow using these characters in searches.
- \\
- Matches character \. Note: Unescaped single \ has a special meaning.
- \^
- Matches character ^. Note: Unescaped single ^ has a special meaning.
- \$
- Matches character $. Note: Unescaped single $ has a special meaning.
- \.
- Matches character .. Note: Unescaped single . has a special meaning.
- \|
- Matches character |. Note: Unescaped single | has a special meaning.
- \(
- Matches character (. Note: Unescaped single ( has a special meaning.
- \)
- Matches character ). Note: Unescaped single ) has a special meaning.
- \[
- Matches character [. Note: Unescaped single [ has a special meaning.
- \]
- Matches character ]. Note: Unescaped single ] has a special meaning.
- \*
- Matches character *. Note: Unescaped single * has a special meaning.
- \+
- Matches character +. Note: Unescaped single + has a special meaning.
- \?
- Matches character ?. Note: Unescaped single ? has a special meaning.
- \{
- Matches character {. Note: Unescaped single { has a special meaning.
- \}
- Matches character }. Note: Unescaped single } has a special meaning.
- \<
- Matches character <. Note: Unescaped single < has a special meaning.
- \>
- Matches character >. Note: Unescaped single > has a special meaning.
- \:
- Matches character :. Note: Unescaped single : has a special meaning.
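The effect of escaping can be seen in a short sketch with Python's re module, used as a stand-in. Python does not treat every character in the list above as special (for example < > and :), so only escapes common to both are shown. The sample text is hypothetical.

```python
import re

text = "price: $5.00 (approx.)"

# Escaped, '$' and '.' match the literal characters.
literal = re.findall(r"\$5\.00", text)

# Unescaped, '.' matches any character, so "5x00" is found as well.
loose = re.findall(r"5.00", "5x00 5.00")

# Escaped parentheses match literally instead of opening a capture group.
parens = re.findall(r"\(approx\.\)", text)
```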
Replace patterns
Any of these constructs may appear anywhere in the replace pattern, as long as regular expressions are turned on.
- \\
- Inserts a backslash.
- \n
- Inserts a newline sequence (CR, NL or CR/NL; depends on current document options).
- \t
- Inserts a horizontal tab character (TAB).
- \f
- Inserts a form feed character (FF).
- \a
- Inserts a bell character (BEL).
- \v
- Inserts a vertical tab (VT).
- \e
- Inserts an escape character (ESC).
- \0
- Inserts a null character (NUL).
- \xhh
- Inserts a character in hex notation.
- \uhhhh
- Inserts a character in Unicode notation (Unicode version only).
- \cA
- Inserts a character in control notation.
- \Q ... \E
- Quoted string. Anything between \Q and \E is treated as plain-text string and is inserted exactly as it appears in the pattern.
- \&
- Back-reference to the entire match.
- \1, \2, ..., \9
- Back-reference to a specific captured group.
- \+
- Back-reference to the last successfully captured group. Consider a pattern with several alternations, each with a group inside it. Only one of the alternations will match, so only one of those groups will be valid upon replacing. This back-reference refers to the correct group, based on which of the alternations matched.
- Note: This can also be achieved by using branch restart groups.
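The back-references above can be approximated in Python's re module, with some spelling differences that are assumptions of this sketch. Python writes the entire match as \g<0> rather than \&, and has no \+ in replacement strings. The last captured group is emulated with a replacement function and the match object's lastindex attribute. The helper name last_group is hypothetical.

```python
import re

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"([a-z]+)=([0-9]+)", r"\2=\1", "width=80")

# Python's spelling of a back-reference to the entire match.
bracketed = re.sub(r"[0-9]+", r"[\g<0>]", "a1 b22")

def last_group(m):
    """Emulate \\+: insert whichever group actually took part in the match."""
    return "<" + m.group(m.lastindex) + ">"

# Each alternation has its own group; only one of them is valid per match.
tagged = re.sub(r"x([0-9]+)|y([a-z]+)", last_group, "x12 yab")
```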