Regular expression character class
Use a character class to represent a set of characters.
[Character set]
For example, if you want to express the letters "a", "b", or "c", write [abc].
a, b, c -> [abc]
You can use the hyphen "-" symbol to specify a range of letters or numbers.
a, b, c, d, e -> [a-e]
0, 1, 2, 3, 4 -> [0-4]
To match any single letter of the alphabet or digit, write [a-zA-Z0-9].
Letters and digits -> [a-zA-Z0-9]
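As a minimal sketch of the classes above, the following script checks a few sample strings. The helper name matches_class and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $str contains at least one character
# from the character class given in $class (for example '[abc]'), else 0.
sub matches_class {
    my ($str, $class) = @_;
    return $str =~ /$class/ ? 1 : 0;
}

print matches_class('cat', '[abc]'), "\n";        # 1: "c" and "a" are in the set
print matches_class('xyz', '[abc]'), "\n";        # 0: no character is in the set
print matches_class('d',   '[a-e]'), "\n";        # 1: "d" is inside the range a-e
print matches_class('7',   '[0-4]'), "\n";        # 0: "7" is outside the range 0-4
print matches_class('-',   '[a-zA-Z0-9]'), "\n";  # 0: "-" is neither a letter nor a digit
```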
Negation of character class
You can also use a character class to match any character that is not in a given set. Put "^" at the beginning of the character class to mean "anything other than these characters". Note that when "^" appears at the beginning of the character class [], it means negation, not "the beginning of the string".
For example, to match a character other than "a", "b", and "c":
Characters other than "a", "b", and "c" -> [^abc]
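The two meanings of "^" can be sketched as follows. The helper name has_no_abc and the sample strings are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $str is non-empty and contains
# none of the characters "a", "b", or "c", else 0.
sub has_no_abc {
    my ($str) = @_;
    return $str =~ /^[^abc]+$/ ? 1 : 0;
}

print has_no_abc('xyz'), "\n";   # 1
print has_no_abc('xbz'), "\n";   # 0: contains "b"

# The first "^" anchors the match at the start of the string;
# the "^" inside the brackets negates the class.
my $first_char_ok = ('apple' =~ /^[^abc]/) ? 1 : 0;
print "$first_char_ok\n";        # 0: the first character is "a"
```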
Perl-defined character classes
Perl provides predefined character classes, written as backslash sequences. "\d", which matches a digit, is one of them.
The following list of predefined character classes is quoted from the "Character class" section of the regular expression reference.
\d       Digit
\D       Non-digit
\w       Word character
\W       Non-word character
\s       Whitespace character
\S       Non-whitespace character
\h       Horizontal whitespace
\H       Non-horizontal whitespace
\N       Non-newline (when not followed by '{NAME}'; experimental; not valid inside a character class; equivalent to [^\n]; like '.' without /s)
\v       Vertical whitespace
\V       Non-vertical whitespace
\R       Generic newline: (?>\v|\x0D\x0A)
\C       Matches one byte (with Unicode, '.' matches a character)
\pP      Matches the (Unicode) property named P
\p{...}  Matches a Unicode property with a name longer than one character
\PP      Matches non-P
\P{...}  Matches anything that lacks a Unicode property with a name longer than one character
\X       Matches a Unicode extended grapheme cluster
Predefined character classes can also be used inside a bracketed character class.
[a-c\d\s]
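A short sketch of the combined class in use. The helper name count_class_chars and the sample string are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the characters in $str that match [a-c\d\s],
# that is, "a" to "c", any digit, or any whitespace character.
sub count_class_chars {
    my ($str) = @_;
    my $count = () = $str =~ /[a-c\d\s]/g;   # list-assignment idiom for counting matches
    return $count;
}

print count_class_chars('a1 z'), "\n";   # 3: "a", "1" and " " match, "z" does not
```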
Note that all of these are affected by Unicode. Perl supports Unicode, and good practice when working with Japanese text is to handle every string inside your program as a decoded string (a string of characters rather than bytes).
In that case, "\d" matches not only the half-width digits 0-9 but also the full-width digits ０-９. Likewise, "\s" matches not only half-width space characters but also the full-width space.
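The effect can be checked with a small script. This is only a sketch: the variable names are made up, and the full-width characters are written as \x{...} escapes so that the script does not depend on the encoding of the source file.

```perl
use strict;
use warnings;

my $fullwidth_digits = "\x{FF11}\x{FF12}\x{FF13}";   # full-width "123"
my $fullwidth_space  = "\x{3000}";                   # full-width (ideographic) space

# \d follows Unicode rules on a character string, [0-9] lists ASCII digits only.
my $d_matches     = ($fullwidth_digits =~ /^\d+$/)    ? 1 : 0;
my $range_matches = ($fullwidth_digits =~ /^[0-9]+$/) ? 1 : 0;
my $s_matches     = ($fullwidth_space  =~ /^\s$/)     ? 1 : 0;

print "$d_matches $range_matches $s_matches\n";   # 1 0 1
```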
If you want to limit it to the ASCII range, read the instructions below.
Limit character classes to ASCII range only
Perl 5.14 introduced the "a" option (the /a modifier), which limits character classes such as \d, \s, and \w to the ASCII range only.
# Match only digits in the ASCII range.
$str =~ /\d+/a;
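A minimal sketch comparing the same pattern with and without the "a" option. The helper name is_ascii_digits and the test strings are assumptions for illustration.

```perl
use 5.014;   # the "a" option requires Perl 5.14 or later
use strict;
use warnings;

# Hypothetical helper: returns 1 if $str is made up of the ASCII digits 0-9, else 0.
sub is_ascii_digits {
    my ($str) = @_;
    return $str =~ /^\d+$/a ? 1 : 0;
}

my $fullwidth = "\x{FF11}\x{FF12}\x{FF13}";   # full-width "123"

# The same pattern without the "a" option, for comparison.
my $without_a = ($fullwidth =~ /^\d+$/) ? 1 : 0;

print is_ascii_digits('123'), "\n";        # 1
print is_ascii_digits($fullwidth), "\n";   # 0: rejected with the "a" option
print "$without_a\n";                      # 1: accepted without it
```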
Character classes that represent only the ASCII range
Perl also has character classes that cover only the ASCII range. Using them is another way to restrict matching to ASCII.
For example, a character class that matches only letters and digits in the ASCII range can be written as follows.
# Character class that matches letters and digits in the ASCII range
\p{PosixAlnum}
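The following sketch contrasts \p{PosixAlnum} with its full-range counterpart \p{XPosixAlnum}. The helper name is_ascii_alnum and the sample strings are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $str is made up of ASCII letters and digits,
# using the same ^...$ anchoring as the other examples in this article.
sub is_ascii_alnum {
    my ($str) = @_;
    return $str =~ /^\p{PosixAlnum}+$/ ? 1 : 0;
}

my $mixed = "abc\x{FF11}";   # "abc" followed by a full-width "1"

# The full-range class also accepts letters and digits outside ASCII.
my $full_range = ($mixed =~ /^\p{XPosixAlnum}+$/) ? 1 : 0;

print is_ascii_alnum('abc123'), "\n";   # 1
print is_ascii_alnum($mixed), "\n";     # 0: the full-width digit is outside ASCII
print "$full_range\n";                  # 1
```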
The following table is also quoted from the "Character class" section of the regular expression reference.
Note the character classes in the column labeled "ASCII-range".
POSIX      ASCII-range      Full-range    Backslash
[[:...:]]  \p{...}          \p{...}       sequence   Description
---------------------------------------------------------------------------
alnum      PosixAlnum       XPosixAlnum              Alpha and Digit
alpha      PosixAlpha       XPosixAlpha              Alphabetic characters
ascii      ASCII                                     Any ASCII character
blank      PosixBlank       XPosixBlank   \h         Horizontal whitespace; the full-range class can also be written \p{HorizSpace} (GNU extension)
cntrl      PosixCntrl       XPosixCntrl              Control characters
digit      PosixDigit       XPosixDigit   \d         Digits
graph      PosixGraph       XPosixGraph              Alnum and Punct
lower      PosixLower       XPosixLower              Lowercase characters
print      PosixPrint       XPosixPrint              Graph and Space, but not Cntrl
punct      PosixPunct       XPosixPunct              Punctuation and symbols in the ASCII range; only punctuation outside it
space      PosixSpace       XPosixSpace              [\s\cK]
           PerlSpace        XPerlSpace    \s         Perl's definition of whitespace
upper      PosixUpper       XPosixUpper              Uppercase characters
word       PosixWord        XPosixWord    \w         Alnum + Unicode marks + connector characters like '_' (Perl extension)
xdigit     ASCII_Hex_Digit  XPosixDigit              Hexadecimal digit; [0-9A-Fa-f] in the ASCII range
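The columns of the table can be compared on a single character. This is a minimal sketch, assuming a full-width "A" (U+FF21) as the test character; the variable names are illustrative.

```perl
use 5.014;   # the "a" option requires Perl 5.14 or later
use strict;
use warnings;

my $fullwidth_a = "\x{FF21}";   # full-width "A", a letter outside the ASCII range

# POSIX bracket form: follows Unicode rules here unless the "a" option is given.
my $bracket_unicode = ($fullwidth_a =~ /^[[:alpha:]]$/)  ? 1 : 0;
my $bracket_ascii   = ($fullwidth_a =~ /^[[:alpha:]]$/a) ? 1 : 0;

# \p{...} forms: the ASCII-range and full-range columns of the table.
my $posix_prop  = ($fullwidth_a =~ /^\p{PosixAlpha}$/)  ? 1 : 0;
my $xposix_prop = ($fullwidth_a =~ /^\p{XPosixAlpha}$/) ? 1 : 0;

print "$bracket_unicode $bracket_ascii $posix_prop $xposix_prop\n";   # 1 0 0 1
```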
Character class example
Here are some examples of character classes. They include a regular expression for a user ID and one for a web service password, both of which can also be used for website membership registration.
use strict; use warnings; use utf8; # Use character class my $str = 'Hello'; if ($str =~ /[abcde]/) { print "Match\n"; } # Match with web service user ID (a-zA-Z0-9_) my $user_id = 'kimoto_yuki089'; my $user_id_invalid = 'abc & cde'; if ($user_id =~ /^\p{PosixWord}+$/) {print "user_is valid\n"; } unless ($user_id_invalid =~ /^\p{PosixWord}+$/) { print "user_id_invalid is invalid\n"; } # Match a web service password(a non-blank visible character in the ASCII range) my $password = 'Ufy | & 123_'; my $passowrd_invalid1 = 'abc def'; my $passowrd_invalid2 = 'abc a'; if ($password =~ /^\p{PosixGraph}+$/) { print "password is valid\n"; } unless ($passowrd_invalid1 =~ /^\p{PosixGraph}+$/) { print "passowrd_invalid1 is inalid\n"; } unless ($passowrd_invalid2 =~ /^\p{PosixGraph}+$/) { print "passowrd_invalid2 is inalid\n"; }