Regular expression character class

Use a character class to represent a set of characters.

[Character set]

For example, if you want to express the letters "a", "b", or "c", write [abc].

a
b->[abc]
c
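
As a minimal sketch of the [abc] class above, the following filters a list of sample words (the words are made up for illustration) down to those containing "a", "b", or "c".

```perl
use strict;
use warnings;

# Hypothetical sample words; only some contain "a", "b", or "c".
my @words = ('dog', 'cat', 'sky', 'bee');

# [abc] matches a single character that is "a", "b", or "c".
my @matched = grep { /[abc]/ } @words;

print "$_\n" for @matched;
```

Only "cat" and "bee" are printed, because "dog" and "sky" contain none of the three letters.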

You can use the hyphen "-" symbol to specify a range of letters or numbers.

a
b
c->[a-e]
d
e

0
1
2->[0-4]
3
4

To match either a letter or a digit, write [a-zA-Z0-9].

Letters and digits->[a-zA-Z0-9]
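
The ranges above can be tried out with a short script. This is only a sketch; the helper name is_alnum is hypothetical and not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string consists only of ASCII
# letters and digits, i.e. every character is in [a-zA-Z0-9].
sub is_alnum {
    my ($str) = @_;
    return $str =~ /^[a-zA-Z0-9]+$/ ? 1 : 0;
}

# [a-e] is shorthand for [abcde]; [0-4] is shorthand for [01234].
my $in_a_to_e = ('d' =~ /^[a-e]$/) ? 1 : 0;   # "d" is inside a-e
my $in_0_to_4 = ('7' =~ /^[0-4]$/) ? 1 : 0;   # "7" is outside 0-4

print "d in [a-e]: $in_a_to_e\n";
print "7 in [0-4]: $in_0_to_4\n";
print "Perl5 is alphanumeric: ", is_alnum('Perl5'), "\n";
print "'Perl 5' is alphanumeric: ", is_alnum('Perl 5'), "\n";
```

The last call returns 0 because the space is not covered by any of the three ranges.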

Negation of character class

You can also use a character class to match any character except the ones listed. Put "^" at the beginning of the class to negate it. Note that "^" at the beginning of a character class [] means "anything other than these characters", not "the beginning of the string".

For example, to represent a character other than "a", "b", and "c":

Characters other than "a", "b" and "c"->[^abc]
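
The two meanings of "^" can be compared side by side. The sample strings below are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $str = 'abcxbca';

# Inside the brackets, a leading "^" negates the class:
# capture the first character that is not "a", "b", or "c".
my $first_other = '';
if ($str =~ /([^abc])/) {
    $first_other = $1;
}

# Outside the brackets, "^" anchors the match at the beginning of the string.
my $starts_with_abc = ($str =~ /^[abc]/) ? 1 : 0;

# A string made only of "a", "b", "c" has nothing for [^abc] to match.
my $has_other = ('abcabc' =~ /[^abc]/) ? 1 : 0;

print "first other character: $first_other\n";
print "starts with a, b, or c: $starts_with_abc\n";
print "'abcabc' has another character: $has_other\n";
```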

Predefined character classes in Perl

Perl provides predefined character classes. "\d" and "\D", for example, are predefined character classes.

The following list of predefined character classes is quoted from the "Character classes" section of the regular expression reference.

\d       A digit
\D       A non-digit
\w       A word character
\W       A non-word character
\s       A whitespace character
\S       A non-whitespace character
\h       A horizontal whitespace character
\H       A non-horizontal-whitespace character
\N       A non-newline (when not followed by '{NAME}'; experimental;
         not valid inside a character class; equivalent to [^\n];
         like '.' without the /s modifier)
\v       A vertical whitespace character
\V       A non-vertical-whitespace character
\R       A generic newline (?>\x0D\x0A|\v)
\C       Match one byte (with Unicode, '.' matches a character);
         deprecated in Perl 5.20 and removed in Perl 5.24
\pP      Match the Unicode property named P (single-letter name)
\p{...}  Match a Unicode property whose name is longer than one character
\PP      Match non-P
\P{...}  Match a character lacking the Unicode property whose name is
         longer than one character
\X       Match a Unicode extended grapheme cluster
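
A few of these classes can be seen working together in the following sketch. The input text is a made-up example, and \h and \R assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical input: a tab-separated line ending in CRLF, then a line ending in LF.
my $text = "id:\t42\r\nname: perl\n";

# \R treats "\r\n" as one generic newline, so no stray "\r" is left behind.
my @lines = split /\R/, $text;

# \h+ splits on horizontal whitespace (here a tab).
my @fields = split /\h+/, $lines[0];

# \D matches every non-digit; removing them leaves only the digits.
(my $digits = $text) =~ s/\D//g;

print scalar(@lines), " lines\n";
print "fields: $fields[0] / $fields[1]\n";
print "digits: $digits\n";
```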

Predefined character classes can also be used inside a bracketed character class.

[a-c\d\s]
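
To see which characters this combined class accepts, the sketch below tests a handful of single characters (chosen here only for illustration) one at a time.

```perl
use strict;
use warnings;

# Hypothetical test characters: letters, a digit, whitespace, and a symbol.
my @chars = ('a', 'c', 'd', '7', ' ', "\t", '-');

# [a-c\d\s] accepts "a" to "c", any digit, and any whitespace character.
my @in_class = map { /^[a-c\d\s]$/ ? 1 : 0 } @chars;

for my $i (0 .. $#chars) {
    printf "%-4s %s\n", "'$chars[$i]'", $in_class[$i] ? 'match' : 'no match';
}
```

Only "d" and "-" fail to match, since neither is in a-c, a digit, or whitespace.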

Note that all of these are affected by Unicode. Perl supports Unicode, and good practice when working with Japanese text is to treat every string in your program as a decoded string.

In this case, "\d" matches not only the half-width digits 0-9 but also the full-width digits 0-9. "\s" matches not only the half-width space character but also the full-width space character.

To limit matching to the ASCII range, see the sections below.
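
The following sketch illustrates this behavior. The full-width characters are written as \x{...} escapes so that the script does not depend on the encoding of the source file.

```perl
use strict;
use warnings;

my $half_width_digits = '123';
my $full_width_digits = "\x{FF11}\x{FF12}\x{FF13}";  # full-width 1, 2, 3
my $full_width_space  = "\x{3000}";                  # full-width (ideographic) space

# On decoded strings, \d and \s follow Unicode rules.
my $half_ok  = ($half_width_digits =~ /^\d+$/) ? 1 : 0;
my $full_ok  = ($full_width_digits =~ /^\d+$/) ? 1 : 0;
my $space_ok = ($full_width_space  =~ /^\s$/)  ? 1 : 0;

print "half-width digits match \\d: $half_ok\n";
print "full-width digits match \\d: $full_ok\n";
print "full-width space matches \\s: $space_ok\n";
```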

Limit character classes to ASCII range only

Perl 5.14 introduced the "a" modifier, which limits predefined character classes such as \d, \s, and \w to the ASCII range.

# Match only numbers in the ASCII range.
$str =~ /\d+/a;
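
As a rough comparison, the sketch below runs the same pattern with and without the "a" modifier against full-width digits (written as escapes; the variable names are illustrative).

```perl
use 5.014;   # the "a" modifier requires Perl 5.14 or later
use strict;
use warnings;

my $full_width_digits = "\x{FF11}\x{FF12}\x{FF13}";  # full-width 1, 2, 3

# Without /a, \d follows Unicode rules and accepts full-width digits.
my $unicode_match = ($full_width_digits =~ /^\d+$/) ? 1 : 0;

# With /a, \d is restricted to the ASCII digits 0-9.
my $ascii_match = ($full_width_digits =~ /^\d+$/a) ? 1 : 0;

# ASCII digits still match under /a.
my $plain_match = ('123' =~ /^\d+$/a) ? 1 : 0;

print "without /a: $unicode_match\n";
print "with /a: $ascii_match\n";
print "ASCII digits with /a: $plain_match\n";
```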

Character classes that match only the ASCII range

Perl also provides character classes that match only the ASCII range. Using them is another way to restrict matching to ASCII.

For example, a character class that matches letters and digits in the ASCII range can be written as follows.

# Character class that matches letters and digits in the ASCII range
\p{PosixAlnum}

The following table of character classes is quoted from the "Character classes" section of the regular expression reference.

Note the character classes in the column labeled ASCII-range.

            ASCII-           Full-
   POSIX    range            range         backslash
 [[:...:]]  \p{...}          \p{...}       sequence   Description
 ------------------------------------------------------------------------
 alnum      PosixAlnum       XPosixAlnum              Alpha plus Digit
 alpha      PosixAlpha       XPosixAlpha              Alphabetic characters
 ascii      ASCII                                     Any ASCII character
 blank      PosixBlank       XPosixBlank   \h         Horizontal whitespace;
                                                        full-range also written
                                                        as \p{HorizSpace}
                                                        (GNU extension)
 cntrl      PosixCntrl       XPosixCntrl              Control characters
 digit      PosixDigit       XPosixDigit   \d         Decimal digits
 graph      PosixGraph       XPosixGraph              Alnum plus Punct
 lower      PosixLower       XPosixLower              Lowercase characters
 print      PosixPrint       XPosixPrint              Graph plus Space, but
                                                        not any Cntrl
 punct      PosixPunct       XPosixPunct              Punctuation and symbols
                                                        in the ASCII range; just
                                                        punct outside it
 space      PosixSpace       XPosixSpace              [\s\cK]
            PerlSpace        XPerlSpace    \s         Perl's whitespace definition
 upper      PosixUpper       XPosixUpper              Uppercase characters
 word       PosixWord        XPosixWord    \w         Alnum + Unicode marks +
                                                        connectors, like '_'
                                                        (Perl extension)
 xdigit     ASCII_Hex_Digit  XPosixXDigit             Hexadecimal digit;
                                                        in the ASCII range,
                                                        [0-9A-Fa-f]
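
The difference between the ASCII-range and full-range columns can be sketched as follows, using the alnum row as an example. The sample strings are assumptions; the second one ends with a full-width digit written as an escape.

```perl
use strict;
use warnings;

my $ascii_only = 'Perl5';
my $mixed      = "Perl\x{FF15}";   # "Perl" followed by a full-width 5

# \p{PosixAlnum} accepts only ASCII letters and digits.
my $posix_ascii = ($ascii_only =~ /^\p{PosixAlnum}+$/) ? 1 : 0;
my $posix_mixed = ($mixed      =~ /^\p{PosixAlnum}+$/) ? 1 : 0;

# \p{XPosixAlnum} is the full-range version and also accepts the full-width digit.
my $xposix_mixed = ($mixed =~ /^\p{XPosixAlnum}+$/) ? 1 : 0;

print "PosixAlnum, ASCII string: $posix_ascii\n";
print "PosixAlnum, mixed string: $posix_mixed\n";
print "XPosixAlnum, mixed string: $xposix_mixed\n";
```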

Character class example

Here are some examples of character classes. They include a regular expression for a user ID and one for a web service password, both of which can be used for website membership registration.

use strict;
use warnings;
use utf8;

# Use character class
my $str = 'Hello';
if ($str =~ /[abcde]/) {
  print "Match\n";
}

# Match with web service user ID (a-zA-Z0-9_)
my $user_id = 'kimoto_yuki089';
my $user_id_invalid = 'abc & cde';
if ($user_id =~ /^\p{PosixWord}+$/) {print "user_is valid\n";
}
unless ($user_id_invalid =~ /^\p{PosixWord}+$/) {
   print "user_id_invalid is invalid\n";
}

# Match a web service password(a non-blank visible character in the ASCII range)
my $password = 'Ufy | & 123_';
my $passowrd_invalid1 = 'abc def';
my $passowrd_invalid2 = 'abc a';
if ($password =~ /^\p{PosixGraph}+$/) {
   print "password is valid\n";
}
unless ($passowrd_invalid1 =~ /^\p{PosixGraph}+$/) {
   print "passowrd_invalid1 is inalid\n";
}
unless ($passowrd_invalid2 =~ /^\p{PosixGraph}+$/) {
   print "passowrd_invalid2 is inalid\n";
}

Related Information