Oracle 10g 政则表达式介绍

ruir 2008-05-02 08:44:18

*****************************************
lynx286原创, 首发www.mydwbi.com
http://www.mydwbi.com/posts/list/47.page
欢迎转贴, 但转贴请不要删除此段版权说明.
*****************************************

Purpose
Data validation, identification of duplicate word occurrences, detection of extraneous white spaces, or parsing of strings are just some of the many uses of regular expressions.
You can apply them in order to validate the formats of phone numbers, zip codes, email addresses, Social Security numbers, IP addresses, filenames and pathnames, and so on.
Furthermore, you can locate patterns such as HTML tags, numbers, dates, or anything that fits any pattern within any textual data and replace them with other patterns.
Using Regular Expressions With Oracle 10g
To harness the power of regular expressions, you can exploit the newly introduced Oracle SQL REGEXP_LIKE operator and the REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_REPLACE functions. You will see how this new functionality supplements the existing LIKE operator and the INSTR, SUBSTR, and REPLACE functions.
In fact, they are similar to the existing operator and functions but now offer powerful pattern-matching capabilities. The searched data can be simple strings or large volumes of text stored in the database character columns. Regular expressions let you search, replace, and validate data in ways you have never thought of before, with a high degree of flexibility.
Syntax
Official document:
http://www.itk.ilstu.edu/docs/oracle/server.101/b10759/ap_posix001.htm#sthref7430

操作符
'^' 匹配输入字符串的开始位置，在方括号表达式中使用，此时它表示不接受该字符集合。'$' 匹配输入字符串的结尾位置。如果设置了 RegExp 对象的 Multiline 属性，则 $ 也匹配 'n' 或 'r'。'.' 匹配除换行符 n之外的任何单字符。'?' 匹配前面的子表达式零次或一次。'*' 匹配前面的子表达式零次或多次。'+' 匹配前面的子表达式一次或多次。'( )' 标记一个子表达式的开始和结束位置。'[]' 标记一个中括号表达式。'{m,n}' 一个精确地出现次数范围，m=<出现次数<=n，'{m}'表示出现m次，'{m,}'表示至少出现m次。'|' 指明两项之间的一个选择。例子'^([a-z]+|[0-9]+)$'表示所有小写字母或数字组合成的字符串。
Syntax
字符簇：
[[:alpha:]] 任何字母。[[:digit:]] 任何数字。[[:alnum:]] 任何字母和数字。[[:space:]] 空格。[[:upper:]] 任何大写字母。[[:lower:]] 任何小写字母。[[:punct:]] 任何标点符号。[[:xdigit:]] 任何16进制的数字，相当于[0-9a-fA-F]。

各种操作符的运算优先级:
转义符
圆括号和方括号: (), (?:), (?=), []
限定符: *, +, ?, {n}, {n,}, {n,m}
位置和顺序: ^, $, any metacharacter
“或”操作: |
Examples
create table test(mc varchar2(60));
insert into test values('112233445566778899');insert into test values('22113344 5566778899');insert into test values('33112244 5566778899');insert into test values('44112233 5566 778899');insert into test values('5511 2233 4466778899');insert into test values('661122334455778899');insert into test values('771122334455668899');insert into test values('881122334455667799');insert into test values('991122334455667788');insert into test values('aabbccddee');insert into test values('bbaaaccddee');insert into test values('ccabbddee');insert into test values('ddaabbccee');insert into test values('eeaabbccdd');insert into test values('ab123');insert into test values('123xy');insert into test values('007ab');insert into test values('abcxy');insert into test values('The final test is is is how to find duplicate words.');
REGEXP_LIKE
SQL> select * from test where regexp_like(mc,'^a{1,3}');
MC
------------------------------------------------------------
aabbccddee
ab123
Abcxy

查询a开头,并且a出现1次到三次的记录.

REGEXP_LIKE
SQL> select * from test where regexp_like(mc,'a{1,3}');

MC
------------------------------------------------------------
aabbccddee
bbaaaccddee
ccabbddee
ddaabbccee
eeaabbccdd
ab123
007ab
abcxy
The final test is is is how to find duplicate words.
REGEXP_LIKE
SQL> select * from test where regexp_like(mc,'^a.*e$');

MC
------------------------------------------------------------
aabbccddee

查询以a开头,中间有零个或多个除换行符之外的任何单字符,以e结尾的记录.

REGEXP_LIKE
SQL> select * from test where regexp_like(mc,'^[[:lower:]]');

MC
------------------------------------------------------------
aabbccddee
bbaaaccddee
ccabbddee
ddaabbccee
eeaabbccdd
ab123
Abcxy

查询以小写字母开头的记录.
REGEXP_LIKE
SQL> select * from test where regexp_like(mc,'^[[:lower:]]|[[:digit:]]');

MC
------------------------------------------------------------
112233445566778899
…..
ccabbddee
ddaabbccee
eeaabbccdd
ab123
123xy
007ab
abcxy
REGEXP_LIKE
SQL> Select mc FROM test Where REGEXP_LIKE(mc,'[^[:digit:]]');

MC
------------------------------------------------------------
22113344 5566778899
aabbccddee
bbaaaccddee
ccabbddee
ddaabbccee
eeaabbccdd
ab123
123xy
007ab
abcxy
The final test is is is how to find duplicate words.
查询包含非数字的记录
REGEXP_LIKE
SQL> Select mc FROM test Where REGEXP_LIKE(mc,'^[^[:digit:]]');

MC
------------------------------------------------------------
aabbccddee
bbaaaccddee
ccabbddee
ddaabbccee
eeaabbccdd
ab123
abcxy
The final test is is is how to find duplicate words.
REGEXP_INSTR
SQL> Select REGEXP_INSTR('aa453rer455daf554443','[[:digit:]]$') result from dual;

RESULT
----------
20

查以数字结尾字符串编号.
REGEXP_INSTR
SQL> Select REGEXP_INSTR('aa453rer455daf554443','[[:digit:]]+$') result from dual;

RESULT
----------
15

查以多个数字结尾的字符串编号.
REGEXP_INSTR
SQL> SELECT REGEXP_INSTR('Joe Smith, 10045 Berry Lane, San Joseph, CA 91234','[[:digit:]]{5}$') AS rx_instr FROM dual;

RX_INSTR
----------
45

SQL> SELECT REGEXP_INSTR('Joe Smith, 10045 Berry Lane, San Joseph, CA 91234','[[:digit:]]{5}') AS rx_instr FROM dual;

RX_INSTR
----------
12
REGEXP_SUBSTR
SQL> SELECT REGEXP_SUBSTR('11aa22BB33cc44','[a-z]+') result FROM dual;
RESULT
------
aa
SQL> SELECT REGEXP_SUBSTR('11aa22BB33cc44','[a-z]+',3,2) result FROM dual;

RESULT
------
cc
REGEXP_SUBSTR
SQL> SELECT REGEXP_SUBSTR('NE-19225 (01-Oct-2002)','[[:digit:]]{2}-[[:alpha:]]{3}-[[:digit:]]{4}') AS rx_instr FROM dual;

RX_INSTR
-----------
01-Oct-2002

SQL> SELECT REGEXP_SUBSTR('DX-sd4533 (20060203)','[[:digit:]]{8}') AS rx_instr FROM dual;

RX_INSTR
--------
20060203
REGEXP_SUBSTR
SQL> SELECT REGEXP_SUBSTR('The final test is is the implementation','([[:alnum:]]+)([[:space:]]+)\1') AS substr FROM dual;

SUBSTR
------
is is

SQL> SELECT REGEXP_REPLACE('The final test is is the implementation','([[:alnum:]]+)([[:space:]]+)\1','\1') AS substr FROM dual;

SUBSTR
------------------------------------
The final test is the implementation
REGEXP_REPLACE
SQL> SELECT REPLACE('Joe Smith',' ', ' ') AS replace FROM dual;

REPLACE
----------
Joe Smith

SQL> SELECT REGEXP_REPLACE('Joe Smith', '( ){2,}', ' ') AS RX_REPLACE FROM dual;

RX_REPLACE
----------
Joe Smith
REGEXP_REPLACE
SQL> SELECT REGEXP_REPLACE ('<td>aaa</td>', '<[^>]+>') result from dual;

RESULT
------
aaa
SQL> SELECT REGEXP_REPLACE(REGEXP_REPLACE ('NE-19225 (01-Oct-2002)', ' .*'),'.*-') result from dual;

RESULT
------
19225
REGEXP_REPLACE
SQL> select REGEXP_REPLACE('AndyChen','([[:lower:]])([[:upper:]])','\1 \2') result from dual;

RESULT
---------
Andy Chen
Using with DDL
CREATE TABLE t1 (
c1 VARCHAR2(20), CHECK
(REGEXP_LIKE(c1, '^[[:alpha:]]+$')));
INSERT INTO t1 VALUES ('newuser');
-> 1 row created.
INSERT INTO t1 VALUES ('newuser1');
-> ORA-02290: check constraint violated
INSERT INTO t1 VALUES ('new-user');
-> ORA-02290: check constraint violated

Using with DDL
CREATE INDEX t1_ind ON t1 (REGEXP_SUBSTR(c1, 'a'));

SELECT c1
FROM t1
WHERE REGEXP_SUBSTR(c1, 'a') = 'a';

...全文