OverflowError: regular expression code size limit exceeded

lioujian47 2009-11-12 06:13:17

python 2.5.2

for i in list(set(re.findall(name_string, y))):
    ...

name_string has this structure:
'''
a|b|c......
'''
Of course, this name_string is fairly large.

Then I got this error:
OverflowError: regular expression code size limit exceeded

(<type 'exceptions.OverflowError'>, OverflowError('regular expression code size limit exceeded',), <traceback object at 0x033D2B20>)

I haven't found a solution. Could someone more experienced point me in the right direction?


6 replies


notax 2009-11-13


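[Quote=Quoting reply #4 by lioujian47:]
The main thing I want to avoid is this case:
张杰文说什么。

name_string = '张杰文|张杰'
[/Quote]

That can still be avoided: join the tail of each batch to the head of the next one and process the data in batches.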

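from __future__ import generators

# Five consecutive chunks of one long 'a|b|c...' string.
# A name may be cut in two at a chunk boundary (e.g. 'e' + 'f').
s1='ab|cd|e'
s2='f|hk|jk'
s3='|lm|op|'
s4='|qr|'
s5='st|'

def process_your_data(s):
    # Split on '|' and drop the empty pieces.
    print([ w for w in s.split('|') if w ])

def read_data():
    head, tail, t = None, None, None
    first_time = True

    for s in (s1, s2, s3, s4, s5):
        # tail is the possibly incomplete last name of this chunk
        head, tail = s.rsplit('|', 1)

        if first_time :
            yield head
            first_time = False
        else:
            # Prepend the tail carried over from the previous chunk.
            if head and t:
                yield '%s%s'%(t, head)
            elif head:
                yield '%s'%(head)
        t = tail
        head = ''

    # Emit a leftover tail if the last chunk does not end with '|'.
    if t:
        yield t

for s in read_data():
    process_your_data(s)

"""
>python split_test.py
['ab', 'cd']
['ef', 'hk']
['jk', 'lm', 'op']
['qr']
['st']
"""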

notax 2009-11-13


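If the data can be split into many lines, use readline plus a generator and process it piece by piece.

If not, you could try:

import sys

f = open('text.txt','r')
line = ''
while True:
    # Read the file in 1024-character chunks.
    part = f.read(1024)
    if not part:
        break

    # Note: this removes the '|' separators inside the chunk, and a name
    # that straddles two chunks is still cut in two.
    line = ''.join(part.split('|'))
    #sys.stdout.write(line)

    #processing_your_data(line)
f.close()

See whether that works.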
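The chunked-read idea can be combined with the carry-over trick from the earlier reply. The following is only a sketch, assuming Python 3 and a '|'-separated stream; `iter_names`, `chunk_size`, and the `io.StringIO` sample are illustrative names, not anything from the original post.

```python
import io

def iter_names(stream, chunk_size=1024, sep="|"):
    """Yield separator-delimited names from a stream read in fixed-size chunks."""
    carry = ""  # incomplete name left over from the previous chunk
    while True:
        part = stream.read(chunk_size)
        if not part:
            break
        pieces = (carry + part).split(sep)
        carry = pieces.pop()  # the last piece may continue in the next chunk
        for piece in pieces:
            if piece:  # skip empty pieces produced by '||' or a leading '|'
                yield piece
    if carry:
        yield carry  # whatever remains after the final chunk

# Hypothetical sample data standing in for the real file.
sample = io.StringIO("ab|cd|ef|hk|jk|lm|op|qr|st")
names = list(iter_names(sample, chunk_size=4))
```

With a real file, `open('text.txt')` could be passed in place of the `StringIO` object; names are then yielded whole even when a chunk boundary falls in the middle of one.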

Also, if you are on Python 3, see
http://neopythonic.blogspot.com/2008/10/sorting-million-32-bit-integers-in-2mb.html

cppfaq 2009-11-13


lioujian47 2009-11-13


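The main thing I want to avoid is this case:
张杰文说什么。

name_string = '张杰文|张杰'
Here the match is 张杰文 rather than 张杰, which is the result I want.

After splitting the file I ran into a new problem, the same one as above: one file contains 张杰文 and the other contains 张杰.
That produces two matches, while for this sentence only 张杰文 should be matched. I wonder whether this now calls for a text classification algorithm...
Algorithms like that are too complicated and too much work.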
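One way to get 张杰文 instead of 张杰 without any regular expression is to keep all names in a single set and always try the longest candidate first at each position. This is a minimal sketch under the assumption of Python 3 (or unicode strings in Python 2); `longest_matches` is a hypothetical helper, not something from the thread.

```python
def longest_matches(names, text):
    """Scan text left to right, taking the longest known name at each position."""
    name_set = set(n for n in names if n)
    max_len = max(len(n) for n in name_set) if name_set else 0
    found = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in name_set:
                found.append(candidate)
                i += size  # continue after the matched name
                break
        else:
            i += 1  # no name starts at this position
    return found

names = "张杰文|张杰".split("|")
matches = longest_matches(names, "张杰文说什么。")
```

Because every name lives in one set, there is no pattern to compile and no size limit to hit, and the shorter name is only reported where the longer one does not fit. The cost is roughly the text length times the longest name length.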

thy38 2009-11-12


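Python's regular expression engine does have this limit, which is a bit inconvenient. Two suggestions:
1. Split name_string into pieces and process them in batches.
2. If your name_string is guaranteed to have the "a|b|c..." structure, why not use split()?

>>> a
'a|23|fwfew|w|f'
>>> a.split('|')
['a', '23', 'fwfew', 'w', 'f']

Regular expressions are not the best tool in every situation.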
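The two suggestions can be combined: split the string into names, then compile several smaller alternation patterns instead of one huge one. The sketch below assumes Python 3; `build_patterns`, `find_names`, the `batch_size` value, and the sample names are all illustrative.

```python
import re

def build_patterns(names, batch_size=1000):
    """Compile the names into several smaller alternation patterns."""
    # Longest names first, so that within a batch a longer name is tried
    # before any shorter name that is a prefix of it.
    ordered = sorted(set(n for n in names if n), key=len, reverse=True)
    patterns = []
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        # re.escape keeps characters such as '.' or '*' in a name literal.
        patterns.append(re.compile("|".join(re.escape(n) for n in batch)))
    return patterns

def find_names(patterns, text):
    """Return the set of names that any batch pattern finds in text."""
    found = set()
    for pattern in patterns:
        found.update(pattern.findall(text))
    return found

# Hypothetical sample data; a tiny batch_size just to force several batches.
name_string = "alice|bob|carol|dave|erin"
patterns = build_patterns(name_string.split("|"), batch_size=2)
result = find_names(patterns, "bob met erin")
```

Each batch is matched independently, so a short name in one batch can still be reported inside a longer name from another batch; the batching only keeps each compiled pattern small.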

lioujian47 2009-11-12


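name_string is read from a file; the file is about 70-80 KB.
For now I've used a temporary workaround: splitting it into two files.
That works, but does anyone have a better approach?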
