Help: how do I split a string that contains special characters?

jarodzhao 2018-02-23 10:21:13
ss = "'name':'Love up,love cup. 卡通大肚 陶瓷马克杯','id':'8746089' , 'price':'16','brand':'Love up,love cup.' ,'mall':'天猫精选', 'category':'日用百货/厨房用具/水具酒具/陶瓷杯','metric1':'16','dimension10':'tmall.com','dimension9':'faxian','dimension11':'1阶价格','dimension12':'天猫精选','dimension20':'无','dimension32':'先发后审','dimension25':'10162'"
print(ss)
print()

sss = ss.split(',')
for s in sss:
    print(s)

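Splitting on commas worked fine until one record turned up with a comma inside the name value, which breaks the split. The data format itself is fixed. How can I split it reliably?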
11 replies
jarodzhao 2018-02-27
Quoting xpresslink's reply (#10):
This method of mine also has no problem with escaped single quotes

>>> raw_str = "{'name':'Jack N\'Jil l, 杰克吉尔 牙刷兔子图案1支+儿童牙膏香蕉味 50g','id':'8753131' , 'price':'11','brand':'' ,'mall':'Amcal中文官网', 'category':'母婴用品/洗护用品/婴儿护理用品/婴儿口腔护理','metric1':'11','dimension10':'amcal.com.au','dimension9':'faxian','dimension11':'1阶价格','dimension12':'Amcal中文官网','dimension20':'无','dimension32':'无','dimension25':'769'}"
>>> d = eval(repr(raw_str))
>>> d
"{'name':'Jack N'Jil l, 杰克吉尔 牙刷兔子图案1支+儿童牙膏香蕉味 50g','id':'8753131' , 'price':'11','brand':'' ,'mall':'Amcal中文官网', 'category':'母婴用品/洗护用品/婴儿护理用品/婴儿口腔护理','metric1':'11','dimension10':'amcal.com.au','dimension9':'faxian','dimension11':'1阶价格','dimension12':'Amcal中文官网','dimension20':'无','dimension32':'无','dimension25':'769'}"
>>> 
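The problem keeps getting more complicated: not only single quotes, but also commas and slashes... I'm planning to buy a book and study regular expressions properly.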
sanGuo_uu 2018-02-24
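Regex tutorial I recommend: https://www.cnblogs.com/deerchao/archive/2006/08/24/zhengzhe30fengzhongjiaocheng.html Read it through a few times and you'll start to get a feel for it. Your case really has to be analyzed on its own terms.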
import re

ss = "gtmAddToCart({'name':'Jack N\'Jil l, 杰克吉尔 牙刷兔子图案1支+儿童牙膏香蕉味 50g','id':'8753131' , 'price':'11','brand':'' ,'mall':'Amcal中文官网', 'category':'母婴用品/洗护用品/婴儿护理用品/婴儿口腔护理','metric1':'11','dimension10':'amcal.com.au','dimension9':'faxian','dimension11':'1阶价格','dimension12':'Amcal中文官网','dimension20':'无','dimension32':'无','dimension25':'769'})"
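# Capture each 'key':'value' pair. The lookahead requires a comma, a closing
# brace, or the end of the string after the value's closing quote, so a quote
# inside a value does not end the match early and the last pair is not dropped.
sss = re.findall(r"('.*?':'.*?')\s*(?=[,}]|$)", ss, re.S)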

for item in sss:
	print(item)
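If the goal is a dictionary rather than a list of matched text, one possible variation is to capture the key and the value separately, and to accept a value's closing quote only when it is followed by a comma, a closing brace, or the end of the string. This is only a sketch: `parse_pairs` is a made-up helper name, the sample is shortened from the data above, and it assumes no value contains a quote directly followed by a comma or a brace.

```python
import re

# Hypothetical helper (not from the thread): parse 'key':'value' pairs into a dict.
# Assumption: inside a value, a quote is never followed by optional whitespace
# and then a comma, a closing brace, or the end of the string.
PAIR = re.compile(r"'([^']*)'\s*:\s*'(.*?)'\s*(?=,|\}|$)", re.S)

def parse_pairs(text):
    # findall returns (key, value) tuples, which dict() accepts directly.
    return dict(PAIR.findall(text))

# In a normal "..." literal the \' collapses to a plain ', so the name value
# contains an unescaped single quote by the time the regex sees it.
sample = "gtmAddToCart({'name':'Jack N\'Jil l, 杰克吉尔 50g','id':'8753131' , 'price':'11','brand':'' ,'dimension25':'769'})"

result = parse_pairs(sample)
for key, value in result.items():
    print(key, "=>", value)
```

The lookahead leaves the comma unconsumed, so the trailing separator stays out of the captured values.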
jarodzhao 2018-02-24
Quoting xpresslink's reply (#6):
I'm the kind of person who likes to be lazy

Python 3.6.2 (v3.6.2:5fd33b5, Jul  8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ss = "'name':'Love up,love cup. 卡通大肚 陶瓷马克杯','id':'8746089' , 'price':'16','brand':'Love up,love cup.' ,'mall':'天猫精选', 'category':'日用百货/厨房用具/水具酒具/陶瓷杯','metric1':'16','dimension10':'tmall.com','dimension9':'faxian','dimension11':'1阶价格','dimension12':'天猫精选','dimension20':'无','dimension32':'先发后审','dimension25':'10162'"
>>> s_dict = eval('{%s}' % ss)
>>> list(s_dict.items())
[('name', 'Love up,love cup. 卡通大肚 陶瓷马克杯'), ('id', '8746089'), ('price', '16'), ('brand', 'Love up,love cup.'), ('mall', '天猫精选'), ('category', '日用百货/厨房用具/水具酒具/陶瓷杯'), ('metric1', '16'), ('dimension10', 'tmall.com'), ('dimension9', 'faxian'), ('dimension11', '1阶价格'), ('dimension12', '天猫精选'), ('dimension20', '无'), ('dimension32', '先发后审'), ('dimension25', '10162')]
>>> 
Could you take a look at the case in reply #7? Can it be parsed straight into a dictionary? Thanks in advance.
jarodzhao 2018-02-24
Quoting u012536120's reply (#5):
import re

ss = "'name':'Love up,love cup. 卡通大肚 陶瓷马克杯','id':'8746089'  , 'price':'16','brand':'Love up,love cup.' ,'mall':'天猫精选', 'category':'日用百货/厨房用具/水具酒具/陶瓷杯','metric1':'16','dimension10':'tmall.com','dimension9':'faxian','dimension11':'1阶价格','dimension12':'天猫精选','dimension20':'无','dimension32':'先发后审','dimension25':'10162'"
sss = re.findall(r"('.*?':'.*?')",ss,re.S)

for item in sss:
	print(item)
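That solves the case with embedded commas, thanks. But now I've hit another case: the value contains a single quote, and an escape character as well... Could you take another look?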
import re

a = ("gtmAddToCart({'name':'Jack N\'Jil l, 杰克吉尔 牙刷兔子图案1支+儿童牙膏香蕉味 50g','id':'8753131' , 'price':'11','brand':'' ,'mall':'Amcal中文官网', 'category':'母婴用品/洗护用品/婴儿护理用品/婴儿口腔护理','metric1':'11','dimension10':'amcal.com.au','dimension9':'faxian','dimension11':'1阶价格','dimension12':'Amcal中文官网','dimension20':'无','dimension32':'无','dimension25':'769'})")

b = re.findall(r"(\'.*?'\:\'.*?\')", a, re.S)


print(type(b))
Also, I'd like to read up on regular expressions. Are there any books you would recommend?
混沌鳄鱼 2018-02-24
This method of mine also has no problem with escaped single quotes

>>> raw_str = "{'name':'Jack N\'Jil l, 杰克吉尔 牙刷兔子图案1支+儿童牙膏香蕉味 50g','id':'8753131' , 'price':'11','brand':'' ,'mall':'Amcal中文官网', 'category':'母婴用品/洗护用品/婴儿护理用品/婴儿口腔护理','metric1':'11','dimension10':'amcal.com.au','dimension9':'faxian','dimension11':'1阶价格','dimension12':'Amcal中文官网','dimension20':'无','dimension32':'无','dimension25':'769'}"
>>> d = eval(repr(raw_str))
>>> d
"{'name':'Jack N'Jil l, 杰克吉尔 牙刷兔子图案1支+儿童牙膏香蕉味 50g','id':'8753131' , 'price':'11','brand':'' ,'mall':'Amcal中文官网', 'category':'母婴用品/洗护用品/婴儿护理用品/婴儿口腔护理','metric1':'11','dimension10':'amcal.com.au','dimension9':'faxian','dimension11':'1阶价格','dimension12':'Amcal中文官网','dimension20':'无','dimension32':'无','dimension25':'769'}"
>>> 
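One caveat on the session above: `eval(repr(raw_str))` evaluates the string's own repr, so it hands back the same `str` (as the output shows) rather than a dict. If the scraped text really does keep the backslash before the inner quote, the text between the braces is a valid Python dict literal and `ast.literal_eval` may be able to parse it directly. A minimal sketch under that assumption; the raw string stands in for text read from a page, and the sample is shortened:

```python
import ast

# Assumption: the backslash is really present in the scraped text.
# A raw string literal keeps it; a normal "..." literal would drop it.
raw_str = r"{'name':'Jack N\'Jil l, 杰克吉尔 50g','id':'8753131' , 'brand':'' ,'dimension25':'769'}"

# repr() followed by eval() simply round-trips to the same string.
same = eval(repr(raw_str))
print(type(same))

# literal_eval accepts only Python literals, so it cannot run arbitrary code.
d = ast.literal_eval(raw_str)
print(type(d))
print(d["name"])
```

If the backslash is not actually in the data, the inner quote is unescaped and the text is no longer a valid literal, so this approach would raise an error and a regex-based split is needed instead.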
sanGuo_uu 2018-02-23
ss = "'name':'Love up,love cup. 卡通大肚 陶瓷马克杯','id':'8746089' , 'price':'16','brand':'Love up,love cup.' ,'mall':'天猫精选', 'category':'日用百货/厨房用具/水具酒具/陶瓷杯','metric1':'16','dimension10':'tmall.com','dimension9':'faxian','dimension11':'1阶价格','dimension12':'天猫精选','dimension20':'无','dimension32':'先发后审','dimension25':'10162'"

sss = ss.split("','")
for item in sss:
	s=item.replace("'","")
	print(s)
This way is simpler; there are more complicated approaches too.
混沌鳄鱼 2018-02-23
I'm the kind of person who likes to be lazy

Python 3.6.2 (v3.6.2:5fd33b5, Jul  8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ss = "'name':'Love up,love cup. 卡通大肚 陶瓷马克杯','id':'8746089' , 'price':'16','brand':'Love up,love cup.' ,'mall':'天猫精选', 'category':'日用百货/厨房用具/水具酒具/陶瓷杯','metric1':'16','dimension10':'tmall.com','dimension9':'faxian','dimension11':'1阶价格','dimension12':'天猫精选','dimension20':'无','dimension32':'先发后审','dimension25':'10162'"
>>> s_dict = eval('{%s}' % ss)
>>> list(s_dict.items())
[('name', 'Love up,love cup. 卡通大肚 陶瓷马克杯'), ('id', '8746089'), ('price', '16'), ('brand', 'Love up,love cup.'), ('mall', '天猫精选'), ('category', '日用百货/厨房用具/水具酒具/陶瓷杯'), ('metric1', '16'), ('dimension10', 'tmall.com'), ('dimension9', 'faxian'), ('dimension11', '1阶价格'), ('dimension12', '天猫精选'), ('dimension20', '无'), ('dimension32', '先发后审'), ('dimension25', '10162')]
>>> 
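Since `eval` executes whatever expression the string contains, a more cautious variation for scraped data might be `ast.literal_eval`, which accepts only literals. A sketch of the same brace-wrapping trick, with the sample shortened from the data above:

```python
import ast

ss = "'name':'Love up,love cup. 卡通大肚 陶瓷马克杯','id':'8746089' , 'price':'16','brand':'Love up,love cup.' ,'mall':'天猫精选'"

# Wrap the pairs in braces so the text becomes a dict literal, then parse it
# without evaluating arbitrary expressions.
s_dict = ast.literal_eval("{%s}" % ss)

for key, value in s_dict.items():
    print(key, "=>", value)
```

Like the `eval` version, this relies on every value being a well-formed quoted string; an unescaped quote inside a value would raise an error.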
sanGuo_uu 2018-02-23
import re

ss = "'name':'Love up,love cup. 卡通大肚 陶瓷马克杯','id':'8746089'  , 'price':'16','brand':'Love up,love cup.' ,'mall':'天猫精选', 'category':'日用百货/厨房用具/水具酒具/陶瓷杯','metric1':'16','dimension10':'tmall.com','dimension9':'faxian','dimension11':'1阶价格','dimension12':'天猫精选','dimension20':'无','dimension32':'先发后审','dimension25':'10162'"
sss = re.findall(r"('.*?':'.*?')",ss,re.S)

for item in sss:
	print(item)
sanGuo_uu 2018-02-23
\s* matches any number of whitespace characters (possibly zero)
import re

ss = "'name':'Love up,love cup. 卡通大肚 陶瓷马克杯','id':'8746089'  , 'price':'16','brand':'Love up,love cup.' ,'mall':'天猫精选', 'category':'日用百货/厨房用具/水具酒具/陶瓷杯','metric1':'16','dimension10':'tmall.com','dimension9':'faxian','dimension11':'1阶价格','dimension12':'天猫精选','dimension20':'无','dimension32':'先发后审','dimension25':'10162'"

sss = re.split(r"'\s*,\s*'",ss)
for item in sss:
	s=item.replace("'","")
	print(s)
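To go one step further and get a dictionary out of the pieces, each piece can be cut again at the quoted colon between key and value. A rough sketch, assuming keys never contain a quote-colon-quote sequence and values never contain a quote-comma-quote sequence; `to_dict` is a name made up for this example, and the sample is shortened:

```python
import re

def to_dict(text):
    # Hypothetical helper: split on quote, optional spaces, comma, optional spaces, quote.
    pieces = re.split(r"'\s*,\s*'", text.strip())
    result = {}
    for piece in pieces:
        # Cut at the first quoted colon, allowing spaces around the colon.
        key, value = re.split(r"'\s*:\s*'", piece, maxsplit=1)
        # Only the first and last pieces still carry an outer quote.
        result[key.strip("'")] = value.strip("'")
    return result

ss = "'name':'Love up,love cup. 卡通大肚 陶瓷马克杯','id':'8746089'  , 'price':'16','brand':'Love up,love cup.' ,'mall':'天猫精选'"
d = to_dict(ss)
for key, value in d.items():
    print(key, "=>", value)
```

A piece without a quoted colon would raise a `ValueError` on unpacking, which at least makes malformed records visible instead of silently dropping them.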
jarodzhao 2018-02-23
Re #2: I did try splitting this way. But in some of the data there are spaces between the , and the ', so I ruled this approach out.
