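Help needed: given two lists, find the entries in list 1 that are not in list 2. Both lists are very large, and list 2 is already sorted by string length in descending order.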

蜜友 2020-04-29 10:45:14

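Help needed:
There are two lists. Find the entries in list 1 that are not in list 2. Both lists are very large. List 2 is already sorted by string length, longest first.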
列表1.txt
北京金色公司
北京金色世纪公司
北京金公司

列表2.txt
金色世纪
金色

Expected output:
北京金公司

10 replies

RockeyCui 2020-05-06

Take a look at the KMP algorithm.
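
For reference, a minimal Python sketch of KMP substring search, the technique this reply points to. The names `build_failure` and `kmp_find` are illustrative, and Python is used only because the thread says the language does not matter. KMP answers whether one pattern occurs in one text, so with a large list 2 it would presumably have to run once per pattern.

```python
def build_failure(pattern: str) -> list:
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    failure = build_failure(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

For example, `kmp_find("北京金色世纪公司", "金色世纪")` finds the pattern at index 2, while `kmp_find("北京金公司", "金色")` returns -1.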

ITjavaman 2020-05-06

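Going by how company names are put together, how about first stripping the head and tail off each entry in list 1, and then matching what is left against list 2?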
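
The idea in this reply could be sketched roughly as follows. `PREFIXES` and `SUFFIXES` are assumptions made up to fit the sample data; the thread never states the actual naming rules, and real company names would need a much longer list or a different extraction rule.

```python
PREFIXES = ("北京",)  # hypothetical: place names that open a company name
SUFFIXES = ("公司",)  # hypothetical: legal-form endings


def trade_name(company: str) -> str:
    """Strip one known prefix and one known suffix from a company name."""
    for prefix in PREFIXES:
        if company.startswith(prefix):
            company = company[len(prefix):]
            break
    for suffix in SUFFIXES:
        if company.endswith(suffix):
            company = company[:-len(suffix)]
            break
    return company


def not_in_list2(list1, list2):
    """Entries of list1 whose stripped name is not an exact entry of list2."""
    known = set(list2)  # hash lookup instead of scanning list2 per entry
    return [c for c in list1 if trade_name(c) not in known]
```

Under these assumptions the sketch yields 北京金公司 for the original sample data and 北京金色公司 for the revised data (list 2 = 金色世纪, 金) posted elsewhere in the thread, which are the two results the OP asks for. Because the lookup is an exact set membership test, the length ordering of list 2 is not used.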

蜜友 2020-05-02

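Quoting reply #6 by dhacker1:

Why is the poster above using SQL? The OP needs to extract data from txt files. First read the file contents with an input stream, one line at a time; each line comes back as a string, and then you do the comparison. I won't write out the exact code, but you can look in that direction.

The language doesn't matter, but a simple contains check definitely won't handle this.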

蜜友 2020-05-02

Quoting reply #5 by nayi_224:

[quote=Quoting reply #4 by 蜜友:]
[quote=Quoting reply #3 by nayi_224:]

with tab1 as (
select '111' id from dual union all
select '1121' id from dual union all
select '1131' id from dual
),
tab2 as (
select '111' id from dual union all
select '121' id from dual union all
select '11331' id from dual
)
select * from tab1 t1
 where not exists(
 select 1 from tab2 t2 where instr(t1.id, t2.id) > 0
 )
;

[/quote]
Let me change the data:
列表1.txt
北京金色公司
北京金色世纪公司
北京金公司

列表2.txt
金色世纪
金

Expected output:
北京金色公司
[/quote]
北京金色公司 contains the character 金, so why should it be output?

Assume list 2 holds the trade names of these companies. Any company whose trade name is not in list 2 needs to be output.

dhacker1 2020-04-30

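Why is the poster above using SQL? The OP needs to extract data from txt files. First read the file contents with an input stream, one line at a time; each line comes back as a string, and then you do the comparison. I won't write out the exact code, but you can look in that direction.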
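
A rough sketch of the line-by-line approach described here, in Python rather than Java. The function names, the UTF-8 encoding, and the caller-supplied `matches` predicate are all assumptions; the reply leaves the comparison rule open, so the sketch does too.

```python
def read_lines(path, encoding="utf-8"):
    """Yield the non-empty lines of a text file one at a time."""
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:
                yield line


def filter_list1(list1_path, list2_path, matches):
    """Yield entries of list 1 for which matches(entry, patterns) is false.

    List 2 is loaded into memory once; list 1 is streamed line by line.
    """
    patterns = list(read_lines(list2_path))
    for entry in read_lines(list1_path):
        if not matches(entry, patterns):
            yield entry
```

A caller might pass `lambda entry, patterns: any(p in entry for p in patterns)` for a plain substring check, or swap in a stricter rule without touching the file handling.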

nayi_224 2020-04-30

Quoting reply #4 by 蜜友:

[quote=Quoting reply #3 by nayi_224:]

with tab1 as (
select '111' id from dual union all
select '1121' id from dual union all
select '1131' id from dual
),
tab2 as (
select '111' id from dual union all
select '121' id from dual union all
select '11331' id from dual
)
select * from tab1 t1
 where not exists(
 select 1 from tab2 t2 where instr(t1.id, t2.id) > 0
 )
;

[/quote]
Let me change the data:
列表1.txt
北京金色公司
北京金色世纪公司
北京金公司

列表2.txt
金色世纪
金

Expected output:
北京金色公司

北京金色公司 contains the character 金, so why should it be output?

蜜友 2020-04-30

Quoting reply #3 by nayi_224:

with tab1 as (
select '111' id from dual union all
select '1121' id from dual union all
select '1131' id from dual 
),
tab2 as (
select '111' id from dual union all
select '121' id from dual union all
select '11331' id from dual
)
select * from tab1 t1
 where not exists(
 select 1 from tab2 t2 where instr(t1.id, t2.id) > 0
 )
;

Let me change the data:
列表1.txt
北京金色公司
北京金色世纪公司
北京金公司

列表2.txt
金色世纪
金

Expected output:
北京金色公司

蜜友 2020-04-29

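Quoting reply #1 by qqvb:

I'm a beginner and haven't worked on this kind of task, but I think Redis could handle it: load both lists into Redis sets and compute the difference with the SDIFF command. Would that work?

The strings in the two lists are not identical; all we know is that one contains the other. The order of list 1 is not defined. If both sides held identical strings, SDIFF or a database NOT IN would do the job.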
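
To illustrate the distinction made here, a small in-memory analogue of SDIFF / NOT IN using a Python set. This is only a sketch of the exact-match case; `exact_difference` is a made-up name, and no Redis client is involved.

```python
def exact_difference(list1, list2):
    """Entries of list1 that do not appear verbatim in list2, in list1 order."""
    seen = set(list2)
    return [x for x in list1 if x not in seen]
```

On the thread's sample data this returns all three entries of list 1, because no company name is character-for-character equal to an entry of list 2. That is the reason a plain set difference does not answer the question as posed.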

qqvb 2020-04-29

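I'm a beginner and haven't worked on this kind of task, but I think Redis could handle it: load both lists into Redis sets and compute the difference with the SDIFF command. Would that work?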

nayi_224 2020-04-29

with tab1 as (
select '111' id from dual union all
select '1121' id from dual union all
select '1131' id from dual 
),
tab2 as (
select '111' id from dual union all
select '121' id from dual union all
select '11331' id from dual
)
select * from tab1 t1
 where not exists(
 select 1 from tab2 t2 where instr(t1.id, t2.id) > 0
 )
;
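
For readers without an Oracle instance, the same NOT EXISTS / instr filter can be sketched in Python. The function name `not_containing_any` is illustrative; the data is the query's own.

```python
def not_containing_any(list1, list2):
    """Keep entries of list1 that contain no entry of list2 as a substring."""
    return [s for s in list1 if not any(p in s for p in list2)]


tab1 = ["111", "1121", "1131"]
tab2 = ["111", "121", "11331"]
print(not_containing_any(tab1, tab2))  # ['1131']
```

Like the query, this compares every entry of list 1 against every entry of list 2, so the cost grows with the product of the two list sizes. On the OP's revised data (list 2 = 金色世纪, 金) it returns an empty list rather than 北京金色公司, since every company name contains 金.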