A special kind of statistic: looking for a more efficient algorithm

sosoben 2015-06-06 09:16:41

I have a data table:
ID time pos result
1 15:00:1 1 true
1 15:00:1 2 true
1 15:00:1 3 true
1 15:00:1 4 true
1 15:00:2 1 true
1 15:00:2 2 true
3 15:00:1 1 true
3 15:00:1 2 False
3 15:00:1 3 true
3 15:00:1 4 true

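As shown above: with (ID, time) as the unique identifier, a group counts as complete only if all four pos values 1, 2, 3, 4 are present. If every result in a complete group is true, then the group 1 15:00:1 is true.

By the same logic, 3 15:00:1 is false, and 1 15:00:2 is invalid data.

The whole database has a few thousand rows at the low end and tens of thousands at the high end. What is an efficient way to compute this? Can SQL do it?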
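One way to read the requirement is a single GROUP BY over (ID, time) with conditional aggregation. The sketch below wraps the SQL in Python's standard `sqlite3` module only so that it can be run; the table name `t`, the function `classify`, and the SQLite dialect are assumptions, and the rows are the ones from the question. It also assumes pos only ever takes the values 1 to 4. Access SQL, as far as I know, has neither `CASE` nor `COUNT(DISTINCT ...)`, so there the same idea would need `IIf` and a deduplicating subquery.

```python
import sqlite3

# Sample rows copied from the question: (ID, time, pos, result).
ROWS = [
    (1, "15:00:1", 1, "true"),
    (1, "15:00:1", 2, "true"),
    (1, "15:00:1", 3, "true"),
    (1, "15:00:1", 4, "true"),
    (1, "15:00:2", 1, "true"),
    (1, "15:00:2", 2, "true"),
    (3, "15:00:1", 1, "true"),
    (3, "15:00:1", 2, "False"),
    (3, "15:00:1", 3, "true"),
    (3, "15:00:1", 4, "true"),
]

# One pass over the table: a group is 'invalid' unless all four pos values
# are present, 'true' if no row failed, and 'false' otherwise.
QUERY = """
SELECT id, time,
       CASE
           WHEN COUNT(DISTINCT pos) < 4 THEN 'invalid'
           WHEN SUM(CASE WHEN LOWER(result) = 'true' THEN 0 ELSE 1 END) = 0 THEN 'true'
           ELSE 'false'
       END AS status
FROM t
GROUP BY id, time
ORDER BY id, time
"""


def classify(rows):
    """Load rows into an in-memory table and return (id, time, status) per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, time TEXT, pos INTEGER, result TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out


statuses = classify(ROWS)
for row in statuses:
    print(row)
```

With a few tens of thousands of rows a single grouped scan like this should be cheap, since each row is read once and no correlated subquery is involved.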

18 replies

sosoben 2015-06-10

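Quoting sosoben's reply (#17):

[quote=Quoting wwwwb's reply (#15):]

I currently suspect that my data contains rows with the same datatime but different SN. In theory that should not happen, because datatime is the time the data was written and one timestamp should belong to only one SN group. I don't understand it either, and I am looking for the cause now.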

sosoben 2015-06-10

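Quoting wwwwb's reply (#15):

having count(pos)=8: if POS contains duplicates, the result will be wrong.

Quoting gxttr's reply (#16):

Didn't you leave out the where result='true' condition?

Reply to wwwwb: there are indeed duplicates (the raw data contains rows that are identical in every column). I have already used distinct to filter them into a separate table, and then applied the two queries above to that table.

Reply to gxttr: I do need to count both true and false, and in the end I need to export the earliest and the latest record of each true and false group (I plan to do that part in program code or with LINQ).

Your algorithm currently gives 1780 rows. Since that result is aggregated, it should correspond to 1780*8 = 14240 raw rows. My own query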

SELECT *  FROM dist AS c
  WHERE exists(select [datatime] from (select count(*) as n ,[datatime] from [dist] group by [datatime]) d  where c.datatime = d.datatime and n =8);

returns 14272 rows.

SELECT *
FROM [Table] AS a
WHERE 
exists(select 1 from [Table] where a.SN=SN and a.datatime=datatime and pos='W40lp-S ' )
and
exists(select 1 from [Table] where a.SN=SN and a.datatime=datatime and pos='W40lp-M ' )
and
exists(select 1 from [Table] where a.SN=SN and a.datatime=datatime and pos='W20lp-S '  )
and
exists(select 1 from [Table] where a.SN=SN and a.datatime=datatime and pos='W20lp-M '  )
and
exists(select 1 from [Table] where a.SN=SN and a.datatime=datatime and pos='T40lp-S '  )
and
exists(select 1 from [Table] where a.SN=SN and a.datatime=datatime and pos='T40lp-M '  )
and
exists(select 1 from [Table] where a.SN=SN and a.datatime=datatime and pos='T20lp-S '  )
and
exists(select 1 from [Table] where a.SN=SN and a.datatime=datatime and pos='T20lp-M' ) 
;

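I meant the query above to reproduce the same result, so I removed the Judge = true condition. But after I created it as a new query in Access, it hung as soon as it started running, so I could not test it.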
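One possible explanation for the 14240 versus 14272 gap, in line with the suspicion that some datatime values are shared by different SNs: the query that groups by datatime alone counts rows from every SN at that timestamp, so two incomplete SN groups can add up to 8. The sketch below uses made-up rows (SNs "A", "B", "C" and the timestamps are hypothetical) in an in-memory SQLite table named `dist`; it only illustrates the effect and does not prove that this is what happened in the real data.

```python
import sqlite3

# The eight pos labels that appear in the thread's EXISTS query.
POSITIONS = ["W40lp-S", "W40lp-M", "W20lp-S", "W20lp-M",
             "T40lp-S", "T40lp-M", "T20lp-S", "T20lp-M"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dist (sn TEXT, datatime TEXT, pos TEXT)")

# Hypothetical data: A is complete; B and C are each half complete
# but happen to share the same datatime.
rows = [("A", "10:00:01", p) for p in POSITIONS]
rows += [("B", "10:00:02", p) for p in POSITIONS[:4]]
rows += [("C", "10:00:02", p) for p in POSITIONS[4:]]
conn.executemany("INSERT INTO dist VALUES (?, ?, ?)", rows)

# Same shape as the thread's query: completeness judged per datatime only.
BY_TIME_ONLY = """
SELECT COUNT(*) FROM dist AS c
WHERE EXISTS (SELECT 1
              FROM (SELECT datatime, COUNT(*) AS n FROM dist GROUP BY datatime) AS d
              WHERE d.datatime = c.datatime AND d.n = 8)
"""

# Completeness judged per (SN, datatime) instead.
BY_SN_AND_TIME = """
SELECT COUNT(*) FROM dist AS c
WHERE EXISTS (SELECT 1
              FROM (SELECT sn, datatime, COUNT(*) AS n FROM dist GROUP BY sn, datatime) AS d
              WHERE d.sn = c.sn AND d.datatime = c.datatime AND d.n = 8)
"""

rows_by_time = conn.execute(BY_TIME_ONLY).fetchone()[0]
rows_by_sn_time = conn.execute(BY_SN_AND_TIME).fetchone()[0]
conn.close()

print(rows_by_time, rows_by_sn_time)
```

Grouping on both columns also replaces the eight correlated EXISTS clauses with one aggregate, which may be why the EXISTS version was so much slower in Access.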

遥望那些年 2015-06-09

select ID,Time
from 表
where result='True'
group by ID,Time
having sum(pos)=10

wwwwb 2015-06-09

Yes, the latest time in each group.

sosoben 2015-06-09

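Quoting wwwwb's reply (#5):

Is the blue group, the one marked True, the result you want?

What I ultimately want is the number of "red" and "blue" groups, with all the true groups listed and all the False groups listed as well.

select * from tt a where not exists(select 1 from tt where a.id=id and a.time<time)

This one finds the latest time on top of my earlier table, right?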

wwwwb 2015-06-09

select * from tt a where not exists(select 1 from tt where a.id=id and a.time<time)
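To see what this NOT EXISTS query keeps, here is a small runnable sketch using SQLite through Python's `sqlite3` module. The table `tt` follows the query above; the rows are the follow-up sample from the thread with the times written as zero-padded strings such as 15:00:01, which is an assumption made so that string comparison orders them correctly. With a real datetime column that caveat would not apply.

```python
import sqlite3

# Follow-up sample from the thread, times normalised to HH:MM:SS.
ROWS = [
    (1, "15:00:01", 1, "true"), (1, "15:00:01", 2, "true"),
    (1, "15:00:01", 3, "False"), (1, "15:00:01", 4, "true"),
    (1, "15:00:03", 1, "true"), (1, "15:00:03", 2, "true"),
    (1, "15:00:03", 3, "true"), (1, "15:00:03", 4, "true"),
    (2, "15:00:01", 1, "true"), (2, "15:00:01", 2, "true"),
    (2, "15:00:01", 3, "False"), (2, "15:00:01", 4, "true"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tt (id INTEGER, time TEXT, pos INTEGER, result TEXT)")
conn.executemany("INSERT INTO tt VALUES (?, ?, ?, ?)", ROWS)

# Keep a row only if no other row with the same id has a later time.
latest = conn.execute("""
    SELECT a.id, a.time, a.pos, a.result
    FROM tt AS a
    WHERE NOT EXISTS (SELECT 1 FROM tt AS b
                      WHERE b.id = a.id AND b.time > a.time)
    ORDER BY a.id, a.pos
""").fetchall()
conn.close()

latest_groups = sorted({(r[0], r[1]) for r in latest})
print(latest_groups)
```

Note that the query picks the newest time per id regardless of whether that group is complete, so incomplete groups would have to be filtered out before it is applied.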

wwwwb 2015-06-09

Is the blue group, the one marked True, the result you want?

sosoben 2015-06-09

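Quoting tcmakebest's reply (#3):

Why is the last part of time a single digit? Isn't it seconds? If it is a sequence number, that is bad news: first, it breaks the basic rule that a field stores only one piece of information; second, picking the largest sequence number adds extra work. Access is also hundreds of times less efficient than SQL.

It is seconds. That was a slip when I typed the data by hand, sorry. The column is in datetime format, so read it as 15:00:01.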

遥望那些年 2015-06-09

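Quoting sosoben's reply (#14):

[quote=Quoting gxttr's reply (#13):] You already group by ID and Time, so how could pos still repeat? Would there really be records like 1 15:00:1 4 true and 1 15:00:1 4 true? A database should not allow records like that. If pos is not numeric, can't you just cast it?

My pos actually has 8 values at the moment (complex combinations of letters and digits such as "Tde-M 50%" and "Tde-S 50%"), and they all come straight from the raw data. Adding a conversion step is possible, of course, but I doubt case or cast can handle it, which is why I am looking for the best approach. I am also trying this method now:

select SN,dataTime from [dist] group by SN,dataTime having count(pos)=8

It gives a different result from my earlier query:

SELECT * FROM dist AS c WHERE exists(select [datatime] from (select count(*) as n ,[datatime] from [dist] group by [datatime]) d where c.datatime = d.datatime and n =8);

I am still trying to work out which of the two is correct.

[/quote] Didn't you leave out the where result='true' condition?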

wwwwb 2015-06-09

having count(pos)=8: if POS contains duplicates, the result will be wrong.
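The duplicate problem can be shown with a tiny made-up table. In the sketch below (SQLite via Python's `sqlite3`; the SNs, timestamps and the labels P1 to P8 are hypothetical stand-ins for the real pos values), `COUNT(pos) = 8` rejects a complete group that has one duplicated row and accepts an incomplete group that reaches 8 rows only because of a duplicate. Deduplicating in a subquery first avoids both mistakes, and that form does not rely on `COUNT(DISTINCT ...)`, which Access SQL lacks as far as I know.

```python
import sqlite3

POSITIONS = [f"P{i}" for i in range(1, 9)]  # hypothetical pos labels P1..P8

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dist (sn TEXT, datatime TEXT, pos TEXT)")

rows = [("A", "10:00:01", p) for p in POSITIONS]        # complete: 8 distinct pos
rows.append(("A", "10:00:01", "P1"))                    # plus one duplicated row (9 rows)
rows += [("B", "10:00:02", p) for p in POSITIONS[:7]]   # incomplete: 7 distinct pos
rows.append(("B", "10:00:02", "P1"))                    # plus one duplicated row (8 rows)
conn.executemany("INSERT INTO dist VALUES (?, ?, ?)", rows)

# Counts raw rows, so duplicates distort the completeness test.
naive = conn.execute("""
    SELECT sn, datatime FROM dist
    GROUP BY sn, datatime
    HAVING COUNT(pos) = 8
""").fetchall()

# Removes duplicate (sn, datatime, pos) rows before counting.
deduped = conn.execute("""
    SELECT sn, datatime
    FROM (SELECT DISTINCT sn, datatime, pos FROM dist) AS u
    GROUP BY sn, datatime
    HAVING COUNT(*) = 8
""").fetchall()
conn.close()

print(naive, deduped)
```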

sosoben 2015-06-09

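Quoting gxttr's reply (#13):

You already group by ID and Time, so how could pos still repeat? Would there really be records like 1 15:00:1 4 true and 1 15:00:1 4 true? A database should not allow records like that. If pos is not numeric, can't you just cast it?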

遥望那些年 2015-06-09

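You already group by ID and Time, so how could pos still repeat? Would there really be records like 1 15:00:1 4 true and 1 15:00:1 4 true? A database should not allow records like that. If pos is not numeric, can't you just cast it?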

於黾 2015-06-09

If the requirement is simply that there are 4 rows, then don't use sum(pos); check with count(pos)=4 instead. Can duplicate rows even occur?

於黾 2015-06-09

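If you understand the logic yourself and are able to simplify it, then post the simplified data, let others give you a method, and map it back to the complex data yourself. But if you don't really understand it, don't simplify carelessly. Otherwise, even when someone gives you a query for the simplified data, you won't know how to adapt it once you switch back to the real data, so what is the point?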

sosoben 2015-06-09

Quoting gxttr's reply (#9):


select ID,Time
from 表
where result='True'
group by ID,Time
having sum(pos)=10

But my real table is not numeric! And what if a group contains a combination like 4, 4, 2 or 3, 3, 4?
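The objection is easy to reproduce. In this sketch (SQLite via Python's `sqlite3`; table name `t` and all rows are made up for illustration), groups 2 and 3 hold the pos combinations 4, 4, 2 and 3, 3, 4. Both sum to 10, so `SUM(pos) = 10` accepts them, while counting distinct pos values accepts only the genuinely complete group. Counting distinct values also works unchanged when pos is text rather than a number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, time TEXT, pos INTEGER, result TEXT)")

# Hypothetical rows: id 1 is complete, ids 2 and 3 only look complete by sum.
rows = [(1, "15:00:01", p, "true") for p in (1, 2, 3, 4)]
rows += [(2, "15:00:01", p, "true") for p in (4, 4, 2)]
rows += [(3, "15:00:01", p, "true") for p in (3, 3, 4)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# The query from the quoted reply, with a concrete table name.
by_sum = conn.execute("""
    SELECT id, time FROM t
    WHERE result = 'true'
    GROUP BY id, time
    HAVING SUM(pos) = 10
    ORDER BY id
""").fetchall()

# Same query, but testing how many different pos values are present.
by_distinct = conn.execute("""
    SELECT id, time FROM t
    WHERE result = 'true'
    GROUP BY id, time
    HAVING COUNT(DISTINCT pos) = 4
    ORDER BY id
""").fetchall()
conn.close()

print(by_sum, by_distinct)
```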

tcmakebest 2015-06-08

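Why is the last part of time a single digit? Isn't it seconds? If it is a sequence number, that is bad news: first, it breaks the basic rule that a field stores only one piece of information; second, picking the largest sequence number adds extra work. Access is also hundreds of times less efficient than SQL.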

sosoben 2015-06-08

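Quoting WWWWA's reply (#1):

Create indexes on ID, POS and result.

Thanks for your answer. My pos actually has 8 different values. After adapting your query I tested it successfully in Access, but it seems slower than the SQL below (I removed the result='true' condition and got the same result). (Note: what I really need is to exclude incomplete data, and then compute the ratio of true to false over the complete data.)

SELECT * FROM dist AS c WHERE exists(select [datatime] from (select count(*) as n ,[datatime] from [dist] group by [datatime]) d where c.datatime = d.datatime and n =8);

Now a new problem: the same ID may have valid data at several different times, but I only want the latest group for the statistics. How should that be written? Do I need to run another query on top of this table? I am raising the bounty to 100 points.

1 15:00:1 1 true
1 15:00:1 2 true
1 15:00:1 3 False
1 15:00:1 4 true
1 15:00:3 1 true
1 15:00:3 2 true
1 15:00:3 3 true
1 15:00:3 4 true
2 15:00:1 1 true
2 15:00:1 2 true
2 15:00:1 3 False
2 15:00:1 4 true

As shown above, the blue group is True, the red group is False, and the black rows are earlier invalid data or incomplete data.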
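A possible way to chain the three requirements (drop incomplete groups, keep the latest complete group per ID, then count true and false) is sketched below with SQLite through Python's `sqlite3`. The table name `t`, the CTE names `complete` and `latest`, and the zero-padded times are assumptions. The rows are the sample above plus one hypothetical extra row that forms a newer but incomplete group, added to show why completeness is checked before picking the latest time. The sample has 4 pos values; the real data would use 8.

```python
import sqlite3
from collections import Counter

ROWS = [
    (1, "15:00:01", 1, "true"), (1, "15:00:01", 2, "true"),
    (1, "15:00:01", 3, "False"), (1, "15:00:01", 4, "true"),
    (1, "15:00:03", 1, "true"), (1, "15:00:03", 2, "true"),
    (1, "15:00:03", 3, "true"), (1, "15:00:03", 4, "true"),
    (2, "15:00:01", 1, "true"), (2, "15:00:01", 2, "true"),
    (2, "15:00:01", 3, "False"), (2, "15:00:01", 4, "true"),
    (1, "15:00:05", 1, "true"),  # hypothetical: a newer but incomplete group
]

QUERY = """
WITH complete AS (          -- groups that contain all four pos values
    SELECT id, time,
           SUM(CASE WHEN LOWER(result) = 'true' THEN 0 ELSE 1 END) AS failures
    FROM t
    GROUP BY id, time
    HAVING COUNT(DISTINCT pos) = 4
),
latest AS (                 -- newest complete group per id
    SELECT id, MAX(time) AS time FROM complete GROUP BY id
)
SELECT c.id, c.time,
       CASE WHEN c.failures = 0 THEN 'true' ELSE 'false' END AS status
FROM complete AS c
JOIN latest AS l ON l.id = c.id AND l.time = c.time
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, time TEXT, pos INTEGER, result TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
latest_groups = conn.execute(QUERY).fetchall()
conn.close()

# Tally how many ids ended up true and how many false.
totals = Counter(status for _id, _time, status in latest_groups)
print(latest_groups, dict(totals))
```

Access has no `WITH` clause as far as I know, so there the two intermediate steps would probably be saved as separate queries and joined.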

WWWWA 2015-06-08

SELECT * from tth a where exists(select 1 from tth where a.id=id and pos=1 and result='true') and exists(select 1 from tth where a.id=id and pos=2 and result='true') and exists(select 1 from tth where a.id=id and pos=3 and result='true') and exists(select 1 from tth where a.id=id and pos=4 and result='true')

Create indexes on ID, POS and result.