一个挺简单的问题，如何进行统计

im2web 2011-12-14 01:18:20

有一个 table

有 id, recvdate , value 3个字段

有 1000 多个不同的id
每个相同 id 有 200-300 条记录

现在需要统计

id max(recvdate) , value ( max(recvdate) 时候的 )，max(recvdate) , value(max(recvdate) 时候的), min(value), recvdate( min(value) 时候的recvdate) , max(value) , recvdate( max(value）时候的recvdate))

看起来简单，但是做起来比较麻烦。有办法吗?

...全文

233 28 打赏收藏转发到动态举报

写回复

用AI写文章

28 条回复

切换为时间正序

请发表友善的回复…

发表回复

chengg0769 2011-12-21

打赏
举报

我在06年第一次遇到每月上亿条数据的一些报表，其实统计也是很简单，就是从中查找异常数据，只是数据条数太多，以前在笔记本上执行往往几个小时，而且内存小，感觉执行不下去。

当然管理意义上，还会看类似半年存量，增幅，降幅，半年对比分析之类的，都涉及到较多账期的数据大量地读取，筛选，比较等计算。

慢慢摸索，就扩展了一些知识，这些在以往的帖子或者网上资料都零星存在:
1.索引，这个非常关乎数据查询的速度；
2.其次，不能对字段做函数，那样不会走索引，会引起整表的遍历，还有in，or等使用经验，以往有文字记载;
3.再有，就是分区，和分表，减少检索的目标范围；比如一年的数据放一个表和一个月放一张表，显然检索的目标要小很多.像计费数据之类的，还得按系统的要求，进行多种方式的划分，否则规模太大，无法用报表快速查询;比如按天再划分，或者按计费种类，或者按装机类型等等，目的就是切分得更小;
4.软件设计时，考虑尽可能减少对数据库的访问，有些数据可以预先拉到本地来缓存起来。在多用户情况下能极大的减少数据库压力，提高速度；
5.对数据的预处理，比如上面的问题，可以把id，max值查询到一个新表中。这样数量规模只有几万条而已。经过预处理，不会在每次查询都去重复这个"在数百万中筛选几万条"的重复时间成本；
6.辅助的字段或者计算字段，比如判断五个字段得到一个是非结果，我不如增加一个字段，预先计算出这个是非结果，后续的查询我就依据这个字段来判断，减少数倍的时间成本；
7.语句的优化，先前有高手专门做了总结，可以搜搜网上.真正的系统必须做优化，并注意细节.
8.条件字段，尽可能不要用char和varchar，应该全部用id(int)，显示时用fk表.当然datetime等也是变相的int类型，换言之，尽可能是数字类型。这就是在很多大型系统，主表全是id字段.理论上，任何非int字段的运算都会数倍于int的时间成本，而且int等同于散列效果，往往直接指向单一的定义。初学的时候可能更系统用varchar来定义一些如部分编码等等。
9.如果数据表容纳全部数据而未分区分表，就需要在月初月末移除历史数据到历史表中去，减少主表数据量;

为了提高查询速度，提高报表速度，其实归根结底到一句话，就是以空间换时间，用较多的时间和空间去预先处理，可以使得最终的重复的query在较短时间显示出来.在多人系统中，就数倍节省了前期预处理时间，因为只需处理一次，而最终的查询可以数次，数十次。

说的不一定完全正确，需要在实践中去测试，以实际效果为准.

SW_luke 2011-12-20

打赏
举报

好问题！

im2web 2011-12-20

打赏
举报

:) 你说得听对。

我以为靠一个sql 语句可以搞定，看来数据做大规模统计关系行数据库还是不行。 hadoop 这样的并行计算更合适。

卡死的确是误区，但是最后的结局是完全无法给我结果 30 分钟以后也没有结果。

所以我改用python 动态的拼抽sql 然后把结果保存起来一点点的做

2 w 多个id 一共操作8w 多次拿到了结果。

非常的稳定。

本来希望随时可以靠sql 一次搞定。

现在只能出一个结果，定期更新了谢谢大家

chengg0769 2011-12-16

打赏
举报

或许说的卡死是个错误的认识，只不过是执行时间比较久罢了，因为你不是一个简单的查询。max，group by等注定要经过多次的筛选和运算。有个误区，初学者总是不论几万，几十万，几百万数据，总期望在一个语句中实现，或者期望在几秒钟得到结果。其实是不现实的。耗用的时间，很大部分集中在从众多的数据中筛选符合条件的行，并根据你的数据规模而成倍增加，比较复杂的运算耗时几分，几十分，几个小时都有可能。

建议：1.进行必要的数据规模的控制，比如每批只处理10个id或者100个id。并提取数据到临时的表中再进行运算，这样数据规模小很多。以往的经验看，往往10万行内进行统计，数据库的效率都比较满意。上到百万条，io和索引，内存等是存在瓶颈的。

2. 可以增补一些字段，然后做update将一些数据设置标记，并清除掉，或者挪到其他历史表中。
3. 可以做一些计算字段，辅助计算。
4. 对数据的预处理和清洗也是必要的。

im2web 2011-12-14

打赏
举报

数据库是mysql 非 sql_server.

im2web 2011-12-14

打赏
举报

非常感谢大家

我的表有2 483 117 条记录

大概 17000 个id

刚才做了
SELECT
t.id,max(t.value),(SELECT recvdate FROM #temp WHERE t.id=id AND max(t.value)=value)
FROM #temp AS t
GROUP BY id

这一个查询立刻就卡死了。

第二次在被查询的字段做了索引。

结果还是卡死。

在一个 4000 行记录的表里面测试没有问题。

-晴天 2011-12-14

打赏
举报

create table tb(id int,recvdate datetime,value decimal(10,2))

insert into tb select 1,'2011-12-01 09:30:06',12.56

insert into tb select 1,'2011-12-01 11:05:32',12.68

insert into tb select 1,'2011-12-01 11:25:37',12.84

insert into tb select 1,'2011-12-01 13:25:17',12.62

insert into tb select 1,'2011-12-01 15:00:00',12.44

insert into tb select 1,'2011-12-02 11:00:00',12.90

go

select a.id,'2011-12-01' as dt,a.value as 开盘价,b.value as 最高价,c.value as 最低价,d.value as 收盘价

from (

select * from tb a where convert(varchar(10),recvdate,120)='2011-12-01' 

and not exists(select 1 from tb where id=a.id and convert(varchar(10),recvdate,120)='2011-12-01' and recvdate<a.recvdate)

)a inner join (

select * from tb a where convert(varchar(10),recvdate,120)='2011-12-01' 

and not exists(select 1 from tb where id=a.id and convert(varchar(10),recvdate,120)='2011-12-01' and value>a.value)

)b on a.id=b.id

inner join (

select * from tb a where convert(varchar(10),recvdate,120)='2011-12-01' 

and not exists(select 1 from tb where id=a.id and convert(varchar(10),recvdate,120)='2011-12-01' and value<a.value)

)c on c.id=a.id

inner join (

select * from tb a where convert(varchar(10),recvdate,120)='2011-12-01' 

and not exists(select 1 from tb where id=a.id and convert(varchar(10),recvdate,120)='2011-12-01' and recvdate>a.recvdate)

)d on d.id=a.id

/*

id          dt         开盘价         最高价         最低价         收盘价

----------- ---------- -------------- -------------- -------------- --------------------

1           2011-12-01 12.56          12.84          12.44          12.44



(1 行受影响)

*/

go

drop table tb

im2web 2011-12-14

打赏
举报

如果重复就列出一样的

就像收盘价也可以是最高价

chenguang79 2011-12-14

打赏
举报

CREATE TABLE #temp

(

  	id int,

  	recvdate datetime,

  	value int

)



INSERT INTO #temp(id,recvdate,value)VALUES(1,'2011-12-1',100)

INSERT INTO #temp(id,recvdate,value)VALUES(1,'2011-11-8',50)

INSERT INTO #temp(id,recvdate,value)VALUES(1,'2011-10-21',150)

INSERT INTO #temp(id,recvdate,value)VALUES(2,'2011-09-10',200)

INSERT INTO #temp(id,recvdate,value)VALUES(2,'2011-12-5',10)

INSERT INTO #temp(id,recvdate,value)VALUES(2,'2011-01-5',245)

INSERT INTO #temp(id,recvdate,value)VALUES(3,'2010-12-5',10)

INSERT INTO #temp(id,recvdate,value)VALUES(3,'2011-12-6',100)

INSERT INTO #temp(id,recvdate,value)VALUES(3,'2011-12-7',50)

INSERT INTO #temp(id,recvdate,value)VALUES(4,'2011-12-10',12)







SELECT 

t.id,max(t.value),(SELECT recvdate FROM #temp WHERE t.id=id AND max(t.value)=value)

FROM #temp AS t

GROUP BY id



UNION



SELECT 

t.id,min(t.value),(SELECT recvdate FROM #temp WHERE t.id=id AND min(t.value)=value)

FROM #temp AS t

GROUP BY id



UNION 



SELECT 

t.id,(SELECT value FROM #temp WHERE t.id=id AND max(t.recvdate)=recvdate),max(recvdate)

FROM #temp AS t

GROUP BY id



UNION



SELECT 

t.id,(SELECT value FROM #temp WHERE t.id=id AND min(t.recvdate)=recvdate),min(recvdate)

FROM #temp AS t

GROUP BY id

楼主,如果你这四个条件有重的怎么办吧,就是比如说value最小和日期最后是一个呢

都先生 2011-12-14

打赏
举报

select convert(varchar(10),pushtime,120)时间,code as 代码,
(select top 1 price from tb b where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120) and not exists(select 1 from tb where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120) and pushtime<b.pushtime)) as 开盘价,
(select top 1 price from tb b where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120) and not exists(select 1 from tb where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120) and pushtime>b.pushtime)) as 收盘价,
(select max(price) from tb where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120)) as 最高价,
(select min(price) from tb where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120)) as 最低价
from tb a

楼主的想法说的不是很清楚。
你直接告诉我们你想获取的结果是什么信息，直接用文字描述就可以了

im2web 2011-12-14

打赏
举报

非常感谢 ls 我试试。

-晴天 2011-12-14

打赏
举报

参考:

select convert(varchar(10),pushtime,120)时间,code as 代码,

(select top 1 price from tb b where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120) and not exists(select 1 from tb where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120) and pushtime<b.pushtime)) as 开盘价,

(select top 1 price from tb b where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120) and not exists(select 1 from tb where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120) and pushtime>b.pushtime)) as 收盘价,

(select max(price) from tb where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120)) as 最高价,

(select min(price) from tb where code=a.code and convert(varchar(10),pushtime,120)=convert(varchar(10),a.pushtime,120)) as 最低价

from tb a

-晴天 2011-12-14

打赏
举报

前面做过一个股票最高价最低价的查询.你找找看.

im2web 2011-12-14

打赏
举报

id 是股票的代号

打字错误

im2web 2011-12-14

打赏
举报

简单的和股票对比就还很清楚了

id 是股票的时间
date 是某个时间点
value 是某个时间点的股价

现在有所有历史数据

要求得出所有股票的今天最高值最低值以及这些值出现在什么时候
以及开盘价停盘加开盘价和停盘的时间

就是一模一样的

Leon_He2014 2011-12-14

打赏
举报

给出测试数据和期望的结果吧。猜很累的~

im2web 2011-12-14

打赏
举报

CREATE TABLE `abcd` (
`id` INT(10) NULL DEFAULT NULL,
`date` INT(10) NULL DEFAULT NULL,
`value` INT(10) NULL DEFAULT NULL
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM
ROW_FORMAT=DEFAULT

INSERT INTO `abcd` (`id`, `date`, `value`) VALUES
(1, 1, 10),
(1, 2, 4),
(1, 1, 9),
(1, 2, 1),
(1, -1, 1),
(1, 4, 1),
(1, 10, 4);

差不多就是这个样子

gogodiy 2011-12-14

打赏
举报

给出表结构和数据吧，猜很累的～～

im2web 2011-12-14

打赏
举报

大侠们多谢但是我要的是每个id 的统计

id = xx 不是单个的么？

单个id 可以手动run 4个sql 然后把结果合并起来。

我要的是group by id 的做法

--小F-- 2011-12-14

打赏
举报



select

  *

from

(select id , recvdate , value where id = xx order by recvdate desc limit 1)a

inner join

(select id , recvdate , value where id = xx order by recvdate asc limit 1)b

on

 a.id=b.id

inner join

(

select id , recvdate , value where id = xx order by value desc limit 1

)c

on

  b.id=c.id

inner join

(

select id , recvdate , value where id = xx order by value desc limit 1

)d

on

  c.id=d.id