Help write a query: Confusion over order of operations of GROUP BY and ORDER BY

- This content was updated on: 2014-12-30
Question:

I have a table named Info with this schema:

CREATE TABLE Info (
  objectId INT,  -- not unique
  time     INT,
  x        INT,
  y        INT
);

There is a lot of redundant data in the system - that is, objectId is not UNIQUE. For each objectId there can be multiple entries of time, x, y.

I want to retrieve a list of the latest position of each object. I started out with this query:

SELECT * FROM Info GROUP BY objectId

That got me just the kind of list I was looking for. However, I also want only the latest time for each object, so I tried:

SELECT * FROM Info GROUP BY objectId ORDER BY time DESC

This gave me a list of Infos sorted by descending time. However, I don't think it did what I want, that is, return the latest time, x, y for each object.

Can anyone suggest a query that does what I want?
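The confusion in the title comes from evaluation order: rows are grouped first, and ORDER BY only sorts the already-grouped result, so `ORDER BY time DESC` cannot influence which row stands in for each objectId. The MySQL manual says that for non-aggregated columns not named in GROUP BY the server may choose any value from each group, and that sorting happens after those values are chosen. The sketch below illustrates this with Python's built-in `sqlite3` module as a stand-in database (SQLite is not MySQL, but it likewise takes such bare columns from an arbitrary row of each group). The sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (objectId, time, x, y)
ROWS = [
    (1, 10, 0, 0),
    (1, 20, 5, 5),
    (1, 30, 3, 1),   # latest row for object 1
    (2, 15, 7, 7),
    (2, 25, 9, 2),   # latest row for object 2
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Info (objectId INT, time INT, x INT, y INT)")
conn.executemany("INSERT INTO Info VALUES (?, ?, ?, ?)", ROWS)

# GROUP BY collapses each objectId to a single row first; ORDER BY then sorts
# those already-chosen rows. It does not decide WHICH row represents a group.
grouped = conn.execute(
    "SELECT * FROM Info GROUP BY objectId ORDER BY time DESC"
).fetchall()
print(len(grouped))  # 2: one row per objectId, with unspecified time/x/y values

# The per-group maximum has to come from an aggregate instead.
latest_times = conn.execute(
    "SELECT objectId, MAX(time) FROM Info GROUP BY objectId ORDER BY objectId"
).fetchall()
print(latest_times)  # [(1, 30), (2, 25)]
```

Getting x and y from the same row as that maximum requires joining back to the table or excluding older rows, which is what the answers below do.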

Update: I tried the top three solutions to see how they perform against each other on a dataset of about 50,000 Infos. Here are the results:

-- NO INDEX: forever
-- INDEX: 7.67 s

SELECT a.*
FROM Info AS a
  LEFT OUTER JOIN Info AS b ON (a.objectId = b.objectId AND a.time < b.time)
WHERE b.objectId IS NULL;

-- NO INDEX: 8.05 s
-- INDEX: 0.17 s

select a.objectId, a.time, a.x, a.y
  from Info a,
       (select objectId, max(time) time from Info group by objectId) b
  where a.objectId = b.objectId and a.time = b.time;

-- NO INDEX: 8.30 s
-- INDEX: 0.18 s

SELECT A.time, A.objectId, B.x, B.y
FROM
(
   SELECT max(time) as time, objectId 
   FROM Info
   GROUP by objectId
) as A 
INNER JOIN Info B
   ON A.objectId = B.objectId AND A.time = B.time;

By a slim margin (0.17 s versus 0.18 s with the index), the join written in the WHERE clause seems to outperform the explicit INNER JOIN.
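The three benchmarked queries should return the same rows as long as no objectId has two entries sharing its latest time (ties come up further down the thread). This sketch checks that on made-up data using Python's `sqlite3` module. The update doesn't say which index was created, so a composite index on (objectId, time) is an assumption here, and `idx_info_object_time` is a hypothetical name. It checks correctness only; SQLite timings on a toy table say nothing about the numbers above.

```python
import sqlite3

ROWS = [  # hypothetical (objectId, time, x, y) rows with no ties on the latest time
    (1, 10, 0, 0), (1, 20, 5, 5), (1, 30, 3, 1),
    (2, 15, 7, 7), (2, 25, 9, 2),
    (3, 5, 4, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Info (objectId INT, time INT, x INT, y INT)")
conn.executemany("INSERT INTO Info VALUES (?, ?, ?, ?)", ROWS)
# Assumed index: lets each query look up an object's rows in time order.
conn.execute("CREATE INDEX idx_info_object_time ON Info (objectId, time)")

# Columns are listed as objectId, time, x, y in every query so results compare directly.
QUERIES = {
    "left_join_anti": """
        SELECT a.objectId, a.time, a.x, a.y
        FROM Info AS a
        LEFT OUTER JOIN Info AS b
          ON a.objectId = b.objectId AND a.time < b.time
        WHERE b.objectId IS NULL""",
    "where_join": """
        SELECT a.objectId, a.time, a.x, a.y
        FROM Info a,
             (SELECT objectId, MAX(time) time FROM Info GROUP BY objectId) b
        WHERE a.objectId = b.objectId AND a.time = b.time""",
    "inner_join": """
        SELECT A.objectId, A.time, B.x, B.y
        FROM (SELECT MAX(time) AS time, objectId FROM Info GROUP BY objectId) AS A
        INNER JOIN Info B
          ON A.objectId = B.objectId AND A.time = B.time""",
}

# Sort in Python because SQL does not guarantee row order without ORDER BY.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```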

Solution:

One way is using a subquery.

select distinct a.objectID, a.time, a.x, a.y
  from Info a,
       (select objectID, max(time) time from Info group by objectID) b
  where a.objectID = b.objectID and a.time = b.time

EDIT: Added DISTINCT to prevent duplicate rows if one objectId has multiple records with the same time. Whether this is necessary depends on your data; the question author mentioned there were many duplicate rows. (added by Tomalak)
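DISTINCT only collapses rows that are identical in every selected column. The sketch below, using Python's `sqlite3` module and hypothetical data, shows both cases: two exact duplicates at an object's latest time collapse to one row, but two entries at the same latest time with different x, y both survive because they are different rows.

```python
import sqlite3

ROWS = [
    (1, 10, 0, 0),
    (1, 30, 3, 1),
    (1, 30, 3, 1),   # exact duplicate of object 1's latest row
    (2, 25, 9, 2),
    (2, 25, 8, 8),   # same latest time as the row above, different position
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Info (objectId INT, time INT, x INT, y INT)")
conn.executemany("INSERT INTO Info VALUES (?, ?, ?, ?)", ROWS)

# The answer's query; {distinct} is filled with a fixed keyword, never user input.
SQL = """
    SELECT {distinct} a.objectId, a.time, a.x, a.y
    FROM Info a,
         (SELECT objectId, MAX(time) time FROM Info GROUP BY objectId) b
    WHERE a.objectId = b.objectId AND a.time = b.time
"""

without_distinct = sorted(conn.execute(SQL.format(distinct="")).fetchall())
with_distinct = sorted(conn.execute(SQL.format(distinct="DISTINCT")).fetchall())

print(without_distinct)  # object 1 appears twice
print(with_distinct)     # object 1 once; object 2 still has two rows
```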

Eoin Campbell's reply: Great minds think alike ;-)

Frank Krueger's reply: You guys are too good!

Tomalak's reply: Using a DISTINCT clause prevents duplicate rows in the output in case there is more than one entry with the same time for a single objectId.

Torbjørn's reply: I like Glomek's SQL style better than Campbell's, so my vote goes for this one, but they are basically the same. It's just a matter of taste.

Frank Krueger's reply: Tomalak: Can you expand on that? Can you take Glomek's SQL and add your DISTINCT clause? Or perhaps Glomek wouldn't mind?

Solution:
-- Latest time per objectID, joined back to Info to fetch that row's x, y
SELECT A.time, A.objectID, B.x, B.y
FROM
(
   SELECT max(time) as time, objectID 
   FROM Info
   GROUP by objectID
) as A 
INNER JOIN Info B
   ON A.objectID = B.objectID AND A.time = B.time

votenaders, that solution won't work if x and y decrease at any point in the timeline. Joining back on the maximum time means x and y come from the latest row itself.
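The answer by votenaders being criticized isn't reproduced in this thread. Assuming it aggregated each column independently (MAX(time), MAX(x), MAX(y)), this sketch shows the failure mode with hypothetical data in Python's `sqlite3` module: when x and y shrink over time, the per-column maximums come from older rows, while the join-back query returns the position recorded at the latest time.

```python
import sqlite3

# Hypothetical track in which x and y decrease over time for object 1.
ROWS = [
    (1, 10, 9, 9),
    (1, 20, 5, 4),
    (1, 30, 2, 1),   # latest position of object 1
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Info (objectId INT, time INT, x INT, y INT)")
conn.executemany("INSERT INTO Info VALUES (?, ?, ?, ?)", ROWS)

# ASSUMPTION: a guess at the criticized approach, each MAX() computed independently.
per_column = conn.execute(
    "SELECT objectId, MAX(time), MAX(x), MAX(y) FROM Info GROUP BY objectId"
).fetchall()

# The join-back query from the answer above: x and y come from the latest row.
joined = conn.execute("""
    SELECT A.objectId, A.time, B.x, B.y
    FROM (SELECT MAX(time) AS time, objectId FROM Info GROUP BY objectId) AS A
    INNER JOIN Info B
      ON A.objectId = B.objectId AND A.time = B.time
""").fetchall()

print(per_column)  # [(1, 30, 9, 9)]: x and y taken from the oldest row
print(joined)      # [(1, 30, 2, 1)]: the actual latest position
```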

Solution:

For what it's worth, here's another way of getting the desired result. I got into the habit of doing tricks like this in the MySQL 4.0 days, before subqueries were supported.

SELECT a.*
FROM Info AS a
  LEFT OUTER JOIN Info AS b ON (a.objectID = b.objectID AND a.time < b.time)
WHERE b.objectID IS NULL;

In other words, show me the row for which no other row exists with the same objectID and a greater time. This naturally returns the row with the max time per objectID. No GROUP BY required.
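A sketch of this exclusion join in Python's `sqlite3` module, on hypothetical data. Because the ON clause uses a strict `<`, two rows that share an object's latest time each find no later row, so both are returned; as with the subquery answers, a tie-breaker (or DISTINCT, for exact duplicates) is needed if ties can occur.

```python
import sqlite3

ROWS = [
    (1, 10, 0, 0),
    (1, 30, 3, 1),   # latest row for object 1
    (2, 15, 7, 7),
    (2, 25, 9, 2),   # object 2 has two rows at its latest time
    (2, 25, 8, 8),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Info (objectId INT, time INT, x INT, y INT)")
conn.executemany("INSERT INTO Info VALUES (?, ?, ?, ?)", ROWS)

# Keep row a only if no row b of the same object has a strictly later time.
latest = sorted(conn.execute("""
    SELECT a.objectId, a.time, a.x, a.y
    FROM Info AS a
    LEFT OUTER JOIN Info AS b
      ON a.objectId = b.objectId AND a.time < b.time
    WHERE b.objectId IS NULL
""").fetchall())

print(latest)  # [(1, 30, 3, 1), (2, 25, 8, 8), (2, 25, 9, 2)]
```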

Frank Krueger's reply: Wow. That's fantastic. Thanks for teaching me something.

DOK's reply: Don't you need a MAX in there?

Bill Karwin's reply: No, you don't need a MAX(). When b.objectID IS NULL, the join condition was not met, which means the current row 'a' has the max time value of any row with the same objectID.

Solution:

This is a pretty common way of getting at all the information in a row, for a row that is part of a group.

Select Info.*
from Info
inner join
   (select ObjectId, max(time) as Latest
    from Info
    group by ObjectId)  I
on Info.ObjectId = I.ObjectID and Info.time = I.Latest

The same question has been asked in different forms a couple of times in the last couple of weeks. I forget how the questions were worded.