代码之家  ›  专栏  ›  技术社区  ›  sixtyfootersdude

MySQL:groupby如何处理没有聚合函数的列?

  •  7
  • sixtyfootersdude  · 技术社区  · 14 年前

    我有点困惑 group by 命令在mysql中有效。

    假设我有一张桌子:

    mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort;                   
    +----------+-----------------+---------------------+-------------------------------------------------+
    | recordID | IPAddress       | date                | httpMethod                                      |
    +----------+-----------------+---------------------+-------------------------------------------------+
    |        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
    |        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
    |        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
    |        4 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/fcs_style.css HTTP/1.1             | 
    |        5 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/main_page.css HTTP/1.1             | 
    |        6 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/bigportaltopbanner.gif HTTP/1.1     | 
    |        7 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/right_1.jpg HTTP/1.1                | 
    |        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 | 
    |        9 | 64.68.88.165    | 2003-07-09 00:02:44 | GET /news/sports/basketball.shtml HTTP/1.0      | 
    |       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
    |       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  | 
    |       12 | 129.173.159.98  | 2003-07-09 00:03:46 | GET /include/fcs_style.css HTTP/1.1             | 
    |       13 | 129.173.159.98  | 2003-07-09 00:03:46 | GET /include/main_page.css HTTP/1.1             | 
    |       14 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/bigportaltopbanner.gif HTTP/1.1     | 
    |       15 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/left_1g.jpg HTTP/1.1                | 
    |       16 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/webcam.gif HTTP/1.1                 | 
    +----------+-----------------+---------------------+-------------------------------------------------+
    

    当我执行这个语句时,它如何选择 recordID 包括,因为 记录ID 对吗?它只是选择第一个匹配的吗?

    mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
    +----------+-----------------+---------------------+-------------------------------------------------+
    | recordID | IPAddress       | date                | httpMethod                                      |
    +----------+-----------------+---------------------+-------------------------------------------------+
    |       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  | 
    |        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
    |        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 | 
    |        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
    |        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
    |       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
    +----------+-----------------+---------------------+-------------------------------------------------+
    6 rows in set (0.00 sec)
    

    对于这张桌子 max(date) min(date) 在我看来,价值观似乎是合乎逻辑的,但我很困惑 记录ID httpMethod 在选择的地方。

    在一个命令中使用两个聚合函数是否安全?

    mysql> select recordID, IPAddress, min(date), max(date), httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
    +----------+-----------------+---------------------+---------------------+-------------------------------------------------+
    | recordID | IPAddress       | min(date)           | max(date)           | httpMethod                                      |
    +----------+-----------------+---------------------+---------------------+-------------------------------------------------+
    |       11 | 129.173.159.98  | 2003-07-09 00:03:46 | 2003-07-09 00:03:48 | GET / HTTP/1.1                                  | 
    |        3 | 129.173.177.214 | 2003-07-09 00:01:23 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
    |        8 | 64.68.88.165    | 2003-07-09 00:02:43 | 2003-07-09 00:02:44 | GET /studentservices/responsible.shtml HTTP/1.0 | 
    |        2 | 64.68.88.166    | 2003-07-09 00:00:55 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
    |        1 | 64.68.88.22     | 2003-07-09 00:00:21 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
    |       10 | 64.68.88.34     | 2003-07-09 00:02:46 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
    +----------+-----------------+---------------------+---------------------+-------------------------------------------------+
    6 rows in set (0.00 sec)
    
    4 回复  |  直到 14 年前
        1
  •  13
  •   AndreKR    14 年前

    在select表达式中列出字段而不使用聚合函数时,通常使用GROUP BY是无效的SQL,应引发错误。

    然而,MySQL允许这样做,只需随机选择一个值。尽量避免它,因为它令人困惑。

    要禁止此操作,可以在运行时说:

    SET sql_mode := CONCAT('ONLY_FULL_GROUP_BY,',@@sql_mode);

    或者使用配置值和/或命令行选项 sql-mode .

    是的,列出两个聚合函数是完全有效的。

        2
  •  4
  •   Nick Eiden    12 年前

    因为我是新来的,显然我不能发布有用的图片,所以我会尝试用文字。。。

    我刚刚测试过,似乎不在GROUP BY中的字段的值将使用与groupby条件匹配的第一行的值。这也将解释其他人在选择不在GROUPBY子句中的列时所感受到的“随机性”。

    例子:

    创建一个名为“test”的表,其中有两列名为“col1”和“col2”,数据如下:

    Col1和Col2
    12个
    十二
    13个
    2 1个
    2 2个
    2 3个
    3 1个
    3 2个
    3 3个

    然后运行以下查询:

    选择col1,col2
    从测试
    按col2排序 描述

    您将得到以下结果:

    13个
    2 3个
    3 3个
    12个
    12个
    2 2个
    3 2个
    2 1个
    3 1个

    现在考虑以下查询:

    选择groupTable.col1,groupTable.col2
    从(
    选择col1,col2
    从测试
    按col2排序 描述
    )分组表
    按groupTable.col1分组
    按groupTable.col1 desc排序

    您将得到以下结果:

    3 3个
    2 3个
    13个

    将子查询更改为asc:

    选择col1,col2
    从测试
    按col2排序 asc公司

    结果:

    2 1个
    3 1个
    12个
    12个
    2 2个
    3 2个
    13个
    2 3个
    3 3个

    再次使用它作为子查询的基础:

    选择groupTable.col1,groupTable.col2
    从(
    选择col1,col2
    从测试
    按col2排序 asc公司
    )分组表
    按groupTable.col1分组
    按groupTable.col1 desc排序

    结果:
    3 1个
    2 1个
    12个

    现在,您应该能够看到子查询的顺序如何影响为已选定但不在GROUPBY子句中的字段选择哪些值。这可以解释其他人提到的所谓“随机性”,因为如果子查询(或缺少子查询)没有与ORDERBY子句组合,那么mysql将在行进入时抓取行,但是通过在子查询中定义排序顺序,您可以控制此行为并获得可预测的结果。

        3
  •  0
  •   Mario    14 年前

    我认为它会根据主键或任何索引获取第一行,因为它看起来是这样工作的,但是我尝试过对各种表进行逐组查询,但没有标识任何模式。

    因此,我将避免使用未分组列的任何值。

        4
  •  0
  •   ni30rocks    9 年前

    GROUPBY根据索引拾取第一条记录。假设Log_Analysis_Records_dalhouseishort表已重新编译为索引。因此,在recordID 11到16之间按11个recordID分组,得到IPAddress 129.173.159.98。但是,在某种程度上,min和max是按操作预先分组的,因此有一些值是按逻辑计算的。

    mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
    +----------+-----------------+---------------------+-------------------------------------------------+
    | recordID | IPAddress       | date                | httpMethod                                      |
    +----------+-----------------+---------------------+-------------------------------------------------+
    |       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  | 
    |        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
    |        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 | 
    |        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
    |        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
    |       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
    +----------+-----------------+---------------------+-------------------------------------------------+
    6 rows in set (0.00 sec)