代码之家  ›  专栏  ›  技术社区  ›  Phrogz

左侧外部联接的作用与内部联接类似

  •  2
  • Phrogz  · 技术社区  · 11 年前

    总结

    我的目标是找到每一个被分配到任务中的用户,然后在特定的日期范围内生成一些统计数据,并将这些统计数据与原始用户集相关联。当不存在特定用户的统计信息时,我希望在该用户的输出中有一行,但是 NULL 统计数据的值。


    我有一个复杂的SQL查询,看起来像这样(底部的实际查询):

    SELECT
      user_name, changeday, project_name
      sum(hour_delta) AS hours,
    FROM ( … ) tasked_users
    LEFT OUTER JOIN ( … ) a
    ON tasked_users.id=a.assignee_id
    WHERE
      (changeday IS NULL) OR (changeday >= … AND changeday <= …)
    GROUP BY user_name, changeday, a.project_name
    ORDER BY user_name, changeday, a.project_name;
    

    我的愿望是找到一大群用户,并将他们与 a 桌子当存在没有任何匹配条目的用户时 我想要null或 0 小时。

    遗憾的是,此查询只返回“a”中存在的用户的行。例如,一组特定的日期返回:

    {:user_name=>"Gavin", :hours=>0.0, :changeday=>2013-09-08, :project_name=>"Foo"}
    {:user_name=>"Steve", :hours=>1.0, :changeday=>2013-09-08, :project_name=>"Bar"}
    

    而不同的日期范围会导致找到不同的用户。的内容 tasked_users 子查询有14个不同的用户id/名称对。我需要 全部的 其中的一个将在结果中表示。


    查询示例

    如果它有所不同,或者您有其他关于改进查询的有用提示,下面是完整的查询。

    SELECT
      user_name,
      sum(hour_delta) AS hours,
      changeday,
      project_name
    FROM (
      SELECT DISTINCT
        users.id,
        users.name AS user_name
      FROM users
      INNER JOIN tasks AS tasks1
      ON users.id=tasks1.assignee_id
    ) tasked_users
    LEFT OUTER JOIN
    (
      SELECT
        (
          coalesce(cast(nullif(new_value,'') AS float),0) -
          coalesce(cast(nullif(old_value,'') AS float),0)
        ) AS hour_delta,
        task_id,
        tasks2.assignee_id AS assigned_log,
        fixin_id,
        projects.name AS project_name,
        date_trunc('day',task_log_entries.created_on) AS changeday
      FROM task_log_entries
      INNER JOIN tasks AS tasks2
      ON task_id=tasks2.id
      INNER JOIN fixins
      ON fixins.id=tasks2.fixin_id
      INNER JOIN projects
      ON projects.id=fixins.project_id
      WHERE field_id=18
    ) a
    ON tasked_users.id=a.assigned_log
    WHERE
      (changeday IS NULL)
      OR
      (changeday >= '2013-09-08' AND changeday <= '2013-09-08')
    GROUP BY user_name, changeday, a.project_name
    ORDER BY user_name, changeday, a.project_name;
    

    解释输出

    以下是结果 EXPLAIN 对于查询,以防有帮助(我不知道如何阅读并导出我需要的内容):

    GroupAggregate  (cost=1116.40..1116.99 rows=13 width=144)"}
      ->  Sort  (cost=1116.40..1116.43 rows=13 width=144)"}
            Sort Key: users.name, (date_trunc('day'::text, task_log_entries.created_on)), projects.name"}
            ->  Hash Left Join  (cost=1024.32..1116.16 rows=13 width=144)"}
                  Hash Cond: (users.id = tasks2.assignee_id)"}
                  Filter: ((date_trunc('day'::text, task_log_entries.created_on) IS NULL) OR ((date_trunc('day'::text, task_log_entries.created_on) >= '2013-09-08 00:00:00'::timestamp without time zone) AND (date_trunc('day'::text, task_log_entries.created_on) <= '2013-09-08 00:00:00'::timestamp without time zone)))"}
                  ->  HashAggregate  (cost=44.07..45.46 rows=139 width=12)"}
                        ->  Hash Join  (cost=5.13..40.09 rows=795 width=12)"}
                              Hash Cond: (tasks1.assignee_id = users.id)"}
                              ->  Seq Scan on tasks tasks1  (cost=0.00..24.01 rows=801 width=4)"}
                              ->  Hash  (cost=3.39..3.39 rows=139 width=12)"}
                                    ->  Seq Scan on users  (cost=0.00..3.39 rows=139 width=12)"}
                  ->  Hash  (cost=963.51..963.51 rows=1339 width=30)"}
                        ->  Hash Join  (cost=729.23..963.51 rows=1339 width=30)"}
                              Hash Cond: (fixins.project_id = projects.id)"}
                              ->  Hash Join  (cost=727.91..943.79 rows=1339 width=24)"}
                                    Hash Cond: (task_log_entries.task_id = tasks2.id)"}
                                    ->  Seq Scan on task_log_entries  (cost=0.00..197.46 rows=1339 width=20)"}
                                          Filter: (field_id = 18)"}
                                    ->  Hash  (cost=717.90..717.90 rows=801 width=12)"}
                                          ->  Hash Join  (cost=676.87..717.90 rows=801 width=12)"}
                                                Hash Cond: (tasks2.fixin_id = fixins.id)"}
                                                ->  Seq Scan on tasks tasks2  (cost=0.00..24.01 rows=801 width=12)"}
                                                ->  Hash  (cost=589.72..589.72 rows=6972 width=8)"}
                                                      ->  Seq Scan on fixins  (cost=0.00..589.72 rows=6972 width=8)"}
                              ->  Hash  (cost=1.14..1.14 rows=14 width=14)"}
                                    ->  Seq Scan on projects  (cost=0.00..1.14 rows=14 width=14)"}
    

    表定义

    以下是所有相关表格的说明。我没有修剪它们以删除任何“不相关”的列,所以您可以确定是否存在任何不明确的列名冲突。

    app=> \d task_log_entries
                                         Table "public.task_log_entries"
       Column   |            Type             |                           Modifiers
    ------------+-----------------------------+---------------------------------------------------------------
     id         | integer                     | not null default nextval('task_log_entries_id_seq'::regclass)
     task_id    | integer                     | not null
     user_id    | integer                     |
     field_id   | integer                     | not null
     created_on | timestamp without time zone | not null default now()
     new_value  | text                        |
     old_value  | text                        |
    Indexes:
        "task_log_entries_pkey" PRIMARY KEY, btree (id)
    Foreign-key constraints:
        "task_log_entries_field_id_fkey" FOREIGN KEY (field_id) REFERENCES log_fields(id)
        "task_log_entries_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
        "task_log_entries_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
    
    
    app=> \d tasks
                                            Table "public.tasks"
         Column     |            Type             |                      Modifiers
    ----------------+-----------------------------+-----------------------------------------------------
     id             | integer                     | not null default nextval('fixins_id_seq'::regclass)
     fixin_id       | integer                     | not null
     created_on     | timestamp without time zone | not null default now()
     updated_on     | timestamp without time zone | not null default now()
     name           | character varying(200)      | not null
     description    | text                        |
     blocked_by     | character varying(200)      |
     estimate       | double precision            |
     actual         | double precision            |
     remaining      | double precision            |
     relative_order | integer                     |
     status_id      | integer                     | not null
     assignee_id    | integer                     |
    Indexes:
        "tasks_pkey" PRIMARY KEY, btree (id)
    Foreign-key constraints:
        "tasks_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
        "tasks_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
        "tasks_status_id_fkey" FOREIGN KEY (status_id) REFERENCES task_statuses(id)
    Referenced by:
        TABLE "task_comments" CONSTRAINT "task_comments_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
        TABLE "task_log_entries" CONSTRAINT "task_log_entries_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
        TABLE "users_tasks_notifications" CONSTRAINT "users_tasks_notifications_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
    
    
    app=> \d fixins
                                           Table "public.fixins"
         Column     |            Type             |                      Modifiers
    ----------------+-----------------------------+-----------------------------------------------------
     id             | integer                     | not null default nextval('fixins_id_seq'::regclass)
     project_id     | integer                     | not null
     created_on     | timestamp without time zone | not null default now()
     updated_on     | timestamp without time zone | not null default now()
     name           | character varying(200)      | not null
     description    | text                        | not null
     status_id      | integer                     | not null
     reporter_id    | integer                     |
     assignee_id    | integer                     |
     priority_id    | integer                     | not null
     severity_id    | integer                     | not null
     likelihood_id  | integer                     | not null
     maturity       | integer                     | not null default 0
     version        | character varying(100)      |
     iteration_id   | integer                     |
     relative_order | integer                     |
     kind           | character varying(16)       | not null default 'Bug'::character varying
     specs          | character varying(50)       |
     estimate       | double precision            |
     blocked_by     | character varying(200)      |
     plan_estimate  | double precision            |
     actual         | double precision            |
     remaining      | double precision            |
     promise_date   | date                        |
    Indexes:
        "fixins_pkey" PRIMARY KEY, btree (id)
    Foreign-key constraints:
        "fixins_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
        "fixins_iteration_id_fkey" FOREIGN KEY (iteration_id) REFERENCES iterations(id) ON DELETE SET NULL
        "fixins_likelihood_id_fkey" FOREIGN KEY (likelihood_id) REFERENCES likelihoods(id)
        "fixins_priority_id_fkey" FOREIGN KEY (priority_id) REFERENCES priorities(id)
        "fixins_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
        "fixins_reporter_id_fkey" FOREIGN KEY (reporter_id) REFERENCES users(id) ON DELETE SET NULL
        "fixins_severity_id_fkey" FOREIGN KEY (severity_id) REFERENCES severities(id)
        "fixins_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)
    Referenced by:
        TABLE "bug_snapshots" CONSTRAINT "bug_snapshots_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE SET NULL
        TABLE "comments" CONSTRAINT "comments_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
        TABLE "customers_fixins" CONSTRAINT "customers_fixins_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id)
        TABLE "fixins_tags" CONSTRAINT "fixins_tags_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
        TABLE "log_entries" CONSTRAINT "log_entries_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
        TABLE "relationships" CONSTRAINT "relationships_fixin1_id_fkey" FOREIGN KEY (fixin1_id) REFERENCES fixins(id) ON DELETE CASCADE
        TABLE "relationships" CONSTRAINT "relationships_fixin2_id_fkey" FOREIGN KEY (fixin2_id) REFERENCES fixins(id) ON DELETE CASCADE
        TABLE "tasks" CONSTRAINT "tasks_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
        TABLE "users_notifications" CONSTRAINT "users_notifications_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
        TABLE "votes" CONSTRAINT "votes_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id)
    
    
    app=> \d projects
                                         Table "public.projects"
         Column     |          Type           |                       Modifiers
    ----------------+-------------------------+-------------------------------------------------------
     id             | integer                 | not null default nextval('projects_id_seq'::regclass)
     name           | character varying(50)   | not null
     link_name      | character varying(50)   | not null
     pain_threshold | integer                 | not null
     wiki_server    | character varying(100)  |
     wiki_wiki      | character varying(100)  |
     wiki_pattern   | character varying(1000) |
     active         | boolean                 | not null default true
    Indexes:
        "projects_pkey" PRIMARY KEY, btree (id)
        "projects_link_name_key" UNIQUE, btree (link_name)
    Referenced by:
        TABLE "fixins" CONSTRAINT "fixins_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
        TABLE "iterations" CONSTRAINT "iterations_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
        TABLE "project_preferences" CONSTRAINT "project_preferences_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
        TABLE "projects_users_notifications" CONSTRAINT "projects_users_notifications_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
        TABLE "releases" CONSTRAINT "releases_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
    
    
    app=> \d users
                                     Table "public.users"
      Column  |         Type          |                     Modifiers
    ----------+-----------------------+----------------------------------------------------
     id       | integer               | not null default nextval('users_id_seq'::regclass)
     name     | character varying(50) | not null
     email    | character varying(50) |
     active   | boolean               | not null default true
     passhash | character varying(40) |
     salt     | character varying(4)  |
    Indexes:
        "users_pkey" PRIMARY KEY, btree (id)
    Referenced by:
        TABLE "comments" CONSTRAINT "comments_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
        TABLE "fixins" CONSTRAINT "fixins_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
        TABLE "fixins" CONSTRAINT "fixins_reporter_id_fkey" FOREIGN KEY (reporter_id) REFERENCES users(id) ON DELETE SET NULL
        TABLE "log_entries" CONSTRAINT "log_entries_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
        TABLE "project_preferences" CONSTRAINT "project_preferences_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
        TABLE "projects_users_notifications" CONSTRAINT "projects_users_notifications_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
        TABLE "task_comments" CONSTRAINT "task_comments_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
        TABLE "task_log_entries" CONSTRAINT "task_log_entries_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
        TABLE "tasks" CONSTRAINT "tasks_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
        TABLE "users_notifications" CONSTRAINT "users_notifications_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
        TABLE "users_tasks_notifications" CONSTRAINT "users_tasks_notifications_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
    
    
    app=> \d log_fields
              Table "public.log_fields"
     Column |          Type          | Modifiers
    --------+------------------------+-----------
     id     | integer                | not null
     name   | character varying(200) | not null
    Indexes:
        "log_fields_pkey" PRIMARY KEY, btree (id)
    Referenced by:
        TABLE "log_entries" CONSTRAINT "log_entries_field_id_fkey" FOREIGN KEY (field_id) REFERENCES log_fields(id)
        TABLE "task_log_entries" CONSTRAINT "task_log_entries_field_id_fkey" FOREIGN KEY (field_id) REFERENCES log_fields(id)
    
    2 回复  |  直到 4 年前
        1
  •  4
  •   Erwin Brandstetter    6 年前

    查询可能可以简化为:

    SELECT u.name AS user_name
         , p.name AS project_name
         , tl.created_on::date AS changeday
         , coalesce(sum(nullif(new_value, '')::numeric), 0)
         - coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
    FROM   users             u
    LEFT   JOIN (
            tasks            t 
       JOIN fixins           f  ON  f.id = t.fixin_id
       JOIN projects         p  ON  p.id = f.project_id
       JOIN task_log_entries tl ON  tl.task_id = t.id
                               AND  tl.field_id = 18
                               AND (tl.created_on IS NULL OR
                                    tl.created_on >= '2013-09-08' AND
                                    tl.created_on <  '2013-09-09') -- upper border!
           ) ON t.assignee_id = u.id
    WHERE  EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
    GROUP  BY 1, 2, 3
    ORDER  BY 1, 2, 3;
    

    这将返回所有曾经执行过任何任务的用户。
    加号数据 每个项目和天 其中数据存在于中的指定日期范围内 task_log_entries .

    要点

    • 这个 aggregate function sum() 忽略 NULL 价值观 COALESCE() 每行 一旦您将计算重新计算为两个总和的差值,就不再需要:

       ,coalesce(sum(nullif(new_value, '')::numeric), 0) -
        coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
      

      然而 如果 有可能 全部的 所选内容的列具有 无效的 或空字符串,将总和包装成 COALESCE 一旦
      我正在使用 numeric 而不是 float ,更安全的替代方案,以最大限度地减少舍入误差。

    • 您试图从的联接中获得不同的值 users tasks 是徒劳的,因为你加入了 task 再往下走一次。将整个查询展平,使其更简单、更快。

    • 这些 positional references 只是一种符号上的方便:

      GROUP BY 1, 2, 3
      ORDER BY 1, 2, 3
      

      …执行与原始查询相同的操作。

    • 要获得 date 来自 timestamp 你可以简单地转换为 日期 :

      tl.created_on::date AS changeday
      

      但最好使用中的原始值进行测试 WHERE 条款或 JOIN 条件(如果可能,这里也是可能的),所以Postgres可以在列上使用普通索引(如果可用):

       AND (tl.created_on IS NULL OR
            tl.created_on >= '2013-09-08' AND
            tl.created_on <  '2013-09-09')  -- next day as excluded upper border
      

      请注意 日期文字 转换为 时间戳 00:00 当天的 at your current time zone 。你需要选择 下一个 天和 排除 它作为上边界。或者提供更明确的时间戳文字,如 '2013-09-22 0:0 +2':: timestamptz 。有关排除上边界的详细信息:

    • 对于要求 every user who has ever been assigned to a task 添加 哪里 条款:

      WHERE EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
      
    • 最重要的是 :A LEFT [OUTER] JOIN 保留联接左侧的所有行。添加 哪里 关于 正确的 表可以使这种效果无效。相反 将筛选器表达式移动到 加入 条款 更多解释如下:

    • 圆括号 可以用来强制表连接的顺序。简单查询很少需要,但在这种情况下非常有用。我使用该功能加入 任务 , fixins , projects 任务_日志条目 在离开之前加入所有 用户 -没有子查询。

    • Table aliases 使编写复杂查询变得更容易。

        2
  •  1
  •   nimdil    11 年前

    它不起作用,因为第一个查询是与任务内部连接的。同一个表用于执行外部联接(通过子查询,但仍然如此),但第一个查询(任务用户)首先没有相关记录(缺少匹配)。

    尝试使用

    ....
    FROM (
      SELECT DISTINCT
        users.id,
        users.name AS user_name
      FROM users    
    ) tasked_users
    ...