
SQL: How to find duplicates based on two fields?

  •  20
  • James Adams  · Tech Community  · 14 years ago

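    I have rows in an Oracle database table that should be unique for a combination of two fields, but the table has no unique constraint, so I need to find all rows that violate the constraint myself using SQL. Unfortunately, my meager SQL skills aren't up to the task.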

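    My table has three relevant columns: entity_id, station_id, and obs_year. For each row, the combination of station_id and obs_year should be unique, and I'd like to find out with a SQL query whether any rows violate this.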

    I tried the following SQL (suggested by this previous question), but it doesn't work for me (I get ORA-00918: column ambiguously defined):

    SELECT
    entity_id, station_id, obs_year
    FROM
    mytable t1
    INNER JOIN (
    SELECT entity_id, station_id, obs_year FROM mytable 
    GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes 
    ON 
    t1.station_id = dupes.station_id AND
    t1.obs_year = dupes.obs_year
    

    Can anyone suggest what I'm doing wrong and/or how to solve this?

    8 answers  |  until 7 years ago
        1
  •  41
  •   Quassnoi    14 years ago
    SELECT  *
    FROM    (
            SELECT  t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
            FROM    mytable t
            )
    WHERE   rn > 1
    
        2
  •  12
  •   Michael Pakhantsov    14 years ago
    SELECT entity_id, station_id, obs_year
    FROM mytable t1
    WHERE EXISTS (SELECT 1 from mytable t2 Where
           t1.station_id = t2.station_id
           AND t1.obs_year = t2.obs_year
           AND t1.RowId <> t2.RowId)
    
        3
  •  2
  •   Basic    14 years ago

    Change the three fields in the outer select to:

    SELECT
    t1.entity_id, t1.station_id, t1.obs_year
    
        4
  •  2
  •   FrustratedWithFormsDesigner    14 years ago

    Rewrite your query as:

    SELECT
    t1.entity_id, t1.station_id, t1.obs_year
    FROM
    mytable t1
    INNER JOIN (
    SELECT entity_id, station_id, obs_year FROM mytable 
    GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes 
    ON 
    t1.station_id = dupes.station_id AND
    t1.obs_year = dupes.obs_year
    

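    I think the ambiguous column error (ORA-00918) occurs because the select lists columns whose names appear in both the table and the subquery, without specifying whether you want them from dupes or from mytable (aliased as t1).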
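
    The point about unqualified column names can be reproduced in a small sketch. It uses an in-memory SQLite database as a stand-in for Oracle, so the error text differs from ORA-00918, and the sample rows are invented for illustration. The subquery here groups only on station_id and obs_year, which is an assumption about what the asker intended.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (2, 10, 2000), (3, 10, 2001), (4, 20, 2000)],
)

# Shared FROM/JOIN part: dupes holds every (station_id, obs_year) pair seen more than once.
JOIN_TAIL = """
FROM mytable t1
INNER JOIN (
    SELECT station_id, obs_year FROM mytable
    GROUP BY station_id, obs_year HAVING COUNT(*) > 1) dupes
ON t1.station_id = dupes.station_id AND t1.obs_year = dupes.obs_year
"""

# Unqualified station_id exists in both t1 and dupes, so the engine cannot resolve it.
try:
    conn.execute("SELECT entity_id, station_id, obs_year" + JOIN_TAIL)
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Qualifying every selected column with the t1 alias removes the ambiguity.
fixed_rows = conn.execute(
    "SELECT t1.entity_id, t1.station_id, t1.obs_year"
    + JOIN_TAIL
    + "ORDER BY t1.entity_id"
).fetchall()

print(ambiguous_error)
print(fixed_rows)
```

    Only the select list changes between the two attempts; the join condition was already qualified in the original query.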

        5
  •  1
  •   fredley    14 years ago

        6
  •  1
  •   user359040    14 years ago

    SELECT t1.entity_id, t1.station_id, t1.obs_year
    FROM mytable t1
    INNER JOIN (
    SELECT station_id, obs_year FROM mytable 
    GROUP BY station_id, obs_year HAVING COUNT(*) > 1) dupes 
    ON 
    t1.station_id = dupes.station_id AND
    t1.obs_year = dupes.obs_year
    
        7
  •  0
  •   Vinay Bagale    11 years ago
    SELECT  *
    FROM    (
            SELECT  t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
            FROM    mytable t
            )
    WHERE   rn > 1
    

    The query above, by Quassnoi, is the most efficient for large tables. I compared the costs:

    SELECT a.dist_code, a.book_date, a.book_no
    FROM trn_refil_book a
    WHERE EXISTS (SELECT 1 from trn_refil_book b Where
           a.dist_code = b.dist_code and a.book_date = b.book_date and a.book_no = b.book_no
           AND a.RowId <> b.RowId)
           ;
    

    had a cost of 1322341

    SELECT a.dist_code, a.book_date, a.book_no
    FROM trn_refil_book a
    INNER JOIN (
    SELECT b.dist_code, b.book_date, b.book_no FROM trn_refil_book b 
    GROUP BY b.dist_code, b.book_date, b.book_no HAVING COUNT(*) > 1) c 
    ON 
     a.dist_code = c.dist_code and a.book_date = c.book_date and a.book_no = c.book_no
    ;
    

    had a cost of 1271699

    while

    SELECT  dist_code, book_date, book_no
    FROM    (
            SELECT  t.dist_code, t.book_date, t.book_no, ROW_NUMBER() OVER (PARTITION BY t.book_date, t.book_no
              ORDER BY t.dist_code) AS rn
            FROM    trn_refil_book t
            ) p
    WHERE   p.rn > 1
    ;
    

    had the lowest cost.
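
    When comparing these approaches, note that they do not return the same set of rows: the ROW_NUMBER() form returns only the extra rows in each duplicate group, while the EXISTS form returns every row in the group. A minimal sketch of that difference, assuming an in-memory SQLite database (3.25 or later, for window functions) in place of Oracle, SQLite's rowid in place of Oracle's RowId, and invented sample rows:

```python
import sqlite3

# In-memory SQLite stands in for Oracle; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (2, 10, 2000), (3, 10, 2000), (4, 20, 2001)],
)

# ROW_NUMBER() numbers the rows inside each (station_id, obs_year) group;
# rn > 1 keeps everything except the first row of each group.
row_number_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT entity_id
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY station_id, obs_year ORDER BY entity_id
                   ) AS rn
            FROM mytable t
        )
        WHERE rn > 1
        ORDER BY entity_id
        """
    )
]

# EXISTS keeps every row that has at least one other row with the same key pair.
exists_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT t1.entity_id
        FROM mytable t1
        WHERE EXISTS (
            SELECT 1 FROM mytable t2
            WHERE t1.station_id = t2.station_id
              AND t1.obs_year = t2.obs_year
              AND t1.rowid <> t2.rowid
        )
        ORDER BY t1.entity_id
        """
    )
]

print(row_number_ids)
print(exists_ids)
```

    Which result is wanted depends on the goal: the first list is convenient when the extra rows are to be deleted, the second when every conflicting row should be reviewed.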

        8
  •  0
  •   grokster    10 years ago
      SELECT entity_id, station_id, obs_year
        FROM mytable
    GROUP BY entity_id, station_id, obs_year
    HAVING COUNT(*) > 1
    

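    GROUP BY finds any rows that match other rows on the specified columns. The HAVING COUNT(*) > 1 says that we only want to see rows that occur more than once (and are therefore duplicates).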
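
    Which columns go into the GROUP BY decides what counts as a duplicate. The sketch below assumes that entity_id differs on every row (the question does not say so explicitly) and uses an in-memory SQLite database with invented rows in place of the Oracle table. Under that assumption, grouping on all three columns finds nothing, while grouping on only station_id and obs_year finds the repeated pairs.

```python
import sqlite3

# In-memory SQLite stands in for Oracle; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 10, 2000),
        (2, 10, 2000),
        (3, 10, 2001),
        (4, 20, 2000),
        (5, 20, 2000),
        (6, 20, 2000),
    ],
)

# Grouping on all three columns: every group has exactly one row,
# because entity_id is different on each row in this sample.
by_three = conn.execute(
    """
    SELECT entity_id, station_id, obs_year
    FROM mytable
    GROUP BY entity_id, station_id, obs_year
    HAVING COUNT(*) > 1
    """
).fetchall()

# Grouping on the two columns that should be unique together,
# with the number of rows sharing each pair.
by_two = conn.execute(
    """
    SELECT station_id, obs_year, COUNT(*) AS n
    FROM mytable
    GROUP BY station_id, obs_year
    HAVING COUNT(*) > 1
    ORDER BY station_id, obs_year
    """
).fetchall()

print(by_three)
print(by_two)
```

    The two-column form reports each offending pair once; joining it back to the table, as in the earlier answers, recovers the individual rows.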

        9
  •  0
  •   Ben Petersen    5 years ago

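    I think many of the solutions here are cumbersome and hard to follow. I had a three-column primary key constraint and needed to find the duplicates, so here is another option: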

    SELECT id, name, value, COUNT(*) FROM db_name.table_name
    GROUP BY id, name, value
    HAVING COUNT(*) > 1