
How can I sum the values of duplicate rows with awk?

awk
  •  3
  • Joe  · 6 years ago

    I have a CSV file with 11 columns, like this:

    Order Date,Username,Order Number,No Resi,Quantity,Title,Update Date,Status,Price Per Item,Status Tracking,Alamat
    05 Jun 2018,Mildred@email.com,205583995140400,,2,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Syahrul Address
    05 Jun 2018,Mildred@email.com,205583995140400,,1,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Syahrul Address
    05 Jun 2018,Martha@email.com,205486016644400,,2,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Faishal  Address
    05 Jun 2018,Martha@email.com,205486016644400,,2,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Faishal  Address
    05 Jun 2018,Misty@email.com,205588935534900,,2,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Rutwan Address
    05 Jun 2018,Misty@email.com,205588935534900,,1,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Rutwan Address
    

    I want to remove the duplicate rows in this file and sum their Quantity column. I want the result to look like this:

    Order Date,Username,Order Number,No Resi,Quantity,Title,Update Date,Status,Price Per Item,Status Tracking,Alamat
    05 Jun 2018,Mildred@email.com,205583995140400,,3,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Syahrul Address
    05 Jun 2018,Martha@email.com,205486016644400,,4,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Faishal  Address
    05 Jun 2018,Misty@email.com,205588935534900,,3,Gold,05 Jun 2018 – 10:01,In Process,Rp3.000.000,Done,Rutwan Address
    

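    I only want to sum the Quantity column and leave the other columns as they are. I tried the solution from this question, but that answer only works when the file has just two columns. Mine has 11, so it doesn't work. How can I do this with awk?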

    2 answers  |  last activity 6 years ago
        1
  •  1
  •   RavinderSingh13 Nikita Bakshi    6 years ago

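    Adapted directly from karakfa's solution, with some code added so that the lines are printed in the same order as they appear in the input file, as the OP requested.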

    awk -F, '
    FNR==1{
      print;
      next}
    {
      val=$5;
      $5="~";
      a[$0]+=val
    }
    !b[$0]++{
      c[++count]=$0}
    END{
      for(i=1;i<=count;i++){
         sub("~",a[c[i]],c[i]);
         print c[i]}
    }' OFS=,   Input_file
    

    Explanation: the same code as above, with a comment on each line.

    awk -F, '                         ##Set the input field separator to a comma.
    FNR==1{                           ##If this is the first line (the header), do the following.
      print;                          ##Print the header line unchanged.
      next}                           ##Skip the remaining rules for this line.
    {
      val=$5;                         ##Save the 5th field (Quantity) of the current line in val.
      $5="~";                         ##Replace the 5th field with ~ so that duplicate lines become identical (this rebuilds $0 using OFS).
      a[$0]+=val                      ##Add val to the running total in array a, indexed by the modified line.
    }
    !b[$0]++{                         ##If this modified line has not been seen before, do the following.
      c[++count]=$0}                  ##Store the line in array c under the next value of count, which records input order.
    END{                              ##Start of the END block, which runs after all input is read.
      for(i=1;i<=count;i++){          ##Loop from 1 up to the value of count.
         sub("~",a[c[i]],c[i]);       ##Replace ~ in the stored line with the summed value of $5.
         print c[i]}                  ##Print the stored line, which now carries the summed quantity.
    }' OFS=, Input_file               ##Set OFS to a comma and name the input file.
    
        2
  •  5
  •   karakfa    6 years ago

    awk to the rescue!

    $ awk 'BEGIN{FS=OFS=","} 
           NR==1{print; next} 
                {q=$5; $5="~"; a[$0]+=q} 
           END  {for(k in a) {sub("~",a[k],k); print k}}' file
    
    Order Date,Username,Order Number,No Resi,Quantity,Title,Update Date,Status,Price Per Item,Status Tracking,Alamat
    05 Jun 2018,Misty@email.com,205588935534900,,3,Gold,05 Jun 2018 - 10:01,In Process,Rp3.000.000,Done,Rutwan Address
    05 Jun 2018,Martha@email.com,205486016644400,,4,Gold,05 Jun 2018 - 10:01,In Process,Rp3.000.000,Done,Faishal  Address
    05 Jun 2018,Mildred@email.com,205583995140400,,3,Gold,05 Jun 2018 - 10:01,In Process,Rp3.000.000,Done,Syahrul Address
    

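    Note that the order of the records is not guaranteed, because for(k in a) visits array keys in an unspecified order. On the other hand, the input does not need to be sorted beforehand. There are several ways to preserve the order...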
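    One of those ways, sketched here under assumptions: record the line number at which each key first appears, prefix every output line with that number, then let sort and cut restore the input order. The file name orders.csv, its shortened six-column sample data, and the array name first are hypothetical and only serve the illustration.

```shell
# Hypothetical sample file; the name and contents are assumptions.
cat > orders.csv <<'EOF'
Date,User,Order,Resi,Quantity,Title
05 Jun 2018,a@email.com,100,,2,Gold
05 Jun 2018,a@email.com,100,,1,Gold
05 Jun 2018,c@email.com,200,,4,Gold
EOF

# Prefix each output line with the line number of its first appearance,
# sort numerically on that prefix, then cut the prefix off again.
result=$(awk 'BEGIN{FS=OFS=","}
     NR==1{print NR "\t" $0; next}
          {q=$5; $5="~"; if(!($0 in first)) first[$0]=NR; a[$0]+=q}
     END  {for(k in a) {line=k; sub("~",a[k],line); print first[k] "\t" line}}' orders.csv |
  sort -n | cut -f2-)
printf '%s\n' "$result"
```

    This keeps the awk part close to the unordered version and pushes the ordering into standard tools, at the cost of an extra sort pass.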

    Also, I'm using ~ as a placeholder. If your data contains this character, replace it with a character that does not occur in the data.
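    As a rough illustration of swapping the placeholder, the sketch below uses the control character \001 instead of ~, assuming it never occurs in the data. The file orders.csv and its contents are made up for the example, and the address a~b@email.com is there on purpose to show a value that would collide with the ~ placeholder. The arrays b and c simply remember the first-seen order of each line so the output is deterministic.

```shell
# Hypothetical sample file; note the ~ inside the e-mail address.
cat > orders.csv <<'EOF'
Date,User,Order,Resi,Quantity,Title
05 Jun 2018,a~b@email.com,100,,2,Gold
05 Jun 2018,a~b@email.com,100,,1,Gold
05 Jun 2018,c@email.com,200,,4,Gold
EOF

# Same approach, but "\001" (assumed absent from the data) marks the
# summed column, so the ~ in the address is left alone.
result=$(awk 'BEGIN{FS=OFS=","}
     NR==1{print; next}
          {q=$5; $5="\001"; if(!($0 in a)) b[++c]=$0; a[$0]+=q}
     END  {for(k=1;k<=c;k++) {sub("\001",a[b[k]],b[k]); print b[k]}}' orders.csv)
printf '%s\n' "$result"
```

    With ~ as the placeholder, sub would have replaced the first ~ it found, which in this sample sits in the Username column rather than in Quantity.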

    UPDATE

    To preserve the order (based on the first appearance of each row):

    $ awk 'BEGIN{FS=OFS=","} 
           NR==1{print; next} 
                {q=$5;$5="~"; if(!($0 in a)) b[++c]=$0; a[$0]+=q} 
           END  {for(k=1;k<=c;k++) {sub("~",a[b[k]],b[k]); print b[k]}}' file
    

    That is, keep a separate structure that records the order of the rows, and iterate over that structure instead...
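    The same two-array idea can be sketched with the summed column passed in as a variable rather than hard-coded as $5. The variable name col, the file orders.csv and its shortened sample data are assumptions made for this example, not part of the original answer.

```shell
# Hypothetical sample file; the name and contents are assumptions.
cat > orders.csv <<'EOF'
Date,User,Order,Resi,Quantity,Title
05 Jun 2018,a@email.com,100,,2,Gold
05 Jun 2018,a@email.com,100,,1,Gold
05 Jun 2018,c@email.com,200,,4,Gold
EOF

# col is a hypothetical variable naming the column to sum (5 = Quantity here).
# a holds the running totals; b records each distinct line in first-seen order.
result=$(awk -v col=5 'BEGIN{FS=OFS=","}
     NR==1{print; next}
          {q=$col; $col="~"; if(!($0 in a)) b[++c]=$0; a[$0]+=q}
     END  {for(k=1;k<=c;k++) {sub("~",a[b[k]],b[k]); print b[k]}}' orders.csv)
printf '%s\n' "$result"
```

    Iterating over b with a numeric index, instead of for(k in a), is what keeps the rows in input order.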