
How to find the differences between a CSV file and a file containing only one column of that CSV

  •  0
  • Chris  · Tech community  · 14 years ago

    I have a CSV file containing some user data, like this:

    "10333","","an.10","Kenyata","","Aaron","","","","","","","","","",""
    "12222","","an.4","Wendy","","Aaron","","","","","","","","","",""
    "14343","","aaron.5","Nanci","","Aaron","","","","","","","","","",""
    

    I also have a file with one item per line, like this:

    an.10
    aaron.5
    

    I only want the rows of the CSV file whose entries appear in the list file.

    So the desired output is:

    "10333","","an.10","Kenyata","","Aaron","","","","","","","","","",""
    "14343","","aaron.5","Nanci","","Aaron","","","","","","","","","",""
    

    (Note that an.4 is not included in this new list.)

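    Any environment is available to me, and I am willing to try anything other than doing it by hand, because the CSV contains millions of records and the list has about 100,000 entries.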

    2 answers  |  as of 14 years ago
        1
  •  1
  •   relet    14 years ago

    How unique are identifiers such as an.10?

    Perhaps a very small *nix shell script would be enough:

    # sort -u: de-duplicate the list; grep -F: treat each entry as a literal string, not a regex
    for i in $(sort -u list.txt); do grep -F "\"$i\"" data.csv; done
    

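    For each unique entry in the list, this returns all matching lines in the CSV file. However, it does not restrict the match to the identifier column (the third field in the sample data). (That could be done with awk, for example.)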
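
    The awk variant hinted at above could look roughly like this. It is only a sketch under a few assumptions: every field is double-quoted as in the sample, no field contains the sequence "," itself, and the identifier is always the third field. The function name filter_by_list is made up for illustration.

```shell
# filter_by_list LIST CSV
# Print the CSV rows whose third field is exactly equal to one of the LIST entries.
# Usage (file names taken from the answers): filter_by_list list.txt data.csv > matches.csv
filter_by_list() {
    awk -F'","' '
        # First file (the list): remember every non-empty identifier.
        NR == FNR { if ($0 != "") wanted[$0] = 1; next }
        # Second file (the CSV): splitting on "," leaves the bare identifier in field 3.
        $3 in wanted
    ' "$1" "$2"
}
```

    Because the list is loaded into an array once and the CSV is read once, this avoids starting a separate grep for each of the roughly 100,000 entries, and an entry such as an.1 would not match a row whose identifier is an.10.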

        2
  •  1
  •   Joy Dutta    14 years ago

    If the CSV file is data.csv and the list file is list.txt, I would do this:

    for i in $(cat list.txt); do grep -F -e "$i" data.csv; done
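
    With about 100,000 list entries and millions of CSV rows, the loop above starts grep once per entry and rescans the whole file each time. A possible single-pass variant is sketched below, assuming a grep that supports -F and -f and a system that provides mktemp. The function name filter_with_grep is hypothetical.

```shell
# filter_with_grep LIST CSV
# Print the CSV rows that contain any LIST entry as a complete double-quoted field.
# Usage (file names taken from the answer): filter_with_grep list.txt data.csv > matches.csv
filter_with_grep() {
    patterns=$(mktemp) || return 1
    # Wrap every non-empty entry in double quotes, so an.1 cannot match "an.10".
    sed -n '/./s/.*/"&"/p' "$1" > "$patterns"
    # -F: patterns are literal strings; -f: read all patterns from the file at once.
    grep -F -f "$patterns" "$2"
    status=$?
    rm -f "$patterns"
    return "$status"
}
```

    Like the loop, this matches the quoted value in any column rather than in one specific column, so it only fits when the identifiers cannot appear as values elsewhere in a row.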