
Pandas: expected 10 fields in line 153, saw 11. How do I add one more column?

  •  1
  • William  · Tech Community  · 3 years ago

    B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
    B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
    B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
    B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000    00000100
    

    As you can see, the first 3 rows have 10 columns while the 4th row has 11, so when I read this file:

    import pandas as pd
    import numpy as np

    # Raw string so the backslashes in the Windows path are not read as escape sequences
    df = pd.read_csv(r'C:\Users\Petter\Desktop\info.txt', sep=r"\s+", header=None, dtype=str, engine="python")
    df
    

    I get this output, along with an error:

        0   1   2   3   4   5   6   7   8   9
    0   B   19960331    00100000    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
    1   B   19960430    00099100    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
    2   B   19960531    00098500    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
    
    Skipping line 4: Expected 10 fields in line 4, saw 11. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
    

    Ideally, it should automatically add one more column to the df. The output should look like this:

        0   1   2   3   4   5   6   7   8   9  10
    0   B   19960331    00100000    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
    1   B   19960430    00099100    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
    2   B   19960531    00098500    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
    

    I tried:

    df = pd.DataFrame(pd.np.empty((0, 11))) 
    

    But it doesn't work.

    2 answers  |  as of 3 years ago
        1
  •  2
  •   sitting_duck    3 years ago

    This works and may suit your needs:

    df = pd.read_csv(..., names=range(11))
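
To see the effect on the four sample rows from the question, here is a minimal, self-contained sketch. It reads from an in-memory string rather than the original file; the `SAMPLE` name and the use of `io.StringIO` are illustrative assumptions, not part of the answer. Because 11 column names are supplied, the parser accepts up to 11 fields per row and leaves the missing last field of the shorter rows as NaN.

```python
import io

import pandas as pd

# The four rows from the question: three with 10 fields, one with 11.
SAMPLE = """\
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000    00000100
"""

# names=range(11) declares 11 columns up front, so the 11-field row fits
# and the 10-field rows get NaN in column 10.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",
    header=None,
    names=range(11),
    dtype=str,  # keep the leading zeros
)

print(df.shape)
print(df[10])
```

This only helps when the maximum number of fields is known in advance; rows with more than 11 fields would still not fit.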
    


        2
  •  1
  •   Raja Wajahat    3 years ago

    You can use error_bad_lines=False to skip the malformed rows:

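    import pandas as pd
    import numpy as np

    # Raw string so the backslashes in the Windows path are not read as escape sequences.
    # error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; use on_bad_lines="skip" there.
    df = pd.read_csv(r"C:\Users\Petter\Desktop\info.txt", header=None, delimiter=r"\s+", error_bad_lines=False)
    df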
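
Note that skipping bad lines drops the 11-field row instead of adding a column, so it does not give the output the question asks for. The sketch below, run on the sample rows from the question, contrasts the two behaviours. It assumes pandas 1.3 or later, where `on_bad_lines="skip"` is the replacement for `error_bad_lines=False`. `SAMPLE` and the `read_ragged` helper are hypothetical names introduced only for illustration; the helper scans the text once to find the widest row and then passes that many column names.

```python
import io

import pandas as pd

# The four rows from the question: three with 10 fields, one with 11.
SAMPLE = """\
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000    00000100
"""

# Option 1: skip rows that have too many fields. The 11-field row is lost.
skipped = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",
    header=None,
    dtype=str,
    on_bad_lines="skip",
)


def read_ragged(text: str) -> pd.DataFrame:
    """Hypothetical helper: size the frame to the widest row, then parse."""
    # Count whitespace-separated fields on every non-empty line.
    widths = [len(line.split()) for line in text.splitlines() if line.strip()]
    return pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        header=None,
        names=range(max(widths)),
        dtype=str,  # keep the leading zeros
    )


# Option 2: keep every row; shorter rows get NaN in the extra column.
kept = read_ragged(SAMPLE)

print(skipped.shape)  # (3, 10)
print(kept.shape)     # (4, 11)
```

For a file on disk, the same width scan could be done by iterating over the opened file before calling `read_csv`, at the cost of reading the file twice.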