B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000 00000100
As you can see, the first 3 rows have 10 columns while the 4th row has 11, so when I read this file:
import pandas as pd
import numpy as np
df = pd.read_csv(r'C:\Users\Petter\Desktop\info.txt', sep=r"\s+", header=None, dtype=str, engine="python")
df
I get this, along with an error:
0 1 2 3 4 5 6 7 8 9
0 B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
1 B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
2 B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
Skipping line 4: Expected 10 fields in line 4, saw 11. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
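The message suggests pandas settled on 10 columns from the first line and then rejected the longer 4th line. One quick way to confirm the row widths is to count the whitespace-separated fields on each line. This is a minimal sketch that assumes the four sample lines above are the whole file; they are embedded as a string (the name `DATA` is just illustrative) so the snippet runs on its own.

```python
# The four sample lines from the question, embedded so the sketch is self-contained.
DATA = """\
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000 00000100
"""

# Count whitespace-separated fields on every non-blank line.
field_counts = [len(line.split()) for line in DATA.splitlines() if line.strip()]
print(field_counts)       # [10, 10, 10, 11]
print(max(field_counts))  # 11
```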
Ideally, pandas should automatically add an extra column to the DataFrame. The output should look like this:
0 1 2 3 4 5 6 7 8 9 10
0 B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
1 B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
2 B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
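A possible approach, sketched here rather than offered as the definitive fix: `read_csv` accepts a `names` argument, and when it is given, rows with fewer fields than `names` are padded with NaN instead of being rejected. The sketch below first finds the widest row, then passes `names=range(n_cols)` so the column labels stay 0, 1, 2, … as in the desired output. The sample data is embedded as a string for self-containment; for the real file you would pass the raw-string path `r'C:\Users\Petter\Desktop\info.txt'` instead of `io.StringIO(DATA)` and count the fields by iterating over the opened file.

```python
import io

import pandas as pd

# The four sample lines from the question, embedded so the sketch is self-contained.
DATA = """\
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000 00000100
"""

# First pass: find the widest row so every field gets its own column.
n_cols = max(len(line.split()) for line in DATA.splitlines() if line.strip())

# Second pass: `names` tells pandas how many columns to expect;
# shorter rows are padded with NaN in the trailing columns.
df = pd.read_csv(
    io.StringIO(DATA),
    sep=r"\s+",
    header=None,
    names=range(n_cols),
    dtype=str,  # keep leading zeros such as "00000100"
)
print(df.shape)  # (4, 11)
```

Reading twice costs an extra pass over the file. If the maximum width is known in advance, `names=range(11)` alone is enough.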
I tried:
df = pd.DataFrame(np.empty((0, 11)))
but it doesn't work.
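Creating an empty 11-column frame on its own does not change how `read_csv` parses the file, so the column count has to be handled while parsing. As an alternative sketch (a swapped-in technique, not what the question used), you can split each line yourself and hand the resulting ragged list of lists to the `DataFrame` constructor, which pads shorter rows with missing values. The embedded `DATA` string again stands in for the file contents.

```python
import pandas as pd

# The four sample lines from the question, embedded so the sketch is self-contained.
DATA = """\
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000 00000100
"""

# Split every non-blank line on whitespace; the rows have different lengths.
rows = [line.split() for line in DATA.splitlines() if line.strip()]

# The constructor sizes the frame to the longest row and fills the gaps
# in shorter rows with missing values (None for string data).
df = pd.DataFrame(rows)
print(df.shape)  # (4, 11)
```

Because nothing is type-converted, every value stays a string, which matches the `dtype=str` used in the original `read_csv` call.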