我试图只提取从第三行开始的两个部分(“入学”和“重新入学”)中的数据;a的第2列
关键词
在第1列中。
请参阅底部csv文件中的数据集示例。
Admissions
Not Started:12 Sent:3 Completed:3
Division Community ResidentName Date DocumentStatus Last Update
Test Station Jane Doe 9/12/2023 Sent 9/12/2023
Test Station 2 John Doe 9/12/2023 Not Started
Alibaba Fizgerald Super Man 9/12/2023 Not Started
Iceland Kingdom Super Woman 9/12/2023 Not Started
,,,,,
Readmissions
Not Started:1 Sent:0 Completed:1
Division Community Resident Name Date DocumentStatus Last Update
Station Kingdom Pretty Woman 9/12/2023 Not Started
My Goodness Ugly Man 7/21/2023 Completed 7/26/2023
,,,,
Discharge
Division Community Resident Name Date
Station Kingdom1 Pretty Woman2 8/22/2023
My Goodness1 Ugly Man1 4/8/2023
Landmark2 Nice Guys 9/12/2023
Iceland Kingdom2 Mr. Heroshi2 7/14/2023
More Kingdom 2 King Kong 8/31/2023
因此,逻辑是:
查找包含数据的行,其中column1=
'招生'
或
'重新任务'
。
往下走3行,向右走1列。
获取5列的所有数据,直到它碰到一个没有数据的行。
因此,我想获得蓝框部分作为每个的输出:
这是一幅插图:
我从
previous post
,但我有几个星期没能解决这个问题/找到解决方案,所以我发布了这个新消息。
import re
import pandas as pd
from io import StringIO
def read_block(names, igidx=True):
with open("Test1.csv") as f:
pat = r"(\w+),+$\n+(.+?)(?=\n\w+,+\n$|\Z)"
return pd.concat([
pd.read_csv(StringIO(m.group(2)), skipinitialspace=True)
.iloc[:, 1:].dropna(how="all") for m in re.finditer(
pat, f.read(), flags=re.M|re.S) if m.group(1) in names # optional
], keys=names, ignore_index=igidx)
df = read_block(names=["Admissions"])
print(df)
底部是电流输出,但它连接了所有三个部分。
我希望使用关键字(“
招生
“,”
重新分配
“)作为代码中的变量。
Sent: 3 Completed: 3 Unnamed: 3 Unnamed: 4 Unnamed: 5
0 Community Resident Name Date Document Status Last Update
1 Test Station Jane Doe 9/12/2023 Sent 9/12/2023
2 Test Station 2 John Doe 9/12/2023 Not Started NaN
3 Alibaba Fizgerald Super Man 9/12/2023 Not Started NaN
4 Iceland Kingdom Super Woman 9/12/2023 Not Started NaN
5 Sent: 0 Completed: 1 NaN NaN NaN
6 Community Resident Name Date Document Status Last Update
7 Station Kingdom Pretty Woman 9/12/2023 Not Started NaN
8 My Goodness Ugly Man 7/21/2023 Completed 7/26/2023
9 Community Resident Name Date NaN NaN
10 Station Kingdom1 Pretty Woman2 8/22/2023 NaN NaN
11 My Goodness1 Ugly Man1 4/8/2023 NaN NaN
12 Landmark2 Nice Guys 9/12/2023 NaN NaN
13 Iceland Kingdom2 Mr. Heroshi2 7/14/2023 NaN NaN
14 More Kingdom 2 King Kong 8/31/2023 NaN NaN
如何修改当前代码使其工作?