I have a DataFrame:
from io import StringIO
import pandas as pd

dfs = """
contract RB BeginDate ValIssueDate EndDate Valindex0
1 A00118 46 19000100 19880901 19841231 50
2 A00118 46 19850100 19880901 99999999 50
3 A00118 47 19000100 19880901 19831231 47
4 A00118 47 19840100 19880901 19841299 47
5 A00118 47 19850100 19880901 99999999 50
6 A00253 48 19000100 19820101 19811231 47
7 A00253 48 19820100 19820101 19841299 47
8 A00253 48 19850100 19820101 99999999 50
9 A00253 50 19000100 19820101 19781231 47
10 A00253 50 19790100 19820101 19841299 47
11 A00253 50 19850100 19820101 99999999 50
12 A00253 4L 20170101 19880901 99999999 39
"""
df = pd.read_csv(StringIO(dfs.strip()), sep=r'\s+',
                 dtype={"RB": str, "BeginDate": int, "EndDate": int, "ValIssueDate": int, "Valindex0": int})
contract RB BeginDate ValIssueDate EndDate Valindex0
1 A00118 46 19000100 19880901 19841231 50
2 A00118 46 19850100 19880901 99999999 50
3 A00118 47 19000100 19880901 19831231 47
4 A00118 47 19840100 19880901 19841299 47
5 A00118 47 19850100 19880901 99999999 50
6 A00253 48 19000100 19820101 19811231 47
7 A00253 48 19820100 19820101 19841299 47
8 A00253 48 19850100 19820101 99999999 50
9 A00253 50 19000100 19820101 19781231 47
10 A00253 50 19790100 19820101 19841299 47
11 A00253 50 19850100 19820101 99999999 50
12 A00253 4L 20170101 19880901 99999999 39
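I want to drop rows according to the following condition:
if a row has the same "contract" and "RB" as other rows, but its "ValIssueDate" is not between
"BeginDate" and "EndDate", then drop that row.
Note the last row: it has a unique RB, so it should not be dropped.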
# Rows whose ValIssueDate falls outside [BeginDate, EndDate]
index_names = df[(df['ValIssueDate'] < df['BeginDate']) | (df['ValIssueDate'] > df['EndDate'])].index
# Drop those rows from the DataFrame
df.drop(index_names, inplace=True)
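This approach only compares values within a single row. How can I compare across different rows to apply my condition? The expected result is: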
contract RB BeginDate ValIssueDate EndDate Valindex0
2 A00118 46 19850100 19880901 99999999 50
5 A00118 47 19850100 19880901 99999999 50
7 A00253 48 19820100 19820101 19841299 47
10 A00253 50 19790100 19820101 19841299 47
12 A00253 4L 20170101 19880901 99999999 39
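
One possible way to express the condition, offered as a sketch rather than a definitive answer: compute a per-row "in range" flag, compute the size of each ("contract", "RB") group, and keep a row if it is in range or if its group has only one row. The names `in_range`, `group_size` and `result` are made up for illustration, and the sketch assumes "unique" means the ("contract", "RB") pair appears exactly once.

```python
from io import StringIO

import pandas as pd

dfs = """
contract RB BeginDate ValIssueDate EndDate Valindex0
1 A00118 46 19000100 19880901 19841231 50
2 A00118 46 19850100 19880901 99999999 50
3 A00118 47 19000100 19880901 19831231 47
4 A00118 47 19840100 19880901 19841299 47
5 A00118 47 19850100 19880901 99999999 50
6 A00253 48 19000100 19820101 19811231 47
7 A00253 48 19820100 19820101 19841299 47
8 A00253 48 19850100 19820101 99999999 50
9 A00253 50 19000100 19820101 19781231 47
10 A00253 50 19790100 19820101 19841299 47
11 A00253 50 19850100 19820101 99999999 50
12 A00253 4L 20170101 19880901 99999999 39
"""
# The header has one field fewer than the data rows, so the first
# column is used as the index, as in the question.
df = pd.read_csv(StringIO(dfs.strip()), sep=r"\s+", dtype={"RB": str})

# True where ValIssueDate lies inside [BeginDate, EndDate] on that row.
in_range = df["ValIssueDate"].between(df["BeginDate"], df["EndDate"])

# Number of rows sharing the same (contract, RB) pair (assumed meaning of "unique").
group_size = df.groupby(["contract", "RB"])["RB"].transform("size")

# Keep in-range rows, plus any row whose (contract, RB) pair occurs only once.
result = df[in_range | (group_size == 1)]
print(result)
```

Building a boolean mask and indexing with it avoids mutating `df` in place, and `transform("size")` returns a Series aligned with the original index, so the two conditions can be combined directly with `|`.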