
How do I apply a function to a pandas DataFrame?

apply dataframe pandas python

Martin Bobak · 6 years ago

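I have a df of medical records, and I need to determine the first location a person goes to after being discharged. The df is grouped by ID.

There are three cases: (1) within a group, if any row's begin_date matches the first row's end_date, return that row's location as the first site (if two rows meet this condition, either one is correct); (2) if no row satisfies case 1, return the first location after the initial stay; (3) otherwise, if neither case 1 nor case 2 applies, return "home".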

ID    color  begin_date    end_date     location
1     red    2017-01-01    2017-01-07   initial
1     green  2017-01-05    2017-01-07   nursing
1     blue   2017-01-07    2017-01-15   rehab
1     red    2017-01-11    2017-01-22   Health
2     red    2017-02-22    2017-02-26   initial
2     green  2017-02-26    2017-02-28   nursing
2     blue   2017-02-26    2017-02-28   rehab
3     red    2017-03-11    2017-03-22   initial
4     red    2017-04-01    2017-04-07   initial
4     green  2017-04-05    2017-04-07   nursing
4     blue   2017-04-10    2017-04-15   Health

Expected result:

ID   first_site
1    rehab
2    nursing
3    home
4    nursing

My attempt fails. I get the error "None of [Int64Index([8], dtype='int64')] are in the [index]", and there is not much help online about this error. If I remove the elif condition on val2, I don't get the error.

def First(x):
   #compare each group first and see if there are any locations that match 
   val = x.loc[x['begin_date'] == x['end_date'].iloc[0], 'location']
   #find the first location after the initial stay
   val2 = x.loc[x[x.location=='initial'].index+1, 'location']
   if not val.empty:
       return val.iloc[0]
   elif not val2.empty:
       return val2.iloc[0]
   else:
       return 'Home'

final = df.groupby('ID').apply(First).reset_index(name='first_site')
print (final)

What am I doing wrong?

1 answer | 6 years ago

wwii · 6 years ago

The group with 'ID' == 3 has only one row, so the val2 expression tries to index a label that does not exist in that group.
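
The failure can be reproduced in isolation. The following is a minimal sketch, assuming a default integer index in which the single ID 3 row sits at label 7 (its position in the sample table); the `group` frame is a hypothetical stand-in for what `groupby` passes to `First`.

```python
import pandas as pd

# Hypothetical one-row group mirroring ID 3; label 7 is assumed from
# the row's position in the sample table (default integer index).
group = pd.DataFrame({"ID": [3], "location": ["initial"]}, index=[7])

# Same label arithmetic as val2: the label after the 'initial' row is 8,
# which belongs to another group and is absent from this one.
next_labels = group[group.location == "initial"].index + 1

try:
    group.loc[next_labels, "location"]
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

Because `.loc` looks up labels rather than positions, asking for label 8 inside a group that only contains label 7 raises `KeyError`.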

First check whether a group has only one row.

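def First(x):
    # Default covers single-row groups and groups where neither lookup matches
    return_value = 'Home'
    if len(x) > 1:
        # locations whose begin_date equals the first row's end_date
        val = x.loc[x['begin_date'] == x['end_date'].iloc[0], 'location']
        # location on the row after the initial stay (label-based lookup;
        # still raises KeyError if 'initial' is the last row of the group)
        val2 = x.loc[x[x.location=='initial'].index+1, 'location']
        if not val.empty:
            return_value = val.iloc[0]
        elif not val2.empty:
            return_value = val2.iloc[0]
    return return_value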

>>> gb = df.groupby('ID')
>>> gb.apply(First)
ID
1      rehab
2    nursing
3       Home
4    nursing
dtype: object
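
As an alternative to label arithmetic, the same three cases can be expressed with positional indexing. This is only a sketch, not the answer's method: it assumes the first row of each group is the initial stay (true for the sample data), and the names `first_site`, `later`, and `result` are hypothetical.

```python
import pandas as pd

# Sample data from the question, with dates parsed as datetimes.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4],
    "begin_date": pd.to_datetime([
        "2017-01-01", "2017-01-05", "2017-01-07", "2017-01-11",
        "2017-02-22", "2017-02-26", "2017-02-26",
        "2017-03-11",
        "2017-04-01", "2017-04-05", "2017-04-10"]),
    "end_date": pd.to_datetime([
        "2017-01-07", "2017-01-07", "2017-01-15", "2017-01-22",
        "2017-02-26", "2017-02-28", "2017-02-28",
        "2017-03-22",
        "2017-04-07", "2017-04-07", "2017-04-15"]),
    "location": ["initial", "nursing", "rehab", "Health",
                 "initial", "nursing", "rehab",
                 "initial",
                 "initial", "nursing", "Health"],
})

def first_site(group):
    """Return the first site after the initial stay, or 'Home'."""
    # Assumption: row 0 of the group is the initial stay.
    first_end = group["end_date"].iloc[0]
    later = group.iloc[1:]  # positional slice, never raises on short groups
    # Case 1: a later row begins on the day the initial stay ends.
    match = later.loc[later["begin_date"] == first_end, "location"]
    if not match.empty:
        return match.iloc[0]
    # Case 2: otherwise take the first row after the initial stay.
    if not later.empty:
        return later["location"].iloc[0]
    # Case 3: nothing follows the initial stay.
    return "Home"

# Iterate over the groups directly instead of calling apply.
result = {int(gid): first_site(g) for gid, g in df.groupby("ID")}
print(result)
```

Slicing with `iloc[1:]` yields an empty frame for a single-row group, so no separate length check is needed.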
