
Python: binning time-interval data into two-day chunks relative to an index event

pandas python

Economist_Ayahuasca · Tech Community · 1 year ago

I have the following data:

df = 
id date_medication medication index_date
1  2000-01-01      A          2000-01-04
1  2000-01-02      A          2000-01-04
1  2000-01-05      B          2000-01-04
1  2000-01-06      B          2000-01-04
2  2000-01-01      A          2000-01-05
2  2000-01-03      B          2000-01-05
2  2000-01-06      A          2000-01-05
2  2000-01-10      B          2000-01-05
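
To reproduce the answers, the table above can be rebuilt as a DataFrame roughly as follows. This is only a minimal sketch: the values are copied from the question, the dates are parsed with pd.to_datetime so that subtraction yields day counts, and offset_days is a name introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question; dates are parsed so that
# subtracting the two columns yields Timedelta values.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "date_medication": pd.to_datetime([
        "2000-01-01", "2000-01-02", "2000-01-05", "2000-01-06",
        "2000-01-01", "2000-01-03", "2000-01-06", "2000-01-10",
    ]),
    "medication": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "index_date": pd.to_datetime(["2000-01-04"] * 4 + ["2000-01-05"] * 4),
})

# Signed number of days between each medication and the index date
# (negative = before the index event, positive = after).
offset_days = (df["date_medication"] - df["index_date"]).dt.days
print(offset_days.tolist())  # [-3, -2, 1, 2, -4, -2, 1, 5]
```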

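I want to reshape the data into two-day chunks relative to the index event (IE), that is, create new columns that represent the time intervals. For example: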

df =
id -4 -2 0  2  4  6 
1  A  A  IE B  0  0
2  A  B  IE A  A  B

2 answers | as of 1 year ago

Corralien · 1 year ago

You can use:

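import numpy as np
import pandas as pd

# Compute the signed day offset from the index date
# (both columns must already be datetime64; convert with pd.to_datetime if not)
days = df['date_medication'].sub(df['index_date']).dt.days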

# Create desired bins and labels
bins = np.arange(days.min() - days.min() % 2, days.max() + days.max() % 2 + 1, 2)
lbls = bins[bins != 0]  # Exclude 0
df['interval'] = pd.cut(days, bins, labels=lbls, include_lowest=True, right=False)

# Reshape your dataframe
out = (df.pivot(index='id', columns='interval', values='medication')
         .reindex(bins, fill_value='IE', axis=1).fillna(0)
         .rename_axis(columns=None).reset_index())

Output:

>>> out
   id -4 -2   0  2  4  6
0   1  A  A  IE  B  B  0
1   2  A  B  IE  A  0  B
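
To see what the binning step does in isolation, here is a small sketch that applies pd.cut to the day offsets of the sample data. The offsets and bin edges are written out by hand (an assumption made for readability; the answer computes them from the data). With right=False each bin is closed on the left, so [-4, -2) gets the label -4, [0, 2) gets the label 2, and so on.

```python
import numpy as np
import pandas as pd

# Day offsets of the eight sample rows (medication date minus index date).
days = pd.Series([-3, -2, 1, 2, -4, -2, 1, 5])

# Two-day bin edges and one label per bin; 0 is skipped because that
# column is reserved for the index event itself.
bins = np.arange(-4, 7, 2)       # edges: -4, -2, 0, 2, 4, 6
labels = [-4, -2, 2, 4, 6]

interval = pd.cut(days, bins, labels=labels, right=False)
print(interval.astype(int).tolist())  # [-4, -2, 2, 4, -4, -2, 2, 6]

# A value equal to the last edge is outside every left-closed bin.
edge_case = pd.cut(pd.Series([6]), bins, labels=labels, right=False)
print(edge_case.isna().tolist())  # [True]
```

The second call suggests a caveat: if the largest offset in the data happens to be even, it coincides with the last edge and is left unbinned, so the upper bound of the bins may need to be extended by one more step in that case.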

jezrael · 1 year ago

Use:

import pandas as pd

# convert both date columns to datetimes
df['date_medication'] = pd.to_datetime(df['date_medication'])
df['index_date'] = pd.to_datetime(df['index_date'])

# floor the day offset to a multiple of 2 to get 2-day chunks
s = df['date_medication'].sub(df['index_date']).dt.days // 2 * 2
# shift chunks at or after the index date by 2 so that 0 stays free for the index event
s.loc[s.ge(0)] += 2

# pivot the chunk labels into columns
df1 = df.assign(g = s).pivot(index='id', columns='g', values='medication')
# add the 0 column for the index event
df1.loc[:, 0] = 'IE'
# reindex to every 2-day column in range and fill missing values with 0
df1 = (df1.rename_axis(columns=None)
         .reindex(columns=range(df1.columns.min(), df1.columns.max() + 2, 2), fill_value=0)
         .fillna(0)
         .reset_index())

Output:

>>> df1
   id -4 -2   0  2  4  6
0   1  A  A  IE  B  B  0
1   2  A  B  IE  A  0  B
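
Both answers reshape with DataFrame.pivot, which raises a ValueError when the same id has two rows in the same two-day chunk. The sample data has no such case, so the following is only a hedged sketch of one possible workaround: the toy frame, the offset column, and the choice to join colliding medications with a comma are all assumptions made for illustration, not part of either answer.

```python
import pandas as pd

# Hypothetical data: offsets -4 and -3 both fall into the chunk labelled -4.
df = pd.DataFrame({
    "id": [1, 1, 1],
    "offset": [-4, -3, 1],
    "medication": ["A", "B", "C"],
})

# Same labelling rule as the answer: floor to a multiple of 2,
# then shift non-negative chunks by 2.
g = df["offset"] // 2 * 2
g = g.where(g < 0, g + 2)

# pivot cannot place two values in one cell.
pivot_failed = False
try:
    df.assign(g=g).pivot(index="id", columns="g", values="medication")
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# Aggregating first makes each (id, chunk) pair unique before reshaping.
wide = (df.assign(g=g)
          .groupby(["id", "g"])["medication"].agg(",".join)
          .unstack())
print(wide)
```

The remaining steps (adding the 0 column, reindexing, filling with 0) would then proceed as in the answer.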

Details:

print(s)
0   -4
1   -2
2    2
3    4
4   -4
5   -2
6    2
7    6
dtype: int64
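
The labels above follow from how floor division treats negative numbers: // rounds toward negative infinity, so -3 // 2 is -2 and the label becomes -4. The same arithmetic can be checked in plain Python; two_day_label is a helper name introduced here purely for illustration.

```python
def two_day_label(offset_days: int) -> int:
    """Map a signed day offset to its 2-day chunk label, skipping 0."""
    # Floor to a multiple of 2 (floor division rounds toward negative infinity).
    label = offset_days // 2 * 2
    # Shift chunks at or after the index date so 0 stays free for the index event.
    if label >= 0:
        label += 2
    return label


offsets = [-3, -2, 1, 2, -4, -2, 1, 5]
labels = [two_day_label(d) for d in offsets]
print(labels)  # [-4, -2, 2, 4, -4, -2, 2, 6]
```

Under this rule a medication taken on the index date itself (offset 0) lands in the column labelled 2.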
