
Applying re to a pandas DataFrame

pandas regex python

user9092346 · 6 years ago

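!! The goal is to apply a working method to text stored in a pandas DataFrame !!

Suppose I have sentences like these:

"He invited 2 people and pet 3 dogs."

"She invited 3 friends and pet 1 cat."

For each sentence I want one variable holding how many people were invited and another holding how many animals were petted. For a single sentence this is easy to do with regex: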

import re

sentence = 'He invited 2 people and pet 3 dogs.'
human = [r'(\d+) people', r'(\d+) friend']

number = None
for h in human:
    match = re.search(h, sentence, re.IGNORECASE)
    if match is not None:
        # Keep the first hit and stop, so a later non-matching pattern cannot overwrite it.
        number = match.group(1)
        break

print('humans invited:', number)

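Now the sentences are in a pandas DataFrame, in a column named "sentence". The DataFrame also has a column named "humans" and a column named "pets". I want to process each sentence as shown above, write the human count into the "humans" column, and do the same for pets, writing that count into the "pets" column. However, I don't know how to apply this to the DataFrame row by row.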

2 replies | latest 6 years ago

Ben.T · 6 years ago

With pandas, you can use str.extract, for example:

import re
df['humans'] = df['sentence'].str.extract(r'(\d+) (?:people|friend)', flags=re.IGNORECASE, expand=False)

The same works for pets.
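As a minimal sketch of what "the same for pets" could look like: the pet pattern below assumes the animals are limited to the nouns that appear in the sample sentences (dog, cat, frog). That noun list is an assumption for illustration, not part of the answer above, and would need adjusting to the real data.

```python
import re
import pandas as pd

df = pd.DataFrame({'sentence': [
    'He invited 2 people and pet 3 dogs.',
    'She invited 3 friends and pet 1 cat.',
]})

# Human count: digits followed by "people" or "friend(s)".
df['humans'] = df['sentence'].str.extract(
    r'(\d+) (?:people|friend)', flags=re.IGNORECASE, expand=False)

# Pet count: digits followed by an assumed list of animal nouns.
df['pets'] = df['sentence'].str.extract(
    r'(\d+) (?:dog|cat|frog)', flags=re.IGNORECASE, expand=False)

print(df)
```

With a single capture group and expand=False, str.extract returns a Series, so each result can be assigned directly to one column. Rows with no match get NaN.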

ALollz · 6 years ago

If each sentence contains only two numbers and the humans count always comes before the pets count, you can get both at once:

df[['humans', 'pets']] = df.sentence.str.extract(r'(\d+).*?(\d+)', expand=True)

df is now:

                                          sentence humans    pets
0              He invited 2 people and pet 3 dogs.      2       3
1             She invited 3 friends and pet 1 cat.      3       1
2        She invited 13 friends and pet 145 frogs.     13     145
3  She invited 11243 friends and pet 141415 frogs.  11243  141415
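For reference, a self-contained sketch that rebuilds the frame shown above from the four sample sentences and runs the same extraction. Note that str.extract returns strings; the final conversion with pd.to_numeric is an optional extra step added here, not something the answer above includes.

```python
import pandas as pd

df = pd.DataFrame({'sentence': [
    'He invited 2 people and pet 3 dogs.',
    'She invited 3 friends and pet 1 cat.',
    'She invited 13 friends and pet 145 frogs.',
    'She invited 11243 friends and pet 141415 frogs.',
]})

# Two capture groups: the first number, then lazily skip to the next number.
df[['humans', 'pets']] = df['sentence'].str.extract(r'(\d+).*?(\d+)', expand=True)

# The extracted values are strings; convert them if arithmetic is needed.
for col in ['humans', 'pets']:
    df[col] = pd.to_numeric(df[col])

print(df)
```

This relies on position only, so a sentence with an extra number ahead of the counts (a date, for instance) would put the wrong values in the columns.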
