
Extract only the numbers followed by + from a text column and return them in a new column [R]

grepl dataframe r

Neal Barsch · Tech Community · 10 months ago

I have data like this:

sample_data <- data.frame(
  txtnumbers = c("text stuff +300.5", "other stuff 40+ more stuff", "text here -30 here too", "30- text here", "50+", "stuff here 500+", "400.5-"),
  stringsAsFactors = FALSE
)

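I want to extract the numbers that are immediately followed by a + sign and insert them into a new column, ignoring the rest of the text and returning NA wherever no number is followed by a +: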

desired_data <- data.frame(
  txtnumbers = c("text stuff +300.5", "other stuff 40+ more stuff", "text here -30 here too", "30- text here", "50+", "stuff here 500+", "400.5-"),
  desired_col = c(NA, 40, NA, NA, 50, 500, NA),
  stringsAsFactors = FALSE
)

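Can someone help me write a function that does this? I can parse the numbers with parse_numeric, but returning only the numbers followed by a + is giving me trouble. Thanks.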

1 reply | as of 10 months ago

Ronak Shah · 10 months ago

Here is one option using stringr::str_extract (the group argument requires stringr 1.5.0 or later):

stringr::str_extract(sample_data$txtnumbers, "(\\d+)\\+", group = 1)
#[1] NA    "40"  NA    NA    "50"  "500" NA

At this point the values are extracted as strings. You can wrap the call in as.integer to turn them into numbers.
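
For completeness, here is a minimal base R sketch of the same idea that also fills the new column. It is not part of the answer above: the helper name extract_plus_number is hypothetical, and the pattern is an assumption about what should count as a number (digits with an optional decimal part). The lookahead keeps a value such as "40.5+" whole, whereas the pattern "(\\d+)\\+" would capture only "5" from it.

```r
# Hypothetical helper (not from the original answer): return the number that is
# immediately followed by "+" in each string, or NA when there is none.
extract_plus_number <- function(x) {
  # Digits with an optional decimal part, matched only when "+" comes right after.
  pattern <- "\\d+(?:\\.\\d+)?(?=\\+)"
  hits <- regexpr(pattern, x, perl = TRUE)

  # regmatches() returns only the matched elements, so place them back by position.
  matched <- !is.na(hits) & hits != -1
  out <- rep(NA_character_, length(x))
  out[matched] <- regmatches(x, hits)

  as.numeric(out)
}

sample_data <- data.frame(
  txtnumbers = c("text stuff +300.5", "other stuff 40+ more stuff",
                 "text here -30 here too", "30- text here", "50+",
                 "stuff here 500+", "400.5-"),
  stringsAsFactors = FALSE
)

# Add the extracted values as a new numeric column.
sample_data$desired_col <- extract_plus_number(sample_data$txtnumbers)
```

Only the first qualifying number in each string is returned. as.numeric is used instead of as.integer so that decimal values are not truncated.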
