
How to split a text file with multiple spaces into a data frame in R

dataframe r

user3353820 · 6 years ago

I have a text file containing the following characters (some of them are separated by more than one space):

a b c d  e     f     g  A B C D  E    F    G

I want to split them into a two-row data frame, like this:

  H1 H2 H3 H4 H5 H6 H7
1 a  b  c  d  e  f  g
2 A  B  C  D  E  F  G

Does anyone know how to do this?

3 answers

Maurits Evers · 6 years ago

You can use strsplit and then reshape the result into a two-row matrix (optionally converting it to a data.frame):

ss <- c("a b c d  e     f     g  A B C D  E    F    G")

as.data.frame(matrix(unlist(strsplit(ss, "\\s+")), nrow = 2, byrow = T))
#  V1 V2 V3 V4 V5 V6 V7
#1  a  b  c  d  e  f  g
#2  A  B  C  D  E  F  G

akrun · 6 years ago

We can do this with read.table after inserting a newline character into the string between the lowercase and uppercase letters:

read.table(text=sub("(?<=[a-z])\\s+(?=[A-Z])", "\n", str1,
             perl = TRUE), header = FALSE, col.names = paste0("H", 1:7))
#  H1 H2 H3 H4 H5 H6 H7
#1  a  b  c  d  e  f  g
#2  A  B  C  D  E  F  G

If the pattern repeats every n words (here n = 7):

read.table(text = gsub("((\\S+\\s+){6}\\S+)\\s+", "\\1\n", str2), 
        header = FALSE, col.names = paste0("H", 1:7))
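The 6 in the pattern above is simply n - 1 for n = 7. A minimal sketch of building the same pattern for an arbitrary n follows; the variable names n, pattern and res are illustrative and not part of the original answer.

```r
str1 <- 'a b c d  e     f     g  A B C D  E    F    G'
str2 <- paste(str1, str1)

n <- 7  # number of words per row (assumed known in advance)

# (n - 1) repetitions of "word followed by whitespace", then one last word;
# the whitespace after that last word is replaced by a newline.
pattern <- sprintf("((\\S+\\s+){%d}\\S+)\\s+", n - 1)

res <- read.table(text = gsub(pattern, "\\1\n", str2),
                  header = FALSE, col.names = paste0("H", seq_len(n)))
dim(res)
```

With str2 holding 28 words, this should yield four rows of seven columns.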

If the split is based on a fixed number of characters per row, we can use scan and then matrix, as @Maurits Evers showed:

matrix(scan(text=str1, what = "", quiet = TRUE), ncol=7, byrow = TRUE)
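Since the question starts from a text file rather than a string, the same idea can be sketched with scan reading the file directly. The temporary file below is only a stand-in for the asker's file, and the names path, tokens and df are illustrative.

```r
# Stand-in for the asker's text file (hypothetical path and contents).
path <- tempfile(fileext = ".txt")
writeLines("a b c d  e     f     g  A B C D  E    F    G", path)

# scan() treats any run of whitespace as one separator,
# so the repeated spaces need no special handling.
tokens <- scan(path, what = "", quiet = TRUE)

# Fill a 7-column matrix row by row, then name the columns H1..H7.
df <- as.data.frame(matrix(tokens, ncol = 7, byrow = TRUE))
names(df) <- paste0("H", 1:7)
df
```

This gives the H1 to H7 column names requested in the question instead of the default V1 to V7.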

Data

str1 <- 'a b c d  e     f     g  A B C D  E    F    G'
str2 <- paste(str1, str1)

camille · 6 years ago

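Here are two ways that make use of stringr, which is part of the tidyverse. str_split lets you split by a pattern, in this case "\\s+". Setting simplify = T makes it return a matrix. To get the shape you want, you can then build a matrix from that one with nrow = 2.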

txt <- "a b c d  e     f     g  A B C D  E    F    G"
mtx <- stringr::str_split(txt, "\\s+", simplify = T)
as.data.frame(matrix(mtx, nrow = 2, byrow = T))
#>   V1 V2 V3 V4 V5 V6 V7
#> 1  a  b  c  d  e  f  g
#> 2  A  B  C  D  E  F  G

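Another way is to extract instead of split. str_extract_all extracts all matches of a regular expression and can optionally return a matrix. Here I extract the lowercase and the uppercase letters as separate matrices and rbind them.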

lower <- stringr::str_extract_all(txt, "[a-z]", simplify = T)
upper <- stringr::str_extract_all(txt, "[A-Z]", simplify = T)
as.data.frame(rbind(lower, upper))
#>   V1 V2 V3 V4 V5 V6 V7
#> 1  a  b  c  d  e  f  g
#> 2  A  B  C  D  E  F  G

You can also skip creating lower & upper and do it all in one step:

as.data.frame(rbind(
  stringr::str_extract_all(txt, "[a-z]", simplify = T),
  stringr::str_extract_all(txt, "[A-Z]", simplify = T)
))
#>   V1 V2 V3 V4 V5 V6 V7
#> 1  a  b  c  d  e  f  g
#> 2  A  B  C  D  E  F  G
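For readers without stringr installed, the extraction idea can be approximated in base R with gregexpr and regmatches. This is a sketch rather than part of the original answer; it uses the POSIX classes [[:lower:]] and [[:upper:]] in place of [a-z] and [A-Z], on the assumption that they behave more predictably across locales.

```r
txt <- "a b c d  e     f     g  A B C D  E    F    G"

# gregexpr() locates every match; regmatches() pulls the matched text out.
lower <- regmatches(txt, gregexpr("[[:lower:]]", txt))[[1]]
upper <- regmatches(txt, gregexpr("[[:upper:]]", txt))[[1]]

# rbind() stacks the two character vectors into a 2 x 7 matrix.
df <- as.data.frame(rbind(lower, upper))
df
```

Because rbind takes its row names from the argument names, the rows here are labelled lower and upper rather than 1 and 2; rownames(df) <- NULL resets them if needed.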
