
Converting a multi-value field into factors [duplicate]

Kilian · 6 years ago

Reading input from a CSV file leaves me with an odd field that contains multiple values, e.g.:

  Title                Genres
1     A [Item1, Item2, Item3]
2     B                      
3     C        [Item4, Item1]


df <- data.frame(c("A","B","C"), c("[Item1, Item2, Item3]","","[Item4, Item1]"), 
           stringsAsFactors = FALSE)
colnames(df) <- c("Title","Genres")

A function to retrieve the individual tokens:

extractGenre <- function(genreVector){
  strsplit(substring(genreVector,2,nchar(genreVector)-1),", ")
}

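I'm a bit lost on how to convert Item1, ..., Item4 into factors and attach them to the data frame. apply lets me run a function on each row, but what would the next step look like?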

3 answers

A. Suliman · 6 years ago

library(dplyr)
library(tidyr)

df %>% mutate(Genres = gsub('\\[|\\]|\\s+', '', Genres)) %>%   # strip brackets and whitespace
       separate(Genres, paste0('Gen', 1:3)) %>%               # split Genres into up to three columns
       gather(key, Genres, -Title) %>% select(-key) %>%       # stack them back into one Genres column
       filter(!is.na(Genres)) %>% arrange(Title, Genres) %>%  # drop the padding NAs, then sort
       mutate(Genres = as.factor(Genres))                     # convert to a factor


  Title Genres
1     A  Item1
2     A  Item2
3     A  Item3
4     B       
5     C  Item1
6     C  Item4
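
The pipeline above assumes at most three genres per row (the column names Gen1 to Gen3 are fixed). As a rough sketch only, and not part of the original answer, the same long format can be built in base R without fixing the number of tokens. The names cleaned, tokens and long_df are made up for this illustration, and unlike the output above, a row with no genres (Title B) drops out entirely instead of keeping an empty value.

```r
# Same sample data as in the question
df <- data.frame(Title = c("A", "B", "C"),
                 Genres = c("[Item1, Item2, Item3]", "", "[Item4, Item1]"),
                 stringsAsFactors = FALSE)

# Strip brackets and whitespace, then split each row on commas
cleaned <- gsub("\\[|\\]|\\s+", "", df$Genres)
tokens  <- strsplit(cleaned, ",", fixed = TRUE)

# Drop any empty tokens so rows without genres contribute nothing
tokens <- lapply(tokens, function(x) x[nzchar(x)])

# One output row per (Title, token) pair; Genres becomes a factor
# whose levels are the distinct tokens found across all rows
long_df <- data.frame(Title  = rep(df$Title, lengths(tokens)),
                      Genres = factor(unlist(tokens)),
                      stringsAsFactors = FALSE)
long_df
```

Because the factor is created once over all tokens, every row shares the same set of levels, which is usually what you want before modelling or tabulating.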

Silentdevildoll · 6 years ago

I'm not sure whether this is exactly what you're looking for, but I took a slightly different approach. I used dplyr and grepl:

    library(dplyr)

    df <- data.frame(c("A","B","C"), c("[Item1, Item2, Item3]","","[Item4, Item1]"), 
                     stringsAsFactors = FALSE)
    colnames(df) <- c("Title","Genres")
    df
    # one logical column per item: TRUE when the item appears in Genres
    df1 <- df %>%
      mutate(Item1 = ifelse(grepl("Item1", Genres), TRUE, FALSE),
             Item2 = ifelse(grepl("Item2", Genres), TRUE, FALSE),
             Item3 = ifelse(grepl("Item3", Genres), TRUE, FALSE),
             Item4 = ifelse(grepl("Item4", Genres), TRUE, FALSE))

  Title                Genres Item1 Item2 Item3 Item4
1     A [Item1, Item2, Item3]  TRUE  TRUE  TRUE FALSE
2     B                       FALSE FALSE FALSE FALSE
3     C        [Item4, Item1]  TRUE FALSE FALSE  TRUE

Hope this helps.

demarsylvain · 6 years ago

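You could use the function separate() as Uwe suggested, but your genres do not always seem to appear in the same order. One option is to use mutate() together with grepl() to determine whether each token is present.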

df %>% 
    mutate(
        Item1 = grepl('Item1', Genres),
        Item2 = grepl('Item2', Genres),
        Item3 = grepl('Item3', Genres),
        Item4 = grepl('Item4', Genres)
    )

#   Title                Genres Item1 Item2 Item3 Item4
# 1     A [Item1, Item2, Item3]  TRUE  TRUE  TRUE FALSE
# 2     B                       FALSE FALSE FALSE FALSE
# 3     C        [Item4, Item1]  TRUE FALSE FALSE  TRUE
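
Writing one grepl() line per item works for four known items, but the list has to be maintained by hand, and a pattern such as 'Item1' would also match a longer token like 'Item10'. As a hedged sketch, not taken from the answer above, the indicator columns could instead be derived from whatever tokens occur in the data, using exact matching in base R. The names tokens, all_items, flags and wide_df are invented for this example.

```r
# Same sample data as in the question
df <- data.frame(Title = c("A", "B", "C"),
                 Genres = c("[Item1, Item2, Item3]", "", "[Item4, Item1]"),
                 stringsAsFactors = FALSE)

# Split each Genres string into its tokens (brackets and whitespace removed)
tokens <- strsplit(gsub("\\[|\\]|\\s+", "", df$Genres), ",", fixed = TRUE)

# All distinct tokens found in the data, sorted for a stable column order
all_items <- sort(unique(unlist(tokens)))

# Logical matrix: one column per token, TRUE where that row's tokens contain it
flags <- vapply(all_items,
                function(item) vapply(tokens, function(x) item %in% x, logical(1)),
                logical(nrow(df)))

# Attach the indicator columns to the original data frame
wide_df <- cbind(df, flags)
wide_df
```

On the sample data this should give the same Item1 to Item4 columns as the mutate() version, while picking up any new item automatically if one shows up in the CSV.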
