
R: importing files with a varying number of initial lines to skip

  •  1
  • notacodr  · 7 years ago

    =~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2016.07.11 09:47:35 =~=~=~=~=~=~=~=~=~=~=~=
    up
    Upload #18
    Reader: S1  Site: AA
    --------- upload 18 start ---------
    Type,Date,Time,Duration,Type,Tag ID,Ant,Count,Gap
    E,2016-07-05,11:45:44.17,"upload 17 complete"
    D,2016-07-05,11:46:24.69,00:00:00.87,HA,900_226000745055,A2,8,1102
    D,2016-07-05,11:46:43.23,00:00:01.12,HA,900_226000745055,A2,10,143
    

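    The line with the column headers is "Type,Date,Time,Duration,Type,Tag ID,Ant,Count,Gap". The data should have 9 columns. The problem is that the number of lines above the header string differs from file to file, so I can't simply use skip=5. I also only need the lines that start with "D,".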

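    What is the best way to read the files so that I end up with 9 columns and skip all the junk?
    I have been using read_csv from readr because so far it has produced the fewest formatting problems. However, I am open to any new ideas, including a way to read only the lines that start with "D,". I have played with read.table and skip = grep("Type,", readLines(i)), but it does not seem to find the header string correctly. Here is my basic code: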

    library(readr)

    dataFiles <- Sys.glob("*.*")
    datalist <- list()
    for (i in dataFiles) {
     # skip = 35 only works when every file has exactly 35 lines above the data
     d01 <- read_csv(i, col_names = FALSE, na = "NA", skip = 35)
     # do clean-up stuff
     datalist[[i]] <- d01
    }
    
    3 answers
        1
  •  1
  •   You-leee    7 years ago

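    lines <- readLines(i)
    dataRows <- grep("^D,", lines)
    
    # Column names come from the header line, which starts with "Type,"
    colNames <- unlist(strsplit(lines[grep("^Type,", lines)[1]], split = ","))
    
    # Split every data line on commas and fill the matrix row by row
    data <- as.data.frame(matrix(unlist(strsplit(lines[dataRows], ",")), nrow = length(dataRows), byrow = TRUE))
    names(data) <- colNames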
    

    Output:

        Type       Date        Time    Duration Type           Tag ID Ant Count  Gap
    1      D 2016-07-05 11:46:24.69 00:00:00.87   HA 900_226000745055  A2     8 1102
    2      D 2016-07-05 11:46:43.23 00:00:01.12   HA 900_226000745055  A2    10  143
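
A variation on the approach above, as a minimal sketch. Once the header line and the "D," lines have been picked out with grep, they can be passed to read.csv through its text argument, so the CSV parser handles quoting and column types instead of strsplit. The helper name read_d_rows and the sample file are made up for illustration. Note that read.csv renames the second Type column to Type.1 and Tag ID to Tag.ID.

```r
# Hypothetical helper: keep the header line plus every line starting with "D,"
read_d_rows <- function(path) {
  lines <- readLines(path)
  header <- grep("^Type,", lines, value = TRUE)[1]
  rows <- grep("^D,", lines, value = TRUE)
  # read.csv parses the kept lines as CSV text
  read.csv(text = c(header, rows), stringsAsFactors = FALSE)
}

# Illustrative sample file mirroring the log excerpt in the question
sample_file <- tempfile(fileext = ".txt")
writeLines(c(
  "=~=~= PuTTY log 2016.07.11 09:47:35 =~=~=",
  "up",
  "Upload #18",
  "Reader: S1  Site: AA",
  "--------- upload 18 start ---------",
  "Type,Date,Time,Duration,Type,Tag ID,Ant,Count,Gap",
  "E,2016-07-05,11:45:44.17,\"upload 17 complete\"",
  "D,2016-07-05,11:46:24.69,00:00:00.87,HA,900_226000745055,A2,8,1102",
  "D,2016-07-05,11:46:43.23,00:00:01.12,HA,900_226000745055,A2,10,143"
), sample_file)

d <- read_d_rows(sample_file)
str(d)
```

Because only the wanted lines ever reach the parser, the number of junk lines above the header no longer matters.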
    
        2
  •  1
  •   D.sen    7 years ago

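    You can loop over the files with a custom function, keep only the rows whose Type column starts with D, and bind everything together at the end. Drop bind_rows if you want to keep the files as separate list elements.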

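    library(dplyr)

    load_data <- function(path) {
      files <- list.files(path, full.names = TRUE)
      read_files <- function(x) {
        # header = FALSE plus nine explicit column names stops the junk lines from fixing the column count
        data_file <- read.csv(x, header = FALSE, col.names = paste0("V", 1:9), fill = TRUE,
                              stringsAsFactors = FALSE, na.strings = c("", "NA"))
        row.number <- grep("^Type$", data_file[, 1])[1]
        # "Type" appears twice in the header, so make the names unique for dplyr
        colnames(data_file) <- make.unique(as.character(unlist(data_file[row.number, ])))
        # Drop the header row and everything above it
        data_file <- data_file[-(1:row.number), ]
        data_file <- data_file %>%
          filter(grepl("^D", Type))
        return(data_file)
      }
      lapply(files, read_files)
    }
    
    list_of_file <- bind_rows(load_data("YOUR_FOLDER_PATH"))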
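
The same idea can be sketched in base R, without dplyr. This is an illustration under assumptions, not the answer's own code: the helper name read_after_header, the source_file column and the two temporary files are invented here. The files deliberately have different numbers of junk lines above the header. Everything is read as text, cut at the header row, and type.convert then recovers numeric columns such as Count and Gap.

```r
# Hypothetical base-R reader: read everything as text, then cut at the header row
read_after_header <- function(path) {
  raw <- read.csv(path, header = FALSE, col.names = paste0("V", 1:9),
                  fill = TRUE, colClasses = "character", na.strings = c("", "NA"))
  header_row <- which(raw[[1]] == "Type")[1]
  out <- raw[-seq_len(header_row), , drop = FALSE]
  # "Type" occurs twice in the header, so the second one becomes "Type.1"
  names(out) <- make.unique(unlist(raw[header_row, ], use.names = FALSE))
  out <- out[!is.na(out$Type) & out$Type == "D", , drop = FALSE]
  out <- type.convert(out, as.is = TRUE)
  out$source_file <- rep(basename(path), nrow(out))  # remember the originating file
  rownames(out) <- NULL
  out
}

# Two illustrative files with different numbers of junk lines above the header
dir_path <- tempfile("logs")
dir.create(dir_path)
header <- "Type,Date,Time,Duration,Type,Tag ID,Ant,Count,Gap"
writeLines(c("PuTTY log", "up", header,
             "D,2016-07-05,11:46:24.69,00:00:00.87,HA,900_226000745055,A2,8,1102"),
           file.path(dir_path, "a.txt"))
writeLines(c("PuTTY log", "up", "Upload #18", "Reader: S1  Site: AA", header,
             "E,2016-07-05,11:45:44.17,\"upload 17 complete\"",
             "D,2016-07-05,11:46:43.23,00:00:01.12,HA,900_226000745055,A2,10,143"),
           file.path(dir_path, "b.txt"))

files <- list.files(dir_path, full.names = TRUE)
combined <- do.call(rbind, lapply(files, read_after_header))
print(combined)
```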
    
        3
  •  1
  •   bmosov01    7 years ago

    Omit the skip option, then remove any rows that come before the header row. Here is some code to get you started (untested):

    library(readr)

    dataFiles <- Sys.glob("*.*")
    datalist <- list()
    for (i in dataFiles) {
     d01 <- read_csv(i, col_names = FALSE, na = "NA")
     headerRow <- which(d01[[1]] == "Type")[1]
     d01 <- d01[(headerRow + 1):nrow(d01), ] # This keeps all rows after the header row.
     # do clean-up stuff
     datalist[[i]] <- d01
    }
    

    If you want to keep the header, you can use:

    for (i in dataFiles) {
     d01 <- read_csv(i, col_names = FALSE, na = "NA")
     headerRow <- which(d01[[1]] == "Type")[1]
     header <- as.character(unlist(d01[headerRow, ])) # Get names from header row.
     d01 <- d01[(headerRow + 1):nrow(d01), ]          # This keeps all rows after the header row.
     d01 <- setNames(d01, header)                     # Assign names.
     # do clean-up stuff
     datalist[[i]] <- d01
    }
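
The question's skip = grep(...) idea can also be made to work. The following is a minimal base-R sketch with a hypothetical helper name, read_from_header. It finds the header's line number per file and passes that number minus one as skip, because skip counts the lines to discard. fill = TRUE pads short rows such as the "E," line, and read.csv renames the duplicated Type column to Type.1.

```r
# Hypothetical helper: locate the header line, then let read.csv skip everything above it
read_from_header <- function(path) {
  header_line <- grep("^Type,", readLines(path))[1]
  # skip is the number of lines to drop, i.e. one less than the header's line number
  d <- read.csv(path, skip = header_line - 1, header = TRUE, fill = TRUE,
                stringsAsFactors = FALSE)
  d[d$Type == "D", , drop = FALSE]
}

# Illustrative sample file mirroring the log excerpt in the question
sample_file <- tempfile(fileext = ".txt")
writeLines(c(
  "=~=~= PuTTY log 2016.07.11 09:47:35 =~=~=",
  "up",
  "Upload #18",
  "Reader: S1  Site: AA",
  "--------- upload 18 start ---------",
  "Type,Date,Time,Duration,Type,Tag ID,Ant,Count,Gap",
  "E,2016-07-05,11:45:44.17,\"upload 17 complete\"",
  "D,2016-07-05,11:46:24.69,00:00:00.87,HA,900_226000745055,A2,8,1102",
  "D,2016-07-05,11:46:43.23,00:00:01.12,HA,900_226000745055,A2,10,143"
), sample_file)

d <- read_from_header(sample_file)
str(d)
```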