
How can I replace NA with the equivalent value from elsewhere in the dataset?

  •  0
  • ChinookJargon  · 7 years ago

    I have been working on a project that studies staple grains. Here is the data:

                    nutrient.component.      grain nutrients
    1                Beta-carotene (μg) White Rice      0.00
    2                Beta-carotene (μg) Brown Rice        NA
    3                      Calcium (mg) White Rice     28.00
    4                      Calcium (mg) Brown Rice     23.00
    5                 Carbohydrates (g) White Rice     80.00
    6                 Carbohydrates (g) Brown Rice     77.00
    7                       Copper (mg) White Rice      0.22
    8                       Copper (mg) Brown Rice        NA
    9                       Energy (kJ) White Rice   1528.00
    10                      Energy (kJ) Brown Rice   1549.00
    11                          Fat (g) White Rice      0.66
    12                          Fat (g) Brown Rice      2.92
    13                        Fiber (g) White Rice      1.30
    14                        Fiber (g) Brown Rice      3.50
    15           Folate Total (B9) (μg) White Rice      8.00
    16           Folate Total (B9) (μg) Brown Rice     20.00
    17                        Iron (mg) White Rice      0.80
    18                        Iron (mg) Brown Rice      1.47
    19           Lutein+zeaxanthin (μg) White Rice      0.00
    20           Lutein+zeaxanthin (μg) Brown Rice        NA
    21                   Magnesium (mg) White Rice     25.00
    22                   Magnesium (mg) Brown Rice    143.00
    23                   Manganese (mg) White Rice      1.09
    24                   Manganese (mg) Brown Rice      3.74
    25  Monounsaturated fatty acids (g) White Rice      0.21
    26  Monounsaturated fatty acids (g) Brown Rice      1.05
    27                 Niacin (B3) (mg) White Rice      1.60
    28                 Niacin (B3) (mg) Brown Rice      5.09
    29       Pantothenic acid (B5) (mg) White Rice      1.01
    30       Pantothenic acid (B5) (mg) Brown Rice      1.49
    31                  Phosphorus (mg) White Rice    115.00
    32                  Phosphorus (mg) Brown Rice    333.00
    33  Polyunsaturated fatty acids (g) White Rice      0.18
    34  Polyunsaturated fatty acids (g) Brown Rice      1.04
    35                   Potassium (mg) White Rice    115.00
    36                   Potassium (mg) Brown Rice    223.00
    37                      Protein (g) White Rice      7.10
    38                      Protein (g) Brown Rice      7.90
    39              Riboflavin (B2)(mg) White Rice      0.05
    40              Riboflavin (B2)(mg) Brown Rice      0.09
    41        Saturated fatty acids (g) White Rice      0.18
    42        Saturated fatty acids (g) Brown Rice      0.58
    43                    Selenium (μg) White Rice     15.10
    44                    Selenium (μg) Brown Rice        NA
    45                      Sodium (mg) White Rice      5.00
    46                      Sodium (mg) Brown Rice      7.00
    47                        Sugar (g) White Rice      0.12
    48                        Sugar (g) Brown Rice      0.85
    49                 Thiamin (B1)(mg) White Rice      0.07
    50                 Thiamin (B1)(mg) Brown Rice      0.40
    51                   Vitamin A (IU) White Rice      0.00
    52                   Vitamin A (IU) Brown Rice      0.00
    53                  Vitamin B6 (mg) White Rice      0.16
    54                  Vitamin B6 (mg) Brown Rice      0.51
    55                   Vitamin C (mg) White Rice      0.00
    56                   Vitamin C (mg) Brown Rice      0.00
    57 Vitamin E, alpha-tocopherol (mg) White Rice      0.11
    58 Vitamin E, alpha-tocopherol (mg) Brown Rice      0.59
    59                  Vitamin K1 (μg) White Rice      0.10
    60                  Vitamin K1 (μg) Brown Rice      1.90
    61                        Water (g) White Rice     12.00
    62                        Water (g) Brown Rice     10.00
    63                        Zinc (mg) White Rice      1.09
    64                        Zinc (mg) Brown Rice      2.02
    

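    Brown Rice has four NA values.
    Based on this graphic, I think it is fair to assume that the NA values for Brown Rice will be very close to the equivalent White Rice values. Mirroring the White Rice values is more accurate than converting the NAs to zero.

    My question: rather than manually looking up and typing in the White Rice equivalent for each missing Brown Rice nutrient, how can code replace the NAs with the equivalent White Rice values? For example, I would like the NA for Copper / Brown Rice to become the same value as Copper / White Rice (0.22). Would it be better to replace NA with zero first? If I do that, I end up with six Brown Rice nutrients with a value of zero (Vitamin A and Vitamin C are already 0.00) instead of four NA values. I am trying to find the right way to think about solving this in code. Any insight would be appreciated.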

    3 answers  |  as of 7 years ago
        1
  •  4
  •   Uwe    7 years ago

    Assuming the input data frame is called dt, we can use the fill function from tidyr. dt2 is the final output.

    library(tidyr)
    
    dt2 <- dt %>% fill(nutrients)
    
    dt2
      nutrient.component.                         grain nutrients
    1                   1 Beta-carotene (µg) White Rice      0.00
    2                   2 Beta-carotene (µg) Brown Rice      0.00
    3                   3       Calcium (mg) White Rice     28.00
    4                   4       Calcium (mg) Brown Rice     23.00
    5                   5  Carbohydrates (g) White Rice     80.00
    6                   6  Carbohydrates (g) Brown Rice     77.00
    7                   7        Copper (mg) White Rice      0.22
    8                   8        Copper (mg) Brown Rice      0.22
    ...
    

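    By default, fill imputes each NA from the previous row, that is, the nearest non-NA value above it. It is therefore important to make sure that each Brown Rice record sits in the row immediately after its corresponding White Rice record.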
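
    The row-order requirement above can be relaxed by filling within each nutrient instead of across the whole column. The following is only a sketch under a few assumptions: the toy data frame is a hypothetical, deliberately shuffled subset of the question's table, dplyr is available for group_by, and the installed tidyr is recent enough (1.0.0 or later) to accept .direction = "downup".

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the question's data, with the rows out of order
dt <- data.frame(
  nutrient.component. = c("Copper (mg)", "Calcium (mg)", "Copper (mg)", "Calcium (mg)"),
  grain               = c("Brown Rice", "White Rice", "White Rice", "Brown Rice"),
  nutrients           = c(NA, 28, 0.22, 23),
  stringsAsFactors    = FALSE
)

dt2 <- dt %>%
  group_by(nutrient.component.) %>%          # fill never crosses nutrient boundaries
  fill(nutrients, .direction = "downup") %>% # look above first, then below, within the group
  ungroup()

dt2
```

    Because the fill is confined to one nutrient at a time, a missing Brown Rice value can only be taken from the same nutrient, wherever its White Rice row happens to sit.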

        2
  •  4
  •   Uwe    7 years ago

    The zoo package has some useful functions for dealing with NA:

    library(data.table)
    setDT(DT)[, nutrients := zoo::na.aggregate(nutrients), by = nutrient.component][]
    
                      nutrient.component      grain nutrients
     1:               Beta-carotene (µg) White Rice      0.00
     2:               Beta-carotene (µg) Brown Rice      0.00
     3:                     Calcium (mg) White Rice     28.00
     4:                     Calcium (mg) Brown Rice     23.00
     5:                Carbohydrates (g) White Rice     80.00
     6:                Carbohydrates (g) Brown Rice     77.00
     7:                      Copper (mg) White Rice      0.22
     8:                      Copper (mg) Brown Rice      0.22
     9:                      Energy (kJ) White Rice   1528.00
    10:                      Energy (kJ) Brown Rice   1549.00
    11:                          Fat (g) White Rice      0.66
    12:                          Fat (g) Brown Rice      2.92
    13:                        Fiber (g) White Rice      1.30
    14:                        Fiber (g) Brown Rice      3.50
    15:           Folate Total (B9) (µg) White Rice      8.00
    16:           Folate Total (B9) (µg) Brown Rice     20.00
    17:                        Iron (mg) White Rice      0.80
    18:                        Iron (mg) Brown Rice      1.47
    19:           Lutein+zeaxanthin (µg) White Rice      0.00
    20:           Lutein+zeaxanthin (µg) Brown Rice      0.00
    ...
    

    Note rows 2, 8, and 20.

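    By default, na.aggregate replaces each NA with the mean of the non-NA values in its group, so in a two-row group the NA simply takes the White Rice value. With data.table, DT is updated in place, which avoids copying the whole table and so saves memory and time.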
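
    To see why the group mean works here, and when it would stop working, it may help to call na.aggregate on plain vectors. This is a small sketch; the three-value group is a hypothetical case (for instance, a third grain added to the data) and does not come from the question.

```r
library(zoo)

# A two-row group such as Copper: the only non-NA value is its own mean,
# so the NA is replaced by the White Rice value.
pair <- na.aggregate(c(0.22, NA))

# Hypothetical three-row group: the NA becomes the mean of the other two
# values, which is no longer a copy of any single grain.
trio <- na.aggregate(c(1, 3, NA))

pair
trio
```

    With more than two grains per nutrient, the result is therefore an average rather than a mirror of the White Rice value.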

        3
  •  3
  •   Rui Barradas    7 years ago

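    In what follows, the data.frame is assumed to be named dat.

    I believe the code below does the job. It splits the data frame into a list of pieces with 2 rows or 1 row each (in the example, the last row has no Brown Rice counterpart). It then checks whether a piece has two rows and whether the Brown Rice nutrients value is NA. If so, it assigns the White Rice value to it. Finally, the resulting list is collected back into a data.frame.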

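    sp <- split(dat, dat$nutrient.component.)
    res <- lapply(sp, function(x){
                brown <- x$grain == "Brown Rice"
                white <- x$grain == "White Rice"
                # && short-circuits, so 1-row pieces never reach the NA check
                if(nrow(x) == 2 && any(white) && any(brown & is.na(x$nutrients)))
                    # copy the White Rice value into the missing Brown Rice value
                    x$nutrients[brown] <- x$nutrients[white]
                x
                }
            )
    
    rm(sp)   # tidy up
    
    res <- do.call(rbind, res)
    row.names(res) <- NULL   # drop the group-prefixed row names added by rbind
    res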
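
    The same idea can also be written without splitting the data, by looking up the White Rice value of each nutrient with match. This is a base R sketch rather than part of the answer above; the toy dat is a hypothetical subset of the question's table in which Zinc has no Brown Rice row.

```r
# Hypothetical subset in the question's layout; Zinc has only a White Rice row
dat <- data.frame(
  nutrient.component. = c("Copper (mg)", "Copper (mg)", "Calcium (mg)",
                          "Calcium (mg)", "Zinc (mg)"),
  grain               = c("White Rice", "Brown Rice", "White Rice",
                          "Brown Rice", "White Rice"),
  nutrients           = c(0.22, NA, 28, 23, 1.09),
  stringsAsFactors    = FALSE
)

white <- dat[dat$grain == "White Rice", ]                  # lookup table
need  <- dat$grain == "Brown Rice" & is.na(dat$nutrients)  # rows to fill

# match() finds, for each row to fill, the White Rice row of the same nutrient;
# a nutrient with no White Rice row yields NA, so the value stays missing
dat$nutrients[need] <- white$nutrients[match(dat$nutrient.component.[need],
                                             white$nutrient.component.)]
dat
```

    Unlike the split approach, this keeps the original row order and row names.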