How to create a variable showing the percentile range an observation falls in

  • bill999  ·  1 month ago

    Say I have the iris data.

    I know I can create a variable that shows which quantile bin each value falls into:

    library(tidyverse)
    iris %>% mutate(Range = cut(Sepal.Length, quantile(Sepal.Length, probs=c(0,.2,.4,.6,.8,1)),include.lowest=TRUE))
    

    This produces:

       Sepal.Length Sepal.Width Petal.Length Petal.Width Species   Range
    1           4.3         3.0          1.1         0.1  setosa [4.3,4.6]
    2           4.4         2.9          1.4         0.2  setosa [4.3,4.6]
    3           4.6         3.1          1.5         0.2  setosa [4.3,4.6]
    4           4.6         3.4          1.4         0.3  setosa [4.3,4.6]
    5           4.7         3.2          1.3         0.2  setosa (4.6,4.8]
    6           4.8         3.4          1.6         0.2  setosa (4.6,4.8]
    7           4.8         3.0          1.4         0.1  setosa (4.6,4.8]
    8           4.9         3.0          1.4         0.2  setosa   (4.8,5]
    9           4.9         3.1          1.5         0.1  setosa   (4.8,5]
    10          5.0         3.6          1.4         0.2  setosa   (4.8,5]
    11          5.0         3.4          1.5         0.2  setosa   (4.8,5]
    12          5.1         3.5          1.4         0.2  setosa   (5,5.4]
    13          5.4         3.9          1.7         0.4  setosa   (5,5.4]
    14          5.4         3.7          1.5         0.2  setosa   (5,5.4]
    15          5.7         4.4          1.5         0.4  setosa (5.4,5.8]
    16          5.8         4.0          1.2         0.2  setosa (5.4,5.8]
    

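    How can I create another variable that shows the percentile range each observation belongs to? I don't want to build it by hand with ifelse statements and the like; I'm hoping there is a function that creates it automatically.

    I'm looking for something that would produce a table like this: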

       Sepal.Length Sepal.Width Petal.Length Petal.Width Species     Range  Percent
    1           4.3         3.0          1.1         0.1  setosa [4.3,4.6]  [0,.2]
    2           4.4         2.9          1.4         0.2  setosa [4.3,4.6]  [0,.2]
    3           4.6         3.1          1.5         0.2  setosa [4.3,4.6]  [0,.2]
    4           4.6         3.4          1.4         0.3  setosa [4.3,4.6]  [0,.2]
    5           4.7         3.2          1.3         0.2  setosa (4.6,4.8]  (.2,.4]
    6           4.8         3.4          1.6         0.2  setosa (4.6,4.8]  (.2,.4]
    7           4.8         3.0          1.4         0.1  setosa (4.6,4.8]  (.2,.4]
    8           4.9         3.0          1.4         0.2  setosa   (4.8,5]  (.4,.6]
    9           4.9         3.1          1.5         0.1  setosa   (4.8,5]  (.4,.6]
    10          5.0         3.6          1.4         0.2  setosa   (4.8,5]  (.4,.6]
    11          5.0         3.4          1.5         0.2  setosa   (4.8,5]  (.4,.6]
    12          5.1         3.5          1.4         0.2  setosa   (5,5.4]  (.6,.8]
    13          5.4         3.9          1.7         0.4  setosa   (5,5.4]  (.6,.8]
    14          5.4         3.7          1.5         0.2  setosa   (5,5.4]  (.6,.8]
    15          5.7         4.4          1.5         0.4  setosa (5.4,5.8]  (.8,1]
    16          5.8         4.0          1.2         0.2  setosa (5.4,5.8]  (.8,1]
    
    1 answer

  • Gregor Thomas  ·  1 month ago

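    The See Also section of the ?quantile help page points you to the ecdf function, "for empirical distributions of which quantile is an inverse".

    Interestingly, ecdf() returns a function, so we first use it to create a function and then call that function on the input. We can then cut the result just as you did with the quantiles.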
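
    To see the "function that returns a function" behaviour on its own, here is a minimal base R sketch. The names x, Fn and p are only illustrative and do not come from the answer.

```r
# ecdf() does not return numbers directly; it returns a step function.
x <- iris$Sepal.Length
Fn <- ecdf(x)   # Fn is itself a function (the empirical CDF of x)

# Calling Fn on a value gives the proportion of observations <= that value.
p <- Fn(x)      # one cumulative proportion per observation

is.function(Fn)                  # Fn can be called like any other function
all.equal(Fn(5), mean(x <= 5))   # same quantity, computed by hand
```

    Writing ecdf(x)(x), as the pipeline that follows does, simply combines the two steps without naming the intermediate function.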

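    iris %>%
      mutate(
        # Quantile bins of Sepal.Length, as in the question
        Range = cut(Sepal.Length, quantile(Sepal.Length, probs = c(0, .2, .4, .6, .8, 1)), include.lowest = TRUE),
        # ecdf(x) builds the empirical CDF; calling it on x gives each value's cumulative proportion
        ecdf = cut(ecdf(Sepal.Length)(Sepal.Length), breaks = c(0, 0.2, .4, .6, .8, 1), include.lowest = TRUE)
      )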
    
    #     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species      Range      ecdf
    # 1            5.1         3.5          1.4         0.2     setosa    (5,5.6] (0.2,0.4]
    # 2            4.9         3.0          1.4         0.2     setosa    [4.3,5]   [0,0.2]
    # 3            4.7         3.2          1.3         0.2     setosa    [4.3,5]   [0,0.2]
    # 4            4.6         3.1          1.5         0.2     setosa    [4.3,5]   [0,0.2]
    # 5            5.0         3.6          1.4         0.2     setosa    [4.3,5] (0.2,0.4]
    # 6            5.4         3.9          1.7         0.4     setosa    (5,5.6] (0.2,0.4]
    # 7            4.6         3.4          1.4         0.3     setosa    [4.3,5]   [0,0.2]
    # 8            5.0         3.4          1.5         0.2     setosa    [4.3,5] (0.2,0.4]
    # ...
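
    Note that in the output above a Sepal.Length of 5.0 lands in [4.3,5] for Range but in (0.2,0.4] for ecdf, because quantile() interpolates by default while ecdf() counts the share of values at or below each observation, so the two columns can disagree at bin edges. If the percent label should always match the quantile bin, one alternative is to relabel the same cut() call. The sketch below assumes that is what is wanted; percent_range is a hypothetical helper name, not part of the original answer, and it will fail if two quantile breaks coincide.

```r
# Hypothetical helper: label each value with the probability interval of the
# quantile bin it falls into, so the labels always agree with the bins.
percent_range <- function(x, probs = c(0, 0.2, 0.4, 0.6, 0.8, 1)) {
  breaks <- quantile(x, probs = probs, na.rm = TRUE)
  n <- length(probs)
  # Build labels such as "[0,0.2]" and "(0.2,0.4]" in the style cut() uses.
  labels <- paste0(
    c("[", rep("(", n - 2)),
    probs[-n], ",", probs[-1], "]"
  )
  cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
}

res <- percent_range(iris$Sepal.Length)
levels(res)
```

    Inside a pipeline this would be used as mutate(Percent = percent_range(Sepal.Length)).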