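I'm trying to apply some equations to get the proportion of a line (a numeric array) that coincides with another line (another numeric array). I have a data frame with the required values, and I'm trying to create a new column holding a percentage that reflects how much the two lines overlap. I've checked the code (below) by hand against a few examples and it works, but when I apply case_when() to the data frame, the output isn't what it should be. Here is a basic example.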
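This is my data. It has an 'id' column [chr], a 'day' (date) column, a 'result' (value) column [dbl], a 'difs' column with the number of days since the previous row, and a 'Grp' column, which is a subgroup value.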
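This is the code I'm using. The idea is to take the previous value and work out what percentage of the interval between it and the current value lies inside a target interval with limits [2, 3]. For now I'm only checking that each row is assigned the correct case. However, rows that should get '0' get 'A', or rows that should get 'A' sometimes get 'Inf', and so on. I don't understand why. I thought mutate iterated over each row within a group independently, so I don't see why the results differ from my manual checks.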
library(dplyr)

Rsup = 3 # High limit of target array
Rinf = 2 # Low limit of target array
example_output = example %>%
  arrange(id, Grp, day) %>%
  group_by(id, Grp) %>% # Group by episodes (id + Grp)
  mutate(from_r = lag(result)) %>% # Get previous result y(t-1)
  filter(difs != 0, difs < 181) %>% # Discard first sample of every subgroup/episode
  mutate(
    p_days = case_when(
      (min(result, from_r) < Rinf) & (max(result, from_r) > Rsup) ~ 'A',
      (min(result, from_r) > Rinf) & (max(result, from_r) < Rsup) ~ '100',
      (min(result, from_r) < Rinf) & (max(result, from_r) > Rinf) ~ 'Inf',
      (min(result, from_r) < Rsup) & (max(result, from_r) > Rsup) ~ 'Sup',
      TRUE ~ '0')
  )
# Case 'A': check interval yt - yt-1 cuts target array for both limits
# Case '100': all the interval yt - yt-1 is inside target array (100%)
# Case 'Inf': interval cuts low limit of target array
# Case 'Sup': interval cuts high limit of target array
# Case True ~ '0': interval does not cut target array and it is not inside (0%)
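One thing that may be worth checking is how min() and max() treat two vectors. In base R they pool all of their arguments and return a single number, whereas pmin() and pmax() compare the vectors element by element. Inside a grouped mutate this would mean each condition is evaluated once per group rather than once per row. The sketch below uses made-up toy values (not taken from the example data) only to show the difference:

```r
# Toy vectors (hypothetical values, chosen only for illustration)
result <- c(1.92, 3.60, 2.55)
from_r <- c(1.55, 1.92, 2.79)

Rinf <- 2  # low limit of target interval
Rsup <- 3  # high limit of target interval

# min()/max() pool every value they are given and return one number
overall_min <- min(result, from_r)  # 1.55
overall_max <- max(result, from_r)  # 3.6

# pmin()/pmax() work element by element and return one value per row
row_min <- pmin(result, from_r)     # 1.55 1.92 2.55
row_max <- pmax(result, from_r)     # 1.92 3.60 2.79

# Case 'A' test with pooled values: a single TRUE, shared by all rows
cuts_both_pooled <- (overall_min < Rinf) & (overall_max > Rsup)

# Case 'A' test per row: FALSE TRUE FALSE
cuts_both_by_row <- (row_min < Rinf) & (row_max > Rsup)

print(cuts_both_pooled)
print(cuts_both_by_row)
```

If this is what is happening in the pipeline above, swapping min()/max() for pmin()/pmax() inside case_when() would be the first thing to try.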
Here is how to create the basic example:
structure(list(id = c("A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B"), day = structure(c(19104, 19105,
19106, 19107, 19108, 19109, 19110, 19111, 19112, 19113, 19304,
19305, 19306, 19307, 19604, 19605, 19606, 19607, 19608, 19609,
19204, 19205, 19206, 19207, 19208, 19209, 19210, 19211, 19212,
19213, 19214, 19215, 19216, 19217, 19218, 19219, 19220, 19221,
19222, 19223), class = "Date"), result = c(1.55, 1.92, 3.6, 3.45,
3.3, 3.46, 2.79, 2.55, 2.08, 2.27, 2.44, 4.59, 1.8, 0.75, 3.13,
2.59, 2.16, 2.93, 1.38, 2.92, 3.19, 3.23, 3.48, 3.39, 2.62, 2.66,
3.77, 3.44, 3.06, 2.59, 2.87, 1.97, 2.5, 2.84, 1.48, 3.04, 2.62,
0.76, 2.74, 2.84), difs = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 191,
1, 1, 1, 297, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1), Grp = c(1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -40L), groups = structure(list(
id = c("A", "B"), .rows = structure(list(1:20, 21:40), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
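Of course, if anyone knows a function that produces the same output as what I'm attempting with mutate + case_when, that would be very useful too. Thanks in advance.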
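For reference, the percentage itself could be computed directly instead of going through labelled cases. This is only a minimal sketch, assuming "percentage" means the share of the interval between the previous and current value that falls inside the target limits. The helper name overlap_fraction and its arguments are hypothetical, not part of any package:

```r
# Hypothetical helper: fraction of the interval between a and b
# that lies inside [lower, upper]. Works element by element.
overlap_fraction <- function(a, b, lower = 2, upper = 3) {
  lo <- pmin(a, b)  # lower end of each interval
  hi <- pmax(a, b)  # upper end of each interval
  # Length of the part of [lo, hi] that falls inside [lower, upper]
  covered <- pmax(0, pmin(hi, upper) - pmax(lo, lower))
  width <- hi - lo
  # Assumption: a zero-width interval counts as 1 if the point is
  # inside the target limits and 0 otherwise
  ifelse(width == 0,
         as.numeric(lo >= lower & lo <= upper),
         covered / width)
}

# Toy values: partly inside, fully inside, spanning both limits, outside
fractions <- overlap_fraction(c(1.5, 2.2, 1.0, 3.5), c(2.5, 2.8, 4.0, 3.8))
print(fractions)  # 0.5, 1, 1/3 and 0
```

Because the helper is vectorised, it could be called inside mutate, for example as p_days = 100 * overlap_fraction(result, from_r, Rinf, Rsup).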
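Edit: I thought mutate iterated over each row in a group independently, so I don't understand why the results are wrong. Maybe it is somehow mixing the 'result' (and 'from_r') values within each group?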