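I am studying the communication pathway patients follow when they fall ill. For example, a person gets sick and sees a doctor (A), then goes to the hospital (B), contacts the insurance company (C), and so on. The order differs from patient to patient: one patient goes straight to the hospital, another checks with the insurer first, etc. We follow each patient throughout and, after every contact with an authority, have them fill in a survey again. So after each authority (a "step") we get a survey score. This gives me the following dataset (the real one is very large):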
Patient<-c(1,1,1,1,1,1,1,2,2,2,2)
sample6<-c("A","A","A","A","A","A","A","A","A","A","A")
sample5<-c("Stop","B","B","B","B","B","B","Stop","C","C","C")
sample4<-c(NA,"Stop","C","C","C","C","C",NA, "Stop","F","F")
sample3<-c(NA,NA,"Stop","D","D","D","D",NA, NA,"Stop","G")
sample2<-c(NA,NA,NA,"Stop","E","E","E",NA, NA,NA,"Stop")
sample1<-c(NA,NA,NA,NA, "Stop","F","F",NA,NA,NA, NA)
sample0<-c(NA,NA,NA,NA, NA,"Stop","G",NA,NA,NA, NA)
sample00<-c(NA,NA,NA,NA, NA,NA,"Stop",NA,NA,NA, NA)
Score<-c(90,88,65,44,78,98,66,38,93,88,80)
Time<-c("01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018", "05-01-2018", "06-01-2018", "07-01-2018","01-02-2018", "02-02-2018", "05-02-2018", "06-02-2018")
df<-data.frame("Patient"=Patient, "step0"=sample6, "step1"=sample5, "step2"=sample4, "step3"=sample3, "step4"=sample2,
"step5"=sample1,"step6"= sample0, "step7"=sample00, "Score"=Score, "Time"=Time)
> df
   Patient step0 step1 step2 step3 step4 step5 step6 step7 Score       Time
 1       1     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    90 01-01-2018
 2       1     A     B  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    88 02-01-2018
 3       1     A     B     C  Stop  <NA>  <NA>  <NA>  <NA>    65 03-01-2018
 4       1     A     B     C     D  Stop  <NA>  <NA>  <NA>    44 04-01-2018
 5       1     A     B     C     D     E  Stop  <NA>  <NA>    78 05-01-2018
 6       1     A     B     C     D     E     F  Stop  <NA>    98 06-01-2018
 7       1     A     B     C     D     E     F     G  Stop    66 07-01-2018
 8       2     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    38 01-02-2018
 9       2     A     C  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    93 02-02-2018
10       2     A     C     F  Stop  <NA>  <NA>  <NA>  <NA>    88 05-02-2018
11       2     A     C     F     G  Stop  <NA>  <NA>  <NA>    80 06-02-2018
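For example, row 1 holds the survey score after authority A, row 2 is the same patient with the score after authority B, and so on. A row's final authority is the step immediately before "Stop".
Now I want to compare rows that share the same final authority. I'll use "F" as the example, but in another analysis it could just as well be "C". So I want to select every row whose final authority is "F", together with the row just before it, so that I can compare the two.
So I want to create this dataset: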
   Patient step0 step1 step2 step3 step4 step5 step6 step7 Score       Time Indicator
 1       1     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    90 01-01-2018         0
 2       1     A     B  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    88 02-01-2018         0
 3       1     A     B     C  Stop  <NA>  <NA>  <NA>  <NA>    65 03-01-2018         0
 4       1     A     B     C     D  Stop  <NA>  <NA>  <NA>    44 04-01-2018         0
 5       1     A     B     C     D     E  Stop  <NA>  <NA>    78 05-01-2018    Before
 6       1     A     B     C     D     E     F  Stop  <NA>    98 06-01-2018     After
 7       1     A     B     C     D     E     F     G  Stop    66 07-01-2018         0
 8       2     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    38 01-02-2018         0
 9       2     A     C  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    93 02-02-2018    Before
10       2     A     C     F  Stop  <NA>  <NA>  <NA>  <NA>    88 05-02-2018     After
11       2     A     C     F     G  Stop  <NA>  <NA>  <NA>    80 06-02-2018         0
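One way to get the table above, sketched under the assumption that rows are already sorted by `Patient` and then by `Time` (as in the example): take each row's final authority as the entry just before "Stop", mark rows whose final authority matches as "After", and mark the previous row as "Before" only when it belongs to the same patient. `flag_final_authority()` is a hypothetical helper name, and the data are rebuilt compactly (without `Time`) so the block runs on its own.

```r
# Compact rebuild of the example data (same rows as above, Time omitted).
routes <- c("A", "AB", "ABC", "ABCD", "ABCDE", "ABCDEF", "ABCDEFG",
            "A", "AC", "ACF", "ACFG")
steps <- t(sapply(routes, function(r) {
  x <- c(strsplit(r, "")[[1]], "Stop")        # authorities so far, then "Stop"
  c(x, rep(NA_character_, 8 - length(x)))     # pad to the 8 step columns
}, USE.NAMES = FALSE))
colnames(steps) <- paste0("step", 0:7)
df <- data.frame(Patient = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2), steps,
                 Score = c(90, 88, 65, 44, 78, 98, 66, 38, 93, 88, 80))

# Hypothetical helper: rows whose final authority (the step just before
# "Stop") equals `authority` become "After"; the same patient's previous row
# becomes "Before". Assumes rows are sorted by Patient, then by Time.
flag_final_authority <- function(df, authority) {
  m <- as.matrix(df[, grep("^step", names(df))])
  final <- apply(m, 1, function(row) {
    stop_at <- match("Stop", row)             # position of the first "Stop"
    if (is.na(stop_at) || stop_at == 1) NA_character_ else row[[stop_at - 1]]
  })
  after <- !is.na(final) & final == authority
  n <- nrow(df)
  # TRUE at position i when row i + 1 belongs to the same patient as row i
  same_patient_next <- c(df$Patient[-1] == df$Patient[-n], FALSE)
  before <- c(after[-1], FALSE) & same_patient_next
  df$Indicator <- ifelse(after, "After", ifelse(before, "Before", "0"))
  df
}

res <- flag_final_authority(df, "F")
print(res[, c("Patient", "Score", "Indicator")])
```

Calling `flag_final_authority(df, "C")` gives the corresponding flags for "C". The same-patient check matters when the chosen authority can be a patient's first step (e.g. "A"): without it, the last row of the previous patient would be marked "Before".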
I did manage to flag the rows that contain "F" and the rows before them:
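ProcessColumns <- 2:9                          # columns step0 ... step7
d <- df[, ProcessColumns] == "F"               # TRUE wherever a step equals "F"
df$Indicator <- rowSums(d, na.rm = TRUE)       # 1 for every row that contains "F"
df$Indicator[which(df$Indicator %in% 1) - 1] <- "Before"
df$Indicator[which(df$Indicator %in% 1)] <- "After"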
But this flags every row that contains "F", not only the row where "F" is the final authority. Can anyone help?
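A minimal change to the code above, as a sketch: instead of counting "F" anywhere in the row, count it only when the cell to its right is "Stop", which is exactly the "final authority" condition. The names `target`, `m`, `last_is_target` and `after` are illustrative, and the data are rebuilt compactly (without `Time`) so the block is self-contained.

```r
# Compact rebuild of the example data (Time omitted); columns 2:9 are step0..step7.
routes <- c("A", "AB", "ABC", "ABCD", "ABCDE", "ABCDEF", "ABCDEFG",
            "A", "AC", "ACF", "ACFG")
steps <- t(sapply(routes, function(r) {
  x <- c(strsplit(r, "")[[1]], "Stop")
  c(x, rep(NA_character_, 8 - length(x)))
}, USE.NAMES = FALSE))
colnames(steps) <- paste0("step", 0:7)
df <- data.frame(Patient = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2), steps,
                 Score = c(90, 88, 65, 44, 78, 98, 66, 38, 93, 88, 80))

target <- "F"                       # or "C" for another analysis
ProcessColumns <- 2:9
m <- as.matrix(df[, ProcessColumns])

# Compare each cell with its right-hand neighbour: TRUE only where the cell is
# the target AND the next cell is "Stop", i.e. the target is the final authority.
# Comparisons with NA cells give NA or FALSE; na.rm = TRUE ignores the NAs.
last_is_target <- m[, -ncol(m)] == target & m[, -1] == "Stop"
after <- which(rowSums(last_is_target, na.rm = TRUE) > 0)

df$Indicator <- "0"
df$Indicator[after - 1] <- "Before"  # an index of 0 (when after == 1) is ignored
df$Indicator[after] <- "After"
print(df)
```

Like the original code, this marks row `after - 1` without checking that it belongs to the same patient, so for an authority that can be a patient's first step it would also need a `Patient` comparison.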