我假设您的数据框列是打印的,字符串列是
character
类型,非
factor
. 将其转换为
性格
如果他们还没有。(请参见底部的示例数据。)
# first we will create empty FINAL columns
DF_1$Final_1 = NA
DF_1$Final_2 = NA
如果阶段为sub\u stat\u 1,则vlookup Result和Result\u 2来自类型\u 2(来自DF\u 2)
cond = DF_1[, 'Stage'] == "sub_stat_1"
from = "Type_2"
DF_1[cond, "Final_1"] = DF_2[match(DF_1[cond, "Result"], DF_2[, 'ID']), from]
DF_1[cond, "Final_2"] = DF_2[match(DF_1[cond, "Result_2"], DF_2[, 'ID']), from]
如果阶段为status\u 1,则vlookup Result和Type\u 1的Result\u 2(来自DF\u 2)。
cond = DF_1[, 'Stage'] == "status_1"
from = "Type_1"
DF_1[cond, "Final_1"] = DF_2[match(DF_1[cond, "Result"], DF_2[, 'ID']), from]
DF_1[cond, "Final_2"] = DF_2[match(DF_1[cond, "Result_2"], DF_2[, 'ID']), from]
如果stage为status\u 1或sub\u stat\u 1,但Result或Result\u 2为nothing,则在输出数据帧中给出“-”值。
我用缺失的值初始化,
NA
. 我鼓励你离开他们,但如果你真的想,你可以
DF_1[is.na(DF_1)] = "-"
.
将与DF\u 1 Result和Result\u 2相同的其他值分别保留到所需的输出列Final\u 1和Final\u 2。
cond = ! DF_1[, 'Stage'] %in% c("status_1", "sub_stat_1")
DF_1[cond, "Final_1"] = DF_1[cond, "Result"]
DF_1[cond, "Final_2"] = DF_1[cond, "Result_2"]
只要时间中有历元时间,Result和Result\u 2列(如果可能)将其分别转换为所需输出列time\u 2、Final\u 1和Final\u 2中的正常时间。
我把这个留给你-你可以用
as.POSIXct()
在你的大纪元时代,如果你提供原点,但你的整数在我看来太大了。你可能想
format
在将它们插入最后一列之前,您可以控制它们转换为字符时的外观。如果你在这方面需要更多的帮助,可以问一个单独的问题。
DF_1
# Sr.No. Stage Time Result Result_2 Final_1 Final_2
# 1 1 updated_date 1.516869e+12 1516868822361 1516868822350 1516868822361 1516868822350
# 2 2 id 1.516869e+12 ABC - ABC -
# 3 3 engine_date 1.516869e+12 1516868822000 - 1516868822000 -
# 4 4 blocked 1.516869e+12 80000 0 80000 0
# 5 5 updated_date 1.516869e+12 1516868822350 1516866877815 1516868822350 1516866877815
# 6 6 list 1.516869e+12 BCD - BCD -
# 7 7 sub_stat_1 1.516869e+12 AC-12 AC-14 Y YY
# 8 8 status_1 1.516869e+12 AC-25 AC-38 A CC
使用此数据:
DF_1
# Sr.No. Stage Time Result Result_2 Final_1 Final_2
# 1 updated_date 1.516869e+12 1516868822361 1516868822350 1516868822361 1516868822350
# 2 id 1.516869e+12 ABC - ABC -
# 3 engine_date 1.516869e+12 1516868822000 - 1516868822000 -
# 4 blocked 1.516869e+12 80000 0 80000 0
# 5 updated_date 1.516869e+12 1516868822350 1516866877815 1516868822350 1516866877815
# 6 list 1.516869e+12 BCD - BCD -
# 7 sub_stat_1 1.516869e+12 AC-12 AC-14 Y YY
# 8 status_1 1.516869e+12 AC-25 AC-38 A CC
使用此数据:
DF_1 = read.table(text = "Sr.No. Stage Time Result
Result_2
1 updated_date 1516868822411 1516868822361 1516868822350
2 id 1516868822411 ABC -
3 engine_date 1516868822411 1516868822000 -
4 blocked 1516868822411 80000 0
5 updated_date 1516868822398 1516868822350 1516866877815
6 list 1516868822398 BCD -
7 sub_stat_1 1516868779095 AC-12 AC-14
8 status_1 1516868642468 AC-25 AC-38", check.names = F, stringsAsFactors = FALSE, header = T)
DF_2 = read.table(text = "Sr.No. ID Type_1 Type_2
1 AC-12 X Y
2 AC-14 XX YY
3 AC-25 A B
4 AC-38 CC CD", check.names = F, stringsAsFactors = FALSE, header = T)