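I am trying to split a character vector containing messages right in front of the date-time indicator. I was thinking of using strsplit() with a regular expression and perl = TRUE.
Here is some example data: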
TEST <- c("05.10.17, 09:26 - Person One: How about we chill on sunday\n05.10.17, 09:27 - Person One: I could bring some beer\n05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n05.10.17, 09:27 - Person Two: ???\n05.10.17, 09:28 - Person Two: You guys have history?\n05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n")
This is what I have tried so far:
Cut <- unlist(strsplit(TEST,"(?=[0-3][0-9][.][0-9]{2}[.][0-9]{2}[,][ ][0-9]{2}:[0-9]{2})", perl = TRUE))
Cut
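According to this website, the regular expression should cut the string right in front of the date-time indicator. However, the result I get looks like this, with the first character of each timestamp split off into its own element: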
[1] "0"
[2] "5.10.17, 09:26 - Person One: How about we chill on sunday\n"
[3] "0"
[4] "5.10.17, 09:27 - Person One: I could bring some beer\n"
[5] "0"
[6] "5.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n"
[7] "0"
[8] "5.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n"
[9] "0"
[10] "5.10.17, 09:27 - Person Two: ???\n"
[11] "0"
[12] "5.10.17, 09:28 - Person Two: You guys have history?\n"
[13] "0"
[14] "5.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"
This is what the result should look like:
[1] "05.10.17, 09:26 - Person One: How about we chill on sunday\n"
[2] "05.10.17, 09:27 - Person One: I could bring some beer\n"
[3] "05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n"
[4] "05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n"
[5] "05.10.17, 09:27 - Person Two: ???\n"
[6] "05.10.17, 09:28 - Person Two: You guys have history?\n"
[7] "05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"
Note: I cannot split the data at the line breaks, because some messages contain one or more line breaks in the middle of the message.
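For comparison, here is a minimal sketch of one possible way to get the desired pieces without strsplit(): locate the start of every date-time indicator with gregexpr() and cut the string at those positions with substring(). The split-off first character seems to be related to how strsplit() handles zero-length matches, which this approach sidesteps. The names msgs, stamp, starts and ends are only illustrative, the sample is a shortened variant of the data above with a line break added inside the second message, and the sketch assumes that the timestamp pattern occurs only at the start of a message and at least once in the string.

```r
# Shortened, hypothetical variant of the sample data; the second message
# contains a line break in the middle of the message body.
msgs <- "05.10.17, 09:26 - Person One: How about we chill on sunday\n05.10.17, 09:27 - Person One: I could bring\nsome beer\n05.10.17, 09:27 - Person Two: ???\n"

# Same date-time pattern as in the attempt above, without the lookahead
stamp <- "[0-3][0-9][.][0-9]{2}[.][0-9]{2}, [0-9]{2}:[0-9]{2}"

# Start position of every date-time indicator in the string
# (assumes at least one match; gregexpr() returns -1 otherwise)
starts <- as.integer(gregexpr(stamp, msgs, perl = TRUE)[[1]])

# Each message ends one character before the next indicator;
# the last message runs to the end of the string
ends <- c(starts[-1] - 1L, nchar(msgs))

# Cut the string at those positions
Cut <- substring(msgs, starts, ends)
Cut
```

Because the cuts are made only at the timestamp positions, line breaks inside a message stay part of that message, and pasting the pieces back together reproduces the original string.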