Display matching sentences from text typed into a Shiny app text box

quanteda tm shiny r

Vikram Karthic · Tech Community · 7 years ago

I tried kwic:

library(quanteda)
library(tm)
data(crude, package = "tm")
mycorpus <- corpus(crude)

# Replace "company" with the words typed into the text box.
# Recent quanteda versions expect kwic() on tokens rather than a corpus.
kwic(tokens(mycorpus), "company")

Any help appreciated...
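The title asks for whole matching sentences, whereas kwic() returns fixed-width context windows around each hit. Below is a minimal base-R sketch of the sentence version. It assumes that sentences end in ., ! or ? followed by whitespace, and that a sentence should match only when it contains every typed word. matching_sentences() is a hypothetical helper, not part of quanteda or tm.

```r
# Hypothetical helper (not part of quanteda or tm): return each sentence in
# `texts` that contains every word typed into `query`, matched as whole words
# and case-insensitively.
matching_sentences <- function(texts, query) {
  # Split the typed query into words, dropping punctuation and empty pieces
  words <- unlist(strsplit(query, "[^[:alnum:]]+"))
  words <- words[nzchar(words)]
  if (length(words) == 0) return(character(0))
  # Naive sentence split: break on whitespace that follows ., ! or ?
  sentences <- unlist(strsplit(texts, "(?<=[.!?])\\s+", perl = TRUE))
  # Keep a sentence only if every typed word occurs in it as a whole word
  keep <- vapply(sentences, function(s) {
    all(vapply(words, function(w) {
      grepl(paste0("\\b", w, "\\b"), s, ignore.case = TRUE, perl = TRUE)
    }, logical(1)))
  }, logical(1))
  unname(sentences[keep])
}
```

Inside a Shiny server function this could be called from, for example, renderTable() or renderText() on the text box value (input$query here is a hypothetical input ID), so the matches update as the user types.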

1 answer | last activity 7 years ago

mpadge · 7 years ago

I think what you're after is

table(kwic(mycorpus, phrase, join = FALSE)$keyword)

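where phrase simply grows longer as more terms are typed. (This requires quanteda >= 0.99, which also includes a function that may be useful here.) For more general matching, you can convert both the corpus and all typed terms (as they keep growing) into stemmed tokens: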

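library(quanteda)  # re-exports the %>% pipe
data(crude, package = "tm")
# Stem the corpus tokens once (default stemmer: English)
mystems <- corpus(crude) %>% tokens() %>% tokens_wordstem()
# phrase holds the typed text; stem it with the same language as the corpus
phrase <- tokens(phrase, remove_punct = TRUE, remove_symbols = TRUE) %>%
    tokens_wordstem(language = "english") %>%
    as.character()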

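table(kwic(mystems, phrase, join = FALSE)$keyword)

This should then do the same thing, but match only word stems rather than exact words. If you want the number of matches in each document, an *apply (or purrr::map()) call will extract that too.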
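The per-document counts mentioned above can be sketched in base R with vapply(). This is a minimal illustration under stated assumptions: count_matches() is a hypothetical helper, its regex tokenizer is a crude stand-in for quanteda's tokens(), and it matches exact lower-cased words rather than stems. With quanteda itself, tabulating the docname column of the kwic() result should give similar per-document counts.

```r
# Hypothetical helper (not part of quanteda): for each document, count the
# tokens that exactly match any of the typed words (no stemming).
count_matches <- function(texts, words) {
  words <- tolower(words)
  # Crude tokenizer: lower-case, then split on runs of non-alphanumerics
  toks <- strsplit(tolower(texts), "[^[:alnum:]]+")
  # One count per document
  counts <- vapply(toks, function(tk) sum(tk %in% words), integer(1))
  # Carry over document names from the input, if there are any
  names(counts) <- names(texts)
  counts
}

docs <- c(a = "Oil prices rose; oil output fell.", b = "The company said nothing.")
count_matches(docs, c("oil", "company"))  # a = 2, b = 1
```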

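Recommended articles

Bhavya · Extracting the top features by frequency for each document from a DTM in R

7 years ago

Travis Heeter · How to combine terms in a document-term matrix?

7 years ago

Jacek Kotowski · Lemmatization function using a hash dictionary does not work with the tm package in R

7 years ago

Doug Fir · Cannot get tm_map to use the mc.cores argument

7 years ago

jiji · Why can't I use "TermDocumentMatrix"?

7 years ago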

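user1603472 · What are the default control settings for the DocumentTermMatrix function in the tm package?

7 years ago

Adrian del rio rodriguez · Reading multiple local HTML files from a folder in R

7 years ago

Jacek Kotowski · R: how to use a vector in the pattern argument of regexpr()

7 years ago

Rahul Chawla · Assigning weights to different features in R

8 years ago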