
Predicting a numeric variable from a text variable using word embeddings in R

word-embedding nlp r

John · Tech Community · 2 years ago

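I have a text variable containing movie reviews and another variable containing the corresponding ratings. I would like to try to predict the ratings from the review text.

Here is some sample data: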

library(tibble)  # provides tibble()
movie_reviews <- c("I really loved the movie plot", "This movie really sucked",
                   "I really found this movie thought provoking", "ahh what a boring movie",
                   "A wonderful movie, with a wonderful end", "Great action movie: Very thrilling",
                   "Worst movie ever, it never stopped being cheesy",
                   "Enjoying, feelgood movie for the entire family",
                   "I will definitely watch this movie again")
movie_ratings <- c(8, 2, 6, 3, 9, 8.5, 3.5, 9.5, 7.5)
movie_df <- tibble(movie_reviews, movie_ratings)

Thanks in advance.


Oscar Kjell · 2 years ago

For this you can use the text package:

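library(text)
# The text package runs its models through Python; set that up first:
# textrpp_install()      # once per machine
# textrpp_initialize()   # once per R session

# Create word embedding representations of your text
help(textEmbed)  # documentation for all textEmbed() options
reviews_embeddings <- textEmbed(movie_df,
                                model = "bert-base-uncased", # Select the model you want from huggingface
                                layers = 11:12)              # Select which layers you want to use

# Train a ridge regression model that predicts the numeric ratings from the embeddings
# (in newer versions of text, the text-level embeddings are under reviews_embeddings$texts$movie_reviews)
reviews_rating_model <- textTrain(reviews_embeddings$movie_reviews,
                                  movie_df$movie_ratings)
# See the results
reviews_rating_model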

Output:

$results

    Pearson's product-moment correlation

data:  predy_y$predictions and predy_y$y
t = 5.621, df = 7, p-value = 0.0003991
alternative hypothesis: true correlation is greater than 0
95 percent confidence interval:
 0.6785761 1.0000000
sample estimates:
      cor 
0.9047823
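The output is a one-sided Pearson test of the correlation between the model's predicted ratings and the observed ratings (`predy_y$predictions` and `predy_y$y`); df = 7 matches the 9 reviews, since df = n - 2. The base-R sketch below mimics that workflow without the text package, purely as an illustration under stated assumptions: a plain bag-of-words count matrix stands in for the BERT embeddings (so no Python backend is needed), the helpers `ridge_fit()` and `ridge_predict()` are hypothetical, and the penalty is fixed at `lambda = 1` with leave-one-out predictions, whereas `textTrain()` tunes its penalty and uses its own cross-validation setup. Its correlation will therefore not match the 0.905 above. The last lines only check that the reported t statistic and lower confidence bound follow from the reported r with n = 9.

```r
# Base R only. ASSUMPTION: bag-of-words counts stand in for BERT embeddings.
movie_reviews <- c("I really loved the movie plot", "This movie really sucked",
                   "I really found this movie thought provoking", "ahh what a boring movie",
                   "A wonderful movie, with a wonderful end", "Great action movie: Very thrilling",
                   "Worst movie ever, it never stopped being cheesy",
                   "Enjoying, feelgood movie for the entire family",
                   "I will definitely watch this movie again")
movie_ratings <- c(8, 2, 6, 3, 9, 8.5, 3.5, 9.5, 7.5)

# Document-term count matrix: one row per review, one column per word
tokens <- lapply(strsplit(tolower(movie_reviews), "[^a-z]+"),
                 function(w) w[w != ""])
vocab <- sort(unique(unlist(tokens)))
X <- t(vapply(tokens,
              function(w) as.numeric(table(factor(w, levels = vocab))),
              numeric(length(vocab))))

# Hypothetical helper: closed-form ridge regression on centred data,
# b = (Xc'Xc + lambda * I)^-1 Xc'(y - mean(y))
ridge_fit <- function(X, y, lambda) {
  x_mean <- colMeans(X)
  Xc <- sweep(X, 2, x_mean)
  b <- solve(crossprod(Xc) + diag(lambda, ncol(X)), crossprod(Xc, y - mean(y)))
  list(b = b, x_mean = x_mean, y_mean = mean(y))
}

# Hypothetical helper: predict new rows with the fitted coefficients
ridge_predict <- function(fit, X_new) {
  drop(sweep(X_new, 2, fit$x_mean) %*% fit$b) + fit$y_mean
}

# Leave-one-out: each rating is predicted by a model that never saw that review
lambda <- 1  # ASSUMPTION: fixed penalty, not tuned
pred <- vapply(seq_along(movie_ratings), function(i) {
  fit <- ridge_fit(X[-i, , drop = FALSE], movie_ratings[-i], lambda)
  ridge_predict(fit, X[i, , drop = FALSE])
}, numeric(1))

# Same kind of test as in the output above: one-sided Pearson correlation
cor.test(pred, movie_ratings, alternative = "greater")

# The reported t and lower bound follow from the reported r with n = 9
r <- 0.9047823
n <- 9
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)              # t = r * sqrt(n - 2) / sqrt(1 - r^2)
lower  <- tanh(atanh(r) - qnorm(0.95) / sqrt(n - 3))   # one-sided Fisher z bound
```

Ridge regression is a sensible default for this problem because there are many more predictors (embedding dimensions, or words here) than reviews; the penalty term keeps `Xc'Xc + lambda * I` invertible where ordinary least squares would fail.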
