How to pass weights when using sklearn GridSearchCV with a Pipeline

grid-search cross-validation pipeline scikit-learn python

mamafoku · 6 years ago

I am working on a text classification model and using a Pipeline together with GridSearchCV (grid search with cross-validation). Here is the code snippet:

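from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Stopwords_X, stopwords and scoring are defined elsewhere (not shown here).
count_vec = CountVectorizer(ngram_range=(1, 2), stop_words=Stopwords_X, min_df=0.01)
TFIDF_Transformer = TfidfTransformer(sublinear_tf=True, norm='l2')

my_pipeline = Pipeline([('Count_Vectorizer', count_vec),
                        ('TF_IDF', TFIDF_Transformer),
                        ('MultiNomial_NB', MultinomialNB())])

# Keys follow the <step name>__<parameter name> convention.
param_grid = {'Count_Vectorizer__ngram_range': [(1, 1), (1, 2), (2, 2)],
              'Count_Vectorizer__stop_words': [Stopwords_X, stopwords],
              'Count_Vectorizer__min_df': [0.001, 0.005, 0.01],
              'TF_IDF__sublinear_tf': [True, False],
              'TF_IDF__norm': ['l2'],
              'TF_IDF__smooth_idf': [True, False],
              'MultiNomial_NB__alpha': [0.2, 0.4, 0.5, 0.6],
              'MultiNomial_NB__fit_prior': [True, False]}

# Grid Search CV with pipeline
model = GridSearchCV(estimator=my_pipeline, param_grid=param_grid,
                     scoring=scoring, cv=4, verbose=1, refit=False)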

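However, because the data is highly imbalanced, I want to pass sample weights to the MultinomialNB classifier in the pipeline. I know I can pass weights to a step inside the pipeline like this: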

model.fit(Data_Labeled['Clean-Merged-Final'], 
          Data_Labeled['Labels'],MultiNomial_NB__sample_weight=weights)
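
The `weights` array passed to `fit` is not defined in the snippet. As a minimal sketch, assuming the goal is to weight each sample inversely to its class frequency, one option is scikit-learn's `compute_sample_weight`. The toy `labels` array is purely illustrative and stands in for `Data_Labeled['Labels']`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical, imbalanced toy labels standing in for Data_Labeled['Labels'].
labels = np.array([0, 0, 0, 1])

# "balanced" assigns each sample n_samples / (n_classes * count_of_its_class).
weights = compute_sample_weight(class_weight="balanced", y=labels)

# One weight per sample, so the array lines up row by row with the full data set.
print(weights)
```

The result has exactly one entry per row of the full data set, which is the shape `fit` expects before any cross-validation split happens.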

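My question is: why does this run without raising a shape error? The weights are passed only to the final step of the pipeline (the MultiNomial_NB classifier), while cross-validation partitions the X/y data fed into the pipeline.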

1 Answer | 6 years ago

Vivek Kumar · 6 years ago

GridSearchCV takes care of splitting sample_weight appropriately, according to the cross-validation iterator.
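
One way to check this claim empirically is to record what the final step actually receives in each fold. The sketch below is only an illustration on random toy data: `RecordingNB` and the `seen` list are hypothetical helpers, not part of scikit-learn, and the step names `tfidf` and `nb` are made up for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

seen = []  # (rows in X, number of weights) recorded on every fit call


class RecordingNB(MultinomialNB):
    """Hypothetical helper: MultinomialNB that logs the sizes it is fitted on."""

    def fit(self, X, y, sample_weight=None):
        n_weights = None if sample_weight is None else len(sample_weight)
        seen.append((X.shape[0], n_weights))
        return super().fit(X, y, sample_weight=sample_weight)


rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(40, 6))    # toy count matrix
y = np.array([0] * 32 + [1] * 8)       # imbalanced toy labels
weights = np.where(y == 1, 4.0, 1.0)   # one weight per row of the full data

pipe = Pipeline([("tfidf", TfidfTransformer()), ("nb", RecordingNB())])
search = GridSearchCV(pipe, {"nb__alpha": [0.5, 1.0]}, cv=4, refit=False)

# The full-length weights go in once; each fold should receive a matching slice.
search.fit(X, y, nb__sample_weight=weights)

for n_rows, n_weights in seen:
    print(n_rows, n_weights)
```

If the weights were not sliced per fold, the classifier would be handed 40 weights for a smaller training subset and fail with a shape mismatch.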

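GridSearchCV calls the _fit_and_score() function and passes it the indices of the training data. Up to that point, fit_params still refer to the whole data set. That function in turn calls _index_param_value, which splits sample_weight (or any other fit_params) on this line: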

     ...
     return safe_indexing(v, indices)
     ...
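
The indexing step described above can be sketched as follows. This is a simplified stand-in written for illustration, not scikit-learn's actual private code: `index_fit_params` is a hypothetical name, and `some_flag` is a made-up non-array parameter included to show that only per-sample values are sliced.

```python
import numpy as np
from sklearn.model_selection import KFold


def index_fit_params(n_samples, fit_params, indices):
    """Hypothetical helper: slice per-sample fit params, pass others through."""
    sliced = {}
    for name, value in fit_params.items():
        is_per_sample = (
            hasattr(value, "__len__")
            and not isinstance(value, str)
            and len(value) == n_samples
        )
        sliced[name] = np.asarray(value)[indices] if is_per_sample else value
    return sliced


X = np.arange(24).reshape(12, 2)
weights = np.linspace(1.0, 2.0, num=12)
fit_params = {"MultiNomial_NB__sample_weight": weights, "some_flag": True}

fold_sizes = []
for train_idx, _ in KFold(n_splits=4).split(X):
    params = index_fit_params(len(X), fit_params, train_idx)
    sliced_weights = params["MultiNomial_NB__sample_weight"]
    # The weights handed to this fold are exactly those of its training rows.
    assert np.array_equal(sliced_weights, weights[train_idx])
    fold_sizes.append((len(train_idx), len(sliced_weights)))

print(fold_sizes)
```

Because the same training indices select both the rows of X and the entries of the weight array, the two always stay aligned inside each fold.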

This has been discussed in the following issue:
