
sklearn stratified k-fold CV with linear models such as ElasticNetCV

  • JE_Muc  ·  5 years ago

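    Cross-validation with sklearn is simple and straightforward. However, setting cv=5 in a linear CV model such as ElasticNetCV or LassoCV gives a KFold CV. For various reasons, I would like to use StratifiedKFold instead. From the documentation, it looks like any cross-validation splitter can be passed to cv= .

    Passing cv=KFold(5) works as expected, but cv=StratifiedKFold(5) raises:

    ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

    I know I could use cross_val_score after fitting, but I would like to pass StratifiedKFold directly as cv to the linear model.

    My minimal working example is: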

    from sklearn.linear_model import ElasticNetCV
    from sklearn.model_selection import KFold, StratifiedKFold
    import numpy as np
    
    x = np.arange(100, dtype=np.float64).reshape(-1, 1)
    y = np.arange(100) + np.random.rand(100)
    
    # KFold default implementation:
    model_default = ElasticNetCV(cv=5)
    model_default.fit(x, y)  # works fine
    # KFold given as cv explicitly:
    model_kfexp = ElasticNetCV(cv=KFold(5))
    model_kfexp.fit(x, y)  # also works fine
    
    # StratifiedKFold given as cv explicitly:
    model_skf = ElasticNetCV(cv=StratifiedKFold(5))
    model_skf.fit(x, y)  # THIS RAISES THE ERROR
    

    How can I pass StratifiedKFold as cv here?

  • Sergey Bushmanov  ·  5 years ago

    The root of your problem is this line:

    y = np.arange(100) + np.random.rand(100)
    

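    StratifiedKFold stratifies on class labels, so it cannot split on a continuous target, hence your error. Try changing this line to a categorical target and your code will run happily: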

    from sklearn.linear_model import ElasticNetCV
    from sklearn.model_selection import KFold, StratifiedKFold
    import numpy as np
    
    x = np.arange(100, dtype=np.float64).reshape(-1, 1)
    y = np.random.choice([0,1], size=100)
    
    # KFold default implementation:
    model_default = ElasticNetCV(cv=5)
    model_default.fit(x, y)  # works fine
    # KFold given as cv explicitly:
    model_kfexp = ElasticNetCV(cv=KFold(5))
    model_kfexp.fit(x, y)  # also works fine
    
    # StratifiedKFold given as cv explicitly:
    model_skf = ElasticNetCV(cv=StratifiedKFold(5))
    model_skf.fit(x, y)  # no ERROR
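
To see why the two targets behave differently, it can help to look at how sklearn classifies them. The sketch below assumes that StratifiedKFold relies on `sklearn.utils.multiclass.type_of_target` for this check, which matches the wording of the error message; the seeded generator `rng` and the variable names are illustrative only.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

rng = np.random.default_rng(0)

# Regression target, built like the one in the question
y_continuous = np.arange(100) + rng.random(100)
# Class labels, like the modified target in the answer
y_binary = rng.integers(0, 2, size=100)

kind_continuous = type_of_target(y_continuous)
kind_binary = type_of_target(y_binary)

# Only 'binary' and 'multiclass' targets can be stratified
print(kind_continuous, kind_binary)
```

Note that replacing `y` with 0/1 labels makes the error go away but also changes what is being fitted, so it is mainly useful for confirming the diagnosis.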
    

    Note

    If you are sampling continuous data, use KFold or whichever other splitter suits your needs.
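
As a minimal sketch of the KFold route for the continuous target from the question: the `shuffle=True` and `random_state=0` settings are my own assumptions, not part of the original answer. They are chosen here because `y` is sorted, so unshuffled folds would each hold out one contiguous range of the target.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
x = np.arange(100, dtype=np.float64).reshape(-1, 1)
y = np.arange(100) + rng.random(100)

# Shuffled folds, so each fold covers the whole range of y
cv = KFold(n_splits=5, shuffle=True, random_state=0)

model = ElasticNetCV(cv=cv)
model.fit(x, y)

# Regularization strength selected by cross-validation
print(model.alpha_)
```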

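    Note 2

    If you do want stratification on a continuous target, you can emulate it by binning the target with pandas.cut and passing a generator of (train_id, test_id) pairs to cv: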

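    import numpy as np
    import pandas as pd
    from sklearn.linear_model import ElasticNetCV
    from sklearn.model_selection import StratifiedKFold
    
    x = np.arange(100, dtype=np.float64).reshape(-1, 1)
    y = np.arange(100) + np.random.rand(100)
    
    # Bin the continuous target into 10 equal-width classes to stratify on
    y_cat = pd.cut(y, 10, labels=range(10))
    # split() yields (train_id, test_id) index pairs
    skf_gen = StratifiedKFold(5).split(x, y_cat)
    
    model_skf = ElasticNetCV(cv=skf_gen)
    model_skf.fit(x, y)  # no ERROR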
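
A variation on the binning idea, offered as a sketch rather than a recommendation: the helper `binned_stratified_splits` is hypothetical, and it swaps in `pandas.qcut` (quantile bins) in place of `pandas.cut` so that every bin holds the same number of samples. It also materializes the splits as a list, which makes it possible to check that each test fold really covers every bin before handing them to `cv`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import StratifiedKFold


def binned_stratified_splits(x, y, n_splits=5, n_bins=10, seed=0):
    """Hypothetical helper: stratify a continuous target by quantile bin."""
    # labels=False returns integer bin codes, usable as class labels
    y_bins = pd.qcut(y, q=n_bins, labels=False)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Materialize the generator so the splits can be inspected and reused
    return list(skf.split(x, y_bins)), y_bins


rng = np.random.default_rng(0)
x = np.arange(100, dtype=np.float64).reshape(-1, 1)
y = np.arange(100) + rng.random(100)

splits, y_bins = binned_stratified_splits(x, y)

# Number of distinct bins represented in each test fold
bins_per_fold = [len(np.unique(y_bins[test_idx])) for _, test_idx in splits]
print(bins_per_fold)

# A list of (train_idx, test_idx) pairs is accepted by cv
model = ElasticNetCV(cv=splits)
model.fit(x, y)
```

The number of bins is a trade-off: more bins follow the target distribution more closely, but each bin needs at least `n_splits` samples for the stratification to work.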