
Predicting confidence intervals with statsmodels

Stefano Potter · 6 years ago

    I am building a linear model like this:

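    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import summary_table
    import numpy as np
    import random
    
    x = np.arange(1,101, 1)                   # predictor: 1, 2, ..., 100
    y = random.sample(range(1,1000), 100)     # 100 distinct random responses
    
    X = sm.add_constant(x)                    # prepend a column of ones for the intercept
    regr = sm.OLS(y, X)
    fit = regr.fit()
    
    # data holds per-observation fitted values, standard errors and 95% intervals
    st, data, ss2 = summary_table(fit, alpha=0.05)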
    

    I can get the standard errors and confidence intervals from data.

    Now I want to get the confidence intervals for predictions on new data, and I'm trying this:

    new_data = [102, 103, 104, 105]
    
    fit.get_prediction(new_data)
    

    But this raises:

    Traceback (most recent call last):
    
      File "<ipython-input-168-372d2610946d>", line 14, in <module>
        fit.get_prediction(new)
    
      File "/Users/spotter/anaconda3/lib/python3.6/site-packages/statsmodels/regression/linear_model.py", line 2138, in get_prediction
        weights=weights, row_labels=row_labels, **kwds)
    
      File "/Users/user/anaconda3/lib/python3.6/site-packages/statsmodels/regression/_prediction.py", line 163, in get_prediction
        predicted_mean = self.model.predict(self.params, exog, **pred_kwds)
    
      File "/Users/user/anaconda3/lib/python3.6/site-packages/statsmodels/regression/linear_model.py", line 261, in predict
        return np.dot(exog, params)
    
    ValueError: shapes (1,4) and (2,) not aligned: 4 (dim 1) != 2 (dim 0)
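    The failing call is np.dot(exog, params) at the bottom of the traceback. The model was fitted on two columns (constant and x), so params has shape (2,), while the four new values arrive as one row of shape (1, 4). A minimal NumPy sketch of the mismatch; the parameter values are made up, and np.atleast_2d is assumed here to mirror how the flat list ends up as a single row.

```python
import numpy as np

# Hypothetical fitted parameters: [intercept, slope], shape (2,)
params = np.array([500.0, 0.5])
new_data = [102, 103, 104, 105]

# A flat list becomes one row with four columns, shape (1, 4)
exog_wrong = np.atleast_2d(new_data)
mismatch_raised = False
try:
    np.dot(exog_wrong, params)
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

# Prepending a column of ones gives one row per new point, shape (4, 2)
exog_right = np.column_stack([np.ones(len(new_data)), new_data])
print(exog_right.shape)
print(np.dot(exog_right, params))  # intercept + slope * x for each new x
```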
    
Jan K · 6 years ago

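    Since you trained the model with an intercept (the constant column added by sm.add_constant), you also need to include it when building new_data, i.e. add a column of ones.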
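    In formula terms (standard OLS results, added for reference rather than taken from the answer), each new point enters the model as the row x0 = (1, x_new), which is why the column of ones is required. The predicted mean and its confidence interval are:

```latex
\begin{aligned}
\mathbf{x}_0 &= (1,\ x_{\text{new}})^\top \\
\hat{y}_0 &= \mathbf{x}_0^\top \hat{\boldsymbol\beta} = \hat\beta_0 + \hat\beta_1 x_{\text{new}} \\
\operatorname{se}(\hat{y}_0) &= \sqrt{\mathbf{x}_0^\top \hat{\Sigma}\,\mathbf{x}_0},
  \qquad \hat{\Sigma} = \hat\sigma^2 (X^\top X)^{-1} \\
\text{CI}_{1-\alpha}(\hat{y}_0) &= \hat{y}_0 \pm t_{1-\alpha/2,\,n-p}\,\operatorname{se}(\hat{y}_0)
\end{aligned}
```

    Here n = 100 observations, p = 2 parameters, and the estimated covariance matrix of the coefficients is what fit.cov_params() returns. The interval for an individual new observation (conf_int(obs=True)) adds the residual variance to the quantity under the square root, so it is always wider.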

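    # add_constant prepends the column of ones that the fitted model expects
    new_data = sm.add_constant([102, 103, 104, 105])
    result = fit.get_prediction(new_data)
    result.conf_int()  # 95% CI for the predicted mean, one row per new point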