
How do I find the slope of a linear regression fitted to the trend of my data?

  • Souvik Ray  · 6 years ago

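    I have time series data from which I can extract the trend. I now need to fit a regression line that best fits the trend data, and I want to know whether its slope is positive, negative, or zero (flat).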

     date,cpu
    2018-02-10 11:52:59.342269+00:00,6.0
    2018-02-10 11:53:04.006971+00:00,6.0
    2018-02-10 22:35:33.438948+00:00,4.0
    2018-02-10 22:35:37.905242+00:00,4.0
    2018-02-11 12:01:00.663084+00:00,4.0
    2018-02-11 12:01:05.136107+00:00,4.0
    2018-02-11 12:31:00.228447+00:00,5.0
    2018-02-11 12:31:04.689054+00:00,5.0
    2018-02-11 13:01:00.362877+00:00,5.0
    2018-02-11 13:01:04.824231+00:00,5.0
    2018-02-11 23:42:40.304334+00:00,0.0
    2018-02-11 23:44:27.357619+00:00,0.0
    2018-02-12 01:38:25.012175+00:00,7.0
    2018-02-12 01:53:39.721800+00:00,8.0
    2018-02-12 01:53:53.310947+00:00,8.0
    2018-02-12 01:56:37.657977+00:00,8.0
    2018-02-12 01:56:45.133701+00:00,8.0
    2018-02-12 04:49:36.028754+00:00,9.0
    2018-02-12 04:49:40.097157+00:00,9.0
    2018-02-12 07:20:52.148437+00:00,9.0
    ...          ...                 ...
    

    First I extract the trend from the data. This is the code that does it:

    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("test_forecast/cpu_data.csv")
    df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
    df.set_index("date", inplace=True)
    df = df.resample('D').mean().interpolate(method='linear', axis=0).fillna(0)
    
    X = df.index.strftime('%Y-%m-%d')
    Y = sm.tsa.seasonal_decompose(df["cpu"]).trend.interpolate(method='linear', axis=0).fillna(0).values
    

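    So X holds the date of each day and Y holds the trend value for each day. Now I want to apply linear regression to find the regression line and work out whether its slope is positive, negative, or zero. I tried the code below: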

    model = sm.OLS(y,X, missing='drop')
    results = model.fit()
    print(results)
    

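    I expected the results variable to hold values such as the dependent and independent variables, the slope, or the intercept. Instead I get this error: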

    Traceback (most recent call last):
      File "/home/souvik/PycharmProjects/Pandas/test11.py", line 37, in <module>
        model = sm.OLS(y,X, missing='drop')
      File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/regression/linear_model.py", line 817, in __init__
        hasconst=hasconst, **kwargs)
      File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/regression/linear_model.py", line 663, in __init__
        weights=weights, hasconst=hasconst, **kwargs)
      File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/regression/linear_model.py", line 179, in __init__
        super(RegressionModel, self).__init__(endog, exog, **kwargs)
      File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/model.py", line 212, in __init__
        super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
      File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/model.py", line 64, in __init__
        **kwargs)
      File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/model.py", line 87, in _handle_data
        data = handle_data(endog, exog, missing, hasconst, **kwargs)
      File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/data.py", line 633, in handle_data
        **kwargs)
      File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/data.py", line 79, in __init__
        self._handle_constant(hasconst)
      File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/data.py", line 131, in _handle_constant
        ptp_ = self.exog.ptp(axis=0)
    TypeError: cannot perform reduce with flexible type
    

    I found the snippet above on a website, but I cannot get it to work for my case. What am I doing wrong?

    1 answer

  • Silenced Temporarily  · 6 years ago

    Your problem is here:

    X = df.index.strftime('%Y-%m-%d')

    X is therefore an array of strings, so it cannot be used to fit a regression. You want something like

    X = (df.index.astype(np.int64) // 10**9).values

    which converts the datetimes to Unix seconds.
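    To make the difference between the two versions of X concrete, here is a minimal sketch. The index `idx` is a made-up stand-in for `df.index` after the daily resample, and `x_alt` is an extra cross-check that is not part of the answer. Note that dividing by `10**9` assumes the index stores nanoseconds, which is the long-standing pandas default but worth confirming in your own environment.

```python
import numpy as np
import pandas as pd

# Hypothetical daily index standing in for df.index after resample('D').
idx = pd.date_range("2018-02-10", periods=5, freq="D", tz="UTC")

# strftime returns text labels, which a regression cannot treat as numbers.
x_text = idx.strftime("%Y-%m-%d")
print(pd.api.types.is_numeric_dtype(x_text))  # the labels are not numeric

# The expression from the answer: integer epoch values, floor-divided by 10**9.
# This yields seconds only if the index resolution is nanoseconds (assumption).
x_seconds = (idx.astype(np.int64) // 10**9).values
print(x_seconds.dtype)

# Cross-check that does not depend on the stored resolution:
# elapsed time since the Unix epoch, counted in whole seconds.
epoch = pd.Timestamp("1970-01-01", tz="UTC")
x_alt = ((idx - epoch) // pd.Timedelta("1s")).values
print(x_alt[:2])
```

    Either numeric array can be passed to the regression; the text labels cannot.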

    Alternatively, if you would rather have X be something like "days since the initial value", you can use

    start_date = df.index[0]
    X = (df.index - start_date).days.values
    

    Either way, you will also want to print results.summary() rather than results.