代码之家  ›  专栏  ›  技术社区  ›  Dan

MySQL多变量线性回归

  •  0
  • Dan  · 技术社区  · 15 年前

    我正在尝试对MySQL5.0数据库中的数据进行多变量(9个变量)线性回归(结果值字段只有2个可能的值,1和0)。

    我做了一些搜索,发现我可以使用:

    mysql> SELECT
        -> @n := COUNT(score) AS N,
        -> @meanX := AVG(age) AS "X mean",
        -> @sumX := SUM(age) AS "X sum",
        -> @sumXX := SUM(age*age) "X sum of squares",
        -> @meanY := AVG(score) AS "Y mean",
        -> @sumY := SUM(score) AS "Y sum",
        -> @sumYY := SUM(score*score) "Y sum of square",
        -> @sumXY := SUM(age*score) AS "X*Y sum"
    

    为了得到许多基本的回归变量,但是我真的不想为9个变量的每一个组合输入这个。我能找到的关于如何对多变量进行回归的所有来源都需要 矩阵运算 .我可以用MySQL做矩阵运算吗,或者有其他方法做9变量线性回归吗?

    我应该先从MySQL中导出数据吗?它有大约80000行,所以可以移动它,只是不确定我还应该使用什么。

    谢谢, 丹

    2 回复  |  直到 15 年前
        1
  •  1
  •   sibidiba    15 年前

    将这些数据存储在MySQL中是很好的,但是您可以从一种可以访问数据库的语言中处理这些数据。伪代码:

    variables = [ 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I' ];
    
    for X in $variables do
        for Y in $variables do
            query = 'SELECT
                @'+$X+$Y+' := COUNT(score) AS '+$X+$Y+',
                @mean'+$X+' := AVG(age) AS "X mean",
                @sum'+$X+' := SUM(age) AS "X sum",
                @sum'+$X+$X+' := SUM(age*age) "X sum of squares",
                @mean'+$Y+' := AVG(score) AS "Y mean",
                @sum'+$Y+' := SUM(score) AS "Y sum",
                @sum'+$Y+$Y+' := SUM(score*score) "Y sum of square",
                @sum'+$X+$Y+' := SUM(age*score) AS "X*Y sum"';
            db_execute(query);
        done
    done
    

    但是为什么不将结果存储在表中呢?更适合数据库。

    for X in $variables do
        for Y in $variables do
            query = 'INSERT INTO regression SELECT FROM measurements
                "'+$X+'" AS X
                "'+$Y+'" AS Y
                score AS valX
                age AS valY
                COUNT(score) AS N,
                AVG(age) AS meanX,
                SUM(age) AS sumX,
                SUM(age*age) squareX,
                AVG(score) AS meanY,
                SUM(score) AS sumY,
                SUM(score*score) squareY,
                SUM(age*score) AS sumXY';
            db_execute(query);
        done
    done
    

    在X列和Y列上分别放置索引。

        2
  •  1
  •   TheSteve0    15 年前

    我建议将数据移出MySQL并移入R中。使用1/0响应数据,逻辑回归更为合适,而且它不是您正在实现的简单平方和。

    http://en.wikipedia.org/wiki/Logistic_regression

    这似乎很好地展示了如何解决后勤问题。

    http://www.omidrouhani.com/research/logisticregression/html/logisticregression.htm#_Toc147483467