
scipy.stats.ttest_ind without arrays (python)

scipy statistics numpy python

Keith · 10 years ago

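I have done a number of calculations to estimate the mean (μ), standard deviation (σ) and N for my two samples. Because of the many approximations involved, I don't have the arrays that scipy.stats.ttest_ind expects as input, but I would like to use these values to perform a Welch's t-test. Is there a way to do this in Python?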


rroowwllaanndd · 10 years ago

Here is an implementation based on this:

import numpy as np
import scipy.stats as stats


def welch_t_test(mu1, s1, N1, mu2, s2, N2):
    """Two-sided Welch's t-test from summary statistics.

    mu: sample mean, s: sample standard deviation (ddof=1), N: sample size.
    """
    # Construct arrays to make the calculations more succinct.
    N_i = np.array([N1, N2])
    dof_i = N_i - 1
    v_i = np.array([s1, s2]) ** 2
    # Calculate the t statistic and the Welch-Satterthwaite degrees of freedom,
    # then use scipy to find the two-sided p-value.
    t = (mu1 - mu2) / np.sqrt(np.sum(v_i / N_i))
    dof = (np.sum(v_i / N_i) ** 2) / np.sum((v_i ** 2) / ((N_i ** 2) * dof_i))
    p = stats.distributions.t.sf(np.abs(t), dof) * 2
    return t, p

It produces essentially the same results as scipy.stats.ttest_ind with equal_var=False:

sample1 = np.random.rand(10)
sample2 = np.random.rand(15)
result_test = welch_t_test(np.mean(sample1), np.std(sample1, ddof=1), sample1.size,
                           np.mean(sample2), np.std(sample2, ddof=1), sample2.size)
result_scipy = stats.ttest_ind(sample1, sample2, equal_var=False)
print(np.allclose(result_test, result_scipy))
# True
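For reference, the two formulas inside welch_t_test are the usual Welch statistic and the Welch-Satterthwaite degrees of freedom. Written out with $v_i = s_i^2$ (the sample variances), they are:

```latex
t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{v_1}{N_1} + \dfrac{v_2}{N_2}}},
\qquad
\nu = \frac{\left(\dfrac{v_1}{N_1} + \dfrac{v_2}{N_2}\right)^2}
           {\dfrac{v_1^2}{N_1^2 (N_1 - 1)} + \dfrac{v_2^2}{N_2^2 (N_2 - 1)}},
\qquad
p = 2\,\Pr\left(T_\nu > |t|\right)
```

Here $T_\nu$ is a Student t variable with $\nu$ degrees of freedom, which is what `stats.distributions.t.sf(np.abs(t), dof) * 2` evaluates.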

Josef · 9 years ago

As an update: this functionality has been available in scipy.stats since version 0.16.0:

http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.ttest_ind_from_stats.html

scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=True)
T-test for means of two independent samples from descriptive statistics.

This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values.
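Since the question asks for Welch's test specifically, note that `equal_var=False` has to be passed (the default is `True`). A minimal check, using randomly generated data only so that the summary-statistic result can be compared with `ttest_ind` on the raw arrays; it assumes `std1`/`std2` are sample standard deviations (ddof=1):

```python
import numpy as np
from scipy import stats

# Hypothetical data, used only to verify the summary-statistic version
rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, size=10)
x2 = rng.normal(0.5, 2.0, size=15)

# Welch's t-test from summary statistics (equal_var=False).
# std1/std2 are sample standard deviations computed with ddof=1.
res_summ = stats.ttest_ind_from_stats(
    mean1=x1.mean(), std1=x1.std(ddof=1), nobs1=x1.size,
    mean2=x2.mean(), std2=x2.std(ddof=1), nobs2=x2.size,
    equal_var=False,
)

# The same test computed from the raw arrays, for comparison
res_raw = stats.ttest_ind(x1, x2, equal_var=False)

print(res_summ.statistic, res_summ.pvalue)
print(np.allclose([res_summ.statistic, res_summ.pvalue],
                  [res_raw.statistic, res_raw.pvalue]))  # True
```

With only estimates in hand, the values go in directly, e.g. `stats.ttest_ind_from_stats(mu1, s1, N1, mu2, s2, N2, equal_var=False)`.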

Josef · 10 years ago

I have written t-test and z-test functions for statsmodels that take summary statistics as input.

These are mainly internal shortcuts to avoid code duplication, and they are not well documented.

For example: http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.weightstats._tstat_generic.html

Edit: in reply to a comment

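The function only does the core calculation; the standard deviation of the difference in means, which depends on the assumption made about the variances (pooled or separate), is computed in the calling method: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/stats/weightstats.py#L713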
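The division of labor described above (a core routine that only needs the standard deviation of the difference and the degrees of freedom, with each caller supplying those under its own variance assumption) can be sketched as follows. The names `tstat_core`, `ttest_pooled` and `ttest_welch` are hypothetical and are not the statsmodels API; the only library call relied on is `scipy.stats.t.sf`.

```python
import numpy as np
from scipy import stats


def tstat_core(mean1, mean2, std_diff, dof):
    """Hypothetical core routine: t statistic and two-sided p-value,
    given the std of the mean difference and the degrees of freedom."""
    tstat = (mean1 - mean2) / std_diff
    pvalue = 2 * stats.t.sf(np.abs(tstat), dof)
    return tstat, pvalue


def ttest_pooled(mean1, var1, n1, mean2, var2, n2):
    """Hypothetical caller, equal variances (var1, var2 use ddof=1)."""
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / dof
    std_diff = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return tstat_core(mean1, mean2, std_diff, dof)


def ttest_welch(mean1, var1, n1, mean2, var2, n2):
    """Hypothetical caller, unequal variances (Welch-Satterthwaite dof)."""
    se1, se2 = var1 / n1, var2 / n2
    std_diff = np.sqrt(se1 + se2)
    dof = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return tstat_core(mean1, mean2, std_diff, dof)


# Hypothetical summary statistics: mean, variance (ddof=1), nobs
print(ttest_pooled(5.0, 1.2 ** 2, 12, 4.1, 2.0 ** 2, 20))
print(ttest_welch(5.0, 1.2 ** 2, 12, 4.1, 2.0 ** 2, 20))
```

Only the two callers know about the variance assumption, so adding another variant (for example a z-test) would not require touching `tstat_core`.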

Edit

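Here is an example of how to use the methods of the CompareMeans class, which include the t-test based on summary statistics. We need to create a class that holds the relevant summary statistics as attributes. At the end there is a function that just wraps the relevant calls.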

"""
Created on Wed Jul 23 05:47:34 2014
Author: Josef Perktold
License: BSD-3

"""

import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import CompareMeans, ttest_ind


class SummaryStats(object):

    def __init__(self, nobs, mean, std):
        self.nobs = nobs
        self.mean = mean
        self.std = std
        self._var = std**2

np.random.seed(123)
nobs = 20
x1 = 1 + np.random.randn(nobs)
x2 = 1 + 1.5 * np.random.randn(nobs)

print(stats.ttest_ind(x1, x2, equal_var=False))
print(ttest_ind(x1, x2, usevar='unequal'))

# x.std(0) is NumPy's default ddof=0 standard deviation along axis 0
s1 = SummaryStats(x1.shape[0], x1.mean(0), x1.std(0))
s2 = SummaryStats(x2.shape[0], x2.mean(0), x2.std(0))

print(CompareMeans(s1, s2).ttest_ind(usevar='unequal'))


def ttest_ind_summ(summ1, summ2, usevar='unequal'):
    """t-test for equality of means based on summary statistics

    Parameters
    ----------
    summ1, summ2 : tuples of (nobs, mean, std)
        summary statistics for the two samples, with std computed
        using ddof=0 (as x.std(0) does above)

    """

    s1 = SummaryStats(*summ1)
    s2 = SummaryStats(*summ2)
    return CompareMeans(s1, s2).ttest_ind(usevar=usevar)

print(ttest_ind_summ((x1.shape[0], x1.mean(0), x1.std(0)),
                     (x2.shape[0], x2.mean(0), x2.std(0)),
                     usevar='unequal'))

''' result
(array(1.1590347327654558), 0.25416326823881513)
(1.1590347327654555, 0.25416326823881513, 35.573591346616553)
(1.1590347327654558, 0.25416326823881513, 35.57359134661656)
(1.1590347327654558, 0.25416326823881513, 35.57359134661656)
'''
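The script passes `x.std(0)`, i.e. NumPy's default ddof=0 standard deviation, and still reproduces scipy's Welch result, which suggests `CompareMeans` expects the ddof=0 convention here. Estimates given as sample standard deviations (ddof=1), as in the question, would then need rescaling first. A minimal sketch; `std_ddof1_to_ddof0` is a hypothetical helper, not part of statsmodels or scipy:

```python
import numpy as np


def std_ddof1_to_ddof0(std_ddof1, nobs):
    """Hypothetical helper: rescale a ddof=1 standard deviation to ddof=0.

    Uses var_ddof0 = var_ddof1 * (nobs - 1) / nobs.
    """
    return std_ddof1 * np.sqrt((nobs - 1) / nobs)


# Check against NumPy on a small example array
x = np.array([2.0, 4.0, 4.0, 5.0, 7.0])
print(std_ddof1_to_ddof0(x.std(ddof=1), x.size), x.std())
```

The rescaled value could then be used as, for example, `SummaryStats(N1, mu1, std_ddof1_to_ddof0(s1, N1))` in the script above.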