
scipy.stats.ttest_ind without arrays (Python)

  •  3
  • Keith  · 10 years ago

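    I have done a number of calculations to estimate the mean, standard deviation, and N of two samples. Because of the many approximations involved, I do not have the arrays that scipy.stats.ttest_ind expects as input for Welch's t-test. Is there a way to do this in Python?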

    3 answers
        1
  •  3
  •   rroowwllaanndd    10 years ago

    Here is a simple function based on this:

    import scipy.stats as stats
    import numpy as np
    
    def welch_t_test(mu1, s1, N1, mu2, s2, N2):
      # Construct arrays to make the calculations more succinct.
      N_i = np.array([N1, N2])
      dof_i = N_i - 1
      v_i = np.array([s1, s2]) ** 2
      # Calculate t-stat, degrees of freedom, use scipy to find p-value.
      t = (mu1 - mu2) / np.sqrt(np.sum(v_i / N_i))
      dof = (np.sum(v_i / N_i) ** 2) / np.sum((v_i ** 2) / ((N_i ** 2) * dof_i))
      p = stats.distributions.t.sf(np.abs(t), dof) * 2
      return t, p
    

    It gives nearly identical results:

    sample1 = np.random.rand(10)
    sample2 = np.random.rand(15)
    result_test = welch_t_test(np.mean(sample1), np.std(sample1, ddof=1), sample1.size,
                               np.mean(sample2), np.std(sample2, ddof=1), sample2.size)
    result_scipy = stats.ttest_ind(sample1, sample2, equal_var=False)
    np.allclose(result_test, result_scipy)
    True
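
For reference, the two quantities computed in `welch_t_test` can be written out explicitly. This is the standard form of Welch's statistic and the Welch-Satterthwaite degrees of freedom, using the function's argument names; the symbol $\nu$ for `dof` and $F_{t,\nu}$ for the CDF of Student's t distribution are notation introduced here, not part of the answer.

```latex
t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}\right)^{2}}
           {\dfrac{s_1^4}{N_1^2\,(N_1 - 1)} + \dfrac{s_2^4}{N_2^2\,(N_2 - 1)}},
\qquad
p = 2\,\bigl(1 - F_{t,\nu}(|t|)\bigr)
```

Here $s_1$ and $s_2$ are sample standard deviations computed with `ddof=1`, which is why the comparison above passes `np.std(..., ddof=1)`.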
    
        2
  •  2
  •   Josef    9 years ago

    As an update:

    This function is now available in scipy.stats, since version 0.16.0:

    http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.ttest_ind_from_stats.html

    scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=True)
    T-test for means of two independent samples from descriptive statistics.
    
    This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values.
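
A minimal usage sketch of this function for the Welch case asked about in the question. The sample data and variable names are made up for illustration; the point is that only the mean, the standard deviation (with `ddof=1`), and the sample size are passed in, and that `equal_var=False` has to be set explicitly because the default is the pooled-variance test.

```python
import numpy as np
from scipy import stats

# Illustrative data only; in the question's setting these arrays would not exist.
rng = np.random.default_rng(0)
sample1 = rng.random(10)
sample2 = rng.random(15)

# Summary statistics: the standard deviations must use ddof=1.
mean1, std1, nobs1 = sample1.mean(), sample1.std(ddof=1), sample1.size
mean2, std2, nobs2 = sample2.mean(), sample2.std(ddof=1), sample2.size

# equal_var=False selects Welch's t-test instead of the default pooled test.
res = stats.ttest_ind_from_stats(mean1, std1, nobs1,
                                 mean2, std2, nobs2,
                                 equal_var=False)

# Cross-check against the array-based function.
ref = stats.ttest_ind(sample1, sample2, equal_var=False)
print(res.statistic, res.pvalue)
print(np.allclose([res.statistic, res.pvalue], [ref.statistic, ref.pvalue]))
```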
    
        3
  •  1
  •   Josef    10 years ago

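    I have written t-test and z-test functions for statsmodels that take summary statistics.

    They are mainly intended as internal shortcuts to avoid code duplication, and they are not well documented.

    For example http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.weightstats._tstat_generic.html

    The list of related functions is here: http://statsmodels.sourceforge.net/devel/stats.html#basic-statistics-and-t-tests-with-frequency-weights

    Edit: in reply to a comment

    The function only does the core calculation. The actual calculation of the standard deviation of the difference under the different assumptions is done in the calling methods. https://github.com/statsmodels/statsmodels/blob/master/statsmodels/stats/weightstats.py#L713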
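
The split described here can be illustrated with a small standalone sketch. This is not the statsmodels code: `tstat_core` and `ttest_from_summary` are hypothetical names, and the formulas are the textbook pooled and Welch versions, assuming standard deviations computed with `ddof=1`. The core function only turns a mean difference, its standard deviation, and the degrees of freedom into a t statistic and p-value; the caller decides how that standard deviation is obtained.

```python
import numpy as np
from scipy import stats


def tstat_core(mean1, mean2, std_diff, dof):
    """Core step only: t statistic and two-sided p-value (hypothetical helper)."""
    t = (mean1 - mean2) / std_diff
    p = 2 * stats.t.sf(np.abs(t), dof)
    return t, p


def ttest_from_summary(mean1, std1, n1, mean2, std2, n2, usevar="unequal"):
    """Caller: choose the std of the difference and the dof for each assumption.

    std1 and std2 are assumed to be sample standard deviations (ddof=1).
    """
    v1, v2 = std1 ** 2, std2 ** 2
    if usevar == "pooled":
        # Equal variances: pooled variance and n1 + n2 - 2 degrees of freedom.
        dof = n1 + n2 - 2
        pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / dof
        std_diff = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    elif usevar == "unequal":
        # Welch: separate variances and Welch-Satterthwaite degrees of freedom.
        se1, se2 = v1 / n1, v2 / n2
        std_diff = np.sqrt(se1 + se2)
        dof = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    else:
        raise ValueError("usevar must be 'pooled' or 'unequal'")
    return tstat_core(mean1, mean2, std_diff, dof)


# Usage with made-up summary statistics.
print(ttest_from_summary(1.2, 0.9, 20, 0.8, 1.4, 25, usevar="unequal"))
print(ttest_from_summary(1.2, 0.9, 20, 0.8, 1.4, 25, usevar="pooled"))
```

Keeping the variance assumption in the caller means the same core function can serve both tests, which matches the code-reuse motivation mentioned above.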

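    Edit

    Below is an example of how to use the methods of the CompareMeans class, which include the t-test based on summary statistics. We need to create a class that holds the relevant summary statistics as attributes. At the end there is a function that just wraps the relevant calls.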

    """
    Created on Wed Jul 23 05:47:34 2014
    Author: Josef Perktold
    License: BSD-3
    
    """
    
    import numpy as np
    from scipy import stats
    from statsmodels.stats.weightstats import CompareMeans, ttest_ind
    
    
    class SummaryStats(object):
    
        def __init__(self, nobs, mean, std):
            self.nobs = nobs
            self.mean = mean
            self.std = std
            self._var = std**2
    
    np.random.seed(123)
    nobs = 20
    x1 = 1 + np.random.randn(nobs)
    x2 = 1 + 1.5 * np.random.randn(nobs)
    
    # Reference results computed from the full arrays.
    print(stats.ttest_ind(x1, x2, equal_var=False))
    print(ttest_ind(x1, x2, usevar='unequal'))
    
    # Note: .std(0) is taken along axis 0 with NumPy's default ddof=0.
    s1 = SummaryStats(x1.shape[0], x1.mean(0), x1.std(0))
    s2 = SummaryStats(x2.shape[0], x2.mean(0), x2.std(0))
    
    print(CompareMeans(s1, s2).ttest_ind(usevar='unequal'))
    
    
    
    def ttest_ind_summ(summ1, summ2, usevar='unequal'):
        """t-test for equality of means based on summary statistics
    
        Parameters
        ----------
        summ1, summ2 : tuples of (nobs, mean, std)
            summary statistics for the two samples
    
        """
    
        s1 = SummaryStats(*summ1)
        s2 = SummaryStats(*summ2)
        return CompareMeans(s1, s2).ttest_ind(usevar=usevar)
    
    print(ttest_ind_summ((x1.shape[0], x1.mean(0), x1.std(0)),
                         (x2.shape[0], x2.mean(0), x2.std(0)),
                         usevar='unequal'))
    
    ''' result
    (array(1.1590347327654558), 0.25416326823881513)
    (1.1590347327654555, 0.25416326823881513, 35.573591346616553)
    (1.1590347327654558, 0.25416326823881513, 35.57359134661656)
    (1.1590347327654558, 0.25416326823881513, 35.57359134661656)
    '''