  •  15
  • Guido  · Tech Community  · 14 years ago


    A = {100, 110, 120, 130}
    B = {110, 100, 110, 120, 90}
    C = { 90, 110, 120, 100}
    D = {120, 100, 120, 110, 110, 120}
    E = {110, 120, 120, 110, 120}

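    First, I have to detect whether there is a significant difference between the means of the sets. To do that I run a one-way ANOVA using the Statistical package provided by Apache Commons Math.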

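    Second, if a difference is found, I need to know which sets differ from the others, using unpaired t-tests between pairs of sets, which give results such as: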

    C is different than B
    C is different than D
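For the first step, it may help to see what a one-way ANOVA actually measures. The following is a minimal sketch in plain Java that computes only the F statistic (between-group mean square divided by within-group mean square) for the five sets above. The class and method names (`AnovaSketch`, `fStatistic`) are hypothetical and are not part of Apache Commons Math, and the sketch does not compute the p-value that the library uses to accept or reject the null hypothesis.

```java
// Sketch only: computes the one-way ANOVA F statistic by hand.
// AnovaSketch and fStatistic are hypothetical names, not a library API.
class AnovaSketch {

    // F = (between-group mean square) / (within-group mean square)
    static double fStatistic(double[][] groups) {
        int total = 0;
        double grandSum = 0.0;
        for (double[] g : groups) {
            for (double x : g) {
                grandSum += x;
            }
            total += g.length;
        }
        double grandMean = grandSum / total;

        double ssBetween = 0.0; // spread of the group means around the grand mean
        double ssWithin = 0.0;  // spread of the observations around their own group mean
        for (double[] g : groups) {
            double sum = 0.0;
            for (double x : g) {
                sum += x;
            }
            double mean = sum / g.length;
            ssBetween += g.length * (mean - grandMean) * (mean - grandMean);
            for (double x : g) {
                ssWithin += (x - mean) * (x - mean);
            }
        }

        int dfBetween = groups.length - 1;    // number of groups minus one
        int dfWithin = total - groups.length; // observations minus number of groups
        return (ssBetween / dfBetween) / (ssWithin / dfWithin);
    }

    public static void main(String[] args) {
        double[][] sets = {
            {100, 110, 120, 130},           // A
            {110, 100, 110, 120, 90},       // B
            {90, 110, 120, 100},            // C
            {120, 100, 120, 110, 110, 120}, // D
            {110, 120, 120, 110, 120}       // E
        };
        System.out.println("F = " + fStatistic(sets));
    }
}
```

A large F means the group means are spread out more than the noise inside each group would explain. Turning F into a decision still requires the F distribution, which is the part the library handles.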


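    Statistics aside, the problem can be stated in general terms: given the equal/not-equal information for every pair of elements in a collection, how do you determine which elements differ from the others?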

    This looks like a problem where graph theory could be applied. I am using Java.
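One way to read the graph-theory idea is to treat each set as a node and each "is different than" result as an edge, then pick the nodes with the highest degree. The sketch below is only an illustration under that assumption; the class and method names (`DifferenceGraphSketch`, `mostDifferent`) are hypothetical, and it is not necessarily the approach the author ended up with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: hypothetical helper that ranks elements by how many
// "is different than" results they take part in.
class DifferenceGraphSketch {

    // Each pair {x, y} means "x is different than y".
    // Returns the elements with the highest number of differences.
    static List<String> mostDifferent(String[][] differentPairs) {
        // degree of each node in the "difference" graph, keys kept in sorted order
        Map<String, Integer> degree = new TreeMap<>();
        for (String[] pair : differentPairs) {
            degree.merge(pair[0], 1, Integer::sum);
            degree.merge(pair[1], 1, Integer::sum);
        }

        int max = 0;
        for (int d : degree.values()) {
            max = Math.max(max, d);
        }

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : degree.entrySet()) {
            if (e.getValue() == max) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"C", "B"}, // C is different than B
            {"C", "D"}  // C is different than D
        };
        System.out.println(mostDifferent(pairs)); // prints [C]
    }
}
```

With the two results from the question, C has degree 2 while B and D have degree 1, so C is reported. Ties are all returned, so the caller still has to decide what a tie means.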


    4 answers  |  as of 11 years ago
  •  4
  •   Guido    14 years ago

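    In case anyone is interested in the final code, here it is. It uses Apache Commons Math for the statistical operations and Trove for collections of primitive types.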


    import gnu.trove.iterator.TIntIntIterator;
    import gnu.trove.map.TIntIntMap;
    import gnu.trove.map.hash.TIntIntHashMap;
    import gnu.trove.procedure.TIntIntProcedure;
    import gnu.trove.set.TIntSet;
    import gnu.trove.set.hash.TIntHashSet;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.commons.math.MathException;
    import org.apache.commons.math.stat.inference.OneWayAnova;
    import org.apache.commons.math.stat.inference.OneWayAnovaImpl;
    import org.apache.commons.math.stat.inference.TestUtils;

    public class TestMath {
        private static final double SIGNIFICANCE_LEVEL = 0.001; // 99.9%

        public static void main(String[] args) throws MathException {
            double[][] observations = {
               {150.0, 200.0, 180.0, 230.0, 220.0, 250.0, 230.0, 300.0, 190.0 },
               {200.0, 240.0, 220.0, 250.0, 210.0, 190.0, 240.0, 250.0, 190.0 },
               {100.0, 130.0, 150.0, 180.0, 140.0, 200.0, 110.0, 120.0, 150.0 },
               {200.0, 230.0, 150.0, 230.0, 240.0, 200.0, 210.0, 220.0, 210.0 },
               {200.0, 230.0, 150.0, 180.0, 140.0, 200.0, 110.0, 120.0, 150.0 }
            };

            // One-way ANOVA over all groups
            final List<double[]> classes = new ArrayList<double[]>();
            for (int i = 0; i < observations.length; i++) {
                classes.add(observations[i]);
            }
            OneWayAnova anova = new OneWayAnovaImpl();
            // double fStatistic = anova.anovaFValue(classes); // F-value
            // double pValue = anova.anovaPValue(classes);     // P-value
            boolean rejectNullHypothesis = anova.anovaTest(classes, SIGNIFICANCE_LEVEL);
            System.out.println("reject null hypothesis " + (100 - SIGNIFICANCE_LEVEL * 100) + "% = " + rejectNullHypothesis);

            // differences are found, so make t-tests
            if (rejectNullHypothesis) {
                TIntSet aux = new TIntHashSet();         // elements seen in at least one difference
                TIntIntMap fraud = new TIntIntHashMap(); // count of further differences per element

                // i vs j unpaired t-tests - O(n^2)
                for (int i = 0; i < observations.length; i++) {
                    for (int j = i + 1; j < observations.length; j++) {
                        boolean different = TestUtils.tTest(observations[i], observations[j], SIGNIFICANCE_LEVEL);
                        if (different) {
                            // the first difference only marks the element in aux;
                            // every later one is counted in fraud
                            if (!aux.add(i)) {
                                if (!fraud.increment(i)) { // false when the key is absent
                                    fraud.put(i, 1);
                                }
                            }
                            if (!aux.add(j)) {
                                if (!fraud.increment(j)) {
                                    fraud.put(j, 1);
                                }
                            }
                        }
                    }
                }

                // TIntIntHashMap is not ordered, so scan the counts for the highest one
                int highest = 0;
                for (int count : fraud.values()) {
                    highest = Math.max(highest, count);
                }
                final int max = highest;

                // Keep only those with the highest degree
                fraud.retainEntries(new TIntIntProcedure() {
                    public boolean execute(int a, int b) {
                        return b == max;
                    }
                });

                // If more than half of the elements are different
                // then they are not really different (?)
                if (fraud.size() > observations.length / 2) {
                    fraud.clear();
                }

                // output
                TIntIntIterator it = fraud.iterator();
                while (it.hasNext()) {
                    it.advance(); // Trove iterators must be advanced before key() is read
                    System.out.println("Element " + it.key() + " has significant differences");
                }
            }
        }
    }
  •  0
  •   Alex Feinman    14 years ago





  •  0
  •   TheSteve0    14 years ago


  •  0
  •   Scott Smith    14 years ago


    List A    List B
      1         1       // Match, increment both pointers
      3         3       // Match, increment both pointers
      5         4       // '4' missing in list A. Increment B pointer only.

    List A    List B
      1         1       // Match, increment both pointers
      3         3       // Match, increment both pointers
      4         5       // '4' missing in list B (or added to A). Incr. A pointer only.
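The two tables above read like a two-pointer walk over a pair of sorted lists. The prose of this answer is missing from this copy, so the following is only a sketch of that reading, assuming both lists are sorted in ascending order. The names `SortedDiffSketch` and `diff` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical two-pointer comparison of two ascending lists.
class SortedDiffSketch {

    // Reports every value that appears in one list but not in the other.
    static List<String> diff(int[] a, int[] b) {
        List<String> out = new ArrayList<>();
        int i = 0; // pointer into list A
        int j = 0; // pointer into list B
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                // Match, increment both pointers
                i++;
                j++;
            } else if (a[i] > b[j]) {
                // b[j] has no partner in A: increment B pointer only
                out.add(b[j] + " missing in A");
                j++;
            } else {
                // a[i] has no partner in B: increment A pointer only
                out.add(a[i] + " missing in B");
                i++;
            }
        }
        // Whatever is left in either list has no partner in the other
        while (i < a.length) {
            out.add(a[i++] + " missing in B");
        }
        while (j < b.length) {
            out.add(b[j++] + " missing in A");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(diff(new int[] {1, 3, 5}, new int[] {1, 3, 4}));
        System.out.println(diff(new int[] {1, 3, 4}, new int[] {1, 3, 5}));
    }
}
```

Each element is visited once, so the walk is linear in the combined length of the two lists, provided they are already sorted.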