
How to merge arrays based on the similarity of values in a dynamic array

  •  0
  • Cybrix  · 14 years ago

    I use cURL and various parsing techniques to retrieve information from several websites. I wrote the code so that I can add more websites to scan if needed.

    The retrieved information looks like this:

    Array
    (
        [website1.com] => Array
            (
                [0] => Array
                    (
                        [0] => 60" BRAVIA LX900 Series 3D HDTV
                        [1] => website1.com
                        [2] => 5299.99
                    )
                [1] => Array
                    (
                        [0] => 52" BRAVIA LX900 Series 3D HDTV
                        [1] => website1.com
                        [2] => 4499.99
                    )
                [2] => Array
                    (
                        [0] => 46" BRAVIA LX900 Series 3D HDTV
                        [1] => website1.com
                        [2] => 3699.99
                    )
                [3] => Array
                    (
                        [0] => 40" BRAVIA LX900 Series 3D HDTV
                        [1] => website1.com
                        [2] => 2999.99
                    )
            )
        [website2.com] => Array
            (
                [0] => Array
                    (
                        [0] => Sony 3D 60" LX900 HDTV BRAVIA
                        [1] => website2.com
                        [2] => 5400.99
                    )
                [1] => Array
                    (
                        [0] => Sony 3D 52" LX900 HDTV BRAVIA
                        [1] => website2.com
                        [2] => 4699.99
                    )
                [2] => Array
                    (
                        [0] => Sony 3D 46" LX900 HDTV BRAVIA
                        [1] => website2.com
                        [2] => 3899.99
                    )
            )
    )
    

    The desired output is:

    Array
    (
        [0] => Array
            (
                [Name] => 60" BRAVIA LX900 Series 3D HDTV
                [website1.com] => 5299.99
                [website2.com] => 5400.99
            )
        [1] => Array
            (
                [Name] => 52" BRAVIA LX900 Series 3D HDTV
                [website1.com] => 4499.99
                [website2.com] => 4699.99
            )
        [2] => Array
            (
                [Name] => 46" BRAVIA LX900 Series 3D HDTV
                [website1.com] => 3699.99
                [website2.com] => 3899.99
            )
        [3] => Array
            (
                [Name] => 40" BRAVIA LX900 Series 3D HDTV
                [website1.com] => 2999.99
            )
    )
    

    Here is the code I have so far:

    <?php
        $_Retreived = array(
            "website1.com" => array(
                array('60" BRAVIA LX900 Series 3D HDTV', 'website1.com', 5299.99),
                array('52" BRAVIA LX900 Series 3D HDTV', 'website1.com', 4499.99),
                array('46" BRAVIA LX900 Series 3D HDTV', 'website1.com', 3699.99),
                array('40" BRAVIA LX900 Series 3D HDTV', 'website1.com', 2999.99)
            ),
            "website2.com" => array(
                array('Sony 3D 60" LX900 HDTV BRAVIA', 'website2.com', 5400.99),
                array('Sony 3D 52" LX900 HDTV BRAVIA', 'website2.com', 4699.99),
                array('Sony 3D 46" LX900 HDTV BRAVIA', 'website2.com', 3899.99),
            )
        );
    
        $_Prices = array();
        $_PricesTemp = array();
        $_Sites = array("website1.com", "website2.com");
    
        for($i = 0; $i < sizeOf($_Sites); $i++)
        {
            $_PricesTemp = array_merge($_PricesTemp, $_Retreived[ $_Sites[$i] ]);
        }
    
        /*
            print_r($_PricesTemp);
    
            Array
            (
                [0] => Array
                    (
                        [0] => 60" BRAVIA LX900 Series 3D HDTV
                        [1] => website1.com
                        [2] => 5299.99
                    )
                [1] => Array
                    (
                        [0] => 52" BRAVIA LX900 Series 3D HDTV
                        [1] => website1.com
                        [2] => 4499.99
                    )
                [2] => Array
                    (
                        [0] => 46" BRAVIA LX900 Series 3D HDTV
                        [1] => website1.com
                        [2] => 3699.99
                    )
                [3] => Array
                    (
                        [0] => 40" BRAVIA LX900 Series 3D HDTV
                        [1] => website1.com
                        [2] => 2999.99
                    )
                [4] => Array
                    (
                        [0] => Sony 3D 60" LX900 HDTV BRAVIA
                        [1] => website2.com
                        [2] => 5400.99
                    )
                [5] => Array
                    (
                        [0] => Sony 3D 52" LX900 HDTV BRAVIA
                        [1] => website2.com
                        [2] => 4699.99
                    )
                [6] => Array
                    (
                        [0] => Sony 3D 46" LX900 HDTV BRAVIA
                        [1] => website2.com
                        [2] => 3899.99
                    )
            )
        */
    
        foreach($_PricesTemp As $_KeyOne => $_EntryOne)
        {
            foreach(array_reverse($_PricesTemp, true) As $_KeyTwo => $_EntryTwo)
            {
                if ($_KeyOne != $_KeyTwo)
                {
                    $_Percent = 0;
    
                    similar_text(strtoupper($_EntryOne[0]), strtoupper($_EntryTwo[0]), $_Percent);
    
                    if ($_Percent >= 90) //If names matches 90%+
                    {
                        echo "Similar : <b>" . $_KeyOne . "</b> " . $_EntryOne[0] . " and <b>" . $_KeyTwo . "</b> " . $_EntryTwo[0] . " Percent : " . $_Percent . "<br />";
    
                        $_Prices[] = array();
                        $_Prices[ sizeOf($_Prices)-1 ]['Name'] = $_EntryOne[0]; //Use the product name of the most revelant website (website1.com)
    
                        foreach($_Sites As $_Site)
                        {
                            if (isset($_EntryOne[ 1 ]) && $_EntryOne[ 1 ] == $_Site) //Check if it contains price from website1.com
                            {
                                $_Prices[ sizeOf($_Prices)-1 ][ $_Site ] = $_EntryOne[ 2 ];
                            }
                            if (isset($_EntryTwo[ 1 ]) && $_EntryTwo[ 1 ] == $_Site) //Check if it contains price from website2.com
                            {
                                $_Prices[ sizeOf($_Prices)-1 ][ $_Site ] = $_EntryTwo[ 2 ];
                            }
                        }
                    }
                }
            }
        }
    
        /*
            print_r($_Prices);
    
            Array
            (
                [0] => Array
                    (
                        [Name] => 60" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 2999.99
                    )
                [1] => Array
                    (
                        [Name] => 60" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 3699.99
                    )
                [2] => Array
                    (
                        [Name] => 60" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 4499.99
                    )
                [3] => Array
                    (
                        [Name] => 52" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 2999.99
                    )
                [4] => Array
                    (
                        [Name] => 52" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 3699.99
                    )
                [5] => Array
                    (
                        [Name] => 52" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 5299.99
                    )
                [6] => Array
                    (
                        [Name] => 46" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 2999.99
                    )
                [7] => Array
                    (
                        [Name] => 46" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 4499.99
                    )
                [8] => Array
                    (
                        [Name] => 46" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 5299.99
                    )
                [9] => Array
                    (
                        [Name] => 40" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 3699.99
                    )
                [10] => Array
                    (
                        [Name] => 40" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 4499.99
                    )
                [11] => Array
                    (
                        [Name] => 40" BRAVIA LX900 Series 3D HDTV
                        [website1.com] => 5299.99
                    )
                [12] => Array
                    (
                        [Name] => Sony 3D 60" LX900 HDTV BRAVIA
                        [website2.com] => 3899.99
                    )
                [13] => Array
                    (
                        [Name] => Sony 3D 60" LX900 HDTV BRAVIA
                        [website2.com] => 4699.99
                    )
                [14] => Array
                    (
                        [Name] => Sony 3D 52" LX900 HDTV BRAVIA
                        [website2.com] => 3899.99
                    )
                [15] => Array
                    (
                        [Name] => Sony 3D 52" LX900 HDTV BRAVIA
                        [website2.com] => 5400.99
                    )
                [16] => Array
                    (
                        [Name] => Sony 3D 46" LX900 HDTV BRAVIA
                        [website2.com] => 4699.99
                    )
                [17] => Array
                    (
                        [Name] => Sony 3D 46" LX900 HDTV BRAVIA
                        [website2.com] => 5400.99
                    )
            )
        */
    ?>
    

    The code also needs to keep working if I add a third website to the list.
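A minimal sketch of one way to cover that requirement, assuming PHP 5.6+ for argument unpacking: take the site list from the keys of $_Retreived instead of the hard-coded $_Sites array, and flatten all per-site lists in a single call. Adding a website3.com entry to $_Retreived would then need no other code change. This only replaces the flattening loop; it does not solve the matching problem itself.

```php
<?php
// Scraped data as in the question: each item is (name, site, price).
$_Retreived = array(
    "website1.com" => array(
        array('60" BRAVIA LX900 Series 3D HDTV', 'website1.com', 5299.99),
        array('52" BRAVIA LX900 Series 3D HDTV', 'website1.com', 4499.99),
        array('46" BRAVIA LX900 Series 3D HDTV', 'website1.com', 3699.99),
        array('40" BRAVIA LX900 Series 3D HDTV', 'website1.com', 2999.99),
    ),
    "website2.com" => array(
        array('Sony 3D 60" LX900 HDTV BRAVIA', 'website2.com', 5400.99),
        array('Sony 3D 52" LX900 HDTV BRAVIA', 'website2.com', 4699.99),
        array('Sony 3D 46" LX900 HDTV BRAVIA', 'website2.com', 3899.99),
    ),
);

// Every top-level key is a site, so there is no separate list to keep in sync.
$_Sites = array_keys($_Retreived);

// Flatten the per-site lists into one list, keeping site order.
// array_values() is needed because the site-name string keys cannot be
// passed to array_merge() through argument unpacking.
$_PricesTemp = array_merge(...array_values($_Retreived));

echo count($_Sites), " sites, ", count($_PricesTemp), " items\n"; // 2 sites, 7 items
```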

    Any ideas? I've been stuck on this since this morning.

    3 answers  |  last activity 13 years ago
        1
  •  1
  •   samshull    13 years ago

    Try this gist, which is clearer: https://gist.github.com/835099

    It gives me the result you want.

        2
  •  0
  •   Fanis Hatzidakis    14 years ago

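    A high-level overview would look like this:

    • Loop through all items found on all websites
    • For each item, check whether its name is similar enough to any existing item name in $items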

    Instead of similar_text() I'd use levenshtein(), which behaves similarly in practice but is considerably faster.

    $levThreshold = 3 ;
    
    $_Prices = array() ;
    foreach ($_Retreived as $website => $websiteItems) {
        //$websiteItems is the list of items scraped from one site,
        //so loop over the items themselves
        foreach ($websiteItems as $item) {
            $currName = $item[0] ;
            $currPrice = $item[2] ;
    
            $foundItemKey = false ;
    
            //check current price structure. Get $priceData by reference
            //so we can modify it in the loop and keep the change instead
            //of the loop copy.
            foreach ($_Prices as &$priceData) {
    
                if (isset($priceData[$website])) {
                    //this entry already has a price from this website
                    continue ;
                }
    
                //check if this is the item name we are looping over
                $lev = levenshtein($priceData['Name'], $currName) ;
    
                if ($lev < $levThreshold) {
                    //item exists, add price and break
                    $priceData[$website] = $currPrice ;
                    $foundItemKey = true ;
                    break ;
                }
            }
            //break the reference so later writes cannot overwrite the last entry
            unset($priceData) ;
    
            //if we haven't found the item key, create a new one
            if (!$foundItemKey) {
                $_Prices[] = array(
                    'Name' => $currName,
                    $website => $currPrice,
                ) ;
            }
        }
    }
    

    $levThreshold is the minimum number of characters by which two strings must differ to be considered different strings. Adjust it as needed.
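To see what a threshold of 3 does with the names from the question, here is a small sketch. isSameItem() is a hypothetical helper that applies the same `< $levThreshold` test as the loop above; the names are copied from the question's data.

```php
<?php
// Hypothetical helper applying the same test as the answer's loop:
// names closer than $levThreshold edits are treated as one item.
function isSameItem($nameA, $nameB, $levThreshold = 3)
{
    return levenshtein($nameA, $nameB) < $levThreshold;
}

$n60  = '60" BRAVIA LX900 Series 3D HDTV';   // website1.com
$n52  = '52" BRAVIA LX900 Series 3D HDTV';   // website1.com, different size
$sony = 'Sony 3D 60" LX900 HDTV BRAVIA';     // website2.com, same TV as $n60

var_dump(levenshtein($n60, $n52));  // int(2): only "60" -> "52" changes
var_dump(isSameItem($n60, $n52));   // bool(true): different TVs treated as one
var_dump(isSameItem($n60, $sony));  // bool(false): same TV kept apart
```

The isset() check in the loop prevents two items from the same site from being merged, but across sites the threshold alone cannot tell a different screen size from a reordered name, so lowering or raising it trades one kind of mismatch for the other.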

        3
  •  0
  •   David Gillen    13 years ago

    This can't be solved with similar_text. You want to match 60" BRAVIA LX900 Series 3D HDTV with Sony 3D 60" LX900 HDTV BRAVIA. However, 60" BRAVIA LX900 Series 3D HDTV is actually more similar to 52" BRAVIA LX900 Series 3D HDTV.
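The comparison above can be checked directly. This sketch uses the same strtoupper() call as the question's code and reads the percentage from similar_text()'s third argument.

```php
<?php
$n60  = '60" BRAVIA LX900 Series 3D HDTV';
$n52  = '52" BRAVIA LX900 Series 3D HDTV';
$sony = 'Sony 3D 60" LX900 HDTV BRAVIA';

// The third argument receives the similarity as a percentage.
similar_text(strtoupper($n60), strtoupper($n52), $sameSite);
similar_text(strtoupper($n60), strtoupper($sony), $crossSite);

printf("60\" vs 52\" (different TVs): %.1f%%\n", $sameSite);
printf("60\" vs Sony 60\" (same TV):  %.1f%%\n", $crossSite);
```

The different-size pair scores about 93.5%, above the question's 90% cutoff, while the two listings of the same 60" TV score far lower. That is why the question's loop groups rows by size series instead of by product.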

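    I suspect you need a custom handler that matches on details specific to the products you are comparing. For TVs, for example, you might want to match on the size and model (xx" BRAVIA LX900).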
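A minimal sketch of such a product-specific handler, assuming every name contains a size like 60" and a model code like LX900. productKey() and mergeBySite() are hypothetical names, and the regular expressions only fit names shaped like the ones in the question. Rows are keyed by size and model, the first site that lists a product supplies the Name, and each site adds its own price column, so a third site needs no code change.

```php
<?php
// Hypothetical matcher: key a listing by screen size and model code.
function productKey($name)
{
    // Assumes a size like 60" and a model code like LX900 appear in the name.
    if (preg_match('/(\d+)"/', $name, $size)
        && preg_match('/\b([A-Z]{2}\d{3,4})\b/', $name, $model)) {
        return $size[1] . '|' . $model[1];   // e.g. "60|LX900"
    }
    return null;
}

// Build one row per product, with one price column per site.
function mergeBySite(array $retrieved)
{
    $rows = array();
    foreach ($retrieved as $site => $items) {
        foreach ($items as $item) {
            list($name, , $price) = $item;
            $key = productKey($name);
            if ($key === null) {
                $key = 'raw:' . $name;   // unrecognised name: keep as its own row
            }
            if (!isset($rows[$key])) {
                $rows[$key] = array('Name' => $name);   // first site seen supplies the name
            }
            $rows[$key][$site] = $price;
        }
    }
    return array_values($rows);
}

$_Retreived = array(
    "website1.com" => array(
        array('60" BRAVIA LX900 Series 3D HDTV', 'website1.com', 5299.99),
        array('52" BRAVIA LX900 Series 3D HDTV', 'website1.com', 4499.99),
        array('46" BRAVIA LX900 Series 3D HDTV', 'website1.com', 3699.99),
        array('40" BRAVIA LX900 Series 3D HDTV', 'website1.com', 2999.99),
    ),
    "website2.com" => array(
        array('Sony 3D 60" LX900 HDTV BRAVIA', 'website2.com', 5400.99),
        array('Sony 3D 52" LX900 HDTV BRAVIA', 'website2.com', 4699.99),
        array('Sony 3D 46" LX900 HDTV BRAVIA', 'website2.com', 3899.99),
    ),
);

$merged = mergeBySite($_Retreived);
print_r($merged);
```

On the question's data this yields the four rows of the desired output. Names that don't fit the pattern each get their own row rather than being merged by guesswork.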

    This doesn't solve your problem, but I'm afraid that's the answer.