代码之家  ›  专栏  ›  技术社区  ›  Philipp Lenssen

将数字转换为强调的拼音?

  •  6
  • Philipp Lenssen  · 技术社区  · 15 年前

    给定一个源文本

    nin2 hao3 ma
    

    (这是一种典型的书写ASCII拼音的方法,没有适当的重音字符) 并给出一个(utf8)转换表

    a1;ā
    e1;ē
    i1;Ä«
    o1;ō
    u1;Å«
    ü1;ǖ
    A1;Ā
    E1;Ē
    ...
    

    如何将源文本转换为

    nín hǎo ma
    

    ?

    为了我使用PHP的价值,这可能是我正在研究的regex?

    5 回复  |  直到 11 年前
        1
  •  1
  •   Ollie Saunders    15 年前
    <?php
    $in = 'nin2 hao3 ma';
    $out = 'nín hǎo ma';
    
    function replacer($match) {
      static $trTable = array(
        1 => array(
          'a' => 'ā',
          'e' => 'ē',
          'i' => 'Ä«',
          'o' => 'ō',
          'u' => 'Å«',
          'ü' => 'ǖ',
          'A' => 'Ā',
          'E' => 'Ē'),
        2 => array('i' => 'í'),
        3 => array('a' => 'ǎ')
      );
      list(, $word, $i) = $match;
      return str_replace(
        array_keys($trTable[$i]),
        array_values($trTable[$i]),
        $word); }
    
    // Outputs: bool(true)
    var_dump(preg_replace_callback('~(\w+)(\d+)~', 'replacer', $in) === $out);
    
        2
  •  9
  •   Bouke Versteegh clauziere    13 年前

    奥利的算法是一个很好的开始,但它没有正确应用标记。例如,桥1变成了“Q_«”。这个是正确和完整的。您可以很容易地看到如何定义替换规则。

    它对音调5也做了全部的事情,尽管它不影响输出,除了删除数字。我把它放在里面,以防你想用第5音做些什么。

    该算法的工作原理如下:

    • 单词和音调在$match[1]和[2]中提供。
    • 字母后面加了一个星号,应该加上重音符号。
    • 带有星号的字母被带有正确音调标记的字母代替。

    例子:

    qiao => (iao becomes ia*o) => qia*o => qiǎo

    这个策略,以及 strtr (优先考虑较长的替换),确保不会发生这种情况:

    巧1=>Q«

    
    function pinyin_addaccents($string) {
        # Find words with a number behind them, and replace with callback fn.
        return preg_replace_callback(
            '~([a-zA-ZüÜ]+)(\d)~',
            'pinyin_addaccents_cb',
            $string);
    }
    
    # Helper callback
    function pinyin_addaccents_cb($match) {
        static $accentmap = null;
    
        if( $accentmap === null ) {
            # Where to place the accent marks
            $stars =
                'a* e* i* o* u* ü* '.
                'A* E* I* O* U* Ü* '.
                'a*i a*o e*i ia* ia*o ie* io* iu* '.
                'A*I A*O E*I IA* IA*O IE* IO* IU* '.
                'o*u ua* ua*i ue* ui* uo* üe* '.
                'O*U UA* UA*I UE* UI* UO* ÜE*';
            $nostars = str_replace('*', '', $stars);
    
            # Build an array like Array('a' => 'a*') and store statically
            $accentmap = array_combine(explode(' ',$nostars), explode(' ', $stars));
            unset($stars, $nostars);
        }
    
        static $vowels =
            Array('a*','e*','i*','o*','u*','ü*','A*','E*','I*','O*','U*','Ü*');
    
        static $pinyin = Array(
            1 => Array('ā','ē','ī','ō','ū','ǖ','Ā','Ē','Ī','Ō','Ū','Ǖ'),
            2 => Array('á','é','í','ó','ú','ǘ','Á','É','Í','Ó','Ú','Ǘ'),
            3 => Array('ǎ','ě','ǐ','ǒ','ǔ','ǚ','Ǎ','Ě','Ǐ','Ǒ','Ǔ','Ǚ'),
            4 => Array('à','è','ì','ò','ù','ǜ','À','È','Ì','Ò','Ù','Ǜ'),
            5 => Array('a','e','i','o','u','ü','A','E','I','O','U','Ü')
        );
    
        list(,$word,$tone) = $match;
        # Add star to vowelcluster
        $word = strtr($word, $accentmap);
        # Replace starred letter with accented 
        $word = str_replace($vowels, $pinyin[$tone], $word);
        return $word;
    }
    
        3
  •  1
  •   Dale    13 年前

    对于.NET解决方案,请尝试 Pinyin4j.NET

    特征 将中文(包括简体和繁体)转换为最流行的拼音系统。支持拼音系统如下。

    • 汉语拼音
    • 通永拼音
    • 韦德·吉尔斯
    • mps2(汉语语音符号II)_½_··························
    • 耶鲁罗马化
    • Gwoyeu Romatzyh_½_·················································
        4
  •  1
  •   JasonL tike    11 年前

    要添加JavaScript解决方案,请执行以下操作:

    此代码根据放置一个的官方算法放置音调标记, 看见 wikipedia .

    希望对你们中的一些人有所帮助,建议和改进会很好!

        var ACCENTED = {
                '1': {'a': '\u0101', 'e': '\u0113', 'i': '\u012B', 'o': '\u014D', 'u': '\u016B', 'ü': '\u01D6'},
                '2': {'a': '\u00E1', 'e': '\u00E9', 'i': '\u00ED', 'o': '\u00F3', 'u': '\u00FA', 'ü': '\u01D8'},
                '3': {'a': '\u01CE', 'e': '\u011B', 'i': '\u01D0', 'o': '\u01D2', 'u': '\u01D4', 'ü': '\u01DA'},
                '4': {'a': '\u00E0', 'e': '\u00E8', 'i': '\u00EC', 'o': '\u00F2', 'u': '\u00F9', 'ü': '\u01DC'},
                '5': {'a': 'a', 'e': 'e', 'i': 'i', 'o': 'o', 'u': 'u', 'ü': 'ü'}
        };
    
         function getPos (token) {
                if (token.length === 1){
                    // only one letter, nothing to differentiate
                    return 0;
                }
                var precedence = ['a', 'e', 'o'];
                for (i=0; i<precedence.length; i += 1){
                    var pos = token.indexOf(precedence[i]);
                    // checking a before o, will take care of ao automatically
                    if (pos >= 0){
                        return pos;
                    }
                }
                var u = token.indexOf('u');
                var i = token.indexOf('i');
                if (i < u){
                    // -iu OR u-only case, accent goes to u
                    return u;
                } else {
                    // -ui OR i-only case, accent goes to i
                    return i;
                }
                // the only vowel left is ü
                var ü = token.indexOf('ü');
                if (ü >= 0){
                    return ü;
                }
            }
    
         //and finally:
         // we asume the input to be valid PinYin, therefore no security checks....
         function placeTone(numbered_PinYin){
                  var ToneIndex = numbered_PinYin.charAt(numbered_PinYin.length -1);
                  var accentpos = getPos(numbered_PinYin);
                  var accented_Char = ACCENTED[ToneIndex][numbered_PinYin.charAt(accentpos)];
    
                  var accented_PinYin = "";
                  if (accentpos === 0){
                      // minus one to trimm the number off
                      accented_PinYin = accented_Char + numbered_PinYin.substr(1, numbered_PinYin.length-1); 
                  } else {
                      var before = numbered_PinYin.substr(0, accentpos);
                      var after = numbered_PinYin.substring(accentpos+1, numbered_PinYin.length-1);
                      accented_PinYin = before + accented_Char + after;
                   }
                   return accented_PinYin;
        }
    
        console.log(placeTone('han4 zi4'));
    
        5
  •  0
  •   Francois    12 年前

    VB宏(libre)办公室: Convert pinyin tone numbers to accents

    希望该算法对I和U的拼音规则是正确的。

        sub replaceNumberByTones
    
        call PinyinTonesNumber("a([a-z]*[a-z]*)0", "a$1")
        call PinyinTonesNumber("a([a-z]*[a-z]*)1", "a$1")
        call PinyinTonesNumber("a([a-z]*[a-z]*)2", "á$1")
        call PinyinTonesNumber("a([a-z]*[a-z]*)3", "a$1")
        call PinyinTonesNumber("a([a-z]*[a-z]*)4", "à$1")
    
        call PinyinTonesNumber("o([a-z]*[a-z]*)0", "o$1")
        call PinyinTonesNumber("o([a-z]*[a-z]*)1", "o$1")
        call PinyinTonesNumber("o([a-z]*[a-z]*)2", "ó$1")
        call PinyinTonesNumber("o([a-z]*[a-z]*)3", "o$1")
        call PinyinTonesNumber("o([a-z]*[a-z]*)4", "ò$1")
    
        call PinyinTonesNumber("e([a-z]*[a-z]*)0", "e$1")
        call PinyinTonesNumber("e([a-z]*[a-z]*)1", "e$1")
        call PinyinTonesNumber("e([a-z]*[a-z]*)2", "é$1")
        call PinyinTonesNumber("e([a-z]*[a-z]*)3", "e$1")
        call PinyinTonesNumber("e([a-z]*[a-z]*)4", "è$1")
    
        call PinyinTonesNumber("u([a-hj-z]*[a-hj-z]*)0", "u$1")
        call PinyinTonesNumber("u([a-hj-z]*[a-hj-z]*)1", "u$1")
        call PinyinTonesNumber("u([a-hj-z]*[a-hj-z]*)2", "ú$1")
        call PinyinTonesNumber("u([a-hj-z]*[a-hj-z]*)3", "u$1")
        call PinyinTonesNumber("u([a-hj-z]*[a-hj-z]*)4", "ù$1")
    
        call PinyinTonesNumber("i([a-z]*[a-z]*)0", "i$1")
        call PinyinTonesNumber("i([a-z]*[a-z]*)1", "i$1")
        call PinyinTonesNumber("i([a-z]*[a-z]*)2", "í$1")
        call PinyinTonesNumber("i([a-z]*[a-z]*)3", "i$1")
        call PinyinTonesNumber("i([a-z]*[a-z]*)4", "ì$1")
    
        End sub
    
        sub PinyinTonesNumber(expression, replacement)
        rem ----------------------------------------------------------------------
        rem define variables
        dim document   as object
        dim dispatcher as object
        rem ----------------------------------------------------------------------
        rem get access to the document
        document   = ThisComponent.CurrentController.Frame
        dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
    
        rem ----------------------------------------------------------------------
        dim args1(18) as new com.sun.star.beans.PropertyValue
        args1(0).Name = "SearchItem.StyleFamily"
        args1(0).Value = 2
        args1(1).Name = "SearchItem.CellType"
        args1(1).Value = 0
        args1(2).Name = "SearchItem.RowDirection"
        args1(2).Value = true
        args1(3).Name = "SearchItem.AllTables"
        args1(3).Value = false
        args1(4).Name = "SearchItem.Backward"
        args1(4).Value = false
        args1(5).Name = "SearchItem.Pattern"
        args1(5).Value = false
        args1(6).Name = "SearchItem.Content"
        args1(6).Value = false
        args1(7).Name = "SearchItem.AsianOptions"
        args1(7).Value = false
        args1(8).Name = "SearchItem.AlgorithmType"
        args1(8).Value = 1
        args1(9).Name = "SearchItem.SearchFlags"
        args1(9).Value = 65536
        args1(10).Name = "SearchItem.SearchString"
        args1(10).Value = expression
        args1(11).Name = "SearchItem.ReplaceString"
        args1(11).Value = replacement
        args1(12).Name = "SearchItem.Locale"
        args1(12).Value = 255
        args1(13).Name = "SearchItem.ChangedChars"
        args1(13).Value = 2
        args1(14).Name = "SearchItem.DeletedChars"
        args1(14).Value = 2
        args1(15).Name = "SearchItem.InsertedChars"
        args1(15).Value = 2
        args1(16).Name = "SearchItem.TransliterateFlags"
        args1(16).Value = 1280
        args1(17).Name = "SearchItem.Command"
        args1(17).Value = 3
        args1(18).Name = "Quiet"
        args1(18).Value = true
    
        dispatcher.executeDispatch(document, ".uno:ExecuteSearch", "", 0, args1())
    
    
        end sub
    

    希望这能帮助别人

    弗兰·奥斯