Merging two regular expressions to truncate words in strings

multibyte truncate string regex php

Alix Axel · Tech Community · 15 years ago

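I'm trying to write the following function, which truncates a string to whole words (if possible; otherwise it should truncate to characters):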

function Text_Truncate($string, $limit, $more = '...')
{
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));

    if (strlen(utf8_decode($string)) > $limit)
    {
        $string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)~su', '$1', $string);

        if (strlen(utf8_decode($string)) > $limit)
        {
            $string = preg_replace('~^(.{' . intval($limit) . '}).*~su', '$1', $string);
        }

        $string .= $more;
    }

    return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true));
}

Here are some tests:

// Iñtërnâtiônàlizætiøn and then the quick brown fox... (49 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

// Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_...  (50 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

They both work as is; however, if I drop the second preg_replace() I get the following:

Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died....

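I can't use substr() since it only works at the byte level, and I don't have access to mb_substr() at the moment. I've made several attempts to merge the second regex into the first one, but without success.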
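To illustrate the byte-versus-character problem mentioned above, here is a minimal sketch. It assumes the script and the string are UTF-8 encoded; the helper name first_chars is hypothetical and does not appear in the original code.

```php
<?php
// Hypothetical helper: return the first $n UTF-8 characters of $s
// using only PCRE, so neither substr() nor mb_substr() is needed.
function first_chars($s, $n)
{
    // "u" makes "." match a whole code point; "s" lets "." match newlines too.
    if (preg_match('~^.{0,' . intval($n) . '}~su', $s, $m) === 1) {
        return $m[0];
    }
    return ''; // preg_match() returns false on invalid UTF-8
}

$s = 'Iñtërnâtiônàlizætiøn';

echo strlen($s), "\n";                 // 27 bytes for 20 characters
echo bin2hex(substr($s, 0, 2)), "\n";  // 49c3: "I" plus only the first byte of "ñ"
echo first_chars($s, 2), "\n";         // Iñ
```

The same /u trick is what lets the patterns in this thread count characters instead of bytes.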

Please help S.M.S., I've been struggling with this for almost an hour.

EDIT: I'm sorry, I've been awake for 40 hours and I shamelessly missed this:

$string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)?~su', '$1', $string);

Still, if someone has a more optimized regex (or one that ignores the trailing space), please share:

"Iñtërnâtiônàlizætiøn and then "
"Iñtërnâtiônàlizætiøn_and_then_"

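EDIT 2: I still can't get rid of the trailing whitespace. Can someone help me out?

EDIT 3: Okay, none of my edits really worked; I was being fooled by RegexBuddy. I should probably leave this for another day and get some sleep now. Off for today.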

2 replies | 15 years ago

Jan Goyvaerts · 15 years ago

Perhaps I can give you a happy morning after a long night of RegExp nightmares:

'~^(.{1,' . intval($limit) . '}(?<=\S)(?=\s)|.{'.intval($limit).'}).*~su'

Boiling it down:

^      # Start of String
(       # begin capture group 1
 .{1,x} # match 1 - x characters
 (?<=\S)# lookbehind, match must end with non-whitespace 
 (?=\s) # lookahead, if the next char is whitespace, match
 |      # otherwise test this:
 .{x}   # got to x chars anyway.
)       # end cap group
.*     # match the rest of the string (since you were using replace)

You could always add |$ to the end of (?=\s), but since your code already checks that the string is longer than $limit, I didn't think that case was necessary.
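As a rough sketch, the merged pattern above might be dropped into the asker's function like this. The name Text_Truncate_Merged is hypothetical, the input is assumed to be valid UTF-8, and $limit is assumed to be at least 1. The length check uses preg_match_all() rather than the original strlen(utf8_decode()), since utf8_decode() is deprecated as of PHP 8.2; that swap is an editorial choice, not part of the answer.

```php
<?php
// Hypothetical variant of the asker's function using the single merged pattern.
function Text_Truncate_Merged($string, $limit, $more = '...')
{
    $limit  = intval($limit); // assumed to be >= 1
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));

    // Count characters, not bytes: with "u", "." matches one code point.
    if (preg_match_all('~.~su', $string) > $limit) {
        // Alternative 1: up to $limit chars, ending on a non-space that is
        //                followed by whitespace (a whole-word cut).
        // Alternative 2: exactly $limit chars when no such boundary exists.
        $pattern = '~^(.{1,' . $limit . '}(?<=\S)(?=\s)|.{' . $limit . '}).*~su';
        $string  = preg_replace($pattern, '$1', $string) . $more;
    }

    return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true));
}
```

Because the first alternative must end on a non-whitespace character, the truncated text never carries a trailing space before $more, which is the problem the edits in the question were trying to solve.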

Roonaan · 15 years ago

Have you considered using wordwrap()? ( http://us3.php.net/wordwrap )
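A rough sketch of the wordwrap() idea, assuming plain ASCII input. wordwrap() counts bytes rather than characters, so on UTF-8 text the limit would be measured in bytes, and with the fourth argument set to true a long word could be cut in the middle of a multi-byte character. The helper name truncate_with_wordwrap is hypothetical.

```php
<?php
// Hypothetical helper: truncate on a word boundary via wordwrap().
// Byte-based, so only safe for single-byte (e.g. ASCII) input.
function truncate_with_wordwrap($string, $limit, $more = '...')
{
    $string = trim($string);

    if (strlen($string) <= $limit) {
        return $string; // short enough, nothing to cut
    }

    // Break into lines of at most $limit bytes; true also cuts words
    // that are longer than $limit on their own.
    $wrapped = wordwrap($string, $limit, "\n", true);

    // Keep only the first line and append the marker.
    $lines = explode("\n", $wrapped, 2);

    return rtrim($lines[0]) . $more;
}

echo truncate_with_wordwrap('The quick brown fox jumped over the lazy dog', 12), "\n"; // The quick...
```

For the multibyte strings in the question, the regex approach with the /u modifier remains the safer option when mb_* functions are unavailable.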