Find occurrences and replace with wchar_t

find-occurrences wchar-t replace c++

UncleSax · Tech Community · 10 years ago

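I need to write a small function that finds occurrences in a sequence of wchar_t characters. The function takes a wchar_t* pointer as its input string, but because the data is Unicode, each character's value shows up as a number.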

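Is there an elegant way to do this without parsing every letter in the string and comparing Unicode code numbers? Also, when I try to pass the pointer to a function, the function only takes the first character. Why?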

1 answer | as of 10 years ago

nodakai · 10 years ago

std::wstring and the wide streams (std::wistream, std::wifstream, std::wcout) should work, provided the locale is set correctly:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

static void searchAndReport(const wstring &line) {
    wstring::size_type pos = line.find(L"な"); // hiragana "na"
    if (wstring::npos == pos) {
        wcout << L"見つかりません" << endl; // not found
        return;
    }
    for (bool first = true; wstring::npos != pos; pos = line.find(L"な", pos + 1)) {
        if (first)
            first = false;
        else
            wcout << L", " ;
        wcout << L"第" << pos << L"桁" ; // the pos-th column
    }
    wcout << endl;
}

static void readLoop(wistream &is) {
    wstring line;

    for (int cnt = 0; getline(is, line); ++cnt) {
        wcout << L"第" << cnt << L"行目: " ; // the cnt-th line:
        searchAndReport(line);
    }
}

int main(int argc, char *argv[]) {
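    // Adopt the locale from the environment (LANG=ja_JP.UTF-8 in the setup below);
    // the commented-out line would hard-code that locale instead.
//  locale::global(std::locale("ja_JP.UTF-8"));
    locale::global(std::locale(""));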

    if (1 < argc) {
        wcout << L"入力ファイル: [" << argv[1] << "]" << endl; // input file
        wifstream ifs( argv[1] );
        readLoop(ifs);
    } else {
        wcout << L"標準入力を使用します" << endl; // using the standard input
        readLoop(wcin);
    }
}

Transcript:

$ cat scenery-by-bocho-yamamura.txt
いちめんのなのはな
いちめんのなのはな
いちめんのなのはな
いちめんのなのはな
いちめんのなのはな
いちめんのなのはな
いちめんのなのはな
かすかなるむぎぶえ
いちめんのなのはな
$ ./wchar_find scenery-by-bocho-yamamura.txt
入力ファイル: [scenery-by-bocho-yamamura.txt]
第0行目: 第5桁, 第8桁
第1行目: 第5桁, 第8桁
第2行目: 第5桁, 第8桁
第3行目: 第5桁, 第8桁
第4行目: 第5桁, 第8桁
第5行目: 第5桁, 第8桁
第6行目: 第5桁, 第8桁
第7行目: 第3桁
第8行目: 第5桁, 第8桁

All files are UTF-8 encoded.
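
The program above only reports positions, while the question title also mentions replacing. The same find loop can be extended to replace. The following is a minimal sketch, not part of the original answer: the names replaceAll and replaceAllCopy are hypothetical, and the raw pointers are assumed to be null-terminated. A const wchar_t* converts implicitly to std::wstring (up to the terminating null), so no character-by-character comparison of code numbers is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every non-overlapping occurrence of `from`
// in `text` with `to`, in place, and returns the number of replacements.
std::size_t replaceAll(std::wstring &text, const std::wstring &from,
                       const std::wstring &to) {
    if (from.empty())
        return 0; // an empty pattern would match at every position; do nothing
    std::size_t count = 0;
    std::wstring::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::wstring::npos) {
        text.replace(pos, from.length(), to);
        pos += to.length(); // continue after the inserted text so it is not rescanned
        ++count;
    }
    return count;
}

// Hypothetical wrapper for callers that only hold raw pointers.
// Assumption: all three pointers refer to null-terminated wide strings.
std::wstring replaceAllCopy(const wchar_t *input, const wchar_t *from,
                            const wchar_t *to) {
    std::wstring text(input); // copies everything up to the terminating null
    replaceAll(text, from, to);
    return text;
}
```

For example, replaceAllCopy(L"いちめんのなのはな", L"な", L"NA") returns L"いちめんのNAのはNA". Advancing pos by to.length() rather than by 1 keeps the loop from matching inside text it has just inserted.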

Be careful not to mix cout and wcout:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27569

Environment:

$ lsb_release -a
LSB Version:    core-2.0-amd64: [...snip...]
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.5 LTS
Release:        12.04
Codename:       precise
$ env | grep -i ja
LANGUAGE=ja:en
GDM_LANG=ja
LANG=ja_JP.UTF-8
$ g++ --version
g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.