
Implementing a string translation table in C

  •  4
  • Martin DeMello  · 15 years ago

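    I want to implement a basic search/replace translation table in C; that is, it will read a list of word pairs from a config file, walk through text received at runtime, and replace each source word it finds with the corresponding target word. For example, if my user input text is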

    "Hello world, how are you today?"
    

    and my config file is

    world user
    how why
    

    then running the function would return

    "Hello user, why are you today?"
    

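    I could do this with a modest amount of tedium (I am currently looking at the glib string utility functions, since they are there), but I figure this must be a thoroughly solved problem in some library or other. Any pointers?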

    (No, this is not homework, though I admit the question sounds rather homework-like :) I am writing a libpurple plugin, hence the pure C requirement.)

    4 answers  |  latest 15 years ago
        1
  •  5
  •   blak3r    15 years ago

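    I was also surprised at how hard it is to find very simple string manipulation routines. What I wanted was a procedural-language equivalent of the object-oriented string.replace() method. As far as I can tell, that is also the essence of your question. With such a function in hand, you can add the extra code that reads the file line by line and tokenizes each line on whitespace.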

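    The hard part of implementing such a function is deciding how best to allocate the buffer that receives the converted string, which is really an application decision. You have a couple of options: 1) Have the caller pass a buffer in, and leave it to the caller to make sure the buffer is always large enough for the converted version. 2) Do the dynamic memory allocation inside the function and require the caller to call free() on the returned pointer.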

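    I chose option 1 because the overhead of dynamic memory allocation is too high for embedded applications. Option 2 also requires the caller to call free() later, which is easy to forget.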
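    For comparison, option 2 might look roughly like the sketch below. It is not part of the original answer: the name replace_alloc and its two-pass design (count the matches, then allocate the exact size) are assumptions made for illustration only.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the original answer): returns a newly
 * malloc'd copy of src with every non-overlapping occurrence of search
 * replaced by repl. Returns NULL on bad arguments or allocation failure.
 * The caller owns the result and must free() it.
 */
char *replace_alloc(const char *src, const char *search, const char *repl)
{
    size_t search_len, repl_len, count = 0, out_len;
    const char *p, *hit;
    char *out, *w;

    /* An empty search string would match forever, so reject it. */
    if (src == NULL || search == NULL || repl == NULL || *search == '\0')
        return NULL;

    search_len = strlen(search);
    repl_len = strlen(repl);

    /* First pass: count matches so the exact output size is known. */
    for (p = src; (hit = strstr(p, search)) != NULL; p = hit + search_len)
        count++;

    out_len = strlen(src) - count * search_len + count * repl_len;
    out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text between matches, then the replacement. */
    w = out;
    for (p = src; (hit = strstr(p, search)) != NULL; p = hit + search_len) {
        size_t gap = (size_t)(hit - p);
        memcpy(w, p, gap);
        w += gap;
        memcpy(w, repl, repl_len);
        w += repl_len;
    }
    strcpy(w, p); /* remaining tail, including the terminator */
    return out;
}
```

    The trade-off is the one described above: the caller never has to guess a buffer size, but must remember to free() the result, and each call costs a heap allocation.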

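    The resulting function is rather ugly. I did a very quick implementation and have included it below. It should be tested further before being used in production. I ended up going in a different direction on that project before this code was ever used.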

    #include <stdio.h>
    #include <time.h>
    #include <string.h>
    #include <assert.h>
    
    /*
     * Searches an input string for occurrences of a particular string and replaces each with another.  The resulting
     * string is stored in a buffer which is passed in to the function.  Matches are found left to right and do not overlap.
     *
     * @param pDest is a buffer which the updated version of the string will be placed into.  THIS MUST BE PREALLOCATED.
     * @param pDestLen is the number of chars in pDest
     * @param pSrc is a constant string which is the original string
     * @param pSearch is the string to search for in pSrc.  It must not be empty.
     * @param pReplacement is the string that pSearch will be replaced with.
     * @return if successful it returns the number of times pSearch was replaced in the string.  Otherwise it returns a negative number
     *         to indicate an error.  It returns -1 if one of the strings passed in == NULL or pSearch is empty, -2 if the
     *         destination buffer is of insufficient size.
     *         Note: the value stored in pDest is undefined if an error occurs.
     */
    int string_findAndReplace( char* pDest, int pDestLen, const char* pSrc, const char* pSearch, const char* pReplacement) {
        size_t destIndex = 0;
        size_t destCap;
        size_t copyLen = 0;
        size_t searchLen;
        size_t replLen;
        const char* next;
        const char* prev = pSrc;
        int foundCnt = 0;

        if( pDest == NULL || pDestLen <= 0 || pSrc == NULL || pSearch == NULL || pReplacement == NULL ) {
            return -1;
        }

        searchLen = strlen(pSearch);
        replLen = strlen(pReplacement);

        // an empty search string would match forever without advancing
        if( searchLen == 0 ) {
            return -1;
        }

        destCap = (size_t)pDestLen;

        while( (next = strstr( prev, pSearch )) != NULL ) {
            copyLen = (size_t)(next - prev);

            // verify that the prefix, the replacement and a terminator all fit before copying
            if( copyLen + replLen + 1 > destCap - destIndex ) {
                return -2;
            }

            // copy chars before the search string
            memcpy( &pDest[destIndex], prev, copyLen );
            destIndex += copyLen;

            // insert the replacement
            memcpy( &pDest[destIndex], pReplacement, replLen );
            destIndex += replLen;

            // continue searching just past the match
            prev = next + searchLen;
            foundCnt++;
        }

        // copy what's left from prev to the end of dest.
        copyLen = strlen(prev);
        if( copyLen + 1 > destCap - destIndex ) {
            return -2;
        }
        memcpy( &pDest[destIndex], prev, copyLen+1); // +1 copies the null terminator.

        return foundCnt;
    }
    
    
    // --------- VERY BASIC TEST HARNESS FOR THE METHOD ABOVE --------------- // 
    
    #define NUM_TESTS 8
    
    // Very rudimentary test harness for the string_findAndReplace method.
    int main(int argsc, char** argsv) {
    
    int i=0;
    char newString[1000];
    
    char input[][1000] = { 
    "Emergency condition has been resolved. The all clear has been issued.",
    "Emergency condition has been resolved and the all clear has been issued.",
    "lions, tigers, and bears",
    "and something, and another thing and",
    "too many commas,, and, also androids",
    " and and and,, and and ",
    "Avoid doors, windows and large open rooms.",
    "Avoid doors and windows."
    
    };
    
    char output[][1000] = { 
    "Emergency condition has been resolved. The all clear has been issued.",
    "Emergency condition has been resolved, and the all clear has been issued.",
    "lions, tigers,, and bears",
    "and something,, and another thing and",
    "too many commas,, and, also androids",
    ", and and and,,, and and ", // matches do not overlap: a space shared by two " and " is consumed by the first
    "Avoid doors, windows, and large open rooms.",
    "Avoid doors, and windows."
    };
    
        char searchFor[] = " and ";
        char replaceWith[] = ", and ";
    
        printf("String replacer\r\n");
    
        for( i=0; i< NUM_TESTS; i++ ) {
    
            string_findAndReplace( newString, sizeof( newString ), input[i], searchFor, replaceWith );
    
            if( strcmp( newString, output[i] ) == 0 ) {
                printf("SUCCESS\r\n\r\n");
            }
            else {
                printf("FAILED: \r\n IN :'%s'\r\n OUT:'%s'\r\n EXP:'%s'\r\n\r\n", input[i],newString,output[i]);
            }
    
        }
    
        printf("\r\nDONE.\r\n");
        return 0;
    }
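    The remaining step, reading the word pairs from the config file and applying them to the text, could be sketched as follows. This is an illustration under assumptions, not code from the answer: struct pair, parse_pair, load_pairs, replace_one, apply_pairs and the size limits are all hypothetical names and values, and the file format is assumed to be one whitespace-separated "source target" pair per line, as in the question.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PAIRS 64   /* assumed limits, chosen for the sketch */
#define MAX_WORD  64

/* One "source target" entry from the config file (hypothetical type). */
struct pair {
    char from[MAX_WORD];
    char to[MAX_WORD];
};

/* Parses one "source target" line; returns 1 on success, 0 if malformed. */
int parse_pair(const char *line, struct pair *out)
{
    /* %63s limits each word to MAX_WORD - 1 characters. */
    return sscanf(line, "%63s %63s", out->from, out->to) == 2;
}

/* Reads pairs from fp, one per line; malformed lines are skipped.
 * Returns the number of pairs stored in table. */
int load_pairs(FILE *fp, struct pair *table, int max_pairs)
{
    char line[256];
    int n = 0;

    while (n < max_pairs && fgets(line, sizeof line, fp) != NULL) {
        if (parse_pair(line, &table[n]))
            n++;
    }
    return n;
}

/* Replaces every occurrence of from with to, writing into dest.
 * Returns 0 on success, -1 if dest (cap bytes) is too small. */
static int replace_one(char *dest, size_t cap, const char *src,
                       const char *from, const char *to)
{
    size_t from_len = strlen(from), to_len = strlen(to), used = 0;
    const char *hit;

    while ((hit = strstr(src, from)) != NULL) {
        size_t gap = (size_t)(hit - src);
        if (used + gap + to_len + 1 > cap)
            return -1;
        memcpy(dest + used, src, gap);
        memcpy(dest + used + gap, to, to_len);
        used += gap + to_len;
        src = hit + from_len;
    }
    if (used + strlen(src) + 1 > cap)
        return -1;
    strcpy(dest + used, src);
    return 0;
}

/* Applies each pair in order. text is rewritten in place and must be a
 * buffer of cap bytes; returns 0 on success, -1 if a result did not fit. */
int apply_pairs(char *text, size_t cap, const struct pair *table, int n)
{
    char tmp[1024];
    int i;

    if (cap > sizeof tmp)
        cap = sizeof tmp;
    for (i = 0; i < n; i++) {
        if (replace_one(tmp, cap, text, table[i].from, table[i].to) != 0)
            return -1;
        strcpy(text, tmp);
    }
    return 0;
}
```

    Two limitations of this sketch are worth keeping in mind. It matches substrings rather than whole words, so a pair such as "how why" would also rewrite "show". And because the pairs are applied one after another, the output of an earlier pair can be rewritten by a later one.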
    
        2
  •  1
  •   Jason Catena    15 years ago

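    If you did not have the config file requirement, you could let (f)lex generate the C code for you. But that would mean recompiling every time the list of word pairs changes.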

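    Maybe this is overkill, but you could store each word in a node of a linked list. That makes it easy to build the new sentence by walking the list and swapping words in place.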
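    A minimal sketch of the linked-list idea, under assumptions of my own: the text is split into alternating runs of word characters (as judged by isalnum) and separator characters, each run becomes a node, and a replacement swaps the text of any node that equals the source word exactly. The names struct node, split_words, replace_word, join_list and free_list are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One run of text: either a whole word or the separators between words. */
struct node {
    char *text;
    struct node *next;
};

void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}

/* Splits s into alternating word / non-word runs, one node per run.
 * Returns NULL for empty input or on allocation failure. */
struct node *split_words(const char *s)
{
    struct node *head = NULL, **tail = &head;

    while (*s != '\0') {
        int is_word = isalnum((unsigned char)*s) != 0;
        size_t len = 0;
        struct node *n;

        /* Extend the run while the character class stays the same. */
        while (s[len] != '\0' &&
               (isalnum((unsigned char)s[len]) != 0) == is_word)
            len++;

        n = malloc(sizeof *n);
        if (n == NULL) { free_list(head); return NULL; }
        n->text = malloc(len + 1);
        if (n->text == NULL) { free(n); free_list(head); return NULL; }
        memcpy(n->text, s, len);
        n->text[len] = '\0';
        n->next = NULL;
        *tail = n;
        tail = &n->next;
        s += len;
    }
    return head;
}

/* Swaps the text of every node equal to from for a copy of to.
 * Returns the number of nodes changed, or -1 on allocation failure. */
int replace_word(struct node *head, const char *from, const char *to)
{
    int count = 0;

    for (; head != NULL; head = head->next) {
        if (strcmp(head->text, from) == 0) {
            char *copy = malloc(strlen(to) + 1);
            if (copy == NULL)
                return -1;
            strcpy(copy, to);
            free(head->text);
            head->text = copy;
            count++;
        }
    }
    return count;
}

/* Concatenates all nodes into a newly malloc'd string (caller frees). */
char *join_list(const struct node *head)
{
    size_t total = 0;
    const struct node *n;
    char *out, *w;

    for (n = head; n != NULL; n = n->next)
        total += strlen(n->text);
    out = malloc(total + 1);
    if (out == NULL)
        return NULL;
    w = out;
    for (n = head; n != NULL; n = n->next) {
        size_t len = strlen(n->text);
        memcpy(w, n->text, len);
        w += len;
    }
    *w = '\0';
    return out;
}
```

    Because whole nodes are compared, "how" is replaced but "show" is left alone, and punctuation such as the comma after "world" survives in its own separator node.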

        3
  •  0
  •   Evan Shaw    15 years ago

    You could check out GNU gettext. (See also its Wikipedia article.)

        4
  •  0
  •   Jamie Hale    15 years ago