Most efficient way to iterate over all the chars in an NSString

objective-c

aahrens · Tech Community · 14 years ago

What's the best way to iterate over all the chars in an NSString? Would you loop over the length of the string and use this method?

[aNSString characterAtIndex:index];

8 replies | up to 14 years ago

Jacob Relkin 10 years ago

I'd definitely get a char buffer first, then iterate over that.

NSString *someString = ...

unsigned int len = [someString length];
char buffer[len];

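// This way (note: len counts UTF-16 code units, not UTF-8 bytes, so
// non-ASCII text is truncated and the copy is not NUL-terminated):
strncpy(buffer, [someString UTF8String], len);

// Or this way (preferred). Caution: -getCharacters:range: writes unichar
// (16-bit) values, so it really needs a unichar buffer, not a char buffer:

[someString getCharacters:buffer range:NSMakeRange(0, len)];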

for(int i = 0; i < len; ++i) {
   char current = buffer[i];
   //do something with current...
}

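Daniel Bruce 9 years ago

I think it's important that people understand how to deal with Unicode, so I ended up writing a monster answer, but in the spirit of tl;dr, here is a snippet that should work fine: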

NSUInteger len = [str length];
unichar buffer[len+1];

[str getCharacters:buffer range:NSMakeRange(0, len)];

NSLog(@"getCharacters:range: with unichar buffer");
for(int i = 0; i < len; i++) {
  NSLog(@"%C", buffer[i]);
}

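Still with me? Good!

The currently accepted answer seems to confuse bytes with characters/letters. This is a common problem when encountering Unicode, especially coming from a C background. Strings in Objective-C are represented as Unicode characters (unichar), which are 16 bits wide rather than a single byte and should not be used with the standard C string manipulation functions.

(Edit: This is not the full story! To my great shame, I completely forgot to account for composable characters, where one "letter" is made up of multiple Unicode code points. This gives you a situation where one "letter" resolves to multiple unichars, each of which is in turn multiple bytes. Oh boy. Please refer to this great answer for the details.)

The proper answer to the question depends on whether you want to iterate over the characters/letters (as distinct from the type char) or over the bytes of the string (which is what the type char actually means). In the spirit of limiting confusion, I will use the terms byte and letter from now on and avoid the possibly ambiguous term character.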

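If you want to iterate over the letters in a string, deal exclusively with unichar values. The number of letters is simply the string's -length (this is the same snippet as above):

NSUInteger len = [str length];
unichar buffer[len+1];

[str getCharacters:buffer range:NSMakeRange(0, len)];

NSLog(@"getCharacters:range: with unichar buffer");
for(int i = 0; i < len; i++) {
  NSLog(@"%C", buffer[i]);
}

If you want to iterate over the bytes instead, the result depends on the encoding you choose (UTF-8 here), and you first have to work out how many bytes the encoded string occupies. That step is easy to get wrong by using the string's -length. One main reason this is so easy to get wrong, especially for a US developer, is that a string whose letters all fall within the 7-bit ASCII range has equal byte and letter lengths.

The proper way is to use -lengthOfBytesUsingEncoding:NSUTF8StringEncoding to get the byte length, then convert the string to a C string with -cStringUsingEncoding: and copy it: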

NSUInteger byteLength = [str lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
char proper_c_buffer[byteLength+1];
strncpy(proper_c_buffer, [str cStringUsingEncoding:NSUTF8StringEncoding], byteLength);

NSLog(@"strncpy with proper length");
for(int i = 0; i < byteLength; i++) {
  NSLog(@"%c", proper_c_buffer[i]);
}
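The gap between the two lengths is easy to check directly. The following is a minimal sketch, not part of the original answer; UTF8ByteCount is a hypothetical helper name, and the Cyrillic word is written with \u escapes only so that the source file stays ASCII.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
static NSUInteger UTF8ByteCount(NSString *s) {
    return [s lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
}

int main(void) {
    @autoreleasepool {
        NSString *ascii = @"letter";
        // Five Cyrillic letters, each of which needs two bytes in UTF-8.
        NSString *cyrillic = @"\u0431\u0443\u043A\u0432\u0430";

        for (NSString *s in @[ascii, cyrillic]) {
            // -length counts UTF-16 code units, not bytes.
            NSLog(@"%@: %lu UTF-16 units, %lu UTF-8 bytes", s,
                  (unsigned long)[s length],
                  (unsigned long)UTF8ByteCount(s));
        }
    }
    return 0;
}
```

For the ASCII string both numbers are 6; for the Cyrillic string -length is 5 while the UTF-8 byte count is 10, which is why a buffer sized with -length truncates the copy.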

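To illustrate why it is important to keep these things straight, I will show example code that handles this iteration in four different ways, two wrong and two right. Here is the code: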

#import <Foundation/Foundation.h>

int main() {
  NSString *str = @"буква";
  NSUInteger len = [str length];

  // Try to store unicode letters in a char array. This will fail horribly
  // because getCharacters:range: takes a unichar array and will probably
  // overflow or do other terrible things. (the compiler will warn you here,
  // but warnings get ignored)
  char c_buffer[len+1];
  [str getCharacters:c_buffer range:NSMakeRange(0, len)];

  NSLog(@"getCharacters:range: with char buffer");
  for(int i = 0; i < len; i++) {
    NSLog(@"Byte %d: %c", i, c_buffer[i]);
  }

  // Copy the UTF string into a char array, but use the amount of letters
  // as the buffer size, which will truncate many non-ASCII strings.
  strncpy(c_buffer, [str UTF8String], len);

  NSLog(@"strncpy with UTF8String");
  for(int i = 0; i < len; i++) {
    NSLog(@"Byte %d: %c", i, c_buffer[i]);
  }

  // Do It Right (tm) for accessing letters by making a unichar buffer with
  // the proper letter length
  unichar buffer[len+1];
  [str getCharacters:buffer range:NSMakeRange(0, len)];

  NSLog(@"getCharacters:range: with unichar buffer");
  for(int i = 0; i < len; i++) {
    NSLog(@"Letter %d: %C", i, buffer[i]);
  }

  // Do It Right (tm) for accessing bytes, by using the proper
  // encoding-handling methods
  NSUInteger byteLength = [str lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
  char proper_c_buffer[byteLength+1];
  const char *utf8_buffer = [str cStringUsingEncoding:NSUTF8StringEncoding];
  // We copy here because the documentation tells us the string can disappear
  // under us and we should copy it. Just to be safe
  strncpy(proper_c_buffer, utf8_buffer, byteLength);

  NSLog(@"strncpy with proper length");
  for(int i = 0; i < byteLength; i++) {
    NSLog(@"Byte %d: %c", i, proper_c_buffer[i]);
  }
  return 0;
}

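Running this code outputs the following (with the NSLog cruft stripped), which shows exactly how different the byte and letter representations can be (see the last two outputs):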

getCharacters:range: with char buffer
Byte 0: 1
Byte 1: 
Byte 2: C
Byte 3: 
Byte 4: :
strncpy with UTF8String
Byte 0: Ð
Byte 1: ±
Byte 2: Ñ
Byte 3: 
Byte 4: Ð
getCharacters:range: with unichar buffer
Letter 0: б
Letter 1: у
Letter 2: к
Letter 3: в
Letter 4: а
strncpy with proper length
Byte 0: Ð
Byte 1: ±
Byte 2: Ñ
Byte 3: 
Byte 4: Ð
Byte 5: º
Byte 6: Ð
Byte 7: ²
Byte 8: Ð
Byte 9: °

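Casey Fleser 10 years ago

While Daniel's solution will probably work most of the time, I think the right solution depends on the context. For example, I have a spelling app that needs to iterate over each character as it appears onscreen, which may not correspond to the way it is represented in memory. This is especially true for text provided by the user. Consider a category method on NSString like this: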

- (void) dumpChars
{
    NSMutableArray  *chars = [NSMutableArray array];
    NSUInteger      len = [self length];
    unichar         buffer[len+1];

    [self getCharacters: buffer range: NSMakeRange(0, len)];
    for (int i=0; i<len; i++) {
        [chars addObject: [NSString stringWithFormat: @"%C", buffer[i]]];
    }

    NSLog(@"%@ = %@", self, [chars componentsJoinedByString: @", "]);
}

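Feeding it a word like mañana might produce:

mañana = m, a, ñ, a, n, a

But it could just as easily produce:

mañana = m, a, n, ̃, a, n, a

The former is produced if the string is in precomposed Unicode form, and the latter if it is in decomposed form.

Converting the string to a precomposed form first does not necessarily avoid this; see Technical Q&A 1225. For example, a string like e̊gâds (which I totally made up) still produces the following even after conversion to a precomposed form.

e̊gâds = e, ̊, g, â, d, s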
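The effect of the two normalization forms on -length can be checked with a short sketch. This is only an illustration under stated assumptions: PrecomposedLength and DecomposedLength are hypothetical helper names, and the test words are spelled with \u escapes so the exact code points are unambiguous.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helpers: UTF-16 length after canonical (de)composition.
static NSUInteger PrecomposedLength(NSString *s) {
    return [[s precomposedStringWithCanonicalMapping] length];
}

static NSUInteger DecomposedLength(NSString *s) {
    return [[s decomposedStringWithCanonicalMapping] length];
}

int main(void) {
    @autoreleasepool {
        NSArray<NSString *> *words = @[
            @"ma\u00F1ana",        // n-with-tilde as a single code point
            @"e\u030Ag\u00E2ds"    // e + combining ring above, a-with-circumflex
        ];
        for (NSString *word in words) {
            NSLog(@"%@: precomposed length %lu, decomposed length %lu", word,
                  (unsigned long)PrecomposedLength(word),
                  (unsigned long)DecomposedLength(word));
        }
    }
    return 0;
}
```

Both words come out as 6 units precomposed and 7 decomposed. In the second word the precomposed form still keeps e and the combining ring as two separate units, because Unicode has no single code point for that combination.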

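My solution is to use NSString's enumerateSubstringsInRange:options:usingBlock:, passing NSStringEnumerationByComposedCharacterSequences as the enumeration option. Rewriting the earlier example gives: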

- (void) dumpSequences
{
    NSMutableArray  *chars = [NSMutableArray array];

    [self enumerateSubstringsInRange: NSMakeRange(0, [self length]) options: NSStringEnumerationByComposedCharacterSequences
        usingBlock: ^(NSString *inSubstring, NSRange inSubstringRange, NSRange inEnclosingRange, BOOL *outStop) {
        [chars addObject: inSubstring];
    }];

    NSLog(@"%@ = %@", self, [chars componentsJoinedByString: @", "]);
}

If we feed this version e̊gâds, then we get

e̊gâds = e̊, g, â, d, s

The documentation section on Characters and Grapheme Clusters may also help explain some of this.
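When a plain loop is more convenient than a block, the same composed character sequences can be walked with -rangeOfComposedCharacterSequenceAtIndex:. The sketch below is not from the original answer; ComposedSequences is a hypothetical helper name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a string into composed character sequences
// (what the user perceives as single letters).
static NSArray<NSString *> *ComposedSequences(NSString *s) {
    NSMutableArray<NSString *> *parts = [NSMutableArray array];
    NSUInteger index = 0;
    while (index < [s length]) {
        // Range of the whole sequence that contains the code unit at index.
        NSRange range = [s rangeOfComposedCharacterSequenceAtIndex:index];
        [parts addObject:[s substringWithRange:range]];
        index = NSMaxRange(range);  // jump to the start of the next sequence
    }
    return parts;
}

int main(void) {
    @autoreleasepool {
        // e + combining ring above, g, a-with-circumflex, d, s
        NSString *word = @"e\u030Ag\u00E2ds";
        NSArray<NSString *> *parts = ComposedSequences(word);
        NSLog(@"%@ = %@", word, [parts componentsJoinedByString:@", "]);
        NSLog(@"%lu code units, %lu sequences",
              (unsigned long)[word length], (unsigned long)[parts count]);
    }
    return 0;
}
```

The word has 6 UTF-16 code units but 5 composed sequences, since the e and its combining ring are returned together.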

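Community Egal 7 years ago

Neither. The "Optimize Your Text Manipulations" section of the "Cocoa Performance Guidelines" in the Xcode Documentation recommends:

If you want to iterate over the characters of a string, one of the things you should not do is use the characterAtIndex: method to retrieve each character separately. This method is not designed for repeated access. Instead, consider fetching the characters all at once using the getCharacters:range: method and iterating over the bytes directly.

If you want to search a string for specific characters or substrings, do not iterate through the characters one by one. Instead, use higher level methods such as rangeOfString:, rangeOfCharacterFromSet:, or substringWithRange:, which are optimized for searching the NSString characters.

See this Stack Overflow answer on How to remove whitespace from right end of NSString for an example of how to let rangeOfCharacterFromSet: iterate over the characters of the string instead of doing it yourself.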
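As an illustration of letting a higher level method do the scanning, here is a rough sketch of trimming trailing whitespace with rangeOfCharacterFromSet:options:. It is not copied from the linked answer; TrimTrailingWhitespace is a hypothetical helper name, and treating newlines as whitespace is an assumption.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns s without trailing whitespace or newlines.
static NSString *TrimTrailingWhitespace(NSString *s) {
    NSCharacterSet *nonWhitespace =
        [[NSCharacterSet whitespaceAndNewlineCharacterSet] invertedSet];
    // Search from the end for the last character that is not whitespace.
    NSRange last = [s rangeOfCharacterFromSet:nonWhitespace
                                      options:NSBackwardsSearch];
    if (last.location == NSNotFound) {
        return @"";  // empty, or nothing but whitespace
    }
    return [s substringToIndex:NSMaxRange(last)];
}

int main(void) {
    @autoreleasepool {
        NSLog(@"[%@]", TrimTrailingWhitespace(@"  Hello World \t\n"));
    }
    return 0;
}
```

No character is inspected by hand here; NSString does the backwards search internally.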

Scott Gardner 10 years ago

NSRange range = NSMakeRange(0, 1);
for (__unused int i = range.location; range.location < [aNSString length]; range.location++) {
  NSLog(@"%@", [aNSString substringWithRange:range]);
}

(The __unused int i bit is necessary to silence the compiler warning.)

user1644430 8 years ago

Try enumerating the string with blocks.

.h

@interface NSString (Category)

- (void)enumerateCharactersUsingBlock:(void (^)(NSString *character, NSInteger idx, bool *stop))block;

@end

.m

@implementation NSString (Category)

- (void)enumerateCharactersUsingBlock:(void (^)(NSString *character, NSInteger idx, bool *stop))block
{
    bool _stop = NO;
    for(NSInteger i = 0; i < [self length] && !_stop; i++)
    {
        NSString *character = [self substringWithRange:NSMakeRange(i, 1)];
        block(character, i, &_stop);
    }
}
@end

NSString *string = @"Hello World";
[string enumerateCharactersUsingBlock:^(NSString *character, NSInteger idx, bool *stop) {
        NSLog(@"char %@, i: %li",character, (long)idx);
}];

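marcusthierfelder 7 years ago

You should not use

NSUInteger len = [str length];
unichar buffer[len+1];

but rather allocate the buffer on the heap:

NSUInteger len = [str length];
unichar *buffer = (unichar *) malloc((len+1) * sizeof(unichar));

// ... use buffer ...
free(buffer);

to avoid memory problems (a variable-length array lives on the stack, which a long string can overflow).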
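A fuller version of the heap-allocated approach might look like the sketch below. It is an illustration only: CopyCodeUnits is a hypothetical helper name, and the NULL check and trailing zero are additions that the answer does not mention.

```objective-c
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper: copies the UTF-16 code units of s into a malloc'd
// buffer. The caller owns the buffer and must free() it.
// Returns NULL if the allocation fails.
static unichar *CopyCodeUnits(NSString *s, NSUInteger *outLength) {
    NSUInteger len = [s length];
    unichar *buffer = malloc((len + 1) * sizeof(unichar));
    if (buffer == NULL) {
        return NULL;
    }
    [s getCharacters:buffer range:NSMakeRange(0, len)];
    buffer[len] = 0;  // terminator for convenience; getCharacters: adds none
    *outLength = len;
    return buffer;
}

int main(void) {
    @autoreleasepool {
        NSUInteger len = 0;
        unichar *units = CopyCodeUnits(@"Hello", &len);
        if (units != NULL) {
            for (NSUInteger i = 0; i < len; i++) {
                NSLog(@"%C", units[i]);
            }
            free(units);  // heap memory is not released automatically
        }
    }
    return 0;
}
```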

CodeOverRide 5 years ago

NSString * str = @"hello 🤠💩";

NSRange range = NSMakeRange(0, str.length);
[str enumerateSubstringsInRange:range
                          options:NSStringEnumerationByComposedCharacterSequences
                       usingBlock:^(NSString *substring, NSRange substringRange,
                                    NSRange enclosingRange, BOOL *stop)
{
    NSLog(@"%@", substring);
}];
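Emoji make the difference between code units and user-visible characters easy to see, because each one outside the Basic Multilingual Plane takes two UTF-16 units. The sketch below is not part of the original answer; ComposedSequenceCount is a hypothetical helper name, and the two emoji are written with \U escapes as an assumed test string.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: counts composed character sequences in s.
static NSUInteger ComposedSequenceCount(NSString *s) {
    __block NSUInteger count = 0;
    // SubstringNotRequired skips creating a substring object per sequence,
    // since only the count is needed here.
    [s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                          options:NSStringEnumerationByComposedCharacterSequences |
                                  NSStringEnumerationSubstringNotRequired
                       usingBlock:^(NSString *substring, NSRange substringRange,
                                    NSRange enclosingRange, BOOL *stop) {
        count++;
    }];
    return count;
}

int main(void) {
    @autoreleasepool {
        // "hello " followed by two emoji, each a surrogate pair in UTF-16.
        NSString *str = @"hello \U0001F920\U0001F4A9";
        NSLog(@"%lu code units, %lu composed sequences",
              (unsigned long)[str length],
              (unsigned long)ComposedSequenceCount(str));
    }
    return 0;
}
```

Here -length reports 10 code units while the enumeration visits 8 sequences, so indexing by code unit would split each emoji in half.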