
Reading a text stream code point by code point

  •  5
  • Isabelle Newbie  · 6 years ago

    The InputStreamReader class returns the content of a stream int by int, which I hoped would do what I want, but it does not compose surrogate pairs.

    import java.io.*;
    import java.nio.charset.*;
    
    class TestChars {
        public static void main(String args[]) {
            InputStreamReader reader =
                new InputStreamReader(System.in, StandardCharsets.UTF_8);
            try {
                System.out.print("> ");
                int code = reader.read();
                while (code != -1) {
                    String s =
                        String.format("Code %x is `%s', %s.",
                                      code,
                                      Character.getName(code),
                                      new String(Character.toChars(code)));
                    System.out.println(s);
                    code = reader.read();
                }
            } catch (Exception e) {
                // Report the failure instead of silently swallowing it.
                System.err.println("Error: " + e);
            }
        }
    }
    

    It behaves as follows:

    $ java TestChars 
    > keyboard ⌨. pizza 🍕
    Code 6b is `LATIN SMALL LETTER K', k.
    Code 65 is `LATIN SMALL LETTER E', e.
    Code 79 is `LATIN SMALL LETTER Y', y.
    Code 62 is `LATIN SMALL LETTER B', b.
    Code 6f is `LATIN SMALL LETTER O', o.
    Code 61 is `LATIN SMALL LETTER A', a.
    Code 72 is `LATIN SMALL LETTER R', r.
    Code 64 is `LATIN SMALL LETTER D', d.
    Code 20 is `SPACE',  .
    Code 2328 is `KEYBOARD', ⌨.
    Code 2e is `FULL STOP', ..
    Code 20 is `SPACE',  .
    Code 70 is `LATIN SMALL LETTER P', p.
    Code 69 is `LATIN SMALL LETTER I', i.
    Code 7a is `LATIN SMALL LETTER Z', z.
    Code 7a is `LATIN SMALL LETTER Z', z.
    Code 61 is `LATIN SMALL LETTER A', a.
    Code 20 is `SPACE',  .
    Code d83c is `HIGH SURROGATES D83C', ?.
    Code df55 is `LOW SURROGATES DF55', ?.
    Code a is `LINE FEED (LF)', 
    .
    

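    My problem is that the surrogate pair making up the pizza emoji is read as two separate values. I would like to read the symbol into a single int and be done with it.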

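    Is there a reader(-like) class that automatically composes surrogate pairs into code points while reading (and, ideally, throws an exception if the input is malformed)?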

    I know I could compose the pairs myself, but I would rather avoid reinventing the wheel.
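
    For reference, composing a pair by hand is only a little arithmetic. The sketch below is an illustration, not a replacement for the library call: the names SurrogateMath and combine are made up for this example, and the method assumes its two arguments really are a valid high/low surrogate pair.

```java
class SurrogateMath {
    // Combines a UTF-16 high/low surrogate pair into one code point by hand.
    // Assumption: 'high' is in D800..DBFF and 'low' is in DC00..DFFF.
    static int combine(char high, char low) {
        // Each surrogate carries 10 payload bits; the result is offset by 0x10000.
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        // The two values reported for the pizza emoji in the output above.
        int manual = combine('\uD83C', '\uDF55');
        int library = Character.toCodePoint('\uD83C', '\uDF55');
        System.out.printf("manual=%x library=%x%n", manual, library);
    }
}
```

    For d83c and df55 this gives 0x10000 + (0x3c << 10) + 0x355 = 0x1f355, the same value Character.toCodePoint returns.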

    2 answers  |  as of 6 years ago
        1
  •  4
  •   Shawn    6 years ago

    You can take advantage of String.codePoints(), which combines surrogate pairs for you:

    import java.io.*;
    
    class cptest {
        public static void main(String[] args) {
            try (BufferedReader br =
                    new BufferedReader(new InputStreamReader(System.in, "UTF-8"))) {
                br.lines().flatMapToInt(String::codePoints).forEach(cptest::print);
            } catch (Exception e) {
                System.err.println("Error: " + e);
            }
        }
        private static void print(int cp) {
            String s = new String(Character.toChars(cp));
            System.out.println("Character " + cp + ": " + s);
        }
    }
    

    This produces:

    $ java cptest <<< "keyboard ⌨. pizza 🍕"
    Character 107: k
    Character 101: e
    Character 121: y
    Character 98: b
    Character 111: o
    Character 97: a
    Character 114: r
    Character 100: d
    Character 32:  
    Character 9000: ⌨
    Character 46: .
    Character 32:  
    Character 112: p
    Character 105: i
    Character 122: z
    Character 122: z
    Character 97: a
    Character 32:  
    Character 127829: 🍕
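
    Note that the line feed does not appear in this output, because BufferedReader.lines() strips line terminators before the code points are extracted. The following sketch makes that visible on an in-memory string. The class name CodePointLines and its helper method are hypothetical, chosen only for this illustration.

```java
import java.io.BufferedReader;
import java.io.StringReader;

class CodePointLines {
    // Collects the code points of every line. Line terminators are not
    // included, because BufferedReader.lines() removes them.
    static int[] codePoints(BufferedReader br) {
        return br.lines().flatMapToInt(String::codePoints).toArray();
    }

    public static void main(String[] args) {
        // "pizza ", the pizza emoji as a surrogate pair, then a line feed.
        String text = "pizza \uD83C\uDF55\n";
        int[] cps = codePoints(new BufferedReader(new StringReader(text)));
        // 9 UTF-16 units go in; 7 code points come out (pair merged, LF dropped).
        System.out.println(text.length() + " chars, " + cps.length + " code points");
    }
}
```

    If the line terminators matter, a reader-based approach that does not split the input into lines is the safer choice.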
    
        2
  •  4
  •   Codo    6 years ago

    You can wrap the Reader instance in a simple class that decodes surrogate pairs:

    import java.io.Closeable;
    import java.io.IOException;
    import java.io.Reader;
    
    public class CodepointStream implements Closeable {
    
        private Reader reader;
    
        public CodepointStream(Reader reader) {
            this.reader = reader;
        }
    
        public int read() throws IOException {
            int unit0 = reader.read();
            if (unit0 < 0)
                return unit0; // EOF
    
            if (!Character.isHighSurrogate((char)unit0))
                return unit0;
    
            int unit1 = reader.read();
            if (unit1 < 0)
                return unit1; // EOF
    
            if (!Character.isLowSurrogate((char)unit1))
                throw new RuntimeException("Invalid surrogate pair");
    
            return Character.toCodePoint((char)unit0, (char)unit1);
        }
    
        public void close() throws IOException {
            reader.close();
            reader = null;
        }
    }
    

    The main function needs only a slight modification:

    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    
    public final class App {
        public static void main(String args[]) {
            CodepointStream reader = new CodepointStream(
                    new InputStreamReader(System.in, StandardCharsets.UTF_8));
            try {
                System.out.print("> ");
                int code = reader.read();
                while (code != -1) {
                    String s =
                            String.format("Code %x is `%s', %s.",
                                    code,
                                    Character.getName(code),
                                    new String(Character.toChars(code)));
                    System.out.println(s);
                    code = reader.read();
                }
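            } catch (Exception e) {
                // Surface errors such as "Invalid surrogate pair" instead of hiding them.
                System.err.println("Error: " + e);
            }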
        }
    }
    

    > keyboard ⌨. pizza 🍕
    Code 6b is `LATIN SMALL LETTER K', k.
    Code 65 is `LATIN SMALL LETTER E', e.
    Code 79 is `LATIN SMALL LETTER Y', y.
    Code 62 is `LATIN SMALL LETTER B', b.
    Code 6f is `LATIN SMALL LETTER O', o.
    Code 61 is `LATIN SMALL LETTER A', a.
    Code 72 is `LATIN SMALL LETTER R', r.
    Code 64 is `LATIN SMALL LETTER D', d.
    Code 20 is `SPACE',  .
    Code 2328 is `KEYBOARD', ⌨.
    Code 2e is `FULL STOP', ..
    Code 20 is `SPACE',  .
    Code 70 is `LATIN SMALL LETTER P', p.
    Code 69 is `LATIN SMALL LETTER I', i.
    Code 7a is `LATIN SMALL LETTER Z', z.
    Code 7a is `LATIN SMALL LETTER Z', z.
    Code 61 is `LATIN SMALL LETTER A', a.
    Code 20 is `SPACE',  .
    Code 1f355 is `SLICE OF PIZZA', 🍕.
    Code a is `LINE FEED (LF)', 
    .
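
    The CodepointStream class above returns a lone low surrogate unchanged and reports plain EOF when the input ends right after a high surrogate. If every unpaired surrogate should raise an exception, as the question asks, a stricter variant might look like the sketch below. It is not the answer's original code: the class name StrictCodePointReader is hypothetical, and using IOException for malformed input is an assumption made for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class StrictCodePointReader {
    private final Reader reader;

    StrictCodePointReader(Reader reader) {
        this.reader = reader;
    }

    // Returns the next code point, or -1 at end of input.
    // Throws IOException for any surrogate that is not part of a valid pair.
    int read() throws IOException {
        int unit0 = reader.read();
        if (unit0 < 0) {
            return -1; // EOF
        }
        char c0 = (char) unit0;
        if (Character.isLowSurrogate(c0)) {
            throw new IOException("Unpaired low surrogate");
        }
        if (!Character.isHighSurrogate(c0)) {
            return unit0; // ordinary BMP character
        }
        int unit1 = reader.read();
        if (unit1 < 0 || !Character.isLowSurrogate((char) unit1)) {
            throw new IOException("Unpaired high surrogate");
        }
        return Character.toCodePoint(c0, (char) unit1);
    }

    public static void main(String[] args) throws IOException {
        StrictCodePointReader r =
                new StrictCodePointReader(new StringReader("a\uD83C\uDF55"));
        for (int cp = r.read(); cp != -1; cp = r.read()) {
            System.out.printf("Code %x%n", cp);
        }
    }
}
```

    This only checks the UTF-16 units the Reader hands over. An InputStreamReader constructed with a Charset replaces malformed bytes with a replacement character, so rejecting malformed UTF-8 itself would probably also require constructing the InputStreamReader with a CharsetDecoder configured to report errors.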