
'ascii' codec can't decode byte 0xef in position 319: ordinal not in range(128)?

  •  2
  • dhana  · Tech Community  · 11 years ago

    I am encoding the data here:

    post = """
    ='Brand New News Fr0m The Timber Industry!!'=
    
    ========Latest Profile==========
    Energy & Asset Technology, Inc. (EGTY)
    Current Price $0.15
    ================================
    
    Recognize this undiscovered gem which is poised to jump!! 
    
    Please read the following Announcement in its Entierty and 
    Consider the Possibilities�
    Watch this One to Trad,e!
    
    Because, EGTY has secured the global rights to market 
    genetically enhanced fast growing, hard-wood trees!
    
    EGTY trading volume is beginning to surge with landslide Announcement. 
    The value of this Stoc,k appears poised for growth! This one will not 
    remain on the ground floor for long.
    
    KEEP READING!!!!!!!!!!!!!!!
    
    ===============
    "BREAKING NEWS"
    ===============
    
    -Energy and Asset Technology, Inc. (EGTY) owns a global license to market
    the genetically enhanced Global Cedar growth trees, with plans to 
    REVOLUTIONIZE the forest-timber industry. 
    
    These newly enhanced Globa| Cedar trees require only 9-12 years of growth 
    before they can be harvested for lumber, whereas worldwide growth time for 
    lumber is 30-50 years. 
    
    Other than growing at an astonishing rate, the Global Cedar has a number 
    of other benefits. Its natural elements make it resistant to termites, and 
    the lack of oils and sap found in the wood make it resistant to forest fire, 
    ensuring higher returns on investments.
    T
    he wood is very lightweight and strong, lighter than Poplar and over twice
    as strong as Balsa, which makes it great for construction. It also has 
    the unique ability to regrow itself from the stump, minimizing the land and
    time to replant and develop new root systems.
    
    Based on current resources and agreements, EGTY projects revenues of $140 
    Million with an approximate profit margin of 40% for each 9-year cycle. With 
    anticipated growth, EGTY is expected to challenge Deltic Timber Corp. during 
    its initial 9-year cycle.
    
    Deltic Timber Corp. currently trades at over $38.00 a share with about $153 
    Million in revenues. As the reputation and demand for the Global Cedar tree 
    continues to grow around the world EGTY believes additional multi-million 
    dollar agreements will be forthcoming. The Global Cedar nursery has produced 
    about 100,000 infant plants and is developing a production growth target of 
    250,000 infant plants per month.
    
    Energy and Asset Technology is currently in negotiations with land and business 
    owners in New Zealand, Greece and Malaysia regarding the purchase of their popular 
    and profitable fast growing infant tree plants. Inquiries from the governments of 
    Brazil and Ecuador are also being evaluated.
    
    Conclusion:
    
    The examples above show the Awesome, Earning Potential of little
    known Companies That Explode onto Investor�s Radar Screens. 
    This s-t0ck will not be a Secret for long. Then You May Feel the Desire to Act Right 
    Now! And Please Watch This One Trade!!
    
    
    GO EGTY!
    
    
    All statements made are our express opinion only and should be treated as such.
    We may own, take position and sell any securities mentioned at any time. Any 
    statements that express or involve discussions with respect to predictions, 
    goals, expectations, beliefs, plans, projections, object'ives, assumptions or 
    future events or perfo'rmance are not
    statements of historical fact and may be 
    "forward,|ooking statements." forward,|ooking statements are based on expectations, 
    estimates and projections at the time the statements are made that involve a number 
    of risks and uncertainties which could cause actual results or events to differ 
    materially from those presently anticipated. This newsletter was paid $3,000 from 
    third party (IR Marketing). Forward,|ooking statements in this action may be identified 
    through the use of words such as: "pr0jects", "f0resee", "expects". in compliance with 
    Se'ction 17. {b), we disclose the holding of EGTY shares prior to the publication of 
    this report. Be aware of an inherent conflict of interest resulting from such holdings 
    due to our intent to profit from the liquidation of these shares. Shar,es may be sold 
    at any time, even after positive statements have been made regarding the above company. 
    Since we own shares, there is an inherent conflict of interest in our statements and 
    opinions. Readers of this publication are cautioned not 
    to place undue reliance on 
    forward,|ooking statements, which are based on certain assumptions and expectations 
    involving various risks and uncertainties that could cause results to differ materially 
    from those set forth in the forward- looking statements. This is not solicitation to 
    buy or sell st-0cks, this text is or informational purpose only and you should seek 
    professional advice from registered financial advisor before you do anything related 
    with buying or selling st0ck-s, penny st'0cks are very high risk and you can lose your 
    entire inves,tment.
    """
    
    In [147]: post.encode('utf-8')
    

    and I get this output:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 319: ordinal not in range(128)
    
    2 Answers  |  as of 11 years ago
        1
  •  3
  •   Don Question    11 years ago

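    Unicode is a table that tries to contain (all) known letters, characters and signs, commonly also called glyphs. At the moment it holds somewhat over 110,000 signs. So the DECODED state is a (code) point in this table. But because a byte can't hold more than 8 bits = 256 states, a Unicode representation has to be encoded into a byte stream. The most commonly used encoding is UTF-8, which succeeded the old ASCII encoding and stays compatible with it. UTF-8 encodes a Unicode glyph in one to four bytes.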
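    To make the "one to four bytes" point concrete, here is a small sketch. It assumes Python 3 (where every string is Unicode); the sample characters and the helper name `utf8_length` are my own picks for illustration, not something from the question.

```python
# Each sample is a single Unicode code point; UTF-8 needs 1 to 4 bytes for it.
samples = [u"A", u"\u00e9", u"\u20ac", u"\U0001F600"]


def utf8_length(char):
    """Return how many bytes the UTF-8 encoding of `char` occupies."""
    return len(char.encode("utf-8"))


byte_lengths = [utf8_length(c) for c in samples]

for char, n in zip(samples, byte_lengths):
    # Print the code point number rather than the character itself,
    # so the output does not depend on the terminal's encoding.
    print("U+%04X -> %d byte(s)" % (ord(char), n))
```

    Plain ASCII characters stay at one byte, which is why ASCII text is already valid UTF-8.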

    So encoding or decoding always goes from or towards Unicode. If you want to convert from one encoding to another, you have to go through Unicode:

        [decode]     [encode]
    ASCII ---> UNICODE ---> UTF-8
    1 Glyph                 1 Glyph 
      =        1 Glyph        =
    1 Byte                  1-4 Bytes
    

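       # Python 2: mystring is assumed to be a byte string holding only ASCII bytes
       unicode_str = mystring.decode('ascii')   # bytes -> unicode
       utf8_str = unicode_str.encode('utf-8')   # unicode -> UTF-8 bytes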
    

    (Not the best example, because ASCII always fits into UTF-8.)
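    A conversion where the bytes actually change might be more telling. The sketch below assumes Python 3 and a made-up Latin-1 input (`latin1_bytes`); it only illustrates the decode-then-encode route through Unicode.

```python
# Bytes as they might arrive from a Latin-1 source: "cafe" with an accented e stored as 0xE9.
latin1_bytes = b"caf\xe9"

# Step 1: decode the bytes into Unicode text, using the encoding they are really in.
text = latin1_bytes.decode("latin-1")

# Step 2: encode the Unicode text into the target encoding.
utf8_bytes = text.encode("utf-8")

print(repr(text))
print(repr(utf8_bytes))
```

    The single Latin-1 byte 0xE9 becomes the two UTF-8 bytes 0xC3 0xA9, while the Unicode text in the middle is the same character either way.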

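    So if you want to decode your post variable, you have to know which encoding the quoted string is in. In Python 2.x the default encoding is usually ASCII, and calling encode() on a byte string first decodes it implicitly with that default codec, which is why you get a UnicodeDecodeError from an encode call. In Python 3.x the default should be UTF-8.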

    import sys
    # Usually prints 'ascii' on Python 2 and 'utf-8' on Python 3
    print(sys.getdefaultencoding())
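    The failure can be reproduced in isolation. This sketch is written for Python 3, where the decode step has to be spelled out explicitly. It assumes the odd character in the post is U+FFFD stored as UTF-8 (bytes EF BF BD), which would match the 0xef in the traceback; the shortened `raw` string is only a stand-in for the full post.

```python
# Assumption: the unreadable character in the post is U+FFFD, stored in UTF-8 as EF BF BD.
raw = b"Consider the Possibilities\xef\xbf\xbd"

error = None
try:
    # Python 2 performs this decode implicitly when encode() is called on a byte string.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the encoding the bytes are really in works, and the text
# can then be encoded again without trouble.
fixed = raw.decode("utf-8").encode("utf-8")
```

    The exception object carries the codec name and the offset of the offending byte, which is where the "position 319" in the question comes from.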
    

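    If your post variable is not defined in your source code but read from an external byte stream, you must know its encoding, otherwise you are out of luck.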
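    For the external-stream case, a rough sketch (Python 3 assumed). `read_text` is a hypothetical helper, `io.BytesIO` stands in for a real file or socket, and the payload is invented for the example.

```python
import io


def read_text(stream, encoding):
    """Read all bytes from a binary stream and decode them with a known encoding."""
    return stream.read().decode(encoding)


# io.BytesIO stands in for a file or socket; the payload here is UTF-8 encoded.
incoming = io.BytesIO(u"Investor\u2019s Radar".encode("utf-8"))
text = read_text(incoming, "utf-8")

# If the encoding is only a guess, errors="replace" avoids the exception,
# but every undecodable byte is turned into U+FFFD and the original is lost.
lossy = b"Investor\x92s".decode("ascii", errors="replace")

print(repr(text))
print(repr(lossy))
```

    The second half shows what being out of luck looks like: the data gets through, but with replacement characters where the real ones used to be.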

        2
  •  3
  •   Wooble    11 years ago

    First, tell Python what encoding you are using by putting this as the second line of the file (or the first line, if you are not using a shebang):

    # coding=utf-8
    

    (See PEP 263.)

    Then, don't use byte strings; always use unicode literals for text content:

    post = u"""
    ='Brand New News Fr0m The Timber Industry!!'=
    etc. etc. etc."""
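    Put together, a cut-down version might look like the sketch below. The text is shortened, and the problem character is written as the escape \ufffd on the assumption that this is what the original post contains. The u prefix is accepted by Python 2 and by Python 3.3 or later.

```python
# coding=utf-8
post = u"""
Consider the Possibilities\ufffd
Watch this One to Trade!
"""

# post is Unicode text, so encode() goes straight to bytes;
# no implicit ASCII decode happens first.
encoded = post.encode("utf-8")

# Decoding with the same codec gives the original text back.
round_trip = encoded.decode("utf-8")
print(round_trip == post)
```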