Regex to extract negative numbers with unknown number formats

  •  1
  • jason  · Tech Community  · 6 years ago

    I can extract the numbers from this string:

    string_p= 'seven 5 blah 6 decimal 6.5 thousands 8,999 with dollar signs $9,000 and $9,500,001.45 end ... lastly.... 8.4% now end'

    using this code:

    import re
    
    def extractVal2(s,n):
        if n > 0:
            return re.findall(r'[0-9$,.%]+\d*', s)[n-1]
        else:
            return re.findall(r'[0-9$,.%]+\d*', s)[n]
    
    
    for i in range(1,7): 
        print(extractVal2(string_p, i))  # string_p is the string defined above
    

    But I can't get it to work for negative numbers. The negative numbers are the ones in parentheses.

    string_n= 'seven (5) blah (6) decimal (6.5) thousands (8,999) with dollar signs $(9,000) and $(9,500,001.45) end lastly.... (8.4)% now end'

    I first tried replacing the () with a minus sign, like this:

    string_n= re.sub(r"\((\d*,?\d*)\)", r"-\1", string_n)

    and then tried these patterns to get the negative numbers:

    r'[0-9$,.%-]+\d*', s)[n]
    r'[0-9$,.%]+-\d*', s)[n]
    r'[-0-9$,.%]+-\d*', s)[n]
    

    I even tried a different approach:

    words = string_n.split(" ")
    for i in words:
        try:
            print -int(i.translate(None,"(),"))
        except:
            pass
    
    1 Answer  |  as of 6 years ago
        1
  •  3
  •   Patrick Artner    6 years ago

    import re
    
    def extractVal2(s,n):
        try:
            pattern = r'\$?\(?[0-9][0-9,.]*\)?%?'
            if n > 0:
                return re.findall(pattern, s)[n-1].replace("(","-").replace(")","")
            else:
                return re.findall(pattern, s)[n].replace("(","-").replace(")","")
        except IndexError as e:
            return None    
    
    string_n=  ',seven (5) blah (6) decimal (6.5) thousands (8,999) with dollar ' + \
               'signs $(9,000) and $(9,500,001.45) end lastly.... (8.4)%'
    
    for i in range(1,9): 
        print(extractVal2(string_n, i))
    

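    The pattern matches numbers such as 9,500,001.45, optionally prefixed by $ and wrapped in ( ); the ( is then replaced by - and the ) is removed. It is deliberately loose, so it would also match a malformed number such as 2,200.200,22.

    Output: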

    -5
    -6
    -6.5
    -8,999
    $-9,000
    $-9,500,001.45
    -8.4%
    None
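
The strings printed above still carry `$`, `,` and `%`. If numeric values are needed, a conversion step along these lines could follow the extraction. This is only a sketch: `to_number` is a hypothetical helper that is not part of the answer, and it assumes `,` is a thousands separator, `.` is the decimal point, and `%` is simply dropped rather than scaled.

```python
import re

# Same pattern as in the answer.
PATTERN = r'\$?\(?[0-9][0-9,.]*\)?%?'

def to_number(token):
    """Convert a matched token such as '$(9,000)' or '(8.4)%' to a float.

    Hypothetical helper (an assumption, not from the answer): a '(' anywhere
    in the token marks it as negative; '$', ',', ')' and '%' are discarded.
    """
    negative = '(' in token
    digits = re.sub(r'[^0-9.]', '', token)  # keep digits and the decimal point only
    value = float(digits)                   # raises ValueError for input like '1.2.3'
    return -value if negative else value

text = 'decimal (6.5) thousands 8,999 with $(9,500,001.45) and (8.4)%'
print([to_number(t) for t in re.findall(PATTERN, text)])
# [-6.5, 8999.0, -9500001.45, -8.4]
```

Because the pattern is loose, a token with more than one decimal point would make `float` raise `ValueError`; whether to catch that or tighten the pattern depends on the input data.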
    

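    The IndexError is raised when re.findall(..) returns fewer matches than the index asked for; it is caught and None is returned instead.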


    Pattern explanation:

    \$?          optional leading literal $ (escaped, so not interpreted as the ^...$ end-of-string anchor)
    \(?          optional literal (
    [0-9]        one digit
    [0-9,.]*     any number (possibly zero) of the included characters, in any order,
                 to the extent that it would also match something like 000,9.34,2
    \)?          optional literal )
    %?           optional literal %
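
The same breakdown can be kept next to the pattern itself by using `re.VERBOSE`, which ignores whitespace and allows `#` comments inside the pattern. The following is a minimal sketch of that idea; the name `NUMBER` is made up for illustration, and the pattern is meant to be equivalent to the one in the answer.

```python
import re

# Commented version of the answer's pattern; NUMBER is a hypothetical name.
NUMBER = re.compile(r'''
    \$?          # optional literal $
    \(?          # optional literal (
    [0-9]        # one digit
    [0-9,.]*     # any run of digits, commas and dots (possibly empty)
    \)?          # optional literal )
    %?           # optional literal %
''', re.VERBOSE)

string_n = ('seven (5) blah (6) decimal (6.5) thousands (8,999) with dollar '
            'signs $(9,000) and $(9,500,001.45) end lastly.... (8.4)%')

print(NUMBER.findall(string_n))
# ['(5)', '(6)', '(6.5)', '(8,999)', '$(9,000)', '$(9,500,001.45)', '(8.4)%']

# The pattern is loose: a malformed number is matched as a single token.
print(NUMBER.findall('000,9.34,2'))
# ['000,9.34,2']
```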