
Python: build a dict from a list, using some items as keys and the items between them as values

  • geoJshaun · Tech Community · 6 years ago

    I have split a text file into a list of strings in the following format:

    ['DATE','NAME','RT','1A','541','09947','199407','552','09949','BOON','101C','SMITH','00321','1553678','1851243','561','559','004789','1749201','ANDERSON']
    

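    I want to create a dict whose keys are the items for which item[0:-1].isdigit() and item[-1].isalpha() are both true; in the example above, those are 1A and 101C. Then, for each key, I want a for loop (or while loop) to collect into a new list only the numeric items with int(item) > 100000, stopping when the loop reaches the next key.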

    The result would be dct = {'1A': ['199407'], '101C': ['1553678','1851243','1749201']}
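The two selection rules above can be written as small predicate functions. This is a sketch only; the names `is_key` and `is_big_number` are hypothetical, and the sample data is the list from the question:

```python
def is_key(token):
    """True for tokens such as '1A' or '101C': digits followed by one letter."""
    # ''[:-1] is '' and ''.isdigit() is False, so the short-circuit 'and'
    # never evaluates token[-1] on an empty string.
    return token[:-1].isdigit() and token[-1].isalpha()


def is_big_number(token, threshold=100000):
    """True for purely numeric tokens whose value exceeds the threshold."""
    return token.isdigit() and int(token) > threshold


tokens = ['DATE', 'NAME', 'RT', '1A', '541', '09947', '199407', '552', '09949', 'BOON',
          '101C', 'SMITH', '00321', '1553678', '1851243', '561', '559', '004789', '1749201', 'ANDERSON']

print([t for t in tokens if is_key(t)])         # ['1A', '101C']
print([t for t in tokens if is_big_number(t)])  # ['199407', '1553678', '1851243', '1749201']
```

Note that '09947' is not a key (its last character is a digit) and '541' is not a value (it is below the threshold).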

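    I'm currently getting an IndexError, even though I set a break condition for when the iteration reaches the length of the key list. Before this error, I indexed the values differently and got an empty dict. I expect to get another empty dict once the IndexError is fixed.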

    Here is my code:

    # Create a list of the dictionary keys (values in '1A' format).
    # To avoid a KeyError when building the dict, do not add duplicate
    # values to the list. It needs to be a list rather than a tuple so keys can be appended.
    
    for line in lines:
        if line[0:-1].isdigit() and line[-1].isalpha() and line not in keys:
            keys.append(line)
    
    print str(keys) + " " + str(len(keys))
    
    
    # Build a list of values for each item in keys. It should find the first
    # key and check whether each following value, converted to a number, is
    # > 100000. If it is, the value is appended to valLst. When the next key
    # is encountered, the nested loop breaks and valLst is added to the
    # current key. The outer loop then moves to the next key; the nested loop
    # should only consider items between the current key and the next one.
    
    passes = 0
    while passes <=len(keys): # exit loop before index error
        for key in keys:
            passes += 1 
            curKey = keys.index(key) # current primary iterable position 
            nextKey = curKey + 1 # next primary iterable position
            print "Passes: " + str(passes)
            valLst = [] # empty list for dct values--resets after nested loop break
            for line in lines: #iterate through text
                if line == keys[nextKey]: # the next key value is encountered in text
                    break
                    dict[key] = valLst # valList added to current dict key
                curLine = lines.index(line) # start at current key value found in text
                if curLine == key: # find current key in text
                    nextLine = curLine + 1 # get index of next value after current key in text
                    val = lines[nextLine] # next text value
                    if val.isdigit(): #append value to valLst if it is > 100000
                        num = int(val)
                        if num > 100000:
                            valLst.append(num)
    

    Traceback (most recent call last):
      File "C:\Python27\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 323, in RunScript
        debugger.run(codeObject, __main__.__dict__, start_stepping=0)
      File "C:\Python27\Lib\site-packages\pythonwin\pywin\debugger\__init__.py", line 60, in run
        _GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
      File "C:\Python27\Lib\site-packages\pythonwin\pywin\debugger\debugger.py", line 654, in run
        exec cmd in globals, locals
      File "C:\Users\user\Desktop\Scripts\PDF_Extractor.py", line 1, in <module>
        from cStringIO import StringIO
    IndexError: list index out of range
    

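    I've been looking into list comprehensions, but I haven't grasped them well enough to apply them here. Am I heading in the right direction with the code above, or is there a list comprehension approach I could take, something like: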

    valLst = {key for keys in lines for line in line if line == key and int(line.isdigit()) > 100000 valLst.append(line)}
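Two things in the code above most likely explain the symptoms. The IndexError comes from `keys[nextKey]`: on the last key, `nextKey` equals `len(keys)`, one past the end of the list. The empty dict comes from `curLine == key`, which compares an integer index with a string and so is never true, and from `dict[key] = valLst`, which sits after `break` and never runs (if it did, assigning into the built-in `dict` type would raise a TypeError). Below is a minimal sketch of the same two-loop idea with those problems fixed. It assumes each key appears only once in `lines`, and it uses the sample list from the question as `lines`:

```python
# Sample data from the question, standing in for the asker's `lines`
lines = ['DATE', 'NAME', 'RT', '1A', '541', '09947', '199407', '552', '09949', 'BOON',
         '101C', 'SMITH', '00321', '1553678', '1851243', '561', '559', '004789', '1749201', 'ANDERSON']

# Keys in order of first appearance, without duplicates (same test as the question)
keys = []
for line in lines:
    if line[0:-1].isdigit() and line[-1].isalpha() and line not in keys:
        keys.append(line)

dct = {}  # avoid naming this `dict`, which would shadow the built-in type
for i, key in enumerate(keys):
    start = lines.index(key) + 1  # first item after the current key
    # The last key has no successor, so scan to the end of the list instead of
    # reading keys[i + 1] (the out-of-range lookup behind the IndexError)
    if i + 1 < len(keys):
        end = lines.index(keys[i + 1])
    else:
        end = len(lines)
    valLst = []
    for val in lines[start:end]:  # only the items between this key and the next
        if val.isdigit() and int(val) > 100000:
            valLst.append(val)
    dct[key] = valLst  # stored after the scan, so it is always reached

print(dct)
```

Values are kept as strings to match the expected result; the original code appended `int(val)`, so use `valLst.append(int(val))` if integers are preferred. The `while passes <= len(keys)` wrapper is also unnecessary, because the `for` loop already visits each key once.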
    
  • Answer: chickity china chinese chicken · 6 years ago
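    from collections import OrderedDict
    
    keys = ['DATE', 'NAME', 'RT', '1A', '541', '09947', '199407', '552', '09949', 'BOON',
            '101C', 'SMITH', '00321', '1553678', '1851243', '561', '559', '004789', '1749201', 'ANDERSON']
    
    valList = OrderedDict()  # preserves the order in which keys are found
    current_key = None       # the most recent key seen so far
    
    for k in keys:
        if len(k) > 0:
            # A key starts with a digit, ends with a letter and has no spaces, e.g. '1A', '101C'
            if k[0].isdigit() and k[-1].isalpha() and ' ' not in k:
                current_key = k
                if k not in valList:
                    valList[k] = []
                continue
            try:
                # Keep numbers above 100000, but only once a key has been seen
                if int(k) > 100000 and current_key is not None:
                    valList[current_key].append(k)
            except ValueError:
                continue  # not a number, e.g. 'DATE' or 'SMITH'
    
    print(valList)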
    

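    Output. This was apparently produced from the answerer's full data set rather than the sample list, since its keys are 1Y, 2A, 3A and 3Y; with the sample list, the result is '1A': ['199407'] and '101C': ['1553678', '1851243', '1749201']: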

    OrderedDict([('1Y', ['15538870', '15922112', '16037395', '16069918', '16116102', '16292996', '16658378', '16700710', '16783588', '16832641', '16944735', '16994444', '313132', '12722185', '11415965', '10966593', '9983979', '8573715', '11733178', '552204', '3150537', '552422', '8013132', '9298415', '8742458', '8626402', '4708497', '11687768', '12192686', '734061', '734171', '9896029', '8636757', '2662814', '10407886', '11730755', '4504371', '9187313', '2362896', '7891338', '3519990', '12293652', '9226220', '5984854', '3295145', '1068579', '2031247', '11242586', '8408050', '8440673', '2752194', '5843333', '1740045', '2584772']), ('2A', ['16174735', '16330036', '16334662', '16345573', '16350100', '16376985', '16397823', '16411821', '16435182', '16443451', '16449626', '16574945', '16590154', '16597759', '16615837', '16649016', '16756921', '16762759', '16795828', '16879043', '16887968', '16900090', '16900428', '16902522', '16910127']), ('3A', ['16320336', '16328934', '16331684', '16346347', '16360892', '16370045', '16407413', '16408287', '16444990', '16446211', '16453706', '16467695', '16468032', '11697249', '11843287', '1339389', '2435865', '10001948', '4760965', '2480063', '13588296', '1813233', '11741885', '8972714', '9688478', '16070245']), ('3Y', ['13226120', '13232404', '13233834', '13235601', '13238679', '13241985', '13247504', '13249817', '13262823', '13268442', '13269981', '13270318', '13272413', '13282003', '13284535', '13288943', '13294453'])])
    

    Or print the items one at a time to check that we got the expected keys and values:

    for d in valList.items():
        print(d)
    
    ('1Y', ['15538870', '15922112', '16037395', '16069918', '16116102', '16292996', '16658378', '16700710', '16783588', '16832641', '16944735', '16994444', '313132', '12722185', '11415965', '10966593', '9983979', '8573715', '11733178', '552204', '3150537', '552422', '8013132', '9298415', '8742458', '8626402', '4708497', '11687768', '12192686', '734061', '734171', '9896029', '8636757', '2662814', '10407886', '11730755', '4504371', '9187313', '2362896', '7891338', '3519990', '12293652', '9226220', '5984854', '3295145', '1068579', '2031247', '11242586', '8408050', '8440673', '2752194', '5843333', '1740045', '2584772'])
    ('2A', ['16174735', '16330036', '16334662', '16345573', '16350100', '16376985', '16397823', '16411821', '16435182', '16443451', '16449626', '16574945', '16590154', '16597759', '16615837', '16649016', '16756921', '16762759', '16795828', '16879043', '16887968', '16900090', '16900428', '16902522', '16910127'])
    ('3A', ['16320336', '16328934', '16331684', '16346347', '16360892', '16370045', '16407413', '16408287', '16444990', '16446211', '16453706', '16467695', '16468032', '11697249', '11843287', '1339389', '2435865', '10001948', '4760965', '2480063', '13588296', '1813233', '11741885', '8972714', '9688478', '16070245'])
    ('3Y', ['13226120', '13232404', '13233834', '13235601', '13238679', '13241985', '13247504', '13249817', '13262823', '13268442', '13269981', '13270318', '13272413', '13282003', '13284535', '13288943', '13294453'])
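On the question's list-comprehension idea: a comprehension does not naturally carry "the current key" from one item to the next, so one option is to work in two steps, first finding the positions of the keys and then building the dict from the slice between each key and the next. This is a sketch using the sample list from the question; it assumes each key appears only once (a repeated key would keep only its last block of values):

```python
tokens = ['DATE', 'NAME', 'RT', '1A', '541', '09947', '199407', '552', '09949', 'BOON',
          '101C', 'SMITH', '00321', '1553678', '1851243', '561', '559', '004789', '1749201', 'ANDERSON']

# Step 1: positions of the key tokens (digits followed by one letter, e.g. '1A')
starts = [i for i, t in enumerate(tokens) if t[:-1].isdigit() and t[-1].isalpha()]
# Each key's block ends where the next key starts; the last block runs to the end
ends = starts[1:] + [len(tokens)]

# Step 2: one dict entry per key, keeping only numeric tokens above 100000
dct = {
    tokens[s]: [t for t in tokens[s + 1:e] if t.isdigit() and int(t) > 100000]
    for s, e in zip(starts, ends)
}
print(dct)
```

Pairing `starts` with `starts[1:] + [len(tokens)]` gives every key, including the last one, a valid end position, which sidesteps the out-of-range lookup in the question's code.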