代码之家  ›  专栏  ›  技术社区  ›  William Lombard

一起使用熊猫和xlrd。忽略列标题的缺失/存在

  •  1
  • William Lombard  · 技术社区  · 6 年前

    我希望你能帮助我-我相信这可能是一个小事情来解决,当一个人知道如何。

    在我的工作坊里,我和我的同事都不能通过数据库前端进行“查找和替换所有”更改。老板只是不让我们进入。如果我们需要对几十条或几百条记录进行更改,那么必须通过复制粘贴或类似的方法来完成。疯狂。

    我正在尝试用python2解决这个问题,特别是Pandas、pyautogui和xlrd等库。

    ANR
    51234
    34567
    12345
    ...
    

    我们还可以通过红外扫描仪将物品扫描到iPad上的“工作流”应用程序中,并从扫描的物品列表中自动生成一个XL文件。

    56788
    12345
    89012
    ...
    

    不同之处在于没有列标题。所有的XL文件都将其数据“锚定”在“Sheet1”上的单元格A1上,同样只使用一列。这里没有不必要的麻烦!

    这是剧本。当它完全工作时,系统参数将提供给它。现在,让我们假设需要更改记录以使其“RAM”值从
    "2GB" "2 GB" .

    import xlrd
    import string
    import re
    import pandas as pd
    
    
    field = "RAM"
    value = "2 GB"
    
    myFile = "/Users/me/folder/testArticles.xlsx"
    df = pd.read_excel(myFile)
    myRegex = "^[0-9]{5}$"
    
    
    # data collection and putting into lists.
    workbook = xlrd.open_workbook(myFile)
    sheet = workbook.sheet_by_index(0)
    data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in     range(sheet.nrows)]
    
    formatted = []
    deDuped = []
    
    # removing any possible XL headers, setting all values to strings
    # that look like five-digit ints, apply a regex to be sure.
    for i in data:
        cellValue = str(i)
        cellValue = cellValue.translate(None, '\'[u]\'')
    
    
        # remove the decimal point
        # Searching for the header will cause a database front-end problem. 
        cellValue = cellValue[:-2]
        cellValue = cellValue.translate(None, string.letters)
    
        # making sure only valid article numbers get through
        # blank rows etc can take a hike
        if len(cellValue) != 0:
            if re.match(myRegex, cellValue):
                formatted.append(cellValue)
    
    # weeding out any possilbe dupes.
    for i in formatted:
        if i not in deDuped:
            deDuped.append(i)
    
    
    #main code block
    for i in deDuped:
    
        #lots going on here involving pyauotgui
        #making sure of no error running searches, checking for warnings, moving/tabbing around DB front-end etc
    
        #if all goes to plan
        #removing that record number from the excel file and saving the change
        #so that if we run the script again for the same XL file 
        #we don't needlessly update an already OK record again. 
    
            df = df[~df['ANR'].astype(str).str.startswith(i)]
            df.to_excel(myFile, index=False)
    

    不在乎 “关于是否存在列标题。

    df = df[~df['ANR'].astype(str).str.startswith(i)]
    

    如果列标题(在我的例子中称为“ANR”)对于这种特殊的“pandas”方法是必不可少的,那么如果列标题首先缺少一个,是否有一种直接的方法将列标题插入到XL文件中,即来自IR扫描仪和iPad上的“Workflow”应用程序的XL文件?

    谢谢你们!

    我试着按照Patrick的建议实现一些代码来检查单元格“A1”是否有头。部分成功。我可以把“ANR”放在A1号房,如果它不见了,但我一开始就把它丢了。

    import xlwt
    from openpyxl import Workbook, load_workbook
    from xlutils.copy import copy
    import openpyxl
    
    # data collection
    workbook = xlrd.open_workbook(myFile)
    sheet = workbook.sheet_by_index(0)
    data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
    
    
    
    cell_a1 = sheet.cell_value(rowx=0, colx=0)
    
    if cell_a1 == "ANR":
        print "has header"
    else:
        wb = openpyxl.load_workbook(filename= myFile)
        ws = wb['Sheet1']
        ws['A1'] = "ANE"
        wb.save(myFile)
        #re-open XL file again etc etc.
    

    我发现这段新代码在 writing to existing workbook using xlwt . 在这个例子中,贡献者实际上使用了openpyxl。

    1 回复  |  直到 6 年前
        1
  •  0
  •   William Lombard    6 年前

    我想我自己修好了。

    还是有点凌乱,但似乎在工作。添加了“if/else”子句来检查单元格A1的值并采取相应的操作。在上找到了此的大部分代码 how to append data using openpyxl python to excel file from a specified row? -使用openpyxl的建议

    import pyperclip
    import xlrd
    import pyautogui
    import string
    import re
    import os
    import pandas as pd
    import xlwt
    from openpyxl import Workbook, load_workbook
    from xlutils.copy import copy
    
    
    field = "RAM"
    value = "2 GB"
    myFile = "/Users/me/testSerials.xlsx"
    df = pd.read_excel(myFile)
    
    
    myRegex = "^[0-9]{5}$"
    
    # data collection
    workbook = xlrd.open_workbook(myFile)
    sheet = workbook.sheet_by_index(0)
    data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
    
    cell_a1 = sheet.cell_value(rowx=0, colx=0)
    
    if cell_a1 == "ANR":
        print "has header"
    else:
        headers = ['ANR']
        workbook_name = 'myFile'
        wb = Workbook()
        page = wb.active
        # page.title = 'companies'
        page.append(headers)  # write the headers to the first line
    
        workbook = xlrd.open_workbook(workbook_name)
        sheet = workbook.sheet_by_index(0)
        data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
    
        for records in data:
            page.append(records)
    
            wb.save(filename=workbook_name)
    
            #then load the data all over again, this time with inserted header
            workbook = xlrd.open_workbook(myFile)
            sheet = workbook.sheet_by_index(0)
            data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
    
    
    formatted = []
    deDuped = []
    
    # removing any possible XL headers, setting all values to strings that look like five-digit ints, apply a regex to be sure.
    for i in data:
        cellValue = str(i)
        cellValue = cellValue.translate(None, '\'[u]\'')
    
        # remove the decimal point
        cellValue = cellValue[:-2]
        # cellValue = cellValue.translate(None, ".0")
        cellValue = cellValue.translate(None, string.letters)
    
        # making sure any valid ANRs get through
        if len(cellValue) != 0:
            if re.match(myRegex, cellValue):
                formatted.append(cellValue)
    # ------------------------------------------
    
    # weeding out any possilbe dupes.
    for i in formatted:
        if i not in deDuped:
            deDuped.append(i)
    
    
    # ref - https://stackoverflow.com/questions/48942743/python-pandas-to-remove-rows-in-excel
    df = pd.read_excel(myFile)
    
    print df
    
    
    for i in deDuped:
        #pyautogui code is run here...
    
        #if all goes to plan update the XL file
            df = df[~df['ANR'].astype(str).str.startswith(i)]
    
            df.to_excel(myFile, index=False)