I'm new to Python and would like to shorten my code and avoid having to list every available child node by hand.
My file.xml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
<PRODUCT_DETAILS>
<DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
<DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
<BUYER_PID type="supplier_specific">100000000</BUYER_PID>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<FEATURE>
<FNAME>Hair</FNAME>
<FVALUE>medium</FVALUE>
</FEATURE>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>green</FVALUE>
</FEATURE>
<FEATURE>
<FNAME>Legs</FNAME>
<FVALUE>14</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</HEADER>
My code looks like this:
from lxml import etree as et
import pandas as pd

xml_data = et.parse('file.xml')
products = xml_data.xpath('//HEADER')
# Column names: the tags under PRODUCT_DETAILS, followed by the FNAME texts
headers = [elem.tag for elem in xml_data.xpath('//HEADER[1]//PRODUCT_DETAILS//*')]
headers.extend(xml_data.xpath('//HEADER[1]//FNAME/text()'))
rows = []
for product in products:
    row = [product.xpath(f'.//{headers[0]}/text()')[0],product.xpath(f'.//{headers[1]}/text()')[0],product.xpath(f'.//{headers[2]}/text()')[0]]
    f_values = product.xpath('.//FVALUE/text()')
    row.extend(f_values)
    rows.append(row)
df = pd.DataFrame(rows,columns=headers)
df
Specifically, I mean this part:
row = [product.xpath(f'.//{headers[0]}/text()')[0],product.xpath(f'.//{headers[1]}/text()')[0],product.xpath(f'.//{headers[2]}/text()')[0]]
As more and more child nodes get added to my XML, I currently extend the code like this:
,product.xpath(f'.//{headers[3]}/text()')[0],product.xpath(f'.//{headers[4]}/text()')[0],...]
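Surely there must be a simpler way that doesn't require checking the number of child nodes beforehand and just picks up all available child nodes(?). This is my output, by the way: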
        DESCRIPTION_SHORT                              DESCRIPTION_LONG  BUYER_PID    Hair  Colour  Legs
0  green cat w short hair  green cat w short hair and unlimited zoomies  100000000  medium   green    14
Thanks for your help!
~C
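
One possible way to drop the hard-coded headers[0], headers[1], ... indexes is to loop over whatever children PRODUCT_DETAILS happens to have and build each row as a dict. The following is only a sketch under a few assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs (lxml elements offer the same findall, findtext and iter calls), it parses an inline copy of the sample XML rather than file.xml, it only looks at direct children of PRODUCT_DETAILS, and the helper name product_to_row is made up for illustration.

```python
import xml.etree.ElementTree as ET  # swapped in for lxml; findall/findtext/iter also exist there

# Inline copy of the sample from the question (XML declaration omitted).
XML = """<HEADER>
  <PRODUCT_DETAILS>
    <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
    <DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
    <BUYER_PID type="supplier_specific">100000000</BUYER_PID>
  </PRODUCT_DETAILS>
  <PRODUCT_FEATURES>
    <FEATURE><FNAME>Hair</FNAME><FVALUE>medium</FVALUE></FEATURE>
    <FEATURE><FNAME>Colour</FNAME><FVALUE>green</FVALUE></FEATURE>
    <FEATURE><FNAME>Legs</FNAME><FVALUE>14</FVALUE></FEATURE>
  </PRODUCT_FEATURES>
</HEADER>"""


def product_to_row(product):
    """Hypothetical helper: build one {column: value} dict for a HEADER element."""
    row = {}
    # Each direct child of PRODUCT_DETAILS becomes a column named after its tag,
    # however many children there are.
    for child in product.findall('.//PRODUCT_DETAILS/*'):
        row[child.tag] = child.text
    # Each FEATURE becomes a column named by its FNAME, holding its FVALUE.
    for feature in product.iter('FEATURE'):
        row[feature.findtext('FNAME')] = feature.findtext('FVALUE')
    return row


root = ET.fromstring(XML)
# iter('HEADER') also yields the root itself when its tag is HEADER.
rows = [product_to_row(product) for product in root.iter('HEADER')]
print(rows)
```

Passing this list of dicts to pd.DataFrame(rows) should give the same columns as before, since pandas takes the column names from the dict keys. A product that lacks one of the keys would get NaN in that column instead of raising an IndexError.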