
Parsing third-party XML

  •  1
  • mare  · 14 years ago

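    Which approach would you take to parse large XML files (2 MB to 20 MB, possibly more) that come without a schema? I cannot infer a schema with xsd.exe because the file structure is odd (see the snippet below).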

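    Options:

    1) XML deserialization (but, as mentioned, I have no schema and the XSD tool complains about the file contents)
    2) LINQ to XML
    3) Loading the file into an XmlDocument
    4) Manual parsing with XmlReader and related classes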

    Here is a snippet of the XML file:

    <?xml version="1.0" encoding="utf-8"?>
    <xmlData date="29.04.2010 12:09:13">
     <Table>
      <ident>079186</ident>
      <stock>0</stock>
      <pricewotax>33.94000000</pricewotax>
      <discountpercent>0.00000000</discountpercent>
     </Table>
     <Table>
      <ident>079190</ident>
      <stock>1</stock>
      <pricewotax>10.50000000</pricewotax>
      <discountpercent>0.00000000</discountpercent>
      <pricebyquantity>
       <Table>
        <quantity>5</quantity>
        <pricewotax>10.00000000</pricewotax>
        <discountpercent>0.00000000</discountpercent>
       </Table>
       <Table>
        <quantity>8</quantity>
        <pricewotax>9.00000000</pricewotax>
        <discountpercent>0.00000000</discountpercent>
       </Table>
      </pricebyquantity>
     </Table>
    </xmlData>
    
    2 replies  |  as of 14 years ago
        1
  •  0
  •   code4life    14 years ago

    Here is the XSD:

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="xmlData">
        <xs:complexType>
          <xs:sequence>
            <xs:element maxOccurs="unbounded" name="Table">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="ident" type="xs:int" />
                  <xs:element name="stock" type="xs:int" />
                  <xs:element name="pricewotax" type="xs:double" />
                  <xs:element name="discountpercent" type="xs:double" />
                  <xs:element minOccurs="0" name="pricebyquantity">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element maxOccurs="unbounded" name="Table">
                          <xs:complexType>
                            <xs:sequence>
                              <xs:element name="quantity" type="xs:int" />
                              <xs:element name="pricewotax" type="xs:double" />
                              <xs:element name="discountpercent" type="xs:double" />
                            </xs:sequence>
                          </xs:complexType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
          <xs:attribute name="date" type="xs:string" use="required" />
        </xs:complexType>
      </xs:element>
    </xs:schema>
    

    And here are the serializable classes:

    //------------------------------------------------------------------------------
    // <auto-generated>
    //     This code was generated by a tool.
    //     Runtime Version:2.0.50727.3603
    //
    //     Changes to this file may cause incorrect behavior and will be lost if
    //     the code is regenerated.
    // </auto-generated>
    //------------------------------------------------------------------------------
    
    // 
    // This source code was auto-generated by xsd, Version=2.0.50727.1432.
    // 
    namespace StockInfo {
        using System.Xml.Serialization;
    
    
        /// <remarks/>
        [System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.1432")]
        [System.SerializableAttribute()]
        [System.Diagnostics.DebuggerStepThroughAttribute()]
        [System.ComponentModel.DesignerCategoryAttribute("code")]
        [System.Xml.Serialization.XmlTypeAttribute(AnonymousType=true)]
        [System.Xml.Serialization.XmlRootAttribute(Namespace="", IsNullable=false)]
        public partial class xmlData {
    
            private xmlDataTable[] tableField;
    
            private string dateField;
    
            /// <remarks/>
            [System.Xml.Serialization.XmlElementAttribute("Table")]
            public xmlDataTable[] Table {
                get {
                    return this.tableField;
                }
                set {
                    this.tableField = value;
                }
            }
    
            /// <remarks/>
            [System.Xml.Serialization.XmlAttributeAttribute()]
            public string date {
                get {
                    return this.dateField;
                }
                set {
                    this.dateField = value;
                }
            }
        }
    
        /// <remarks/>
        [System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.1432")]
        [System.SerializableAttribute()]
        [System.Diagnostics.DebuggerStepThroughAttribute()]
        [System.ComponentModel.DesignerCategoryAttribute("code")]
        [System.Xml.Serialization.XmlTypeAttribute(AnonymousType=true)]
        public partial class xmlDataTable {
    
            private int identField;
    
            private int stockField;
    
            private double pricewotaxField;
    
            private double discountpercentField;
    
            private xmlDataTableTable[] pricebyquantityField;
    
            /// <remarks/>
            public int ident {
                get {
                    return this.identField;
                }
                set {
                    this.identField = value;
                }
            }
    
            /// <remarks/>
            public int stock {
                get {
                    return this.stockField;
                }
                set {
                    this.stockField = value;
                }
            }
    
            /// <remarks/>
            public double pricewotax {
                get {
                    return this.pricewotaxField;
                }
                set {
                    this.pricewotaxField = value;
                }
            }
    
            /// <remarks/>
            public double discountpercent {
                get {
                    return this.discountpercentField;
                }
                set {
                    this.discountpercentField = value;
                }
            }
    
            /// <remarks/>
            [System.Xml.Serialization.XmlArrayItemAttribute("Table", IsNullable=false)]
            public xmlDataTableTable[] pricebyquantity {
                get {
                    return this.pricebyquantityField;
                }
                set {
                    this.pricebyquantityField = value;
                }
            }
        }
    
        /// <remarks/>
        [System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.1432")]
        [System.SerializableAttribute()]
        [System.Diagnostics.DebuggerStepThroughAttribute()]
        [System.ComponentModel.DesignerCategoryAttribute("code")]
        [System.Xml.Serialization.XmlTypeAttribute(AnonymousType=true)]
        public partial class xmlDataTableTable {
    
            private int quantityField;
    
            private double pricewotaxField;
    
            private double discountpercentField;
    
            /// <remarks/>
            public int quantity {
                get {
                    return this.quantityField;
                }
                set {
                    this.quantityField = value;
                }
            }
    
            /// <remarks/>
            public double pricewotax {
                get {
                    return this.pricewotaxField;
                }
                set {
                    this.pricewotaxField = value;
                }
            }
    
            /// <remarks/>
            public double discountpercent {
                get {
                    return this.discountpercentField;
                }
                set {
                    this.discountpercentField = value;
                }
            }
        }
    }
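
    As a rough illustration of how classes like these are used, the sketch below deserializes the sample with XmlSerializer. The types StockFile, StockItem, PriceBreak and StockFileLoader are hypothetical, trimmed-down stand-ins, not the generated code. Two deliberate differences, both assumptions on my part: ident is mapped to a string so that a value such as 079186 keeps its leading zero (an int would turn it into 79186), and prices are mapped to decimal rather than double.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the generated xmlData class.
[XmlRoot("xmlData")]
public class StockFile
{
    [XmlAttribute("date")]
    public string Date { get; set; }

    // Each top-level <Table> element becomes one array entry.
    [XmlElement("Table")]
    public StockItem[] Items { get; set; }
}

public class StockItem
{
    // Assumption: kept as a string so "079186" keeps its leading zero.
    [XmlElement("ident")]
    public string Ident { get; set; }

    [XmlElement("stock")]
    public int Stock { get; set; }

    [XmlElement("pricewotax")]
    public decimal PriceWithoutTax { get; set; }

    [XmlElement("discountpercent")]
    public decimal DiscountPercent { get; set; }

    // Optional wrapper element; stays null when <pricebyquantity> is absent.
    [XmlArray("pricebyquantity")]
    [XmlArrayItem("Table")]
    public PriceBreak[] PriceByQuantity { get; set; }
}

public class PriceBreak
{
    [XmlElement("quantity")]
    public int Quantity { get; set; }

    [XmlElement("pricewotax")]
    public decimal PriceWithoutTax { get; set; }

    [XmlElement("discountpercent")]
    public decimal DiscountPercent { get; set; }
}

public static class StockFileLoader
{
    // Reads the whole document into an object graph in one call.
    public static StockFile Load(TextReader reader)
    {
        var serializer = new XmlSerializer(typeof(StockFile));
        return (StockFile)serializer.Deserialize(reader);
    }
}
```

    For a file on disk, something like StockFileLoader.Load(File.OpenText(path)) should work, where path is whatever location the supplier's file was saved to.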
    

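    One caveat: deserialization may not be the most efficient way to parse a 20 MB file. XmlReader is probably the fastest option, but it means doing the work by hand.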
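
    To give an idea of what "by hand" involves, here is a minimal streaming sketch with XmlReader. StockRow and StockStreamParser are hypothetical names, and the sketch assumes the layout from the question: top-level <Table> elements directly under <xmlData>. It relies on the reader's Depth to tell top-level tables apart from the nested ones inside <pricebyquantity>, which it simply skips.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical holder for the fields this sketch extracts.
public class StockRow
{
    public string Ident { get; set; }
    public int Stock { get; set; }
    public decimal PriceWithoutTax { get; set; }
}

public static class StockStreamParser
{
    // Streams the document and yields one StockRow per top-level <Table>.
    public static IEnumerable<StockRow> ReadRows(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            StockRow current = null;
            reader.MoveToContent();   // on <xmlData>, depth 0
            reader.Read();            // step inside the root element

            while (!reader.EOF)
            {
                bool isElement = reader.NodeType == XmlNodeType.Element;

                if (isElement && reader.Depth == 1 && reader.Name == "Table")
                {
                    current = new StockRow();
                    reader.Read();    // step into the item
                }
                else if (isElement && reader.Depth == 2 && current != null)
                {
                    // The ReadElementContentAs* methods and Skip() already move
                    // past the element, so no extra Read() is needed here.
                    switch (reader.Name)
                    {
                        case "ident":
                            current.Ident = reader.ReadElementContentAsString();
                            break;
                        case "stock":
                            current.Stock = reader.ReadElementContentAsInt();
                            break;
                        case "pricewotax":
                            current.PriceWithoutTax = reader.ReadElementContentAsDecimal();
                            break;
                        default:
                            reader.Skip();   // discountpercent, pricebyquantity, ...
                            break;
                    }
                }
                else if (reader.NodeType == XmlNodeType.EndElement
                         && reader.Depth == 1 && current != null)
                {
                    yield return current;    // closing </Table> of a top-level item
                    current = null;
                    reader.Read();
                }
                else
                {
                    reader.Read();           // whitespace, closing root tag, etc.
                }
            }
        }
    }
}
```

    Because rows are yielded one at a time, memory use stays roughly constant regardless of file size, at the cost of more bookkeeping than the deserialization route.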

        2
  •  0
  •   Josh Stodola    14 years ago

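    I would load it into an XmlDocument and then use XPath to process it as needed. LINQ is probably the best option, but I am not very familiar with it, so I can't say.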
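
    For comparison, a small sketch of both ideas against the XML from the question: XmlDocument with an XPath query, and the same query in LINQ to XML. The class name InStockQueries and the query itself (idents of top-level items with stock greater than zero) are only illustrative assumptions. Both variants load the entire document into memory.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class InStockQueries
{
    // XmlDocument + XPath: idents of top-level items whose stock is above zero.
    public static List<string> WithXPath(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<string>();
        // The absolute path keeps nested pricebyquantity/Table elements out.
        foreach (XmlNode node in doc.SelectNodes("/xmlData/Table[stock > 0]/ident"))
        {
            result.Add(node.InnerText);
        }
        return result;
    }

    // LINQ to XML: the same query expressed with XDocument.
    public static List<string> WithLinq(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root
            .Elements("Table")                         // direct children only
            .Where(t => (int)t.Element("stock") > 0)   // explicit XElement-to-int cast
            .Select(t => (string)t.Element("ident"))
            .ToList();
    }
}
```

    To read from disk instead of a string, XmlDocument.Load and XDocument.Load both accept a file path.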