
How to extract a table from a web page

  • Aakash aggarwal  · Tech Community  · 6 years ago

    I have been trying to extract a table from a web page.
    I don't know what to do next. This is what I have written so far:

    import requests
    from bs4 import BeautifulSoup
    page = requests.get('http://www.moneycontrol.com/financials/nmdc/ratios/NMD02')
    soup = BeautifulSoup(page.text, 'html.parser')
    table = soup.find(class_='tabns MR10')
    

    Now I don't know how to proceed. I can't find the table.

    2 answers  |  as of 6 years ago
        1
  •   mp035    6 years ago

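    The classes tabns and MR10 do not refer to the table on the page you are trying to scrape. That class combination refers to a DIV containing an unordered list, which lists the tabs shown above the table. The class det looks like it will get your table, but without knowing exactly what you want to scrape, I can't be sure.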
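    One way to check which element a class lookup really returns is to print the tag name of the match and see whether it contains a table. The sketch below runs against a small inline HTML string that is only an assumed stand-in for the real page (the markup is hypothetical, not copied from moneycontrol.com), so it works without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: a tab bar DIV followed by a data table.
SAMPLE_HTML = """
<div class="tabns MR10"><ul><li>Ratios</li><li>Balance Sheet</li></ul></div>
<table class="table4">
  <tr><td class="det" colspan="2">Face Value</td><td class="det">1.00</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A space-separated string matches the exact value of the class attribute.
match = soup.find(class_="tabns MR10")
print(match.name)           # tag name of whatever matched
print(match.find("table"))  # None when the match holds no table

# Try the candidate class and look at the text of each hit.
cells = soup.find_all(class_="det")
print([c.get_text(strip=True) for c in cells])
```

    Running the same three checks on the real `soup` object should show quickly whether a class points at the table or at some other element.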

    Try this:

    #!/usr/bin/env python3
    import requests
    from bs4 import BeautifulSoup

    page = requests.get('http://www.moneycontrol.com/financials/nmdc/ratios/NMD02')
    soup = BeautifulSoup(page.text, 'html.parser')
    # Collect every element with class "det"
    cells = soup.find_all(class_='det')

    for node in cells:
        if 'colspan' in node.attrs:
            # A cell with colspan appears to hold a row label: start a new output line
            if len(node.contents) == 1:
                print('')
                print(node.contents[0].ljust(48), end="")
        elif len(node.contents) == 1:
            # Any other single-child cell is printed as a value on the current line
            print(node.contents[0].ljust(10), end="")
    

    This is what I get from the code:

    Face Value                                      1.00      1.00      1.00          1.00      1.00      
    Dividend Per Share                              5.15      11.00     8.55      8.50      7.00      
    Operating Profit Per Share (Rs)                 11.38     8.04      19.62     19.61     18.60     
    Net Operating Profit Per Share (Rs)             27.90     16.28     31.17     30.41     27.00     
    Free Reserves Per Share (Rs)                    --        --        --        --        --        
    Bonus in Equity Capital                         83.53     66.66     66.66     66.66     66.66     
    Operating Profit Margin(%)                      40.79     49.40     --        64.46     68.89     
    Profit Before Interest And Tax Margin(%)        34.97     36.23     --        53.86     55.91     
    Gross Profit Margin(%)                          38.57     46.18     --        63.21     67.60     
    Cash Profit Margin(%)                           28.61     41.57     --        46.11     50.05     
    Adjusted Cash Margin(%)                         28.61     41.57     45.80     46.11     50.05     
    Net Profit Margin(%)                            29.32     46.90     51.97     53.24     59.25     
    Adjusted Net Profit Margin(%)                   26.59     36.79     --        45.36     49.00     
    Return On Capital Employed(%)                   19.15     15.04     --        32.40     34.44     
    Return On Net Worth(%)                          11.49     10.05     --        21.40     23.05     
    Adjusted Return on Net Worth(%)                 11.49     10.67     20.21     21.26     23.04      
    Return on Assets Excluding Revaluations         71.17     75.95     81.55     75.64     69.39     
    Return on Assets Including Revaluations         71.17     75.95     81.55     75.64     69.39     
    Return on Long Term Funds(%)                    19.15     15.79     30.56     32.40     34.44     
    Current Ratio                                   3.52      4.44      11.63     16.52     7.73      
    Quick Ratio                                     3.20      11.73     11.31     16.06     7.54      
    Debt Equity Ratio                               --        0.05      --        --        --        
    Long Term Debt Equity Ratio                     --        --        --        --        --        
    Interest Cover                                  207.82    72.68     --        5,252.61  717.84    
    Total Debt to Owners Fund                       --        0.05      --        --        --        
    Financial Charges Coverage Ratio                217.27    75.86     --        5,333.91  728.34    
    Financial Charges Coverage Ratio Post Tax       135.17    50.45     --        3,552.62  491.98    
    Inventory Turnover Ratio                        16.35     10.14     --        17.71     16.81     
    Debtors Turnover Ratio                          6.01      3.54      7.72      9.53      11.77     
    Investments Turnover Ratio                      16.35     10.14     17.87     17.71     16.81     
    Fixed Assets Turnover Ratio                     4.26      1.96      --        4.66      4.46      
    Total Assets Turnover Ratio                     0.84      0.30      --        0.40      0.39      
    Asset Turnover Ratio                            0.54      0.24      0.40      0.42      0.41      
    Average Raw Material Holding                    --        --        --        --        --        
    Average Finished Goods Held                     --        --        --        --        --        
    Number of Days In Working Capital               156.37    1,054.96  644.26    680.29    752.56    
    Material Cost Composition                       3.05      4.52      2.81      3.09      2.69      
    Imported Composition of Raw Materials Consumed  --        --        --        --        --        
    Selling Distribution Cost Composition           0.08      0.23      --        --        --        
    Expenses as Composition of Total Sales          --        --        --        --        --        
    Dividend Payout Ratio Net Profit                50.71     144.01    52.78     52.49     43.75     
    Dividend Payout Ratio Cash Profit               47.14     134.76    51.48     51.29     42.82     
    Earning Retention Ratio                         49.30     -35.71    48.14     47.15     56.23     
    Cash Earning Retention Ratio                    52.87     -27.46    49.39     48.37     57.17     
    AdjustedCash Flow Times                         --        0.44      --        --        --   
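    If the goal is data rather than printed text, the same colspan rule can group the cells into rows. This is a minimal sketch, not part of the original answer: `cells_to_rows` is a hypothetical helper, and the inline HTML is an assumed simplification of the page markup that reuses two rows of the output above.

```python
import csv
import io

from bs4 import BeautifulSoup

# Assumed, simplified markup: label cells carry colspan, value cells do not.
SAMPLE_HTML = """
<table>
  <tr><td class="det" colspan="2">Face Value</td>
      <td class="det">1.00</td><td class="det">1.00</td></tr>
  <tr><td class="det" colspan="2">Dividend Per Share</td>
      <td class="det">5.15</td><td class="det">11.00</td></tr>
</table>
"""


def cells_to_rows(html):
    """Group class="det" cells into rows; a colspan cell starts a new row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for node in soup.find_all(class_="det"):
        text = node.get_text(strip=True)
        if node.has_attr("colspan"):
            rows.append([text])    # row label opens a new row
        elif rows:
            rows[-1].append(text)  # value belongs to the latest row
    return rows


rows = cells_to_rows(SAMPLE_HTML)

# Write the rows as CSV into an in-memory buffer and show the result.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

    Passing `page.text` instead of `SAMPLE_HTML` would apply the same grouping to the downloaded page, assuming its markup follows the pattern the answer's code relies on.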
    
        2
  •   Oguz A.    6 years ago

    You can do this easily with pyquery.

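    import requests
    from pyquery import PyQuery as pq

    page = requests.get('http://www.moneycontrol.com/financials/nmdc/ratios/NMD02')
    html = pq(page.content)
    # Select the rows of the last element with class "table4"
    rows = html(".table4:last tr")
    for tr in rows:
        # Each tr is an lxml element; to reach its cells, use one of:
        # tr.find("td")
        # pq(tr)("td")
        print(tr)  # Python 3 print call; shows the element object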