
Python DateDiff in a DataFrame

  • Ranadip Dutta Sekhar E  · Tech community  · 6 years ago

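    I am trying to compare a given set of datetimes against a predefined threshold. The end goal is to get the rows whose value in that column exceeds the threshold.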

    Here is the code I have tried so far:

    #!/usr/bin/python
    
    from datetime import datetime
    import sys
    import logging
    import operator
    import pymysql
    import pandas as pd
    
    db_endpoint = "awsendpoint"
    db_username="user"
    db_password="password"
    db_name="database_name"
    port = 3306
    
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    
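    try:
        # Keyword arguments work across PyMySQL versions; port was defined above but never passed
        conn = pymysql.connect(host=db_endpoint, user=db_username, password=db_password,
                               database=db_name, port=port, connect_timeout=5)
    except pymysql.MySQLError:
        logger.error("ERROR: Unexpected error: Could not connect to MySql instance.")
        sys.exit()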
    
    logger.info("SUCCESS: Connection to RDS mysql instance succeeded")
    
    cur=conn.cursor()
    cur.execute("select talendjobname, taskstartdate from taskexecutionhistory where basicstatus = 'RUNNING'")
    
    # OUTPUT is:
    # [('Prod_Adobe_Master_Process_v2', datetime.datetime(2018, 12, 17, 3, 30)),
    #  ('Prod_Sales_n_DG_Master_Process_v2', datetime.datetime(2018, 12, 17, 4, 0)),
    #  ('SDG_download_mail_attachments', datetime.datetime(2018, 12, 23, 3, 0, 1))]
    
    aws = []
    for row in cur:
        aws.append(row)
    # All working up to this point.
    aws = pd.DataFrame(aws)
    
    aws_time = aws.iloc[:,1]
    
    ## I am getting the longer running jobs with respect to current time.
    def days_between(d1):
        # d1 = datetime.strptime(d1, "%Y-%m-%d")
        return abs(datetime.now() - d1)
    
    # Here is the problem.
    # OUTPUT is a list of: 3 days 11 hours 30 mins,
    #                      2 days 10 hours 12 mins,
    #                      and so on
    

    My threshold is 8 hours, and I cannot compare it against this result. I want a list of only the jobs that have crossed this threshold.

    I also tried another approach:

    time_passed = []
    for i in range(0,len(aws_time.index)):
        x = days_between(aws_time[i])
    
        time_passed.append(x)
    

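    Let me know what I am missing, or whether there is a different approach. timedelta is the main class I am struggling with. I tried string manipulation, but I could not convert the output to a string either.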

    1 answer
  •   willk    6 years ago

    datetime.timedelta objects in Python have a method called .total_seconds() that you can use to find the number of hours between two times.

    from datetime import datetime, timedelta
    t1 = datetime.now()
    t2 = datetime.now() - timedelta(hours=10)
    type(t1 - t2)
    # datetime.timedelta
    
    # Find total hours between times
    (t1 - t2).total_seconds() / 3600
    # 9.99999
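
    As an alternative to converting to hours, timedelta objects can also be compared with each other directly, so the 8-hour threshold can itself be a timedelta. The following is a minimal sketch using only the standard library and the three sample rows from the question; the helper name long_running and the fixed value of now are assumptions made for illustration (in practice you would pass datetime.now()).

```python
from datetime import datetime, timedelta

# Sample rows copied from the query output shown in the question.
jobs = [
    ("Prod_Adobe_Master_Process_v2", datetime(2018, 12, 17, 3, 30)),
    ("Prod_Sales_n_DG_Master_Process_v2", datetime(2018, 12, 17, 4, 0)),
    ("SDG_download_mail_attachments", datetime(2018, 12, 23, 3, 0, 1)),
]

THRESHOLD = timedelta(hours=8)


def long_running(rows, now, threshold=THRESHOLD):
    """Return (job name, elapsed time) pairs whose elapsed time exceeds the threshold."""
    result = []
    for name, started in rows:
        elapsed = now - started  # subtracting two datetimes gives a timedelta
        if elapsed > threshold:  # timedelta objects support comparison operators
            result.append((name, elapsed))
    return result


# Hypothetical fixed "now" so the result is reproducible; use datetime.now() in practice.
now = datetime(2018, 12, 23, 9, 0, 0)
overdue = long_running(jobs, now)
for name, elapsed in overdue:
    print(name, elapsed)
```

    With this fixed time, the first two jobs have been running for more than 8 hours and the third for just under 6, so only the first two are returned.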
    

    You can vectorize the operation to find all the time differences in hours at once (use .dt to access the timedelta methods):

    # Find the time differences and convert to hours
    aws['time_diff'] = aws.iloc[:, 1].apply(days_between) 
    aws['time_diff_hours'] = aws['time_diff'].dt.total_seconds() / 3600
    

    Then subset to the rows where the number of hours is greater than 8:

    greater_than_8_hours = aws[aws['time_diff_hours'] > 8]
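
    The same filter can also be written without apply, by subtracting the whole column from a single timestamp and comparing against a pd.Timedelta. This is a rough sketch under a few assumptions: the DataFrame is built from the sample rows in the question instead of the database cursor, the column names are taken from the SQL query, and now is fixed so the result is reproducible (replace it with pd.Timestamp.now() for real use).

```python
from datetime import datetime

import pandas as pd

# Sample rows copied from the question; in the original they come from cur.
rows = [
    ("Prod_Adobe_Master_Process_v2", datetime(2018, 12, 17, 3, 30)),
    ("Prod_Sales_n_DG_Master_Process_v2", datetime(2018, 12, 17, 4, 0)),
    ("SDG_download_mail_attachments", datetime(2018, 12, 23, 3, 0, 1)),
]
aws = pd.DataFrame(rows, columns=["talendjobname", "taskstartdate"])

# Hypothetical fixed "now"; use pd.Timestamp.now() in practice.
now = pd.Timestamp("2018-12-23 09:00:00")

# Subtracting a datetime column from a Timestamp yields a timedelta column.
elapsed = now - aws["taskstartdate"]

# Compare the whole column against the threshold at once.
long_running = aws[elapsed > pd.Timedelta(hours=8)]
print(long_running["talendjobname"].tolist())
```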