代码之家 › 专栏 › 技术社区 › zzzeek

偏移量与行数()

postgresql

zzzeek · 技术社区 · 14 年前

正如我们所知,PostgreSQL的偏移要求它扫描所有行,直到到达您请求的位置,这使得通过巨大的结果集进行分页变得有点无用,随着偏移量的增加,变得越来越慢。

PG 8.4现在支持窗口功能。而不是:

SELECT * FROM table ORDER BY somecol LIMIT 10 OFFSET 500

你可以说:

SELECT * FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY somecol ASC) AS rownum FROM table) AS foo
WHERE rownum > 500 AND rownum <= 510

后一种方法对我们有帮助吗?还是我们必须继续使用标识列和临时表来进行大分页?

2 回复 | 直到 14 年前

zzzeek 14 年前

我构建了一个测试,比较偏移量、光标和行编号()。我对row_number()的印象是正确的,即无论结果集在哪里,它的速度都是一致的。然而,这个速度比偏移或光标都要慢得多,正如我的印象一样,速度几乎是一样的,都会在速度上降低,直到结果结束。

结果:

offset(100,100): 0.016359
scroll(100,100): 0.018393
rownum(100,100): 15.535614

offset(100,480000): 1.761800
scroll(100,480000): 1.781913
rownum(100,480000): 15.158601

offset(100,999900): 3.670898
scroll(100,999900): 3.664517
rownum(100,999900): 14.581068

from sqlalchemy import *

metadata = MetaData()
engine = create_engine('postgresql://scott:tiger@localhost/test', echo=True)

t1 = Table('t1', metadata,
    Column('id', Integer, primary_key=True),
    Column('d1', String(50)),
    Column('d2', String(50)),
    Column('d3', String(50)),
    Column('d4', String(50)),
    Column('d5', String(50))
)

if not engine.has_table('t1'):
    conn = engine.connect()
    t1.create(conn)

    # 1000000 rows
    for i in range(100):
        conn.execute(t1.insert(), [
            dict(
                ('d%d' % col, "data data data %d %d" % (col, (i * 10000) + j))
                for col in range(1, 6)
            ) for j in xrange(1, 10001)
        ])

import time

def timeit(fn, count, *args):
    now = time.time()
    for i in xrange(count):
        fn(*args)
    total = time.time() - now
    print "%s(%s): %f" % (fn.__name__, ",".join(repr(x) for x in args), total)

# this is a raw psycopg2 connection.
conn = engine.raw_connection()

def offset(limit, offset):
    cursor = conn.cursor()
    cursor.execute("select * from t1 order by id limit %d offset %d" % (limit, offset))
    cursor.fetchall()
    cursor.close()

def rownum(limit, offset):
    cursor = conn.cursor()
    cursor.execute("select * from (select *, "
                    "row_number() over (order by id asc) as rownum from t1) as foo "
                    "where rownum>=%d and rownum<%d" % (offset, limit + offset))
    cursor.fetchall()
    cursor.close()

def scroll(limit, offset):
    cursor = conn.cursor('foo')
    cursor.execute("select * from t1 order by id")
    cursor.scroll(offset)
    cursor.fetchmany(limit)
    cursor.close()

print 

timeit(offset, 10, 100, 100)
timeit(scroll, 10, 100, 100)
timeit(rownum, 10, 100, 100)

print 

timeit(offset, 10, 100, 480000)
timeit(scroll, 10, 100, 480000)
timeit(rownum, 10, 100, 480000)

print 

timeit(offset, 10, 100, 999900)
timeit(scroll, 10, 100, 999900)
timeit(rownum, 10, 100, 999900)

Frank Heikens 14 年前

使用A CURSOR 对于一个大的结果集,速度会快得多。对于较小的结果设置了限值补偿施工工程,但它有它的限制。

行数是一件好事,但不是为了分页。由于连续扫描,您的性能很差。