
Python: how to create a persistent in-memory structure for debugging

python-3.x persistence debugging python

max · Tech Community · 14 years ago

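[Python 3.1]

My program takes a long time to run because it calls pickle.load on a huge data structure. This makes debugging very annoying and time-consuming: every time I make a small change, I have to wait a few minutes to see whether the regression tests pass.

I would like to replace the pickle with an in-memory data structure.

I thought about starting a Python program in one process and connecting to it from another, but I am afraid the inter-process communication overhead would be huge.

Perhaps I could run a Python function from the interpreter to load the structure into memory. Then, as I modify the rest of the program, I could run it many times (without exiting the interpreter in between). This seems workable, but I am not sure whether I would suffer any overhead or other problems.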
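A minimal sketch of that last idea, for a long-lived interpreter session: unpickle once, keep the object in a module-level cache, and reload only the code under development before each run. The helper names `load_once` and `run_latest` are hypothetical, not part of any library. The sketch uses `importlib.reload`, which exists from Python 3.4 onward; on Python 3.1 the equivalent call was `imp.reload`.

```python
import importlib
import pickle
import sys

_CACHE = {}  # path -> unpickled object, kept alive for the whole session


def load_once(path):
    """Unpickle `path` on the first call only; later calls return the cached object."""
    key = str(path)
    if key not in _CACHE:
        with open(path, "rb") as f:
            _CACHE[key] = pickle.load(f)
    return _CACHE[key]


def run_latest(module_name, func_name, data):
    """Import (or re-import) a module so source edits take effect, then call func(data)."""
    if module_name in sys.modules:
        module = importlib.reload(sys.modules[module_name])
    else:
        module = importlib.import_module(module_name)
    return getattr(module, func_name)(data)
```

In the interpreter this would be used as `data = load_once("big.pickle")` once, followed by `run_latest("my_module", "my_function", data)` after every edit (file, module, and function names here are placeholders). One caveat: if the pickled objects are instances of classes defined in the reloaded module, the already-loaded instances keep referring to the old class objects, so it is safer to keep those class definitions in a module that is not reloaded.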

2 answers | as of 14 years ago

Ignacio Vazquez-Abrams 14 years ago

You can use mmap to open views of the same file in multiple processes; once the file is loaded, access is almost as fast as memory.
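A rough sketch of the mmap approach, under one important assumption: mmap exposes raw bytes, not Python objects, so the data has to be stored in a flat binary layout first. Here the layout is simply a sequence of native-format doubles; the helper names `write_doubles` and `open_doubles` are made up for illustration.

```python
import array
import mmap


def write_doubles(path, values):
    """Store the values as raw native-format doubles (8 bytes each on common platforms)."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)


def open_doubles(path):
    """Map the file read-only; return (mmap object, memoryview indexed as doubles)."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping stays valid after f is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mm, memoryview(mm).cast("d")
```

Every process that calls `open_doubles` on the same file shares the operating system's cached pages, so only the first access pays the disk cost. When finished, release the view before closing the map (`view.release()` and then `mm.close()`), otherwise closing raises a BufferError.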

mariana soffer 14 years ago

First, you can pickle the different parts of the whole object separately, like this:

# gen_objects.py

import random
import pickle

class BigBadObject(object):
    """A deliberately bulky object: a dict, a list, and a string of random size."""
    def __init__(self):
        self.a_dictionary = {}
        for _ in range(random.randint(1, 1000)):
            self.a_dictionary[random.randint(1, 98675676)] = random.random()
        self.a_list = [random.random()
                       for _ in range(random.randint(1000, 10000))]
        self.a_string = ''.join(chr(random.randint(65, 90))
                                for _ in range(random.randint(100, 10000)))

if __name__ == "__main__":
    # Dump each object as its own pickle record so they can be read back one at a time.
    with open('lotsa_objects.pickled', 'wb') as output:
        for _ in range(10000):
            pickle.dump(BigBadObject(), output, pickle.HIGHEST_PROTOCOL)

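Once the big file has been generated in separate parts, you can read it from a Python program, with several readers running at the same time and each one reading a different part.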

# reader.py

from threading import Thread
from queue import Queue, Empty
import pickle

from gen_objects import BigBadObject  # the class must be importable for unpickling

class Reader(Thread):
    """Background thread that unpickles objects one by one and queues them."""
    def __init__(self, filename, q):
        Thread.__init__(self)
        self._file = open(filename, 'rb')
        self._queue = q
    def run(self):
        with self._file:  # close the file when the thread finishes
            while True:
                try:
                    one_object = pickle.load(self._file)
                except EOFError:
                    break
                self._queue.put(one_object)

class uncached(object):
    def __init__(self, filename, queue_size=100):
        self._my_queue = Queue(maxsize=queue_size)
        self._my_reader = Reader(filename, self._my_queue)
        self._my_reader.start()
    def __iter__(self):
        # Loop until the reader thread has finished AND the queue is drained;
        # stopping as soon as the thread dies would drop the last queued objects.
        while self._my_reader.is_alive() or not self._my_queue.empty():
            try:
                print("Getting from the queue. Queue size=", self._my_queue.qsize())
                o = self._my_queue.get(True, timeout=0.1)  # block for up to 0.1 seconds
                yield o
            except Empty:
                pass

# Compute an average of all the numbers in a_lists, just for show.
list_avg = 0.0
list_count = 0

for x in uncached('lotsa_objects.pickled'):
    list_avg += sum(x.a_list)
    list_count += len(x.a_list)

print("Average: ", list_avg / list_count)

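The claim is that reading the pickle file this way takes about 1% of the time the other way needs, because 100 threads run in parallel. Note that the code as written starts a single reader thread and buffers up to 100 objects in the queue (queue_size=100), so any gain comes from overlapping unpickling with processing and should be measured rather than assumed.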
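One way to check the timing is to compare against a plain sequential reader of the same file. The sketch below is only a baseline for such a measurement; `iter_pickles` and `timed_count` are hypothetical helper names, and no particular speed-up is implied.

```python
import pickle
import time


def iter_pickles(path):
    """Yield every object that was pickled back-to-back into the file at `path`."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return  # reached the end of the file


def timed_count(iterable):
    """Consume `iterable`; return (number of items, elapsed seconds)."""
    start = time.perf_counter()
    count = sum(1 for _ in iterable)
    return count, time.perf_counter() - start
```

Running `timed_count(iter_pickles('lotsa_objects.pickled'))` and then `timed_count(uncached('lotsa_objects.pickled'))` on the same machine gives two numbers that can be compared directly. Note that unpickling the `BigBadObject` records still requires the class to be importable from `gen_objects`.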