代码之家 › 专栏 › 技术社区 › Robert Lacok

Python Bigtable客户端在deepcopy上花费了很长时间

google-cloud-bigtable bigtable google-kubernetes-engine google-cloud-platform

Robert Lacok · 技术社区 · 6 年前

我有一个Python Kafka消费者有时从Bigtable读得非常慢。它从Bigtable中读取一行,执行一些计算,偶尔回写一些信息,然后继续。

问题是,GCE中的1 vCPU虚拟机读/写速度极快,用户每秒咀嚼100-150条消息。没有问题。

但是,当部署在生产Kubernetes集群(GKE)上时,它是多区域的( europe-west1-b/c/d ),它的速度大约为0.5条消息/秒。 是-每条消息2秒。

Bigtable在 europe-west1-d -但是在同一区域的节点上安排的吊舱( d ),与其他区域节点上的pod具有相同的性能,这很奇怪。

pod不断地达到CPU限制(1个vCPU)。分析程序显示,大部分时间(95%)都花在 PartialRowData.cells() 函数,在 copy.py:132(deepcopy)

它使用最新的 google-cloud-bigtable==0.29.0 包裹。

现在,我知道这个包是alpha格式的,但是是什么因素使性能显著降低了300倍呢?

读取行数据的代码如下:

def _row_to_dict(cls, row):
    if row is None:
        return {}

    item_dict = {}

    if COLUMN_FAMILY in row.cells:
        structured_cells = {}
        for field_name in STRUCTURED_STATIC_FIELDS:
            if field_name.encode() in row.cells[COLUMN_FAMILY]:
                structured_cells[field_name] = row.cells[COLUMN_FAMILY][field_name.encode()][
                    0].value.decode()
        item_dict[COLUMN_FAMILY] = structured_cells

    return item_dict

其中 row 传入的是来自

row = self.bt_table.read_row(row_key, filter_=filter_)

可能有50个 STRUCTURED_STATIC_FIELDS .

深度复制真的只是花了很长时间来复制吗?还是在等待Bigtable的数据传输?我是不是误用了图书馆? 关于如何提高性能有什么建议吗?

提前多谢了。

1 回复 | 直到 6 年前

Robert Lacok 6 年前

原来库定义了 row.cells 作为:

@property
def cells(self):
    """Property returning all the cells accumulated on this partial row.

    :rtype: dict
    :returns: Dictionary of the :class:`Cell` objects accumulated. This
              dictionary has two-levels of keys (first for column families
              and second for column names/qualifiers within a family). For
              a given column, a list of :class:`Cell` objects is stored.
    """
    return copy.deepcopy(self._cells)

所以每次查字典的时候 deepcopy 除了查找。

添加

row_cells = row.cells

后来只提到这一点就解决了这个问题。

dev/prod环境的性能差异还在于prod表已经有了更多的时间戳/单元版本,而dev表只有两个。这使得那些必须被深度复制的词典变得更大了。

链接现有的过滤器 CellsColumnLimitFilter 更进一步的帮助:

filter_ = RowFilterChain(filters=[filter_, CellsColumnLimitFilter(num_cells=1)])

推荐文章

Saravana Kumar · 是否将值插入结构列?

1 年前

Supplementing · 使用nuxt在谷歌应用程序引擎上部署失败

1 年前

mehere · 谷歌搜索控制台-批量导出-缺少权限

1 年前

Chris A · 生成中未拾取环境变量

2 年前

zinger44 · 如何将数据集从Huggingface移动到谷歌云?

2 年前

Mazen Ezzeddine · 使用一点上下文搜索谷歌云日志(匹配关键字的日志和匹配日志前后的少量日志。)

2 年前

nav112 · 在GCP中将桶数据从区域移动到多区域

2 年前

so beautiful memory · 如何将我在本地训练的tensorflow模型正确部署到谷歌云人工智能平台?我部署了它,但没有图像返回

2 年前

Theo75 · R库将消息发布到Google Cloud Pub/Sub主题

2 年前

Itamar Cohen · 谷歌管理的SSL证书不起作用

2 年前