
Can I optimize my database by splitting one big table into many small tables?

split optimization mysql

Roman · Tech Community · 14 years ago

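Suppose I have a big table with three columns: "user_name", "user_property", "value_of_property". Let's also assume I have many users (say 100,000) and many properties (say 10,000). Then the table will be huge (100,000 × 10,000 = 1 billion rows).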

When I extract information from the table, I always need the information about one particular user, so I use, for example, where user_name='Albert Gates'.

Wouldn't it be wise to split the big table into many small tables, one per user?

4 answers | last active 7 years ago

Mark Byers · 14 years ago

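No, I don't think that's a good idea. A better approach is to add an index on user_name, and one on (user_name, user_property) for looking up a single property. Then the database doesn't need to scan all the rows: it only has to find the appropriate entry in the index, which is stored in a B-Tree, so a record can be found in a very short time.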
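The effect of the index can be checked on a scaled-down copy of the question's table. This is a minimal sketch that uses SQLite (through Python's built-in `sqlite3` module) as a stand-in for MySQL, so the plan text differs from MySQL's `EXPLAIN`; the table name `user_properties`, the index name `idx_user_prop` and the generated data are hypothetical. It shows the plan switching from a full scan to an index search once a composite index on (user_name, user_property) exists.

```python
import sqlite3

def make_table(conn: sqlite3.Connection) -> None:
    # Same three-column shape as the question, scaled down to 200 users x 20 properties.
    conn.execute(
        "CREATE TABLE user_properties ("
        "user_name TEXT NOT NULL, "
        "user_property TEXT NOT NULL, "
        "value_of_property TEXT)"
    )
    rows = [(f"user{u}", f"prop{p}", str(u * p)) for u in range(200) for p in range(20)]
    conn.executemany("INSERT INTO user_properties VALUES (?, ?, ?)", rows)

def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    # Each EXPLAIN QUERY PLAN row has the human-readable detail text in column 3.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

BY_USER = "SELECT user_property, value_of_property FROM user_properties WHERE user_name = ?"
BY_USER_AND_PROP = BY_USER + " AND user_property = ?"

conn = sqlite3.connect(":memory:")
make_table(conn)
before = plan(conn, BY_USER, ("user42",))  # no index yet: full table scan

# One composite B-tree index; its leftmost column also serves user_name-only lookups.
conn.execute("CREATE INDEX idx_user_prop ON user_properties (user_name, user_property)")
after_user = plan(conn, BY_USER, ("user42",))
after_both = plan(conn, BY_USER_AND_PROP, ("user42", "prop7"))

print(before)
print(after_user)
print(after_both)
print(conn.execute(BY_USER_AND_PROP, ("user42", "prop7")).fetchall())  # [('prop7', '294')]
```

Because a B-tree index can be used through its leftmost column, the composite index alone covers both the per-user and the per-property lookup here; a separate single-column index on user_name would be redundant in this sketch.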

It can also help to partition your largest tables.

user_property .

Jon Black · 14 years ago

You should normalize your design as follows:

drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;

drop table if exists properties;
create table properties
(
property_id smallint unsigned not null auto_increment primary key,
name varchar(255) unique not null
)
engine=innodb;

drop table if exists user_property_values;
create table user_property_values
(
user_id int unsigned not null,
property_id smallint unsigned not null,
value varchar(255) not null,
primary key (user_id, property_id),
key (property_id)
)
engine=innodb;

insert into users (username) values ('f00'),('bar'),('alpha'),('beta');

insert into properties (name) values ('age'),('gender');

insert into user_property_values values 
(1,1,'30'),(1,2,'Male'),
(2,1,'24'),(2,2,'Female'),
(3,1,'18'),
(4,1,'26'),(4,2,'Male');
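To read one user's properties back out of the normalized schema, the three tables are joined and filtered by username. The sketch below ports the schema above to SQLite via Python's `sqlite3` (so `int unsigned`/`auto_increment` become `INTEGER PRIMARY KEY` and `varbinary` becomes `TEXT`, which is an assumption of the port) and reuses the same sample rows; the function name `properties_of` is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,        -- rowid alias, assigned 1, 2, 3... like auto_increment
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE properties (
    property_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE user_property_values (
    user_id     INTEGER NOT NULL,
    property_id INTEGER NOT NULL,
    value       TEXT NOT NULL,
    PRIMARY KEY (user_id, property_id)
);
CREATE INDEX idx_upv_property ON user_property_values (property_id);
"""

def load(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users (username) VALUES (?)",
                     [("f00",), ("bar",), ("alpha",), ("beta",)])
    conn.executemany("INSERT INTO properties (name) VALUES (?)", [("age",), ("gender",)])
    conn.executemany(
        "INSERT INTO user_property_values VALUES (?, ?, ?)",
        [(1, 1, "30"), (1, 2, "Male"), (2, 1, "24"), (2, 2, "Female"),
         (3, 1, "18"), (4, 1, "26"), (4, 2, "Male")],
    )

def properties_of(conn: sqlite3.Connection, username: str) -> list:
    # The UNIQUE index on username and the (user_id, property_id) primary key
    # are both available to the planner for this lookup.
    return conn.execute(
        """SELECT p.name, v.value
           FROM users u
           JOIN user_property_values v ON v.user_id = u.user_id
           JOIN properties p ON p.property_id = v.property_id
           WHERE u.username = ?
           ORDER BY p.name""",
        (username,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
load(conn)
print(properties_of(conn, "f00"))    # [('age', '30'), ('gender', 'Male')]
print(properties_of(conn, "alpha"))  # [('age', '18')]
```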

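For scale, here is a comparable many-to-many layout with far more rows (product, category and the product_category link table), and how long a lookup by category takes:

select count(*) from product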
count(*)
========
1,000,000 (1M)

select count(*) from category
count(*)
========
250,000 (250K)

select count(*) from product_category
count(*)
========
125,431,192 (125M)

select
 c.*,
 p.*
from
 product_category pc
inner join category c on pc.cat_id = c.cat_id
inner join product p on pc.prod_id = p.prod_id
where
 pc.cat_id = 1001;
0:00:00.030: Query OK (0.03 secs)
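The table definitions behind those timings aren't shown, so the following is only a guess at the layout that makes such a lookup cheap: a link table whose primary key starts with cat_id, plus a secondary index on prod_id for the reverse direction (mirroring `user_property_values` above). It is a small sketch in SQLite via Python's `sqlite3`, where `WITHOUT ROWID` stores rows in primary-key order much like an InnoDB clustered key; the data and the index name `idx_pc_prod` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE category (cat_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Link table stored in (cat_id, prod_id) order, similar to an InnoDB primary key.
CREATE TABLE product_category (
    cat_id  INTEGER NOT NULL,
    prod_id INTEGER NOT NULL,
    PRIMARY KEY (cat_id, prod_id)
) WITHOUT ROWID;
-- Secondary index for the reverse lookup: which categories a product is in.
CREATE INDEX idx_pc_prod ON product_category (prod_id);
""")

conn.executemany("INSERT INTO category VALUES (?, ?)", [(c, f"cat{c}") for c in range(1, 11)])
conn.executemany("INSERT INTO product VALUES (?, ?)", [(p, f"prod{p}") for p in range(1, 51)])
# Product p goes into category (p % 10) + 1, so category 1 holds products 10, 20, 30, 40, 50.
conn.executemany("INSERT INTO product_category VALUES (?, ?)",
                 [(p % 10 + 1, p) for p in range(1, 51)])

def plan(sql: str) -> str:
    # Detail text of each EXPLAIN QUERY PLAN row, with the single parameter bound to 1.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)))

by_category = plan("SELECT prod_id FROM product_category WHERE cat_id = ?")
by_product = plan("SELECT cat_id FROM product_category WHERE prod_id = ?")

rows = conn.execute(
    """SELECT c.name, p.name
       FROM product_category pc
       JOIN category c ON pc.cat_id = c.cat_id
       JOIN product p ON pc.prod_id = p.prod_id
       WHERE pc.cat_id = ?
       ORDER BY p.prod_id""",
    (1,),
).fetchall()

print(by_category)  # a SEARCH on the primary key (cat_id=?)
print(by_product)   # a SEARCH on idx_pc_prod (prod_id=?)
print(rows)
```

Neither direction needs to scan the link table, which is the property that keeps a lookup fast as the row count grows.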

dkinzer · 14 years ago

John Nicholas · 14 years ago

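Why do you need this table structure? My basic problem with it is that you have to cast the value every time you want to use it. That seems bad to me, and storing numbers as text is crazy when it's all binary anyway. For example, how would you enforce required fields? Or fields with constraints that depend on other fields, such as start and end dates?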

Why not simply make the properties fields, rather than some many-to-many relationship?
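Both objections can be seen side by side in a small sketch: values stored as text sort as strings unless cast, while properties stored as typed columns let the schema itself enforce "required" and cross-field rules. It uses SQLite via Python's `sqlite3`; the tables `eav` and `users_wide` and their columns are hypothetical, and since SQLite has no DATE type the dates are ISO `YYYY-MM-DD` strings (in MySQL you would use `DATE`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# EAV-style storage: every value is TEXT, whatever it represents.
conn.execute("CREATE TABLE eav (user_id INTEGER, attr TEXT, value TEXT)")
conn.executemany("INSERT INTO eav VALUES (?, 'age', ?)", [(1, "9"), (2, "10"), (3, "30")])

# Ordering raw text compares character by character, so '10' and '30' sort before '9'.
as_text = [r[0] for r in conn.execute(
    "SELECT value FROM eav WHERE attr = 'age' ORDER BY value")]
# Getting numeric behaviour back means casting on every use.
as_int = [r[0] for r in conn.execute(
    "SELECT CAST(value AS INTEGER) AS v FROM eav WHERE attr = 'age' ORDER BY v")]

# Properties as real, typed columns: the schema enforces required and cross-field rules.
conn.execute("""
CREATE TABLE users_wide (
    user_id    INTEGER PRIMARY KEY,
    age        INTEGER NOT NULL CHECK (age >= 0),  -- required field
    start_date TEXT NOT NULL,                      -- ISO 'YYYY-MM-DD' strings sort chronologically
    end_date   TEXT,
    CHECK (end_date IS NULL OR start_date <= end_date)
)""")

def try_insert(row: tuple) -> str:
    try:
        conn.execute("INSERT INTO users_wide VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

outcomes = [
    try_insert((1, 30, "2010-01-01", "2010-12-31")),  # valid
    try_insert((2, None, "2010-01-01", None)),        # missing required age
    try_insert((3, 25, "2010-06-01", "2010-01-01")),  # ends before it starts
]
print(as_text)  # ['10', '30', '9']
print(as_int)   # [9, 10, 30]
print(outcomes)
```

Note that MySQL only enforces `CHECK` constraints from 8.0.16 onward; older versions parse and ignore them.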

One pattern I often see completely cripple database performance is a table of the form

(Id, AttributeType, AttributeName, AttributeValue).
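Part of the cost of such a table is that rebuilding one logical record means pivoting several rows back into one, with a conditional expression (and often a cast) per attribute. A minimal sketch in SQLite via Python's `sqlite3`: the column names are snake_case versions of the four named above, and the sample values are borrowed from Jon Black's data; both are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attributes ("
    "id INTEGER, attribute_type TEXT, attribute_name TEXT, attribute_value TEXT)"
)
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?, ?)",
    [(1, "int", "age", "30"), (1, "text", "gender", "Male"),
     (2, "int", "age", "24"), (2, "text", "gender", "Female"),
     (3, "int", "age", "18")],
)

# One MAX(CASE ...) per attribute turns N rows per entity back into one row;
# every extra attribute means another expression here.
PIVOT = """
SELECT id,
       MAX(CASE WHEN attribute_name = 'age'
                THEN CAST(attribute_value AS INTEGER) END) AS age,
       MAX(CASE WHEN attribute_name = 'gender'
                THEN attribute_value END) AS gender
FROM attributes
GROUP BY id
ORDER BY id
"""
rows = conn.execute(PIVOT).fetchall()
print(rows)  # [(1, 30, 'Male'), (2, 24, 'Female'), (3, 18, None)]
```

With ordinary columns the same result is a single-row read per entity, with no grouping and no casts.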

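If the data has a 1:1 relationship with the entity, it should be a field on the same table. If your table grows wider than 30 fields, consider moving some of them into another table, but don't call it normalization, because it isn't. It's a technique that helps developers group fields together for readability, at the cost of performance.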
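A common way to move a group of 1:1 fields into another table is to give the side table the same primary key as the main table, so each entity can have at most one row there, and to read both back with a join. The sketch below uses SQLite via Python's `sqlite3`; the tables `customer` and `customer_billing` and their columns are hypothetical. The extra join on every read is the performance cost mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- 1:1 side table: its primary key is also the link back to customer,
-- so a customer can have at most one row here.
CREATE TABLE customer_billing (
    customer_id    INTEGER PRIMARY KEY REFERENCES customer(customer_id),
    billing_street TEXT,
    billing_city   TEXT
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Albert Gates'), (2, 'Jane Doe')")
conn.execute("INSERT INTO customer_billing VALUES (1, '1 Main St', 'Springfield')")

# A second billing row for the same customer violates the shared primary key.
try:
    conn.execute("INSERT INTO customer_billing VALUES (1, '2 Side St', 'Shelbyville')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Reading the full record now needs a join; LEFT JOIN keeps customers without billing data.
rows = conn.execute(
    """SELECT c.name, b.billing_city
       FROM customer c
       LEFT JOIN customer_billing b ON b.customer_id = c.customer_id
       ORDER BY c.customer_id"""
).fetchall()
print(duplicate_rejected)  # True
print(rows)                # [('Albert Gates', 'Springfield'), ('Jane Doe', None)]
```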

I don't know whether MySQL has anything similar, but SQL Server 2008 has sparse columns, where NULL values take up no space (see "Sparse column datatypes").

I'm not saying the EAV approach is always wrong, but I think a relational database may not be the best choice for it.