代码之家  ›  专栏  ›  技术社区  ›  Lev

在Mysql主表中插入非常大的varchar字段

  •  0
  • Lev  · 技术社区  · 1 年前

    我有我的主源表:

    CREATE TABLE masterSources (
    pk int NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(500)
    );
    
    INSERT INTO masterSources (name) VALUES 
    ('derived:com.google.active_minutes:com.google.ios.fit:appleinc.:iphone:1213084:top_level'),
    ('derived:com.google.active_minutes:com.google.ios.fit:appleinc.:watch:f8df280c:top_level'),
    ('derived:com.google.step_count.delta:com.google.android.gms:estimated_steps'),
    ('derived:com.google.activity.segment:com.google.ios.fit:appleinc.:iphone:1213084:top_level'),
    ('derived:com.google.activity.segment:com.google.ios.fit:appleinc.:watch:f8df280c:top_level'),
    ('derived:com.google.activity.segment:com.google.ios.fit:appleinc.:watch:f8df280c:session_activity_segment'),
    ('derived:com.google.calories.expended:com.google.ios.fit:appleinc.:iphone:1213084:top_level'),
    ('derived:com.google.calories.expended:com.google.ios.fit:appleinc.:watch:f8df280c:top_level'),
    ('derived:com.google.calories.expended:com.google.android.gms:from_activities');
    

    然后,我收到了成千上万的来源,我可以将它们批量添加到这个表中:

    CREATE TABLE receivedSources (
    name VARCHAR(500)
    );
    
    INSERT INTO receivedSources (name) VALUES 
    ('derived:com.google.active_minutes:com.google.ios.fit:appleinc.:iphone:1213084:top_level'),
    ('derived:com.google.active_minutes:com.google.ios.fit:appleinc.:watch:f8df280c:top_level'),
    ('derived:com.google.active_minutes:com.google.ios.fit:appleinc.:watch:f8df280c:top_level'), #repeated
    ('derived:com.google.active_minutes:com.google.ios.fit:appleinc.:watch:gew8df280c:top_level'), #new
    ('derived:com.google.active_minutes:com.google.ios.fit:appleinc.:watch:ged8df280c:low_level'), #new
    ('derived:com.google.step_count.delta:com.google.android.gms:estimated_steps'),
    ('derived:com.google.step_count.delta:com.google.android.gms:estimated_steps'), #repeated
    ('derived:com.google.activity.segment:com.google.ios.fit:appleinc.:iphone:1213084:top_level'),
    ('derived:com.google.activity.segment:com.google.ios.fit:appleinc.:watch:f8df280c:top_level'),
    ('derived:com.google.activity.segment:com.google.ios.fit:appleinc.:watch:f8df280c:session_activity_segment'),
    ('derived:com.google.calories.expended:com.google.ios.fit:appleinc.:iphone:1213084:top_level'),
    ('derived:com.google.calories.expended:com.google.ios.fit:appleinc.:iphone:1213084:top_level'), #repeated
    ('derived:com.google.calories.expended:com.google.ios.fit:appleinc.:watch:f8df280c:top_level'),
    ('derived:com.google.calories.expended:com.google.android.gms:fromx2_activities'), #new
    ('derived:com.google.calories.expended:com.google.android.gms:from_activities');
    

    我想要的是将“新”来源添加到我的 masterSources 表,最后生成一个新表,将每个源关联起来 receivedSources 其对应的pk。

    我可以通过一个程序来实现:

    CREATE PROCEDURE my_procedure()
    BEGIN
        DECLARE newName VARCHAR(500);
        DECLARE done BOOLEAN DEFAULT FALSE;                     
    
    
        # receivedSources MINUS masterSources 
        DECLARE cur CURSOR FOR  SELECT n.name 
                                FROM receivedSources n LEFT JOIN masterSources m ON (n.name=m.name)
                                WHERE m.pk IS NULL;
        DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = TRUE;
        
        OPEN cur;
    
        loop1: LOOP
        FETCH cur INTO newName;
       
            IF done THEN 
                LEAVE loop1;            
            END IF; 
        
            # Inserting new Sources in master
            INSERT INTO masterSources (name) VALUES (newName);     
    
        END LOOP loop1;
        CLOSE cur;
       
        # Getting my desired table
        CREATE TABLE newReceivedSources AS
        SELECT m.pk,m.name FROM receivedSources n INNER JOIN masterSources m ON (n.name=m.name);
    
    END
    
    call my_procedure()
    
    select * from newReceivedSources
    

    enter image description here

    我的问题是:考虑到加入的成本有多高,是否有更快/更聪明的方法 varchar 领域?

    0 回复  |  直到 1 年前
        1
  •  1
  •   Rob Eyre    1 年前

    首先在上添加一个唯一密钥 masterSources

    ALTER TABLE masterSources ADD UNIQUE KEY (name)
    

    (可选)要提高性能,还可以在上添加(非唯一)键 receivedSources

    ALTER TABLE receivedSources ADD KEY (name)
    

    然后将任何新的唯一值插入到masterSources中

    INSERT IGNORE INTO masterSources (name) SELECT name FROM receivedSources
    

    然后检索所需的数据集

    SELECT masterSources.pk, masterSources.name
    FROM receivedSources
    INNER JOIN masterSources ON receivedSources.name = masterSources.name