代码之家 › 专栏 › 技术社区 › Björn Pollex

在哪里用JDBC创建一个准备好的语句?

prepared-statement transactions jdbc sqlite java

Björn Pollex · 技术社区 · 14 年前

考虑使用以下方法从某些数据结构中读取数据( InteractionNetwork )并使用 SQLite-JDBC dirver :

private void loadAnnotations(InteractionNetwork network) throws SQLException {
    PreparedStatement insertAnnotationsQuery = 
        connection.prepareStatement(
        "INSERT INTO Annotations(GOId, ProteinId, OnthologyId) VALUES(?, ?, ?)");
    PreparedStatement getProteinIdQuery = 
        connection.prepareStatement(
        "SELECT Id FROM Proteins WHERE PrimaryUniProtKBAccessionNumber = ?");
    connection.setAutoCommit(false);
    for(common.Protein protein : network.get_protein_vector()) {
        /* Get ProteinId for the current protein from another table and
           insert the value into the prepared statement. */
        getProteinIdQuery.setString(1, protein.get_primary_id());
        ResultSet result = getProteinIdQuery.executeQuery();
        result.next();
        insertAnnotationsQuery.setLong(2, result.getLong(1));
        /* Extract all the other data and add all the tuples to the batch. */
    }
    insertAnnotationsQuery.executeBatch();
    connection.commit();
    connection.setAutoCommit(true);
}

此代码工作正常,程序运行大约30秒,平均占用80米堆空间。因为代码看起来很难看,所以我想重构它。我做的第一件事就是把 getProteinIdQuery 进入循环:

private void loadAnnotations(InteractionNetwork network) throws SQLException {
    PreparedStatement insertAnnotationsQuery = 
        connection.prepareStatement(
        "INSERT INTO Annotations(GOId, ProteinId, OnthologyId) VALUES(?, ?, ?)");
    connection.setAutoCommit(false);
    for(common.Protein protein : network.get_protein_vector()) {
        /* Get ProteinId for the current protein from another table and
           insert the value into the prepared statement. */
        PreparedStatement getProteinIdQuery = // <--- moved declaration of statement here
            connection.prepareStatement(
            "SELECT Id FROM Proteins WHERE PrimaryUniProtKBAccessionNumber = ?");
        getProteinIdQuery.setString(1, protein.get_primary_id());
        ResultSet result = getProteinIdQuery.executeQuery();
        result.next();
        insertAnnotationsQuery.setLong(2, result.getLong(1));
        /* Extract all the other data and add all the tuples to the batch. */
    }
    insertAnnotationsQuery.executeBatch();
    connection.commit();
    connection.setAutoCommit(true);
}

当我现在运行代码时,会发生的事情是它需要大约130m的堆空间,并且需要永恒的运行时间。有人能解释这种奇怪的行为吗?

2 回复 | 直到 14 年前

Donal Fellows 14 年前

正如你所发现的,准备一份声明需要时间。无论代码是否丑陋,速度的下降也是非常丑陋的,所以您需要使用更快的形式。

但是您可以使用一个内部类来保存细节并提供一个更好的接口:

private class DatabaseInterface {
    private PreparedStatement insertAnnotation, getProteinId;
    public DatabaseInterface() {
        // This is an inner class; 'connection' is variable in outer class
        insertAnnotation = connection.prepareStatement(
            "INSERT INTO Annotations(GOId, ProteinId, OnthologyId) VALUES(?, ?, ?)");
        getProteinId = connection.prepareStatement(
            "SELECT Id FROM Proteins WHERE PrimaryUniProtKBAccessionNumber = ?");
    }
    public long getId(Protein protein) { // Exceptions omitted...
        getProteinId.setString(1, protein.get_primary_id());
        ResultSet result = getProteinId.executeQuery();
        try {
            result.next();
            return result.getLong(1);
        } finally {
            result.close();
        }
    }
    public void insertAnnotation(int GOId, long proteinId, String ontologyId) {
        insertAnnotation.setInt(1, GOId);          // type may be wrong
        insertAnnotation.setLong(2, proteinId);
        insertAnnotation.setString(3, ontologyId); // type may be wrong
        insertAnnotation.executeUpdate();
    }
}
private void loadAnnotations(InteractionNetwork network) throws SQLException {
    connection.setAutoCommit(false);
    DatabaseInterface dbi = new DatabaseInterface();
    for(common.Protein protein : network.get_protein_vector()) {
        dbi.insertAnnotation(..., dbi.getId(protein), ...);
    }
    connection.commit();
    connection.setAutoCommit(true);
}

其目的是,您有一段代码知道如何将事情分为SQL(如果您转到不同的数据库,这段代码很容易适应),还有一段代码知道如何将这些事情协调在一起。

Jan 14 年前

我想,如果第一个片段看起来很丑,那是一个品味问题;-)…

但是,第二个代码片段需要更长时间(imho)的原因是,现在对于for循环的每次迭代,都会创建一个新的PreparedStatement实例(getProteinIdQuery),而在第一个片段中,您会重新使用准备好的语句,并按照其预期的方式使用它:实例化,然后提供适当的值。

至少,这是我的观点… 简