
Snakemake rule only runs for one file

  • Nikita Vlasenko  · Tech Community  · 6 years ago

    I have a snakemake rule that runs HDBSCAN clustering. It used to be plain DBSCAN and worked fine, but after I modified the script to use HDBSCAN (and modified the Snakemake rule accordingly), the problems started. The Snakefile looks like this:

    configfile: "config.yml"
    
    samples,=glob_wildcards('data_files/normalized/{sample}.hdf5')
    rule all:
        input:
            expand('results/tsne/{sample}_tsne.csv', sample=samples),
            expand('results/umap/{sample}_umap.csv', sample=samples),
            expand('results/umap/img/{sample}_umap.png', sample=samples),
            expand('results/tsne/img/{sample}_tsne.png', sample=samples),
            expand('results/clusters/umap/{sample}_umap_clusters.csv', sample=samples),
            expand('results/clusters/tsne/{sample}_tsne_clusters.csv', sample=samples),
            expand('results/neo4j/{sample}/{file}', sample=samples,
              file=['cells.csv', 'genes.csv', 'cl_contains.csv', 'cl_isin.csv', 'cl_nodes.csv', 'expr_by.csv', 'expr_ess.csv']),
            'results/neo4j/db_command'
    
    rule cluster:
        input:
            script = 'python/dbscan.py',
            umap   = 'results/umap/{sample}_umap.csv'
        output:
            umap = 'results/umap/img/{sample}_umap.png',
            clusters_umap = 'results/clusters/umap/{sample}_umap_clusters.csv'
        shell:
            "python {input.script} -umap_data {input.umap} -min_cluster_size {config[dbscan][min_cluster_size]} -img_umap {output.umap} -clusters_umap {output.clusters_umap}"
    

    dbscan.py looks like this:

    import numpy as np
    import matplotlib.pyplot as plt
    plt.switch_backend('agg')
    from hdbscan import HDBSCAN
    import pathlib
    import os
    import nice_service as ns
    
    def run_dbscan(args):
        print('running HDBSCAN')
    
        path_to_img = args['-img_umap']
        path_to_clusters = args['-clusters_umap']
        path_to_data = args['-umap_data']
    
        # If the folders in the output paths do not exist, create them.
        # path_to_img and path_to_clusters are single path strings (not lists),
        # so create each parent directory directly instead of looping over them.
        img_dir = os.path.dirname(path_to_img)
        pathlib.Path(img_dir).mkdir(parents=True, exist_ok=True)
    
        cluster_dir = os.path.dirname(path_to_clusters)
        pathlib.Path(cluster_dir).mkdir(parents=True, exist_ok=True)
    
        #for idx, path_to_data in enumerate(data_arr):
        data = np.loadtxt(open(path_to_data, "rb"), delimiter=",")
        db = HDBSCAN(min_cluster_size=int(args['-min_cluster_size'])).fit(data)
    
        # 'TRUE' where the point was assigned to cluster, 'FALSE' where not assigned
        # aka 'noise'
        core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
        core_samples_mask[db.labels_ != -1] = True
        labels = db.labels_
    
        # Number of clusters in labels, ignoring noise if present.
        n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
        print('Estimated number of clusters: %d' % n_clusters_)
        unique_labels = set(labels)
        colors = [plt.cm.Spectral(each)
              for each in np.linspace(0, 1, len(unique_labels))]
        for k, col in zip(unique_labels, colors):
            if k == -1:
                # Black used for noise.
                col = [0, 0, 0, 1]
            class_member_mask = (labels == k)
            xy = data[class_member_mask & core_samples_mask]
            plt.plot(xy[:, 0], xy[:, 1], '.', color=tuple(col), markersize=1)
            #plt.legend()
    
        plt.title('Estimated number of clusters: %d' % n_clusters_)
        plt.savefig(path_to_img, dpi = 500)
        np.savetxt(path_to_clusters, labels.astype(int), fmt='%i', delimiter=",")
        print('Finished running HDBSCAN algorithm')
    
    if __name__ == '__main__':
        from sys import argv
        myargs = ns.getopts(argv)
        print(myargs)
        run_dbscan(myargs)
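
    The script relies on ns.getopts from the nice_service module, which is not included in the question. Judging from how the result is used (args['-img_umap'], args['-min_cluster_size'], ...), it presumably turns '-flag value' pairs from sys.argv into a dict. A minimal sketch of such a helper, purely as an assumption about what nice_service provides:

    # nice_service.py -- hypothetical sketch; the real module is not shown in the question
    def getopts(argv):
        """Collect '-flag value' pairs from argv into a dict, e.g. {'-umap_data': 'a.csv'}."""
        opts = {}
        while argv:
            if argv[0].startswith('-'):
                opts[argv[0]] = argv[1]
            argv = argv[1:]   # walk through the remaining arguments
        return opts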
    

    The input files for rule cluster are all present, but the rule only gets run for one of the files.
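
    When Snakemake stops after a single sample like this, a dry run is a quick way to check which jobs it still plans to execute; something along the lines of the following prints the planned jobs and their shell commands without running anything (exact flags can vary between Snakemake versions):

    snakemake -n -p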

    1 Answer  |  6 years ago

  •   Nikita Vlasenko    · 6 years ago

    The problem was that in the script for the last rule I forgot to output one file: it produced 6 files instead of 7. This misled me, because snakemake does not run all the files through one rule and then move on to the next rule; instead it ran a single file through all the rules and then got stuck.
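
    For illustration only: the rule that writes the neo4j files is not shown in the question, so the rule name, script and command-line flags below are hypothetical (only the seven file names come from rule all). The point is that the rule declares all seven files as outputs and the script it calls has to actually write every one of them; if the script only writes six, the job never completes successfully and the workflow stalls after the first sample, which is what happened here.

    rule neo4j_export:   # hypothetical sketch
        input:
            script   = 'python/neo4j_export.py',   # hypothetical script name
            clusters = 'results/clusters/umap/{sample}_umap_clusters.csv'
        output:
            expand('results/neo4j/{{sample}}/{file}',
                   file=['cells.csv', 'genes.csv', 'cl_contains.csv', 'cl_isin.csv',
                         'cl_nodes.csv', 'expr_by.csv', 'expr_ess.csv'])
        # hypothetical command line; flags depend on the actual export script
        shell:
            "python {input.script} -clusters {input.clusters} -out_dir results/neo4j/{wildcards.sample}"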
