scanpy.pp

API: Preprocessing

sc.pp.filter_cells()

https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.filter_cells.html#scanpy.pp.filter_cells

Filter cells based on the number of expressed genes and/or the number of UMI counts. Only one of the four thresholds can be passed per call.

sc.pp.filter_cells(data=, min_genes=, max_genes=, min_counts=, max_counts=, inplace=)

    1. data=: the AnnData object to filter
    1. min_genes=: minimum number of expressed genes a cell needs to pass the filter; default is None
    1. max_genes=: maximum number of expressed genes a cell may have to pass the filter; default is None
    1. min_counts=: minimum number of UMI counts a cell needs to pass the filter; default is None
    1. max_counts=: maximum number of UMI counts a cell may have to pass the filter; default is None
    1. inplace=: filter the object in place instead of returning the result; default is True
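To make the min_genes filter concrete, here is a minimal NumPy sketch of the underlying idea. It is not scanpy's implementation; the toy matrix and the threshold are made-up values.

```python
import numpy as np

# Toy count matrix (hypothetical values): rows are cells, columns are genes.
counts = np.array([
    [5, 0, 3, 1],   # 3 expressed genes
    [0, 0, 1, 0],   # 1 expressed gene
    [2, 2, 2, 2],   # 4 expressed genes
])

min_genes = 2  # hypothetical threshold

# A gene counts as "expressed" in a cell when its count is above zero.
genes_per_cell = (counts > 0).sum(axis=1)

# Keep only the cells that reach the threshold.
keep = genes_per_cell >= min_genes
filtered = counts[keep]

print(genes_per_cell.tolist())  # [3, 1, 4]
print(filtered.shape)           # (2, 4)
```

In scanpy the corresponding call would be sc.pp.filter_cells(adata, min_genes=2). The min_counts and max_counts filters work the same way, using the row sums instead of the number of non-zero entries.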

sc.pp.filter_genes()

https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.filter_genes.html#scanpy.pp.filter_genes

Filter genes based on the number of cells expressing them and/or the number of UMI counts

sc.pp.filter_genes(data=, min_cells=, max_cells=, min_counts=, max_counts=, inplace=)

    1. data=: the AnnData object to filter
    1. min_cells=: minimum number of cells in which a gene must be expressed to pass the filter; default is None
    1. max_cells=: maximum number of cells in which a gene may be expressed to pass the filter; default is None
    1. min_counts=: minimum number of UMI counts a gene needs to pass the filter; default is None
    1. max_counts=: maximum number of UMI counts a gene may have to pass the filter; default is None
    1. inplace=: filter the object in place instead of returning the result; default is True

sc.pp.calculate_qc_metrics()

https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.calculate_qc_metrics.html#scanpy.pp.calculate_qc_metrics

Calculate quality-control metrics for cells (and genes)

sc.pp.calculate_qc_metrics(scanpy_object_used, qc_vars=, percent_top=None, log1p=False, inplace=True)

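    1. qc_vars=: names of boolean columns in .var (for example 'mt') that mark the gene sets for which per-cell count percentages are calculated
    1. percent_top=: ranks of top expressed genes for which the cumulative percentage of counts is calculated; for example, 50 gives the percentage of counts in the 50 most expressed genes
    1. log1p=: also compute log1p-transformed versions of the metrics; default is True
    1. inplace=: store the metrics in the AnnData object instead of returning them; default is False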

With inplace=True, the calculated metrics are stored as new columns in .obs (per cell) and .var (per gene)
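The per-cell metrics are simple row-wise summaries of the count matrix. The sketch below reproduces three of them with NumPy on made-up data. The variable names mirror the .obs columns scanpy creates for qc_vars=['mt'], and the is_mt mask is a hypothetical stand-in for a boolean .var column.

```python
import numpy as np

# Toy count matrix (hypothetical values): rows are cells, columns are genes.
counts = np.array([
    [4, 0, 6],
    [1, 1, 0],
    [0, 5, 5],
])

# Hypothetical gene annotation, playing the role of a boolean .var column
# such as 'mt' that is passed through qc_vars.
is_mt = np.array([False, False, True])

total_counts = counts.sum(axis=1)             # UMI counts per cell
n_genes_by_counts = (counts > 0).sum(axis=1)  # expressed genes per cell
total_counts_mt = counts[:, is_mt].sum(axis=1)
pct_counts_mt = 100 * total_counts_mt / total_counts

print(total_counts.tolist())       # [10, 2, 10]
print(n_genes_by_counts.tolist())  # [2, 2, 2]
print(pct_counts_mt.tolist())      # [60.0, 0.0, 50.0]
```

Columns such as total_counts and pct_counts_mt are the ones typically used later as thresholds for cell filtering or as keys for sc.pp.regress_out().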

sc.pp.normalize_total()

https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.normalize_total.html#scanpy.pp.normalize_total

Normalize the expression data (expression matrix) so that every cell has the same total count

sc.pp.normalize_total(adata, target_sum=, inplace=True)

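    1. target_sum=: the total count of each cell after normalization; target_sum=1e6 gives CPM; with target_sum=None, each cell is scaled to the median of the total UMI counts of all cells before normalization; default=None
    1. inplace=: whether to replace the original data with the normalized data; default=True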
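The effect of target_sum can be checked by hand. The function below is a minimal NumPy sketch of total-count normalization, not scanpy's code. The name normalize_total_sketch is hypothetical, and the sketch assumes that no cell has a total count of zero.

```python
import numpy as np

# Toy count matrix (hypothetical values): rows are cells, columns are genes.
counts = np.array([
    [1.0, 3.0],    # total 4
    [10.0, 10.0],  # total 20
    [2.0, 6.0],    # total 8
])

def normalize_total_sketch(x, target_sum=None):
    """Scale every row so that it sums to target_sum.

    With target_sum=None, the median of the row totals is used.
    Assumes that no row sums to zero.
    """
    totals = x.sum(axis=1, keepdims=True)
    if target_sum is None:
        target_sum = np.median(totals)
    return x / totals * target_sum

cpm = normalize_total_sketch(counts, target_sum=1e6)  # counts per million
med = normalize_total_sketch(counts)                  # median total is 8

print(med.tolist())  # [[2.0, 6.0], [4.0, 4.0], [2.0, 6.0]]
```

After normalization the first and third cell have identical profiles, which shows that only the relative composition of each cell is kept.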

sc.pp.log1p()

https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.log1p.html

Apply the log1p transformation, log(1 + x), to the expression data

sc.pp.log1p(data=)
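As a quick illustration of the transformation itself, the NumPy sketch below applies the natural-log version of log1p to made-up normalized values and inverts it again with expm1.

```python
import numpy as np

# Hypothetical normalized expression values.
normalized = np.array([
    [0.0, 1.0, 9.0],
    [99.0, 0.0, 3.0],
])

logged = np.log1p(normalized)  # natural log of (1 + x); zeros stay zero
restored = np.expm1(logged)    # inverse transformation, exp(x) - 1

print(np.allclose(restored, normalized))  # True
```

Because log1p maps 0 to 0, zero entries of the expression matrix stay zero after the transformation.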

sc.pp.scale()

https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.scale.html#scanpy.pp.scale

Scale each gene to unit variance and, with zero_center=True, to zero mean (Z-score)

sc.pp.scale(adata, zero_center=, layer=, max_value=)

    1. zero_center=: shift the mean of each gene to 0; default=True
    1. layer=: matrix to scale; default=None, in which case .X is scaled
    1. max_value=: after scaling, values above this cutoff are clipped to it; default=None (no clipping)
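The following NumPy sketch shows per-gene Z-scoring followed by clipping at max_value. It uses the sample standard deviation (ddof=1) and clips only the upper end, both of which are assumptions of this sketch rather than a statement about scanpy's internals. Genes with zero variance would need special handling and are not covered.

```python
import numpy as np

# Hypothetical log-transformed matrix: rows are cells, columns are genes.
logged = np.array([
    [0.0, 2.0],
    [1.0, 2.0],
    [2.0, 8.0],
])
max_value = 1.0  # hypothetical clipping cutoff

mean = logged.mean(axis=0)
std = logged.std(axis=0, ddof=1)  # assumption: sample standard deviation

scaled = (logged - mean) / std             # zero mean, unit variance per gene
clipped = np.minimum(scaled, max_value)    # values above max_value are set to it

print(scaled[:, 0].tolist())  # [-1.0, 0.0, 1.0]
```

The second gene has one outlying cell whose Z-score is above 1, so only that entry is changed by the clipping step.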

sc.pp.highly_variable_genes()

https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html

Identify highly variable genes based on the expression dispersion (normalized variance); the normalized and logarithmized matrix is used for the calculation

sc.pp.highly_variable_genes(adata, layer=, flavor=, n_top_genes=, min_mean=, max_mean=, min_disp=, max_disp=)

    1. layer=: the expression matrix to use; default is None, in which case AnnData.X is used for the calculation
    1. flavor=: the method used to calculate the expression dispersion; default='seurat'; other choices include 'cell_ranger', 'seurat_v3' and 'seurat_v3_paper'
    1. n_top_genes=: the number of top highly variable genes to keep; when this parameter is set, the mean and dispersion cutoffs below are ignored
    1. min_mean=: the minimum cutoff of mean expression; default=0.0125
    1. max_mean=: the maximum cutoff of mean expression; default=3
    1. min_disp=: the minimum cutoff of normalized dispersion; default=0.5
    1. max_disp=: the maximum cutoff of normalized dispersion; default=inf

After calculation, the results are stored in AnnData.var

Note: flavor='seurat' and flavor='cell_ranger' expect normalized and log1p-transformed data, whereas flavor='seurat_v3' expects raw counts
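The idea behind dispersion-based selection can be sketched in a few lines of NumPy. This is a deliberately simplified version: it ranks genes by the raw variance-to-mean ratio and omits the per-bin normalization of dispersions that the 'seurat' and 'cell_ranger' flavors apply. The data are simulated and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 6

# Simulated expression: every gene has a mean near 5,
# but the last two genes vary much more than the others.
spread = np.array([0.1, 0.1, 0.1, 0.1, 2.0, 3.0])
expr = np.abs(5 + rng.normal(size=(n_cells, n_genes)) * spread)

mean = expr.mean(axis=0)
var = expr.var(axis=0, ddof=1)
dispersion = var / mean  # variance-to-mean ratio per gene

# Keep the n_top_genes genes with the largest dispersion.
n_top_genes = 2
top = np.argsort(dispersion)[::-1][:n_top_genes]
highly_variable = np.zeros(n_genes, dtype=bool)
highly_variable[top] = True

print(highly_variable.tolist())  # [False, False, False, False, True, True]
```

The boolean vector plays the role of the highly_variable column that scanpy writes to AnnData.var.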

sc.pp.regress_out()

https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.regress_out.html#scanpy.pp.regress_out

Regress out unwanted sources of variation

sc.pp.regress_out(adata, layer=, keys=)

    1. layer=: expression matrix to use; default=None, in which case .X is used
    1. keys=: the .obs columns to regress out, for example ['total_counts', 'pct_counts_mt']
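Conceptually, regressing out a covariate means fitting a model of each gene against the covariate and keeping only the residuals. The sketch below does this for one simulated gene with ordinary least squares. It assumes a simple linear model with an intercept and is not meant to mirror scanpy's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 100

# Simulated covariate (stand-in for an .obs column such as 'total_counts').
total_counts = rng.uniform(1000, 5000, size=n_cells)

# One simulated gene: a linear effect of the covariate plus biological signal.
signal = rng.normal(size=n_cells)
gene = 0.001 * total_counts + signal

# Design matrix with an intercept column and the covariate.
design = np.column_stack([np.ones(n_cells), total_counts])

# Fit by least squares and keep the residuals as the corrected expression.
coef, *_ = np.linalg.lstsq(design, gene, rcond=None)
residuals = gene - design @ coef

# The residuals are no longer correlated with the covariate.
print(abs(np.corrcoef(residuals, total_counts)[0, 1]) < 1e-6)  # True
```

For a full matrix the same fit is repeated for every gene, which is why this step can be slow on large data sets.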

sc.pp.pca()

https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.pca.html#scanpy.pp.pca

Principal component analysis (PCA), a linear dimensionality reduction

sc.pp.pca(data=, n_comps=, layer=, svd_solver="arpack")

    1. n_comps=: number of PCs to calculate
    1. layer=: expression matrix used; default=None, .X will be used
    1. svd_solver=: SVD solver to be used

The result is stored in .obsm and can be retrieved with .obsm['X_pca']
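To show what the X_pca coordinates contain, here is a minimal PCA via a full SVD in NumPy. scanpy relies on dedicated solvers such as 'arpack', so this is only a sketch of the mathematics on a random, hypothetical matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # hypothetical scaled matrix: cells x genes
n_comps = 3

# Center every gene, then decompose: centered = U @ diag(S) @ Vt.
centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Cell coordinates on the first n_comps principal components.
X_pca = U[:, :n_comps] * S[:n_comps]

# Variance explained by each of those components.
variance = S[:n_comps] ** 2 / (X.shape[0] - 1)

print(X_pca.shape)  # (50, 3)
```

The rows of Vt[:n_comps] are the gene loadings, and projecting the centered data onto them gives the same coordinates as X_pca.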

sc.pp.neighbors()

https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.neighbors.html#

Compute the k-nearest-neighbor (KNN) graph of cells

This KNN graph is widely used in tSNE/UMAP, clustering and pseudotime trajectory analysis

sc.pp.neighbors(adata, n_pcs=, n_neighbors=, method=, use_rep=)

    1. n_pcs=: the number of top PCs to use; default=None
    1. n_neighbors=: the number of nearest neighbors per cell; default=15
    1. method=: the method used to compute connectivities; default='umap'
    1. use_rep=: the field of .obsm used to compute the KNN graph; default=None, in which case .obsm['X_pca'] is used; for integrated data sets, pass 'X_pca_harmony' so that .obsm['X_pca_harmony'] is used

The results are stored in .uns['neighbors'] and in .obsp (as 'distances' and 'connectivities')

Single data set: sc.pp.neighbors(scanpy_object, n_neighbors=10, n_pcs=40)
Integrated data sets: sc.pp.neighbors(scanpy_object, n_neighbors=10, n_pcs=40, use_rep='X_pca_harmony')
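What a KNN search on PCA coordinates does can be shown with a brute-force NumPy sketch. It uses exact Euclidean distances on random, hypothetical coordinates, whereas scanpy uses faster search strategies and additionally converts the distances into connectivities.

```python
import numpy as np

rng = np.random.default_rng(0)
X_pca = rng.normal(size=(20, 5))  # hypothetical PCA coordinates: cells x PCs
n_neighbors = 3

# Pairwise Euclidean distances between all cells.
diff = X_pca[:, None, :] - X_pca[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Exclude every cell from its own neighbor list in this sketch.
np.fill_diagonal(dist, np.inf)

# Indices and distances of the n_neighbors closest cells, nearest first.
knn_idx = np.argsort(dist, axis=1)[:, :n_neighbors]
knn_dist = np.take_along_axis(dist, knn_idx, axis=1)

print(knn_idx.shape)  # (20, 3)
```

Swapping X_pca for a batch-corrected embedding is all that use_rep='X_pca_harmony' changes: the search is the same, only the coordinates differ.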

sc.pp.scrublet()

https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.scrublet.html

Predict doublets using Scrublet

This step works best on the raw expression matrix, so it should preferably be run before normalization

sc.pp.scrublet(adata, expected_doublet_rate=, batch_key=)

    1. expected_doublet_rate=: the expected fraction of doublets, typically 5%~10%; default=0.05
    1. batch_key=: run the function separately for each batch defined by this .obs column

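After the run, .obs gains a column named doublet_score; the higher the value, the more likely the cell is a doublet. .obs also gains the column predicted_doublet with Scrublet's call for every cell: True for a doublet, False for a singlet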
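One reason raw counts are preferred is that Scrublet builds artificial doublets by adding up the count profiles of randomly chosen pairs of observed cells. The NumPy sketch below illustrates only that simulation step on made-up counts. The scoring step, which compares every observed cell with its simulated and observed neighbors, is not reproduced here, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw UMI counts: 100 cells x 30 genes.
raw_counts = rng.poisson(2.0, size=(100, 30))
n_sim = 200  # hypothetical number of simulated doublets

# Draw random pairs of observed cells (a cell may pair with itself here).
pairs = rng.integers(0, raw_counts.shape[0], size=(n_sim, 2))

# A simulated doublet is the sum of the raw counts of both parent cells.
simulated = raw_counts[pairs[:, 0]] + raw_counts[pairs[:, 1]]

print(simulated.shape)  # (200, 30)
```

Summing two profiles is only meaningful on raw counts, because normalized or log-transformed values no longer add up like the UMIs of two cells captured in one droplet.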
