API: Preprocessing
sc.pp.filter_cells()
https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.filter_cells.html#scanpy.pp.filter_cells
Filter cells based on the number of expressed genes and/or the number of UMIs
sc.pp.filter_cells(data=, min_genes=, max_genes=, min_counts=, max_counts=, inplace=)
data=: the AnnData object used
min_genes=: minimum number of expressed genes required to pass the filter; default is "None"
max_genes=: maximum number of expressed genes allowed to pass the filter; default is "None"
min_counts=: minimum number of UMIs required to pass the filter; default is "None"
max_counts=: maximum number of UMIs allowed to pass the filter; default is "None"
inplace=: whether to modify the data in place rather than returning the result; default is "True"
sc.pp.filter_genes()
https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.filter_genes.html#scanpy.pp.filter_genes
Filter genes based on the number of cells expressing them and/or the number of UMIs
sc.pp.filter_genes(data=, min_cells=, max_cells=, min_counts=, max_counts=, inplace=)
data=: the AnnData object used
min_cells=: minimum number of cells expressing the gene required to pass the filter; default is "None"
max_cells=: maximum number of cells expressing the gene allowed to pass the filter; default is "None"
min_counts=: minimum number of UMIs required to pass the filter; default is "None"
max_counts=: maximum number of UMIs allowed to pass the filter; default is "None"
inplace=: whether to modify the data in place rather than returning the result; default is "True"
sc.pp.calculate_qc_metrics()
calculate quality-control metrics for cells and genes
sc.pp.calculate_qc_metrics(adata, qc_vars=, percent_top=, log1p=, inplace=)
qc_vars=: the .var column name(s) used to compute per-category metrics (e.g. 'mt' for mitochondrial genes)
percent_top=: the numbers of top-ranked genes for which to compute the cumulative expression percentage
log1p=: whether to also compute log1p-transformed metrics; default is True
inplace=: whether to store the metrics in the AnnData object instead of returning them; default is False
The computed metrics are stored as new columns in .obs (per cell) and .var (per gene)
sc.pp.normalize_total()
normalize the expression data (expression matrix)
sc.pp.normalize_total(adata, target_sum=, inplace=True)
target_sum=: the total count each cell is scaled to after normalization; target_sum=1e6 yields CPM; with target_sum=None, each cell is scaled so that its total equals the median of per-cell UMI counts before normalization; default=None
inplace=: whether to replace the original data with the normalized data; default=True
sc.pp.log1p()
https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.log1p.html
apply a log1p transformation (log(1 + x)) to the expression data
sc.pp.log1p(data=)
sc.pp.scale()
https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.scale.html#scanpy.pp.scale
scale the data to Z-scores (zero mean, unit variance per gene)
sc.pp.scale(adata, zero_center=, layer=, max_value=)
zero_center=: whether to shift the mean of each gene to 0; default=True
layer=: the matrix to scale; default=None, in which case .X will be scaled
max_value=: after scaling, values above this cutoff will be clipped; default=None (no clipping)
sc.pp.highly_variable_genes()
https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html
identify highly variable genes based on expression dispersion (normalized variance); a normalized and logarithmized matrix is expected for the calculation
sc.pp.highly_variable_genes(adata, layer=, flavor=, n_top_genes=, min_mean=, max_mean=, min_disp=, max_disp=)
layer=: the expression matrix to use; default is None, in which case AnnData.X is used for the calculation
flavor=: the method used to compute expression dispersion; default='seurat'; other choices include 'cell_ranger', 'seurat_v3' and 'seurat_v3_paper'
n_top_genes=: the number of top highly variable genes to select; when this parameter is set, all the following cutoff parameters are ignored
min_mean=: the minimum cutoff of mean expression; default=0.0125
max_mean=: the maximum cutoff of mean expression; default=3
min_disp=: the minimum cutoff of normalized dispersion; default=0.5
max_disp=: the maximum cutoff of normalized dispersion; default=inf
After calculation, the results are stored in AnnData.var
Note: for flavor='seurat'/'cell_ranger', normalized and log1p-transformed data are required
sc.pp.regress_out()
https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.regress_out.html#scanpy.pp.regress_out
Regress out unwanted sources of variation
sc.pp.regress_out(adata, layer=, keys=)
layer=: the expression matrix to use; default=None, in which case .X will be used
keys=: the .obs features to regress out; e.g. ['total_counts', 'pct_counts_mt']
sc.pp.pca()
https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.pca.html#scanpy.pp.pca
PCA, linear dimensional reduction
sc.pp.pca(data=, n_comps=, layer=, svd_solver=)
n_comps=: the number of PCs to compute
layer=: the expression matrix to use; default=None, in which case .X will be used
svd_solver=: the SVD solver to use
The results are stored in .obsm; use .obsm['X_pca'] to extract them
sc.pp.neighbors()
https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.neighbors.html#
compute the K-nearest-neighbor (KNN) graph
This KNN graph is widely used in tSNE/UMAP, clustering and pseudotime trajectory inference
sc.pp.neighbors(adata, n_pcs=, n_neighbors=, method=, use_rep=)
n_pcs=: the number of top PCs to use; default=None
n_neighbors=: the size of the local neighborhood; default=15
method=: the method used to compute connectivities; default='umap'
use_rep=: the field of .obsm used to compute the KNN graph; default=None, in which case .obsm['X_pca'] is used; for integrated data sets, .obsm['X_pca_harmony'] should be used by assigning use_rep='X_pca_harmony'
The results are stored in .uns and .obsp
Single data set: sc.pp.neighbors(scanpy_object, n_neighbors=10, n_pcs=40)
Integrated data sets: sc.pp.neighbors(scanpy_object, n_neighbors=10, n_pcs=40, use_rep='X_pca_harmony')
sc.pp.scrublet()
https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.scrublet.html
Predict doublets using Scrublet
This step is best run on the raw expression matrix, so it should be performed before normalization
sc.pp.scrublet(adata, expected_doublet_rate=, batch_key=)
expected_doublet_rate=: the expected proportion of doublets, typically 5%~10%; default=0.06
batch_key=: the .obs column defining batches; the function is run separately for each batch
After the run, .obs gains a doublet_score column (the higher the score, the more likely the cell is a doublet), along with Scrublet's per-cell doublet call in the predicted_doublet column (True = doublet, False = singlet)


