10  Practical 17: Exploring results (graphs)

10.1 The number of cells in each cell-type biases the null distributions and statistics of the Z-scores method

This study examines Z-scores, which are derived by comparing the Pearson R value of cis-links between ATACseq peaks and nearby genes to their matched trans-link null distribution. In an analysis of PBMC multiomic data, the Pearson R coefficients and the Z-scores differed significantly for many peak-gene links. One example is an ATACseq peak near the NOD2 gene, correlated with monocytes (R = 0.12), but showing no significant link using Z-scores (P = 0.07). However, a significant link was found with SNX20 (P = 0.02). The null distributions for trans-peaks at the NOD2 locus are bimodal, which makes the Z-scores inaccurate. Excluding peaks specific to certain cell types yields a unimodal distribution, which highlights the impact of peak selection on Z-scores.
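The effect of a bimodal null can be illustrated with a minimal sketch. It assumes the Z-score is the cis-link R scaled by the mean and standard deviation of the trans-link R values, and that the P-value is a one-sided upper-tail normal test; the function name `z_score_link` and both null distributions are invented for illustration and are not data from the study. Only the value R = 0.12 is borrowed from the text.

```python
import math
import statistics


def z_score_link(cis_r, trans_rs):
    """Scale a cis-link Pearson R against its trans-link null distribution.

    Returns (z, p), where p is a one-sided upper-tail normal P-value
    (an assumption of this sketch, not a confirmed detail of the method).
    """
    mu = statistics.fmean(trans_rs)
    sd = statistics.stdev(trans_rs)
    z = (cis_r - mu) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


cis_r = 0.12  # R value quoted in the text for the peak near NOD2

# Hypothetical trans-link R values: one null centred on zero, and one with a
# second mode such as a cell-type-specific group of peaks could produce.
unimodal_null = [-0.02, -0.01, 0.0, 0.01, 0.02, -0.015, 0.015, 0.005, -0.005, 0.0]
bimodal_null = [-0.02, -0.01, 0.0, 0.01, 0.02, 0.10, 0.11, 0.12, 0.13, 0.14]

z_uni, p_uni = z_score_link(cis_r, unimodal_null)
z_bi, p_bi = z_score_link(cis_r, bimodal_null)

print(f"unimodal null: z = {z_uni:.2f}, p = {p_uni:.3g}")
print(f"bimodal null:  z = {z_bi:.2f}, p = {p_bi:.3g}")
```

In this toy case the second mode shifts the null mean upward and inflates its standard deviation, so the same R of 0.12 drops from a large Z-score to one below 1 and no longer passes P < 0.05.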

Figure 10.1: The Z-scores method misses candidate regulatory sequences linked to NOD2 expression in peripheral blood mononuclear cells (PBMC). (A) The Z-scores model matches an ATACseq peak for GC content and coverage with ATACseq peaks in trans to create a scaled null distribution, producing Z-scores for each trans-link and the tested peak. (B) ATACseq tracks at the NOD2 locus identified in PBMC. The grey areas (labeled 1, 2, and 3) highlight the three ATACseq peaks most strongly correlated with NOD2 expression using the simple Pearson R method. Peak #1 (chr16-50,684,843–50,685,984) includes an eQTL for NOD2 that is also associated with leprosy and Crohn’s disease by GWAS. The loops highlighted in the “Links” row are identified using the Z-score method (P-value < 0.05); note that there is no significant link between peak #1 (chr16-50,684,843–50,685,984) and NOD2. Loops are drawn from the middle of the ATACseq peaks to the transcription start site of the correlated gene(s). The right column shows (top to bottom) the RNAseq UMAP of cell-type annotations, SNX20 expression density, NOD2 expression density, and chr16-50,684,843–50,685,984 ATACseq accessibility density. The violin plots represent NOD2 expression levels in each cell-type. Leblanc & Lettre, et al. 2023. Scientific Reports.
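The matching step in panel (A) can be sketched as a nearest-neighbour search. This is a simplification chosen for illustration, not the published procedure: it assumes each peak is described by a GC fraction and a coverage value on comparable 0 to 1 scales, and it picks the peaks on other chromosomes that lie closest to the query peak in that two-dimensional space. The function `matched_trans_peaks`, the dictionary keys, and all peak values are hypothetical.

```python
import math


def matched_trans_peaks(query, candidates, n=2):
    """Return the n peaks on other chromosomes closest to `query` in
    (GC content, coverage) space. Nearest-neighbour matching is an
    assumption of this sketch."""
    trans = [p for p in candidates if p["chrom"] != query["chrom"]]

    def distance(peak):
        return math.hypot(peak["gc"] - query["gc"],
                          peak["coverage"] - query["coverage"])

    return sorted(trans, key=distance)[:n]


# Hypothetical peaks; values are invented for illustration.
query = {"name": "q", "chrom": "chr16", "gc": 0.50, "coverage": 0.30}
candidates = [
    {"name": "a", "chrom": "chr1", "gc": 0.51, "coverage": 0.31},
    {"name": "b", "chrom": "chr2", "gc": 0.80, "coverage": 0.90},
    {"name": "c", "chrom": "chr16", "gc": 0.50, "coverage": 0.30},  # cis, excluded
    {"name": "d", "chrom": "chr3", "gc": 0.48, "coverage": 0.33},
    {"name": "e", "chrom": "chr5", "gc": 0.20, "coverage": 0.05},
]

matched = matched_trans_peaks(query, candidates, n=2)
print([p["name"] for p in matched])
```

The correlations between the gene and the matched trans peaks would then form the null distribution against which the tested peak is scaled.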

10.2 Single cell lineage determination

Figure 10.2: Lineage determination of single cells from skin, kidney, and peripheral blood mononuclear cells (PBMCs). (A) Clustering of cells (n = 899) by t-distributed stochastic neighbor embedding (t-SNE). Cells are colored by tissue of origin: skin (blue), kidney (red), and PBMCs (yellow). (B) Six distinct clusters generated by t-SNE plotting. (C) Differentially expressed genes across the 6 cell clusters. In this heat map, rows correspond to individual genes found to be selectively upregulated in individual clusters (P < 0.01). (D) Violin plots demonstrating expression of lineage markers that indicate the identity of the clusters generated by t-SNE plotting. Der, et al. 2017. JCI Insight.

10.3 Single-cell chromatin accessibility reveals principles of regulatory variation

Figure 10.3: Structured cis-variability across single epigenomes. a, Per-cell deviations of expected fragments across a region within chromosome 1. For display, only large deviation cells are shown (n = 186 cells). b, Pearson correlation coefficient representing chromosome compartment signal of interaction frequency from a chromatin conformation capture assay (left, analysis carried out on data from ref. 27) or doubly correlated normalized deviations of scATAC-seq (right) from chromosome 1. Data in white represent regions masked because they are highly repetitive. c, Permuted cis-correlation map for chromosome 1 (analysed identically to b). d, Box highlights a representative region depicting long-range covariability. Buenrostro, et al. 2015. Nature.

11 CITE-seq and scATAC-seq

To learn more about how the antibody barcode matrix is computationally generated from the sequencing data, please visit CITE-seq-Count. To learn more about CITE-seq and feature barcoding, please visit the CITE-seq site.

11.1 References