Title: | An Accurate Simulator for Single-Cell RNA Sequencing Data |
---|---|
Description: | We provide a comprehensive scheme for simulating single-cell RNA sequencing data under various settings of the Biological Coefficient of Variation (BCV), bursting kinetics, differential expression (DE), cell or sample groups, cell trajectories, batch effects and other experimental designs. 'SCRIP' proposes and compares two frameworks, based on Gamma-Poisson and Beta-Gamma-Poisson models, for simulating single-cell RNA sequencing data. A further reference is available in Zappia et al. (2017) <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1305-0>. |
Authors: | Fei Qin [aut, cre, cph] |
Maintainer: | Fei Qin <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2024-11-26 05:09:11 UTC |
Source: | https://github.com/thecailab/scrip |
parameter files estimated from acinar.data using splatEstimate
acinar.data
parameters estimated using splatEstimate
Calculate a smoothed Brownian bridge between two points. A Brownian bridge is a random walk with fixed end points.
bridge(x = 0, y = 0, N = 5, n = 100, sigma.fac = 0.8)
x |
starting value. |
y |
end value. |
N |
number of steps in random walk. |
n |
number of points in smoothed bridge. |
sigma.fac |
multiplier specifying how extreme each step can be. |
Vector of length n following a path from x to y.
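The construction described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `bridge_sketch` and its `sigma` argument are hypothetical, and `stats::spline` is assumed as a stand-in for whatever smoother the package uses.

```r
# Hypothetical helper illustrating a smoothed Brownian bridge (base R only).
bridge_sketch <- function(x = 0, y = 0, N = 5, n = 100, sigma = 1) {
  t <- seq(0, 1, length.out = N)
  # Random walk starting at 0, built from N - 1 Gaussian steps.
  walk <- c(0, cumsum(rnorm(N - 1, sd = sigma * sqrt(1 / (N - 1)))))
  # Remove the linear drift so the path starts at x and ends exactly at y.
  bb <- x + walk - t * (walk[N] - (y - x))
  # Interpolate the N bridge points onto n evenly spaced points.
  spline(t, bb, n = n)$y
}

set.seed(1)
path <- bridge_sketch(x = 1, y = 3, N = 5, n = 100)
range(path[c(1, 100)])  # the end points are pinned at x and y
```

Because the spline interpolates through the bridge points, the first and last values of `path` coincide with `x` and `y`; only the interior of the path is random.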
Move selected items to the start of a list.
bringItemsForward(ll, items)
ll |
list to adjust item order. |
items |
vector of items to bring to the front. Any not in the list will be ignored. |
list with selected items first
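A minimal sketch of this reordering, assuming that `ll` is a named list and that `items` holds element names; `bring_forward_sketch` is a hypothetical name, not a function exported by the package.

```r
# Hypothetical helper: move named elements to the front of a list.
bring_forward_sketch <- function(ll, items) {
  # Silently drop requested items that are not present in the list.
  items <- items[items %in% names(ll)]
  # Selected items first (in the requested order), then everything else.
  c(ll[items], ll[!(names(ll) %in% items)])
}

res <- bring_forward_sketch(list(a = 1, b = 2, c = 3), c("c", "z", "a"))
names(res)  # "c" "a" "b"
```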
Randomly generate multiplication factors from a log-normal distribution.
getLNormFactors(n.facs, sel.prob, neg.prob, fac.loc, fac.scale)
n.facs |
Number of factors to generate. |
sel.prob |
Probability that a factor will be selected to be different from 1. |
neg.prob |
Probability that a selected factor is less than one. |
fac.loc |
Location parameter for the log-normal distribution. |
fac.scale |
Scale factor for the log-normal distribution. |
Vector containing generated factors.
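One way to read the arguments above is sketched below in base R. `lnorm_factors_sketch` is a hypothetical name, and the rule used to force a factor above or below one (inverting the draw) is an assumption made for illustration rather than a statement about the package internals.

```r
# Hypothetical sketch of log-normal multiplication factors.
lnorm_factors_sketch <- function(n.facs, sel.prob, neg.prob, fac.loc, fac.scale) {
  # Each factor is selected (differs from 1) with probability sel.prob.
  selected <- rbinom(n.facs, 1, sel.prob) == 1
  n.sel <- sum(selected)
  facs <- rlnorm(n.sel, meanlog = fac.loc, sdlog = fac.scale)
  # Each selected factor should fall below one with probability neg.prob.
  want.down <- rbinom(n.sel, 1, neg.prob) == 1
  # Invert a draw when it lies on the wrong side of 1.
  flip <- (facs > 1) == want.down
  facs[flip] <- 1 / facs[flip]
  out <- rep(1, n.facs)
  out[selected] <- facs
  out
}

set.seed(5)
facs_down <- lnorm_factors_sketch(100, sel.prob = 1, neg.prob = 1, 0.1, 0.4)
facs_none <- lnorm_factors_sketch(100, sel.prob = 0, neg.prob = 0.5, 0.1, 0.4)
```

With `sel.prob = 0` every factor stays at 1, and with `neg.prob = 1` every selected factor is at most 1.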
Identify the correct order to process paths so that preceding paths have already been simulated.
getPathOrder(path.from)
path.from |
vector giving the path endpoints that each path originates from. |
Vector giving the order to process paths in.
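The ordering problem can be sketched as a simple dependency resolution. The sketch assumes that a value of 0 in `path.from` marks the shared origin (an assumption, as the text above does not state the convention), and `path_order_sketch` is a hypothetical name.

```r
# Hypothetical sketch: order paths so each one follows the path it starts from.
path_order_sketch <- function(path.from) {
  done <- 0L                      # assumption: 0 denotes the shared origin
  todo <- seq_along(path.from)
  while (length(todo) > 0) {
    # A path is ready once the end point it originates from has been simulated.
    ready <- todo[path.from[todo] %in% done]
    if (length(ready) == 0) stop("paths contain a cycle or an unknown origin")
    done <- c(done, ready)
    todo <- setdiff(todo, ready)
  }
  done[-1]                        # drop the origin marker
}

ord <- path_order_sketch(c(2, 0, 1))
ord  # 2 1 3: path 2 starts at the origin, path 1 follows 2, path 3 follows 1
```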
Implementation of the logistic function
logistic(x, x0, k)
x |
value to apply the function to. |
x0 |
midpoint parameter. Gives the centre of the function. |
k |
shape parameter. Gives the slope of the function. |
Value of logistic function with given parameters
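For reference, the standard logistic function with midpoint `x0` and slope `k` can be written in one line. `logistic_sketch` is a hypothetical name used to avoid implying this is the package source.

```r
# Standard logistic curve: 0.5 at x = x0, steeper for larger |k|.
logistic_sketch <- function(x, x0, k) 1 / (1 + exp(-k * (x - x0)))

mid_value <- logistic_sketch(2, x0 = 2, k = 3)        # 0.5 at the midpoint
curve_vals <- logistic_sketch(c(0, 2, 4), x0 = 2, k = 3)
```

A negative `k` reverses the direction of the curve, so the value decreases as `x` grows.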
A data frame with 1000 genes and 80 cells
params_acinar
A data frame with 1000 genes and 80 cells
Simulate a mean for each gene in each cell incorporating batch effect factors.
SCRIPsimBatchCellMeans(sim, params)
sim |
SingleCellExperiment to add batch means to. |
params |
SplatParams object with simulation parameters. |
SingleCellExperiment with simulated batch means.
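As a rough illustration of the idea, the per-cell batch means can be thought of as each gene's mean multiplied by the factor of the batch the cell belongs to. All object names and values below are hypothetical, and the sketch works on plain matrices rather than a SingleCellExperiment.

```r
# Hypothetical toy inputs: 3 genes, 2 batches, 4 cells.
gene_means <- c(2, 10, 0.5)
batch_facs <- cbind(Batch1 = c(1, 1.5, 1), Batch2 = c(0.8, 1, 2))
cell_batch <- c("Batch1", "Batch2", "Batch2", "Batch1")

# Pick the factor column for each cell, then scale every row by its gene mean.
batch_cell_means <- gene_means * batch_facs[, cell_batch]
dim(batch_cell_means)  # 3 genes x 4 cells
```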
Simulate batch effects. Batch effect factors for each batch are produced using getLNormFactors and these are added along with updated means for each batch.
SCRIPsimBatchEffects(sim, params)
sim |
SingleCellExperiment to add batch effects to. |
params |
SplatParams object with simulation parameters. |
SingleCellExperiment with simulated batch effects.
Simulate means for each gene in each cell that are adjusted to follow a mean-variance trend, using Biological Coefficient of Variation (BCV) values taken from an inverse gamma distribution.
SCRIPsimBCVMeans(data, sim, params)
data |
data are used to fit the mean-BCV trend for simulation |
sim |
SingleCellExperiment to add BCV means to. |
params |
SplatParams object with simulation parameters. |
SingleCellExperiment with simulated BCV means.
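The general Gamma-Poisson idea behind this step can be sketched in base R. The sketch assumes a Splat-style trend (a common BCV plus a term that shrinks with the mean); the function described above instead fits the mean-BCV trend from `data`, so the trend formula, `bcv_common` and `bcv_df` here are illustrative assumptions only.

```r
set.seed(9)
base_means <- matrix(c(5, 20, 100), nrow = 3, ncol = 4)  # 3 genes x 4 cells
bcv_common <- 0.1   # hypothetical common dispersion
bcv_df     <- 60    # hypothetical degrees of freedom

# One scaled inverse chi-squared draw per gene (a special case of the
# inverse gamma distribution).
inv_gamma <- bcv_df / rchisq(nrow(base_means), df = bcv_df)

# Assumed trend: BCV decreases as the mean grows, then is perturbed per gene.
bcv <- (bcv_common + 1 / sqrt(base_means)) * sqrt(inv_gamma)

# Gamma draw with expectation equal to the base mean and CV equal to the BCV.
bcv_means <- matrix(
  rgamma(length(base_means), shape = 1 / bcv^2, scale = base_means * bcv^2),
  nrow = nrow(base_means)
)
```

Choosing `shape = 1 / bcv^2` and `scale = mean * bcv^2` keeps the expected value at the base mean while setting the coefficient of variation to `bcv`.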
A logistic function is used to form a relationship between the expression level of a gene and the probability of dropout, giving a probability for each gene in each cell. These probabilities are used in a Bernoulli distribution to decide which counts should be dropped.
SCRIPsimDropout(sim, params)
sim |
SingleCellExperiment to add dropout to. |
params |
SplatParams object with simulation parameters. |
SingleCellExperiment with simulated dropout and observed counts.
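The two steps described above (a logistic dropout probability followed by a Bernoulli draw) can be sketched on plain matrices. The midpoint and shape values, the use of log means as the expression scale, and all object names are assumptions made for illustration.

```r
set.seed(42)
# Toy inputs: 4 genes x 5 cells, with a different mean for each gene.
cell_means  <- matrix(c(0.5, 2, 10, 50), nrow = 4, ncol = 5)
true_counts <- matrix(rpois(length(cell_means), lambda = cell_means), nrow = 4)

drop_mid   <- 1    # hypothetical midpoint on the log-mean scale
drop_shape <- -1   # hypothetical slope; negative so high expression drops less

# Logistic relationship between expression level and dropout probability.
drop_prob <- 1 / (1 + exp(-drop_shape * (log(cell_means) - drop_mid)))

# Bernoulli draw per gene and cell: 1 keeps the count, 0 drops it.
keep <- matrix(rbinom(length(drop_prob), 1, 1 - drop_prob), nrow = nrow(drop_prob))
observed <- true_counts * keep
```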
Simulate gene means from a gamma distribution. Also simulates outlier expression factors. Genes with an outlier factor not equal to 1 are replaced with the median mean expression multiplied by the outlier factor.
SCRIPsimGeneMeans(data, sim, params)
data |
raw dataset. |
sim |
SingleCellExperiment to add gene means to. |
params |
SplatParams object with simulation parameters. |
SingleCellExperiment with simulated gene means.
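A minimal base R sketch of the description above: gamma-distributed gene means, with outlier genes replaced by the median mean times an outlier factor. The shape, rate, outlier probability and log-normal settings are hypothetical values, not package defaults.

```r
set.seed(7)
n_genes <- 200

# Base gene means from a gamma distribution (hypothetical shape and rate).
base_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Outlier factors: 1 for most genes, a log-normal draw for a few.
is_outlier  <- rbinom(n_genes, 1, 0.05) == 1
outlier_fac <- rep(1, n_genes)
outlier_fac[is_outlier] <- rlnorm(sum(is_outlier), meanlog = 4, sdlog = 0.5)

# Outlier genes take the median mean multiplied by their factor.
gene_means <- base_means
gene_means[is_outlier] <- median(base_means) * outlier_fac[is_outlier]
```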
Simulate group cell means
SCRIPsimGroupCellMeans(sim, params)
sim |
SingleCellExperiment to add cell means to. |
params |
SplatParams object with simulation parameters. |
SingleCellExperiment with added cell means.
Simulate differential expression. Differential expression factors for each group are produced using getLNormFactors and these are added along with updated means for each group. For paths, care is taken to make sure they are simulated in the correct order.
SCRIPsimGroupDE(sim, params)
sim |
SingleCellExperiment to add differential expression to. |
params |
SplatParams object with simulation parameters. |
SingleCellExperiment with simulated differential expression.
Simulate expected library sizes. Typically a log-normal distribution is used, but there is also the option to use a normal distribution. When the normal distribution is used, any negative values are set to half the minimum non-zero value.
SCRIPsimLibSizes(sim, params, libsize)
sim |
SingleCellExperiment to add library size to. |
params |
SplatParams object with simulation parameters. |
libsize |
Library size provided directly instead of being estimated from the parameters. |
SingleCellExperiment with simulated library sizes.
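The two sampling options and the handling of negative values can be sketched as follows. `lib_sizes_sketch` and its arguments are hypothetical, and the sketch interprets "minimum non-zero value" as the smallest positive draw.

```r
# Hypothetical sketch of expected library size simulation.
lib_sizes_sketch <- function(n_cells, loc, scale, normal = FALSE) {
  if (!normal) {
    # Default case: log-normal draws are always positive.
    return(rlnorm(n_cells, meanlog = loc, sdlog = scale))
  }
  sizes <- rnorm(n_cells, mean = loc, sd = scale)
  # Replace negative draws with half the smallest positive draw.
  min_pos <- min(sizes[sizes > 0])
  sizes[sizes < 0] <- min_pos / 2
  sizes
}

set.seed(3)
ln_sizes <- lib_sizes_sketch(50, loc = 11, scale = 0.2)
n_sizes  <- lib_sizes_sketch(50, loc = 100, scale = 200, normal = TRUE)
```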
Simulate cell means for paths.
SCRIPsimPathCellMeans(sim, params)
sim |
SingleCellExperiment to add path cell means to. |
params |
SplatParams object with simulation parameters. |
SingleCellExperiment with cell means for path simulation.
Simulate DE factors for paths.
SCRIPsimPathDE(sim, params)
sim |
SingleCellExperiment to add path DE factors to. |
params |
SplatParams object with simulation parameters. |
SingleCellExperiment with DE for path simulation.
Simulate a gene by cell matrix giving the mean expression for each gene in each cell. Cells start with the mean expression for the group they belong to (when simulating groups) or cells are assigned the mean expression from a random position on the appropriate path (when simulating paths). The selected means are adjusted for each cell's expected library size.
SCRIPsimSingleCellMeans(sim, params)
sim |
SingleCellExperiment to add cell means to. |
params |
SplatParams object with simulation parameters. |
SingleCellExperiment with added cell means.
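The final adjustment for expected library size can be illustrated on a small matrix: each cell's means are converted to proportions and rescaled so that they sum to that cell's library size. The object names and values are hypothetical, and proportional rescaling is an assumption about how the adjustment is done.

```r
# Toy inputs: 2 genes x 3 cells and one expected library size per cell.
cell_means <- matrix(c(1, 3, 2, 2, 5, 5), nrow = 2)
lib_sizes  <- c(1000, 2000, 500)

# Proportion of each cell's total expression contributed by each gene.
props <- t(t(cell_means) / colSums(cell_means))

# Rescale each cell (column) to its expected library size.
scaled <- t(t(props) * lib_sizes)
colSums(scaled)  # matches lib_sizes
```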
Simulate a true counts matrix. Counts are simulated from a Poisson distribution where each gene in each cell has its own mean based on the group (or path position), expected library size and BCV.
SCRIPsimTrueCounts(sim, params)
sim |
SingleCellExperiment to add true counts to. |
params |
SplatParams object with simulation parameters. |
SingleCellExperiment with simulated true counts.
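The Poisson step can be sketched in base R on a plain matrix of per-gene, per-cell means. The means below are arbitrary gamma draws chosen only for illustration.

```r
set.seed(11)
# Hypothetical per-gene, per-cell means: 3 genes x 4 cells.
cell_means <- matrix(rgamma(12, shape = 2, rate = 0.5), nrow = 3)

# One Poisson draw per entry, each with its own mean; rpois() consumes the
# matrix in column-major order, which matrix() restores.
true_counts <- matrix(
  rpois(length(cell_means), lambda = cell_means),
  nrow = nrow(cell_means), ncol = ncol(cell_means)
)
```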
Simulate count data for single-cell RNA sequencing using the SCRIP method
SCRIPsimu( data, params, method = "single", base_allcellmeans_SC = NULL, pre.bcv.df = NULL, libsize = NULL, bcv.shrink = 1, Dropout_rate = NULL, mode = "GP-trendedBCV", de.prob = NULL, de.downProb = NULL, de.facLoc = NULL, de.facScale = NULL, path.skew = NULL, batch.facLoc = NULL, batch.facScale = NULL, path.nSteps = NULL, ... )
data |
data matrix required to fit the mean-BCV trend for simulation |
params |
SplatParams object containing parameters for the simulation |
method |
"single", "groups" or "paths" |
base_allcellmeans_SC |
base mean vector provided to help set up DE analysis |
pre.bcv.df |
BCV.df enables us to change the variation of BCV values |
libsize |
library size can be provided directly |
bcv.shrink |
factor to control the BCV levels |
Dropout_rate |
factor to control the dropout rate directly |
mode |
"GP-commonBCV", "BP-commonBCV", "BP", "BGP-commonBCV" and "BGP-trendedBCV"; the default in the usage above is "GP-trendedBCV" |
de.prob |
the proportion of DE genes |
de.downProb |
the proportion of down-regulated DE genes |
de.facLoc |
DE location factor |
de.facScale |
DE scale factor |
path.skew |
controls how likely cells are to come from the start or end point of the path |
batch.facLoc |
location factor for batch effects |
batch.facScale |
scale factor for batch effects |
path.nSteps |
number of steps between the start point and end point for each path |
... |
Other parameters |
SingleCellExperiment object
data(params_acinar)
data(acinar.data)
sim_trend = SCRIPsimu(data=acinar.data, params=params_acinar, mode="GP-trendedBCV")
Simulate count data for clustering analysis by preserving variably expressed genes with multiple cell types
simu_cluster(expre_data, pheno_data, CTlist, mode, nfeatures, seed = 2021)
expre_data |
data matrix required for simulation |
pheno_data |
phenotype data information |
CTlist |
cell types used for simulation |
mode |
"GP-commonBCV", "BP-commonBCV", "BP", "BGP-commonBCV" and "BGP-trendedBCV" |
nfeatures |
parameter required for the FindVariableFeatures function in the Seurat package |
seed |
seed used for simulation |
simulated read counts data with cell type information
Simulate count data for differential expression analysis using SCRIP
simu_DE( expre_data, params, nGenes = NULL, nDE, ncells = NULL, FC, Dropout_rate = NULL, libsize = NULL, pre.bcv.df = NULL, bcv.shrink = 1, seed = 2021 )
expre_data |
data matrix required for simulation |
params |
SplatParams object containing parameters for the simulation |
nGenes |
number of genes simulated |
nDE |
number of differentially expressed genes simulated |
ncells |
number of cells simulated |
FC |
fold change rate simulated between two groups |
Dropout_rate |
factor to control the dropout rate directly |
libsize |
library size used for simulation |
pre.bcv.df |
BCV.df enables us to change the variation of BCV values |
bcv.shrink |
factor to control the BCV levels |
seed |
seed for simulation |
SummarizedExperiment objects for both groups for DE analysis, and the index of the DE genes
Simulate count data for clustering analysis by preserving variably expressed genes
simu.VEGs( counts.matrix, params = params, base_allcellmeans, mode = "GP-trendedBCV", nCells, nfeatures = 1000 )
counts.matrix |
data matrix required for simulation |
params |
SplatParams object containing parameters for the simulation |
base_allcellmeans |
base cell means specified directly for simulating counts |
mode |
"GP-commonBCV", "BP-commonBCV", "BP", "BGP-commonBCV" and "BGP-trendedBCV"; the default in the usage above is "GP-trendedBCV" |
nCells |
number of cells simulated |
nfeatures |
parameter required for the FindVariableFeatures function in the Seurat package |
simulated read counts data