Performs the Wilcoxon rank sum test to identify differentially expressed genes between two groups of cells.
calculate_markers(
expression_matrix,
cells1,
cells2,
logfc_threshold = 0,
min_pct_threshold = 0.1,
avg_expr_threshold_group1 = 0,
min_diff_pct_threshold = -Inf,
rank_matrix = NULL,
feature_names = NULL,
used_slot = "data",
norm_method = "LogNormalize",
pseudocount_use = 1,
base = 2,
adjust_pvals = TRUE,
check_cells_set_diff = TRUE
)
expression_matrix: A matrix of gene expression values, with genes in rows and
cells in columns.
cells1: A vector of cell indices for the first group of cells.
cells2: A vector of cell indices for the second group of cells.
logfc_threshold: The minimum absolute log fold change required to consider a
gene as differentially expressed. Defaults to 0, meaning all genes are
taken into consideration.
min_pct_threshold: The minimum fraction of cells expressing a gene in each
cell population required to consider the gene as differentially expressed.
Increasing this value speeds up the function. Defaults to 0.1.
avg_expr_threshold_group1: The minimum average expression that a gene
should have in the first group of cells to be considered differentially
expressed. Defaults to 0.
min_diff_pct_threshold: The minimum difference in the fraction of cells
expressing a gene between the two cell populations required to consider the
gene as differentially expressed. Defaults to -Inf.
rank_matrix: A matrix in which the cells are ranked by their expression
levels for each gene. Defaults to NULL, in which case the function
calculates the rank matrix itself. We recommend calculating the rank matrix
beforehand and passing it to the function to speed up the computation.
feature_names: A vector of gene names. Defaults to NULL, in which case the
function uses the row names of the expression matrix as gene names.
used_slot: Indicates which kind of values the expression matrix holds (for
example, whether it was scaled). The value of this parameter determines how
the fold change is calculated. If "data", the expression values are
exponentiated before being averaged within each cell group, and the log fold
change is the logarithm of the ratio between the two group averages. If
"scale.data", the fold change is computed directly from the average
expression values of the two cell groups, without taking a logarithm. Any
other value makes the function compute the log fold change as the logarithm
of the ratio between the two group averages, without exponentiating the
values first. Defaults to "data".
norm_method: The normalization method used to normalize the expression
matrix. It affects how the average expression of the genes is calculated
when used_slot = "data": if "LogNormalize", the log fold change is
calculated as described for the used_slot parameter; otherwise, it is
calculated as the logarithm of the ratio between the two group averages,
without exponentiating the values first. Defaults to "LogNormalize".
pseudocount_use: The pseudocount added to the expression values when
calculating the average expression of the genes, to avoid a zero
denominator. Defaults to 1.
base: The base of the logarithm. Defaults to 2.
adjust_pvals: A logical value indicating whether to adjust the p-values
for multiple testing using the Bonferroni method. Defaults to TRUE.
check_cells_set_diff: A logical value indicating whether to check that the
two cell groups are disjoint. Defaults to TRUE.
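The fold-change rules described for used_slot and norm_method can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's source: the helper log_fold_change is hypothetical, values in the "data" slot with "LogNormalize" are assumed to be log1p-transformed (as in Seurat's LogNormalize), so they are undone with expm1 before averaging, and for "scale.data" the fold change is assumed to be the difference of the two group means.

```r
# Hypothetical helper (not part of the package) sketching the fold-change
# rules described for `used_slot` and `norm_method`.
# Assumptions: "data" + "LogNormalize" values are log1p-transformed, and for
# "scale.data" the fold change is the difference of the group means.
log_fold_change <- function(expr, cells1, cells2,
                            used_slot = "data",
                            norm_method = "LogNormalize",
                            pseudocount_use = 1,
                            base = 2) {
  if (used_slot == "data" && norm_method == "LogNormalize") {
    # undo the log transform, average, add the pseudocount, take the log again
    avg_fun <- function(m) log(rowMeans(expm1(m)) + pseudocount_use, base = base)
  } else if (used_slot == "scale.data") {
    # scaled values are averaged as they are, without any logarithm
    avg_fun <- rowMeans
  } else {
    # average the values as they are, add the pseudocount, take the log
    avg_fun <- function(m) log(rowMeans(m) + pseudocount_use, base = base)
  }
  # in the log-based branches, this difference of logs is the log of the
  # ratio between the two group averages
  avg_fun(expr[, cells1, drop = FALSE]) - avg_fun(expr[, cells2, drop = FALSE])
}

# Same artificial matrix as in the example: cells 1-100 in [0, 1],
# cells 101-200 in [3, 4].
set.seed(2024)
expr_matrix <- matrix(
  c(runif(100 * 50), runif(100 * 50, min = 3, max = 4)),
  ncol = 200, byrow = FALSE
)
fc <- log_fold_change(expr_matrix, cells1 = 101:200, cells2 = 1:100)
summary(fc)
```

With the default arguments the values land around 4.3, the same range as the avg_log2FC column in the example output.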
A data frame containing the following columns:
gene: The gene name.
avg_log2FC: The average log fold change between the two cell groups.
p_val: The p-value of the Wilcoxon rank sum test.
p_val_adj: The adjusted p-value of the Wilcoxon rank sum test.
pct.1: The fraction of cells expressing the gene in the first cell group.
pct.2: The fraction of cells expressing the gene in the second cell group.
avg_expr_group1: The average expression of the gene in the first cell group.
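To make the returned columns concrete, the sketch below computes most of them for every gene using base R only. It is a hypothetical reference (the helper name marker_stats is made up), assuming that a cell expresses a gene when its value is greater than 0, that avg_expr_group1 is the plain mean over the first group, and that p-values come from stats::wilcox.test with its default settings. The fold change and the filtering thresholds are left out.

```r
# Hypothetical base-R sketch of the statistics behind the returned columns.
# Not the package's implementation. Assumptions: "expressing" means a value
# greater than 0, avg_expr_group1 is the plain mean over cells1, and p-values
# use stats::wilcox.test defaults. avg_log2FC and all filters are omitted.
marker_stats <- function(expr, cells1, cells2) {
  g1 <- expr[, cells1, drop = FALSE]
  g2 <- expr[, cells2, drop = FALSE]
  # one two-sided Wilcoxon rank sum test per gene (row)
  p_val <- vapply(seq_len(nrow(expr)), function(i) {
    wilcox.test(g1[i, ], g2[i, ])$p.value
  }, numeric(1))
  data.frame(
    gene = rownames(expr),
    p_val = p_val,
    p_val_adj = p.adjust(p_val, method = "bonferroni"),
    pct.1 = rowMeans(g1 > 0),  # fraction of group-1 cells with value > 0
    pct.2 = rowMeans(g2 > 0),  # fraction of group-2 cells with value > 0
    avg_expr_group1 = rowMeans(g1),
    row.names = rownames(expr)
  )
}

# Artificial matrix from the example below.
set.seed(2024)
expr_matrix <- matrix(
  c(runif(100 * 50), runif(100 * 50, min = 3, max = 4)),
  ncol = 200, byrow = FALSE
)
rownames(expr_matrix) <- paste("feature", 1:50)
res <- marker_stats(expr_matrix, cells1 = 101:200, cells2 = 1:100)
head(res)
```

On this artificial matrix pct.1 and pct.2 are 1 for every gene, because every value is strictly positive.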
set.seed(2024)
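# create an artificial expression matrix with 50 features and 200 cells:
# cells 1-100 take values in [0, 1] and cells 101-200 take values in [3, 4]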
expr_matrix <- matrix(
c(runif(100 * 50), runif(100 * 50, min = 3, max = 4)),
ncol = 200, byrow = FALSE
)
colnames(expr_matrix) <- as.character(1:200)
rownames(expr_matrix) <- paste("feature", 1:50)
calculate_markers(
expression_matrix = expr_matrix,
cells1 = 101:200,
cells2 = 1:100
)
#> gene avg_log2FC pct.1 pct.2 p_val p_val_adj
#> feature 1 feature 1 4.239524 1 1 2.562144e-34 1.281072e-32
#> feature 2 feature 2 4.352219 1 1 2.562144e-34 1.281072e-32
#> feature 3 feature 3 4.175922 1 1 2.562144e-34 1.281072e-32
#> feature 4 feature 4 4.378631 1 1 2.562144e-34 1.281072e-32
#> feature 5 feature 5 4.463490 1 1 2.562144e-34 1.281072e-32
#> feature 6 feature 6 4.331126 1 1 2.562144e-34 1.281072e-32
#> feature 7 feature 7 4.302631 1 1 2.562144e-34 1.281072e-32
#> feature 8 feature 8 4.358705 1 1 2.562144e-34 1.281072e-32
#> feature 9 feature 9 4.330728 1 1 2.562144e-34 1.281072e-32
#> feature 10 feature 10 4.274740 1 1 2.562144e-34 1.281072e-32
#> feature 11 feature 11 4.366064 1 1 2.562144e-34 1.281072e-32
#> feature 12 feature 12 4.297543 1 1 2.562144e-34 1.281072e-32
#> feature 13 feature 13 4.268649 1 1 2.562144e-34 1.281072e-32
#> feature 14 feature 14 4.334088 1 1 2.562144e-34 1.281072e-32
#> feature 15 feature 15 4.314004 1 1 2.562144e-34 1.281072e-32
#> feature 16 feature 16 4.333369 1 1 2.562144e-34 1.281072e-32
#> feature 17 feature 17 4.320455 1 1 2.562144e-34 1.281072e-32
#> feature 18 feature 18 4.344738 1 1 2.562144e-34 1.281072e-32
#> feature 19 feature 19 4.335349 1 1 2.562144e-34 1.281072e-32
#> feature 20 feature 20 4.228360 1 1 2.562144e-34 1.281072e-32
#> feature 21 feature 21 4.223689 1 1 2.562144e-34 1.281072e-32
#> feature 22 feature 22 4.299128 1 1 2.562144e-34 1.281072e-32
#> feature 23 feature 23 4.308610 1 1 2.562144e-34 1.281072e-32
#> feature 24 feature 24 4.392178 1 1 2.562144e-34 1.281072e-32
#> feature 25 feature 25 4.332750 1 1 2.562144e-34 1.281072e-32
#> feature 26 feature 26 4.263657 1 1 2.562144e-34 1.281072e-32
#> feature 27 feature 27 4.296146 1 1 2.562144e-34 1.281072e-32
#> feature 28 feature 28 4.282556 1 1 2.562144e-34 1.281072e-32
#> feature 29 feature 29 4.324883 1 1 2.562144e-34 1.281072e-32
#> feature 30 feature 30 4.317534 1 1 2.562144e-34 1.281072e-32
#> feature 31 feature 31 4.374629 1 1 2.562144e-34 1.281072e-32
#> feature 32 feature 32 4.337736 1 1 2.562144e-34 1.281072e-32
#> feature 33 feature 33 4.256014 1 1 2.562144e-34 1.281072e-32
#> feature 34 feature 34 4.282877 1 1 2.562144e-34 1.281072e-32
#> feature 35 feature 35 4.312373 1 1 2.562144e-34 1.281072e-32
#> feature 36 feature 36 4.348016 1 1 2.562144e-34 1.281072e-32
#> feature 37 feature 37 4.358399 1 1 2.562144e-34 1.281072e-32
#> feature 38 feature 38 4.309778 1 1 2.562144e-34 1.281072e-32
#> feature 39 feature 39 4.348881 1 1 2.562144e-34 1.281072e-32
#> feature 40 feature 40 4.378765 1 1 2.562144e-34 1.281072e-32
#> feature 41 feature 41 4.226791 1 1 2.562144e-34 1.281072e-32
#> feature 42 feature 42 4.425555 1 1 2.562144e-34 1.281072e-32
#> feature 43 feature 43 4.451202 1 1 2.562144e-34 1.281072e-32
#> feature 44 feature 44 4.374020 1 1 2.562144e-34 1.281072e-32
#> feature 45 feature 45 4.337123 1 1 2.562144e-34 1.281072e-32
#> feature 46 feature 46 4.285579 1 1 2.562144e-34 1.281072e-32
#> feature 47 feature 47 4.344734 1 1 2.562144e-34 1.281072e-32
#> feature 48 feature 48 4.379857 1 1 2.562144e-34 1.281072e-32
#> feature 49 feature 49 4.338409 1 1 2.562144e-34 1.281072e-32
#> feature 50 feature 50 4.311828 1 1 2.562144e-34 1.281072e-32
#> avg_expr_group1
#> feature 1 3.497559
#> feature 2 3.496166
#> feature 3 3.445801
#> feature 4 3.504912
#> feature 5 3.577405
#> feature 6 3.525528
#> feature 7 3.485442
#> feature 8 3.506795
#> feature 9 3.476262
#> feature 10 3.484119
#> feature 11 3.498119
#> feature 12 3.493887
#> feature 13 3.471559
#> feature 14 3.498410
#> feature 15 3.511520
#> feature 16 3.537302
#> feature 17 3.482097
#> feature 18 3.507179
#> feature 19 3.493560
#> feature 20 3.431861
#> feature 21 3.465087
#> feature 22 3.477709
#> feature 23 3.487408
#> feature 24 3.527158
#> feature 25 3.513766
#> feature 26 3.469997
#> feature 27 3.462232
#> feature 28 3.501975
#> feature 29 3.491901
#> feature 30 3.515717
#> feature 31 3.466661
#> feature 32 3.534487
#> feature 33 3.483160
#> feature 34 3.481944
#> feature 35 3.504899
#> feature 36 3.521041
#> feature 37 3.508432
#> feature 38 3.471637
#> feature 39 3.521095
#> feature 40 3.505936
#> feature 41 3.480344
#> feature 42 3.528489
#> feature 43 3.516821
#> feature 44 3.487346
#> feature 45 3.502275
#> feature 46 3.484552
#> feature 47 3.528281
#> feature 48 3.517302
#> feature 49 3.513613
#> feature 50 3.482948
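In the example output all 50 genes share the same p_val and p_val_adj. The sketch below, which rebuilds the example matrix, shows why: after ranking the cells within each gene once, the rank-sum statistic of every gene reaches its maximum n1 * n2 = 10000, because every cell in cells1 has a higher value than every cell in cells2. It uses the two-sided normal approximation with continuity correction (the one stats::wilcox.test applies to samples of this size) and multiplies by the 50 tests for the Bonferroni adjustment, so its results should match the p_val and p_val_adj columns above. The layout of the ranks matrix is an assumption and is not necessarily what the rank_matrix argument expects.

```r
# Rebuild the artificial matrix from the example above.
set.seed(2024)
expr_matrix <- matrix(
  c(runif(100 * 50), runif(100 * 50, min = 3, max = 4)),
  ncol = 200, byrow = FALSE
)
cells1 <- 101:200
cells2 <- 1:100
n1 <- length(cells1)
n2 <- length(cells2)

# Rank the cells within each gene once (genes in rows, cells in columns).
# Illustrative layout only; it may differ from what `rank_matrix` expects.
ranks <- t(apply(expr_matrix, 1, rank))

# Rank-sum statistic for all genes at once. This shortcut is valid here
# because cells1 and cells2 together are exactly the ranked columns.
w <- rowSums(ranks[, cells1]) - n1 * (n1 + 1) / 2

# Two-sided normal approximation with continuity correction (no ties here).
mu <- n1 * n2 / 2
sigma <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z <- (w - mu - 0.5 * sign(w - mu)) / sigma
p_val <- 2 * pnorm(-abs(z))

# Bonferroni: multiply by the number of genes tested, capped at 1.
p_val_adj <- pmin(1, p_val * nrow(expr_matrix))

unique(w)  # 10000 = n1 * n2 for every gene
```

Because the statistic is identical for every gene, so are the p-values, and the adjusted values are exactly 50 times larger.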