Polyase API
Submodules
polyase.allele_utils module
Utilities for calculating and analyzing allelic ratios.
- class polyase.allele_utils.AlleleRatioCalculator(adata=None)
Bases: object
Class for calculating and managing allelic ratios in AnnData objects.
- calculate_multiple_ratios(counts_layers=None)
Calculate allelic ratios for multiple count layers.
Parameters:
- counts_layers : list of str, optional
List of layer names to calculate ratios for. If None, calculates for all layers with ‘counts’ in their name.
Returns:
- adata : AnnData
Updated AnnData object with allelic ratio layers added
- calculate_ratios(counts_layer='unique_counts', output_suffix=None)
Calculate allelic ratios for each transcript grouped by Synt_id, computing ratios independently for each column (gene/feature).
Parameters:
- counts_layer : str, optional (default: ‘unique_counts’)
Layer containing counts to use for ratio calculations
- output_suffix : str, optional
Custom suffix for output layer name. If None, uses counts_layer name
Returns:
- adata : AnnData
Updated AnnData object with allelic ratio layer added
- get_ratios_for_synt_id(synt_id, ratio_layer='allelic_ratio_unique_counts')
Get allelic ratios for a specific Synt_id.
Parameters:
- synt_id : int or str
The Synt_id to get ratios for
- ratio_layer : str, optional
Name of the layer containing the ratio data
Returns:
- ratios : numpy array
Array of ratio values for the specified Synt_id
- polyase.allele_utils.calculate_allelic_ratios(adata, counts_layer='unique_counts')
Calculate allelic ratios for each transcript grouped by Synt_id.
Parameters:
- adata : AnnData
AnnData object containing transcript data
- counts_layer : str, optional (default: ‘unique_counts’)
Layer containing counts to use for ratio calculations
Returns:
- adata : AnnData
Updated AnnData object with ‘allelic_ratio’ layer added
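The grouping logic described above can be illustrated without the library. The following is a minimal sketch, not polyase's implementation: it assumes rows are samples, columns are transcripts, and each ratio is a transcript's count divided by the summed count of all transcripts sharing its Synt_id. The helper name `allelic_ratios` and the zero-count handling are assumptions made for illustration.

```python
from collections import defaultdict


def allelic_ratios(counts, synt_ids):
    """Hypothetical helper: per-sample allelic ratios within each Synt_id group.

    counts   -- list of rows (one per sample), each a list of transcript counts
    synt_ids -- one Synt_id per transcript (column)
    """
    # Collect the column indices that belong to each Synt_id.
    groups = defaultdict(list)
    for col, sid in enumerate(synt_ids):
        groups[sid].append(col)

    ratios = [[0.0] * len(synt_ids) for _ in counts]
    for row_idx, row in enumerate(counts):
        for cols in groups.values():
            total = sum(row[c] for c in cols)
            if total == 0:
                continue  # assumption: ratio stays 0 when the group has no counts
            for c in cols:
                ratios[row_idx][c] = row[c] / total
    return ratios


counts = [[30, 10, 5, 5], [0, 0, 8, 2]]
synt_ids = ["S1", "S1", "S2", "S2"]
ratios = allelic_ratios(counts, synt_ids)
print(ratios[0])  # [0.75, 0.25, 0.5, 0.5]
```

Within each sample, the ratios of a Synt_id group with nonzero counts sum to 1.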
polyase.ase_data_loader module
- polyase.ase_data_loader.aggregate_transcripts_to_genes(adata_tx)
Aggregate transcript-level AnnData to gene-level AnnData.
- Parameters:
adata_tx (anndata.AnnData) – Transcript-level AnnData object with ‘gene_id’ in adata_tx.var. Must contain layers: ‘unique_counts’, ‘ambiguous_counts’, ‘em_counts’, ‘unique_cpm’,
- Returns:
Gene-level AnnData object with aggregated counts and CPM layers.
- Return type:
anndata.AnnData
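As a rough illustration of transcript-to-gene aggregation, the sketch below sums transcript columns that share a gene_id. It assumes aggregation is a plain sum per gene; the function name `aggregate_to_genes` is hypothetical and the code does not reflect how polyase handles AnnData internals.

```python
def aggregate_to_genes(tx_counts, tx_gene_ids):
    """Hypothetical helper: sum transcript columns that share a gene_id.

    tx_counts   -- list of rows (one per sample) of transcript-level values
    tx_gene_ids -- gene_id for each transcript (column)
    """
    # Preserve first-seen gene order so output columns are deterministic.
    gene_order = list(dict.fromkeys(tx_gene_ids))
    gene_index = {gene: i for i, gene in enumerate(gene_order)}

    gene_counts = [[0] * len(gene_order) for _ in tx_counts]
    for r, row in enumerate(tx_counts):
        for value, gene in zip(row, tx_gene_ids):
            gene_counts[r][gene_index[gene]] += value
    return gene_order, gene_counts


genes, gene_counts = aggregate_to_genes(
    [[1, 2, 3], [4, 5, 6]],
    ["g1", "g1", "g2"],
)
print(genes)        # ['g1', 'g2']
print(gene_counts)  # [[3, 3], [9, 6]]
```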
- polyase.ase_data_loader.load_ase_data(var_obs_file, isoform_counts_dir, tx_to_gene_file, sample_info=None, counts_file=None, fillna=0, calculate_cpm=True, quant_dir=None, n_jobs=4)
Load allele-specific expression data from long-read RNAseq at isoform level.
- Parameters:
var_obs_file (str) – Path to the variant observations file.
isoform_counts_dir (str) – Directory containing the isoform counts files.
tx_to_gene_file (str) – Path to TSV file mapping transcript_id to gene_id.
sample_info (dict, optional) – Dictionary mapping sample IDs to their conditions.
counts_file (str, optional) – Path to additional counts file (salmon merged transcript counts).
fillna (int or float, optional) – Value to fill NA values with.
calculate_cpm (bool, optional) – Whether to calculate CPM (Counts Per Million) from EM counts, by default True.
quant_dir (str, optional) – Directory containing quantification files with EM counts.
n_jobs (int, optional) – Number of parallel jobs for loading samples, by default 4.
- Returns:
AnnData object containing the processed isoform-level data with EM counts and CPM layers. Includes all transcripts from expression matrix and tx2gene mapping, with NaN values for var_obs data when genes are not found in var_obs_file.
- Return type:
anndata.AnnData
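For readers unfamiliar with the CPM normalization mentioned under calculate_cpm, here is a minimal sketch of the standard definition (count divided by the sample's total count, times one million). The helper name `counts_to_cpm` and the zero-library handling are assumptions, not polyase code.

```python
def counts_to_cpm(sample_counts):
    """Hypothetical helper: convert one sample's counts to Counts Per Million."""
    library_size = sum(sample_counts)
    if library_size == 0:
        # assumption: an empty library yields all-zero CPM rather than an error
        return [0.0 for _ in sample_counts]
    return [count / library_size * 1_000_000 for count in sample_counts]


cpm = counts_to_cpm([250, 750, 0])
print(cpm)  # [250000.0, 750000.0, 0.0]
```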
polyase.filter module
AnnData Filtering Module
This module provides functions for filtering AnnData objects based on group expression patterns. It supports filtering by group expression levels with various normalization methods and thresholds.
- polyase.filter.filter_low_expressed_genes(adata: AnnData, min_expression: float | Dict[str, float] | Callable[[float], float] = 1.0, library_size_dependent: bool = False, lib_size_normalization: str | None = None, layer: str | None = None, group_col: str | Tuple[str, int] = 'Synt_id', group_source: str = 'var', mode: str = 'any', return_dropped: bool = False, copy: bool = True, filter_axis: Literal[0, 1] = 1, verbose: bool = True) -> AnnData | Tuple[AnnData, List[str]]
Filter an AnnData object to remove groups with low expression across samples.
- Parameters:
adata (AnnData) – AnnData object with group IDs and expression data
min_expression (float, dict, or callable, default=1.0) –
Minimum expression threshold for groups:
float: same threshold applied to all samples/features
dict: {name: threshold} for sample/feature-specific thresholds
callable: function that takes library size and returns threshold (e.g., lambda lib_size: lib_size * 1e-6 for 0.0001% of lib size)
library_size_dependent (bool, default=False) – If True, scale thresholds by library size for each sample
lib_size_normalization (str or None, default=None) –
How to normalize for library size:
‘cpm’: Counts Per Million (divide by lib_size/1e6)
None: No normalization
layer (str or None, default=None) – Layer to use for expression values. If None, use .X
group_col (str or tuple, default='Synt_id') – Column name containing group IDs. For obsm/varm, should be a tuple (key, column_index)
group_source (str, default='var') – Location of the group column in AnnData (‘obs’, ‘var’, ‘obsm’, ‘varm’)
mode (str, default='any') –
‘any’: Keep groups that pass threshold in any sample/feature
‘all’: Keep groups that pass threshold in all samples/features
‘mean’: Keep groups that pass threshold on average across samples/features
return_dropped (bool, default=False) – If True, also return list of dropped group IDs
copy (bool, default=True) – If True, return a copy of the filtered AnnData object. If False, filter the AnnData object in place
filter_axis (int, default=1) –
1: Filter rows (obs) based on group expression across columns (var)
0: Filter columns (var) based on group expression across rows (obs)
verbose (bool, default=True) – Whether to print additional information during filtering
- Returns:
Filtered AnnData object, and optionally a list of dropped group IDs
- Return type:
AnnData or tuple
- Raises:
ValueError – If parameters are invalid or required data is missing
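The three mode values can be illustrated with a small standalone sketch. It assumes a group "passes" when its expression is greater than or equal to the threshold; whether polyase uses a strict or non-strict comparison is not stated here. The names `group_passes` and `filter_groups` are hypothetical.

```python
def group_passes(values, threshold, mode="any"):
    """Hypothetical helper: apply the 'any' / 'all' / 'mean' rule to one group."""
    if mode == "any":
        return any(v >= threshold for v in values)
    if mode == "all":
        return all(v >= threshold for v in values)
    if mode == "mean":
        return sum(values) / len(values) >= threshold
    raise ValueError(f"unknown mode: {mode!r}")


def filter_groups(expression_by_group, threshold=1.0, mode="any"):
    """Split groups into kept ones and a list of dropped group IDs."""
    kept, dropped = {}, []
    for group_id, values in expression_by_group.items():
        if group_passes(values, threshold, mode):
            kept[group_id] = values
        else:
            dropped.append(group_id)
    return kept, dropped


expression = {
    "S1": [0.2, 3.0, 0.1],  # passes in one sample only
    "S2": [0.0, 0.5, 0.4],  # never passes
    "S3": [2.0, 5.0, 1.0],  # passes in every sample
}
kept_any, dropped_any = filter_groups(expression, mode="any")
kept_all, dropped_all = filter_groups(expression, mode="all")
print(sorted(kept_any), dropped_any)  # ['S1', 'S3'] ['S2']
print(sorted(kept_all), dropped_all)  # ['S3'] ['S1', 'S2']
```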
polyase.multimapping module
Utilities for calculating multimapping ratios per syntelog.
- class polyase.multimapping.MultimappingRatioCalculator(adata=None)
Bases: object
Class for calculating multimapping ratios in AnnData objects.
- calculate_ratios(unique_layer='unique_counts', multi_layer='ambiguous_counts')
Calculate multimapping ratios for each transcript grouped by Synt_id.
Parameters:
- unique_layer : str, optional (default: ‘unique_counts’)
Layer containing unique counts to use for ratio calculations
- multi_layer : str, optional (default: ‘ambiguous_counts’)
Layer containing multimapping counts to use for ratio calculations
Returns:
- adata : AnnData
Updated AnnData object with multimapping ratio layer added
- get_ratios_for_synt_id(synt_id, multi_layer='multimapping_ratio')
Get multimapping ratios for a specific Synt_id.
Parameters:
- synt_id : int or str
The Synt_id to get ratios for
- multi_layer : str, optional
Name of the layer containing the ratio data
Returns:
- ratios : numpy array
Array of multimapping ratio values for the specified Synt_id
- polyase.multimapping.calculate_multi_ratios(adata, unique_layer='unique_counts', multi_layer='ambiguous_counts')
Calculate multimapping ratios for each transcript grouped by Synt_id.
Parameters:
- adata : AnnData
AnnData object containing transcript data
- unique_layer : str, optional (default: ‘unique_counts’)
Layer containing unique counts to use for ratio calculations
- multi_layer : str, optional (default: ‘ambiguous_counts’)
Layer containing multimapping counts to use for ratio calculations
Returns:
- adata : AnnData
Updated AnnData object with ‘multimapping_ratio’ layer added
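The exact formula for the multimapping ratio is not given in this reference. The sketch below assumes one plausible definition: the ambiguous (multimapping) counts of a Synt_id group divided by the group's unique plus ambiguous counts. Both the definition and the helper name `multimapping_ratio_by_group` are assumptions for illustration only.

```python
from collections import defaultdict


def multimapping_ratio_by_group(unique, ambiguous, synt_ids):
    """Hypothetical helper: ambiguous / (unique + ambiguous) per Synt_id."""
    unique_sum = defaultdict(float)
    ambiguous_sum = defaultdict(float)
    for u, a, sid in zip(unique, ambiguous, synt_ids):
        unique_sum[sid] += u
        ambiguous_sum[sid] += a

    ratios = {}
    for sid in unique_sum:
        total = unique_sum[sid] + ambiguous_sum[sid]
        # assumption: groups without any reads get a ratio of 0
        ratios[sid] = ambiguous_sum[sid] / total if total > 0 else 0.0
    return ratios


mm_ratios = multimapping_ratio_by_group(
    unique=[90, 10, 5],
    ambiguous=[10, 40, 5],
    synt_ids=["S1", "S1", "S2"],
)
print(mm_ratios["S2"])  # 0.5
```

Under this definition, a ratio near 1 means most reads assigned to the syntelog could not be placed uniquely.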
- polyase.multimapping.calculate_per_allele_ratios(adata, unique_layer='unique_counts', multi_layer='ambiguous_counts', gene_grouping_column='gene_id', inplace=True, count_scaling=True, min_counts_threshold=10, scaling_method='weighted_average')
Calculate multimapping ratios for each individual allele/transcript. For transcripts from the same gene, assigns ratios based on count-weighted calculations.
Parameters:
- adata : AnnData
AnnData object containing transcript data
- unique_layer : str, optional (default: ‘unique_counts’)
Layer containing unique counts to use for ratio calculations
- multi_layer : str, optional (default: ‘ambiguous_counts’)
Layer containing multimapping counts to use for ratio calculations
- gene_grouping_column : str, optional (default: ‘gene_id’)
Column in adata.var to group transcripts by (e.g., ‘Synt_id’, ‘gene_id’)
- inplace : bool, optional (default: True)
Whether to modify the input AnnData object or return a copy
- count_scaling : bool, optional (default: True)
Whether to scale multimapping ratios by count abundance
- min_counts_threshold : int, optional (default: 10)
Minimum total counts threshold; transcripts below this get reduced weight
- scaling_method : str, optional (default: ‘weighted_average’)
Method for combining ratios within genes:
‘weighted_average’: Weight by total counts
‘max_weighted’: Take max ratio but weight by counts
‘abundance_filtered’: Only consider transcripts above threshold
Returns:
- adata : AnnData
Updated AnnData object with per-allele multimapping ratio added to var
polyase.plotting module
- polyase.plotting.convert_pvalue_to_asterisks(pvalue)
Convert p-values to significance asterisks notation.
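The cutoffs used by this function are not listed here. The sketch below shows a widely used convention (0.05, 0.01, 0.001, 0.0001, with "ns" for non-significant values); these thresholds, the "ns" label, and the name `pvalue_to_asterisks` are assumptions and may differ from polyase's behavior.

```python
def pvalue_to_asterisks(pvalue):
    """Hypothetical helper: map a p-value to a conventional significance label."""
    if pvalue <= 0.0001:
        return "****"
    if pvalue <= 0.001:
        return "***"
    if pvalue <= 0.01:
        return "**"
    if pvalue <= 0.05:
        return "*"
    return "ns"  # assumption: non-significant values are labelled "ns"


labels = [pvalue_to_asterisks(p) for p in (0.2, 0.03, 0.005, 0.0005, 0.00005)]
print(labels)  # ['ns', '*', '**', '***', '****']
```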
- polyase.plotting.plot_allele_specific_isoform_structure(results_df, annotation_df, ratio_difference_threshold=0.2, width=1300, height=350, template='simple_white')
Visualize allele-specific isoform structure differences.
Filtering is performed at the syntelog level: if any isoform of a syntelog meets the significance criteria, all isoforms of that syntelog will be plotted.
- Parameters:
results_df (pd.DataFrame) – DataFrame containing allele-specific results with columns:
- Synt_id: Syntelog identifier
- isoform_id: Isoform/transcript identifier
- gene_id: Gene identifier
- haplotype: Haplotype identifier
- reference_haplotype: Reference haplotype identifier
- sample: Sample identifier
- isoform_counts: Isoform counts (will be detected dynamically)
- isoform_ratio: Isoform usage ratio
- ratio_difference: Difference in ratios between haplotypes
annotation_df (polars.DataFrame) – Polars DataFrame containing GTF annotation
ratio_difference_threshold (float, default=0.2) – Minimum ratio difference threshold for identifying significant syntelogs (applied at syntelog level)
width (int, default=1300) – Plot width in pixels
height (int, default=350) – Plot height in pixels
template (str, default="simple_white") – Plotly template for styling
- Returns:
List of generated plotly figures
- Return type:
list
- Raises:
ImportError – If required packages are not available
ValueError – If required columns are missing
- polyase.plotting.plot_allelic_ratios(adata, synteny_category: str, sample: str | List[str] = 'all', multimapping_threshold: float = 0.5, ratio_type: str = 'both', bins: int = 30, figsize: Tuple[int, int] = (12, 6), kde: bool = True, save_path: str | None = None)
Plot allelic ratios for transcripts in a specific synteny category. Filters out Synt_ids where all allelic ratios are 0 for a given sample.
- Parameters:
adata (AnnData) – AnnData object containing transcript data
synteny_category (str) – Synteny category to filter for
sample (str or List[str], default="all") – Sample(s) to plot. If list, will plot each sample separately
multimapping_threshold (float, default=0.5) – Threshold for high multimapping ratio
ratio_type (str, default="both") – Type of ratio to plot: “unique”, “em”, or “both”
bins (int, default=30) – Number of bins for the histogram
figsize (tuple, default=(12, 6)) – Figure size (width, height) in inches
kde (bool, default=True) – Whether to show KDE curve on histogram
save_path (str, optional) – Path to save the plot. If None, plot is shown but not saved
- Returns:
The figure object containing the plot(s)
- Return type:
matplotlib.figure.Figure
- polyase.plotting.plot_differential_isoform_usage(results_df, annotation_df, fdr_threshold=0.05, ratio_difference_threshold=0.2, width=1300, height=350, template='simple_white')
Visualize differential isoform usage results using transcript structure plots.
Filtering is performed at the gene level: if any isoform of a gene meets the significance criteria, all isoforms of that gene will be plotted.
- Parameters:
results_df (pd.DataFrame) – DataFrame containing differential isoform usage results with columns:
- gene_id: Gene identifier
- transcript_id: Transcript identifier
- functional_annotation: Functional annotation (optional)
- sample_name: Sample name
- {layer}_cpm: CPM normalized counts for the specified layer
- isoform_ratio: Isoform usage ratio
- condition: Experimental condition
- FDR: False discovery rate (optional for filtering)
- ratio_difference: Difference in ratios (optional for filtering)
annotation_df (polars.DataFrame) – Polars DataFrame containing GTF annotation with:
- gene_id: Gene identifier
Plus standard GTF columns for RNApysoforms
fdr_threshold (float, default=0.05) – FDR threshold for identifying significant genes (applied at gene level)
ratio_difference_threshold (float, default=0.2) – Minimum ratio difference threshold for identifying significant genes (applied at gene level)
width (int, default=1300) – Plot width in pixels
height (int, default=350) – Plot height in pixels
template (str, default="simple_white") – Plotly template for styling
- Returns:
List of generated plotly figures
- Return type:
list
- Raises:
ImportError – If required packages (RNApysoforms) are not available
ValueError – If required columns are missing from input DataFrames
- polyase.plotting.plot_top_differential_isoforms(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=(0, 1), sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.2, sig_color='red')
Plot the top n genes with differential isoform usage in a grid layout (3 plots per row). Genes with significant differences will have their titles highlighted in red.
- Parameters:
results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios function
n (int, optional) – Number of top genes to plot (default: 5)
figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))
palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)
jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)
alpha (float, optional) – Transparency of points (default: 0.7)
ylim (tuple, optional) – Y-axis limits (default: (0, 1))
sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)
output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)
sig_threshold (float, optional) – Significance threshold for p-value or FDR (default: 0.05)
ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.2)
sig_color (str, optional) – Color for titles of genes with significant differences (default: ‘red’)
- Returns:
fig – The generated figure
- Return type:
matplotlib.figure.Figure
- polyase.plotting.plot_top_differential_syntelogs(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=None, sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.1, sig_color='red', plot_type='ratios')
Plot the top n syntelogs with differential allelic ratios or CPM values in a grid layout (6 plots per row). Syntelogs with significant differences will have their titles highlighted in red.
- Parameters:
results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios function
n (int, optional) – Number of top syntelogs to plot (default: 5)
figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))
palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)
jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)
alpha (float, optional) – Transparency of points (default: 0.7)
ylim (tuple, optional) – Y-axis limits (default: None, auto-determined based on plot_type)
sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)
output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)
sig_threshold (float, optional) – Significance threshold for p-value or FDR (default: 0.05)
ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.1)
sig_color (str, optional) – Color for titles of syntelogs with significant differences (default: ‘red’)
plot_type (str, optional) – What to plot: ‘ratios’ for allelic ratios or ‘cpm’ for CPM values (default: ‘ratios’)
- Returns:
fig – The generated figure
- Return type:
matplotlib.figure.Figure
- polyase.plotting.plot_top_differential_syntelogs_annotated(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=None, sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.1, sig_color='red', plot_type='ratios', log2fc_threshold=1.0)
Plot the top n syntelogs with significance brackets between alleles based on test results.
Handles two input formats:
- Per-allele format (from test_allelic_ratios_within_conditions): one row per allele, columns ‘allele’, ‘gene_id’, ‘ratios_rep_{condition}’. Brackets are drawn from allele 0 to each significantly differential allele using the per-allele FDR.
- Pairwise format (from test_allelic_ratios_pairwise): one row per allele pair, columns ‘allele_i’, ‘allele_j’, ‘gene_id_i’, ‘gene_id_j’, ‘ratios_rep_allele_{hap}’. Brackets are drawn between each significantly different allele pair using the pair FDR.
- Parameters:
results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios_within_conditions or test_allelic_ratios_pairwise
n (int, optional) – Number of top syntelogs to plot (default: 5)
figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))
palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)
jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)
alpha (float, optional) – Transparency of points (default: 0.7)
ylim (tuple, optional) – Y-axis limits (default: None, auto-determined based on plot_type)
sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)
output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)
sig_threshold (float, optional) – Significance threshold for FDR or p_value (default: 0.05)
ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.1)
sig_color (str, optional) – Color for titles of syntelogs with significant differences (default: ‘red’)
plot_type (str, optional) – What to plot: ‘ratios’ for allelic ratios or ‘cpm’ for CPM values (default: ‘ratios’)
log2fc_threshold (float or None, optional) – For pairwise between-conditions results: annotate brackets with the log2FC difference when abs(log2FC_difference) exceeds this value (default: 1.0). Set to None to disable.
- Returns:
fig – The generated figure
- Return type:
matplotlib.figure.Figure
polyase.stats module
stats.py
The stats module of the polyase project.
- polyase.stats.get_top_differential_syntelogs(results_df, n=5, sort_by='p_value', fdr_threshold=0.05, ratio_threshold=0.1)
Get the top n syntelogs with differential allelic ratios.
- Parameters:
results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios function
n (int, optional) – Number of top syntelogs to return (default: 5)
sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)
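fdr_threshold (float, optional) – Maximum FDR to consider a result significant (default: 0.05)
ratio_threshold (float, optional) – Minimum ratio difference to consider a result significant (default: 0.1)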
- Returns:
Filtered dataframe containing only the top n syntelogs
- Return type:
pd.DataFrame
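The selection logic can be pictured as "filter by significance, sort, take the first n syntelogs". The following sketch works on plain dictionaries rather than a DataFrame. It assumes rows carry ‘Synt_id’, ‘p_value’, ‘FDR’, and ‘ratio_difference’ keys, that both thresholds must be met, and that ratio differences are ranked by absolute value; the function name `top_differential_syntelogs` is hypothetical.

```python
def top_differential_syntelogs(rows, n=5, sort_by="p_value",
                               fdr_threshold=0.05, ratio_threshold=0.1):
    """Hypothetical helper: keep significant rows, then the top n Synt_ids."""
    significant = [
        r for r in rows
        if r["FDR"] <= fdr_threshold
        and abs(r["ratio_difference"]) >= ratio_threshold
    ]
    if sort_by == "ratio_difference":
        # Larger absolute differences rank first.
        significant.sort(key=lambda r: abs(r["ratio_difference"]), reverse=True)
    else:
        # Smaller p-values / FDRs rank first.
        significant.sort(key=lambda r: r[sort_by])

    top_ids = []
    for r in significant:
        if r["Synt_id"] not in top_ids:
            top_ids.append(r["Synt_id"])
        if len(top_ids) == n:
            break
    return [r for r in significant if r["Synt_id"] in top_ids]


rows = [
    {"Synt_id": 1, "p_value": 0.001, "FDR": 0.01, "ratio_difference": 0.3},
    {"Synt_id": 2, "p_value": 0.0005, "FDR": 0.008, "ratio_difference": 0.05},
    {"Synt_id": 3, "p_value": 0.0001, "FDR": 0.002, "ratio_difference": -0.4},
    {"Synt_id": 4, "p_value": 0.2, "FDR": 0.5, "ratio_difference": 0.6},
]
top = top_differential_syntelogs(rows, n=5)
print([r["Synt_id"] for r in top])  # [3, 1]
```

Synt_id 2 is dropped for its small ratio difference and Synt_id 4 for its FDR.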
- polyase.stats.test_allelic_ratios_between_conditions(adata, layer='unique_counts', group_key='condition', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)
Test if allelic ratios change between conditions and store results in AnnData object.
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
group_key (str, optional) – Variable column name containing condition information (default: “condition”)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)
- Returns:
AnnData or None – If inplace=False, returns modified copy of AnnData; otherwise returns None. Results are stored in:
- adata.uns[‘allelic_ratio_test’]: Complete test results as DataFrame
- adata.var[‘allelic_ratio_pval’]: P-values for each allele
- adata.var[‘allelic_ratio_FDR’]: FDR-corrected p-values for each allele
pd.DataFrame – Results of statistical tests for each syntelog
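This reference does not state which multiple-testing correction produces the FDR values. As background, the sketch below implements the Benjamini-Hochberg procedure, a common choice for FDR correction; that polyase uses this procedure is an assumption, and `benjamini_hochberg` is a hypothetical helper name.

```python
def benjamini_hochberg(pvalues):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in fdr]
print(significant)  # [True, True, True, True]
```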
- polyase.stats.test_allelic_ratios_pairwise(adata, layer='unique_counts', test_condition='control', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)
Test all pairwise combinations of alleles within a syntelog for differential expression.
For each syntelog, every pair of alleles is tested against each other using a beta-binomial likelihood ratio test. This is in contrast to test_allelic_ratios_within_conditions, which tests each allele against a balanced (equal) expectation.
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
test_condition (str, optional) – Condition to subset samples for testing, or “all” to use all samples (default: “control”)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)
- Returns:
If inplace=False, returns modified copy of AnnData; otherwise returns results DataFrame. Results are stored in:
- adata.uns[‘allelic_ratio_pairwise_test’]: Complete pairwise test results as DataFrame
- adata.var[‘allelic_ratio_pairwise_min_pval’]: Minimum p-value across all pairs per allele
- adata.var[‘allelic_ratio_pairwise_min_FDR’]: Corresponding FDR-corrected value
- Return type:
AnnData or pd.DataFrame
- polyase.stats.test_allelic_ratios_within_conditions(adata, layer='unique_counts', test_condition='control', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)
Test if alleles of a gene have unequal expression and store results in AnnData object.
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
test_condition (str, optional) – Condition within which to test (default: “control”)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)
- Returns:
AnnData or None – If inplace=False, returns modified copy of AnnData; otherwise returns None. Results are stored in:
- adata.uns[‘allelic_ratio_test’]: Complete test results as DataFrame
- adata.var[‘allelic_ratio_pval’]: P-values for each allele
- adata.var[‘allelic_ratio_FDR’]: FDR-corrected p-values for each allele
pd.DataFrame – Results of statistical tests for each syntelog
- polyase.stats.test_differential_isoform_structure(adata, layer='unique_counts', test_condition='control', min_similarity_for_matching=0.9, use_introns=True, exon_weight=0.6, intron_weight=0.4, inplace=True, verbose=False, return_plotting_data=True, ratio_diff_cutoff=0.2, FDR_cutoff=0.05)
Test for differential isoform usage (DIU) between alleles with intelligent major/minor isoform fallback.
- Includes isoforms in plotting data even if they have zero expression in the reference haplotype
- Matches zero-expressed reference isoforms with corresponding isoforms in other haplotypes
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
test_condition (str, optional) – Condition to test for differential isoform usage (default: “control”)
min_similarity_for_matching (float, optional) – Minimum similarity score for matching isoforms (default: 0.9)
use_introns (bool, optional) – Whether to include intron structures in similarity calculations (default: True)
exon_weight (float, optional) – Weight for exon similarity in overall similarity score (default: 0.6)
intron_weight (float, optional) – Weight for intron similarity in overall similarity score (default: 0.4)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
verbose (bool, optional) – Whether to print progress messages (default: False)
return_plotting_data (bool, optional) – Whether to return plotting data along with results (default: True)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.2)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
- Returns:
If return_plotting_data=True: returns (results_df, plotting_df) If return_plotting_data=False: returns results_df only
- Return type:
tuple or pd.DataFrame
- polyase.stats.test_isoform_DIU_between_conditions(adata, layer='unique_counts', group_key='condition', gene_id_key='gene_id', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.2)
Test if isoform usage ratios change between conditions and store results in AnnData object.
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
group_key (str, optional) – Variable column name containing condition information (default: “condition”)
gene_id_key (str, optional) – Variable column name containing gene ID information (default: “gene_id”)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.2)
- Returns:
AnnData or None – If inplace=False, returns modified copy of AnnData; otherwise returns None. Results are stored in:
- adata.uns[‘isoform_usage_test’]: Complete test results as DataFrame
- adata.var[‘isoform_usage_pval’]: P-values for each isoform
- adata.var[‘isoform_usage_FDR’]: FDR-corrected p-values for each isoform
pd.DataFrame – Results of statistical tests for each gene
pd.DataFrame – Plotting results table with one row per replicate, condition, isoform ratio, and transcript
- polyase.stats.test_pairwise_allele_response_between_conditions(adata, layer='unique_counts', group_key='condition', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)
Test if pairs of alleles within a syntelog respond differently between conditions.
For each pair of alleles (i, j) within a syntelog, tests whether the ratio count_i / (count_i + count_j) changes between conditions using a beta-binomial likelihood ratio test. A significant result means the two alleles have divergent fold-changes between conditions (e.g., allele1 up-regulated, allele2 down-regulated).
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
group_key (str, optional) – Variable column name containing condition information (default: “condition”)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)
- Returns:
AnnData or None – If inplace=False, returns modified copy of AnnData; otherwise returns None. Results are stored in:
- adata.uns[‘pairwise_allele_response_test’]: Complete test results as DataFrame
- adata.var[‘pairwise_response_min_pval’]: Min p-value across all pairs per allele
- adata.var[‘pairwise_response_min_fdr’]: Min FDR across all pairs per allele
pd.DataFrame – Results of pairwise statistical tests
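The quantity being tested, count_i / (count_i + count_j), can be computed per replicate and compared between conditions. The sketch below shows only that effect-size calculation, not the beta-binomial likelihood ratio test itself. The helper names, the averaging across replicates, and the skipping of zero-count replicates are assumptions for illustration.

```python
def pairwise_ratio(count_i, count_j):
    """Share of allele i among the reads of alleles i and j, or None if no reads."""
    total = count_i + count_j
    return count_i / total if total > 0 else None


def mean_ratio_shift(counts_i, counts_j, conditions, cond_a, cond_b):
    """Hypothetical helper: mean pairwise ratio in cond_b minus that in cond_a."""
    def mean_for(condition):
        ratios = [
            pairwise_ratio(ci, cj)
            for ci, cj, c in zip(counts_i, counts_j, conditions)
            if c == condition
        ]
        ratios = [r for r in ratios if r is not None]  # skip zero-count replicates
        return sum(ratios) / len(ratios) if ratios else float("nan")

    return mean_for(cond_b) - mean_for(cond_a)


shift = mean_ratio_shift(
    counts_i=[30, 40, 80, 60],
    counts_j=[30, 40, 20, 40],
    conditions=["control", "control", "treatment", "treatment"],
    cond_a="control",
    cond_b="treatment",
)
print(round(shift, 3))  # 0.2
```

Here allele i rises from half of the pair's reads in control to 70% in treatment, a shift that would exceed the default ratio_diff_cutoff of 0.1.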
polyase.structure module
Module for adding exon and intron structure information to AnnData objects.
- polyase.structure.add_exon_structure(adata: AnnData, gtf_file: str | None = None, gtf_df: DataFrame | None = None, transcript_id_col: str = 'transcript_id', include_introns: bool = True, inplace: bool = True, verbose: bool = True) -> AnnData | None
Add exon and intron structure information to AnnData.var from GTF/GFF data.
- Parameters:
adata (AnnData) – AnnData object containing transcript data
gtf_file (str, optional) – Path to GTF/GFF file. Either gtf_file or gtf_df must be provided.
gtf_df (pd.DataFrame, optional) – DataFrame with GTF/GFF data. Either gtf_file or gtf_df must be provided.
transcript_id_col (str, default='transcript_id') – Column name in GTF data containing transcript identifiers
include_introns (bool, default=True) – Whether to calculate and include intron structure information
inplace (bool, default=True) – If True, modify the AnnData object in place. If False, return a copy.
verbose (bool, default=True) – Whether to print progress information
- Returns:
If inplace=False, returns modified copy of AnnData object. If inplace=True, returns None and modifies the input object.
- Return type:
AnnData or None
- Raises:
ValueError – If neither gtf_file nor gtf_df is provided, or if required columns are missing
- polyase.structure.add_structure_from_gtf(adata: AnnData, gtf_file: str, include_introns: bool = True, inplace: bool = True, verbose: bool = True) -> AnnData | None
Convenience function to add exon and intron structure from GTF file.
- Parameters:
adata (AnnData) – AnnData object containing transcript data
gtf_file (str) – Path to GTF/GFF file
include_introns (bool, default=True) – Whether to calculate and include intron structures
inplace (bool, default=True) – If True, modify the AnnData object in place
verbose (bool, default=True) – Whether to print progress information
- Returns:
Modified AnnData object if inplace=False, otherwise None
- Return type:
AnnData or None
- polyase.structure.calculate_combined_structure_similarity(exon_structure1: List[int], exon_structure2: List[int], intron_structure1: List[int], intron_structure2: List[int], exon_weight: float = 0.6, intron_weight: float = 0.4) -> float
Calculate combined similarity using both exon and intron structures.
- Parameters:
exon_structure1 (List[int]) – First transcript’s exon structure
exon_structure2 (List[int]) – Second transcript’s exon structure
intron_structure1 (List[int]) – First transcript’s intron structure
intron_structure2 (List[int]) – Second transcript’s intron structure
exon_weight (float, default=0.6) – Weight for exon similarity (must sum with intron_weight to 1.0)
intron_weight (float, default=0.4) – Weight for intron similarity (must sum with exon_weight to 1.0)
- Returns:
Combined similarity score between 0 and 1
- Return type:
float
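The weighting described by exon_weight and intron_weight amounts to a weighted sum of two similarity scores. The sketch below assumes the exon and intron similarities have already been computed and lie between 0 and 1. The name `combined_similarity` and the choice to raise ValueError when the weights do not sum to 1.0 are assumptions, not documented polyase behavior.

```python
import math


def combined_similarity(exon_similarity, intron_similarity,
                        exon_weight=0.6, intron_weight=0.4):
    """Hypothetical helper: weighted sum of exon and intron similarity scores."""
    if not math.isclose(exon_weight + intron_weight, 1.0):
        # assumption: invalid weights are rejected rather than renormalized
        raise ValueError("exon_weight and intron_weight must sum to 1.0")
    return exon_weight * exon_similarity + intron_weight * intron_similarity


score = combined_similarity(exon_similarity=1.0, intron_similarity=0.5)
print(round(score, 3))  # 0.8
```

With the default weights, identical exons and half-matching introns give a score of about 0.8, below the 0.9 default of min_similarity_for_matching used by test_differential_isoform_structure.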
- polyase.structure.calculate_structure_similarity(structure1: List[int], structure2: List[int], mode: str = 'exon') -> float
Calculate similarity between two transcript structures.
- Parameters:
structure1 (List[int]) – First structure as list of exon/intron lengths
structure2 (List[int]) – Second structure as list of exon/intron lengths
mode (str, default='exon') – Type of structure (‘exon’ or ‘intron’)
- Returns:
Similarity score between 0 and 1 (Jaccard index)
- Return type:
float
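For readers unfamiliar with the Jaccard index, the sketch below shows one plausible way to apply it to lists of exon or intron lengths. It assumes a multiset interpretation (repeated lengths count separately) and treats two empty structures as identical; the package may define these details differently, and `jaccard_similarity` is a hypothetical name.

```python
from collections import Counter
from typing import List


def jaccard_similarity(structure1: List[int], structure2: List[int]) -> float:
    """Multiset Jaccard index: |intersection| / |union| of segment lengths."""
    c1, c2 = Counter(structure1), Counter(structure2)
    union = sum((c1 | c2).values())         # element-wise maximum of counts
    if union == 0:
        return 1.0                          # assumption: two empty structures match
    intersection = sum((c1 & c2).values())  # element-wise minimum of counts
    return intersection / union


# Two of three exon lengths are shared; four distinct lengths overall.
similarity = jaccard_similarity([100, 200, 300], [100, 200, 350])
```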
Module contents
PolyASE: A package for analyzing allele-specific expression in polyploid plants.
This package provides tools for calculating and analyzing allelic ratios, visualizing allele-specific expression patterns, and statistical testing of allelic imbalance in polyploid plant genomes.
- class polyase.AlleleRatioCalculator(adata=None)
Bases: object
Class for calculating and managing allelic ratios in AnnData objects.
- calculate_multiple_ratios(counts_layers=None)
Calculate allelic ratios for multiple count layers.
Parameters:
- counts_layers : list of str, optional
List of layer names to calculate ratios for. If None, calculates for all layers with ‘counts’ in their name.
Returns:
- adata : AnnData
Updated AnnData object with allelic ratio layers added
- calculate_ratios(counts_layer='unique_counts', output_suffix=None)
Calculate allelic ratios for each transcript grouped by Synt_id, computing ratios independently for each column (gene/feature).
Parameters:
- counts_layer : str, optional (default: ‘unique_counts’)
Layer containing counts to use for ratio calculations
- output_suffix : str, optional
Custom suffix for output layer name. If None, uses counts_layer name
Returns:
- adata : AnnData
Updated AnnData object with allelic ratio layer added
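The grouping logic can be illustrated without AnnData. The sketch below handles a single sample and assumes that an allelic ratio is a transcript's count divided by the summed count of all transcripts sharing its Synt_id, with groups of zero total count assigned a ratio of 0. Both the formula and the zero-handling are assumptions, and `allelic_ratios` is a hypothetical helper rather than part of the package.

```python
from collections import defaultdict
from typing import Dict, Hashable


def allelic_ratios(counts: Dict[str, float],
                   synt_ids: Dict[str, Hashable]) -> Dict[str, float]:
    """Ratio of each transcript's count to its Synt_id group total."""
    totals = defaultdict(float)
    for transcript, count in counts.items():
        totals[synt_ids[transcript]] += count

    ratios = {}
    for transcript, count in counts.items():
        total = totals[synt_ids[transcript]]
        # Assumption: unexpressed groups get 0.0 instead of a division error.
        ratios[transcript] = count / total if total > 0 else 0.0
    return ratios


counts = {"geneA_hap1": 30, "geneA_hap2": 10, "geneB_hap1": 0, "geneB_hap2": 0}
synt_ids = {"geneA_hap1": 1, "geneA_hap2": 1, "geneB_hap1": 2, "geneB_hap2": 2}
ratios = allelic_ratios(counts, synt_ids)
```

Within an expressed group the ratios sum to 1, which is what makes them comparable across syntelogs with different expression levels.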
- get_ratios_for_synt_id(synt_id, ratio_layer='allelic_ratio_unique_counts')
Get allelic ratios for a specific Synt_id.
Parameters:
- synt_id : int or str
The Synt_id to get ratios for
- ratio_layer : str, optional
Name of the layer containing the ratio data
Returns:
- ratios : numpy array
Array of ratio values for the specified Synt_id
- class polyase.MultimappingRatioCalculator(adata=None)
Bases: object
Class for calculating multimapping ratios in AnnData objects.
- calculate_ratios(unique_layer='unique_counts', multi_layer='ambiguous_counts')
Calculate multimapping ratios for each transcript grouped by Synt_id.
Parameters:
- unique_layer : str, optional (default: ‘unique_counts’)
Layer containing unique counts to use for ratio calculations
- multi_layer : str, optional (default: ‘ambiguous_counts’)
Layer containing multimapping counts to use for ratio calculations
Returns:
- adata : AnnData
Updated AnnData object with multimapping ratio layer added
- get_ratios_for_synt_id(synt_id, multi_layer='multimapping_ratio')
Get multimapping ratios for a specific Synt_id.
Parameters:
- synt_id : int or str
The Synt_id to get ratios for
- multi_layer : str, optional
Name of the layer containing the ratio data
Returns:
- ratios : numpy array
Array of multimapping ratio values for the specified Synt_id
- polyase.add_exon_structure(adata: AnnData, gtf_file: str | None = None, gtf_df: DataFrame | None = None, transcript_id_col: str = 'transcript_id', include_introns: bool = True, inplace: bool = True, verbose: bool = True) AnnData | None
Add exon and intron structure information to AnnData.var from GTF/GFF data.
- Parameters:
adata (AnnData) – AnnData object containing transcript data
gtf_file (str, optional) – Path to GTF/GFF file. Either gtf_file or gtf_df must be provided.
gtf_df (pd.DataFrame, optional) – DataFrame with GTF/GFF data. Either gtf_file or gtf_df must be provided.
transcript_id_col (str, default='transcript_id') – Column name in GTF data containing transcript identifiers
include_introns (bool, default=True) – Whether to calculate and include intron structure information
inplace (bool, default=True) – If True, modify the AnnData object in place. If False, return a copy.
verbose (bool, default=True) – Whether to print progress information
- Returns:
If inplace=False, returns modified copy of AnnData object. If inplace=True, returns None and modifies the input object.
- Return type:
AnnData or None
- Raises:
ValueError – If neither gtf_file nor gtf_df is provided, or if required columns are missing
- polyase.add_structure_from_gtf(adata: AnnData, gtf_file: str, include_introns: bool = True, inplace: bool = True, verbose: bool = True) AnnData | None
Convenience function to add exon and intron structure from GTF file.
- Parameters:
adata (AnnData) – AnnData object containing transcript data
gtf_file (str) – Path to GTF/GFF file
include_introns (bool, default=True) – Whether to calculate and include intron structures
inplace (bool, default=True) – If True, modify the AnnData object in place
verbose (bool, default=True) – Whether to print progress information
- Returns:
Modified AnnData object if inplace=False, otherwise None
- Return type:
AnnData or None
- polyase.aggregate_transcripts_to_genes(adata_tx)
Aggregate transcript-level AnnData to gene-level AnnData.
- Parameters:
adata_tx (anndata.AnnData) – Transcript-level AnnData object with ‘gene_id’ in adata_tx.var. Must contain layers: ‘unique_counts’, ‘ambiguous_counts’, ‘em_counts’, ‘unique_cpm’,
- Returns:
Gene-level AnnData object with aggregated counts and CPM layers.
- Return type:
anndata.AnnData
- polyase.calculate_allelic_ratios(adata, counts_layer='unique_counts')
Calculate allelic ratios for each transcript grouped by Synt_id.
Parameters:
- adata : AnnData
AnnData object containing transcript data
- counts_layer : str, optional (default: ‘unique_counts’)
Layer containing counts to use for ratio calculations
Returns:
- adata : AnnData
Updated AnnData object with ‘allelic_ratio’ layer added
- polyase.calculate_multi_ratios(adata, unique_layer='unique_counts', multi_layer='ambiguous_counts')
Calculate multimapping ratios for each transcript grouped by Synt_id.
Parameters:
- adata : AnnData
AnnData object containing transcript data
- unique_layer : str, optional (default: ‘unique_counts’)
Layer containing unique counts to use for ratio calculations
- multi_layer : str, optional (default: ‘ambiguous_counts’)
Layer containing multimapping counts to use for ratio calculations
Returns:
- adata : AnnData
Updated AnnData object with ‘multimapping_ratio’ layer added
- polyase.calculate_per_allele_ratios(adata, unique_layer='unique_counts', multi_layer='ambiguous_counts', gene_grouping_column='gene_id', inplace=True, count_scaling=True, min_counts_threshold=10, scaling_method='weighted_average')
Calculate multimapping ratios for each individual allele/transcript. For transcripts from the same gene, assigns ratios based on count-weighted calculations.
Parameters:
- adata : AnnData
AnnData object containing transcript data
- unique_layer : str, optional (default: ‘unique_counts’)
Layer containing unique counts to use for ratio calculations
- multi_layer : str, optional (default: ‘ambiguous_counts’)
Layer containing multimapping counts to use for ratio calculations
- gene_grouping_column : str, optional (default: ‘gene_id’)
Column in adata.var to group transcripts by (e.g., ‘Synt_id’, ‘gene_id’)
- inplace : bool, optional (default: True)
Whether to modify the input AnnData object or return a copy
- count_scaling : bool, optional (default: True)
Whether to scale multimapping ratios by count abundance
- min_counts_threshold : int, optional (default: 10)
Minimum total counts threshold; transcripts below this get reduced weight
- scaling_method : str, optional (default: ‘weighted_average’)
Method for combining ratios within genes:
‘weighted_average’: Weight by total counts
‘max_weighted’: Take max ratio but weight by counts
‘abundance_filtered’: Only consider transcripts above threshold
Returns:
- adata : AnnData
Updated AnnData object with per-allele multimapping ratio added to var
- polyase.filter_low_expressed_genes(adata: AnnData, min_expression: float | Dict[str, float] | Callable[[float], float] = 1.0, library_size_dependent: bool = False, lib_size_normalization: str | None = None, layer: str | None = None, group_col: str | Tuple[str, int] = 'Synt_id', group_source: str = 'var', mode: str = 'any', return_dropped: bool = False, copy: bool = True, filter_axis: Literal[0, 1] = 1, verbose: bool = True) AnnData | Tuple[AnnData, List[str]]
Filter an AnnData object to remove groups with low expression across samples.
- Parameters:
adata (AnnData) – AnnData object with group IDs and expression data
min_expression (float, dict, or callable, default=1.0) –
Minimum expression threshold for groups:
float: same threshold applied to all samples/features
dict: {name: threshold} for sample/feature-specific thresholds
callable: function that takes library size and returns threshold (e.g., lambda lib_size: lib_size * 1e-6 for 0.0001% of lib size)
library_size_dependent (bool, default=False) – If True, scale thresholds by library size for each sample
lib_size_normalization (str or None, default=None) –
How to normalize for library size:
‘cpm’: Counts Per Million (divide by lib_size/1e6)
None: No normalization
layer (str or None, default=None) – Layer to use for expression values. If None, use .X
group_col (str or tuple, default='Synt_id') – Column name containing group IDs. For obsm/varm, should be a tuple (key, column_index)
group_source (str, default='var') – Location of the group column in AnnData (‘obs’, ‘var’, ‘obsm’, ‘varm’)
mode (str, default='any') –
‘any’: Keep groups that pass threshold in any sample/feature
‘all’: Keep groups that pass threshold in all samples/features
‘mean’: Keep groups that pass threshold on average across samples/features
return_dropped (bool, default=False) – If True, also return list of dropped group IDs
copy (bool, default=True) – If True, return a copy of the filtered AnnData object. If False, filter the AnnData object in place
filter_axis (int, default=1) –
1: Filter rows (obs) based on group expression across columns (var)
0: Filter columns (var) based on group expression across rows (obs)
verbose (bool, default=True) – Whether to print additional information during filtering
- Returns:
Filtered AnnData object, and optionally a list of dropped group IDs
- Return type:
AnnData or tuple
- Raises:
ValueError – If parameters are invalid or required data is missing
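The interaction between min_expression, CPM normalization, and mode can be sketched in plain Python. The helpers below (`cpm`, `thresholds_for`, `group_passes`) are hypothetical and only illustrate the documented options. The use of `>=` for "passing" and the comparison of means in ‘mean’ mode are assumptions, not confirmed behavior of filter_low_expressed_genes.

```python
from statistics import mean
from typing import Callable, List, Union


def cpm(count: float, lib_size: float) -> float:
    """Counts Per Million: divide the raw count by lib_size / 1e6."""
    return count / (lib_size / 1e6)


def thresholds_for(lib_sizes: List[float],
                   min_expression: Union[float, Callable[[float], float]]) -> List[float]:
    """Resolve a float or callable min_expression to one threshold per sample."""
    if callable(min_expression):
        return [min_expression(size) for size in lib_sizes]
    return [float(min_expression)] * len(lib_sizes)


def group_passes(values: List[float], thresholds: List[float], mode: str = "any") -> bool:
    """Decide whether a group is kept under the 'any', 'all', or 'mean' mode."""
    checks = [v >= t for v, t in zip(values, thresholds)]
    if mode == "any":
        return any(checks)
    if mode == "all":
        return all(checks)
    if mode == "mean":
        return mean(values) >= mean(thresholds)  # assumption: compare the averages
    raise ValueError(f"unknown mode: {mode!r}")


lib_sizes = [2_000_000, 1_000_000]
# Callable threshold from the parameter description: 0.0001% of library size.
thresholds = thresholds_for(lib_sizes, lambda lib_size: lib_size * 1e-6)
group_expression = [3.0, 0.5]  # one group's expression in two samples
kept_any = group_passes(group_expression, thresholds, mode="any")
kept_all = group_passes(group_expression, thresholds, mode="all")
```

Here the group clears the threshold in the first sample only, so it is kept under ‘any’ and dropped under ‘all’.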
- polyase.get_top_differential_syntelogs(results_df, n=5, sort_by='p_value', fdr_threshold=0.05, ratio_threshold=0.1)
Get the top n syntelogs with differential allelic ratios.
- Parameters:
results_df (pd.DataFrame) – Results dataframe from one of the test_allelic_ratios_* functions (e.g., test_allelic_ratios_between_conditions)
n (int, optional) – Number of top syntelogs to return (default: 5)
sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)
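fdr_threshold (float, optional) – Maximum FDR to consider a result significant (default: 0.05)
ratio_threshold (float, optional) – Ratio difference threshold (default: 0.1); this parameter appears in the signature but is not described in the original docstring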
- Returns:
Filtered dataframe containing only the top n syntelogs
- Return type:
pd.DataFrame
- polyase.load_ase_data(var_obs_file, isoform_counts_dir, tx_to_gene_file, sample_info=None, counts_file=None, fillna=0, calculate_cpm=True, quant_dir=None, n_jobs=4)
Load allele-specific expression data from long-read RNAseq at isoform level.
- Parameters:
var_obs_file (str) – Path to the variant observations file.
isoform_counts_dir (str) – Directory containing the isoform counts files.
tx_to_gene_file (str) – Path to TSV file mapping transcript_id to gene_id.
sample_info (dict, optional) – Dictionary mapping sample IDs to their conditions.
counts_file (str, optional) – Path to additional counts file (salmon merged transcript counts).
fillna (int or float, optional) – Value to fill NA values with.
calculate_cpm (bool, optional) – Whether to calculate CPM (Counts Per Million) from EM counts, by default True.
quant_dir (str, optional) – Directory containing quantification files with EM counts.
n_jobs (int, optional) – Number of parallel jobs for loading samples, by default 4.
- Returns:
AnnData object containing the processed isoform-level data with EM counts and CPM layers. Includes all transcripts from expression matrix and tx2gene mapping, with NaN values for var_obs data when genes are not found in var_obs_file.
- Return type:
anndata.AnnData
- polyase.plot_allele_specific_isoform_structure(results_df, annotation_df, ratio_difference_threshold=0.2, width=1300, height=350, template='simple_white')
Visualize allele-specific isoform structure differences.
Filtering is performed at the syntelog level: if any isoform of a syntelog meets the significance criteria, all isoforms of that syntelog will be plotted.
- Parameters:
results_df (pd.DataFrame) –
DataFrame containing allele-specific results with columns:
Synt_id: Syntelog identifier
isoform_id: Isoform/transcript identifier
gene_id: Gene identifier
haplotype: Haplotype identifier
reference_haplotype: Reference haplotype identifier
sample: Sample identifier
isoform_counts: Isoform counts (will be detected dynamically)
isoform_ratio: Isoform usage ratio
ratio_difference: Difference in ratios between haplotypes
annotation_df (polars.DataFrame) – Polars DataFrame containing GTF annotation
ratio_difference_threshold (float, default=0.2) – Minimum ratio difference threshold for identifying significant syntelogs (applied at syntelog level)
width (int, default=1300) – Plot width in pixels
height (int, default=350) – Plot height in pixels
template (str, default="simple_white") – Plotly template for styling
- Returns:
List of generated plotly figures
- Return type:
list
- Raises:
ImportError – If required packages are not available
ValueError – If required columns are missing
- polyase.plot_allelic_ratios(adata, synteny_category: str, sample: str | List[str] = 'all', multimapping_threshold: float = 0.5, ratio_type: str = 'both', bins: int = 30, figsize: Tuple[int, int] = (12, 6), kde: bool = True, save_path: str | None = None)
Plot allelic ratios for transcripts in a specific synteny category. Filters out Synt_ids where all allelic ratios are 0 for a given sample.
- Parameters:
adata (AnnData) – AnnData object containing transcript data
synteny_category (str) – Synteny category to filter for
sample (str or List[str], default="all") – Sample(s) to plot. If list, will plot each sample separately
multimapping_threshold (float, default=0.5) – Threshold for high multimapping ratio
ratio_type (str, default="both") – Type of ratio to plot: “unique”, “em”, or “both”
bins (int, default=30) – Number of bins for the histogram
figsize (tuple, default=(12, 6)) – Figure size (width, height) in inches
kde (bool, default=True) – Whether to show KDE curve on histogram
save_path (str, optional) – Path to save the plot. If None, plot is shown but not saved
- Returns:
The figure object containing the plot(s)
- Return type:
matplotlib.figure.Figure
- polyase.plot_differential_isoform_usage(results_df, annotation_df, fdr_threshold=0.05, ratio_difference_threshold=0.2, width=1300, height=350, template='simple_white')
Visualize differential isoform usage results using transcript structure plots.
Filtering is performed at the gene level: if any isoform of a gene meets the significance criteria, all isoforms of that gene will be plotted.
- Parameters:
results_df (pd.DataFrame) –
DataFrame containing differential isoform usage results with columns:
gene_id: Gene identifier
transcript_id: Transcript identifier
functional_annotation: Functional annotation (optional)
sample_name: Sample name
{layer}_cpm: CPM normalized counts for the specified layer
isoform_ratio: Isoform usage ratio
condition: Experimental condition
FDR: False discovery rate (optional for filtering)
ratio_difference: Difference in ratios (optional for filtering)
annotation_df (polars.DataFrame) –
Polars DataFrame containing GTF annotation with:
gene_id: Gene identifier
plus the standard GTF columns required by RNApysoforms
fdr_threshold (float, default=0.05) – FDR threshold for identifying significant genes (applied at gene level)
ratio_difference_threshold (float, default=0.2) – Minimum ratio difference threshold for identifying significant genes (applied at gene level)
width (int, default=1300) – Plot width in pixels
height (int, default=350) – Plot height in pixels
template (str, default="simple_white") – Plotly template for styling
- Returns:
List of generated plotly figures
- Return type:
list
- Raises:
ImportError – If required packages (RNApysoforms) are not available
ValueError – If required columns are missing from input DataFrames
- polyase.plot_top_differential_isoforms(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=(0, 1), sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.2, sig_color='red')
Plot the top n genes with differential isoform usage in a grid layout (3 plots per row). Genes with significant differences will have their titles highlighted in red.
- Parameters:
results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios function
n (int, optional) – Number of top genes to plot (default: 5)
figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))
palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)
jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)
alpha (float, optional) – Transparency of points (default: 0.7)
ylim (tuple, optional) – Y-axis limits (default: (0, 1))
sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)
output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)
sig_threshold (float, optional) – Significance threshold for p-value or FDR (default: 0.05)
ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.2)
sig_color (str, optional) – Color for titles of genes with significant differences (default: ‘red’)
- Returns:
fig – The generated figure
- Return type:
matplotlib.figure.Figure
- polyase.plot_top_differential_syntelogs(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=None, sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.1, sig_color='red', plot_type='ratios')
Plot the top n syntelogs with differential allelic ratios or CPM values in a grid layout (6 plots per row). Syntelogs with significant differences will have their titles highlighted in red.
- Parameters:
results_df (pd.DataFrame) – Results dataframe from one of the test_allelic_ratios_* functions (e.g., test_allelic_ratios_between_conditions)
n (int, optional) – Number of top syntelogs to plot (default: 5)
figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))
palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)
jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)
alpha (float, optional) – Transparency of points (default: 0.7)
ylim (tuple, optional) – Y-axis limits (default: None, auto-determined based on plot_type)
sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)
output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)
sig_threshold (float, optional) – Significance threshold for p-value or FDR (default: 0.05)
ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.1)
sig_color (str, optional) – Color for titles of syntelogs with significant differences (default: ‘red’)
plot_type (str, optional) – What to plot: ‘ratios’ for allelic ratios or ‘cpm’ for CPM values (default: ‘ratios’)
- Returns:
fig – The generated figure
- Return type:
matplotlib.figure.Figure
- polyase.plot_top_differential_syntelogs_annotated(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=None, sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.1, sig_color='red', plot_type='ratios', log2fc_threshold=1.0)
Plot the top n syntelogs with significance brackets between alleles based on test results.
Handles two input formats:
Per-allele format (from test_allelic_ratios_within_conditions): one row per allele, columns ‘allele’, ‘gene_id’, ‘ratios_rep_{condition}’. Brackets are drawn from allele 0 to each significantly differential allele using the per-allele FDR.
Pairwise format (from test_allelic_ratios_pairwise): one row per allele pair, columns ‘allele_i’, ‘allele_j’, ‘gene_id_i’, ‘gene_id_j’, ‘ratios_rep_allele_{hap}’. Brackets are drawn between each significantly different allele pair using the pair FDR.
- Parameters:
results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios_within_conditions or test_allelic_ratios_pairwise
n (int, optional) – Number of top syntelogs to plot (default: 5)
figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))
palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)
jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)
alpha (float, optional) – Transparency of points (default: 0.7)
ylim (tuple, optional) – Y-axis limits (default: None, auto-determined based on plot_type)
sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)
output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)
sig_threshold (float, optional) – Significance threshold for FDR or p_value (default: 0.05)
ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.1)
sig_color (str, optional) – Color for titles of syntelogs with significant differences (default: ‘red’)
plot_type (str, optional) – What to plot: ‘ratios’ for allelic ratios or ‘cpm’ for CPM values (default: ‘ratios’)
log2fc_threshold (float or None, optional) – For pairwise between-conditions results: annotate brackets with the log2FC difference when abs(log2FC_difference) exceeds this value (default: 1.0). Set to None to disable.
- Returns:
fig – The generated figure
- Return type:
matplotlib.figure.Figure
- polyase.test_allelic_ratios_between_conditions(adata, layer='unique_counts', group_key='condition', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)
Test if allelic ratios change between conditions and store results in AnnData object.
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
group_key (str, optional) – Variable column name containing condition information (default: “condition”)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)
- Returns:
AnnData or None – If inplace=False, returns modified copy of AnnData; otherwise returns None. Results are stored in:
adata.uns[‘allelic_ratio_test’]: Complete test results as DataFrame
adata.var[‘allelic_ratio_pval’]: P-values for each allele
adata.var[‘allelic_ratio_FDR’]: FDR-corrected p-values for each allele
pd.DataFrame – Results of statistical tests for each syntelog
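The documentation does not state which FDR correction is applied. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure and shows how FDR_cutoff and ratio_diff_cutoff could jointly define significance; `benjamini_hochberg` and `is_significant` are hypothetical helpers, and the strict inequalities are assumptions.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_significant(fdr: float, ratio_difference: float,
                   FDR_cutoff: float = 0.05,
                   ratio_diff_cutoff: float = 0.1) -> bool:
    """Assumed rule: both the FDR and the effect-size cutoff must be met."""
    return fdr < FDR_cutoff and abs(ratio_difference) > ratio_diff_cutoff


fdr_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Requiring a minimum ratio difference alongside the FDR guards against calling tiny but statistically detectable shifts significant in deeply sequenced samples.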
- polyase.test_allelic_ratios_pairwise(adata, layer='unique_counts', test_condition='control', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)
Test all pairwise combinations of alleles within a syntelog for differential expression.
For each syntelog, every pair of alleles is tested against each other using a beta-binomial likelihood ratio test. This is in contrast to test_allelic_ratios_within_conditions, which tests each allele against a balanced (equal) expectation.
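Enumerating "every pair of alleles" is a standard combinations problem. The short sketch below shows how the pairs for one syntelog could be listed; the allele labels are made up for illustration, and the statistical test itself is not reproduced here.

```python
from itertools import combinations

# Hypothetical allele labels for one syntelog in a tetraploid genome.
alleles = ["hap1", "hap2", "hap3", "hap4"]

# Each unordered pair (allele_i, allele_j) would be tested once.
allele_pairs = list(combinations(alleles, 2))
```

With k alleles this yields k(k-1)/2 tests per syntelog (six for a tetraploid), which is why multiple-testing correction matters for the pairwise design.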
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
test_condition (str, optional) – Condition to subset samples for testing, or “all” to use all samples (default: “control”)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)
- Returns:
If inplace=False, returns modified copy of AnnData; otherwise returns results DataFrame. Results are stored in:
adata.uns[‘allelic_ratio_pairwise_test’]: Complete pairwise test results as DataFrame
adata.var[‘allelic_ratio_pairwise_min_pval’]: Minimum p-value across all pairs per allele
adata.var[‘allelic_ratio_pairwise_min_FDR’]: Corresponding FDR-corrected value
- Return type:
AnnData or pd.DataFrame
- polyase.test_allelic_ratios_within_conditions(adata, layer='unique_counts', test_condition='control', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)
Test if alleles of a gene have unequal expression and store results in AnnData object.
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
test_condition (str, optional) – Condition to subset samples for testing (default: “control”)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)
- Returns:
AnnData or None – If inplace=False, returns modified copy of AnnData; otherwise returns None. Results are stored in:
adata.uns[‘allelic_ratio_test’]: Complete test results as DataFrame
adata.var[‘allelic_ratio_pval’]: P-values for each allele
adata.var[‘allelic_ratio_FDR’]: FDR-corrected p-values for each allele
pd.DataFrame – Results of statistical tests for each syntelog
- polyase.test_differential_isoform_structure(adata, layer='unique_counts', test_condition='control', min_similarity_for_matching=0.9, use_introns=True, exon_weight=0.6, intron_weight=0.4, inplace=True, verbose=False, return_plotting_data=True, ratio_diff_cutoff=0.2, FDR_cutoff=0.05)
Test for differential isoform usage (DIU) between alleles, with a major/minor isoform fallback.
Includes isoforms in the plotting data even if they have zero expression in the reference haplotype.
Matches zero-expressed reference isoforms with the corresponding isoforms in other haplotypes.
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
test_condition (str, optional) – Condition to test for differential isoform usage (default: “control”)
min_similarity_for_matching (float, optional) – Minimum similarity score for matching isoforms (default: 0.9)
use_introns (bool, optional) – Whether to include intron structures in similarity calculations (default: True)
exon_weight (float, optional) – Weight for exon similarity in overall similarity score (default: 0.6)
intron_weight (float, optional) – Weight for intron similarity in overall similarity score (default: 0.4)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
verbose (bool, optional) – Whether to print progress messages (default: False)
return_plotting_data (bool, optional) – Whether to return plotting data along with results (default: True)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.2)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
- Returns:
If return_plotting_data=True: returns (results_df, plotting_df) If return_plotting_data=False: returns results_df only
- Return type:
tuple or pd.DataFrame
- polyase.test_isoform_DIU_between_conditions(adata, layer='unique_counts', group_key='condition', gene_id_key='gene_id', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.2)
Test if isoform usage ratios change between conditions and store results in AnnData object.
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
group_key (str, optional) – Variable column name containing condition information (default: “condition”)
gene_id_key (str, optional) – Variable column name containing gene ID information (default: “gene_id”)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.2)
- Returns:
AnnData or None – If inplace=False, returns modified copy of AnnData; otherwise returns None. Results are stored in:
adata.uns[‘isoform_usage_test’]: Complete test results as DataFrame
adata.var[‘isoform_usage_pval’]: P-values for each isoform
adata.var[‘isoform_usage_FDR’]: FDR-corrected p-values for each isoform
pd.DataFrame – Results of statistical tests for each gene
pd.DataFrame – Plotting results table with one row per replicate, condition, isoform ratio, and transcript
- polyase.test_pairwise_allele_response_between_conditions(adata, layer='unique_counts', group_key='condition', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)
Test if pairs of alleles within a syntelog respond differently between conditions.
For each pair of alleles (i, j) within a syntelog, tests whether the ratio count_i / (count_i + count_j) changes between conditions using a beta-binomial likelihood ratio test. A significant result means the two alleles have divergent fold-changes between conditions (e.g., allele1 up-regulated, allele2 down-regulated).
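The quantity being tested can be computed by hand. The sketch below evaluates count_i / (count_i + count_j) per replicate and compares the condition means. It does not perform the beta-binomial likelihood ratio test, and the helper names, the toy counts, and the choice to skip replicates where both alleles have zero counts are all assumptions.

```python
from statistics import mean
from typing import List


def pair_ratios(counts_i: List[float], counts_j: List[float]) -> List[float]:
    """Per-replicate ratio count_i / (count_i + count_j), skipping empty replicates."""
    return [ci / (ci + cj) for ci, cj in zip(counts_i, counts_j) if ci + cj > 0]


# Toy counts for two alleles across two replicates per condition.
control_ratio = mean(pair_ratios([30, 40], [30, 40]))    # balanced in control
treatment_ratio = mean(pair_ratios([60, 90], [20, 30]))  # allele i dominates
ratio_difference = treatment_ratio - control_ratio
```

A shift of this size would exceed the default ratio_diff_cutoff of 0.1, so whether the pair is reported would then depend on the FDR from the statistical test.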
- Parameters:
adata (AnnData) – AnnData object containing expression data
layer (str, optional) – Layer containing count data (default: “unique_counts”)
group_key (str, optional) – Variable column name containing condition information (default: “condition”)
inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)
FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)
ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)
- Returns:
AnnData or None – If inplace=False, returns modified copy of AnnData; otherwise returns None. Results are stored in:
adata.uns[‘pairwise_allele_response_test’]: Complete test results as DataFrame
adata.var[‘pairwise_response_min_pval’]: Min p-value across all pairs per allele
adata.var[‘pairwise_response_min_fdr’]: Min FDR across all pairs per allele
pd.DataFrame – Results of pairwise statistical tests