Polyase API

Submodules

polyase.allele_utils module

Utilities for calculating and analyzing allelic ratios.

class polyase.allele_utils.AlleleRatioCalculator(adata=None)

Bases: object

Class for calculating and managing allelic ratios in AnnData objects.

calculate_multiple_ratios(counts_layers=None)

Calculate allelic ratios for multiple count layers.

Parameters:

counts_layers : list of str, optional

List of layer names to calculate ratios for. If None, calculates for all layers with ‘counts’ in their name.

Returns:

adata : AnnData

Updated AnnData object with allelic ratio layers added

calculate_ratios(counts_layer='unique_counts', output_suffix=None)

Calculate allelic ratios for each transcript grouped by Synt_id, computing ratios independently for each column (gene/feature).

Parameters:

counts_layer : str, optional (default: ‘unique_counts’)

Layer containing counts to use for ratio calculations

output_suffix : str, optional

Custom suffix for output layer name. If None, uses counts_layer name

Returns:

adata : AnnData

Updated AnnData object with allelic ratio layer added
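The grouping can be illustrated with a short pure-Python sketch. It assumes that a transcript's allelic ratio is its count divided by the summed counts of all transcripts sharing its Synt_id in the same sample (row), and that a Synt_id with zero total counts gets ratio 0; neither rule is confirmed by this reference. The function `allelic_ratios` and the nested-list layout are hypothetical stand-ins for the AnnData layers the method actually works on.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def allelic_ratios(counts: Sequence[Sequence[float]],
                   synt_ids: Sequence[str]) -> List[List[float]]:
    """Hypothetical sketch: normalise counts within each Synt_id, per sample.

    counts[s][t] is the count of transcript t in sample s (obs x var, as in AnnData);
    synt_ids[t] is the Synt_id of transcript t.
    """
    ratios = []
    for row in counts:
        # Total counts of every Synt_id in this sample.
        totals: Dict[str, float] = defaultdict(float)
        for value, sid in zip(row, synt_ids):
            totals[sid] += value
        # Assumption: a Synt_id with zero total counts gets ratio 0.
        ratios.append([value / totals[sid] if totals[sid] > 0 else 0.0
                       for value, sid in zip(row, synt_ids)])
    return ratios


print(allelic_ratios([[30, 10, 0, 0], [5, 15, 4, 4]], ["S1", "S1", "S2", "S2"]))
# [[0.75, 0.25, 0.0, 0.0], [0.25, 0.75, 0.5, 0.5]]
```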

get_ratios_for_synt_id(synt_id, ratio_layer='allelic_ratio_unique_counts')

Get allelic ratios for a specific Synt_id.

Parameters:

synt_id : int or str

The Synt_id to get ratios for

ratio_layer : str, optional

Name of the layer containing the ratio data

Returns:

ratios : numpy array

Array of ratio values for the specified Synt_id

set_data(adata)

Set or update the AnnData object.

Parameters:

adata : AnnData

AnnData object containing transcript data

polyase.allele_utils.calculate_allelic_ratios(adata, counts_layer='unique_counts')

Calculate allelic ratios for each transcript grouped by Synt_id.

Parameters:

adata : AnnData

AnnData object containing transcript data

counts_layer : str, optional (default: ‘unique_counts’)

Layer containing counts to use for ratio calculations

Returns:

adata : AnnData

Updated AnnData object with ‘allelic_ratio’ layer added

polyase.ase_data_loader module

polyase.ase_data_loader.aggregate_transcripts_to_genes(adata_tx)

Aggregate transcript-level AnnData to gene-level AnnData.

Parameters:

adata_tx (anndata.AnnData) – Transcript-level AnnData object with ‘gene_id’ in adata_tx.var. Must contain layers: ‘unique_counts’, ‘ambiguous_counts’, ‘em_counts’, ‘unique_cpm’,

Returns:

Gene-level AnnData object with aggregated counts and CPM layers.

Return type:

anndata.AnnData
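A sketch of the aggregation step, assuming each gene-level value is the sum of its transcripts' values in the same sample. `aggregate_to_genes` is hypothetical and works on plain lists rather than AnnData.

```python
from typing import Dict, List, Sequence, Tuple


def aggregate_to_genes(counts: Sequence[Sequence[float]],
                       gene_ids: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """Hypothetical sketch: sum transcript values per gene_id in each sample.

    Returns the gene order (first appearance) and a samples x genes matrix.
    """
    genes: List[str] = list(dict.fromkeys(gene_ids))  # order-preserving unique IDs
    index: Dict[str, int] = {g: i for i, g in enumerate(genes)}
    gene_counts = []
    for row in counts:
        summed = [0.0] * len(genes)
        for value, gid in zip(row, gene_ids):
            summed[index[gid]] += value
        gene_counts.append(summed)
    return genes, gene_counts
```

Summing is also consistent for CPM layers, because all transcripts in one sample share the same library-size denominator.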

polyase.ase_data_loader.load_ase_data(var_obs_file, isoform_counts_dir, tx_to_gene_file, sample_info=None, counts_file=None, fillna=0, calculate_cpm=True, quant_dir=None, n_jobs=4)

Load allele-specific expression data from long-read RNA-seq at the isoform level.

Parameters:
  • var_obs_file (str) – Path to the variant observations file.

  • isoform_counts_dir (str) – Directory containing the isoform counts files.

  • tx_to_gene_file (str) – Path to TSV file mapping transcript_id to gene_id.

  • sample_info (dict, optional) – Dictionary mapping sample IDs to their conditions.

  • counts_file (str, optional) – Path to additional counts file (salmon merged transcript counts).

  • fillna (int or float, optional) – Value to fill NA values with.

  • calculate_cpm (bool, optional) – Whether to calculate CPM (Counts Per Million) from EM counts, by default True.

  • quant_dir (str, optional) – Directory containing quantification files with EM counts.

  • n_jobs (int, optional) – Number of parallel jobs for loading samples, by default 4.

Returns:

AnnData object containing the processed isoform-level data with EM counts and CPM layers. Includes all transcripts from the expression matrix and the tx2gene mapping, with NaN values for var_obs data when genes are not found in var_obs_file.

Return type:

anndata.AnnData
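CPM rescales each sample so that its counts sum to one million: count / library_size * 1e6. The sketch below assumes the library size is the per-sample total of the EM counts; `counts_to_cpm` is a hypothetical helper, not part of polyase.

```python
from typing import List, Sequence


def counts_to_cpm(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each sample (row) so that its counts sum to one million."""
    cpm = []
    for row in counts:
        lib_size = sum(row)  # assumption: library size = total counts of the sample
        # A sample with no counts stays at zero instead of dividing by zero.
        cpm.append([v / lib_size * 1e6 if lib_size > 0 else 0.0 for v in row])
    return cpm
```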

polyase.filter module

AnnData Filtering Module

This module provides functions for filtering AnnData objects based on group expression patterns. It supports filtering by group expression levels with various normalization methods and thresholds.

polyase.filter.filter_low_expressed_genes(adata: AnnData, min_expression: float | Dict[str, float] | Callable[[float], float] = 1.0, library_size_dependent: bool = False, lib_size_normalization: str | None = None, layer: str | None = None, group_col: str | Tuple[str, int] = 'Synt_id', group_source: str = 'var', mode: str = 'any', return_dropped: bool = False, copy: bool = True, filter_axis: Literal[0, 1] = 1, verbose: bool = True) → AnnData | Tuple[AnnData, List[str]]

Filter an AnnData object to remove groups with low expression across samples.

Parameters:
  • adata (AnnData) – AnnData object with group IDs and expression data

  • min_expression (float, dict, or callable, default=1.0) –

    Minimum expression threshold for groups:

    • float: same threshold applied to all samples/features

    • dict: {name: threshold} for sample/feature-specific thresholds

    • callable: function that takes library size and returns threshold (e.g., lambda lib_size: lib_size * 1e-6 for 0.0001% of lib size)

  • library_size_dependent (bool, default=False) – If True, scale thresholds by library size for each sample

  • lib_size_normalization (str or None, default=None) –

    How to normalize for library size:

    • ‘cpm’: Counts Per Million (divide by lib_size/1e6)

    • None: No normalization

  • layer (str or None, default=None) – Layer to use for expression values. If None, use .X

  • group_col (str or tuple, default='Synt_id') – Column name containing group IDs. For obsm/varm, should be a tuple (key, column_index)

  • group_source (str, default='var') – Location of the group column in AnnData (‘obs’, ‘var’, ‘obsm’, ‘varm’)

  • mode (str, default='any') –

    • ‘any’: Keep groups that pass threshold in any sample/feature

    • ‘all’: Keep groups that pass threshold in all samples/features

    • ‘mean’: Keep groups that pass threshold on average across samples/features

  • return_dropped (bool, default=False) – If True, also return list of dropped group IDs

  • copy (bool, default=True) – If True, return a copy of the filtered AnnData object. If False, filter the AnnData object in place

  • filter_axis (int, default=1) –

    • 1: Filter rows (obs) based on group expression across columns (var)

    • 0: Filter columns (var) based on group expression across rows (obs)

  • verbose (bool, default=True) – Whether to print additional information during filtering

Returns:

Filtered AnnData object, and optionally a list of dropped group IDs

Return type:

AnnData or tuple

Raises:

ValueError – If parameters are invalid or required data is missing
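The ‘any’/‘all’/‘mean’ logic can be sketched in pure Python. The sketch assumes a group's expression in a sample is the sum of its member features, that a group passes a sample when its value is at least the threshold, and that ‘mean’ compares the mean group value with the mean threshold. `filter_groups` is hypothetical: it covers float and callable thresholds and optional CPM scaling, but not dict thresholds, obsm/varm group sources or filter_axis.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple, Union

Threshold = Union[float, Callable[[float], float]]


def filter_groups(expr: Sequence[Sequence[float]],
                  groups: Sequence[str],
                  min_expression: Threshold = 1.0,
                  mode: str = "any",
                  cpm: bool = False) -> Tuple[List[str], List[str]]:
    """Hypothetical sketch of group-level expression filtering.

    expr[s][f] is the expression of feature f in sample s; groups[f] is its group ID.
    Returns (kept_groups, dropped_groups) in order of first appearance.
    """
    order = list(dict.fromkeys(groups))
    per_group: Dict[str, List[float]] = {g: [] for g in order}
    thresholds: List[float] = []
    for row in expr:
        lib_size = sum(row)
        totals: Dict[str, float] = defaultdict(float)
        for value, g in zip(row, groups):
            totals[g] += value  # assumption: group expression = sum of members
        # Optional CPM scaling: divide by lib_size / 1e6.
        scale = lib_size / 1e6 if cpm and lib_size > 0 else 1.0
        for g in order:
            per_group[g].append(totals[g] / scale)
        # A callable threshold receives the sample's library size.
        thresholds.append(min_expression(lib_size) if callable(min_expression)
                          else min_expression)
    kept: List[str] = []
    dropped: List[str] = []
    for g, values in per_group.items():
        passes = [v >= t for v, t in zip(values, thresholds)]
        if mode == "any":
            keep = any(passes)
        elif mode == "all":
            keep = all(passes)
        elif mode == "mean":
            keep = mean(values) >= mean(thresholds)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        (kept if keep else dropped).append(g)
    return kept, dropped
```

For example, with a callable threshold such as `lambda lib_size: lib_size * 1e-6`, each sample gets its own cut-off proportional to its library size.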

polyase.multimapping module

Utilities for calculating multimapping ratios per syntelog.

class polyase.multimapping.MultimappingRatioCalculator(adata=None)

Bases: object

Class for calculating multimapping ratios in AnnData objects.

calculate_ratios(unique_layer='unique_counts', multi_layer='ambiguous_counts')

Calculate multimapping ratios for each transcript grouped by Synt_id.

Parameters:

unique_layer : str, optional (default: ‘unique_counts’)

Layer containing unique counts to use for ratio calculations

multi_layer : str, optional (default: ‘ambiguous_counts’)

Layer containing multimapping counts to use for ratio calculations

Returns:

adata : AnnData

Updated AnnData object with multimapping ratio layer added

get_ratios_for_synt_id(synt_id, multi_layer='multimapping_ratio')

Get multimapping ratios for a specific Synt_id.

Parameters:

synt_id : int or str

The Synt_id to get ratios for

multi_layer : str, optional

Name of the layer containing the ratio data

Returns:

ratios : numpy array

Array of multimapping values for the specified Synt_id

set_data(adata)

Set or update the AnnData object.

Parameters:

adata : AnnData

AnnData object containing transcript data

polyase.multimapping.calculate_multi_ratios(adata, unique_layer='unique_counts', multi_layer='ambiguous_counts')

Calculate multimapping ratios for each transcript grouped by Synt_id.

Parameters:

adata : AnnData

AnnData object containing transcript data

unique_layer : str, optional (default: ‘unique_counts’)

Layer containing unique counts to use for ratio calculations

multi_layer : str, optional (default: ‘ambiguous_counts’)

Layer containing multimapping counts to use for ratio calculations

Returns:

adata : AnnData

Updated AnnData object with ‘multimapping_ratio’ layer added
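One plausible reading of the Synt_id grouping, offered only as an assumption: in each sample, a Synt_id's multimapping ratio is its total ambiguous counts divided by its total unique plus ambiguous counts, and that value is assigned to every transcript in the group. `multimapping_ratios` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def multimapping_ratios(unique: Sequence[Sequence[float]],
                        multi: Sequence[Sequence[float]],
                        synt_ids: Sequence[str]) -> List[List[float]]:
    """Hypothetical sketch: per-sample share of ambiguous counts within each
    Synt_id, broadcast back to every transcript of that Synt_id."""
    result = []
    for u_row, m_row in zip(unique, multi):
        u_tot: Dict[str, float] = defaultdict(float)
        m_tot: Dict[str, float] = defaultdict(float)
        for u, m, sid in zip(u_row, m_row, synt_ids):
            u_tot[sid] += u
            m_tot[sid] += m
        row = []
        for sid in synt_ids:
            total = u_tot[sid] + m_tot[sid]
            # Assumption: groups without any counts get ratio 0.
            row.append(m_tot[sid] / total if total > 0 else 0.0)
        result.append(row)
    return result
```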

polyase.multimapping.calculate_per_allele_ratios(adata, unique_layer='unique_counts', multi_layer='ambiguous_counts', gene_grouping_column='gene_id', inplace=True, count_scaling=True, min_counts_threshold=10, scaling_method='weighted_average')

Calculate multimapping ratios for each individual allele/transcript. For transcripts from the same gene, assigns ratios based on count-weighted calculations.

Parameters:

adata : AnnData

AnnData object containing transcript data

unique_layer : str, optional (default: ‘unique_counts’)

Layer containing unique counts to use for ratio calculations

multi_layer : str, optional (default: ‘ambiguous_counts’)

Layer containing multimapping counts to use for ratio calculations

gene_grouping_column : str, optional (default: ‘gene_id’)

Column in adata.var to group transcripts by (e.g., ‘Synt_id’, ‘gene_id’)

inplace : bool, optional (default: True)

Whether to modify the input AnnData object or return a copy

count_scaling : bool, optional (default: True)

Whether to scale multimapping ratios by count abundance

min_counts_threshold : int, optional (default: 10)

Minimum total counts threshold; transcripts below this get reduced weight

scaling_method : str, optional (default: ‘weighted_average’)

Method for combining ratios within genes:

• ‘weighted_average’: Weight by total counts

• ‘max_weighted’: Take max ratio but weight by counts

• ‘abundance_filtered’: Only consider transcripts above min_counts_threshold

Returns:

adata : AnnData

Updated AnnData object with per-allele multimapping ratio added to var
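The ‘weighted_average’ option can be sketched as an average of per-transcript ratios multi / (unique + multi), weighted by each transcript's total counts. This is an assumption about the method; `weighted_gene_ratio` is hypothetical and ignores count_scaling and min_counts_threshold.

```python
from typing import Sequence


def weighted_gene_ratio(unique: Sequence[float], multi: Sequence[float]) -> float:
    """Hypothetical sketch of 'weighted_average': average per-transcript
    multimapping ratios, weighting each transcript by its total counts."""
    totals = [u + m for u, m in zip(unique, multi)]
    weight_sum = sum(totals)
    if weight_sum == 0:
        return 0.0  # assumption: a gene without counts gets ratio 0
    ratios = [m / t if t > 0 else 0.0 for m, t in zip(multi, totals)]
    return sum(r * t for r, t in zip(ratios, totals)) / weight_sum
```

With these weights the result equals the pooled ratio sum(multi) / sum(unique + multi), so low-count transcripts have little influence on the gene-level value.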

polyase.plotting module

polyase.plotting.convert_pvalue_to_asterisks(pvalue)

Convert p-values to significance asterisks notation.
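The cut-offs are not documented here. A widely used convention maps p ≤ 0.0001 to ****, p ≤ 0.001 to ***, p ≤ 0.01 to **, p ≤ 0.05 to *, and anything larger to ns; the sketch below implements that convention, which may differ from polyase's own cut-offs. `pvalue_to_asterisks` is hypothetical.

```python
def pvalue_to_asterisks(pvalue: float) -> str:
    """Map a p-value to star notation using common cut-offs (assumed)."""
    if pvalue <= 0.0001:
        return "****"
    if pvalue <= 0.001:
        return "***"
    if pvalue <= 0.01:
        return "**"
    if pvalue <= 0.05:
        return "*"
    return "ns"
```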

polyase.plotting.plot_allele_specific_isoform_structure(results_df, annotation_df, ratio_difference_threshold=0.2, width=1300, height=350, template='simple_white')

Visualize allele-specific isoform structure differences.

Filtering is performed at the syntelog level: if any isoform of a syntelog meets the significance criteria, all isoforms of that syntelog will be plotted.

Parameters:
  • results_df (pd.DataFrame) –

    DataFrame containing allele-specific results with columns:

    • Synt_id: Syntelog identifier

    • isoform_id: Isoform/transcript identifier

    • gene_id: Gene identifier

    • haplotype: Haplotype identifier

    • reference_haplotype: Reference haplotype identifier

    • sample: Sample identifier

    • isoform_counts: Isoform counts (will be detected dynamically)

    • isoform_ratio: Isoform usage ratio

    • ratio_difference: Difference in ratios between haplotypes

  • annotation_df (polars.DataFrame) – Polars DataFrame containing GTF annotation

  • ratio_difference_threshold (float, default=0.2) – Minimum ratio difference threshold for identifying significant syntelogs (applied at syntelog level)

  • width (int, default=1300) – Plot width in pixels

  • height (int, default=350) – Plot height in pixels

  • template (str, default="simple_white") – Plotly template for styling

Returns:

List of generated plotly figures

Return type:

list

Raises:
  • ImportError – If required packages are not available

  • ValueError – If required columns are missing

polyase.plotting.plot_allelic_ratios(adata, synteny_category: str, sample: str | List[str] = 'all', multimapping_threshold: float = 0.5, ratio_type: str = 'both', bins: int = 30, figsize: Tuple[int, int] = (12, 6), kde: bool = True, save_path: str | None = None)

Plot allelic ratios for transcripts in a specific synteny category. Filters out Synt_ids where all allelic ratios are 0 for a given sample.

Parameters:
  • adata (AnnData) – AnnData object containing transcript data

  • synteny_category (str) – Synteny category to filter for

  • sample (str or List[str], default="all") – Sample(s) to plot. If list, will plot each sample separately

  • multimapping_threshold (float, default=0.5) – Threshold for high multimapping ratio

  • ratio_type (str, default="both") – Type of ratio to plot: “unique”, “em”, or “both”

  • bins (int, default=30) – Number of bins for the histogram

  • figsize (tuple, default=(12, 6)) – Figure size (width, height) in inches

  • kde (bool, default=True) – Whether to show KDE curve on histogram

  • save_path (str, optional) – Path to save the plot. If None, plot is shown but not saved

Returns:

The figure object containing the plot(s)

Return type:

matplotlib.figure.Figure

polyase.plotting.plot_differential_isoform_usage(results_df, annotation_df, fdr_threshold=0.05, ratio_difference_threshold=0.2, width=1300, height=350, template='simple_white')

Visualize differential isoform usage results using transcript structure plots.

Filtering is performed at the gene level: if any isoform of a gene meets the significance criteria, all isoforms of that gene will be plotted.

Parameters:
  • results_df (pd.DataFrame) –

    DataFrame containing differential isoform usage results with columns:

    • gene_id: Gene identifier

    • transcript_id: Transcript identifier

    • functional_annotation: Functional annotation (optional)

    • sample_name: Sample name

    • {layer}_cpm: CPM normalized counts for the specified layer

    • isoform_ratio: Isoform usage ratio

    • condition: Experimental condition

    • FDR: False discovery rate (optional for filtering)

    • ratio_difference: Difference in ratios (optional for filtering)

  • annotation_df (polars.DataFrame) –

    Polars DataFrame containing GTF annotation with:

    • gene_id: Gene identifier

    • the standard GTF columns used by RNApysoforms

  • fdr_threshold (float, default=0.05) – FDR threshold for identifying significant genes (applied at gene level)

  • ratio_difference_threshold (float, default=0.2) – Minimum ratio difference threshold for identifying significant genes (applied at gene level)

  • width (int, default=1300) – Plot width in pixels

  • height (int, default=350) – Plot height in pixels

  • template (str, default="simple_white") – Plotly template for styling

Returns:

List of generated plotly figures

Return type:

list

Raises:
  • ImportError – If required packages (RNApysoforms) are not available

  • ValueError – If required columns are missing from input DataFrames

polyase.plotting.plot_top_differential_isoforms(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=(0, 1), sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.2, sig_color='red')

Plot the top n genes with differential isoform usage in a grid layout (3 plots per row). Genes with significant differences have their titles highlighted in sig_color (red by default).

Parameters:
  • results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios function

  • n (int, optional) – Number of top genes to plot (default: 5)

  • figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))

  • palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)

  • jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)

  • alpha (float, optional) – Transparency of points (default: 0.7)

  • ylim (tuple, optional) – Y-axis limits (default: (0, 1))

  • sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)

  • output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)

  • sig_threshold (float, optional) – Significance threshold for p-value or FDR (default: 0.05)

  • ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.2)

  • sig_color (str, optional) – Color for titles of genes with significant differences (default: ‘red’)

Returns:

fig – The generated figure

Return type:

matplotlib.figure.Figure

polyase.plotting.plot_top_differential_syntelogs(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=None, sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.1, sig_color='red', plot_type='ratios')

Plot the top n syntelogs with differential allelic ratios or CPM values in a grid layout (6 plots per row). Syntelogs with significant differences have their titles highlighted in sig_color (red by default).

Parameters:
  • results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios function

  • n (int, optional) – Number of top syntelogs to plot (default: 5)

  • figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))

  • palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)

  • jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)

  • alpha (float, optional) – Transparency of points (default: 0.7)

  • ylim (tuple, optional) – Y-axis limits (default: None, auto-determined based on plot_type)

  • sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)

  • output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)

  • sig_threshold (float, optional) – Significance threshold for p-value or FDR (default: 0.05)

  • ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.1)

  • sig_color (str, optional) – Color for titles of syntelogs with significant differences (default: ‘red’)

  • plot_type (str, optional) – What to plot: ‘ratios’ for allelic ratios or ‘cpm’ for CPM values (default: ‘ratios’)

Returns:

fig – The generated figure

Return type:

matplotlib.figure.Figure

polyase.plotting.plot_top_differential_syntelogs_annotated(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=None, sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.1, sig_color='red', plot_type='ratios', log2fc_threshold=1.0)

Plot the top n syntelogs with significance brackets between alleles based on test results.

Handles two input formats:

  • Per-allele format (from test_allelic_ratios_within_conditions): one row per allele, columns ‘allele’, ‘gene_id’, ‘ratios_rep_{condition}’. Brackets are drawn from allele 0 to each significantly differential allele using the per-allele FDR.

  • Pairwise format (from test_allelic_ratios_pairwise): one row per allele pair, columns ‘allele_i’, ‘allele_j’, ‘gene_id_i’, ‘gene_id_j’, ‘ratios_rep_allele_{hap}’. Brackets are drawn between each significantly different allele pair using the pair FDR.

Parameters:
  • results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios_within_conditions or test_allelic_ratios_pairwise

  • n (int, optional) – Number of top syntelogs to plot (default: 5)

  • figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))

  • palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)

  • jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)

  • alpha (float, optional) – Transparency of points (default: 0.7)

  • ylim (tuple, optional) – Y-axis limits (default: None, auto-determined based on plot_type)

  • sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)

  • output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)

  • sig_threshold (float, optional) – Significance threshold for FDR or p_value (default: 0.05)

  • ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.1)

  • sig_color (str, optional) – Color for titles of syntelogs with significant differences (default: ‘red’)

  • plot_type (str, optional) – What to plot: ‘ratios’ for allelic ratios or ‘cpm’ for CPM values (default: ‘ratios’)

  • log2fc_threshold (float or None, optional) – For pairwise between-conditions results: annotate brackets with the log2FC difference when abs(log2FC_difference) exceeds this value (default: 1.0). Set to None to disable.

Returns:

fig – The generated figure

Return type:

matplotlib.figure.Figure

polyase.stats module

stats.py

The stats module of the polyase project.

polyase.stats.get_top_differential_syntelogs(results_df, n=5, sort_by='p_value', fdr_threshold=0.05, ratio_threshold=0.1)

Get the top n syntelogs with differential allelic ratios.

Parameters:
  • results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios function

  • n (int, optional) – Number of top syntelogs to return (default: 5)

  • sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)

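  • fdr_threshold (float, optional) – Maximum FDR to consider a result significant (default: 0.05)

  • ratio_threshold (float, optional) – Minimum ratio difference to consider a result significant (default: 0.1)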

Returns:

Filtered dataframe containing only the top n syntelogs

Return type:

pd.DataFrame
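The ranking logic can be sketched as: keep rows that pass both thresholds, sort them (ascending for ‘p_value’ or ‘FDR’, descending absolute value for ‘ratio_difference’), then take the first n distinct Synt_ids. The exact filtering and tie-breaking in polyase are assumptions here, and `top_syntelogs` is hypothetical; it returns IDs rather than a filtered DataFrame.

```python
from typing import Dict, List, Sequence


def top_syntelogs(rows: Sequence[Dict], n: int = 5, sort_by: str = "p_value",
                  fdr_threshold: float = 0.05, ratio_threshold: float = 0.1) -> List:
    """Hypothetical sketch: rank significant rows, return the first n distinct Synt_ids."""
    significant = [r for r in rows
                   if r["FDR"] <= fdr_threshold
                   and abs(r["ratio_difference"]) >= ratio_threshold]
    if sort_by == "ratio_difference":
        # Larger absolute differences rank first.
        ranked = sorted(significant, key=lambda r: -abs(r["ratio_difference"]))
    else:
        # 'p_value' or 'FDR': smaller values rank first.
        ranked = sorted(significant, key=lambda r: r[sort_by])
    # dict.fromkeys keeps the first occurrence of each Synt_id, in rank order.
    return list(dict.fromkeys(r["Synt_id"] for r in ranked))[:n]
```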

polyase.stats.test_allelic_ratios_between_conditions(adata, layer='unique_counts', group_key='condition', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)

Test if allelic ratios change between conditions and store results in AnnData object.

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • group_key (str, optional) – Variable column name containing condition information (default: “condition”)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)

Returns:

  • AnnData or None – If inplace=False, returns a modified copy of AnnData; otherwise returns None. Results are stored in:

    • adata.uns[‘allelic_ratio_test’]: Complete test results as DataFrame

    • adata.var[‘allelic_ratio_pval’]: P-values for each allele

    • adata.var[‘allelic_ratio_FDR’]: FDR-corrected p-values for each allele

  • pd.DataFrame – Results of statistical tests for each syntelog
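FDR-corrected p-values are most commonly produced with the Benjamini-Hochberg procedure. This reference does not state which correction polyase applies, so the sketch below is an assumption; `benjamini_hochberg` is a hypothetical helper.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    # as the rank decreases (and never exceed 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```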

polyase.stats.test_allelic_ratios_pairwise(adata, layer='unique_counts', test_condition='control', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)

Test all pairwise combinations of alleles within a syntelog for differential expression.

For each syntelog, every pair of alleles is tested against each other using a beta-binomial likelihood ratio test. This is in contrast to test_allelic_ratios_within_conditions, which tests each allele against a balanced (equal) expectation.

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • test_condition (str, optional) – Condition to subset samples for testing, or “all” to use all samples (default: “control”)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)

Returns:

If inplace=False, returns a modified copy of AnnData; otherwise returns the results DataFrame. Results are stored in:

  • adata.uns[‘allelic_ratio_pairwise_test’]: Complete pairwise test results as DataFrame

  • adata.var[‘allelic_ratio_pairwise_min_pval’]: Minimum p-value across all pairs per allele

  • adata.var[‘allelic_ratio_pairwise_min_FDR’]: Corresponding FDR-corrected value

Return type:

AnnData or pd.DataFrame
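For a syntelog with k alleles there are k(k-1)/2 pairs, so a syntelog with four alleles yields six tests. The per-allele summary in adata.var can be sketched as taking, for each allele, the smallest p-value over all pairs it belongs to, together with that pair's FDR. The input layout and `per_allele_minimum` are hypothetical.

```python
from typing import Dict, Tuple


def per_allele_minimum(
        pair_results: Dict[Tuple[str, str], Tuple[float, float]]
) -> Dict[str, Tuple[float, float]]:
    """pair_results maps (allele_i, allele_j) -> (p_value, FDR).

    Returns allele -> (minimum p-value over its pairs, FDR of that same pair).
    """
    best: Dict[str, Tuple[float, float]] = {}
    for (a, b), (p, fdr) in pair_results.items():
        for allele in (a, b):
            if allele not in best or p < best[allele][0]:
                best[allele] = (p, fdr)
    return best
```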

polyase.stats.test_allelic_ratios_within_conditions(adata, layer='unique_counts', test_condition='control', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)

Test if alleles of a gene have unequal expression and store results in AnnData object.

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • test_condition (str, optional) – Condition to subset samples for testing (default: “control”)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)

Returns:

  • AnnData or None – If inplace=False, returns a modified copy of AnnData; otherwise returns None. Results are stored in:

    • adata.uns[‘allelic_ratio_test’]: Complete test results as DataFrame

    • adata.var[‘allelic_ratio_pval’]: P-values for each allele

    • adata.var[‘allelic_ratio_FDR’]: FDR-corrected p-values for each allele

  • pd.DataFrame – Results of statistical tests for each syntelog

polyase.stats.test_differential_isoform_structure(adata, layer='unique_counts', test_condition='control', min_similarity_for_matching=0.9, use_introns=True, exon_weight=0.6, intron_weight=0.4, inplace=True, verbose=False, return_plotting_data=True, ratio_diff_cutoff=0.2, FDR_cutoff=0.05)

Test for differential isoform usage (DIU) between alleles, with major/minor isoform fallback.

  • Includes isoforms in plotting data even if they have zero expression in reference haplotype

  • Matches zero-expressed reference isoforms with corresponding isoforms in other haplotypes

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • test_condition (str, optional) – Condition to test for differential isoform usage (default: “control”)

  • min_similarity_for_matching (float, optional) – Minimum similarity score for matching isoforms (default: 0.9)

  • use_introns (bool, optional) – Whether to include intron structures in similarity calculations (default: True)

  • exon_weight (float, optional) – Weight for exon similarity in overall similarity score (default: 0.6)

  • intron_weight (float, optional) – Weight for intron similarity in overall similarity score (default: 0.4)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • verbose (bool, optional) – Whether to print progress messages (default: False)

  • return_plotting_data (bool, optional) – Whether to return plotting data along with results (default: True)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.2)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

Returns:

If return_plotting_data=True, returns (results_df, plotting_df); if return_plotting_data=False, returns results_df only.

Return type:

tuple or pd.DataFrame
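The matching step controlled by min_similarity_for_matching can be sketched as a greedy best-match search over precomputed similarity scores (for example, scores from calculate_combined_structure_similarity). This is an assumption about the procedure: `match_isoforms` is hypothetical, and it lets one candidate match several reference isoforms, which the real function may prevent.

```python
from typing import Dict, Optional


def match_isoforms(similarity: Dict[str, Dict[str, float]],
                   min_similarity: float = 0.9) -> Dict[str, Optional[str]]:
    """Hypothetical sketch: pair each reference isoform with its most similar
    candidate in another haplotype, or None if no candidate is similar enough.

    similarity[ref][candidate] is a score between 0 and 1.
    """
    matches: Dict[str, Optional[str]] = {}
    for ref, candidates in similarity.items():
        best = max(candidates, key=candidates.get, default=None)
        if best is not None and candidates[best] >= min_similarity:
            matches[ref] = best
        else:
            matches[ref] = None
    return matches
```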

polyase.stats.test_isoform_DIU_between_conditions(adata, layer='unique_counts', group_key='condition', gene_id_key='gene_id', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.2)

Test if isoform usage ratios change between conditions and store results in AnnData object.

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • group_key (str, optional) – Variable column name containing condition information (default: “condition”)

  • gene_id_key (str, optional) – Variable column name containing gene ID information (default: “gene_id”)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.2)

Returns:

  • AnnData or None – If inplace=False, returns a modified copy of AnnData; otherwise returns None. Results are stored in:

    • adata.uns[‘isoform_usage_test’]: Complete test results as DataFrame

    • adata.var[‘isoform_usage_pval’]: P-values for each isoform

    • adata.var[‘isoform_usage_FDR’]: FDR-corrected p-values for each isoform

  • pd.DataFrame – Results of statistical tests for each gene

  • pd.DataFrame – Plotting results table with one row per replicate, condition, isoform ratio, and transcript

polyase.stats.test_pairwise_allele_response_between_conditions(adata, layer='unique_counts', group_key='condition', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)

Test if pairs of alleles within a syntelog respond differently between conditions.

For each pair of alleles (i, j) within a syntelog, tests whether the ratio count_i / (count_i + count_j) changes between conditions using a beta-binomial likelihood ratio test. A significant result means the two alleles have divergent fold-changes between conditions (e.g., allele1 up-regulated, allele2 down-regulated).

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • group_key (str, optional) – Variable column name containing condition information (default: “condition”)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)

Returns:

  • AnnData or None – If inplace=False, returns a modified copy of AnnData; otherwise returns None. Results are stored in:

    • adata.uns[‘pairwise_allele_response_test’]: Complete test results as DataFrame

    • adata.var[‘pairwise_response_min_pval’]: Min p-value across all pairs per allele

    • adata.var[‘pairwise_response_min_fdr’]: Min FDR across all pairs per allele

  • pd.DataFrame – Results of pairwise statistical tests
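The tested quantity, count_i / (count_i + count_j) for each allele pair, can be computed as below. This sketch only measures the shift in the mean pair ratio between two conditions; it does not fit the beta-binomial model or produce p-values. `pairwise_ratio_shift` and its input layout are hypothetical, and samples where both alleles have zero counts are skipped by assumption.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def pairwise_ratio_shift(counts: Dict[str, Sequence[float]],
                         conditions: Sequence[str],
                         cond_a: str, cond_b: str) -> Dict[Tuple[str, str], float]:
    """For each allele pair (i, j): mean of count_i / (count_i + count_j) in cond_b
    minus the mean in cond_a. counts maps allele -> per-sample counts."""
    shifts: Dict[Tuple[str, str], float] = {}
    for i, j in combinations(counts, 2):
        by_cond: Dict[str, List[float]] = {cond_a: [], cond_b: []}
        for ci, cj, cond in zip(counts[i], counts[j], conditions):
            if cond in by_cond and ci + cj > 0:
                by_cond[cond].append(ci / (ci + cj))
        a_vals, b_vals = by_cond[cond_a], by_cond[cond_b]
        # NaN when a condition has no informative samples for this pair.
        shifts[(i, j)] = mean(b_vals) - mean(a_vals) if a_vals and b_vals else float("nan")
    return shifts
```

A positive shift means allele i gains share relative to allele j in cond_b, which corresponds to the divergent fold-changes described above.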

polyase.structure module

Module for adding exon and intron structure information to AnnData objects.

polyase.structure.add_exon_structure(adata: AnnData, gtf_file: str | None = None, gtf_df: DataFrame | None = None, transcript_id_col: str = 'transcript_id', include_introns: bool = True, inplace: bool = True, verbose: bool = True) → AnnData | None

Add exon and intron structure information to AnnData.var from GTF/GFF data.

Parameters:
  • adata (AnnData) – AnnData object containing transcript data

  • gtf_file (str, optional) – Path to GTF/GFF file. Either gtf_file or gtf_df must be provided.

  • gtf_df (pd.DataFrame, optional) – DataFrame with GTF/GFF data. Either gtf_file or gtf_df must be provided.

  • transcript_id_col (str, default='transcript_id') – Column name in GTF data containing transcript identifiers

  • include_introns (bool, default=True) – Whether to calculate and include intron structure information

  • inplace (bool, default=True) – If True, modify the AnnData object in place. If False, return a copy.

  • verbose (bool, default=True) – Whether to print progress information

Returns:

If inplace=False, returns modified copy of AnnData object. If inplace=True, returns None and modifies the input object.

Return type:

AnnData or None

Raises:

ValueError – If neither gtf_file nor gtf_df is provided, or if required columns are missing
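Assuming structures are stored as lists of exon and intron lengths (as the parameter descriptions of calculate_structure_similarity suggest), they can be derived from a transcript's exon coordinates. GTF coordinates are 1-based and inclusive, so an exon spans end - start + 1 bases and the intron between two exons spans next_start - prev_end - 1 bases. `exon_intron_lengths` is hypothetical; whether polyase reverses minus-strand structures is not stated here.

```python
from typing import List, Sequence, Tuple


def exon_intron_lengths(exons: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Exon and intron lengths for one transcript from GTF (start, end) pairs.

    Exons are sorted by genomic start, so lists follow genomic order
    regardless of strand.
    """
    ordered = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in ordered]
    intron_lengths = [next_start - prev_end - 1
                      for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]
    return exon_lengths, intron_lengths
```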

polyase.structure.add_structure_from_gtf(adata: AnnData, gtf_file: str, include_introns: bool = True, inplace: bool = True, verbose: bool = True) → AnnData | None

Convenience function to add exon and intron structure from GTF file.

Parameters:
  • adata (AnnData) – AnnData object containing transcript data

  • gtf_file (str) – Path to GTF/GFF file

  • include_introns (bool, default=True) – Whether to calculate and include intron structures

  • inplace (bool, default=True) – If True, modify the AnnData object in place

  • verbose (bool, default=True) – Whether to print progress information

Returns:

Modified AnnData object if inplace=False, otherwise None

Return type:

AnnData or None

polyase.structure.calculate_combined_structure_similarity(exon_structure1: List[int], exon_structure2: List[int], intron_structure1: List[int], intron_structure2: List[int], exon_weight: float = 0.6, intron_weight: float = 0.4) → float

Calculate combined similarity using both exon and intron structures.

Parameters:
  • exon_structure1 (List[int]) – First transcript’s exon structure

  • exon_structure2 (List[int]) – Second transcript’s exon structure

  • intron_structure1 (List[int]) – First transcript’s intron structure

  • intron_structure2 (List[int]) – Second transcript’s intron structure

  • exon_weight (float, default=0.6) – Weight for exon similarity (must sum with intron_weight to 1.0)

  • intron_weight (float, default=0.4) – Weight for intron similarity (must sum with exon_weight to 1.0)

Returns:

Combined similarity score between 0 and 1

Return type:

float
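To make the weighting concrete, here is a minimal sketch that assumes the combined score is the weighted sum exon_weight × (exon similarity) + intron_weight × (intron similarity), with each term a Jaccard index as described for calculate_structure_similarity. The helper names _jaccard and combined_similarity_sketch are hypothetical, not polyase functions, and the library's handling of edge cases may differ.

```python
import math
from typing import List


def _jaccard(a: List[int], b: List[int]) -> float:
    # Hypothetical helper: Jaccard index of two structures treated as sets of lengths.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # assumption: two empty structures count as identical
    return len(sa & sb) / len(sa | sb)


def combined_similarity_sketch(
    exon1: List[int], exon2: List[int],
    intron1: List[int], intron2: List[int],
    exon_weight: float = 0.6, intron_weight: float = 0.4,
) -> float:
    # Documented constraint: the two weights must sum to 1.0.
    if not math.isclose(exon_weight + intron_weight, 1.0):
        raise ValueError("exon_weight and intron_weight must sum to 1.0")
    return exon_weight * _jaccard(exon1, exon2) + intron_weight * _jaccard(intron1, intron2)


# Identical exons (1.0) and half-shared introns (2 of 4 lengths, 0.5):
# 0.6 * 1.0 + 0.4 * 0.5, i.e. about 0.8
score = combined_similarity_sketch([120, 300], [120, 300], [900, 450, 200], [900, 450, 610])
print(round(score, 6))
```

Because the weights sum to 1, the combined score stays in [0, 1] whenever both inputs do.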

polyase.structure.calculate_structure_similarity(structure1: List[int], structure2: List[int], mode: str = 'exon') → float

Calculate similarity between two transcript structures.

Parameters:
  • structure1 (List[int]) – First structure as list of exon/intron lengths

  • structure2 (List[int]) – Second structure as list of exon/intron lengths

  • mode (str, default='exon') – Type of structure (‘exon’ or ‘intron’)

Returns:

Similarity score between 0 and 1 (Jaccard index)

Return type:

float
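A minimal sketch of a Jaccard index over two length lists, assuming each structure is compared as a set of exact lengths (order and duplicate lengths are ignored, and two empty structures count as identical). It illustrates the metric only; structure_similarity_sketch is a hypothetical name, not polyase's implementation, and the mode parameter is not modeled.

```python
from typing import List


def structure_similarity_sketch(structure1: List[int], structure2: List[int]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two structures given as lists of lengths."""
    set1, set2 = set(structure1), set(structure2)
    if not set1 and not set2:
        return 1.0  # assumption: e.g. two single-exon transcripts have no introns to compare
    return len(set1 & set2) / len(set1 | set2)


# Two of four distinct exon lengths are shared: 2 / 4 = 0.5
print(structure_similarity_sketch([100, 250, 80], [100, 250, 95]))
```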

Module contents

PolyASE: A package for analyzing allele-specific expression in polyploid plants.

This package provides tools for calculating and analyzing allelic ratios, visualizing allele-specific expression patterns, and statistical testing of allelic imbalance in polyploid plant genomes.

class polyase.AlleleRatioCalculator(adata=None)

Bases: object

Class for calculating and managing allelic ratios in AnnData objects.

calculate_multiple_ratios(counts_layers=None)

Calculate allelic ratios for multiple count layers.

Parameters:

counts_layers : list of str, optional

List of layer names to calculate ratios for. If None, calculates for all layers with ‘counts’ in their name.

Returns:

adata : AnnData

Updated AnnData object with allelic ratio layers added

calculate_ratios(counts_layer='unique_counts', output_suffix=None)

Calculate allelic ratios for each transcript grouped by Synt_id, computing ratios independently for each column (gene/feature).

Parameters:

counts_layer : str, optional (default: ‘unique_counts’)

Layer containing counts to use for ratio calculations

output_suffix : str, optional

Custom suffix for the output layer name. If None, the counts_layer name is used

Returns:

adata : AnnData

Updated AnnData object with allelic ratio layer added
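The grouping behind calculate_ratios can be illustrated with a small, dependency-free sketch. It assumes the AnnData layout (samples as rows, transcripts as columns) and defines each transcript's allelic ratio as its count divided by the total count of all transcripts sharing its Synt_id in the same sample. allelic_ratios_sketch is a hypothetical helper, and the zero-total behavior is an assumption (polyase may produce NaN instead).

```python
from collections import defaultdict
from typing import Dict, List


def allelic_ratios_sketch(counts: List[List[float]], synt_ids: List[str]) -> List[List[float]]:
    """Divide each transcript's count by its Synt_id group's total, per sample.

    counts: samples x transcripts (rows = samples, like an AnnData layer).
    synt_ids: one Synt_id per transcript (like adata.var["Synt_id"]).
    """
    ratios = []
    for row in counts:
        # Total counts of each Synt_id group in this sample.
        group_totals: Dict[str, float] = defaultdict(float)
        for value, sid in zip(row, synt_ids):
            group_totals[sid] += value
        ratios.append([
            value / group_totals[sid] if group_totals[sid] > 0 else 0.0  # assumption for empty groups
            for value, sid in zip(row, synt_ids)
        ])
    return ratios


# One sample: syntelog "s1" has four alleles, syntelog "s2" has two
print(allelic_ratios_sketch([[10, 30, 0, 60, 5, 5]], ["s1", "s1", "s1", "s1", "s2", "s2"]))
# [[0.1, 0.3, 0.0, 0.6, 0.5, 0.5]]
```

Within each Synt_id and sample the ratios sum to 1, which is what makes them comparable across syntelogs with different expression levels.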

get_ratios_for_synt_id(synt_id, ratio_layer='allelic_ratio_unique_counts')

Get allelic ratios for a specific Synt_id.

Parameters:

synt_id : int or str

The Synt_id to get ratios for

ratio_layer : str, optional (default: ‘allelic_ratio_unique_counts’)

Name of the layer containing the ratio data

Returns:

ratios : numpy array

Array of ratio values for the specified Synt_id

set_data(adata)

Set or update the AnnData object.

Parameters:

adata : AnnData

AnnData object containing transcript data

class polyase.MultimappingRatioCalculator(adata=None)

Bases: object

Class for calculating multimapping ratios in AnnData objects.

calculate_ratios(unique_layer='unique_counts', multi_layer='ambiguous_counts')

Calculate multimapping ratios for each transcript grouped by Synt_id.

Parameters:

unique_layer : str, optional (default: ‘unique_counts’)

Layer containing unique counts to use for ratio calculations

multi_layer : str, optional (default: ‘ambiguous_counts’)

Layer containing multimapping counts to use for ratio calculations

Returns:

adata : AnnData

Updated AnnData object with multimapping ratio layer added

get_ratios_for_synt_id(synt_id, multi_layer='multimapping_ratio')

Get multimapping ratios for a specific Synt_id.

Parameters:

synt_id : int or str

The Synt_id to get ratios for

multi_layer : str, optional (default: ‘multimapping_ratio’)

Name of the layer containing the multimapping ratio data

Returns:

ratios : numpy array

Array of multimapping ratio values for the specified Synt_id

set_data(adata)

Set or update the AnnData object.

Parameters:

adata : AnnData

AnnData object containing transcript data

polyase.add_exon_structure(adata: AnnData, gtf_file: str | None = None, gtf_df: DataFrame | None = None, transcript_id_col: str = 'transcript_id', include_introns: bool = True, inplace: bool = True, verbose: bool = True) → AnnData | None

Add exon and intron structure information to AnnData.var from GTF/GFF data.

Parameters:
  • adata (AnnData) – AnnData object containing transcript data

  • gtf_file (str, optional) – Path to GTF/GFF file. Either gtf_file or gtf_df must be provided.

  • gtf_df (pd.DataFrame, optional) – DataFrame with GTF/GFF data. Either gtf_file or gtf_df must be provided.

  • transcript_id_col (str, default='transcript_id') – Column name in GTF data containing transcript identifiers

  • include_introns (bool, default=True) – Whether to calculate and include intron structure information

  • inplace (bool, default=True) – If True, modify the AnnData object in place. If False, return a copy.

  • verbose (bool, default=True) – Whether to print progress information

Returns:

If inplace=False, returns a modified copy of the AnnData object. If inplace=True, modifies the input object in place and returns None.

Return type:

AnnData or None

Raises:

ValueError – If neither gtf_file nor gtf_df is provided, or if required columns are missing

polyase.add_structure_from_gtf(adata: AnnData, gtf_file: str, include_introns: bool = True, inplace: bool = True, verbose: bool = True) → AnnData | None

Convenience function to add exon and intron structure from a GTF file.

Parameters:
  • adata (AnnData) – AnnData object containing transcript data

  • gtf_file (str) – Path to GTF/GFF file

  • include_introns (bool, default=True) – Whether to calculate and include intron structures

  • inplace (bool, default=True) – If True, modify the AnnData object in place

  • verbose (bool, default=True) – Whether to print progress information

Returns:

Modified AnnData object if inplace=False, otherwise None

Return type:

AnnData or None

polyase.aggregate_transcripts_to_genes(adata_tx)

Aggregate transcript-level AnnData to gene-level AnnData.

Parameters:

adata_tx (anndata.AnnData) – Transcript-level AnnData object with ‘gene_id’ in adata_tx.var. Must contain layers: ‘unique_counts’, ‘ambiguous_counts’, ‘em_counts’, ‘unique_cpm’,

Returns:

Gene-level AnnData object with aggregated counts and CPM layers.

Return type:

anndata.AnnData
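Gene-level aggregation of a count layer amounts to summing the transcripts that share a gene_id, sample by sample. The sketch below assumes that is what aggregate_transcripts_to_genes does for each count layer; aggregate_counts_sketch and its dict-based inputs are hypothetical stand-ins for the AnnData layers and adata_tx.var.

```python
from typing import Dict, List


def aggregate_counts_sketch(
    tx_counts: Dict[str, List[float]], tx_to_gene: Dict[str, str]
) -> Dict[str, List[float]]:
    """Sum per-sample transcript counts into per-gene counts.

    tx_counts: transcript_id -> counts per sample (one count layer).
    tx_to_gene: transcript_id -> gene_id, mirroring adata_tx.var["gene_id"].
    """
    gene_counts: Dict[str, List[float]] = {}
    for tx_id, counts in tx_counts.items():
        gene = tx_to_gene[tx_id]
        current = gene_counts.get(gene, [0.0] * len(counts))
        # Element-wise sum across samples.
        gene_counts[gene] = [g + c for g, c in zip(current, counts)]
    return gene_counts


print(aggregate_counts_sketch(
    {"tx1": [5, 0], "tx2": [3, 4], "tx3": [1, 1]},
    {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"},
))
# {'geneA': [8.0, 4.0], 'geneB': [1.0, 1.0]}
```

Because CPM is a per-sample linear rescaling, summing transcript CPMs gives the same values as computing CPM from the summed counts, provided both use the same library size.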

polyase.calculate_allelic_ratios(adata, counts_layer='unique_counts')

Calculate allelic ratios for each transcript grouped by Synt_id.

Parameters:

adata : AnnData

AnnData object containing transcript data

counts_layer : str, optional (default: ‘unique_counts’)

Layer containing counts to use for ratio calculations

Returns:

adata : AnnData

Updated AnnData object with ‘allelic_ratio’ layer added

polyase.calculate_multi_ratios(adata, unique_layer='unique_counts', multi_layer='ambiguous_counts')

Calculate multimapping ratios for each transcript grouped by Synt_id.

Parameters:

adata : AnnData

AnnData object containing transcript data

unique_layer : str, optional (default: ‘unique_counts’)

Layer containing unique counts to use for ratio calculations

multi_layer : str, optional (default: ‘ambiguous_counts’)

Layer containing multimapping counts to use for ratio calculations

Returns:

adata : AnnData

Updated AnnData object with ‘multimapping_ratio’ layer added

polyase.calculate_per_allele_ratios(adata, unique_layer='unique_counts', multi_layer='ambiguous_counts', gene_grouping_column='gene_id', inplace=True, count_scaling=True, min_counts_threshold=10, scaling_method='weighted_average')

Calculate multimapping ratios for each individual allele/transcript. For transcripts from the same gene, assigns ratios based on count-weighted calculations.

Parameters:

adata : AnnData

AnnData object containing transcript data

unique_layer : str, optional (default: ‘unique_counts’)

Layer containing unique counts to use for ratio calculations

multi_layer : str, optional (default: ‘ambiguous_counts’)

Layer containing multimapping counts to use for ratio calculations

gene_grouping_column : str, optional (default: ‘gene_id’)

Column in adata.var to group transcripts by (e.g., ‘Synt_id’, ‘gene_id’)

inplace : bool, optional (default: True)

Whether to modify the input AnnData object or return a copy

count_scaling : bool, optional (default: True)

Whether to scale multimapping ratios by count abundance

min_counts_threshold : int, optional (default: 10)

Minimum total counts threshold; transcripts below this get reduced weight

scaling_method : str, optional (default: ‘weighted_average’)

Method for combining ratios within genes:

  • ‘weighted_average’: Weight by total counts

  • ‘max_weighted’: Take max ratio but weight by counts

  • ‘abundance_filtered’: Only consider transcripts above threshold

Returns:

adata : AnnData

Updated AnnData object with per-allele multimapping ratio added to var
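The two fully described scaling_method options can be sketched as a count-weighted mean over one gene's transcripts. This assumes the per-transcript ratios and total counts are already computed; how polyase derives each per-transcript ratio, the ‘max_weighted’ option, and the reduced weight for low-count transcripts are not modeled. combine_gene_ratios_sketch is a hypothetical helper.

```python
from typing import List, Tuple


def combine_gene_ratios_sketch(
    transcripts: List[Tuple[float, float]],
    method: str = "weighted_average",
    min_counts_threshold: float = 10,
) -> float:
    """Combine per-transcript ratios of one gene into a count-weighted mean.

    transcripts: (ratio, total_counts) pairs for the transcripts of one gene.
    Only 'weighted_average' and 'abundance_filtered' are sketched.
    """
    if method == "abundance_filtered":
        # Assumption: transcripts at or above the threshold are kept.
        transcripts = [(r, n) for r, n in transcripts if n >= min_counts_threshold]
    elif method != "weighted_average":
        raise ValueError(f"method not covered by this sketch: {method}")
    total = sum(n for _, n in transcripts)
    if total == 0:
        return 0.0  # assumption: a gene without usable counts gets ratio 0
    return sum(r * n for r, n in transcripts) / total


gene = [(0.2, 90), (0.8, 30), (1.0, 2)]  # (ratio, total counts) per transcript
print(round(combine_gene_ratios_sketch(gene), 3))  # 0.361
print(round(combine_gene_ratios_sketch(gene, "abundance_filtered"), 3))  # 0.35
```

Weighting by counts keeps a barely expressed transcript (here the one with 2 counts and ratio 1.0) from pulling the gene-level value far from what the well-covered transcripts show.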

polyase.filter_low_expressed_genes(adata: AnnData, min_expression: float | Dict[str, float] | Callable[[float], float] = 1.0, library_size_dependent: bool = False, lib_size_normalization: str | None = None, layer: str | None = None, group_col: str | Tuple[str, int] = 'Synt_id', group_source: str = 'var', mode: str = 'any', return_dropped: bool = False, copy: bool = True, filter_axis: Literal[0, 1] = 1, verbose: bool = True) → AnnData | Tuple[AnnData, List[str]]

Filter an AnnData object to remove groups with low expression across samples.

Parameters:
  • adata (AnnData) – AnnData object with group IDs and expression data

  • min_expression (float, dict, or callable, default=1.0) –

    Minimum expression threshold for groups:

    • float: same threshold applied to all samples/features

    • dict: {name: threshold} for sample/feature-specific thresholds

    • callable: function that takes library size and returns threshold (e.g., lambda lib_size: lib_size * 1e-6 for 0.0001% of lib size)

  • library_size_dependent (bool, default=False) – If True, scale thresholds by library size for each sample

  • lib_size_normalization (str or None, default=None) –

    How to normalize for library size:

    • ‘cpm’: Counts Per Million (divide by lib_size/1e6)

    • None: No normalization

  • layer (str or None, default=None) – Layer to use for expression values. If None, use .X

  • group_col (str or tuple, default='Synt_id') – Column name containing group IDs. For obsm/varm, should be a tuple (key, column_index)

  • group_source (str, default='var') – Location of the group column in AnnData (‘obs’, ‘var’, ‘obsm’, ‘varm’)

  • mode (str, default='any') –

    • ‘any’: Keep groups that pass threshold in any sample/feature

    • ‘all’: Keep groups that pass threshold in all samples/features

    • ‘mean’: Keep groups that pass threshold on average across samples/features

  • return_dropped (bool, default=False) – If True, also return list of dropped group IDs

  • copy (bool, default=True) – If True, return a copy of the filtered AnnData object. If False, filter the AnnData object in place

  • filter_axis (int, default=1) –

    • 1: Filter rows (obs) based on group expression across columns (var)

    • 0: Filter columns (var) based on group expression across rows (obs)

  • verbose (bool, default=True) – Whether to print additional information during filtering

Returns:

Filtered AnnData object, plus a list of dropped group IDs when return_dropped=True

Return type:

AnnData or tuple

Raises:

ValueError – If parameters are invalid or required data is missing
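The difference between the three mode values is easiest to see on a toy matrix. The sketch below assumes a group's expression in a sample is the sum over its features, that ‘cpm’ rescales each sample by lib_size/1e6 before summing, and that a value equal to the threshold passes. It covers only a scalar min_expression (no dict or callable thresholds, library_size_dependent, filter_axis, or obsm/varm sources); groups_passing_sketch is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def groups_passing_sketch(
    counts: List[List[float]],
    group_ids: List[str],
    min_expression: float = 1.0,
    mode: str = "any",
    cpm: bool = False,
) -> List[str]:
    """Return group IDs whose summed expression passes min_expression.

    counts: samples x features (rows = samples); group_ids: one ID per feature.
    """
    per_group: Dict[str, List[float]] = defaultdict(list)
    for row in counts:
        lib_size = sum(row)
        scale = 1e6 / lib_size if (cpm and lib_size > 0) else 1.0
        sums: Dict[str, float] = defaultdict(float)
        for value, gid in zip(row, group_ids):
            sums[gid] += value * scale
        for gid in dict.fromkeys(group_ids):  # unique IDs, first-seen order
            per_group[gid].append(sums[gid])
    passing = []
    for gid, values in per_group.items():
        if mode == "any":
            ok = any(v >= min_expression for v in values)
        elif mode == "all":
            ok = all(v >= min_expression for v in values)
        elif mode == "mean":
            ok = sum(values) / len(values) >= min_expression
        else:
            raise ValueError(f"unknown mode: {mode}")
        if ok:
            passing.append(gid)
    return passing


counts = [[5, 0, 0, 1], [0, 0, 3, 3]]  # 2 samples x 4 transcripts
synt_ids = ["g1", "g1", "g2", "g2"]
for mode in ("any", "all", "mean"):
    print(mode, groups_passing_sketch(counts, synt_ids, min_expression=3, mode=mode))
# any ['g1', 'g2']
# all []
# mean ['g2']
```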

polyase.get_top_differential_syntelogs(results_df, n=5, sort_by='p_value', fdr_threshold=0.05, ratio_threshold=0.1)

Get the top n syntelogs with differential allelic ratios.

Parameters:
  • results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios function

  • n (int, optional) – Number of top syntelogs to return (default: 5)

  • sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)

  • fdr_threshold (float, optional) – Maximum FDR to consider a result significant (default: 0.05)

Returns:

Filtered dataframe containing only the top n syntelogs

Return type:

pd.DataFrame
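A minimal sketch of the selection this function describes, written over plain records instead of a DataFrame. It assumes a row is significant when FDR ≤ fdr_threshold, that ‘ratio_difference’ ranks by absolute value (largest first) while the p-value columns rank smallest first, and it does not model ratio_threshold. top_syntelogs_sketch and the example records are hypothetical.

```python
from typing import Any, Dict, List


def top_syntelogs_sketch(
    rows: List[Dict[str, Any]],
    n: int = 5,
    sort_by: str = "p_value",
    fdr_threshold: float = 0.05,
) -> List[Dict[str, Any]]:
    """Return the significant rows belonging to the top n Synt_ids."""
    significant = [r for r in rows if r["FDR"] <= fdr_threshold]
    if sort_by == "ratio_difference":
        significant.sort(key=lambda r: abs(r["ratio_difference"]), reverse=True)
    else:
        significant.sort(key=lambda r: r[sort_by])
    top_ids: List[Any] = []
    for r in significant:
        if len(top_ids) >= n:
            break
        if r["Synt_id"] not in top_ids:
            top_ids.append(r["Synt_id"])
    # Keep every significant row (e.g. every allele) of the selected syntelogs.
    return [r for r in significant if r["Synt_id"] in top_ids]


results = [
    {"Synt_id": 1, "p_value": 0.001, "FDR": 0.01, "ratio_difference": 0.3},
    {"Synt_id": 2, "p_value": 0.02, "FDR": 0.04, "ratio_difference": -0.5},
    {"Synt_id": 3, "p_value": 0.0001, "FDR": 0.2, "ratio_difference": 0.6},
]
print([r["Synt_id"] for r in top_syntelogs_sketch(results, n=1)])  # [1]
print([r["Synt_id"] for r in top_syntelogs_sketch(results, n=1, sort_by="ratio_difference")])  # [2]
```

Syntelog 3 has the smallest p-value but is excluded because its FDR (0.2) exceeds the threshold, which is why filtering happens before ranking.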

polyase.load_ase_data(var_obs_file, isoform_counts_dir, tx_to_gene_file, sample_info=None, counts_file=None, fillna=0, calculate_cpm=True, quant_dir=None, n_jobs=4)

Load allele-specific expression data from long-read RNAseq at isoform level.

Parameters:
  • var_obs_file (str) – Path to the variant observations file.

  • isoform_counts_dir (str) – Directory containing the isoform counts files.

  • tx_to_gene_file (str) – Path to TSV file mapping transcript_id to gene_id.

  • sample_info (dict, optional) – Dictionary mapping sample IDs to their conditions.

  • counts_file (str, optional) – Path to additional counts file (Salmon merged transcript counts).

  • fillna (int or float, optional) – Value to fill NA values with, by default 0.

  • calculate_cpm (bool, optional) – Whether to calculate CPM (Counts Per Million) from EM counts, by default True.

  • quant_dir (str, optional) – Directory containing quantification files with EM counts.

  • n_jobs (int, optional) – Number of parallel jobs for loading samples, by default 4.

Returns:

AnnData object containing the processed isoform-level data with EM counts and CPM layers. Includes all transcripts from expression matrix and tx2gene mapping, with NaN values for var_obs data when genes are not found in var_obs_file.

Return type:

anndata.AnnData
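CPM follows the definition given for filter_low_expressed_genes (divide by lib_size/1e6). The sketch below assumes the library size is the sum of a sample's EM counts, which is a guess about load_ase_data rather than a documented fact; cpm_sketch is a hypothetical helper.

```python
from typing import List


def cpm_sketch(counts: List[List[float]]) -> List[List[float]]:
    """Counts Per Million per sample: count * 1e6 / library size.

    counts: samples x transcripts; library size = sum of the sample's row.
    """
    result = []
    for row in counts:
        lib_size = sum(row)
        # A sample with no counts stays at zero instead of dividing by zero.
        result.append([c * 1e6 / lib_size if lib_size > 0 else 0.0 for c in row])
    return result


# Two samples with very different depths give the same relative profile.
print(cpm_sketch([[250, 750], [1, 3]]))
# [[250000.0, 750000.0], [250000.0, 750000.0]]
```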

polyase.plot_allele_specific_isoform_structure(results_df, annotation_df, ratio_difference_threshold=0.2, width=1300, height=350, template='simple_white')

Visualize allele-specific isoform structure differences.

Filtering is performed at the syntelog level: if any isoform of a syntelog meets the significance criteria, all isoforms of that syntelog will be plotted.

Parameters:
  • results_df (pd.DataFrame) –

    DataFrame containing allele-specific results with columns:

    • Synt_id: Syntelog identifier

    • isoform_id: Isoform/transcript identifier

    • gene_id: Gene identifier

    • haplotype: Haplotype identifier

    • reference_haplotype: Reference haplotype identifier

    • sample: Sample identifier

    • isoform_counts: Isoform counts (will be detected dynamically)

    • isoform_ratio: Isoform usage ratio

    • ratio_difference: Difference in ratios between haplotypes

  • annotation_df (polars.DataFrame) – Polars DataFrame containing GTF annotation

  • ratio_difference_threshold (float, default=0.2) – Minimum ratio difference threshold for identifying significant syntelogs (applied at syntelog level)

  • width (int, default=1300) – Plot width in pixels

  • height (int, default=350) – Plot height in pixels

  • template (str, default="simple_white") – Plotly template for styling

Returns:

List of generated plotly figures

Return type:

list

Raises:
  • ImportError – If required packages are not available

  • ValueError – If required columns are missing

polyase.plot_allelic_ratios(adata, synteny_category: str, sample: str | List[str] = 'all', multimapping_threshold: float = 0.5, ratio_type: str = 'both', bins: int = 30, figsize: Tuple[int, int] = (12, 6), kde: bool = True, save_path: str | None = None)

Plot allelic ratios for transcripts in a specific synteny category. Filters out Synt_ids where all allelic ratios are 0 for a given sample.

Parameters:
  • adata (AnnData) – AnnData object containing transcript data

  • synteny_category (str) – Synteny category to filter for

  • sample (str or List[str], default="all") – Sample(s) to plot. If list, will plot each sample separately

  • multimapping_threshold (float, default=0.5) – Threshold for high multimapping ratio

  • ratio_type (str, default="both") – Type of ratio to plot: “unique”, “em”, or “both”

  • bins (int, default=30) – Number of bins for the histogram

  • figsize (tuple, default=(12, 6)) – Figure size (width, height) in inches

  • kde (bool, default=True) – Whether to show KDE curve on histogram

  • save_path (str, optional) – Path to save the plot. If None, plot is shown but not saved

Returns:

The figure object containing the plot(s)

Return type:

matplotlib.figure.Figure

polyase.plot_differential_isoform_usage(results_df, annotation_df, fdr_threshold=0.05, ratio_difference_threshold=0.2, width=1300, height=350, template='simple_white')

Visualize differential isoform usage results using transcript structure plots.

Filtering is performed at the gene level: if any isoform of a gene meets the significance criteria, all isoforms of that gene will be plotted.

Parameters:
  • results_df (pd.DataFrame) –

    DataFrame containing differential isoform usage results with columns:

    • gene_id: Gene identifier

    • transcript_id: Transcript identifier

    • functional_annotation: Functional annotation (optional)

    • sample_name: Sample name

    • {layer}_cpm: CPM-normalized counts for the specified layer

    • isoform_ratio: Isoform usage ratio

    • condition: Experimental condition

    • FDR: False discovery rate (optional for filtering)

    • ratio_difference: Difference in ratios (optional for filtering)

  • annotation_df (polars.DataFrame) –

    Polars DataFrame containing GTF annotation with:

    • gene_id: Gene identifier

    • plus the standard GTF columns expected by RNApysoforms

  • fdr_threshold (float, default=0.05) – FDR threshold for identifying significant genes (applied at gene level)

  • ratio_difference_threshold (float, default=0.2) – Minimum ratio difference threshold for identifying significant genes (applied at gene level)

  • width (int, default=1300) – Plot width in pixels

  • height (int, default=350) – Plot height in pixels

  • template (str, default="simple_white") – Plotly template for styling

Returns:

List of generated plotly figures

Return type:

list

Raises:
  • ImportError – If required packages (RNApysoforms) are not available

  • ValueError – If required columns are missing from input DataFrames
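The gene-level filtering rule (if any isoform of a gene is significant, plot all of its isoforms) can be sketched in a few lines. The sketch assumes an isoform is significant when FDR < fdr_threshold and |ratio_difference| ≥ ratio_difference_threshold; the same pattern applies at the syntelog level for plot_allele_specific_isoform_structure by grouping on Synt_id instead. genes_to_plot_sketch is a hypothetical helper.

```python
from typing import Any, Dict, List


def genes_to_plot_sketch(
    rows: List[Dict[str, Any]],
    fdr_threshold: float = 0.05,
    ratio_difference_threshold: float = 0.2,
) -> List[Dict[str, Any]]:
    """Keep every isoform row of any gene that has at least one significant isoform."""
    significant_genes = {
        r["gene_id"]
        for r in rows
        if r["FDR"] < fdr_threshold and abs(r["ratio_difference"]) >= ratio_difference_threshold
    }
    return [r for r in rows if r["gene_id"] in significant_genes]


rows = [
    {"gene_id": "G1", "transcript_id": "t1", "FDR": 0.01, "ratio_difference": 0.3},   # significant
    {"gene_id": "G1", "transcript_id": "t2", "FDR": 0.6, "ratio_difference": -0.3},   # kept via G1
    {"gene_id": "G2", "transcript_id": "t3", "FDR": 0.01, "ratio_difference": 0.05},  # effect too small
]
print([r["transcript_id"] for r in genes_to_plot_sketch(rows)])  # ['t1', 't2']
```

Keeping the non-significant isoforms of a selected gene matters for interpretation: isoform ratios within a gene sum to 1, so a shift in one isoform is only readable next to the isoforms it trades off against.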

polyase.plot_top_differential_isoforms(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=(0, 1), sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.2, sig_color='red')

Plot the top n genes with differential isoform usage in a grid layout (3 plots per row). Genes with significant differences have their titles highlighted in sig_color (red by default).

Parameters:
  • results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios function

  • n (int, optional) – Number of top genes to plot (default: 5)

  • figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))

  • palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)

  • jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)

  • alpha (float, optional) – Transparency of points (default: 0.7)

  • ylim (tuple, optional) – Y-axis limits (default: (0, 1))

  • sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)

  • output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)

  • sig_threshold (float, optional) – Significance threshold for p-value or FDR (default: 0.05)

  • ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.2)

  • sig_color (str, optional) – Color for titles of genes with significant differences (default: ‘red’)

Returns:

fig – The generated figure

Return type:

matplotlib.figure.Figure

polyase.plot_top_differential_syntelogs(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=None, sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.1, sig_color='red', plot_type='ratios')

Plot the top n syntelogs with differential allelic ratios or CPM values in a grid layout (6 plots per row). Syntelogs with significant differences have their titles highlighted in sig_color (red by default).

Parameters:
  • results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios function

  • n (int, optional) – Number of top syntelogs to plot (default: 5)

  • figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))

  • palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)

  • jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)

  • alpha (float, optional) – Transparency of points (default: 0.7)

  • ylim (tuple, optional) – Y-axis limits (default: None, auto-determined based on plot_type)

  • sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)

  • output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)

  • sig_threshold (float, optional) – Significance threshold for p-value or FDR (default: 0.05)

  • ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.1)

  • sig_color (str, optional) – Color for titles of syntelogs with significant differences (default: ‘red’)

  • plot_type (str, optional) – What to plot: ‘ratios’ for allelic ratios or ‘cpm’ for CPM values (default: ‘ratios’)

Returns:

fig – The generated figure

Return type:

matplotlib.figure.Figure

polyase.plot_top_differential_syntelogs_annotated(results_df, n=5, figsize=(16, 12), palette=None, jitter=0.2, alpha=0.7, ylim=None, sort_by='p_value', output_file=None, sig_threshold=0.05, ratio_difference_threshold=0.1, sig_color='red', plot_type='ratios', log2fc_threshold=1.0)

Plot the top n syntelogs with significance brackets between alleles based on test results.

Handles two input formats:

  • Per-allele format (from test_allelic_ratios_within_conditions): one row per allele, with columns ‘allele’, ‘gene_id’, and ‘ratios_rep_{condition}’. Brackets are drawn from allele 0 to each significantly differential allele using the per-allele FDR.

  • Pairwise format (from test_allelic_ratios_pairwise): one row per allele pair, with columns ‘allele_i’, ‘allele_j’, ‘gene_id_i’, ‘gene_id_j’, and ‘ratios_rep_allele_{hap}’. Brackets are drawn between each significantly different allele pair using the pair FDR.

Parameters:
  • results_df (pd.DataFrame) – Results dataframe from test_allelic_ratios_within_conditions or test_allelic_ratios_pairwise

  • n (int, optional) – Number of top syntelogs to plot (default: 5)

  • figsize (tuple, optional) – Figure size as (width, height) in inches (default: (16, 12))

  • palette (dict or None, optional) – Color palette for conditions (default: None, uses seaborn defaults)

  • jitter (float, optional) – Amount of jitter for strip plot (default: 0.2)

  • alpha (float, optional) – Transparency of points (default: 0.7)

  • ylim (tuple, optional) – Y-axis limits (default: None, auto-determined based on plot_type)

  • sort_by (str, optional) – Column to sort results by (‘p_value’, ‘FDR’, or ‘ratio_difference’) (default: ‘p_value’)

  • output_file (str, optional) – Path to save the figure (default: None, displays figure but doesn’t save)

  • sig_threshold (float, optional) – Significance threshold for FDR or p_value (default: 0.05)

  • ratio_difference_threshold (float, optional) – Ratio difference threshold for significance (default: 0.1)

  • sig_color (str, optional) – Color for titles of syntelogs with significant differences (default: ‘red’)

  • plot_type (str, optional) – What to plot: ‘ratios’ for allelic ratios or ‘cpm’ for CPM values (default: ‘ratios’)

  • log2fc_threshold (float or None, optional) – For pairwise between-conditions results: annotate brackets with the log2FC difference when abs(log2FC_difference) exceeds this value (default: 1.0). Set to None to disable.

Returns:

fig – The generated figure

Return type:

matplotlib.figure.Figure

polyase.test_allelic_ratios_between_conditions(adata, layer='unique_counts', group_key='condition', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)

Test if allelic ratios change between conditions and store results in AnnData object.

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • group_key (str, optional) – Variable column name containing condition information (default: “condition”)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)

Returns:

  • AnnData or None – If inplace=False, returns a modified copy of AnnData; otherwise returns None. Results are stored in:

    • adata.uns[‘allelic_ratio_test’]: Complete test results as DataFrame

    • adata.var[‘allelic_ratio_pval’]: P-values for each allele

    • adata.var[‘allelic_ratio_FDR’]: FDR-corrected p-values for each allele

  • pd.DataFrame – Results of statistical tests for each syntelog
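The documentation does not name the multiple-testing correction behind the FDR columns; Benjamini-Hochberg is a common choice and is assumed in the sketch below. It also assumes the two cutoffs combine as FDR < FDR_cutoff and |ratio difference| ≥ ratio_diff_cutoff. Both helper names are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant(fdr: float, ratio_difference: float,
                   fdr_cutoff: float = 0.05, ratio_diff_cutoff: float = 0.1) -> bool:
    # Assumed rule: statistically significant AND a large enough effect.
    return fdr < fdr_cutoff and abs(ratio_difference) >= ratio_diff_cutoff


print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.2])])
# [0.04, 0.0533, 0.0533, 0.2]
```

Requiring a minimum ratio difference alongside the FDR keeps deeply sequenced syntelogs with tiny but statistically detectable shifts from dominating the hit list.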

polyase.test_allelic_ratios_pairwise(adata, layer='unique_counts', test_condition='control', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)

Test all pairwise combinations of alleles within a syntelog for differential expression.

For each syntelog, every pair of alleles is tested against each other using a beta-binomial likelihood ratio test. This is in contrast to test_allelic_ratios_within_conditions, which tests each allele against a balanced (equal) expectation.

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • test_condition (str, optional) – Condition to subset samples for testing, or “all” to use all samples (default: “control”)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)

Returns:

If inplace=False, returns a modified copy of AnnData; otherwise returns the results DataFrame. Results are stored in:

  • adata.uns[‘allelic_ratio_pairwise_test’]: Complete pairwise test results as DataFrame

  • adata.var[‘allelic_ratio_pairwise_min_pval’]: Minimum p-value across all pairs per allele

  • adata.var[‘allelic_ratio_pairwise_min_FDR’]: Corresponding FDR-corrected value

Return type:

AnnData or pd.DataFrame
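With k alleles, a syntelog yields k(k−1)/2 pairwise tests, and each allele appears in k−1 of them. The sketch below shows that count and one way the per-allele summary stored in adata.var[‘allelic_ratio_pairwise_min_pval’] could be derived from pair-level p-values. The p-values are placeholders (the beta-binomial test itself is not reproduced), and min_pval_per_allele and the hap names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def min_pval_per_allele(pair_pvals: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """For each allele, the smallest p-value among all pairs that include it."""
    best: Dict[str, float] = {}
    for (allele_i, allele_j), p in pair_pvals.items():
        for allele in (allele_i, allele_j):
            if allele not in best or p < best[allele]:
                best[allele] = p
    return best


alleles = ["hap1", "hap2", "hap3", "hap4"]
print(len(list(combinations(alleles, 2))))  # 6 = 4 * 3 / 2 pairs

# Placeholder p-values for three of the pairs
pvals = {("hap1", "hap2"): 0.01, ("hap1", "hap3"): 0.2, ("hap2", "hap3"): 0.5}
print(min_pval_per_allele(pvals))  # {'hap1': 0.01, 'hap2': 0.01, 'hap3': 0.2}
```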

polyase.test_allelic_ratios_within_conditions(adata, layer='unique_counts', test_condition='control', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)

Test if alleles of a gene have unequal expression and store results in AnnData object.

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • test_condition (str, optional) – Condition to subset samples for testing within (default: “control”)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)

Returns:

  • AnnData or None – If inplace=False, returns a modified copy of AnnData; otherwise returns None. Results are stored in:

    • adata.uns[‘allelic_ratio_test’]: Complete test results as DataFrame

    • adata.var[‘allelic_ratio_pval’]: P-values for each allele

    • adata.var[‘allelic_ratio_FDR’]: FDR-corrected p-values for each allele

  • pd.DataFrame – Results of statistical tests for each syntelog
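This test compares each allele against a balanced expectation, which for k alleles is a ratio of 1/k. The sketch computes only the effect-size side of that comparison (observed ratio minus 1/k, screened with a ratio_diff_cutoff of 0.1); the beta-binomial p-value is not reproduced, and deviation_from_balance is a hypothetical helper.

```python
from typing import List


def deviation_from_balance(allele_counts: List[float]) -> List[float]:
    """Observed allelic ratio minus the balanced expectation 1/k for k alleles."""
    k = len(allele_counts)
    total = sum(allele_counts)
    if total == 0:
        return [0.0] * k  # assumption: no counts means no measurable deviation
    return [c / total - 1 / k for c in allele_counts]


devs = deviation_from_balance([40, 20, 20, 20])  # one syntelog, four alleles, one sample
print([round(d, 3) for d in devs])  # [0.15, -0.05, -0.05, -0.05]
print([i for i, d in enumerate(devs) if abs(d) >= 0.1])  # [0]
```

Because the expectation depends on k, the same ratio can be balanced in a diploid syntelog (0.5 of 2 alleles) and strongly skewed in a tetraploid one.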

polyase.test_differential_isoform_structure(adata, layer='unique_counts', test_condition='control', min_similarity_for_matching=0.9, use_introns=True, exon_weight=0.6, intron_weight=0.4, inplace=True, verbose=False, return_plotting_data=True, ratio_diff_cutoff=0.2, FDR_cutoff=0.05)

Test for differential isoform usage (DIU) between alleles, with a major/minor isoform fallback. The function also:

  • Includes isoforms in the plotting data even if they have zero expression in the reference haplotype

  • Matches zero-expressed reference isoforms with the corresponding isoforms in other haplotypes

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • test_condition (str, optional) – Condition to test for differential isoform usage (default: “control”)

  • min_similarity_for_matching (float, optional) – Minimum similarity score for matching isoforms (default: 0.9)

  • use_introns (bool, optional) – Whether to include intron structures in similarity calculations (default: True)

  • exon_weight (float, optional) – Weight for exon similarity in overall similarity score (default: 0.6)

  • intron_weight (float, optional) – Weight for intron similarity in overall similarity score (default: 0.4)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • verbose (bool, optional) – Whether to print progress messages (default: False)

  • return_plotting_data (bool, optional) – Whether to return plotting data along with results (default: True)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.2)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

Returns:

If return_plotting_data=True, returns (results_df, plotting_df); if return_plotting_data=False, returns results_df only.

Return type:

tuple or pd.DataFrame
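Isoform matching across haplotypes depends on min_similarity_for_matching. The sketch below assumes a simple greedy rule: each reference isoform is paired with its most similar isoform in another haplotype, and the match is rejected if that best score is below the threshold. The scores could come from calculate_combined_structure_similarity with exon_weight and intron_weight; here they are supplied directly. match_isoforms and the isoform names are hypothetical, and polyase's actual strategy may differ.

```python
from typing import Dict, List, Optional, Tuple


def match_isoforms(
    ref_isoforms: List[str],
    other_isoforms: List[str],
    similarity: Dict[Tuple[str, str], float],
    min_similarity_for_matching: float = 0.9,
) -> Dict[str, Optional[str]]:
    """Match each reference isoform to its most similar isoform in another haplotype.

    Returns None for reference isoforms whose best score is below the threshold.
    Greedy: one other-haplotype isoform may be matched by several references.
    """
    matches: Dict[str, Optional[str]] = {}
    for ref in ref_isoforms:
        best, best_score = None, -1.0
        for other in other_isoforms:
            score = similarity.get((ref, other), 0.0)
            if score > best_score:
                best, best_score = other, score
        matches[ref] = best if best_score >= min_similarity_for_matching else None
    return matches


sim = {("r1", "o1"): 0.95, ("r1", "o2"): 0.4, ("r2", "o1"): 0.5, ("r2", "o2"): 0.7}
print(match_isoforms(["r1", "r2"], ["o1", "o2"], sim))  # {'r1': 'o1', 'r2': None}
```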

polyase.test_isoform_DIU_between_conditions(adata, layer='unique_counts', group_key='condition', gene_id_key='gene_id', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.2)

Test if isoform usage ratios change between conditions and store results in AnnData object.

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • group_key (str, optional) – Variable column name containing condition information (default: “condition”)

  • gene_id_key (str, optional) – Variable column name containing gene ID information (default: “gene_id”)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.2)

Returns:

  • AnnData or None – If inplace=False, returns a modified copy of AnnData; otherwise returns None. Results are stored in:

    • adata.uns[‘isoform_usage_test’]: Complete test results as DataFrame

    • adata.var[‘isoform_usage_pval’]: P-values for each isoform

    • adata.var[‘isoform_usage_FDR’]: FDR-corrected p-values for each isoform

  • pd.DataFrame – Results of statistical tests for each gene

  • pd.DataFrame – Plotting results table with one row per replicate, condition, isoform ratio, and transcript

polyase.test_pairwise_allele_response_between_conditions(adata, layer='unique_counts', group_key='condition', inplace=True, FDR_cutoff=0.05, ratio_diff_cutoff=0.1)

Test if pairs of alleles within a syntelog respond differently between conditions.

For each pair of alleles (i, j) within a syntelog, tests whether the ratio count_i / (count_i + count_j) changes between conditions using a beta-binomial likelihood ratio test. A significant result means the two alleles have divergent fold-changes between conditions (e.g., allele1 up-regulated, allele2 down-regulated).

Parameters:
  • adata (AnnData) – AnnData object containing expression data

  • layer (str, optional) – Layer containing count data (default: “unique_counts”)

  • group_key (str, optional) – Variable column name containing condition information (default: “condition”)

  • inplace (bool, optional) – Whether to modify the input AnnData object or return a copy (default: True)

  • FDR_cutoff (float, optional) – False discovery rate cutoff for significance (default: 0.05)

  • ratio_diff_cutoff (float, optional) – Minimum ratio difference for significance (default: 0.1)

Returns:

  • AnnData or None – If inplace=False, returns a modified copy of AnnData; otherwise returns None. Results are stored in:

    • adata.uns[‘pairwise_allele_response_test’]: Complete test results as DataFrame

    • adata.var[‘pairwise_response_min_pval’]: Min p-value across all pairs per allele

    • adata.var[‘pairwise_response_min_fdr’]: Min FDR across all pairs per allele

  • pd.DataFrame – Results of pairwise statistical tests
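The quantity tested here, count_i / (count_i + count_j), only moves between conditions when the two alleles change by different fold-changes. The sketch below shows this on pooled counts per condition. It computes the effect size only (no replicates, no beta-binomial test), and pair_ratio, pair_ratio_shift, and the condition names "control" and "stress" are hypothetical.

```python
from typing import Dict, Tuple


def pair_ratio(count_i: float, count_j: float) -> float:
    """Share of allele i within the pair: count_i / (count_i + count_j)."""
    total = count_i + count_j
    return count_i / total if total > 0 else 0.5  # assumption: no counts treated as balanced


def pair_ratio_shift(counts: Dict[str, Tuple[float, float]], cond_a: str, cond_b: str) -> float:
    """Change in the pair ratio from cond_a to cond_b, using pooled counts per condition."""
    return pair_ratio(*counts[cond_b]) - pair_ratio(*counts[cond_a])


# Divergent response: allele i doubles, allele j halves, so the ratio goes 0.5 -> 0.8
print(round(pair_ratio_shift({"control": (50, 50), "stress": (100, 25)}, "control", "stress"), 3))  # 0.3
# Parallel response: both alleles double, so the ratio is unchanged
print(pair_ratio_shift({"control": (50, 50), "stress": (100, 100)}, "control", "stress"))  # 0.0
```

This is why a non-significant result does not mean the alleles are unresponsive: two alleles that are both strongly induced by the same factor leave the pair ratio untouched.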