phylox.generators.heath.heath
This module contains the code to generate phylogenetic networks using the Heath model, as it is called in the paper “Comparing the topology of phylogenetic network generators” by Remie Janssen and Pengyu Liu (2021).
This Heath model is an extension of the tree-generating model by Heath et al.: “Taxon sampling affects inferences of macro-evolutionary processes from phylogenetic trees” by Heath TA, Zwickl DJ, Kim J, Hillis DM (2008).
The main difference is that hybridization and HGT events are allowed. The rates of these events are handled similarly to the speciation and extinction rates in the original model.
Functions
- phylox.generators.heath.heath.gamma_distribution_pdf(value, mean, shape)
A reparameterization of the Gamma distribution from scipy stats, with a mean parameter in place of the scale parameter (the function takes a mean and a shape).
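As an illustration of this kind of reparameterization, the sketch below evaluates a Gamma density from a mean and a shape using only the standard library. The name gamma_pdf_mean_shape is hypothetical, and the conversion scale = mean / shape is an assumption based on the fact that a Gamma distribution with a given shape and scale has mean shape * scale; it is not taken from the phylox source.

```python
import math


def gamma_pdf_mean_shape(value, mean, shape):
    """Gamma density at `value`, parameterized by mean and shape.

    Hypothetical helper for illustration; not the phylox implementation.
    """
    if value < 0:
        return 0.0
    # ASSUMPTION: mean = shape * scale, so scale = mean / shape.
    scale = mean / shape
    if value == 0:
        # The density at 0 is 1/scale for shape == 1, 0 for shape > 1,
        # and unbounded for shape < 1.
        if shape == 1:
            return 1.0 / scale
        return 0.0 if shape > 1 else math.inf
    # Work in log space to avoid overflow for large shapes.
    log_pdf = (
        (shape - 1) * math.log(value)
        - value / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.exp(log_pdf)
```

With shape 1 this reduces to an exponential density with the given mean, which is a convenient sanity check.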
- phylox.generators.heath.heath.update_rate(parent_rate, prior_mean, prior_shape, update_shape, seed=None)
Generates the new value for a rate used in the Heath generator based on:
the current value,
the multiplicative update factor np.random.gamma(<update_shape>, 1 / <update_shape>),
the prior Gamma distribution for the rate type, GammaDistributionPDF(rate_value, prior_mean, prior_shape).
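One way to combine these three ingredients is rejection sampling: propose a new rate by multiplying the current value by a Gamma factor with mean 1, then accept it with a probability proportional to the prior density. The sketch below is an assumption about how such a scheme could look, not the phylox code. The names sketch_update_rate and max_tries are hypothetical, the acceptance probability is normalized by the peak of the prior density, and the sketch only covers prior_shape >= 1, where that peak is finite.

```python
import math
import random


def gamma_pdf(value, mean, shape):
    """Gamma density with scale = mean / shape (illustrative helper)."""
    if value <= 0:
        return 0.0
    scale = mean / shape
    return math.exp(
        (shape - 1) * math.log(value)
        - value / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )


def sketch_update_rate(parent_rate, prior_mean, prior_shape, update_shape,
                       seed=None, max_tries=10_000):
    """Propose parent_rate * Gamma(mean 1) and accept by prior density."""
    if prior_shape < 1:
        raise ValueError("this sketch assumes prior_shape >= 1")
    rng = random.Random(seed)
    scale = prior_mean / prior_shape
    # Peak of the prior density, used to scale acceptance into [0, 1].
    if prior_shape == 1:
        peak = 1.0 / scale
    else:
        peak = gamma_pdf((prior_shape - 1) * scale, prior_mean, prior_shape)
    for _ in range(max_tries):
        # gammavariate(alpha, beta) has mean alpha * beta, here exactly 1.
        factor = rng.gammavariate(update_shape, 1.0 / update_shape)
        candidate = parent_rate * factor
        if rng.random() < gamma_pdf(candidate, prior_mean, prior_shape) / peak:
            return candidate
    raise RuntimeError("no candidate accepted within max_tries")
```

Because the factor has mean 1, proposals stay centered on the parent rate, while the acceptance step pulls accepted values toward regions where the prior has more mass.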
- phylox.generators.heath.heath.update_all_rates(parent_rates, update_shape, speciation_rate_mean, speciation_rate_shape, ext_used, extinction_rate_mean, extinction_rate_shape, hgt_used, hgt_rate_mean, hgt_rate_shape, seed=None)
Updates all rates for the Heath generator using the update_rate function.
- phylox.generators.heath.heath.graph_distance_to_hybridization_rate(distance, hybridization_left_bound, hybridization_right_bound, hybridization_left_rate, hybridization_right_rate)
Returns a hybridization rate for a given distance. The distance represents the evolutionary distance between two extant species. The function is used in the Heath generator to determine the hybridization rate between two current taxa, where the distance is a weighted sum of all up-down distances between the two taxa in the current network. The dependency on the distance takes the following shape, with:
hybridization_left_bound: l
hybridization_right_bound: r
hybridization_left_rate: rl
hybridization_right_rate: rr

     |
  rl +---
     |    \
     |     \
     |      \
  rr +       -----
     |
   0 +---+----+-----
     0   l    r

where the distance is on the x-axis, and the hybridization rate is on the y-axis.
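Read as a formula, the plot describes a piecewise-linear function: the rate equals rl up to distance l, falls linearly to rr at distance r, and stays at rr beyond that. The sketch below encodes that reading. The function name and the shortened parameter names are hypothetical, and the behavior exactly at and outside the bounds is inferred from the plot rather than from the phylox source.

```python
def sketch_hybridization_rate(distance, left_bound, right_bound,
                              left_rate, right_rate):
    """Piecewise-linear rate as read from the plot (illustrative only)."""
    if distance <= left_bound:
        # Flat segment at the left rate for closely related taxa.
        return left_rate
    if distance >= right_bound:
        # Flat segment at the right rate for distantly related taxa.
        return right_rate
    # Linear interpolation between (l, rl) and (r, rr).
    fraction = (distance - left_bound) / (right_bound - left_bound)
    return left_rate + fraction * (right_rate - left_rate)


# Example with l=1, r=3, rl=2.0, rr=0.5: halfway between the bounds the
# rate is halfway between the two rates.
midpoint_rate = sketch_hybridization_rate(2.0, 1.0, 3.0, 2.0, 0.5)
```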
- phylox.generators.heath.heath.restrict_network_to_leaf_set(network, leaves_to_keep)
Removes all leaves in the network that are not in the leaves_to_keep container. Then cleans up the network by iteratively removing out-degree 0 nodes that are not in the leaves_to_keep set and suppressing in-degree 1, out-degree 1 nodes. Modifies the network in place and returns it.
- phylox.generators.heath.heath.suppress_degree_two_nodes(network)
Suppresses degree 2 (in-degree 1, out-degree 1) nodes of the network in place and returns the network. The length of the new edge is the sum of the lengths of the two old edges. If the bottom edge had a probability, then this probability is given to the new edge.
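The suppression rule can be illustrated on a toy edge dictionary, without any graph library. In the sketch below, the network is a plain dict mapping (parent, child) pairs to attribute dicts; this representation and the attribute keys "length" and "prob" are assumptions made for the example, not the data structure or attribute names phylox uses. Since a dict cannot hold parallel edges, a node whose suppression would create one is left alone.

```python
def sketch_suppress_degree_two(edges):
    """Suppress in-degree 1, out-degree 1 nodes in place; return `edges`.

    `edges` maps (parent, child) -> {"length": float, optional "prob": float}.
    Illustrative only; not the phylox implementation.
    """
    while True:
        in_edges, out_edges = {}, {}
        for (u, v) in edges:
            out_edges.setdefault(u, []).append((u, v))
            in_edges.setdefault(v, []).append((u, v))
        pair = None
        for node, ins in in_edges.items():
            outs = out_edges.get(node, [])
            if len(ins) == 1 and len(outs) == 1:
                new_edge = (ins[0][0], outs[0][1])
                # Skip nodes whose suppression would duplicate an edge.
                if new_edge not in edges:
                    pair = (ins[0], outs[0])
                    break
        if pair is None:
            return edges
        top, bottom = pair
        top_attrs = edges.pop(top)
        bottom_attrs = edges.pop(bottom)
        # New edge length is the sum of the two old edge lengths.
        merged = {"length": top_attrs["length"] + bottom_attrs["length"]}
        # Only the bottom edge's probability is carried over.
        if "prob" in bottom_attrs:
            merged["prob"] = bottom_attrs["prob"]
        edges[(top[0], bottom[1])] = merged


toy = {
    ("r", "a"): {"length": 1.0},
    ("a", "b"): {"length": 0.5, "prob": 0.3},
    ("r", "c"): {"length": 2.0},
}
sketch_suppress_degree_two(toy)
```

In the toy network, node a has one incoming and one outgoing edge, so it is replaced by a single edge from r to b that carries the summed length and the bottom edge's probability.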
- phylox.generators.heath.heath.generate_heath_network(time_limit=1.0, taxa_limit=None, update_shape=2.0, speciation_rate_mean=2.0, speciation_rate_shape=2.0, ext_used=True, count_extinct=False, extinction_rate_mean=1.0, extinction_rate_shape=1.0, hgt_used=False, hgt_rate_mean=None, hgt_rate_shape=None, hgt_inheritance=0.05, hyb_used=False, hybridization_left_bound=None, hybridization_right_bound=None, hybridization_left_rate=None, hybridization_right_rate=None, simple_output=False, seed=None)
Runs a speciation-extinction-HGT-hybridization model for the given time (time_limit) or until a certain number of extant taxa (taxa_limit) is reached. If all lineages go extinct before the given time is reached, another attempt is made. Each extant taxon has its own speciation, HGT, and extinction rates (rate=1/mean_time_until_next_event). Hybridization rates depend on evolutionary distance, through a function determined by global parameters.
There are prior speciation/extinction/HGT rate distributions: gamma distributions with a given mean and shape parameter for speciation (speciation_rate_mean and speciation_rate_shape) and extinction (extinction_rate_mean and extinction_rate_shape). HGT (horizontal gene transfer) is turned off by default, but can be turned on by setting hgt_used=True. This also requires you to set the parameters for the HGT rate distribution (hgt_rate_mean and hgt_rate_shape). Extinction can be turned off by setting ext_used=False.
If an HGT event happens for a given taxon, another taxon (including itself) is chosen uniformly at random to donate genetic material (uniformly distributed contribution in [0,max_hgt] where max_hgt is determined by the hgt_inheritance).
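The donor selection described here amounts to two uniform draws, sketched below with the standard library. The function name sketch_pick_hgt_donor is hypothetical, and setting max_hgt equal to hgt_inheritance is an assumption made only for the example, since the text says just that max_hgt is determined by hgt_inheritance.

```python
import random


def sketch_pick_hgt_donor(taxa, hgt_inheritance=0.05, seed=None):
    """Pick a donor taxon and a contribution for one HGT event.

    Illustrative only; not the phylox implementation.
    """
    rng = random.Random(seed)
    # Every current taxon, including the receiver, is equally likely.
    # Sorting makes the draw reproducible for a given seed.
    donor = rng.choice(sorted(taxa))
    # ASSUMPTION: max_hgt is taken to be hgt_inheritance itself.
    max_hgt = hgt_inheritance
    contribution = rng.uniform(0.0, max_hgt)
    return donor, contribution


donor, contribution = sketch_pick_hgt_donor({"t1", "t2", "t3"}, seed=7)
```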
If hybridization is turned on (hyb_used=True), the hybridization rate between two taxa is calculated as a function of the distance between those taxa. This distance is a weighted sum of all up-down distances between the two taxa in the current network. The dependency on the distance takes the following shape, with:
hybridization_left_bound: l
hybridization_right_bound: r
hybridization_left_rate: rl
hybridization_right_rate: rr

     |
  rl +---
     |    \
     |     \
     |      \
  rr +       -----
     |
   0 +---+----+-----
     0   l    r

where the distance is on the x-axis, and the hybridization rate is on the y-axis.
After speciation or hybridization, each rate of the new lineages is set by multiplying the (weighted mean) rate of the parent lineage(s) by a gamma-distributed factor with mean 1 and a shape parameter (update_shape), and then accepting this rate with a probability proportional to the prior distribution for this rate. This gives an ultrametric network on the extant species.
The random seed can be set with the seed parameter.
Returns a network without leaf labels, the set of hybrid nodes, the set of extant taxa, and the number of extinct taxa.