API¶
Species definitions¶
The Catalog contains a large number of species and simulation
model definitions, which are built using a number of classes defined here.
These are usually not intended to be instantiated directly, but should be
accessed through the main entry point, get_species()
.
- class stdpopsim.Species[source]¶
Class representing a species in the catalog.
- Variables
id (str) – The unique identifier for this species. The species ID is the three first letters of the genus name followed by the first three letters of the species name, and does not contain any spaces or punctuation. The usual scheme is to use the first three letters of the genus and species (similar to the approach used in the UCSC genome browser), e.g., “HomSap” is the ID for Homo Sapiens.
name (str) – The full name of this species in binominal nomenclature as it would be used in written text, e.g., “Homo sapiens”.
common_name (str) – The name of this species as it would most often be used informally in written text, e.g., “human”, or “Orang-utan”. Where no common name for the species exist, use the most common abbreviation, e.g., “E. Coli”.
genome (stdpopsim.Genome) – The
Genome
instance describing the details of this species’ genome.generation_time (float) – The current best estimate for the generation time of this species in years. Note that individual demographic models in the catalog may or may not use this estimate: each model uses the generation time that was used in the original publication(s).
population_size (float) – The current best estimate for the population size of this species. Note that individual demographic models in the catalog may or may not use this estimate: each model uses the population sizes defined in the original publication(s).
citations (list) – A list of
Citation
objects providing the source for the generation time and population size estimates.demographic_models (list) – This list of
DemographicModel
instances in the catalog for this species.dfes (list) – This list of
DFE
instances in the catalog for this species.ensembl_id (str) – The ensembl id for the species which is used by maintenance scripts to query ensembl’s database.
- get_annotations(id)[source]¶
Returns a set of annotations with the specified
id
.- Parameters
id (str) – The string identifier for the set of annotations A complete list of IDs for each species can be found in the “Annotations” subsection for the species in the Catalog.
- Return type
- Returns
A
Annotation
that holds genome annotation information from Ensembl
- get_contig(chromosome=None, genetic_map=None, length_multiplier=1, length=None, mutation_rate=None, inclusion_mask=None, exclusion_mask=None)[source]¶
Returns a
Contig
instance describing a section of genome that is to be simulated based on empirical information for a given species and chromosome.- Parameters
chromosome (str) – The ID of the chromosome to simulate. A complete list of chromosome IDs for each species can be found in the “Genome” subsection for the species in the Catalog. If the chromosome is not given, we specify a “generic” contig with given
length
.genetic_map (str) – If specified, obtain recombination rate information from the genetic map with the specified ID. If None, simulate using a default uniform recombination rate on a region with the length of the specified chromosome. The default rates are species- and chromosome- specific, and can be found in the Catalog. (Default: None)
length_multiplier (float) – If specified, simulate a region of length length_multiplier times the length of the specified chromosome with the same chromosome-specific mutation and recombination rates. This option cannot currently be used in conjunction with the
genetic_map
argument.mutation_rate (float) – The per-base mutation rate. If none is given, the mutation rate defaults to the rate specified by species chromosomes.
inclusion_mask – If specified, simulated genomes are subset to only inlude regions given by the mask. The mask can be specified by the path and file name of a bed file or as a list or array of intervals given by the left and right end points of the intervals.
exclusion_mask – If specified, simulated genomes are subset to exclude regions given by the mask. The mask can be specified by the path and file name of a bed file or as a list or array of intervals given by the left and right end points of the intervals.
length (float) – Used with a “generic” contig, specifies the length of genome sequence for this contig. For a generic contig, mutation and recombination rates are equal to the genome-wide average across all autosomal chromosomes.
- Return type
- Returns
A
Contig
describing the section of the genome.
- get_demographic_model(id)[source]¶
Returns a demographic model with the specified
id
.- Parameters
id (str) – The string identifier for the demographic model. A complete list of IDs for each species can be found in the “Demographic Models” subsection for the species in the Catalog.
- Return type
- Returns
A
DemographicModel
that defines the requested model.
- class stdpopsim.Genome[source]¶
Class representing the genome for a species.
- Variables
chromosomes (list) – A list of
Chromosome
objects.citations (list) – A list of
Citation
objects providing the source for the genome assembly, mutation rate and recombination rate estimates.length (int) – The total length of the genome.
- static from_data(genome_data, *, recombination_rate, mutation_rate, citations)[source]¶
Construct a Genome object from the specified dictionary of genome information from Ensembl, recombination_rate and mutation_rate dictionaries.
This method is for internal use only.
- get_chromosome(id)[source]¶
Returns the chromosome with the specified
id
.- Parameters
id (str) – The string ID of the chromosome. A complete list of chromosome IDs for each species can be found in the “Genome” subsection for the species in the Catalog.
- Return type
- Returns
A
Chromosome
that defines properties of the chromosome such as length, mutation rate, and recombination rate.
- property mean_mutation_rate¶
The length-weighted mean mutation rate across all chromosomes.
- property mean_recombination_rate¶
The length-weighted mean recombination rate across all chromosomes.
- class stdpopsim.Chromosome[source]¶
Class representing a single chromosome for a species.
- Variables
id (str) – The string identifier for the chromosome.
length (int) – The length of the chromosome.
mutation_rate (float) – The mutation rate used when simulating this chromosome.
recombination_rate (float) – The recombination rate used when simulating this chromosome (if not using a genetic map).
synonyms (list of str) – List of synonyms that may be used when requesting this chromosome by ID, e.g. from the command line interface.
- class stdpopsim.Contig[source]¶
Class representing a contiguous region of genome that is to be simulated. This contains the information about mutation rates and recombination rates that are needed to simulate this region. Genomic element types can be provided.
- Variables
mutation_rate (float) – The rate of mutation per base per generation.
recombination_map (msprime.RateMap) – The recombination map for the region. See the
msprime.RateMap
for details.mask_intervals (array-like (?)) – Intervals to keep in simulated tree sequence, as a list of (left_position, right_position), such that intervals are non-overlapping and in ascending order. Should have shape Nx2, where N is the number of intervals.
exclude (bool) – If True,
mask_intervals
specify regions to exclude. If False,mask_intervals
specify regions in keep.genomic_element_types (list) – a list of
GenomicElementType
objects. By default, the only element is a single genomic element type that spans the entire contiguous region.mutation_types – a list of
ext.MutationType
object. By default, the only mutation type is neutral. Seeext.MutationType
for more details.
Note
To run stdpopsim simulations with alternative, user-specified mutation or recombination rates, a new contig can be created based on an existing one. For instance, the following will create a
new_contig
that, when simulated, will have double the mutation rate of theold_contig
:new_contig = stdpopsim.Contig( mutation_rate=old_contig.mutation_rate * 2, recombination_map=old_contig.recombination_map, genetic_map=old_contig.genetic_map, )
- add_DFE(intervals, DFE)[source]¶
Adds the provided DFE to the intervals specified, by making a new genomic element type with the mutation types and proportions specified by the DFE.
- Parameters
intervals (array) – A valid set of intervals.
dfe (DFE) – A DFE object.
- add_mutation_types(mutation_types, proportions, genomic_element_type_id)[source]¶
Adds mutation types with their respective proportions to the genomic element type provided.
- property all_intervals_array¶
Returns an (n, 3) numpy array with the intervals across all genomic element types in the form (left, right, type).
- class stdpopsim.Citation[source]¶
A reference to the literature that should be acknowledged by users of stdpopsim.
- Variables
- class stdpopsim.Annotation[source]¶
Class representing an annotation track.
- Variables
id (str) – String that uniquely identifies the annotation.
species (
Species
) – The species to which this annotation applies.url (str) – The URL where the packed and compressed GFF3 can be found.
intervals_url (str) – The URL of the intervals cache of the annotations.
intervals_sha256 (str) – The SHA256 checksum of the annotations cache.
description (str) – One line description of the annotation.
citations (list of
Citation
) – List of citations for the annotation.file_pattern – The pattern used to map individual chromosome id strings to files
Demographic Models¶
- class stdpopsim.DemographicModel[source]¶
Class representing a demographic model.
Instances of this class are constructed by model implementors, following the developer documentation. To instead obtain a pre-specified model as listed in the Catalog, see
Species.get_demographic_model
.- Variables
id (str) – The unique identifier for this model. DemographicModel IDs should be short and memorable, and conform to the stdpopsim naming conventions for demographic models.
description (str) – A short description of this model as it would be used in written text, e.g., “Three population Out-of-Africa”. This should describe the model itself and not contain author or year information.
long_description (str) – A concise, but detailed, summary of the model.
generation_time (int) – Mean inter-generation interval, in years.
citations (list of
Citation
) – A list ofCitation
, that describe the primary reference(s) for the model.mutation_rate (float) – The mutation rate associated with the demographic model. If no mutation rate is given, the species’ default mutation rate is used.
- get_samples(*args)[source]¶
Returns a list of msprime.Sample objects, with the number of samples from each population determined by the positional arguments. For instance,
model.get_samples(2, 5, 7)
would return a list of 14 samples, two of which are from the model’s first population (i.e., with population IDmodel.populations[0].id
), five are from the model’s second population, and seven are from the model’s third population. The number of of arguments must be less than or equal to the number of “sampling” populations,model.num_sampling_populations
; if the number of arguments is less than the number of sampling populations, then remaining numbers are treated as zero.Todo
This documentation is broken. We’re now returning msprime SampleSet objects.
- class stdpopsim.Population[source]¶
Class recording metadata representing a population in a simulation.
- Variables
id (str) – The id of the population.
description (str) – a short description of the population
sampling_time (int) – an integer value which indicates how many generations prior to the present individuals should samples should be drawn from this population. If None, sampling not allowed from this population (default = 0).
Generic models¶
The Catalog contains simulation models from the literature that are defined for particular species. It is also useful to be able to simulate more generic models, which are documented here. Please see the Running a generic model for examples of using these models.
- class stdpopsim.PiecewiseConstantSize(N0, *args)[source]¶
Class representing a generic simulation model that can be run to output a tree sequence. This is a piecewise constant size model, which allows for instantaneous population size change over multiple epochs in a single population.
- Parameters
N0 (float) – The initial effective population size
args – Each subsequent argument is a tuple (t, N) which gives the time at which the size change takes place and the population size.
The usage is best illustrated by an example:
model1 = stdpopsim.PiecewiseConstantSize(N0, (t1, N1)) # One change model2 = stdpopsim.PiecewiseConstantSize(N0, (t1, N1), (t2, N2)) # Two changes
- class stdpopsim.IsolationWithMigration(NA, N1, N2, T, M12, M21)[source]¶
Class representing a generic simulation model that can be run to output a tree sequence. A generic isolation with migration model where a single ancestral population of size NA splits into two populations of constant size N1 and N2 time T generations ago, with migration rates M12 and M21 between the split populations. Sampling is disallowed in population index 0, as this is the ancestral population.
- Parameters
NA (float) – The initial ancestral effective population size
N1 (float) – The effective population size of population 1
N2 (float) – The effective population size of population 2
T (float) – Time of split between populations 1 and 2 (in generations)
M12 (float) – Migration rate from population 1 to 2
M21 (float) – Migration rate from population 2 to 1
Example usage:
model1 = stdpopsim.IsolationWithMigration(NA, N1, N2, T, M12, M21)
Simulation Engines¶
Support for additional simulation engines can be implemented by subclassing
the abstract Engine
class, and registering an instance of the
subclass with register_engine()
.
These are usually not intended to be instantiated directly, but should be
accessed through the main entry point, get_engine()
.
- stdpopsim.get_default_engine()[source]¶
Returns the default simulation engine (msprime).
- Return type
- stdpopsim.register_engine(engine)[source]¶
Registers the specified simulation engine.
- Parameters
engine (
Engine
) – The simulation engine object to register.
- class stdpopsim.Engine[source]¶
Abstract class representing a simulation engine.
To implement a new simulation engine, one should inherit from this class. At a minimum, the
id
,description
andcitations
attributes must be set, and thesimulate()
andget_version()
methods must be implemented. See msprime example inengines.py
.- Variables
- simulate(demographic_model, contig, samples, *, seed=None, dry_run=False)[source]¶
Simulates the model for the specified contig and samples.
demographic_model
,contig
, andsamples
must be specified.- Parameters
demographic_model (
DemographicModel
) – The demographic model to simulate.contig (
Contig
) – The contig, defining the length, mutation rate, and recombination rate(s).samples (list of
msprime.simulations.Sample
) – The samples to be obtained from the simulation.seed (int) – The seed for the random number generator.
dry_run (bool) – If True, the simulation engine will return None without running the simulation.
- Returns
A succinct tree sequence.
- Return type
tskit.trees.TreeSequence
or None
- class stdpopsim.engines._MsprimeEngine[source]¶
Bases:
stdpopsim.engines.Engine
- description = 'Msprime coalescent simulator'¶
- id = 'msprime'¶
- simulate(demographic_model, contig, samples, *, seed=None, msprime_model=None, msprime_change_model=None, dry_run=False, **kwargs)[source]¶
Simulate the demographic model using msprime. See
Engine.simulate()
for definitions of parameters defined for all engines.- Parameters
msprime_model (str) – The msprime simulation model to be used. One of
hudson
,dtwf
,smc
, orsmc_prime
. See msprime API documentation for details.msprime_change_model (list of (float, str) tuples) – A list of (time, model) tuples, which changes the simulation model to the new model at the time specified.
dry_run (bool) – If True,
end_time=0
is passed tomsprime.simulate()
to initialise the simulation and then immediately return.**kwargs – Further arguments passed to
msprime.sim_ancestry()
- class stdpopsim.slim_engine._SLiMEngine[source]¶
Bases:
stdpopsim.engines.Engine
- description = 'SLiM forward-time Wright-Fisher simulator'¶
- id = 'slim'¶
- recap_and_rescale(ts, demographic_model, contig, samples, extended_events=None, slim_scaling_factor=1.0, seed=None, **kwargs)[source]¶
Apply post-SLiM transformations to
ts
. This rescales node times, does recapitation, simplification, and adds neutral mutations.If the SLiM engine was used to output a SLiM script, and the script was run outside of stdpopsim, this function can be used to transform the SLiM tree sequence following the procedure that would have been used if stdpopsim had run SLiM itself. The parameters after
ts
have the same meaning as forsimulate()
, and the values fordemographic_model
,contig
,samples
, andslim_scaling_factor
should match those that were used to generate the SLiM script withsimulate()
.- Parameters
ts (
tskit.TreeSequence
) – The tree sequence output by SLiM.
Warning
The
recap_and_rescale()
function is provided in the hope that it will be useful. But as we can’t anticipate what changes you’ll make to the SLiM code before using it, the stdpopsim source code should be consulted to determine if the behaviour is appropriate for your case.
- simulate(demographic_model, contig, samples, *, seed=None, extended_events=None, slim_path=None, slim_script=False, slim_scaling_factor=1.0, slim_burn_in=10.0, dry_run=False, verbosity=None)[source]¶
Simulate the demographic model using SLiM. See
Engine.simulate()
for definitions of thedemographic_model
,contig
, andsamples
parameters.- Parameters
seed (int) – The seed for the random number generator.
slim_path (str) – The full path to the slim executable, or the name of a command in the current PATH.
slim_script (bool) – If true, the simulation will not be executed. Instead the generated SLiM script will be printed to stdout.
slim_scaling_factor (float) – Rescale model parameters by the given value, to speed up simulation. Population sizes and generation times are divided by this factor, whereas the mutation rate, recombination rate, and growth rates are multiplied by the factor. See SLiM manual: 5.5 Rescaling population sizes to improve simulation performance.
slim_burn_in (float) – Length of the burn-in phase, in units of N generations.
dry_run (bool) – If True, run the first generation setup and then end the simulation.