Modeling RNA polymerase interaction in mitochondria of chordates

Background In previous work, we introduced a concept, a mathematical model and its computer realization that describe the interaction between bacterial and phage type RNA polymerases, protein factors, DNA and RNA secondary structures during transcription, including transcription initiation and termination. The model accurately reproduces changes of gene transcription level observed in polymerase sigma-subunit knockout and heat shock experiments in plant plastids. The corresponding computer program and a user guide are available at http://lab6.iitp.ru/en/rivals. Here we apply the model to the analysis of transcription and (partially) translation processes in the mitochondria of frog, rat and human. Notably, mitochondria possess only phage-type polymerases. We consider the entire mitochondrial genome so that our model allows RNA polymerases to complete more than one circle on the DNA strand. Results Our model of RNA polymerase interaction during transcription initiation and elongation accurately reproduces experimental data obtained for plastids. Moreover, it also reproduces evidence on bulk RNA concentrations and RNA half-lives in the mitochondria of frog, human with or without the MELAS mutation, and rat with normal (euthyroid) or hyposecretion of thyroid hormone (hypothyroid). The transcription characteristics predicted by the model include: (i) the fraction of polymerases terminating at a protein-dependent terminator in both directions (the terminator polarization), (ii) the binding intensities of the regulatory protein factor (mTERF) with the termination site and, (iii) the transcription initiation intensities (initiation frequencies) of all promoters in all five conditions (frog, healthy human, human with MELAS syndrome, healthy rat, and hypothyroid rat with aberrant mtDNA methylation). Using the model, absolute levels of all gene transcription can be inferred from an arbitrary array of the three transcription characteristics, whereas, for selected genes only relative RNA concentrations have been experimentally determined. Conversely, these characteristics and absolute transcription levels can be obtained using relative RNA concentrations and RNA half-lives known from various experimental studies. In this case, the “inverse problem” is solved with multi-objective optimization. Conclusions In this study, we demonstrate that our model accurately reproduces all relevant experimental data available for plant plastids, as well as the mitochondria of chordates. Using experimental data, the model is applied to estimate binding intensities of phage-type RNA polymerases to their promoters as well as predicting terminator characteristics, including polarization. In addition, one can predict characteristics of phage-type RNA polymerases and the transcription process that are difficult to measure directly, e.g., the association between the promoter’s nucleotide composition and the intensity of polymerase binding. To illustrate the application of our model in functional predictions, we propose a possible mechanism for MELAS syndrome development in human involving a decrease of Phe-tRNA, Val-tRNA and rRNA concentrations in the cell. In addition, we describe how changes in methylation patterns of the mTERF binding site and three promoters in hypothyroid rat correlate with changes in intensities of the mTERF binding and transcription initiations. Finally, we introduce an auxiliary model to describe the interaction between polysomal mRNA and ribonucleases.


Conclusions:
In this study, we demonstrate that our model accurately reproduces all relevant experimental data available for plant plastids, as well as the mitochondria of chordates. Using experimental data, the model is applied to estimate binding intensities of phage-type RNA polymerases to their promoters as well as predicting terminator characteristics, including polarization. In addition, one can predict characteristics of phage-type RNA polymerases and the transcription process that are difficult to measure directly, e.g., the association between the promoter's nucleotide composition and the intensity of polymerase binding. To illustrate the application of our model in functional predictions, we propose a possible mechanism for MELAS syndrome development in human involving a decrease of Phe-tRNA, Val-tRNA and rRNA concentrations in the cell. In addition, we describe how changes in methylation patterns of the mTERF binding site and three promoters in hypothyroid rat correlate with changes in intensities of the mTERF binding and transcription initiations. Finally, we introduce an auxiliary model to describe the interaction between polysomal mRNA and ribonucleases.
Keywords: RNA polymerase interaction, RNA polymerase competition, Transcription, Circular DNA, mtDNA in chordates, MELAS syndrome, Impact of DNA methylation, Hyposecretion of hormones, RNA interaction model, Polysome and ribonuclease interaction model

Background
In this work, we use the model developed and applied to study plant plastids [1] in the analysis of transcription in the mitochondria of chordates. The model is generally applicable to plastids and mitochondria, prokaryotes, and nuclear DNA. We also propose an auxiliary model describing a specific component of translation.
Many eukaryotes possess mitochondria, semiautonomous organelles with a highly reduced genome. Animal mitochondria encode 22 tRNAs, 2 rRNAs and 13 proteins in a circular chromosome of 15-18 kbp. Transcription is conducted by phage-type RNA polymerases homologous to the polymerases of phages T7 and T3. Transcription initiation requires auxiliary protein factors and occurs at up to five distinct promoters, producing transcripts that can potentially be longer than the chromosome itself.
Among transcription factors are the mtTFA and mtTFB proteins, which bind the polymerase at each promoter [2,3]. These factors detach from the polymerase after transcription of 13 nucleotides. Several mtTFA isoforms are known that play a role in alternative splicing [4]. Two mtTFB isoforms, mtTFB1 and mtTFB2, are both involved in transcription initiation. However, the role of these transcription factors is not considered in this study.
Another essential initiation factor is mTERF [5], which also mediates transcription termination in the form of cooperative binding. This factor is an important component of our model. The properties of phage-type RNA polymerases have previously been studied [6][7][8][9]. In particular, in a head-on collision, two RNA polymerases approaching one another on the same DNA may pass by one another [6]. We assume that in this case the antisense mRNA forms a duplex and becomes inaccessible for the translation machinery.
Our study focuses on mitochondria of human [Homo sapiens, Genbank:NC_012920.1], rat [Rattus norvegicus, Genbank:NC_001665.2], and clawed frog [Xenopus laevis, Genbank:NC_001573.1]. Data on the mitochondrial genome of mouse [Mus musculus, Genbank: NC_005089.1] was also used as having the same gene order. These model organisms were chosen due to the availability of ample experimental evidence on RNA concentrations and half-lives that can be used to estimate gene transcription levels (transcription frequencies). The mitochondrial genomes of human, rat and frog are given in Figure 1, Figure 2, Figure 3, respectively. The RNA half-lives are described in Additional file 1 (Section 1).

Structure and arrangement of mitochondrial promoters
Mitochondrial promoter locations differ substantially among species. Table 1 shows experimentally determined locations of mitochondrial promoters in human, rat, and frog. Human mitochondria possess three promoters, HSP1, HSP2, and LSP. Both the HSP1 and LSP promoters contain the conserved box 5′-CANACC(G) CC(A)AAAGAYA-3′ [10]. The transcription initiation site is located inside the conserved box 6-8 bases before 3'-end. The transcription initiation site is located at position 561 in HSP1 (upstream from gene tRNA-Phe), at position 646 in HSP2 (upstream from gene 12 S rRNA) [11], and at position 407 in LSP [12]. The promoter quality is affected by the regions: -16 -+7 for HSP1, and −28 -+16 for LSP [13].
Rat mitochondria also possess three promoters [14]. The transcription initiation site is located at position 16298 in HSP1 (15 bases upstream from gene tRNA-Phe), at position 66 in HSP2 (upstream from gene 12 S rRNA), and at position 16193 in LSP.
Five promoters are known in frog mitochondria, HSP1, HSP2, LSP1, LSP2A, and LSP2B, all lying upstream of the tRNA-Phe gene [15,16]. Transcription is initiated at the conserved box ACRTTATA. Relative transcription initiation intensities have been estimated for the wild type and several mutant frog genotypes [17] (Table 2), and are illustrated in our Table 2 for the reader's convenience.  The entire circular DNA is presented sequentially as four rows. Genes in the heavy strand (H-strand) are shown by green arrows, and genes in the light strand (L-strand)by blue arrows. HSP1, HSP2 are two promoters in the H-strand. LSP1, LSP2A, LSP2B are three promoters in the L-strand. mTERF symbolizes a binding site of the protein factor mTERF acting as transcription terminator.

Effect of protein factors and mtDNA methylation on gene transcription levels
In early embryogenesis of frog a continuous increase of mtTFA concentrations is observed [18] associated with the elevation of gene transcription levels [19]. In the beginning of this period replication in mitochondria are almost arrested, and the initial pool of mitochondria is distributed between the dividing cells.
Neither changes in mitochondrial gene expression levels during development in human [21,22], nor mitochondrial gene expression levels in different tissues of rat under different conditions [23,24] are currently included in our model.

mTERF-dependent transcription termination
In human mitochondria, two terminators exist, each possessing different termination mechanisms. In the first mechanism, the mTERF protein binds to a 28 bp region immediately downstream of the 16 S rRNA gene; and located within the tRNA-Leu gene. This terminator is considered to be polarized and blocks almost all RNA polymerases on the DNA light strand but allows some to move through onto the heavy strand. A negligible portion of polymerases still passes through the terminator to transcribe the 16 S rRNA gene in the antisense direction [19]. The second mechanism, the mTERFindependent transcription termination due to G-quadruplex, is described in Additional file 1 (Section 2).   In mammals, there are two hypotheses about mechanism of transcription regulation on the heavy strand [5,14]. First, HSP1-initiated transcription may terminate after the rRNA gene, and longer transcripts are initiated at HSP2 only. Alternatively, all promoters may initiate longer transcripts, and some polymerases stop at mTERF regardless of the promoter.
Notably, in mammals mTERF binds cooperatively with the termination site and the mTERF activator site in the proximity of promoter HSP1, thus functioning as both terminator and activator [11].
The termination site is conserved and located downstream of the rRNA genes in many animals [25]. Similarly mTERF homologs have been detected in the nuclear genomes of many animals.

MELAS syndrome
The A ! G transition at position 3243, in the middle of the mTERF binding site, decreases mTERF's affinity for DNA, thus causes a mitochondrial disorder. In human, this mutation causes: (i) a small decrease of rRNA transcription (12 S and 16 S, at similar levels), (ii) a decrease of tRNA-Leu concentration of up to 20%, (iii) a decrease in tRNA-Lys concentration of up to 50%, (iv) a slight decrease in total mRNA, and (v) a noticeable change in the amount of protein products [26]. These changes have to be reproduced by any model.

Input data, main model and its parameters
Complete mitochondrial genomes of human, rat, and frog analyzed in the study were obtained from GenBank, NCBI, [27], Figure 1, Figure 2, Figure 3.
Here, the protein factor under consideration is the multifunctional regulatory protein mTERF. G-quadruplex implicates DNA regions, but cross-hairpins typical for plastid [1] and bacterial genome DNA [28] are likely absent from the mitochondrial genomes of the animals under study.
Let us recall our model of interaction between RNA polymerases themselves, factors, and structures in DNA or RNA [1]. RNA polymerases can potentially attempt to bind with all promoters at a given locus. Each promoter is characterized by real number λ, the intensity of attempts to bind one of the surrounding polymerases. Formally, λ is the parameter of a Poisson stochastic process. An attempt is "successful" if the promoter is not already occupied by a polymerase or any other factor, even partially. After binding, polymerases move along the DNA strand and terminate at collisions with oncoming polymerases. They can also terminate at encountering protein factors or secondary structures. The abort process is not considered, as only phage-type RNA polymerases are present in mitochondria.
Similarly, protein factors attempt to bind their target sites. An attempt is "successful" if the site is not occupied by a polymerase or another protein. If a polymerase encounters a site bound with a protein factor, it either passes through and the protein-DNA complex dissociates, or it terminates and the complex survives. The analogous situation for cross-hairpins [1], is beyond the scope of the current study. Time intervals between any successive events in the model were estimated from distributions of stochastic and deterministic (polymerase movement) processes described above and were then summed up. Thus, each event can be described with a modeled real time commencing with the start of all modeled processes. The modeled real time of course does not coincide with the computation time, but it allows the transcription level (transcription frequency) for each gene to be computed in terms of the organism's time.
The model has four fixed parameters: (i) the elongation rate (which is constant among chordates), (ii) the size of the phage-type RNA-polymerase, and (iii) the ratio of polymerases terminating at a G-quadruplex, (iv) the geometry of mtDNA, i.e., the location of genes, promoters and terminators. These parameters are discussed in the next section.
Unknown parameters are the transcription initiation (polymerase binding) attempt intensities at each promoter, mTERF protein terminator binding attempt intensities, and the conditional (provided that mTERF is already bound) probabilities p and q of passing the mTERF-dependent terminator in both directions. The model contains no other parameters. All gene transcription levels are estimated in the model using the values of these parameters.
The model software allows the user to specify physical time of modeling along a trajectory, the model run-up time, the number of trajectories to average, etc. The main executing parameters are the number of processors and time to halt modeling and create a checkpoint.
The solution to the model is a set of parameters, including intensities of binding attempts to all existing promoters, conditional probabilities p and q of passing the mTERF-dependent terminator in both directions, and intensity λ of mTERF binding attempts. Here λ incorporates the process of spontaneous dissociation of the mTERFÁDNA complex. Table 3 shows conditional probabilities p and q (passage probabilities) inferred under two RNA polymerase movement rates.
The modeling procedure details are described in Additional file 1 (Section 3).
Comparing the model and experiment requires knowledge of RNA half-lives. This was not considered in the original model description [1]. Notably, for many genes, the transcript concentration relative to the null time point or to a reference RNA concentration (relative RNA concentration), and sometimes the RNA half-life is known in a stationary state.
This allows a gene's transcription level to be correlated with its RNA concentration and half-life.
In frog, the RNA concentration u ij is known for the jth gene at the i-th time point relative to its RNA concentration at null time point, i.e., u ij ¼ Absolute half-lives t j are unknown. Table 4 provides experimental ratios u ij (errors not estimated) and ratios z ij z oj of mean values z ij , and they are compared to each other. The values z ij themselves are in Additional file 2.
In human, RNA concentration u j relative to the reference gene ND1 and RNA half-lives are known in a stationary state, i.e., u j ¼ 2z j Át j 2z 0 Át 0 , where z j is the transcription level of the j-th gene, and t j is the RNA half-life of the jth gene. The ratio z j z 0 is estimated in the model and compared with the experimentally known in Table 5, upper rows. The special case of gene COX1 is discussed in Additional file 1 (Section 3).
In both rats, mRNA concentrations of the genes COX1, ATP6/8, COX3, ND4, ND5, CYTB were measured individually relative to 16 S rRNA. Each ratio in the hypothyroid rat was calculated as percentage of the corresponding ratio in the euthyroid rat, Table 6. Thus, the experimentally measured ratio is where z j is the transcription level of the j-th gene in hypothyroid (h) or euthyroid (e) rats, j = 1,. . .,6; z 0 is the 16 S rRNA transcription level; t are half-lives specific to j, e, h. Therefore, the value The numerator and denominator of the ratio are both unknown.
Note that, when comparing gene transcription levels in model and experiment, the choice of functional is not straightforward. Consider the natural functional Where x ji and y ji are relative transcription levels in gene j at time i (if no time data, the i index is omitted) in the model and the experiment. To compare data on three experimental frogs simultaneously, metric (3) can be generalized: where n k is the "dimension" of compared data, s ¼ The dimensions are, n 1 = 54 (nine time points, six genes), n 2 = 36 (six time points, six genes), n 3 = 10 (five time points, two genes), respectively. We also considered some other functionals, listed in Additional file 1 (Section 4), which produced similar solutions. Hereafter, only functionals (3)-(4) are discussed.

Fixed parameters of the main model
All model parameters are listed in the previous section.
As for the mtDNA geometry, sequence data was obtained from GenBank, NCBI, [27]. Multiple alignments were constructed with MEGA 5 [29], in order to detect terminator sequences.
The ratio of polymerases terminating at a Gquadruplex is discussed in Additional file 1 (Section 1).
To be formal, we should include the RNA polymerase elongation rate in unknown parameters, and the rate should also be varied. Because of large computations, in current study we examined only two values of the rate, 200 nt/s and 500 nt/s. The rate is unlikely to be less than 200 nt/s, but values greater than 500 nt/s are possible (see the next paragraph). A preliminary prediction of the model is that the polymerase elongation rate is 500-800 nt/s.

The elongation rate of phage type RNA polymerase (NEP)
The NEP rate is a key parameter in our model, but no experimental data of this rate is available to the best of our knowledge. The rate of replication fork progression in E. coli is 1500 nt/s, which is assumed to be the maximal NEP rate. A lower bound for this rate can be Table 3 The mTERF terminator passage probabilities on the heavy and light strands Rate, nt/s p q L 1 n (Frog1) The minimum of L 1 n for three frog specimens, and minimum of L 1 n(total) are shown. For comparison, the same minima are given for the RNA polymerase elongation rate of 200 nt/s. estimated from the ratio of length E of the first exon and length I of the first intron in protein-coding genes in plastids of the Streptophyta lineage. In plastids, transcription and translation processes are concurrent, and transcription of the first intron must end before translation of the first exon starts. Therefore, in the absence of translational regulation, the ratio of the polymerase and ribosome movement rates must exceed (E + I)/E. Such regulation was predicted for genes atpF, clpP, petB, petD [30]. For genes with short first exons, a delay in translation initiation must occur. Therefore, to estimate the lower bound of NEP rate, only genes unlikely to possess such regulation or trans-splicing should be used (e.g. rpoC1, infA, ndhA, ndhB in Arabidopsis thaliana). NEP-transcription is experimentally shown for rpoC1 [31] and can be suggested for other genes because of   In columns: "time" gives absolute times of embryonic development; "mTERF" and "LSP1" are intensities of the mTERF factor binding with the termination site and cooperative binding with the promoter, respectively, under the polymerase rate of 500 nt/s; further columns are modeled (mod) and experimental (exp) gene transcription levels and their relative deviations (dev) obtained with equation (9). Variables are in bold. The size of phage type RNA polymerase (NEP) The NEP size is assumed equal to the size of the phage T7 polymerase. The promoter size was experimentally identified as −14 to +1 bp (relative to the transcription initiation site) in mutagenic studies of the phage T7 NEP ortholog in tobacco chloroplasts [7]. The −15 position indicates a slight effect on the promoter quality [7]. Footprinting studies suggest that 15 DNA bases are occupied by NEP or that 11 bases are unpaired [8]. The estimate of 15 bases was obtained in X-ray structure analysis of the phage T7 polymerase [9]. The current study assumes the NEP occupancy to be −15 to +1.

Model of a polysome and ribonuclease interaction (auxiliary model)
To explain the MELAS phenotype, an auxiliary model was developed that describes the interaction between polysomal mRNA and ribonuclease.
The following expression describes time τ of any mRNA half-life, in terms of elementary probabilities (see Additional file 3): where λ ¼ νN 1þαN is the intensity of ribosome binding to its site, ν is a specific intensity under low N, where N is the number of ribosomes in a healthy human mitochondrion, α is a Michaelis-Menten dependency parameter (saturation over λ occurs at high N and equals ν α ). Then, w is the ratio of the linear size h (in codons) of the ribonuclease on mRNA to the rate V of the ribosome elongation (V = 15 codons per second, h = Vw = 15w), d is the ratio of size h 1 of the ribosome on mRNA to the ribosome elongation rate V (h 1 = 10 codons, h 1 = Vd), μ is the interaction intensity of the ribonuclease with a specific mRNA site that leads to RNA cleavage. It is clear that w cannot be less than 1/15 s, and it is likely less than 4/3 s; N depends on the expression of other genes, especially ribosome genes, its value is inferred in the main model as absolute concentrations of 12 S or 16 S rRNA. Parameters ν, α and μ depend on the mRNA sequence. Unfortunately, values of ν and α are unknown. Note that ν and α depend on the ribosome binding site, i.e., in the general case those are ν j and α j over j RNAs. It is plausible that μ is the same for healthy and diseased humans.
Although here only the ribonuclease is considered as a factor in mRNA cleavage, other factors can be considered analogously.
Similarly, in diseased human, the mRNA half-life τ′ is expressed in terms of N′, the number of ribosomes in the mitochondrion, thus The left hand of (6) is evidently more than one, τ/τ′ > 1. For at least one mRNA, the experimentally expected interval of τ/τ′ is 1.5-3 [26]. For that mRNA, it follows from equation (6), that a small change in the absolute number N of ribosomes greatly influences the mRNA half-life, and thus the amount of the corresponding protein. The model solution confirms that the half-lives of rRNA and most mRNAs are similar in healthy and diseased humans. The disease phenotype might require a sharp decrease in half-life of a single (even short) mRNA (see item 6 in Additional file 1, Section 3). This suggests a plausible explanation for the MELAS phenotype.
The intensity of RNA decay by the ribonuclease is given by: Each mRNA was assumed above to possess a single ribonuclease site. To the best of our knowledge the actual number and arrangement of cleavage sites on RNA remain to be determined. However, if the average number of cleavage sites per RNA for a given gene among all mitochondria in a tissue is k, then in the above equation κ is substituted by κk. This conclusion and equations (5) to (7) are derived in Additional file 3. An RNA window is defined as a region of a minimum length h = Vw, that separates two neighboring ribosomes, where Vw is the linear size of the ribonuclease on mRNA.
By combining equations (5) and (6) for different RNAs, ν j can be expressed in terms of w.

Comparison to the experiment
The distributions of variables u j ; t 0 ; t j are not experimentally known. This precludes us from estimating confidence for experimental values (1) or (2) on the basis of probability-theoretical methods which are usually applied to compare predictions with experiments. However, absolute errors can be used instead. Let Δ be an absolute error of b (expressed as (1) or (2)) and a is a model value for the same expression, then it can be verified if: The error Δ of expression value (1) or (2) is then estimated trivially [32] by using one of two common empirical relations. First, the error of an algebraic sum is if summand errors are statistically independent. Alternatively, Δ x AE y ð Þ≤Δ x ð Þ þ Δ y ð Þ. The error of product x Á y or ratio x/y is Δ x∘y if member errors are statistically independent. Otherwise, Δ x∘y ð Þ≤ x∘y ð ÞÁ Δ x ð Þ=xþ ð Δ y ð Þ=yÞ. Therefore, either a hypothesis of statistical independence is assumed to be true and equalities apply, or inequalities are used, in which case expression (8) becomes uncertain. Fortunately, both cases produce similar results. The experimental and model values of (1) and (2) and their errors are given in Table 5 and Table 7, assuming that constituent errors are either statistically independent, or not. The model predictions fall within the error intervals b AE 1:3Δ and b AE 2:4Δ , respectively (Table 5  and Table 7).
Importantly, experimental and predicted solutions from Table 4, Table 5, Table 7 differ "insignificantly" in another respect. The percentage concordance between values a and b (empirical and predicted) can be expressed as: The sign of this value indicates a decrease or increase of a relative to b. As a rule-of-thumb, changes in gene transcription level that are between −50% (i.e. halving) and +100% (i.e. doubling) are considered "insignificant" [33]. By this measure, almost all model predictions differ insignificantly from experimental data (Table 4, Table 5,  Table 7).

Model implementation
The model was implemented in C++ in two versions (command line interface and GUI) and is distributed freely under the GNU General Public License v.3 on the web page [34]. Similar to the earlier release [1], it is an event-driven automaton that simulates large combinations of interacting stochastic and deterministic processes in a DNA locus against modeled physical time. RNA polymerase binding is modeled as a stochastic process. The subsequent polymerase elongation in this study is modeled as a deterministic process with constant rate. The following collision events are modeled: (i) a polymerase or factor attempts to bind a previously occupied site, (ii) secondary structures attempt to form within a bound site, and (iii) two oncoming polymerases attempt to process the same nucleotide. Scenarios and rules of the collision's resolution are customizable model options.
The events in the model are handled chronologically, and ordered in a complex system of partially ordered queues. The program performance depends heavily on the rate of serving the queues.
Polymerase interaction was previously studied within short loci of a few thousand base pairs [1]. Here we model transcription along complete mitochondrial genomes of up to 18 kbp in length. The circular arrangement of this DNA introduces a novel scenario: polymerases can continue transcription until they complete several circles and collide, which considerably increases the number of simultaneously modeled events.
Another important aspect of our model is that we considered phage-type RNA polymerase. Although its elongation rate is not experimentally known it is likely higher than that of the bacterial-type polymerase. In plastids, transcription is carried out by polymerases of both types. However, a faster polymerase does not make a difference, as it cannot outpace or influence the slower polymerase. In this study, elongation rates of 200 nt/s and greater were assumed, which is an order of magnitude higher than in the earlier modeling attempts. This led to a higher rate of access to larger event queues and therefore to reduced performance. Queue processing was considerably improved in the current implementation by changing from a linear event queue to a system of partially ordered queues, which allowed us to attain previously observed levels of performance [1].
In our previous implementation, any polymerase collision with a factor or secondary structure terminated transcription [1]. The current model implements a new class of objects, protein terminators with nonzero passage probabilities in both directions; these probabilities p and q are the terminator parameters mentioned under Methods. The model constructs an individual trajectory in the event space and estimates gene transcription levels along the trajectory. Under the same set of model parameters, the levels are averaged over multiple trajectories. Computations can be effectively parallelized on a cluster with MPI v. 1.2 or newer. In this study the results were obtained on 2048 CPUs at the MVS-100 K supercomputer of the Joint Supercomputer Center of the RAS [35].
The inverse problem was solved by multi-objective optimization. For example, the response landscape for functional (4) is complex, with numerous valleys and local minima, which excludes the use of standard (e.g. gradient-based) local minimization techniques and requires heuristic solutions. Our approach is based on the following effective strategy: promoters in both strands of mtDNA are concentrated in a region containing only Phe-tRNA genes in human and rat, which produces opposite flows of polymerases that compete mostly outside this region. Therefore, a lack of transcription on one strand indicates that it is blocked by a heavy flow of polymerases along the opposite strand. Under the general assumption of non-zero transcription level, further increase of promoter binding intensities on the successful strand is not reasonable and does not lead to a better solution. This fact considerably reduces computational complexity. For any current set of model parameters, promoter binding intensities are varied in each direction only until transcription stops for one of the genes. This optimization strategy is referred to as "active search".

Results
Given an RNA polymerase transcription rate of 500 nt/s, one solution was obtained for each model organism, the frog (three embryos), human (healthy and diseased), and rat (eu-and hypothyroid).
In all cases the mTERF terminator passage probabilities (i.e., the portion of RNA polymerase passages through bound mTERF) were p = 0.0164 on the heavy strand and q = 0.0056 on the light strand, indicating a triple-fold polarization of the terminator.
In frogs the LSP1 promoter binding intensities mostly increase with time (Table 4 and Figure 4). Predicted and experimental transcription levels are in good agreement, with the exception of the estimate obtained with equation (9) for the first frog. In this case, at 96 hours of development the difference slightly exceeds +100%; for gene ND4 at all time points the difference slightly exceeds −50%, Table 4.
In healthy human the LSP, HSP1, HSP2 and mTERF binding intensities are 0.0031, 0.0031, 0.0126, and 0.6456, respectively (Table 5). In the case of MELAS syndrome the HSP1 and mTERF intensities drop 7.75-and 1.21-fold and become 0.0004 and 0.5336, respectively. The ratio R of gene 12 S to gene COX2 transcription levels is 24, and the ratio RNA/RNA − of weighted total RNA concentrations in healthy and diseased human is 1.18 (these denotations, R and RNA/ RNA -, are explained and justified in Additional file 1, Section 3). Transcription levels of tRNA-Phe and 16 S rRNA dropped in diseased human 3.84-and 1.2-fold, respectively. The decrease in tRNA-Leu and tRNA-Lys transcription was 1.2-fold, which is within experimental error. The functional value for optimal solution under all imposed additional conditions vs. only general conditions (ref. definitions in Additional file 1, Section 3) differs by 2.4%.
All predicted and experimental transcription levels in healthy human are within experimental error, except for CYTB, for which this error is exceeded by 29%. In addition CYTB exhibits a difference of around −50% in Figure 4 LSP1 promoter binding intensities plotted against time for three frog embryos. The LSP1 promoter binding intensities mostly increase in agreement with [19].
healthy human as estimated with equation (9), Table 5. We will revert to the case of CYTB in the Discussion.
In rats, we defined HSP as the total intensity of binding attempts to promoters HSP1 and HSP2, i.e. HSP = HSP1 + HSP2. In euthyroid rat, the binding intensities for LSP, HSP and mTERF are 0.1056, 0.0721 and 0.9453, respectively. In hypothyroid rat, HSP drops to 0.0336 ( Table 7). The ratio R of 12 S and COX2 transcription levels is 30.605 in euthyroid rat and slightly increases to 30.637 in hypothyroid one. Similarily, differences in predicted and experimental transcription levels between the eu-and hypothyroid animals are within experimental error. Differences obtained with equation (9) are insignificant, Table 7.
The model's solutions and comparisons to experimental values are presented in Table 3, Table 4, Table 5,  Table 7. The predicted absolute transcription levels of all genes are illustrated in Figure 5 and detailed in Additional file 2. Importantly, most predictions fall within experimental error. This definition, however, needs further justification. Specifically, a more accurate error estimation would require knowledge of the distribution of the experimental measurements (see Section 4 of the Methods).

Discussion
The high degree of evolutionary conservation of the mTERF factor and its binding site [25] allows estimates of parameters p and q (the terminator passage probabilities in both directions) to be extrapolated among chordates. However, other parameters should be interpreted with caution. For example, the mouse possesses the terminator D-TERM, which is unknown in human and rat, between promoters LSP and HSP1 in the 5'-leader region of the tRNA-Phe gene [36]. Homologs of this terminator cannot be detected by sequence similarity even across close relatives (see Additional file 1, Section 5). Our model's predictions agree well with experimentally derived transcription levels in the mitochondria of frog, human and rat, including disease states such as the MELAS syndrome in human and the hypothyroid condition in rat.
The model predicts the intensities of polymerasepromoter binding, properties of the mTERF transcription terminator and absolute transcription levels of all mitochondrial genes ( Figure 5 and Additional file 2). This is in contrast to the experiment which only provides relative measures of transcription of selected genes. An argument in favor of our model is the fact that all terminators are important in the prediction of transcription. Notably, excluding any of the terminators from modeling leads to predictions that intuitively are inadequate, e.g., transcription of only one DNA strand.
With a polymerase transcription rate of 500 nt/s, the model conforms better to the experiment (i.e. most predictions fall within experimental error) than with a rate of 200 nt/s. However, the rate choice needs further justification both in silico and in the experiment.
The concentration of the mtTFA transcription factor increases monotonically during early embryogenesis in frog [18]. mtTFA is a universal activator, and the expected increase of its binding intensities to all promoters is accurately predicted by the model (column LSP1, Table 4). There is no similar evidence for the mTERF factor, and the model does not predict the monotonic change either.
In mammals, the efficiency of the HSP1 promoter is predicted to be markedly greater than that of the HSP2 promoter on the heavy strand [5]. Notably, this result is not supported in our model, where HSP2 is 4 times more efficient than HSP1.
In human, the predicted absolute transcription levels of protein-coding genes are unexpectedly low. One transcript is produced every 15-26 minutes, depending on the gene ( Figure 5, Additional file 2), while the entire mitochondrial genome is transcribed in 33 s under the polymerase elongation rate of 500 nt/s. Such selective transcription agrees with previous measurements of absolute mRNA concentrations [37].
Furthermore, in human, transcription levels of proteincoding genes on the light and heavy strand are predicted to be very similar. This suggests that polymerases that initiate at promoters HSP1 and HSP2, and that pass the mTERF terminator collide very rarely. Such counterflows of polymerases are also rare for LSP-initiated polymerases that passed the first G-quadruplex terminator. There is negligible competition between free and bound polymerases for the promoter, because under high elongation rates and low binding intensities each promoter becomes available long before the next binding attempt. The probability that a promoter bound by a polymerase continues into the next circle is also low: in 9 hours of modeled time there are only 1 ± 1 such polymerases on the light strand and 23 ± 6on the heavy strand (average values ± unbiased standard deviation over n = 1000 trajectories). A possible biological explanation of such low polymerase competition in mitochondria of human is minimization of DNA damage during collisions. In plastids, the bacterial-type polymerases are considerably slower but collide frequently [1]. The bacterial-type polymerases inherited by mitochondria from their α-proteobacterial ancestors might have been lost and replaced by faster phage polymerases to reduce DNA mutation rate [38] (Part 3, Section 9.7).
In frog and rat, on the other hand, polymerase competition is more pronounced (see Figure 5 and Additional file 2), which can be explained by differences in RNA half-lives.
In human with MELAS syndrome, the model predicts a 1.21-fold decrease of the mTERFÁDNA binding intensity and a 7.75-fold decrease of the HSP1 promoter efficiency. Transcription levels of tRNA-Phe and rRNA drop 3.84-and 1.2-fold, respectively, with possible implications for the MELAS phenotype. For example, consider the influence of rRNA. Highly expressed mRNAs are normally entirely covered by ribosomes that protect them from adverse modifications. The decreased expression of rRNA predicted by our model may lead to decreased ribosomal coverage of mRNAs and cause increased transcript damage (see Section 3 of the Methods). Similarly, we can consider the influence of tRNA-Phe: underexpression of tRNA-Phe attenuates translation and, at the same time, widens the window between ribosomes on RNA (see Section 3 of the Methods).
Increased transcription of CYTB relative to the upstream genes reported from the experiment cannot be predicted in the model. However, in thiamphenicoltreated cells the CYTB mRNA half-life shows a high degree of variation [39] (Table 2). This suggests a possible experimental bias in measuring the time and thus the CYTB transcription level. A slight discrepancy of 6% between the model and experiment for expression of the human ND2 gene might be explained by a similar bias in estimating half-lives [39] (Table 2).
In the model, intensities of the mTERF binding and LSP transcription initiation are equal in eu-and hypothyroid rats. In experiment, methylation of the mTERF binding site remains unaffected, and of the LSP promoter exhibits minimal changes [14], which agrees well with the stability of their predicted binding intensities.
Conversely, in the model, the total intensity HSP of transcription initiation from promoters HSP1 and HSP2 is 2.15-fold lower in the hypothyroid rat. In experiment, methylation of the HSP1 region changes considerably, but methylation of the HSP2 changes negligibly [14], which also agrees with the predicted decrease of transcription initiation from promoters HSP1 and HSP2.

Conclusions
Our previous analyses [1] and current study demonstrate that the model produces results in good agreement with experimental evidence from plastids and mitochondria. It accurately predicts the RNA polymerase binding intensities, transcription terminator characteristics, and absolute transcription levels of all mitochondrial genes.
Individual gene transcription intervals are predicted to be long in human (15-26 min) and rat (2-10 min), but short in frog (8-25 sec). RNA polymerase competition is shown to be negligible in the mitochondria of human but evident in rat and frog, albeit much less intense compared to that in plastids. Advantages of the phagetype vs. bacterial-type RNA polymerases in mitochondria are suggested and discussed.
In hypothyroid rat, we describe how changes in methylation patterns of the mTERF binding site and three promoters correlate with intensities of the mTERF binding and transcription initiations.
We have also proposed a polysome-ribonuclease interaction model (see Section 3 of the Methods) and factors explaining the MELAS phenotype development in human.
Additional file 2: Supplement 2. Transcriptions predicted during 9 hours of modeled physical time.
Additional file 3: Supplement 3. Details of the auxiliary model.

Competing interests
The authors declare no competing interests.
Authors' contributions VAL, OAZ and AVS proposed the model, estimated its parameters, chose source data and participated in the analysis of results. VAL, SAP and AVS wrote Additional file 3 including derivation of formulas (5)- (7). LIR actualized the model implementation: wrote software, performed the computations and participated in the data preparation and analysis. OAZ performed the sequence alignment, the statistical analysis and participated in the data preparation and the computations. VAL, OAZ, AVS and LIR wrote the manuscript. All authors read and approved the final manuscript.

Reviewers' comments
Reviewer's report 1 -Dr. Anthony Almudevar The authors apply a parametric model for RNA polymerase interaction during transcription, developed in an earlier paper [1], which successfully predicted changes in gene transcription levels attributable to various experimental perturbations. The model is applied to new data, and appears to represent a significant extension of the model. The reported predictions are quite interesting, but the paper as a whole seems very hastily written, often consisting of a sequence of brief paragraphs. This is especially true of Section [4. Modeling Procedure] on Page 13. The paper needs to be significantly reorganized.

Response.
A sequence of brief paragraphs is justified by the intention to concisely list various experimental data and to state what the model is based on. Generally, the manuscript was reorganized (refer to response #2 to Reviewer 2), and currently this sequence is distributed between the main text and Supplement 1. I think it would also be important to introduce a careful definition of the model, introducing notation for all parameters and observations at one place. The paragraph on page 8 beginning "The solution is a set of parameters …" introduces λ, p and q, but more model elements are introduced at various subsequent places in the paper (including the Supplementary). It would better to see the full extent of the model, including a clear definition of the parameter space, at this point. Response. The model definition in Section 1 of the Methods is extended and all model parameters are now grouped. Additionally, the software setup and execution parameters are briefly described. Quality of written English: Needs some language corrections before being published.
Response. Done (see the last response to Reviewer 2). Reviewer's report 2 -Prof. Marek Kimmel The model is an application of the previously published more general model of interaction of RNA polymerases, protein factors and secondary structure during transcription. Current version is limited to phage-type polymerases and transcription factors in the mitochondrial genomes of human, rat and frog. Response #1. The current version of the model is more sophisticated: it now accounts for a protein-dependent terminator (mTERF) and completion of more than one circle on the mtDNA strand by RNA polymerases. Furthermore, the polymerase interaction described in [1] was not necessarily applicable to mitochondria, as the model behaves differently: it predicts high polymerase competition in plastids, and negligible competition in human mitochondria. Moreover, modeling the same mechanism in mitochondria provides explanations for the MELAS phenotype and the phenotype caused by hyposecretion of thyroid hormone. Although, different functional phenomena were explained in [1]. The model is based on comprehensive data and a detailed analysis of inferred parameter values. This is an interesting paper and it may be ultimately published. However, as it is now, it is overloaded with detail on one side and is missing some important information on the other. Accordingly, in my opinion, most of the details included in Sections 4-6 of the Background, should be moved to an online Supplement. Same is true of most of Section 4 of the Methods. What is to remain in the main body, are pointers to respective sections of the Supplement. Of Figures 1, 2 and 3, one might remain in the paper body. Background section is missing a description of the principles of the model. Instead, the authors seem to have decided that it will be enough to cite an earlier paper and defer the description (in a rudimentary form) to Section 5 of the Methods. This is unfortunate and I think should be corrected by expanding this material to a full-fledged description and moving it to the Background or to the beginning of the Methods. The part of Discussion in pp. 23-24 belongs more to the model description or to the Methods. After this "cleanup", the manuscript will be much more legible. Response #2. Sections 4, 6 of the Background and section 4 of the Methods have been moved to Supplement 1 (Sections 1-3). Discussion in pp. [23][24] has been moved into the model description (Section 1 of the Methods). The model description in Section 1 of the Methods has now been extended and all model parameters grouped together. Detailed remarks Page 8. "Ann attempt is successful if the promoter is not occupied by a polymerase or factor." This seems unclear; the factors are supposed to attract polymerases (?) Response. Transcription factors (proteins) binding DNA close to a promoter often act as activators or repressors of transcription. Such factors are common in plastids and mitochondria. In the mitochondria of metazoa and some protozoa, the mTERF factor has an important function as both a terminator (not bound to the promoter) and an activator. Modeling predicts the intensities of mTERF binding and transcription initiation attempts for all promoters that unlikely can be measured directly in the experiment. Page 8, bottom. Define "contents". Response. The "relative RNA content" is replaced with «relative RNA concentration»: u ij is the RNA concentration of the j-th gene at i-th time point relative to the same concentration at the null time point, i.e., the ratio of two concentrations sampled during an experiment. Notably, the