### A generic phase diagram for replicating molecules

In this section, we briefly summarize our RNA polymerization model [26] and show how the phase diagram is obtained in this case. We then introduce a very simple generic model for replicating molecules and show that the phase diagram is equivalent.

In the RNA system, we suppose that precursor “food” molecules are available in the environment at concentrations *F*_{
1
} and *F*_{
2
}. We suppose that monomers, denoted by *A*, can be synthesized from *F*_{1,} with rate constant *s*. These monomers can react with *F*_{
2
} to produce activated monomers, *A**, with rate constant *a*. RNA polymers of length *n* are denoted *A*_{
n
}. An activated monomer can react with a polymer to extend its length by 1 with rate constant *r*. All molecules can escape from the system at a rate *u*. If the system is well-mixed, it can be described by the following ordinary differential equations:

\frac{dA}{dt}=s{F}_{1}-a{F}_{2}A-rA{A}^{*}-uA

(1)

\frac{d{A}_{n}}{dt}=r{A}^{*}\left({A}_{n-1}-{A}_{n}\right)-u{A}_{n}

(2)

\frac{d{A}^{*}}{dt}=a{F}_{2}A-r{A}^{*}\left(A+P\right)-u{A}^{*}

(3)

In Equation 3, *P* is the total polymer concentration of all lengths: P={\displaystyle {\sum}_{n\ge 2}{A}_{n}}. We assume that there is a minimum length *m* above which polymers have the possibility to act as catalysts for the polymerization step. The concentration of polymers of at least length *m* is {P}_{m}={\displaystyle {\sum}_{n\ge m}{A}_{n}}. The polymerization rate constant is

r={r}_{0}+k{P}_{m}\text{,}

(4)

where *r*_{0} is a small spontaneous polymerization rate and *kP*_{
m
} is the term due to ribozyme catalysts. The constant *k* is the catalytic efficiency of the long polymers, which may be written as *k = k*_{
0
}*f*, where *f* is the fraction of the long polymers that function as catalysts and *k*_{
0
} is the catalytic efficiency of a functional catalyst.

We refer to Eqn. 4 as the feedback equation, because it says that the presence of catalysts feeds back into the polymerization rate and increases the rate of formation of more catalysts. The stationary states of the molecular concentrations can be obtained from Equations 14 using the methods described previously [24]. The key point is that there is a dead (or non-living) state in which *P*_{
m
} is very small and *r ≈ r*_{
0
}, so that polymerization occurs at the spontaneous rate, and there is a living state in which *kP*_{
m
}*>> r*_{
0
}, so that polymerization is autocatalytic and occurs at a much higher rate than the spontaneous rate. Figure 1 shows a phase diagram as a function of *k* and *r*_{
0.
} We are most interested in the region where both states are stable. The bistable region occurs if *r*_{
0
} is fairly small and *k* is fairly large. If *k* is too small, only the dead state is stable, and if *k* is too large, only the living state is stable. Furthermore, if *r*_{
0
} is large then there is only one stable state for all values of *k*, and this changes smoothly from dead to living as *k* is increased across the dotted line in Figure 1. An example of this phase diagram for different parameter values has previously been given [26]. We have also considered the case where feedback is in the monomer synthesis rate [27] and have extended the model to deal with the possibility of chiral monomers [28]. Bistable regions were also found in these cases. A bistable region equivalent to ours was observed in related models for autocatalytic polymers [29–31], although no explicit phase diagram was drawn. When designing our original model, we were aware of the toy model for the origin of life by Dyson [32]. This model is described in a very abstract way with no explicit chemical reaction equations. Nevertheless, the central ideas of living and dead stable states and a stochastic transition between these states exist in this model, and the phase diagram of Dyson’s model also has a bistable region similar to ours.

Having seen that the kind of phase diagram in Figure 1 occurs in many different models, we now wish to introduce the simplest model we can that has the same phase diagram. The simplified model will then be useful to investigate the dynamic properties of the stochastic transition to life in the following sections of this paper.

We consider a single type of replicating molecule whose concentration is *ϕ*. Replicators can appear at rate *s*, representing a slow rate of spontaneous synthesis by random polymerization. This process is independent of the replicator concentration. Existing replicators may be copied with a rate constant *r*, representing a process of non-living template-directed synthesis. This process is proportional to the current replication concentration, process is proportional to the current replication *ϕ*. Replicators may also act as polymerases that catalyze replication by using another replicator as a template. The rate constant for this catalytic process is *k* and this process is proportional to the square of the current concentration, *ϕ*^{2}. The increase in replicator concentration is limited by finite resources. The simplest way to model this is to assume a finite carrying capacity of the system, corresponding to a concentration *ϕ* = 1, and to multiply all the growth rates by a factor (1-*ϕ*). Finally, replicators die (or are destroyed or removed from the system) at a rate *u*. The deterministic dynamical equation for *ϕ* is:

\frac{d\varphi}{dt}=\left(s+r\varphi +k{\varphi}^{2}\right)\left(1-\varphi \right)-u\varphi

(5)

In many models of population dynamics in evolution and ecology, the linear growth term, *rϕ*, is natural, and the spontaneous term, *s*, and nonlinear term, *kϕ*^{2} are not considered. However, when considering the origin of life, the spontaneous term is essential for generation of the initial replicators, and the nonlinear term is essential in order to give the possibility of two stable states. The linear term is less important; therefore we will first consider the case where *r* = 0. The stationary states are the roots of this cubic equation. If *s* is small and *k* is fairly large, there are two stable states, *ϕ←*_{
1
} and *ϕ*_{
2
}, separated by an unstable state *ϕ*_{
3
}. The lower stable state is a dead state, controlled by the balance between spontaneous generation and death: *ϕ∼s*/*u*. The upper stable state is a living state controlled by the catalytic term *k*. If *k* is large, the concentration will approach the carrying capacity, *ϕ*_{
2
}*∼*1. It is easy to obtain phase diagram for existence of dead and living states as in Figure 2. This has the same regions as Figure 1. If the linear growth term *r,* is non-zero, the positions of the phase boundaries move, but shape of the diagram is qualitatively unchanged (also shown in Figure 2). We therefore argue that this kind of phase diagram is a generic feature of replicator models for the origin of life, and that Equation (5) is the simplest possible dynamical equation that has this phase diagram.

### The two’s company model

The main aim of this paper is to consider the influence of spatial concentration fluctuations and limited diffusion rate on the stochastic transition that leads to life. Therefore, we will now define a spatial lattice model, which we call the Two’s Company Model, that is a spatial version of Eqn. 5, and reduces to Eqn. 5 when the system is well mixed.

We consider a 2D square lattice of *L×L* sites. The number of molecules on any one site may be *n* = 0, 1, 2 or 3 only. The carrying capacity of the whole lattice is therefore 3*L*^{2}. If there are *N* molecules on the lattice, the mean concentration relative to the carrying capacity is *ϕ = N/*(3*L*^{2}). There are 3-*n* vacancies on a site with *n* molecules. Vacancies are treated as resources, and the rates of spontaneous, linear, and replicative growth are all proportional to the number of vacancies. The rate of adding a molecule by the spontaneous reaction is defined as *s* times the number of resources, *i.e. s*(3-*n*). The rate of adding a molecule by the linear growth process is defined as *r*/2 times the number of molecules times the number of resources, *i.e. rn*(3 − *n*)/2. This is equal to *r* when *n* = 1 or 2, and zero otherwise. The rate of catalytic replication is defined as *k/2* times the number of ways of picking a replicator-template pair times the number of resources, *i.e. n*(*n* − 1)(3 − *n*)*k*/2. This is *k* when *n* = 2 and zero otherwise. The replication process in the model can only occur when *n* = 2. “Two’s company, three’s a crowd”; hence the name of the model. Combining the three birth processes, we obtain the total birth rate of molecules on a site with *n* molecules, as summarized in Figure 3. The death rate of a molecule on a site with *n* molecules is *un* (i.e. *u* per molecule).

Hopping of molecules between sites is implemented using either local or a global hopping rules in the following way. Each molecule attempts to hop at rate *h*. A destination site is chosen for the molecule. In the case of local hopping rules, the destination is chosen at random from the 8 neighbouring sites of the original site. In the case of global hopping rules, the destination site is chosen at random from all the other sites on the lattice. The molecule hops successfully to the destination site if it finds a vacancy there, *i.e.* the hop is successful with probability 1-*n*/3 (as shown in Figure 3). If the hop is unsuccessful, the molecule remains on its original site. Further details of the stochastic dynamics used to implement the model are given in the Methods section.

The local hopping rules simulate local molecular diffusion, which is an important part of this model. The global hopping rules are intended as a comparison that helps to illustrate the importance of local hopping. The introduction of local hopping rules means that correlations exist between the numbers of molecules on neighbouring sites. However, if *h* is sufficiently large, the system reaches a well-mixed limit where molecules are randomly positioned over the whole lattice and there is no remaining correlation between sites. In the case of global hopping, there is no correlation between sites, even when *h* is small. Below, we will discuss the mean field approximation, which is exact for the global hopping dynamics but it is only an approximation for the model with local hopping dynamics. In the Methods section, we show that when *h* is very large, both global and local hopping models reach the same well-mixed limit, which corresponds to the generic replicator model in Equation 5.

Simulations are initiated in the dead state and followed until a transition to the living state occurs. The concentration in the dead state should be close to the stable solution *ϕ*_{
1
} of Equation 5. In order to initiate the stochastic simulation in the dead state, each lattice site is seeded with molecular numbers sampled from a binomial distribution with average concentration *ϕ*_{
1
}. In a typical simulation, the system remains in the dead state for a long time until a localized patch of high concentration arises that is sufficiently large to be stable in the living state. The origin of this living patch is a rare stochastic event. However, once it is formed, the living patch then spreads deterministically across the whole lattice.

A snapshot of a simulation shortly after the transition to life is shown in Figure 4. Molecules that were synthesized by the spontaneous process are coloured grey, and molecules that were synthesized by the catalytic process are coloured red. These colours are used for illustration, but all the molecules have identical properties. In Figure 4, a living patch has arisen in one region. Within this region, there is a high concentration (most sites have *n =* 2 or 3), and most molecules are red, indicating that this patch is sustained by catalytic replication. The rest of the lattice is still in the dead state, where there is a low concentration (most sites have *n =* 0 or 1), and most molecules are grey. Small clusters of red molecules are nevertheless visible within the dead region. These clusters appear and disappear rapidly. The origin of life occurs when a cluster of this kind becomes large enough to be stable and to continue to grow. If the spontaneous rate *s* is suddenly switched to zero in the configuration shown in Figure 4, then the isolated molecules in the dead state rapidly disappear, but the living patch remains virtually unaffected and continues to spread. The spontaneous process is not necessary to sustain the living state because the living state is autocatalytic. Of course, the spontaneous rate must be non-zero in order to allow the transition to the living state in the first place.

We performed many simulations of this model in order to measure the way the time taken for the transition depends on the parameters. We define *T*_{
sys
} = *T*_{
reg
} + *T*_{
spread
}, where *T*_{
reg
} is the time until a local transition occurs in any one region of the lattice, and *T*_{
spread
} is the time for the living state to spread from one region across the rest of the lattice, and *T*_{
sys
} is the time required for the full lattice to reach the living state. These times were measured as described in the Methods section.

Figure 5 shows the average *T*_{
sys
} as *h* is varied with a fixed system size. A comparison of the transition times with local and global hopping rules is shown for a small lattice with *L* = 10. For large *h* the measured times for local and global hopping converge, showing that the system is well mixed. For smaller *h*, the transition time with local hopping is always shorter than that with global hopping. This shows that the local concentration fluctuations that arise in the local hopping model make the transition much easier. It can be seen that if *h* is very small, the transition time becomes very long, both for local and global hopping. It should be remembered that the spontaneous generation rate *s* is very small, so the concentration of molecules in the dead state is very small. For the catalytic process to occur, it requires two molecules on the same site. If *h* is too small, molecules do not encounter one another frequently. Furthermore, if molecules do encounter one another and a replication occurs, there will now be three molecules on one site, which blocks further replication. It is necessary for one of these molecules to hop away before a second replication can occur; hence, *h* should not be too small. It can be seen that, for the local hopping rules, there is an optimum diffusion rate at which the transition is fastest. This effect is not very strong in the small lattice (*L* = 10), but is very pronounced in the larger lattice (*L =* 100) also shown in Figure 5. The transition time decreases by several orders of magnitude in the middle of the range of *h*. It is not possible to show the transition time for the global hopping case for the larger lattice size because it is too long to measure accurately in the simulation with these parameters. In summary, Figure 5 shows that local diffusion is essential in order to allow the transition to life to occur at a reasonable rate in large systems, and when diffusion is local, there is an optimal intermediate rate of diffusion.

Figure 6 shows the variation of transition time with system size when the hopping rate is kept fixed. For the global hopping case, it is only possible to do these simulations for fairly small lattice sizes, because the transition time increases very rapidly with *L*. For the local hopping case, it is possible to carry out simulations over a much wider range of *L*, and it can be seen that the transition time actually decreases with *L*, if *L* is sufficiently large.

As shown in left part of Figure 6, when the system is small, it is more or less well mixed. Hence the transition will happen globally across the whole system at the same time. When the system is larger, it is no longer homogeneous. The transition will happen at a local region first, then it will spread out to the whole system. The number of independent local regions should be proportional to *L*^{2}. The average time taken for the transition to occur in any one of these regions should vary inversely with the number of regions, *i.e. T*_{
reg
}*~ 1/L*^{2}*.* If the spread of the living state is rapid, the spreading time, *T*_{
spread
} is small compared with the time taken for the transition to occur, *T*_{
reg
}. In this case the time taken for the whole system to go through the transition is close to *T*_{
reg
}. It can be seen that there is an intermediate range of *L* in Figure 6, where *T*_{
reg
} and *T*_{
sys
} are very close to one another and where both scale as 1/*L*^{2}, as expected from this argument. Comparison of the results with *h* = 24 and *h =* 45 shows that when diffusion is slower, the transition from homogeneous system to heterogeneous system happens for a smaller system size.

If the system size is extremely large, there comes a point where *T*_{
spread
} is comparable to *T*_{
reg
}, and the curves for *T*_{
sys
} and *T*_{
reg
} separate in Figure 6. At this point it is possible for the transition to life to occur in a second region independently before the living state from the first transition has spread across the lattice. Beyond this point, *T*_{
reg
} continues to decrease, but *T*_{
sys
} reaches a constant value, which is what we expect if there are multiple origins happening in different places.

A useful way to understand the effects of the limited diffusion rate in this model is to calculate the mean field solution. In the Methods section, we calculate the probability *P*_{
n
} that there are *n* molecules on a site, using a mean field approximation that ignores correlations between neighbouring sites. The mean field solution depends on *h* and it is not equivalent to the well-mixed case, except for the limit of large *h*, where both the lattice model and the mean field approximation converge to the well-mixed case. We expect that the mean field approximation will be fairly good for the local hopping model if *h* is large, because correlations between neighbouring sites should be small in this limit. For the global hopping model, the mean field solution is exact, even for small *h*.

In Figure 7, we compare the solutions of the mean field equations with dynamical simulations. The black curve is the homogeneous solution from the roots of the cubic Equation 5. This is the well mixed limit of the spatial model. The solid black lines illustrate the stable living and dead states and the dashed line is the unstable intermediate state. The red, green and blue lines show the stable and unstable solutions of the mean field equations with three values of *h*. For large *h*, the mean field solution is close to the well-mixed limit, as expected. The symbols show average concentrations of molecules obtained from simulations of the lattice model with local hopping. For the living state, the mean field theory gives a fairly good approximation to the result with local hopping, even for the smaller values of *h*. This is because the density is quite high across the whole lattice in the living state and correlations between neighbouring sites are quite weak. On the other hand, there are large deviations between the mean field theory and the results of the local hopping model in the dead state when it is close to the bistable region, which shows that there are important local spatial correlations in this case.

Figure 7 illustrates that reduction in the local diffusion rate has both positive and negative effects on the replication process. As *h* is reduced, the concentration of the living state is reduced because there is more local interference between molecules on fully occupied sites. Also the minimum value of *k* required for the living state to be stable increases as *h* decreases. Thus, rapid diffusion is favourable for the living state, if the living state is already established. However, we have already seen in Figure 5 that high diffusion rates cause the system to be well mixed, in which case the transition time is extremely long. It is therefore important to have a diffusion rate that is not too large in order for the transition to the living state to occur at an appreciable rate in the first place.

### A spatial model for RNA polymerization

The Two’s Company model is the simplest model that can be used to study the stochastic transition to life in a spatially distributed system. However, we wish to show that the same behaviour occurs in other models. We will return to the model of RNA polymerization described in Equations 1–4 in order to study the effects of diffusion and spatial concentration fluctuations in this case.

The simulation is carried out on 2D square lattice composed of *L* × *L* sites, with each site defined to have a size *l* = 1. We keep track of how many molecules of each kind there are on each site. The reactions of monomer synthesis, activation, and polymerization all occur locally on individual lattice sites. The implementation of stochastic dynamics for this model using the Gillespie algorithm is described in our previous paper [26] for a non-spatial model. In the spatial case, the same method is used to define reaction rates on each site as a function of the numbers of molecules of each type on each site. For simplicity, molecules of all kinds are assumed to hop with the same rate *h*. Hopping may either be local or global, as with the Two’s Company model. However, there is no need to impose a maximum number of molecules per site, because concentration is limited by food input rather than by a carrying capacity. Therefore every attempted hop is successful, and hops are not blocked by the molecules on the destination site, as they are for the Two’s Company model.

Figure 8 shows snapshots of the distribution of catalytic polymers across the lattice. Three time points are chosen in order to illustrate the way the catalyst concentration increases over time. As shown in first row of Figure 8, when system is small (*L*=30), the system is fairly well mixed. Hence the transition happens across the whole system at the same time. When system is larger (*L*=100), the system is no longer homogeneous. The transition happens at a local region first, then it spreads out to the whole system as shown in second row of Figure 8. When the system is very large (*L*=1000), the time it takes to spread across the whole system is longer than the regional transition time. Hence, multiple origins occur, as in third row of Figure 8.

When the transition occurs, the increase in the catalyst concentration is accompanied by a sharp decrease in the activated monomer concentration. As there are many more activated monomers than catalysts, the activated monomer concentration varies more smoothly than the catalyst concentration. Therefore, we used the decrease in the activated monomer concentration as a marker for the transition time. This concentration was averaged over a sliding time interval, and this was used to define regional and system transition times, in the same way as for the Two’s Company model (see Methods section).

The transition time is shown as function of system size in Figure 9. For global hopping, the transition time follows a U-shaped curve, for the same reasons as for the non-spatial model of RNA polymerization (Figure seven of [26]). If the system is too small, polymers long enough to be catalysts do not easily arise, and if the system is too big, the concentration fluctuations are too small in the global case to allow the stochastic transition to occur. In contrast, when hopping is local, the transition occurs easily for large system sizes. When the lattice is very small, the time for the transition with local hopping is close to that with global hopping, but when the lattice size is larger, the transition is much faster with local hopping. The transition time follows the same shape curve for this model as for the Two’s Company model in Figure 6. There is an intermediate range where the transition time scales as 1/*L*^{2} and where *T*_{
sys
}*= T*_{
reg
}, and there is a range for very large lattice size where multiple origins occur and where *T*_{
sys
}*> T*_{
reg
}.