### Confirmation of CIMP status

Based on observed patterns of hypermethylation, cell lines SW48 and RKO have been designated as CIMP cell lines in the literature [31]. The cell lines HT29 and HCT116 are thought to be non-CIMP cell lines [31]. We aimed to confirm these classifications in the context of our experiments. Hence, the baseline methylation level of the four cell lines was measured with respect to the 7 different sites: ALU, Line-1, APC1, RASSF2-1, HPP1, SFRP2, and MGMT. Results of these measurements are consistent with the previously established CIMP classification. The baseline methylation levels of HCT116 and HT29 are generally lower than those for SW48 and RKO.

### Observed patterns of *de novo*methylation

The cells were treated with 5-AZA for the purpose of demethylation. After treatment, we examined the temporal process of re-methylation with respect to the 7 sites. At several time-points post-treatment, we measured the percentage of methylation for each site for each cell line by quantitative pyrosequencing assays. The results of these measurements are presented in Figure 1. There, each of the 7 plots (a-g) corresponds to a different site, which is marked on the corresponding graph. The four lines on each graph correspond to the four cell lines, which are color-coded. The measured methylation level (as percentage) is plotted against time post-treatment. The baseline methylation levels for the cell lines are presented by horizontal lines of the corresponding color.

We can see that qualitatively, the majority of remethylation curves start at a level lower than their base methylation level, and then climb up with visibly different slopes. Some of the curves reach saturation (which interestingly may or may not coincide with the measured baseline methylation level), while others continue to climb during the whole duration of the experiment. Another noticeable feature of some of the curves is the presence of a time-delay. In fact, the majority of the experimental runs can be assigned to one of two different groups, as shown in Figure 1. In one group, the methylation level starts climbing up immediately upon the cessation of 5-AZA treatment. An example of such behavior is exhibited by line SW48 (the green line in Figure 1(a)). In the other group, there is a certain time-delay between the cessation of 5-AZA treatment and the point where the methylation process picks up. For example, the blue line (HCT116) in Figure 1(a) climbs up between time-points approximately 10 and 25. For the first 10 days post 5-AZA treatment, the process of remethylation is slow to gain momentum.

### Quantifying the *de novo*methylation kinetics

In order to quantify these data and to extract information about methylation rates, we performed fitting of these data with the function *f*(*t*) = *y*_{0} + *y*_{1} tanh(*mt* − *b*). This function is characterized by 4 parameters. Parameter *m* (days^{-1}) measures the methylation rate; the dimensionless parameter *b* characterizes the time-delay until the *de novo* methylation process starts to gain momentum, as described above; parameters *y*_{
0
} and *y*_{
1
} are related to the initial and target methylatin level, with *y*_{0} − *y*_{1} tanh(*b*) being the initial methylation level (after the 5-AZA treatment), and *y*_{0} + *y*_{1} the target methylation level. The choice of this functional form was dictated by the patterns seen in the methylation time-series and described above. Function *f(t)* is one of many possible functions capable of accounting for the two types of methylation patterns, with and without delay. The exact mathematical form of this function is unimportant, as long as the following requirements are met: (a) the function has the ability to capture a climb (gradual or step-like) from one level to another level and (b) has parameters which can be extracted to characterize the slope of the climb (the *de novo* methylation rate) and the amount of time delay. In the function *f(t)* chosen here, parameter *m* measures the *de novo* methylation rate, and parameter *b* characterizes the time-delay.

Figure 1 presents the results of fitting the function *f(t)* to the data, which for each case is given by a dashed line of the appropriate color. Out of the total of 4x7 =28 experimental runs, we were able to successfully fit 18: these include all the cell-lines in the context of ALU, LINE-1, HPP1, and SFRP2, as well as lines HT29 and SW48 in the context of RASSF2-1. We will refer to these experimental runs as “successful runs”. For gene APC1, the base methylation levels were too low to be included in the anlaysis. For gene RASSF2 − 1, cell lines HCT116 and RKO failed to re-methylate upon 5-AZA treatment (the measured methylation levels stayed more or less constant for RKO, and they actually showed a decline for HCT116). For gene MGMT, lines RKO and HT29 did not have a significant base methylation level, and lines SW48 and HCT116 did not exhibit a significant climb which would allow us to assess the methylation rate with confidence (in the case of SW48, the 5-AZA treatment did not lead to a significant decrease of the methylation level compared to the base level, and in the case of HCT116, the pattern of methylation was erratic leading to the failure of the fitting procedure). Therefore, we will proceed with the analysis of the data from the 18 successful runs.

Fitting the function *f(t)* to the data yielded the numerical values of the parameters *m* (remethyation rate) and *b* (the dimensionless methylation onset). We will first focus on the remethylation rate. The parameter *m* characterizes the “steepness” of the slope of the methylation time-series in the regions where *de novo* methylation process takes place (and it does not capture other aspects of the methylation process). Figure 2(a) presents a scatter plot of the measured re-methylation rates versus the baseline methylation level for the 18 experients. We performed a linear regression analysis of this scatter plot and determined that the methylation rate negatively correlates with the base methylation level, with the Pearson rank correlation of −0.499 and the Spearman rank correlation of −0.574 (for the sample size of n = 18). This correlation is significant, with the *p*-value of 0.035 (or *p* = 0.013 if using the Student’s t-distribution with the Spearman rank correlation). This surprising result suggests that the rate at which the methylation level climbs up after 5-AZA treatment is the lowest for the cell lines with the highest base-level of methylation. This can be clearly seen in Figure 3(a), where we plot the mean average methylation rate for the four cell lines against their mean base methylation level. In the figure, we group the four cell-lines investigated into two groups: lines HT29 and HCT116 comprise the non-CIMP group, while lines SW48 and RKO are classified as CIMP. This grouping is consistent with the characterization these cell lines received in the literature [31], and also corresponds to the higher base methylation levels for the CIMP cells. We can see that lines HT29 and HCT116 (which exhibit lower base methylation levels compared to their CIMP counterparts) are characterized by higher *de novo* methylation rates. Thus, CIMP cell lines show slower rate of re-methylation, while non-CIMP cell lines show faster rates of re-methylation.

To examine this phenomenon more closely, we performed further data analysis. While parameter *m* only provides information on how steeply *de novo* methylation curves climb up, it does not reflect the presence or the absence of a time-delay in the onset of the re-methylation process. To quantify these differences in a systematic way, we look at parameter *b*, which yields the dimensionless onset for remethylation. It turns out that there is a significant negative correlation between the base methylation level, and the onset parameter, *b* (the *p*-value is 0.038), see Figure 2(b) and 3(b). There is also a very strong (with *p = 2x10*^{-6}) positive correlation between the re-methylation rate, *m*, and the onset parameter, *b* (not shown). This suggests that the *de novo* methylation rate and the onset parameter (as specified by the function *f*) vary together, and a better description of the observed phenomena is provided by a function \tilde{f}\left(t\right)={y}_{0}+{y}_{1}tanh\left(m\left(t-b\text{'}\right)\right), where parameter *b’ = b/m* measures the delay time in days. The time-delay *b’* does not show a significant correlation with the base methylation levels (not shown). The function \tilde{f}\left(t\right) assumes explicitly that re-methylation time-series that climb faster, tend to experience a longer delay between the 5-AZA treatment cessation and the onset of *de novo* methylation.

To summarize this analysis, we can say that the CIMP phenotypes, which are characterized by higher base methylation levels, tend to show a slower *de novo* methylation rate, but experience a steady climb in methylation levels which starts soon after the cessation of 5-AZA treatment. In contrast, non-CIMP cell lines with a lower base methylation level tend to exhibit a certain delay in re-gaining their methylation status, followed by a relatively rapid increase in methylation levels. Roughly speaking, non-CIMP phenotypes have a spurt a of relatively rapid methylation increase following a relatively long delay. The methylator phenotypes start increasing their methylation levels relatively quickly and steadily, albeit slowly.

To gain further insights, we also performed a systematic correlation analyses of several other characteristics (see Appendix), which did not reveal any further statistically significant correlations. We also tested the hypothesis that *de novo* methylation rates are correlated with cellular kinetic parameters of the dividing cell lines, such as their division and death rates. No significant patterns have been found.

### A mathematical model to explain observations

A key finding from our quantitative analysis was that CIMP cells tended to start the methylation process immediately upon cessation of 5-AZA treatment, but did so relatively slowly, while non-CIMP cells showed a relatively fast burst of re-methylation but only after a certain time delay following treatment cessation. The existence of a time delay before a phase of accelerated re-methylation could indicate the existence of a negative feedback loop in the regulation of the *de novo* methylation process. The basic idea is as follows. If methylation levels in the genome are around a certain homeostatic setpoint, *de novo* methylation ceases to occur due to negative feedback, and only maintenance methylation takes place. On the other hand, if the methylation levels are significantly reduced, e.g. following 5-AZA treatment, release of negative feedback induces appropriate DNA methyltransferase (MTase) activity. The process of activation typically is not instantaneous but requires the interactions among several factors, leading to a delay [32–34]. MTase activation leads to a burst of *de novo* methylation, which is shut down again through negative feedback as methylation levels in the genome recover. On the other hand, it can be hypothesized that in CIMP cells this feedback regulatory mechanism is corrupted and the appropriate MTases are constantly active at a relatively low level, leading to slow but continuous *de novo* methylation, as reported experimentally [29, 30]. According to this scenario, re-methylation of CIMP cells would commence without a delay following 5-AZA treatment, and would proceed with a relatively slow rate because of the continuous activity of MTase. The exact MTase responsible for CpG island *de novo* methylation in cancer cells is debated. In the context of non-cancerous cells, it is thought that DNMT1 contributes mainly to maintenance methylation, while *de novo* methylation activity is ascribed to DNMT3a and DNMT3b [35, 36]. Work in human cancer cell lines, however, demonstrated that DNMT1 can exhibit *de novo* methylation activity for CpG islands [37], and DNMT1 has been shown to be up-regulated in different tumor types [38–40].

To investigate whether this hypothesis is consistent with the observed experimental patterns, we construct a mathematical model of *de novo* methylation in the context of negative feedback regulation of MTases that are responsible for *de novo* methylation. The model takes into account the following variables. The methylation level of the loci in question is denoted by *x*. The methylation level of loci that drive the negative feedback is denoted by *w*. These remain hypothetical loci for now. The molecular processes that regulate *de novo* methylation are not well understood [33, 41], although feedback mechanisms have been implicated in the regulation of DNMT1 [42]. It has been suggested that the methylation status of regulatory elements of DNMT1 determines the activity of this MTase, which could result in negative feedback. This element has been shown to be highly methylated in somatic tissues, and unmethylated in a mouse adrenal carcinoma cell line [42], consistent with the notions explored here. For simplicity, we will refer to such regulatory elements as “feedback sensors”, without assuming a particular mechanism that underlies feedback. Our model generally assumes a “sensor” that reduces and shuts down MTase activity if methylation levels in the genome rise towards some homeostatic level. The model can be adjusted as specific biological information becomes available. MTase activity that is required for *de novo* methylation is denoted by *y*_{
n
}. It is assumed that activation of MTase requires the interactions of different signaling components, which are denoted by *y*_{
i
}, where *i = 1…n-1*. The model is given by the following set of ordinary differential equations, which describe the time-evolution of these variables.

\begin{array}{l}\frac{\mathit{dx}}{\mathit{dt}}=\lambda {y}_{n}\left(1-\frac{x}{{k}_{1}}\right)-{a}_{1}x,\\ \frac{\mathit{dw}}{\mathit{dt}}=\gamma {y}_{n}\left(1-\frac{w}{{k}_{2}}\right)-{a}_{2}w,\\ \frac{d{y}_{1}}{\mathit{dt}}={\eta}_{\mathit{prod}}-q{y}_{1},\\ \frac{d{y}_{i}}{\mathit{dt}}=q{y}_{i-1}-q{y}_{i},\begin{array}{}\end{array}1<i<n,\\ \frac{d{y}_{n}}{\mathit{dt}}=q{y}_{n-1}-b{y}_{n}.\end{array}

The locus in question is methylated in the presence of active MTase (*y*_{
n
}) with a rate *λ*. It is assumed that *k*_{
1
} methyl groups can be added. During 5-AZA treatment, de-methylation occurs with a rate *a*_{
1
}. Similarly, *de novo* methylation of feedback sensors occurs with a rate *γ* in the presence of MTase, *y*_{
n
}, and 5-AZA treatment causes de-methylation of these loci with a rate *a*_{
2
}. The maximum methylation level of feedback sensors is given by *k*_{
2
}. Activation of MTase, *y*_{
n
}, occurs via a signaling cascade, *y*_{
i
}. Regulation occurs in the first element of this signaling cascade, *y*_{
1
}, which is produced with a rate *η*_{
prod
}. Details of this production term depend on the nature of the cell line. For non-CIMP cells, we assume the presence of negative feedback. Thus, if the methylation levels of feedback sensors lie below a threshold, *c*, production occurs with a rate *η(c-w)*. If the methylation level of feedback sensors rises above this threshold, the rate of production is set to zero. On the other hand, for CIMP cells, it is assumed that production of *y*_{
1
} occurs at a constant rate *η*. Once *y*_{
1
} is produced, it induces MTase activity through interactions with elements of the signaling cascade, *y*_{
i
}. Finally, MTase activity decays with a rate *b*.

We will concentrate on the parameter regime where the MTase activity is relatively short-lived in the absence of the activation signals in the model. That is, the parameter *b* is sufficiently large. This ensures that when the activation signal is switched off, *de novo* methylation ceases without significant delay. Model properties will be described first for non-CIMP cells and then for CIMP cells under this assumption.

For non-CIMP cells, the methylation of feedback sensors, *w*, always rises towards a level given by the parameter *c*, after which the negative feedback kicks in and *de novo* methylation stops. Hence, the methylation of *w* remains constant at this level. On the other hand, the degree to which individual loci become methylated before *de novo* methylation is shut down by negative feedback depends on initial conditions, as explained below. Figure 4a has been generated by starting the computer simulation with a completely un-methylated cell and allowing the genome to become methylated. In this case, the individual methylation of locus *x* stabilizes around 75%. This stable state is shown in Figure 4a before the start of 5-AZA treatment. Then, 5-AZA treatment is initiated in the simulation and maintained for 72 hours, after which treatment stops. Following the simulated 5-AZA treatment, both the site of interest, and the feedback sensors, become demethylated. However, the feedback sensors retain a higher level of methylation. This is compatible with the notion that different sites in the genome de-methylate at different rates in response to 5-AZA treatment. After cessation of 5-AZA treatment, after a certain time delay, active MTase levels rise and *de novo* methylation increases at a relatively fast rate, as seen in the experimental data. Over time, the methylation level stabilizes. In the simulation of Figure 4a, it stabilizes at a lower level compared to the base methylation level, as observed in some of the experimental patterns described here. This is a consequence of the assumption that the methylation level of feedback sensors, *w*, was reduced to a lesser degree than the methylation level of the locus under consideration, *x*. Therefore, upon re-methylation, feedback sensors reach their homeostatic set-point and shut down MTase activity before methylation of the locus under consideration has reached its pre-treatment base level. In general, the degree to which given loci become re-methylated following 5-AZA treatment depends on the exact state of the cell after treatment is complete. It can become less methylated, or it can achieve the same amount of methylation seen before treatment. Restoration of pre-treatment methylation levels is likely either if the amount of methylation of feedback sensors is reduced more during 5-AZA treatment, or of the loci in question become de-methylated to a lesser extend during treatment.

For CIMP cells, different dynamics are observed (Figure 4b). Before 5-AZA treatment, the methylation level of the locus *x* is not stable but rises at a slow rate. This is the consequence of the assumption that MTase activity is constantly on at relatively low levels and that feedback regulation is corrupted. As before, the simulation assumes 5-AZA treatment for 72 hours. Re-methylation commences instantly and occurs at a relatively slow rate, as observed in the experimental data. Again, the reason is that MTase activity is constantly on. Thus, after de-methylation, it does not have to be activated and hence re-methylation starts immediately. Similarly, because feedback regulation is corrupted, low levels of methylation do not induce a sharp rise in the methylation rate.

In summary, the mathematical model reproduces the key phenomena found in the data: Following 5-AZA treatment, non-CIMP cells re-methylate with a faster rate following a certain time delay, while CIMP cells start re-methylation immediately, although at a slower rate. Moreover, in agreement with the data, the model predicts that in non-CIMP cells, individual loci can re-methylate to levels that are lower than those found before 5-AZA treatment.