Epigenetic hereditary transcription profiles III, evidence for an epigenetic network resulting in gender, tissue and age-specific variation in overall transcription

Background We have previously shown that deviations from the average transcription profile of a group of functionally related genes are not only heritable, but also demonstrate specific patterns associated with age, gender and differentiation, thereby implicating genome-wide nuclear programming as the cause. To determine whether these results could be reproduced, a different micro-array database (obtained from two types of muscle tissue, derived from 81 human donors aged between 16 and 89 years) was studied.
Results This new database also revealed the existence of age, gender and tissue-specific features in a small group of functionally related genes. In order to further analyze this phenomenon, a method was developed for quantifying the contribution of different factors to the variability in gene expression, and for generating a database limited to residual values reflecting constitutional differences between individuals. These constitutional differences, presumably epigenetic in origin, contribute about 50% of the observed residual variance, which is connected with a network of interrelated changes in gene expression, with some genes displaying a decrease or increase in residual variation with age.
Conclusion Epigenetic variation in gene expression without a clear concomitant relation to gene function appears to be a widespread phenomenon. This variation is connected with interactions between genes, is gender and tissue specific and is related to cellular aging. This finding, together with the method developed for analysis, might contribute to the elucidation of the role of nuclear programming in differentiation, aging and carcinogenesis.
Reviewers This article was reviewed by Thiago M. Venancio (nominated by Aravind Iyer), Hua Li (nominated by Arcady Mushegian) and Arcady Mushegian and J.P.de Magelhaes (nominated by G. Church).


Background
The phenomenon of nuclear programming, i.e. the persistent epigenetic system that controls and maintains the differentiated state, has been clearly demonstrated by the ability to clone animals via transfer of somatic cell nuclei into enucleated egg cells. Elucidating the process which underlies this important epigenetic mechanism is crucial for reprogramming somatic cells into pluripotent cells for biomedical and agricultural applications and invaluable in biological research on development, aging and disease, including cancer [1].
In order to understand the mechanism of epigenetic nuclear programming it is generally supposed that there exists a genome-wide complex network of higher-order gene regulation that controls global gene expression and, thereby, the cell's identity. At present, the methodology being used to help reveal this network of interacting genes involves studying the processes assumed to be of importance (e.g. nuclear architecture, chromatin structure, regulatory elements, DNA methylation, histone modification, genomic imprinting, etc.) and their connections with genome-wide transcriptome and proteome analysis.
This approach, however, is only in its infancy, and there exists a need for additional data analysis strategies in order to obtain an integrated picture of nuclear programming.
The recent, serendipitous discovery of "epigenetic hereditable transcription profiles" suggests such an additional strategy [2,3], which will in the sequel be referred to as EPIGENE (Epigenetic Programs in GENe Expression). The EPIGENE method does not try to disentangle a large complex network of interacting genes in gene regulation; instead, it studies the variation in expression in a small group of functionally related genes.
To date, only the genes coding for a relatively simple cell organelle, the proteasome, have been studied. Surprisingly, a wide variation in the expression profiles of the genes between individual libraries was observed. This variation appears to lack a corresponding biological role, since the structure of the proteasome is, in all probability, the same in all individuals and tissues.
The main observations obtained using EPIGENE were that:
- the deviations from the mean transcription profile for this set of genes in a database display a network of interrelated variations in expression of the genes
- such a specific network is hereditary since it can be transmitted to daughter cells
- different tissues demonstrate specific profiles
- gender-specific deviations from the mean transcription profile are present
- aging is correlated with specific changes in the transcription profiles
- tissue samples demonstrating disturbed profiles are observed.
These data suggest that epigenetic programming is essential in the expression of the majority of the genes and can therefore be studied in any subset of functionally related genes. Furthermore, they indicate that the specificity in these expression profiles occurs autonomously rather than by ongoing regulatory signals.
Confirmation of these observations will validate EPIGENE as a useful strategy for studying several aspects of nuclear programming. In all probability, cell lineage trees could be identified by comparisons of transcription profiles. Such lineage trees might, for example, monitor foetal development or stages in carcinogenesis. The method might also prove useful in monitoring the reprogramming of somatic cells into pluripotent descendants and in identification of epigenetic abnormality.
The observations outlined above were obtained using a database derived from a large number of different tissues originating from different donors. The present investigation studies EPIGENE in a database derived only from muscle with the specific aims of:
- validating and expanding results obtained from previous studies
- improving EPIGENE analysis
- identifying sources of variation in expression profiles and quantifying their contribution to the total variation
For the present study, the transcription profile of the genes that code for the 26S proteasome [4] was arbitrarily chosen as a model system. Since a cellular organelle has a well-defined structure, the genes are functionally related and it can be assumed that their expression is interconnected in such a way as to provide the correct amounts of the various components of the organelle. The characteristics of the genes and their location (as derived from Genatlas [5]) are given in Table 1.
The analysis proceeds as follows:
- comparison of the two subgroups of the database: abdominal muscle versus skeletal muscles of the extremities
- quantification of variation in expression and identification of contributing factors

Comparison of the two muscle types
To assess whether gender-related variation in transcript abundance exists, and whether males and females therefore have to be analysed separately, an ANCOVA was calculated for the M libraries. An indication of an interaction between age and gender was observed (P = 0,064). Although in the V libraries no significant interaction between gender and age was found (P = 0,236), a significant decrease with age was observed for the proteasome expression level in males (P = 0,0008) but not in females (P = 0,612).
Because of these differences, the database was also subdivided for gender, resulting in four groups (M females, M males, V females and V males) that were analysed separately.

Assessment of heterogeneity caused by proteasome expression level
The term "proteasome expression level" refers to the sum of all probe expression levels for a given library. The expression level showed a 2,6-fold and 2,5-fold variation in the M and V libraries respectively. A reasonable assumption as to the cause of this variation is that the proteasome expression level is related to the number of proteasomes in the cell. To remove this source of variation from the database, regression analysis was used to establish the relationship between the proteasome expression level and the expression of individual probes. These regressions were calculated for each of the four groups separately. To remove the contribution of proteasome expression level to the variability in expression, the residuals of these regression lines were added to or subtracted from the mean expression of the probes. These data, i.e. without the influence of proteasome expression level, are presented as Database B (Additional file 2).
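The correction described above can be illustrated with a minimal sketch. It assumes ordinary least-squares regression of each probe on the library-wise sum of all probe values, and that the corrected value equals the probe mean plus the regression residual. The function names and the toy numbers are hypothetical and are not taken from the additional files.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remove_level_effect(group):
    """group maps probe name -> list of expression values, one per library.

    Returns, per probe, the probe mean plus the residual of the regression
    of that probe on the proteasome expression level (assumed procedure).
    """
    probes = list(group)
    n_libs = len(group[probes[0]])
    # Proteasome expression level: sum of all probe values in a library.
    level = [sum(group[p][i] for p in probes) for i in range(n_libs)]
    corrected = {}
    for p in probes:
        slope, intercept = linear_fit(level, group[p])
        mean_p = sum(group[p]) / n_libs
        corrected[p] = [
            mean_p + (value - (intercept + slope * lv))
            for value, lv in zip(group[p], level)
        ]
    return corrected


# Hypothetical toy group: three libraries, two probes.
toy = {"probeA": [10.0, 20.0, 30.0], "probeB": [20.0, 40.0, 60.0]}
corrected = remove_level_effect(toy)
```

In this toy group both probes are exact linear functions of the level, so the residuals vanish and the corrected values collapse onto the probe means (20 and 40).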

Assessment of the contribution of diverse factors to the variability in expression
The expression levels of the individual probes differ widely, e.g. in M females the mean expression of PSMB1c is 35 whilst that of PSMA7a is 4787. Consequently, calculated variances are substantially influenced by the variability in expression of the individual probes. To eliminate this source of variability, the data for each of the four groups in Databases A and B (see Additional files 1 & 2) were transformed in such a way that the mean expression of each probe became 1000. This procedure removed about 90% to 95% of the variability in the data. The transformed data are presented as Databases AA & BB in Additional files 1 & 2. When the resulting variance in Database AA is set at 100%, the contribution of proteasome expression level ranges from 38,7% for M males to 54,2% for V males (Table 2A). This means that 54-60% of the variability remains unaccounted for. One contributing factor must be experimental error (noise), but since the database does not include replicates this source of variation cannot be directly determined. However, an approximation of the experimental error can be obtained by comparing the expressions of two libraries that show a strong correlation.
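As an illustration of the transformation to a common probe mean, the following sketch rescales each probe multiplicatively. The text does not state whether the transformation was multiplicative or additive, so the multiplicative form is an assumption, and the sample values (chosen to have means of 35 and 4787) are hypothetical.

```python
def rescale_probe(values, target=1000.0):
    """Rescale one probe so that its mean over the libraries equals target.

    Multiplicative rescaling is an assumption; the source only states that
    the mean expression of each probe became 1000.
    """
    mean = sum(values) / len(values)
    return [v * target / mean for v in values]


# Hypothetical values for a weakly and a strongly expressed probe.
low = rescale_probe([30.0, 35.0, 40.0])          # original mean 35
high = rescale_probe([4700.0, 4787.0, 4874.0])   # original mean 4787
```

After rescaling, both probes vary around 1000, so a variance computed over the whole group is no longer dominated by the strongly expressed probes.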
Within the group of M females the mean and standard deviation of the correlation coefficient are -0,035 and 0,477, therefore some libraries will be rather similar. The libraries M19 and M25 have a correlation coefficient of 0,985 and a highly significant regression (P = 2,31 × 10^-46, after Bonferroni correction P = 7,49 × 10^-43). This means that these two libraries are very much alike in expression and could approach the similarity of repeats. The variance for these two libraries, as determined from the variances of each individual probe, is 1302 with 61 degrees of freedom. This indicates that the experimental error is only 1,4% (Table 2A) and thus that over 50% of the variation in expression is due to other, presumably epigenetic, factors such as different tissue types (M and V libraries) and to constitutive differences between individuals (e.g. gender, age and genetic background). This interpretation is further supported by the comparison of the variances of the four subgroups, since significant differences were found for tissue type, gender and even race (Table 2B).

Influence of Age, comparison with previous results
An earlier paper [2] indicated that increase in age was correlated with a decrease in deviations from the mean transcription profile. These deviations were expressed in a deviation index, i.e. the standard deviation of all the deviations in a given library. In this paper this deviation index was calculated with Database BB (Additional file 2). Since Database BB only consists of deviations from the expected mean expression of 1000, calculation of the standard deviation of these deviations for a given library results in the deviation index for this library. To determine whether a similar age effect, as observed in the previous database, is seen with the new muscle database, deviation indices were calculated for all libraries in the four groups and were examined for any relationship to donor age. An ANCOVA for the M libraries showed that there is a gender-specific difference for the relationship between age and deviation index (P = 0,038). In the ANCOVA for the V libraries such a correlation was not observed (P = 0,960) although there was some indication of an age effect (P = 0,081). To visualize these effects, regressions relating age and deviation index for the four groups are shown in Figure 1. This figure suggests that the deviation index goes up with age for M males and goes down in the other three groups. Therefore the results obtained with the muscle database also suggest an influence of age on the deviation index that can be gender and tissue specific, thereby confirming previous findings obtained with a database comprised of a variety of normal tissues. In contrast with the previous paper, an increase in deviation index with increasing age was also observed in one of the groups (M males), which is a new observation.

Figure 1 Relationship between age and deviation index for A) females of M libraries, B) males of M libraries, C) females of V libraries and D) males of V libraries.
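The deviation index can be sketched as follows. The sketch assumes the sample standard deviation (n - 1 in the denominator); the text does not say which form was used, and the three-probe library is a hypothetical example.

```python
import statistics


def deviation_index(library_values, expected=1000.0):
    """Standard deviation of the deviations from the expected mean of 1000.

    library_values holds the transformed values of all probes in one library.
    The sample standard deviation is an assumption.
    """
    deviations = [v - expected for v in library_values]
    return statistics.stdev(deviations)


# Hypothetical library with three probes.
index = deviation_index([900.0, 1000.0, 1100.0])
```

For this toy library the deviations are -100, 0 and 100, which gives a deviation index of 100.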
The previous paper [2] also showed that the age-specific alteration in deviation index occurred mainly in a subset of the probes. In order to investigate this further using the muscle database, the probes were sorted according to their contribution to this age effect. To obtain an estimate of this contribution the relationship between age and residual errors was established for each probe.
The procedure, illustrated in Figure 2 for probe PSMB4a, is as follows: residual errors are obtained by subtracting 1000 from each data point in database BB. Half of the resulting residuals will be negative and are transformed to positive absolute values.
When the regression analysis between age and these absolute residual values for a specific probe is significant, then the probe in question has residual values which increase (or decrease) with age and will thus contribute to an age effect.
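A minimal sketch of this per-probe procedure is given below. It computes only the least-squares slope of the absolute residuals against age; the significance test of the slope is omitted, and the ages and expression values are hypothetical.

```python
def slope_of_abs_residuals(ages, values, expected=1000.0):
    """Least-squares slope of |value - expected| regressed on donor age."""
    abs_res = [abs(v - expected) for v in values]
    n = len(ages)
    mean_age = sum(ages) / n
    mean_res = sum(abs_res) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (r - mean_res) for a, r in zip(ages, abs_res))
    return sxy / sxx


# Hypothetical probe: the absolute residuals are 10, 20, 30 and 40.
ages = [20.0, 40.0, 60.0, 80.0]
values = [1010.0, 980.0, 1030.0, 960.0]
slope = slope_of_abs_residuals(ages, values)
```

A positive slope, as in this toy case (0.5 units per year), means the probe deviates more from the expected mean in older donors; a negative slope means the opposite.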
For M males, 18 of the 62 regression lines had a negative slope and 44 a positive slope. This difference from a 1:1 ratio is significant (P = 0,00096), indicating that for many probes there is an increase of the residual value with age. Eight of the positive slopes were significant (P < 0,05). Using the same procedure for all data, the regression lines between age and the absolute residual values were determined for each of the four groups, 62 regression lines per group. The four groups differed in numbers of regression lines with a positive or negative slope. While M females did not show a significant deviation from the expected 1:1 ratio, V males and V females did (P = 9,61 × 10^-8 and P = 2,29 × 10^-8 respectively), both with an excess of negative slopes (Table 3).
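The text does not name the test used for the deviation from a 1:1 ratio. As a hedged illustration, a chi-square goodness-of-fit test with one degree of freedom (an assumption) applied to the M male counts gives a P value close to the reported one. The function name is hypothetical.

```python
import math


def one_to_one_test(n_negative, n_positive):
    """Chi-square goodness-of-fit test against an expected 1:1 ratio.

    Returns the chi-square statistic and its P value for one degree of
    freedom, using P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    expected = (n_negative + n_positive) / 2
    chi2 = ((n_negative - expected) ** 2 + (n_positive - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts reported for M males: 18 negative and 44 positive slopes.
chi2, p_value = one_to_one_test(18, 44)
```

With these counts the statistic is about 10.9 and the P value is just under 0.001, in line with the value given above.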
Subsequently for each of the four groups (males and females from the M and V libraries) the probes were ranked in slope order from the most negative slope to the most positive and deviation indices were then calculated for subsets of the probes and their regression with age determined. Table 3 shows that the decrease or increase in deviation index with age depends on the subsets of probes used.
This analysis therefore broadly confirms our previous findings of the effects of age, gender and tissue type on the expression of the proteasomal genes.

Further analysis of the residual variation in expression
Since this analysis shows that the probes differ in their contribution to the age-related change in deviation index, the question arises to what extent the same probes in all four groups are involved in these age effects. To answer this question, the slopes of the 62 regression lines (regression of age with absolute residual values) of each group were compared with each other by ranking the slopes from 1 (lowest) to 62 (highest). The slopes of V females correlate with those of V males (Figure 3A, P = 0,0017), indicating similarity in the age effects of the probes. Similarly, the slopes of M females also correlate with those of M males (Figure 3B, P = 0,043). Therefore the probes that contribute to the age effect in the V and M libraries are similar in both sexes. To determine whether there is also similarity in probe contribution between M and V libraries, the mean ranks of females and males were calculated for both M and V libraries and then compared with each other. A strong correlation was found (Figure 3C, P = 0,0016), indicating that similar probes are involved in the age effects. However, this correlation is negative (-0,39), meaning that the probes that show a reduction in residual value with age in the M libraries have an increase in residual value with age in the V libraries and vice versa. Therefore the epigenetic aging phenomenon appears different for the two different tissues although similar probes are involved.

Figure 2 Illustration of procedure used to evaluate the contribution of a probe to an effect of age on deviation index. Figure 2A shows the transcript abundance of probe PSMB4a in database BB of males in M libraries. To generate the residual errors in Figure 2B, 1000 was subtracted from all data points and resulting negative residual errors were converted to positive values. In this case of PSMB4a the regression with age is positive and significant (P = 0,005).
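The comparison of ranked slopes between two groups can be sketched as a correlation between rank vectors. The sketch assumes a Pearson correlation computed on the ranks and no tied slopes; the four slopes per group are hypothetical (the study uses 62 per group).

```python
def ranks(values):
    """Rank values from 1 (lowest) upwards; ties are not handled."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_correlation(slopes_a, slopes_b):
    """Correlation between the rank orders of two sets of probe slopes."""
    return pearson(ranks(slopes_a), ranks(slopes_b))


# Hypothetical slopes for four probes in two groups with opposite ordering.
group_m = [0.1, -0.3, 0.5, 0.2]
group_v = [-1.0, 2.0, -4.0, -2.5]
r = rank_correlation(group_m, group_v)
```

In this toy case the two groups rank the probes in exactly opposite order, so the correlation is -1; a negative value of this kind corresponds to probes whose residuals change in opposite directions with age in the two tissues.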

Prospects for further analysis: network of interactions
At present, although the association with age is clear, the cause of these age-related phenomena remains a mystery. However, this database lends itself to further analysis that may help clarify what is actually taking place.
Correlation matrices of expression files reveal the presence of intricate networks of probe expressions. In the M males BB database, about 32% of the 1891 correlations are significant (P < 0,01), with positive and negative correlations occurring in roughly equal numbers. The distribution of significant correlations between the probes is, however, not random. While some probes (A6, D05a and D05c) have no or only one significant correlation, other probes like A7a, B4a, B6 and D04c are strongly involved, with more than 35 of the 61 correlations being significant. This means that amongst the probes, subgroups can be distinguished that are characterized by a high positive correlation within the groups and a high negative correlation between the groups. Thus when one subgroup shows high expression, the second subgroup shows low expression and vice versa. This suggests that age-related oscillations in gene expression could take place. Further analysis might help to elucidate the nature of such oscillations.
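The counts in this paragraph follow from the number of probe pairs: 62 probes give 62 × 61 / 2 = 1891 pairwise correlations, and each probe has 61 partners. The sketch below counts, for each probe, the partners whose correlation exceeds a cut-off. The cut-off on |r| is a hypothetical stand-in for the significance criterion (P < 0,01) used in the text, and the toy data are invented.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def strong_partner_counts(data, threshold):
    """For each probe, count the probes correlated with it at |r| >= threshold."""
    counts = {probe: 0 for probe in data}
    for a, b in combinations(data, 2):
        if abs(pearson(data[a], data[b])) >= threshold:
            counts[a] += 1
            counts[b] += 1
    return counts


# Number of distinct probe pairs for 62 probes.
n_pairs = len(list(combinations(range(62), 2)))

# Hypothetical data: p1 and p2 rise together, p3 falls, p4 is weakly related.
toy = {
    "p1": [1.0, 2.0, 3.0, 4.0],
    "p2": [2.0, 4.0, 6.0, 8.0],
    "p3": [4.0, 3.0, 2.0, 1.0],
    "p4": [1.0, 3.0, 2.0, 3.0],
}
counts = strong_partner_counts(toy, threshold=0.9)
```

In the toy data p1 and p2 form one subgroup and p3 an opposing one, while p4 has no strong partner, mirroring on a small scale the uneven distribution of correlations described above.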

Summary of results
1. The transcription profile of the proteasomal genes is different for the M and V libraries both in degree and pattern of expression. Therefore the abdominal muscle can be distinguished from the skeletal muscles of the extremities on the basis of proteasome expression.
2. About 50% of the variation in transcript abundance is due to differences in proteasome expression level.
3. Since experimental error contributes only 1,4%, the remaining variation (about 50%) is due to inherent differences between the two tissue types and to constitutional factors (e.g. gender and race).
4. After removal of the effect of proteasome expression level on transcript abundance a gender and tissue-specific relationship was found between age and the deviation index (with both increase and decrease being observed).
5. The probes differ considerably in their contribution to the age effect.
6. Although in the four groups similarity exists in the degree each probe contributes to the age effect, the

Validation of previous results
A previous study, using a large number of normal tissues [2], indicated that the pattern of transcription of the 20S proteasomal genes depends to some extent on age, gender and tissue type. This finding has now been corroborated by the present study, which uses only two types of muscle. Consequently, since there does not appear to be any indication that these influences relate to proteasome structure and function, it suggests that the influence of age, gender and tissue type on gene expression, without a concomitant relation to the specific function of the gene, is a widespread phenomenon and not just a quality of the proteasomal genes.
A possible explanation for this phenomenon is that it serves to maintain the differentiated state of the cell and is due to epigenetic factors that are gender and tissue-type specific and that are subject to age-related change. The need for such epigenetic factors is thought to arise because cells are complex systems (see discussion in previous paper [2]) with the proteasomal genes just reflecting this complex system as a "pars pro toto". The underlying structural basis in this complex system remains an enigma. The nuclear architecture could be one of the main factors involved.

Figure 3 Comparison of the slopes of the regression lines between age and the absolute residual values (panels A, B and C; axis label: rank of slopes in M libraries).
The present study expands upon the previous findings. Whereas previously only a decrease in deviation index with age was observed, the present study shows an increase in deviation index with age for the males in M libraries. These age-related alterations are connected with a new phenomenon, i.e. with age there is a decrease or increase in residual variation between individuals in expression of the probes. The degree of alteration in residual value is probe specific and the direction of the alteration (decrease or increase) in abdominal muscle is opposite to that in skeletal muscles. The reason for this age-related change remains, thus far, elusive, but it is felt that elucidation of the phenomenon will be important for understanding the aging process.
Other new findings relate to the differences between tissues, to the presence of an intricate network of gene expressions, to the quantification of factors involved in the variability in expression and to improvements in the method of analysis.

Relation to some other studies on gene expression variation with age
Variation in gene expression with age has been the subject of study on several occasions, suggesting an increase in variation in aging tissues [6][7][8], although this increase might be restricted to non-renewing tissue [9]. Our foregoing study suggested that aging is accompanied by a decrease in variability between individuals [10], while the present study suggests that both decrease and increase in variation between individuals can occur depending on the tissue and probe used. It is too early to judge this discrepancy as conflicting data since our approach is totally different. Our approach studies the pattern of gene expression in a group of functionally related genes and quantifies the degree of deviation from this pattern in a cohort of individuals of different age. Care was taken to remove differences in expression level from the data by correction for differences in proteasome expression level and by transformation of the data for removal of the differences in degree of expression of the individual probes. Our analysis therefore deals with the age-related change in a pattern of expressing genes and not with an increase in noise or destabilization. Whether the observed changes lead to stabilization or destabilization is still an open question and the same holds for the involvement of stochastic events.

EPIGENE method
Progress has been made in developing a method for analysing epigenetic hereditary profiles in gene transcription.
Whereas previously the method consisted of establishing an index for all deviations from the mean transcription profile, the present method consists of the following steps:
1) Establishing more homogeneous subgroups in the database, e.g. for gender and tissue type.
2) Establishing the effect of factors that contribute to the variability in each library. Presently these factors are proteasome expression level and experimental error. The contribution of proteasome expression level to the variability can be removed and in this way a database is generated in which the individual libraries have the same mean proteasome expression level.
3) After setting the mean expression of each probe to 1000, a database with residual variability results that possibly only reflects constitutional differences between individuals that are probably mainly epigenetic in origin. In the case of the proteasome, this variability is still about 50% of the original variability (Table 2).
4) Analysis of the residual variability.
Presently only one characteristic has emerged: increase or decrease of the residual variability with age. Since a complicated network of positive and negative correlations between the residual values of the probes is present and since the nuclear structure could be involved, further steps in the EPIGENE analysis appear possible.

Materials and methods
The database of human muscle from GEO (accession GSE5086) was used [11]. This database consists of 81 libraries, 62 derived from the rectus abdominis and 19 taken from skeletal muscles of the extremities. The donors, 37 females and 44 males, varied in age from 16 to 89 years.
The transcription profile of the genes that code for the 26S proteasome was chosen for analysis. In the database 62 probes are available to study the expression of 33 genes. When more than one probe is available for a particular gene, the gene symbol was extended with alphabetic characters. The probe sets and their expression data are given in Additional file 1. All calculations were performed with XLSTAT.

Competing interests
The author declares that he has no competing interests.

Reviewer's report 1
Dr. Thiago M. Venancio (nominated by Dr. Aravind Iyer), NCBI-NIH, Bethesda, Maryland, United States
Reviewer comments on the final version of the manuscript
After reading the revised version of the manuscript and the response to my comments, I still think that several assumptions of this study are likely to be wrong and the data sampling and methods are insufficient to support the author's conclusions. These two factors could possibly undermine the obtained results. Moreover, the main points discussed in my previous report were not satisfactorily addressed, as they required additional and deeper analyses and not just modifications in the text. Therefore, I think this manuscript does not meet the scientific standards of Biology Direct.

Reviewer comments on the original version of the manuscript
Major points
In the present paper, Simons addressed the interesting and complex problem of epigenetic effects on transcription, aiming to find gender-, tissue- and age-specific variations. To achieve this goal, he presented a novel method to measure the impact of the different factors on gene expression variability.
First and foremost, although I consider epigenetic control on gene regulation an important topic, I do have considerable criticisms on this manuscript, regarding the fundamental assumptions, the way the problem is addressed and the drawn conclusions. I detail my observations below, along with suggestions to improve the manuscript.
The Background section is insufficiently detailed to give the reader a good introduction to the topics covered by the manuscript. This section has to be improved, with more references to the previous works, instead of merely citing topics followed by etcetera. It was surprising to see how a paper covering such an extensively studied topic can have only seven references (including one link and two self-citations).

Author response
This paper does not cover an extensively studied topic since it deals with a new phenomenon. It would have been easy to include many references on well-known epigenetic factors such as DNA methylation, nuclear architecture, chromatin structure, histone modification, genomic imprinting, regulatory elements etcetera, but in my opinion that would not have resulted in an improvement of the "background". Instead the paucity of references underlines that a phenomenon is under study that so far was not. Unfortunately this might give the impression of disregard of published data. Therefore in order to avoid such misunderstanding some small changes in the text were provided.
The author claims in the Background section that the proteasome does not play a role in gene regulation, which is not true for obvious reasons. The proteasome exerts extensive control over several different classes of proteins and their underlying biological processes. Therefore, it certainly encompasses (epigenetically and non-epigenetically) gene regulation and transcription [12][13][14]. I am not sure how critical is this assumption for the method, but it potentially undermines the model. In addition, some proteasome subunits are differentially used upon specific cellular signals. If some of these subunits were probed in this study, an additional source of noise might be expected.

Author response
This role of the proteasome was new to me. Thanks. Therefore this sentence was removed and in the section on heterogeneity caused by proteasome expression level this possible source of additional heterogeneity has been mentioned.
The methods section of the article is extremely poor. Although the author describes the methodology in the Results and discussion section, the text is inaccurate and hence it would be virtually impossible to reproduce the obtained results. In particular, I am still uncertain about what the author calls as microarray database and libraries.
The datasets are not properly cited and the data processing to generate the tables is not described. The term database is also used in different contexts across the text. Is this a relational database? Is this a dataset? Is this a database of one dataset? What is the accession numbers of the dataset(s)? How the microarray data were processed? Which platform(s) was (were) used? All these questions must be explicitly answered for an adequate understanding of the study.

Author response
Since the whole paper is in essence a search for a method to study epigenetic variation in gene expression the most of the section of "Results and Discussion" is on methodology and it has been the intention to deliver the data in such a way that results can be reproduced. However your remarks are to the point as to my dismay a small but essential section on "Materials and Methods" appeared to be missing from the text. This section has been re-inserted. My apologies for this shortcoming.
An additional column with a unique identifier (e.g. Gene ID) could be provided in the table 1. This would help the reader to make an unequivocal reference to the gene in the public databases.

Author response
Gene id's have been added to table 1.
In terms of biological results, I think the dataset is definitely too small and unbalanced to draw the conclusions presented in the paper. Even if these results are corroborated with an adequate number of samples, general biological inferences cannot be done with such small number of (related) genes. There are also technical concerns, as microarray and SAGE have their inherent limitations. It is not uncommon to reach discordant conclusions when comparing the results obtained by the two techniques. In addition, noise in gene expression was recently shown to be widespread at the cellular level. If one considers the difference between individuals, with different ages, gender and life-styles, the number of variables affecting gene expression is enormous and does not allow such direct conclusions. Therefore, I think the conclusions are unsupported by the presented data.

Author response
Strictly speaking the results with this dataset, even though they corroborated earlier results with another dataset, hold only for the proteasomal genes. However, since no discordant observations were made, the possibility of a general biological significance cannot be denied when the number of genes is small. The degree of noise in gene expression is not a settled item and in fact our results indicate that effects of age and gender should be distinguished from noise.
As pointed above, I do not have the required details for a deeper evaluation of the method. However, I suggest some simple analyses. 1) Principal (or independent) component analysis would aid to recover important variables (components) affecting gene expression; 2) The statistical significance of the findings could be evaluated through a comparison of the obtained results with randomized datasets.
Author response
1) Principal component analysis was applied in the previous paper and identified an effect of age and proteasome expression level on the variability in expression, which led to a further analysis. The present paper investigates whether the results of that analysis could be corroborated using another dataset. Since this appeared to be the case, a PCA was not opportune.
2) The use of a randomized dataset could certainly be of some value to exclude inappropriate handling of data.

However, inappropriate handling can already be excluded beforehand: two libraries (from different donors) are so alike in gene expression that the experimental error (noise) can be at most 3.5%. In fact, this finding indicates that most of the variation in gene expression is due to variables and not to noise.
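
The reviewer's second suggestion, comparison with randomized datasets, can be illustrated with a permutation test. The sketch below is not the author's procedure; it assumes the statistic of interest is a Pearson correlation between age and a deviation index, and the function names and the example values are hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation P value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any real link between x and y (the randomized dataset).
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical illustration values, not data from the paper.
ages = [16, 23, 31, 38, 45, 52, 60, 67, 74, 89]
deviation_index = [2.9, 2.7, 2.8, 2.4, 2.5, 2.1, 2.2, 1.9, 1.8, 1.6]
print(permutation_p_value(ages, deviation_index))
```

A small P value here means that randomly reassigning the index values to donors rarely produces a correlation as strong as the observed one.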
Regarding the methodological contribution of the manuscript, it lacks several desired/required characteristics of a methods paper, such as statistical formalism and background information, benchmarking with other methods and simulations with randomized datasets.

Author response
For me it is understandable that this paper leads to many more questions than answers, and one can probably wonder whether the approach is the best one. However, at present this is as far as the method has been developed, and I cannot translate these remarks into further improvements.
In addition, open source programs/libraries are highly appreciated by the scientific community, in order to give free access to the heart of the scientific discoveries and permit a fair comparison with other techniques. In my opinion, free (as in free speech) software should be mandatory for methods papers.

Author response
The missing reference that gives free access to the libraries of muscle has been included and XLSTAT is free software.
'In the present format, I think this paper is below the scientific standards and therefore do not support its publication in Biology Direct with the present format. I may recommend the acceptance as a hypothesis paper after reevaluation of the extensive modifications'
Minor corrections
-The numbers are not formatted according to the journal recommendations. Commas are used instead of decimal points.
-The quality/resolution of the figures is very bad. The ink-to-data ratio and the histogram color combination are also inadequate.
-There is a reference to a figure z.
-There are two References sections (one is empty).
-Blank pages in the PDF file.
-Additional files could be provided in plain text format to allow a better access by other scientists.
-The Geneatlas link is broken.
-There is an Addendum in the paper. This part could be included in the main text.
I declare that I have no competing interests.

Author response
In the final version, care will be taken that the paper and the additional files meet the official requirements where this is not yet the case. Figure z has been changed into figure 5. One "Reference" heading has been removed. The link to Geneatlas was restored. The addendum became superfluous.

Reviewer's report 2 Dr. Hua Li (nominated by Arcady Mushegian) and Dr. Arcady Mushegian, Stowers Institute, Kansas City, United States
Most of the statistical models and analyses throughout the paper should be discussed in more detail.
Specifically, 1. Page 5 par 2: "To make the expression of the probes comparable, the distributions of each probe is transformed such that the sum of the expressions for each probe amounts to 100." Could you explain how this was done exactly, and why were the data rescaled to 100?

Author response
The data were rescaled in order to make the expressions of the probes comparable and to determine, in a specific library, the number of probes with exceptionally high or low expression (actually the transformation was to a distribution with a mean of 0 and a standard deviation of 1; this has been corrected in the text).

... better justified in the revised ms and the discussion is improved. However, a few problems persist. In particular, I am still concerned about the significance of some of the statistical analyses. For example, the author still uses p = 0.05 as threshold for testing the statistical significance of the slopes of the probes (page 7), which indicates that no correction was done for multiple hypotheses testing. Lastly, I maintain my opinion that, even though gene expression variation with age is an interesting subject, this work only slightly advances our understanding of it.
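
The per-probe transformation mentioned in the author response above (to a mean of 0 and a standard deviation of 1) can be sketched as follows. This is an illustration under stated assumptions rather than the author's implementation: the function names, the use of the sample standard deviation, the cut-off of 2 for "exceptionally high or low" and the example values are all hypothetical.

```python
import statistics


def standardize_probe(values):
    """Rescale one probe's expression values across libraries to mean 0 and SD 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (assumption; population SD is an alternative)
    if sd == 0:
        raise ValueError("probe has constant expression; z-scores are undefined")
    return [(v - mean) / sd for v in values]


def count_extreme(z_scores_in_library, threshold=2.0):
    """Count probes in one library whose |z-score| exceeds the (assumed) threshold."""
    return sum(1 for z in z_scores_in_library if abs(z) > threshold)


# Hypothetical expression values of one probe across five libraries.
probe = [120.0, 95.0, 130.0, 110.0, 145.0]
z = standardize_probe(probe)
print([round(v, 3) for v in z])
```

After standardization every probe is on the same scale, so the standardized values of different probes within one library can be compared directly.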

Reviewer comments on the original version of the manuscript
In this work, the author employs data from human muscle to study gene expression variation in genes encoding the 26S proteasome. While in some tissues gene expression variation appears to decrease with age, in others it increases (also depending on gender). The author attributes these differences to specific genes. I thought the topic was quite interesting and timely, yet unfortunately I found very few new or surprising insights in the results. Moreover, the ms has numerous problems, as described below.
Although I think the idea that gene expression variation changes with age is intriguing, this has been reported previously by others, including the author. The author mentions that the increase in the deviation index with age for males in one of the tissues is novel, but as detailed below I do not think this is statistically significant. So the results seem to me to be mostly confirmatory of earlier findings. Even if the results are significant, it is not clear to me what they mean as the author reports that the deviation index increases with age in some tissues (or gender) but not in others. The author does not appear to offer any explanation for this discrepancy either, so the relevance of these results eludes me.
The last paragraph of the discussion mentions the new findings reported in the ms, one of which was "the possible involvement of nuclear factors" which I could not understand how it relates to this study since no nuclear factors were studied. The author also reports "a complicated network of positive and negative correlations" between the probes, but I was not surprised by this. I would expect an analysis focusing on a given protein complex to find correlations between the expression profiles of its individual components.

Author response
The paragraph mentioning the "correlation network" and the "possible involvement of nuclear factors" had no other intention than to explore further possibilities for research aimed at clarification of the new phenomenon. The finding of negative correlations between frequencies of individual components of the proteasome is puzzling and not to be expected. Since the part on possible involvement of nuclear factors is not essential for understanding the ms, this part of the paragraph has been removed.
One major concern I have with this work is the statistical analyses, for which very few details are given. Importantly, I am not convinced that many of the results reported are statistically significant since given the multiple hypotheses tested some correction is necessary. A Bonferroni correction, for instance, would render the results of figure 1 no longer statistically significant with p = 0.024. For testing the statistical significance of the slopes of the probes the author uses p = 0.05 as threshold, which again does not take into account multiple hypothesis testing. The way the slopes of the regression lines correlate between males and females appears to be significant and the negative correlation in the M libraries was intriguing, though I am not sure what the latter means biologically.

Author response
As stated in the ms, an ANCOVA on the M libraries indicated a significant interaction of age and gender on the deviation index (P = 0.038), confirming previous results. Further analysis is based on this finding; figure 1 only serves to visualize this result and, as you remarked, the p-values of the 4 regression lines in figure 1 do not give much information. Therefore these p-values have been removed from the ms and an additional sentence was added to the text. That the result is highly significant follows from the further analysis given in table 3 and figure 3.
A number of analyses seem to be arbitrary with no justification. For example, the author calculated an "approximation of the experimental error" using the correlation coefficient of two samples, but I found no explanation for how these two samples were selected.

Author response
Within the group of M females the mean and standard deviation of the correlation coefficient are -0.035 and 0.477; therefore some libraries will be rather similar. The libraries M19 and M25 have a correlation coefficient of 0.985 and a highly significant regression (P = 2.31 × 10^-46; after Bonferroni correction P = 7.49 × 10^-43). This means that these two libraries are very much alike in expression and therefore could approach the similarity of repeats. This explanation has been added to the text.
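
The Bonferroni adjustment quoted above multiplies a P value by the number of tests performed. The text does not state that number; the sketch below assumes it equals the number of possible pairs among the 81 donors mentioned in the abstract, which is an inference from the ratio of the two quoted P values and not a confirmed detail. The same hypothetical function is applied to the p = 0.024 and the 4 regression lines of figure 1 discussed in the preceding exchange.

```python
import math


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


n_libraries = 81  # donors given in the abstract; assumed here to be one library each
n_pairs = math.comb(n_libraries, 2)  # assumed number of pairwise comparisons

# Library pair M19/M25: unadjusted P as quoted in the author response.
adjusted_pair = bonferroni(2.31e-46, n_pairs)
print(n_pairs, adjusted_pair)

# Figure 1: p = 0.024 for one of 4 regression lines (reviewer's example).
adjusted_fig1 = bonferroni(0.024, 4)
print(adjusted_fig1, adjusted_fig1 < 0.05)
```

Under this assumption the adjusted value for the library pair comes out close to the quoted 7.49 × 10^-43 (the small difference would be explained by rounding of the unadjusted value), while the figure 1 value rises above the 0.05 threshold, as the reviewer pointed out.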
A few other minor comments: A central thesis of the work is that this variation in gene expression has an epigenetic basis. While this indeed might be the case, I found no direct evidence supporting these claims. I can conjure other explanations for gene expression variation, such as stochastic variations in transcription factor levels between cells that are augmented with age, DNA damage accumulation that is random by nature, etc. I would suggest that the author considers alternative explanations for the results.

Author response
So far there is no proof that this variation in gene expression is epigenetic. This would be indicated if the variation is hereditary, for which there is some evidence: 1) similarity in deviating patterns in gene expression between a tumor and the normal tissue from which it was derived or between a tumor and its metastasis (paper I), 2) race-related variation in expression (this paper), 3) specific expression patterns for the two tissues suggest that the patterns are hereditary, 4) the decrease or increase in deviation index with age does not support random stochastic events as the driving force. As suggested, alternative explanations could be possible, but the phenomenon seems at present too new and unknown to make a more meaningful discussion possible.
As mentioned by the author, some probes had very low levels. Can probes with such a low signal intensity be classified as being expressed (i.e., above background)? There is also evidence that increased transcriptional instability with age may be more significant in nonrenewing tissues: http://www.ncbi.nlm.nih.gov/pubmed/17925006