COVID-19 and iron dysregulation: distant sequence similarity between hepcidin and the novel coronavirus spike glycoprotein

The spike glycoprotein of the SARS-CoV-2 virus, which causes COVID-19, has attracted attention for its vaccine potential and binding capacity to host cell surface receptors. Much of this research focus has centered on the ectodomain of the spike protein. The ectodomain is anchored to a transmembrane region, followed by a cytoplasmic tail. Here we report a distant sequence similarity between the cysteine-rich cytoplasmic tail of the coronavirus spike protein and the hepcidin protein that is found in humans and other vertebrates. Hepcidin is thought to be the key regulator of iron metabolism in humans through its inhibition of the iron-exporting protein ferroportin. An implication of this preliminary observation is to suggest a potential route of investigation in the coronavirus research field making use of an already-established literature on the interplay of local and systemic iron regulation, cytokine-mediated inflammatory processes, respiratory infections and the hepcidin protein. The question of possible homology and an evolutionary connection between the viral spike protein and hepcidin is not assessed in this report, but some scenarios for its study are discussed.

(ACE2) is thought to be its (main, but perhaps not exclusive) receptor on human host cells [132,136]. The spike protein is formed of a receptor-binding subunit (S1), a membrane-fusion subunit (S2), a single-pass transmembrane (TM) domain, and a cytoplasmic/intracellular tail (CT) [75,115]. Of note, the S1 domain has a similar fold as human galectins (galactose-binding lectins) [97]. Briefly, in terms of the putative primary function of the spike protein, Li comments that "because coronaviruses must enter cells for replication, membrane fusion is the central function of coronavirus spikes" [75].

Search for sequence similarity
A basic question that might arise is: what exactly makes the pathobiology and disease course of these particular viruses unique? And, could it be that, in addition to viral replication inside the particular types of human host cells, other intracellular processes specific to these viruses are involved? (For a thematically related inquiry, see e.g., ref. [58].) Having this in mind, we wondered if there might be any sequence similarity (and thereby potential structural similarity) between the SARS-CoV-2 spike protein (which has 1273 amino acids [133,134]) and any vertebrate protein(s).
A simple BLAST search does not reveal any similarities with human proteins. However, based on previous experience with the pufferfish Takifugu rubripes proteome [41,42,111] and its unique evolutionary history, we initially restricted the search to this species. Briefly, teleost ancestor species underwent an ancient wholegenome duplication event [125], and teleost species with well-annotated genomes such as the pufferfish can provide invaluable insights into the sequence evolution of genes with a clear phylogenetic linkage to mammalian genes (for a related commentary, see ref. [76]). Interestingly, a query using the full-length SARS-CoV-2 spike protein (accession no. YP_009724390.1) revealed a sole hit with the pufferfish hepcidin protein (XP_ 003965681.1; score: 32.7, E-value: 0.54). Given that SARS-like coronaviruses can be found in bats [133,134], we also used a full-length bat coronavirus sequence (ANA96027.1) as the query, which showed a closer match with pufferfish hepcidin (score: 38.5, E-value: 0.005). Conversely, a BLAST search in the Coronaviridae family of viruses using the pufferfish hepcidin (XP_ 003965681.1) revealed the bat coronavirus spike protein (ANA96027.1) as the top hit (score: 38.5, E-value: 5e-04). The scores and E-values here are not meant to indicate any claims of statistical significance but are rather provided for the purpose of comparison.
This similarity between the spike protein and hepcidin is at the cytoplasmic tail [80] of the spike protein, or, depending on the exact domain delineation, perhaps at the junction between the TM and CT domains. A multiple sequence alignment of this sequence region (generated using the AlignX feature of Vector NTI Advance 11.0, Invitrogen, Carlsbad, CA, USA), using three coronavirus spike proteins and four hepcidin proteins (from pufferfish, bat and human) is illustrated in Fig. 1a.
The alignment depicts a number of conserved motifs, particularly between the first pufferfish hepcidin sequence (various pufferfish species have at least two hepcidin-like genes [55]) and the coronavirus spike proteins. In a sense, the pufferfish sequence seems to act as a 'bridge' between the coronavirus motifs and those in the human hepcidin sequence. This may not be surprising particularly given the evolutionary context of teleost species described earlier. The similar cysteine-rich motif takes the following form: 'LXXXTXCCXCCKGXXXCGXCC(R/K)F'. Of note, the eight cysteines of the mature human hepcidin in the similarity motif, and the aligned cysteines of the SARS-CoV-2 spike protein, are not all specifically coded by one of the two cysteine-coding codons (TGC and TGT). Both codons are present in the respective gene segments. Also, for comparison purposes, as the coronavirus envelope protein [112] contains a related 'LCAYCCN' motif [135], this sequence was added as the last line of the alignment.
Although this is a 'distant' and limited sequence similarity, it cannot be attributed to 'chance'. The search that found hepcidin did not reveal a range of similarities with other teleost proteins. Moreover, there are many cysteine-rich protein sequences in teleosts and vertebrates in general, yet this similarity to the hepcidin gene family (not merely a one-off sequence) was unique and specific. How or why this similarity arose is then a separate, and potential follow-up, question. On this point, rather than framing the question as one having to do with a chance/random occurrence, it could more aptly be framed under concepts related to convergent/adaptive evolution versus common ancestry. Clearly, however, no conclusive claims about homology and sequence conservation can be made at present without a concerted investigation focused on this topic. Nevertheless, the similarity reported here raises a potential and intriguing question of whether there could be mimicry of human hepcidin (structural, functional or otherwise), perhaps inside the host cell, by the TM-CT junction of the spike protein. These possibilities should not be dismissed based on the mere phylogenetic gap between coronaviruses and vertebrate species. Furthermore, the phylogenetic gap, and the absence of clear evolutionary homologies between genes that may otherwise have unexplored functional linkages, should not dissuade one from pursuing such connections. As a case in point, gene network and protein structure/function linkages between certain yeast (unicellular eukaryotes) and mammalian genes proved to be quite beneficial in investigating neurotoxicity in human cells [123]. The question now is if the linkage between the spike protein and hepcidin could be expanded, and what reasonable scenarios for its experimental validation could be envisaged.

The Hepcidin protein
Hepcidin is a small peptide hormone that was discovered in 2000/2001 [7,63,69,96,102], and initially named LEAP-1 (liver-expressed antimicrobial peptide). It has an antiparallel beta-sheet fold and contains four disulfide bonds, and is involved in iron trafficking and the host's response to infection [34]. In fact, it has been remarked by a number of commentators that "hepcidin is to iron, what insulin is to glucose" [57]. Clear hepcidin orthologs appear to be missing in birds and invertebrates [105]. The human hepcidin (encoded by the HAMP gene on chromosome 19q13) is an 84-amino-acid prepropeptide, leading to a mature 25-amino-acid peptide that is detectable in blood and urine [65]. The proprotein convertase furin has been demonstrated to cleave prohepcidin at a polybasic site [73,124]. Of note, a furin-like cleavage site ('RRAR') has been reported to exist in the ectodomain of the SARS-CoV-2 spike protein, which is Fig. 1 Comparison of select hepcidin and coronavirus spike protein sequences. a A multiple sequence alignment of the C-terminal region of a number of coronavirus spike proteins (encompassing portions of the putative transmembrane and cytoplasmic tail segments), four hepcidins and the SARS-CoV-2 envelope protein is presented. The envelope sequence is provided only to demonstrate the cysteine residues with which the spike protein is proposed to form disulfide bridges [135]. The residue numbers are shown on the sides of each protein segment, and for proteins whose C-terminal sequences continue beyond the alignment, the full residue length is provided to the right. As per a color scheme used previously [111], dark green, grey and black highlights depict conserved, similar and identical residues, respectively. 'Tr' stands for Takifugu rubripes (Japanese pufferfish), 'Rf' for Rhinolophus ferrumequinum (greater horseshoe bat) and 'Hs' for Homo sapiens (human). The protein accession numbers of the sequences shown are, in order: (1) AWH65954.1, (2) YP_009724390.1, (3) ANA96027.1, (4) XP_003965681.1, (5) XP_029694670.1, (6) ENSRFET00010014064.1, (7) NP_066998.1 and (8) QHD43418.1. The domain illustration of the spike protein is based on ref. [132]. b A solved NMR structure of human hepcidin [65] (PDB: 2KEF), adopting an antiparallel beta-sheet fold, is visualized with its putative four disulfide bonds formed between eight cysteine residues. c The position of the disulfide bonds in the sequence of the mature human hepcidin is illustrated along with the potential palmitoylation residues (ten cysteines) of the cytoplasmic tail of the SARS-CoV-2 spike protein. The palmitate visual is as per ref. [77] with permission from the publisher (Springer Nature) absent in coronaviruses of the same clade [5,28]. The protein-coding part of the HAMP gene is split over three exons, with the 25-amino-acid mature peptide occurring on the last exon. In terms of its putative function, Prentice notes that "although the hepcidin molecule does itself possess some antimicrobial activity, this is rather weak compared to peptides such as defensins, and its primary contribution to innate immunity is via regulation of iron" [105]. In fish species, one of the two hepcidin paralogs has been shown to potentially possess antimicrobial effects in innate immunity [78,79]. Finally, hepcidin binds to and mediates the degradation of ferroportin (encoded by the SLC40A1 / FPN1 gene), the only known cellular iron exporter. The structural details of this interaction are being mapped and studied in ever more detail [18,88,108].

Hepcidin and the CoV spike protein Available structures and structural predictions
There are a number of solved structures of hepcidin [65,72]. An NMR structure of human hepcidin is depicted in Fig. 1b (visualized using 3-D Molecule Viewer, Vector NTI Advance 11.0, Invitrogen) including the locations of the four putative disulfide bonds. Of note, in addition to various hepcidin orthologs containing eight cysteines, four-cysteine variants have been described in notothenioid fish species [137]. Available solved structures of the coronavirus spike glycoprotein, as far as our search could reveal, mostly utilize expression constructs that stop just short of the TM domain. As noted, this is partly because the protein's ectodomain is the main focus of studies on viral binding to host surface receptors [129,131,132]. For example, Wrapp, Wang and colleagues have reported the cryo-electron microscopy structure of ectodomain residues 1-1208 of the spike protein (trimer in the prefusion conformation) [132], but this excludes the TM and CT domains. It goes without saying that the inclusion of transmembrane domains would require complicated structural elucidation protocols, and even then, one may still not be able to solve the structure of the protein in its entirety.
Moreover, we would like to note that using the Pfam-A (ver. 32) structural/domain database [46] in the HHpred remote homology and structure prediction toolkit [145], the coronavirus spike protein regions analyzed here show some predicted structural similarity to lipolysis-stimulated receptor (LSR) lipoprotein receptor family (PF05624) [139], and hepcidin sequences show some predicted structural similarity to the Sar8.2 protein family found in Solanaceae plants (PF03058) [2]. What, if any, significance these findings may hold is unclear at present. Of more importance right now would be the theoretical and/or actual elucidation of the structure of the spike protein TM-CT junction region and a comparison with the available hepcidin structures (the Pfam hepcidin entry, PF06446, currently references six PDB structures). Of note, a recent in silico study has reported on a predicted structural similarity and compatibility between hepcidin and an allosteric site in the SARS-CoV-2 spike protein [32].

Post-translational modifications
Given the prominence of cysteines in the aligned motif ( Fig. 1a), how are they utilized in the respective similar domains? At first pass, the usages appear to be different: as noted earlier, in hepcidin, the cysteines may give rise to a compact disulfide-bridged peptide [65,89] (Fig. 1b), whereas in coronavirus spike glycoproteins, the cysteines in the TM-CT junction serve as palmitoylation acceptor residues [117] (Fig. 1c) that facilitate membrane fusion [24]. It should be mentioned, however, that 'mini-hepcidins' conjugated to palmitoyl groups have been synthesized and studied previously [106], but these do not occur naturally. As for the SARS-CoV, at least a portion of the palmitoylation of the spike protein has been reported to occur in a pre-medial Golgi compartment [81]. However, there is also the possibility of crossdisulfide bond formation with a non-homologous small cluster of cysteines within the envelope protein [135]. Moreover, if S-palmitoylation is a reversible and dynamic process [118], it is to be determined if the spike protein junction cysteines might in fact have a different post-translational modification in the host cytoplasmic environment (although there is no evidence of this at the moment, and the reversibility of palmitoylation in viral spike proteins has not been reported [47]). That being said, in the cited paper by McBride and Machamer, the authors conclude that the palmitoylation of the SARS-CoV spike protein "was not necessary for S protein stability, trafficking or subcellular localization" nor "for efficient interaction with M protein" [81]. To what extent the post-translational modifications of the spike protein and hepcidin, be it furin cleavage, disulfide bonds or palmitoylation, are in any way similar in an intracellular context, remains to be examined.

Scenarios for the investigation of homology and evolutionary connections
As alluded to earlier, in terms of the possibility of an evolutionary connection between the spike protein and hepcidin, one could imagine a scenario whereby an ancestral spike protein acquired a hepcidin-like sequence from a host organism, and the new sequence was palmitoylated to aid with membrane association. Li points out that "the primordial form of coronavirus spikes might contain S2 only" [75], and the cytoplasmic motif features highlighted in the current report do not appear to be present in other class I viral membrane fusion proteins (which include the influenza virus hemagglutinin [20,66]), although we have not performed an exhaustive search. However, a number of questions might then arise under this scenario, such as the difference between the primary localizations of hepcidin (considering its putative interaction with extracellular and transmembrane regions of ferroportin [12,18,108]) versus the CT domain of the spike protein. Alternatively, it might be argued that perhaps a case of convergent sequence evolution is at play. For example, the influenza hemagglutinin glycoprotein appears to have a conserved 'CXICI' motif in its cytoplasmic tail domain [93,128], and perhaps an ancestral spike protein with similar features convergently acquired hepcidin-like sequence motifs. These are of course speculations and remain open questions. Investigations pursuing these topics could also make use of studies that attempt to trace the evolutionary history of hepcidin itself [67,137].

Potential leads for coronavirus research Hepcidin-like motifs among different spike proteins
One of the first questions that may arise from this potential sequence similarity is whether the comparison is equal for all seven known human coronavirus strains, or if it is more limited to the severe disease-causing viruses. Figure 2 depicts a sequence alignment to start to answer this question. The alignment is restricted to the length of the mature human hepcidin protein, which is depicted at the bottom row. The first four spike protein sequences are comprised of the mild/asymptomatic strains, i.e., 229E and NL63 (alpha coronaviruses) and HKU1 and OC43 (beta coronaviruses). These are followed by the spike proteins of MERS-CoV, SARS-CoV-1 and SARS-CoV-2. The bat coronavirus sequence and the four hepcidins are depicted in the same order as in Fig.  1a. Paying particular attention to the region between the two conserved cysteines (marked by two black arrows), there appears to be less similarity in the motif between the first four spike proteins than the rest of the sequences. Specifically, for example, a 'conserved' glycine in the sixth position from the first overall-conserved cysteine (indicated with a red arrow) appears to be an important residue that groups together the three disease-causing spike proteins with the bat sequence and the two teleost hepcidin proteins. How, if at all, these differences play out in the cell might be an intriguing experimental direction.
What can also be noted from Fig. 2 is that although there are appreciable differences in the spike protein domains visualized in the figure between the MERS-CoV and the two SARS-CoV sequences, the SARS-CoV-1 and SARS-CoV-2 spike proteins are almost identical in this region (the tabulated amino acid identity and conservation values shown were calculated for the aligned region using the AlignX program). Therefore, based on the hypothetical link presented in this report, it might be reasonable to assume that any pathobiological differences between the two SARS strains would not be as a result of any differing biology attributable to the hepcidin similarity domain. For reference, the SARS-CoV-2 genome has been reported to be 96% identical to bat CoV (nucleotide identity), 80% to SARS-CoV-1, 55% to MERS-CoV and 50% to common cold CoV (e.g., the 229E and OC43 strains) [13]. Also of note in Fig. 2 is the percentage residue identity and conservation between Takifugu hepcidin and the corresponding bat coronavirus spike protein region (54% identity, 62% conservation), which are the same identity and conservation values between the two Takifugu hepcidin paralogs.
More broadly, the potential sequence link reported here points to the need to build on previous research on the cytoplasmic tail of the spike protein (e.g., refs. [80,100,101]) to better understand its distinct roles from the time of viral attachment to possible protein-protein interactions inside the host cell. Of the rare mutations currently reported in the SARS-CoV-2 spike protein (using patient-isolated genomic data in the GISAID repository), only one (P1263L) appears to be in the cytoplasmic tail domain [68], placing it outside of the similarity region reported here.
Moving back to the biology of the hepcidin protein itself, as noted previously, hepcidin binds to and mediates the degradation of the iron exporter ferroportin. If the sequence similarity reported here is actually playing a significant role at the cellular level, could it be that, although the cellular localizations appear to be different based on current knowledge, the SARS-CoV-2 spike protein cytoplasmic tail can partly mimic the structure of hepcidin and interact with ferroportin? Could the cytoplasmic tail even coordinate and bind iron? These remain to be investigated experimentally, but of note, Rishi and colleagues recently reported on the intracellular localization of ferroportin dimers, and concluded that both the carboxy-and amino-termini of the protein are intracellular [109]. As cited earlier, the details of hepcidin's own interaction sites with ferroportin remain the subject of different structural determination projects [12,18,108]. Relatedly, and of interest, Neves and colleagues, using experiments on iron overload in European bass (Dicentrarchus labrax), have discussed the functional partnership between hepcidin and ferroportin from an evolutionary perspective and suggested that this may "open new possibilities for the pharmaceutical use of selected fish […] hepcidins during infections, with no impact on iron homeostasis" [90,91].
Notwithstanding the specificities of the hepcidinferroportin interaction, the sequence similarity reported here also points to a possible broader focus on iron biology. Following calcium, oxygen and lead, iron has historically been one of the most studied elements in cell biology [40], and as such there is a vast body of (at times conflicting) literature to draw upon relating to iron in coronavirus infections. Other investigators have reviewed broad themes from iron biology in the context of COVID-19 [39,78,79], but here I will only touch upon salient features of iron biology that revolve around hepcidin. In the three subsections that follow, I will briefly discuss relevant and recent research with Fig. 2 Comparison of the core hepcidin-like motif among the seven known coronavirus human infection-causing strains. Following the alignment presented in Fig. 1a, the core similarity motif, corresponding to the length of the mature human hepcidin sequence, is shown in an alignment containing the spike protein cytoplasmic domain of the four mild/asymptomatic human coronavirus strains and the three severedisease-associated strains. Hepcidin sequences from the previous figure appear on the last four lines. The alignment color scheme is the same as in Fig. 1. Focusing on the region between the two black arrows, fewer similar residues could be observed between the group of mild/ asymptomatic coronavirus strains and the MERS/SARS-CoV strains. One such residue that can act to distinguish the groups is indicated with a red arrow. The accession numbers of the sequences shown, in order of appearance, are: perspectives on inflammation, hypoxia and diagnostics. Moreover, the main themes discussed in these segments have been summarized pictorially in Fig. 3.

Hepcidin, iron biology and inflammation
Foremost among the potential physiological connections related to the present discussion is the so-called 'cytokine storm' mediated by interleukin-6 (IL-6) reported in some COVID-19 patients [70,84,87,138]. This is not, however, restricted to IL-6 and in fact elevated levels of a bundle of pro-inflammatory cytokines has been reported in severe COVID-19 cases [25,26]. Indeed, researchers have proposed that "reduced innate antiviral defenses coupled with exuberant inflammatory cytokine production are the defining and driving features of COVID-19" [19]. In terms of the chronology of events, we could consider an initial "immune defense-based protective phase" and a subsequent "inflammation-driven damaging phase" in the disease [116].
How could these relate to hepcidin biology? First and foremost, an early set of findings on hepcidin was that it is regulated by anemia, hypoxia and inflammation [92]. A general understanding of this regulatory network is that inflammation brought about by infections increases hepcidin production, which in turn can lead to the anemia of inflammation [50,52]. Hepcidin production in the liver is induced by IL-6 [51,94], and it has been reported that hepatic heparan sulfate affects and regulates IL-6-stimulated hepcidin expression [103]. Furthermore, heparin, the anticoagulant glycosaminoglycan that is a highly sulfated form of heparan sulfate, has been shown to be a potent inhibitor of hepcidin expression [104]. Of interest, anticoagulant treatment has been reported to be effective in a subset of severe COVID-19 patients [120][121][122], and researchers have determined that the procoagulant transferrin [120][121][122] is upregulated in SARS-CoV-2 infections [83]. Related to such concerns with inflammation and coagulation in COVID-19, there have also recently been proposals centered on ACE2 and the vasopressor system protein bradykinin [53,110]. These are various leads requiring further investigation.
It is pertinent to note that dysregulated hepcidin is a defining feature of hemochromatosis, a condition characterized by hepcidin deficiency, increased plasma iron levels and transferrin saturation [21]. In addition, a notyet-fully-established link of relevance here is the observations of a Kawasaki-disease-like systemic vasculitis syndrome in children infected with the novel Fig. 3 Summary of salient facets of coronavirus spike protein and human hepcidin biology. On the top left of the figure, three trimeric spike proteins are depicted on a segment of the viral membrane, two in a figurative style and the middle trimer in a more detailed domain-specific format. One palmitate is shown attached to the cytoplasmic tail of the spike proteins, as per Fig. 1c. Following the binding step to ACE2, a stylized viral fusion step is depicted based on refs. [120][121][122], where an early fusion scenario (as opposed to endocytosis) is envisioned. The assembly of new viral spike proteins is shown to be taking place in an ER-Golgi intermediate compartment. On the right half of the figure, various elements of hepcidin and iron biology are depicted, which could result in potential outcomes such as coagulopathy, ferroptosis and hyperferritinemia. In terms of hepcidin's binding to ferroportin, two possible scenarios are depicted, whereby hepcidin could bind to the extracellular face of ferroportin or deeper inside the transmembrane cavity. Lastly, the geometric shapes for the spike and ACE2 proteins were adapted from ref. [119], for ferroportin from ref. [30], and for ferritin from ref. [29], all with permission from the publisher (Springer Nature) Ehsani Biology Direct (2020) 15:19 coronavirus [64,127], which is also being called multisystem inflammatory syndrome in children (MIS-C) [37,44]. Incidentally, an association between a novel human coronavirus and Kawasaki disease was reported in 2005 [43], although other investigators were apparently not successful in confirming the link [38]. Nevertheless, if the association with the Kawasaki-disease-like syndrome is real, then it is noteworthy that increased hepcidin levels have been suggested as a biomarker for Kawasaki disease [61].

Hepcidin, iron biology and hypoxia
As opposed to the inflammation-based upregulation of hepcidin, anemia and hypoxia are usually taken to have the opposite effect on hepcidin's expression [92]. Further to cytokine storms and characteristic immune reactions, hypoxemic respiratory failure could be another major warning sign in COVID-19 patients [31]. Less established, but still important to mention, is also the debate surrounding the issue of hypoxia/hypoxemia and certain symptoms resembling, but differentiable from, altitude illness [8,9,15,54]. Congruously, hepcidin expression levels have also been investigated in the context of highaltitude acclimatization [59] (see also ref. [27]). Moreover, hepcidin levels are known to increase in patients with acute respiratory distress syndrome (ARDS) [48,86]. It may also be important to point to a number of circumstantial but perhaps important findings in the literature pertaining to lung disease. These include (i) a link between SARS and liver function abnormalities [74], (ii) the association of pulmonary iron overload and restrictive lung disease [49,90,91], (iii) the role of iron in pulmonary fibrosis [3], (iv) hepcidin's modulation of the proliferation of pulmonary artery smooth muscle cells [107], and (v) the vital role of hepcidin in alveolar macrophage function [98]. Furthermore, (vi) hepcidin upregulation along with serum iron reduction has been reported in influenza infections [10,45]. Importantly, however, iron dysregulation changes may only be at a local cellular/tissue level and not reach a systemic response [71], and may demonstrate selective tissue tropisms during different viral infections [11].

Hepcidin, iron biology and diagnostics
Presently, dexamethasone and other corticosteroids are among the very few widely-used treatments for COVID-19 [130], and it would be of great significance to have robust diagnostic correlates of the course of the disease. Could systemic changes in serum iron levels [60,114,126,141] (with a possible view on the degree of abovenormal serum ferritin in patients [25,26]) or levels of hepcidin itself be detected in patients with varying COVID-19 severities? First, on the latter point, researchers have recently reported that increased serum levels of hepcidin and ferritin are indeed associated with the severity of COVID-19 [143,144]. This combined hepcidin/ferritin diagnostic might potentially be promising. Serum ferritin levels alone are usually considered a general indicator of inflammation and infection, and as pointed out by Baron and colleagues, "the use of ferritin to diagnose iron deficiency may be problematic in patients with COVID-19 disease, who may have normal or high ferritin levels despite very low iron stores" [14,143,144]. In fact, COVID-19 has been proposed to be a part of the hyperferritinemic spectrum of conditions [56,99].
As for correlations with plasma iron levels, Shah and colleagues have reported that "compared with patients with non-severe hypoxemia, patients with severe hypoxemia had significantly lower levels of serum iron", and that "the association of serum iron with lymphocyte counts could reflect the requirement of the adaptive immune response for iron and may contribute to possible T cell dysfunction reported in COVID-19" [114]. The authors did not find significant differences in transferrin saturation or serum ferritin levels between the nonsevere and severe hypoxemia patient groups. Congruently, a retrospective study of COVID-19 patients and serum iron deficiency has found that "the severity and mortality of the disease was closely correlated with serum iron levels" [142].
Beyond the immediate issue of diagnostic markers, two broader questions that could follow from the present work can be raised: first, does the spike protein, similar to hepcidin, potentially promote iron sequestration in (alveolar) macrophages [85] and hence impede the host's initial immunological response? Second, could reports of the common presence of digestive symptoms in COVID-19 patients (e.g., ref. [95]) be explainable in part by a link to hepcidin? With respect to the biology surrounding these gastrointestinal symptoms, it is important to point out that a recent study concluded that in gut epithelial cells, the "expression of two mucosaspecific serine proteases, TMPRSS2 and TMPRSS4, facilitated SARS-CoV-2 spike fusogenic activity and promoted virus entry into host cells" [140]. It is also known that the liver-specific serine protease TMPRSS6 (matriptase-2) negatively modulates hepcidin [16,22,36]. Again, what if any significance these connections might hold remains to be determined.
Overall, the observations in this report suggest that, as a starting point, serum iron status would be a critical data category to be systematically collected from patients at various stages of the disease's progression. Furthermore, iron dyshomeostasis in the case of COVID-19 may be a more specific pathobiological feature of this infection, and we can only know the answer to this scenario if more data related to iron biology is collected during the current pandemic. Could, for instance, clinical evidence be gathered to clarify if disorders of iron homeostasis might exacerbate symptoms in COVID-19 patients? As a final note, given the many known and yetto-be-discovered intricacies of iron homeostasis in the body, research on therapeutic strategies that, for example, propose to utilize iron chelation or hepcidin antagonists should proceed cautiously [1,52].

Conclusion
Theoretical analyses in new areas may necessarily entail reasonable speculations based on limited or disparate data. This is expected, but one should remain cognizant of overinterpretation, and pursue a rational course of theoretical inquiry to hopefully inform subsequent experimental investigations. Here, a purposeful and restricted protein sequence search revealed a potential sequence similarity between the relatively less-studied cysteine-rich cytoplasmic domain of coronavirus spike proteins and the vertebrate hepcidin protein. This is quite unlikely to be a spurious and random similarity. There are many cysteine-rich protein sequences in vertebrates, but the motif identified here is unique and specific, and also appears to tentatively set apart the disease-causing strains from the milder coronavirus strains. Following from this link, a number of emerging clinical strands of evidence (summarized in Table 1) were discussed which further link a biology surrounding hepcidin with coronavirus-caused pathobiology. While each piece of clinical evidence discussed does not by itself provide overwhelming corroboration for the hypothesis of the paper, the totality of evidence presented we believe make a strong case that if sequence and/or structural mimicry to hepcidin is taking place upon viral attachment to and entry in the host cell, then perhaps a local disease condition resembling iron dysregulation (e.g., iron overload) might ensue in the infected tissue(s). This hypothesis can be immediately tested in one of three ways. First, in the clinic, levels of various serum markers for iron biology could be more systematically and comprehensively measured and analyzed. This strand of investigation has already produced corroborating evidence in the form of increased serum levels of hepcidin and significantly lower levels of serum iron in COVID-19 patients. Second, computational investigations could examine potential structural mimicry between the two proteins and explore the effect of differing post-translational modifications. And third, the potential link to hepcidin could be relied upon in cellbased assays to determine the possibility of the involvement of the spike protein in iron biology.

Acknowledgements
The first draft of this manuscript was posted on 27 March 2020 on the arXiv preprint server (arxiv.org/abs/2003.12191v1). Thanks are due to Gerold Schmitt-Ulms (University of Toronto) and Farnaz Fahimi Hanzaee (University College London) for very helpful discussions.
Author's contributions S.E. performed the analysis and wrote the paper. The author(s) read and approved the final manuscript.
Author's information S.E. holds a Ph.D. in laboratory medicine and pathobiology from the University of Toronto, and was a postdoctoral fellow at the Whitehead Institute for Biomedical Research and the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. He is currently a Ph.D. student at the Department of Philosophy at University College London.

Funding
The author is supported by a University College London Overseas Research Scholarship.
Availability of data and materials All sequences utilized are from publicly available repositories. Accession numbers are provided in the relevant sections. Table 1 Points of similarity and divergence between hepcidin and the coronavirus spike protein cytoplasmic domain based on current knowledge of the viral protein and its pathobiology (see main text for references) Potential Similarities Potential Differences 1. Protein sequence: A unique but restricted sequence similarity exists between mature hepcidin and the cysteine-rich cytoplasmic tail of coronavirus spike proteins.
1. Protein length and domains: Coronavirus spike proteins are much larger, multi-domain, proteins compared to the relatively short-length hepcidin proteins.

2.
Post-translational processing: The proprotein convertase furin cleaves prohepcidin, and has been reported to also activate (the ectodomain of) the SARS-CoV-2 spike protein.
2. Post-translational modification: Cysteines in the cytoplasmic tail of coronavirus spike proteins are thought to be palmitoylation acceptor residues, and this modification has thus far not been reported to be reversible. Cysteines in hepcidin are used to form disulfide bonds.
3. Cytokine storm: IL-6-mediated inflammatory responses have been reported in COVID-19 patients. Hepcidin production in the liver is induced by IL-6 and is well-studied in the context of the anemia of inflammation.
In addition, COVID-19 may be associated with a Kawasaki-disease-like systemic vasculitis manifestation in children, and hepcidin levels have also been suggested as a biomarker for Kawasaki disease.
3. Localization: Hepcidin is thought to interact with its main (iron-bound) interactor, ferroportin, extracellularly, whereas the spike protein cytoplasmic tail does not face the environment outside the viral membrane. (However, the cytoplasmic tail associates with the plasma membrane itself aided by its palmitoylation modifications, and hepcidin may interact with ferroportin close to ferroportin's transmembrane regions in addition to extracellular residues.) 4. Hypoxia/Hypoxemia: COVID-19 may lead to symptoms resembling, but differentiable from, altitude illness, and hepcidin expression levels have also been studied in the context of high-altitude acclimatization.