There could be a five-fold increase in the number of proteins translated from what was formerly called non-protein-coding DNA, according to a report in Science.
Remember when scientists on the Human Genome Project were shocked at the low number of protein-coding genes? Well, they didn't know half of what they were looking at. In her article "'Dark proteome' survey reveals thousands of new human genes," Elizabeth Pennisi exclaims, "Database confirms that overlooked segments of the genome code for a multitude of tiny proteins."
One of the biggest surprises to emerge when the human genome was first sequenced more than 20 years ago was how few genes it contained, less than one-third the number some scientists had predicted. Fewer than 30,000 genes and the proteins they encode are enough to build and operate the human body, it seemed; recent tallies have moved even lower, to about 20,000. But a new systematic analysis of what some call the "dark proteome" suggests scientists have missed thousands of nontraditional genes that lurk in previously overlooked stretches of the genome and make smaller than average proteins. [Emphasis added.]
What is the new gene count when the previously dubbed "noncoding" parts of the genome are included? When it comes to counting "nontraditional genes," Thomas Martinez of UC Irvine says, "My gut feeling it is probably not as high as 100,000, but 50,000 is in the realm of possibility."
Just before that, however, Pennisi had written, "The dark proteome has clearly boosted the total, but no one knows the true number." Given the trend away from the myth of junk DNA, it seems premature to put a low ceiling on the amount of function in previously discounted regions.
Early indications of function were noticed in "noncanonical ORFs" (open reading frames) that didn't match the traditional ORFs for standard proteins. Pennisi does not use the outmoded phrase "junk DNA" as she relates what geneticists are finding in the nontraditional sections that make "miniproteins," but she recounts some history suggesting that the myth was a hindrance.
Some researchers, she writes, were still falling into the trap of assuming "noise" (junk) for sections of code that made short proteins, i.e., amino acid chains fewer than 100 amino acids long (encoded by fewer than 300 base pairs). Then a new technique called Ribo-Seq (ribosome profiling) showed that these short genes were being transcribed into RNA that attaches to ribosomes, presumably for translation.
Even then, many scientists dismissed the resulting miniproteins as unimportant, assuming they were "noise" that would be quickly degraded. "It's been very challenging to convince people these ORFs are worth a serious look," says Ji-Young Youn, a biochemist at the Hospital for Sick Children in Toronto.
Old habits die hard. Assuming evolutionary junk has been a science stopper before: for instance, when it came to junk DNA and vestigial organs. A new generation of researchers, though, is starting to think that if something is found in a genome, cell or organism, it must have a purpose.
One of the open-minded researchers, John Prensner, "got interested in what the rest of the genome had to offer." Some of the newly discovered snippets, as short as 12 bp, could even be called "microproteins." What were they doing?
But about 3 years ago, Prensner and colleagues demonstrated that cancer cells contained about 550 of these microproteins. Two years earlier, Sebastiaan van Heesch, a systems biologist at the Princess Máxima Center for Pediatric Oncology, found similar numbers of the tiny proteins in heart tissue. "Sebastiaan and I were finding these genes that are very, very cool and we thought the world should know about them," Prensner says.
Soon the world of genetics was indeed getting to know about them. Interest in miniproteins accelerated between 2021 and the present. Pennisi writes that "several dozen other researchers from 20 institutions across four continents" joined hands in a "superconsortium" to "bring order to a relatively new field" and to get the growing number of genes annotated in GENCODE, the database of recognized genes. By 2022 they had tallied 7,264 noncanonical ORFs in the human genome (see the preprint at bioRxiv, with Prensner and Martinez listed among the 43 authors).
All told, they confirmed that one-quarter of the 7,264 noncanonical ORFs they had tallied made proteins, some 3,000 in all. (An ORF can be read multiple ways to make more than one protein.)
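The parenthetical point above, that one stretch of genetic sequence can yield more than one protein, is easier to see in a toy example. The Python sketch below is a hypothetical illustration, not the method used by Prensner, Martinez, or the GENCODE consortium. It assumes the input contains only A, C, G and T, scans only the three forward reading frames, treats any ATG-to-stop stretch as an ORF, and ignores non-ATG start codons and the reverse strand. The made-up 17-base sequence in the usage line holds two ORFs that overlap but sit in different reading frames, so the same bases spell two different peptides.

```python
# Illustrative sketch only: a hypothetical forward-strand ORF scanner,
# not the pipeline used by the GENCODE consortium or the bioRxiv authors.

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(coding_seq):
    """Translate a DNA string codon by codon (length assumed to be a multiple of 3)."""
    return "".join(CODON_TABLE[coding_seq[k:k + 3]] for k in range(0, len(coding_seq), 3))


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end, protein) for each ATG...stop ORF in the three forward frames.

    `end` is exclusive and includes the stop codon; `protein` omits the stop.
    Assumes `seq` contains only A, C, G and T.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon in this frame until the first stop codon.
                for j in range(i, len(seq) - 2, 3):
                    if CODON_TABLE[seq[j:j + 3]] == "*":
                        if (j - i) // 3 >= min_codons:
                            orfs.append((frame, i, j + 3, translate(seq[i:j])))
                        i = j  # resume scanning just after this stop codon
                        break
                else:
                    break  # no in-frame stop codon: stop scanning this frame
            i += 3
    return orfs


if __name__ == "__main__":
    # Two overlapping ORFs in frames 0 and 2 share bases 5 through 11.
    print(find_orfs("ATGGCATGCTAAGCTAG"))  # [(0, 0, 12, 'MAC'), (2, 5, 17, 'MLS')]
```

Real noncanonical ORFs are far longer than this toy sequence, and annotation projects weigh experimental evidence such as Ribo-Seq and peptide detection rather than sequence alone; the sketch only shows why overlapping reading frames let one region encode more than one product.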
Pennisi's article discusses only miniproteins implicated in certain cancers and metabolic diseases that medical researchers hope can become targets for new drugs. It may be too early to blame these diseases on miniproteins, however. Researchers have seen only "slivers of light" on these "overlooked proteins" constituting an "unseen population" of information-bearing sequences. Might they find that the miniproteins that appear to be implicated in cancer are a result of the disease rather than its cause? Youn is looking for "miniproteins that play a role in brain development." The authors of the bioRxiv article expect major discoveries to come:
This work will provide a platform to advance ncORF-derived proteins in biomedical discovery and, beyond humans, diverse animals and plants where ncORFs are similarly observed.
Logically, if animals, plants, and humans have been thriving successfully with thousands of miniproteins in their genomes, the short transcripts must be doing something useful.
Messenger RNAs (mRNAs), the edited transcripts of DNA, serve as templates that ribosomes use to build proteins. That much is well known. But what happens to spent mRNA? If it were not destroyed, it could keep directing production of its protein like an out-of-control assembly line. Researchers at the University of Würzburg have discovered "a process that breaks down mRNA molecules in the human body particularly efficiently," according to a press release. A specific RNA modification called m6A (N6-methyladenosine) triggers destruction of the mRNA once the ribosome has translated it into protein. Not only that, m6A is location specific:
The Würzburg researchers were the first to discover and observe this degradation process: It couples the degradation of an mRNA directly to the proteins produced and is significantly faster and more efficient than previously known mechanisms for mRNA degradation.
Crucially, this particular pathway only works when m6A is present in specific regions of the mRNA. In this way, m6A particularly "comments" on the blueprints for proteins involved in cell differentiation, that is, whether a cell will exist as a nerve cell, muscle cell, skin cell or some other form.
Details of this efficient regulatory process were published open-access in Molecular Cell.
We have just seen remarkable details in the genome that shout "design." To evolutionists, these details shout "chance." If it exists, it must have evolved! Here's a recent example, from a University of Arizona article about the origin of the genetic code:
"The genetic code is this amazing thing in which a string of DNA or RNA containing sequences of four nucleotides is translated into protein sequences using 20 different amino acids," said Joanna Masel, the paper's senior author and a professor of ecology and evolutionary biology at the U of A. "It's a mind-bogglingly complicated process, and our code is surprisingly good. It's nearly optimal for a whole bunch of things, and it must have evolved in stages."
Come again? It's so good, it evolved? What makes this exercise in cognitive dissonance even more ironic is that the article lambastes previous evolutionary explanations for the genetic code. In particular, it says that the Miller-Urey experiment (one of the Icons of Evolution) misled those trying to understand how the genetic code got started.
The authors argue that the current understanding of how the code evolved is flawed because it relies on misleading laboratory experiments rather than evolutionary evidence. For example, one of the cornerstones of conventional views of genetic code evolution rests on the famous Urey-Miller experiment of 1952, which attempted to simulate the conditions on early Earth that likely witnessed the origin of life.
Miller never got sulfur-containing amino acids, the article complains, "despite the element being abundant on early Earth." These gaps tainted the "conventional views of genetic code evolution."
By implication, Miller should have quit "laboratory experiments" and relied more on "evolutionary evidence," which, as we all know, exists in the imagination. A favorite tool for imagining this kind of evidence is to swear allegiance to the myth of progress (simple to complex) and then to visualize it happening in computer models. The U of A team was satisfied to model simple amino acids becoming complex, and short strings of amino acids becoming longer, in a "hypothesized population of organisms" that eventually gave rise to us. "This gives hints about other genetic codes that came before ours, and which have since disappeared in the abyss of geologic time."
Those uncomfortable with imagining hypothetical populations and codes vanishing into the abyss will want to pursue real-world evidence that is observable, testable, and repeatable. Will function be recognized in the newly discovered components of the genome? Based on prior trends, I propose two ID-based predictions: (1) Most of the miniproteins and microproteins will be found to be beneficial, and (2) they will defy attempts to arrange them as building blocks of larger proteins. Let us see what future research brings.