Exploring the first 50 sequenced plant genomes

Since 1995, nearly 200 organisms have had their full complement of hereditary information, including all of their genes, sequenced. The full set of genetic instructions—coded in DNA—for making a person, or a pathogen, or a pineapple plant is known as the genome.

Although efforts to sequence the genomes of humans have most often made the news, the genomes of nearly 50 plants have also been published now and the rate at which plant genomes are being decoded is steadily increasing. Just 10 plant genomes were sequenced between the years 2000 and 2008, while 13 new plant genomes were published in 2012 alone, and another 12 have been reported so far in 2013.

Read on to learn which plant genomes have been sequenced to date, what these first 50 genomes are like, and what plant geneticists and crop scientists hope to accomplish through these sequencing efforts.

First, a bit more on genome sequencing

Each plant cell contains the genome: a linear string of DNA base pairs (bp), which ultimately dictates that a corn seed will grow to become a corn plant, for instance, and not a banana or soybean plant. Determining the order of the DNA bases, or sequencing, allows researchers to decode the first layer of genome features such as protein-coding genes, repetitive areas called “repeats,” and the elements that regulate how genes are expressed in cells. Together, all of these features provide the genetic instructions that make each plant species unique.

An Arabidopsis plant

What was the first plant to have its genome sequenced?

The first plant to be sequenced was Arabidopsis thaliana, a wild member of the mustard family. Known as the lab rat of the plant world, Arabidopsis is considered the go-to species for investigating plant genetics. This “model” plant is popular among researchers because it’s easily grown in the lab, completes its entire life cycle in about six weeks, and has a small genome of 125 megabase pairs (Mb), where one megabase is 1 million base pairs. The genome of Arabidopsis was published in the journal Nature in 2000.

What other plant genomes have been sequenced since then?

Nearly three quarters of the sequenced plant genomes are from crop species, which isn’t too surprising given the importance of these plants to people. They include staple grain crops, such as maize (corn), rice, soybean, and wheat; fiber crops like cotton, hemp, and flax; fruits and vegetables including apple, watermelon, tomato, strawberry, potato, cucumber, grape, and Chinese cabbage; and crops that are primarily important in developing countries such as the pigeon pea.

Ninety-four percent of the plants that have been sequenced are flowering plants, known scientifically as “angiosperms.” In addition, one “gymnosperm” has been sequenced, the Norway spruce; gymnosperms are plants like pines, firs, and Ginkgo that produce seeds but not flowers. Plus, two seedless plants that reproduce strictly via spores have had their genomes sequenced for research purposes: the “model” moss species, Physcomitrella patens, and the club moss, Selaginella.

How big are plant genomes?

The genomes of plants vary greatly in size (see the figure). The smallest known plant genome is that of the carnivorous corkscrew plant, Genlisea aurea, at 63 Mb; the largest is that of the rare Japanese plant, Paris japonica, at 148,000 Mb.

Of those plant genomes that have been published, the smallest so far belongs to a close relative of the corkscrew plant: the bladderwort, Utricularia gibba. A carnivorous denizen of nutrient-poor bogs that gets nutrition from feeding on insects, the bladderwort has a genome of 77 Mb in size. Contrast this with the largest plant genome sequenced to date—that of the Norway spruce at 19,600 Mb.

For comparison, the genome of the bacterium E. coli is about 4.6 Mb, while the human genome is 3,200 Mb—some 6 times smaller than the spruce genome.
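To make these size comparisons concrete, the short Python sketch below tabulates the genome sizes quoted in this article and computes how many times larger one genome is than another. The dictionary and the function name fold_difference are made up for illustration; only the sizes themselves come from the text.

```python
# Genome sizes in megabase pairs (Mb), as quoted in this article.
GENOME_SIZES_MB = {
    "E. coli": 4.6,
    "Genlisea aurea (corkscrew plant)": 63,
    "Utricularia gibba (bladderwort)": 77,
    "Arabidopsis thaliana": 125,
    "Human": 3_200,
    "Norway spruce": 19_600,
    "Paris japonica": 148_000,
}


def fold_difference(larger: str, smaller: str) -> float:
    """Return how many times larger one genome is than another."""
    return GENOME_SIZES_MB[larger] / GENOME_SIZES_MB[smaller]


if __name__ == "__main__":
    # List the genomes from smallest to largest.
    for name, size in sorted(GENOME_SIZES_MB.items(), key=lambda kv: kv[1]):
        print(f"{name:35s} {size:>10,.1f} Mb")

    # Ratios discussed in the text.
    print(f"Spruce vs. human: {fold_difference('Norway spruce', 'Human'):.1f}x")
    print(
        "Spruce vs. bladderwort: "
        f"{fold_difference('Norway spruce', 'Utricularia gibba (bladderwort)'):.0f}x"
    )
```

By these figures the spruce genome is about 6.1 times the size of the human genome and roughly 255 times the size of the bladderwort genome.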

Despite this great variation in genome size, plants tend to have roughly the same number of genes, about 32,000. Bladderwort, for example, retains a fairly standard 28,500 genes, even though its overall genome is small.

So, if plants all have roughly the same number of genes, why do their genomes vary so much in size?

What scientists know from analyzing the sequenced plant genomes is that this broad range in genome size appears to be driven by the proliferation of what are called “copy-and-paste long terminal repeat (LTR) retrotransposons.”

Retrotransposons are DNA sequences that can copy themselves into RNA and then back into DNA. The copied DNA may then integrate back into the genome, increasing its size. Retrotransposons are found in people and other animals, but they are especially abundant in plants. The corn genome, for example, is bloated with 75% LTRs. Bladderwort’s genome, on the other hand, is only 3% LTRs.
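The effect of this copy-and-paste behavior on genome size can be illustrated with a toy Python simulation. This is only a sketch under strong simplifying assumptions: the starting sequence, the 10 bp element, and the function name copy_and_paste are all invented for illustration, the RNA intermediate is skipped, and real LTR retrotransposons are typically thousands of base pairs long.

```python
import random


def copy_and_paste(genome: str, element: str, rounds: int, seed: int = 0) -> str:
    """Toy model: each round pastes one new copy of `element` at a random spot.

    Existing copies are never removed, so the genome can only grow.
    """
    rng = random.Random(seed)
    for _ in range(rounds):
        position = rng.randint(0, len(genome))  # any gap, including both ends
        genome = genome[:position] + element + genome[position:]
    return genome


if __name__ == "__main__":
    start = "ACGT" * 25       # 100 bp of made-up "ordinary" DNA
    element = "GGGTTTAAAC"    # 10 bp made-up stand-in for a retrotransposon
    grown = copy_and_paste(start, element, rounds=30)

    added = len(grown) - len(start)
    print(f"{len(start)} bp -> {len(grown)} bp")
    print(f"{added / len(grown):.0%} of the final genome came from element copies")
```

In this toy run, 30 insertions of a 10 bp element turn a 100 bp sequence into a 400 bp one, so 75% of the final sequence consists of pasted copies, even though none of the original DNA was lost.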

There’s another reason for this great size range. People and most other animals are “diploid,” meaning they contain 2 sets of chromosomes, one inherited from the mother and one from the father. Many plant species, by contrast, are tetraploid (have 4 sets of chromosomes) or even hexaploid (carry 6 sets of chromosomes), as some types of wheat are. The huge genome of wheat, in particular, makes it very hard to investigate (see below).

Why sequence the genomes of plants?

One of the biggest goals of plant genome sequencing is to make it easier and quicker for plant breeders to develop new crop varieties that better meet our growing needs for food, fiber, and fuel. Let’s look at a couple of examples.

In 2010, researchers published the genome of Brachypodium distachyon, a wild, annual grass, native to the Mediterranean and Middle East, with little agricultural importance. So, why sequence it? The reason is that Brachypodium is a close relative of other grasses, such as wheat, which are critical to world nutrition, but whose massive and complex genomes make them extremely hard to work with.

Brachypodium, on the other hand, has one of the smallest known genomes among grasses, is easy to grow in the lab and manipulate genetically, and has a short life cycle. Thus, by working with Brachypodium instead, scientists can more quickly make advances that can then be used to improve vital cereal crops, such as wheat and oats.

It’s a similar story with the Chinese cabbage, a close relative of oilseed rape. Used for cooking and industrial applications, oilseed rape is the world’s second most important source of vegetable oil, and its production has doubled in the last 15 years. But it's an unusual hybrid that contains the entire genomes of two other plants: Brassica rapa and a closely related species called Brassica oleracea. Again, this makes the genome of oilseed rape difficult to study.

But by sequencing Chinese cabbage, a variety of Brassica rapa, researchers now have ready access to half of oilseed rape’s genes, without having to wrestle with oilseed rape’s genome itself. And because the Brassica crops, including broccoli, turnip, Brussels sprouts, and cabbage, are all closely related, the insights scientists gain from sequencing Chinese cabbage are expected to improve the breeding efficiency of a range of crops essential to global food security.

This story is adapted from “The First 50 Plant Genomes,” which appeared in the July-Aug. 2013 issue of The Plant Genome.