Recent demand for biofuel, in light of perceived Brazilian success with sugarcane, has caused a re-evaluation of sweet sorghums as a source of energy (Rooney et al., 2007; Vermerris et al., 2007). Up to 13.2 t/ha of total sugars, equivalent to 7682 L of ethanol per hectare can be produced by sweet sorghum under favorable conditions (Jackson et al., 1980). Sweet sorghum and other sugar crops have been researched for biofuel production in the U.S. for over 30 years (Lipinsky et al., 1977). Primary research, development, and breeding began in the late 1970s when the high cost of oil spurred interest in alternative energy sources. These investigations were ended by 1987 when petroleum costs had decreased (DOE-OSTI, 2008).
Sweet sorghums, also called sorgos, were originally brought to the U.S. as landraces from China (cv. Chinese Amber) and Africa (cvs. Orange, Sumac/Redtop, Gooseneck /Texas Seeded Ribbon Cane, Honey, White African, and others) via France in the 1850s for producing syrup (sirup) and forage (Winberry, 1980; Maunder, 2000). Many of these original sweet sorghum landraces continued to be selected by farmers regionally in the U.S. and were renamed. Other cultivars were introduced later: ‘Collier’ from South Africa, ‘McLean’ from Australia, and others with unknown origin such as ‘Folger,’ ‘Coleman,’ ‘Sugar Drip,’ and ‘Rex,’ referenced as early as 1923 (Sherwood, 1923; Vinall et al., 1936; Maunder, 2000).
Almost all sweet sorghum cultivars improved with modern methods were bred at the USDA-sponsored U.S. Sugar Crops Field Station in Meridian, MS, from the 1940s until it closed in 1983. The Meridian station used landraces for plant improvement and released improved syrup lines. A few lines were also selected for sugar production and energy (biomass tonnage) in collaboration with others across the U.S., notably Texas and Georgia. Of the syrup lines bred and released by the Meridian station, release notes suggest primary improvement was focused on improving disease resistance in high sugar lines. Disease can alter sorghum juice, reducing the desirability of syrup and contributing to lodging. Besides disease resistance, other selected traits include high brix (very few report stem sugar), low purity juicy stalks, high yields, stalk erectness, and good quality syrup.
The Meridian, MS, station additionally curated a “sweet sorghum world germplasm collection.” When it closed, materials were transferred to the USDA sorghum collection in Griffin, GA (Freeman, 1979; USDA-ARS, 2008). Many accessions from this collection, used in later breeding, were obtained in a 1945 collecting trip by Carl O. Grassl around the African center of sorghum domestication (Freeman, 1979). Six of these African landraces, specifically MN960, MN1048, MN1054, MN1056, MN1060, and MN1500, were used in the pedigrees of many improved sweet sorghum lines released in the U.S. (Table 1). This suggests that U.S. sweet sorghum cultivars may have a narrow genetic base and therefore be closely related to one another. If the genetic base is too narrow, it may be difficult to breed energy types from this material.
|Name||Full name||Source||Source 2||Type||Parentage or place of origin||Reference|
|Bailey||Bailey||K||NSL 187557||MS||Wiley, Tracy||Duncan et al., 1984|
|Brandes||Brandes||T||NSL 29336||MS||Collier 706-C, MN1500||Coleman and Broadhead, 1968|
|Brawley1||Brawley||U||PI 533998||MS||Rex, White-seeded Collier||USDA, 1958|
|CAmber1||Chinese Amber||U||PI 22913||A||Maunder, 2000|
|CAmber2||Chinese Amber||U||PI 248298||A||Maunder, 2000|
|CAmber3||Chinese Amber||T||ASA.45||A||Maunder, 2000|
|Colier1||Collier||U||PI 19770||HS||Maunder, 2000|
|Colier7||Collier 706C||U||PI 563032||HS||Maunder, 2000|
|Colier3||Collier Meridian||T||HS||Maunder, 2000|
|Colier4||Collier||T||PI 19770||HS||Maunder, 2000|
|Colman2||Colman (Young Meridian)||T||HS||Sherwood, 1923|
|Cowley||Cowley||T||MS||Collier 706-C, MN1054, MN960, MN 1056, MN 1054, Early Folgers Hodo, MN 1060||Kresovich et al., 1985|
|Dale||Dale||K||NSL 74333||MS||Tracy, MN960||Broadhead et al., 1970|
|Della1||Della||K||MS||BTx622, Dale||Harrison and Miller, 1993|
|Della2||Della||T||MS||BTx622, Dale||Harrison and Miller, 1993|
|Della3||Della||U||PI 566819||MS||BTx622, Dale||Harrison and Miller, 1993|
|EllisSo||Ellis Sorgo||T||HS||Leoti, Atlas||Karper, 1949|
|Fremont||Freemont Sorgo||T||Akron, Co||HS|
|GaBlueR||Georgia Blue Ribbon||T||HS||Freeman et al., 1973|
|HoneyS1||Honey Sorghum||U||A||Freeman et al., 1986|
|HoneyS2||Honey Sorghum||T||PI 181080||A||aka MN2931|
|Iceberg||Iceberg Sorgo||T||HS||Orange type|
|KColier||Kansas Collier||T||Anthony, Ks||HS||Maunder, 2000|
|KOrange||Kansas Orange||T||ASA.51||HHS||Maunder, 2000|
|Keller1||Keller||K||MS||MER 50–1, Rio||Broadhead et al., 1979|
|Keller2||Keller||T||MS||MER 50–1, Rio||Broadhead et al., 1981|
|M81E||M81E||K||NSL 174431||MS||Brawley, Rio||Broadhead et al., 1981|
|Mn1054||MN 1054||U||PI 152965||LMN||Sudan||Freeman, 1979|
|Mn1056||MN 1056||U||PI 152967||LMN||Sudan||Freeman, 1979|
|Mn1060||MN 1060||U||PI 152971||LMN||Sudan||Freeman, 1979|
|Mn1500||MN 1500||U||PI 154844||LMN||Uganda-aka Grassl||Kresovich et al., 1988|
|Mn2812||MN 2812||U||PI 167093||LMN||Egypt/Turkey|
|Mn291||MN 291||U||Grif 14968||LMN||Extra Early Sumac|
|Mn3046||MN 3046||U||PI 195754||LMN||China|
|Mn3083||MN 3083||U||PI 196586||LMN||India/Taiwan|
|Mn410||MN 410||U||PI 145619||LMN||S. Africa|
|Mn4125||MN 4125||U||PI 250583||LMN||Egypt|
|Mn4466||MN 4466||U||PI 255744||LMN||Turkey, Taslik village|
|Mn822||MN 822||U||PI 152694||LMN||Kordofan, Sudan|
|Mn856||MN 856||U||PI 152728||LMN||Sudan|
|Mn960||MN 960||U||PI 534165||LMN||Sudan||Freeman, 1979|
|N100||N100||T||PI535785||MS||Waconia, Wray||Gorz et al., 1990|
|N108||N108||T||PI535793||MS||Saccharum Sorgo||Gorz et al., 1990|
|N109||N109||T||PI535794||MS||White Collier, Grain Sorghum Line||Gorz et al., 1990|
|N110||N110||T||PI535795||MS||Red X||Gorz et al., 1990|
|N111||N111||T||PI535796||MS||Waconia||Gorz et al., 1990|
|N98||N98||T||PI535783||MS||Rio, Waconia, Fremont, AN39, N4692||Gorz et al., 1990|
|N99||N99||T||PI535784||MS||Fremont, Theis||Gorz et al., 1990|
|Orange1||Orange||U||PI 2363||HHS||Maunder, 2000|
|Orange2||Orange||U||PI 533902||HHS||aka MN 604||Maunder, 2000|
|Ranchr1||Rancher 3||T||Brookings, SD||A||Karper, 1949|
|Ranchr2||Rancher 3||T||ASA.93||A||Karper, 1949|
|RedTopT||Red Top Tennessee||T||HS||Winberry, 1980|
|Rex||Rex||U||PI 534163||HS||Sherwood, 1923|
|Rio1||Rio||T||MS||Rex, MN 1048||Broadhead, 1972|
|Rio2||Rio||T||MS||Rex, MN 1048||Coleman et al., 1965|
|Saccaln||Saccaline||T||HS||Vinall et al., 1936|
|Sapling||Sapling||T||ASA.55||HS||Vinall et al., 1936|
|Smith||Smith||U||PI 511355||MS||MN4004 (Grif 16302), MN 2754,Wiley, MN 48, MN 1056, others||Kresovich and Broadhead, 1988|
|SucreDm||Sucre Drome||U||PI 197542||LMN|
|SgrDrp1||Sugar Drip||U||PI 586435||HS||Freeman et al., 1986|
|SgrDrp2||Sugar Drip||U||PI 146890||HS||Freeman et al., 1986|
|SgrDrp3||Sugar Drip||K||HS||Freeman et al., 1986|
|SgrDrp4||Sugar Drip||T||HS||Freeman et al., 1986|
|SgrDrp5||Sugar Drip||T||Oklahoma A&M||HS||Freeman et al., 1986|
|SgrDrp6||Sugar Drip||T||Oklahoma A&M||HS||Freeman et al., 1986|
|Sumac1||Sumac||U||PI 63715||HHS||Maunder, 2000|
|Sumac2||Sumac||U||PI 35038||HHS||Maunder, 2000|
|Sumac3||Sumac||U||PI 534120||HHS||Maunder, 2000|
|TxDblSw||Texas Double Sweet||K||HS|
|Top76||Top 76–6||K||PI 583832||MS||Brandes, Collier 706-C, MN 1500, MN 1056||Day et al., 1995|
|Tracy||Tracy||T||NSL 4029||MS||White African, Sumac||Stokes et al., 1953|
|WcAmber||Waconia Amber||T||ASA.47||A||Maunder, 2000|
|WhtAfr1||White African||U||PI 52606||G|
|WhtAfr3||White African||T||Oklahoma A&M||G|
|WileyRL||Wiley R Line||K||HS||Stokes et al., 1956|
|WileySo||Wiley Sorgo||T||MS||Collier, MN 822, MN 2046||Coleman et al., 1956|
|Wiliams||Williams Sorgo||T||Ky. Certified||MS||Freeman et al., 1973|
|Wray||Wray||T||MS||Brawley, Rio, MN 856||Broadhead et al., 1978|
|BTx623||B.Tx623||T||G||BTx3197, SC170–6||Miller, 1976|
|BTx635||B.Tx635||T||G||Miller et al., 1992|
|Sureno||Sureno||T||G||S423,CS3541,E35||Meckenstock et al., 1993|
|EA1074||Rio 9188||T||Rio 9188||G|
|Ramada||Ramada||U||NSL 107377||MS||MER 45–45, MN 1056, MN 1054, MN1060||Freeman et al., 1974|
|Sart||Sart||U||NSL 91616||MS||Sudan||Stokes et al., 1951|
Although published pedigree information is available for some of the more recent sweet sorghum lines, the relationships with historic sweet cultivars and grain sorghums are poorly documented. A few genetic studies (Anas and Yoshida 2004, Casa et al., 2008) investigated grain sorghum germplasm panels that included some sweet sorghums. Further work by Seetharama et al. (1987) and Ritter et al. (2007) suggested that sweet sorghums are of polyphyletic origin, with relatives among kafir, caudatum, and other grain sorghum types.
Currently, there are no discrete objective criteria, such as a molecular marker or sugar concentration level, to differentiate sweet sorghums from grain sorghums. There are multiple generalized phenotypic differences: sweet sorghums are typically tall, have high biomass and juicy stems [juicy versus dry stem is controlled by a major gene (Bennetzen et al., 2001)], and most importantly have high stem-sugar concentrations. Stem-sugar concentration may be quantitatively measured by high performance liquid chromatography (HPLC) or as brix, a measurement of soluble solids, which in sorghums are mostly sucrose. Stem-sugar concentration inheritance is not simple; environment, genotype × environment interaction, and the genetic background (epistasis) all play a role. Within mapping populations, few QTL have been identified, and they explain little of the variation given the moderate to high heritability (0.51 to 0.86) reported for the trait (Schlehuber, 1945; Clark, 1981; Natoli et al., 2002; Bian et al., 2006; Ritter, 2008; Murray et al., 2008a). In two different populations, Natoli et al. (2002) and Murray et al. (2008a) both identified the strongest QTL for stem sugar on chromosome 3, explaining 18 and 25% of the trait variance, respectively. Natoli et al. (2002), in an F2 population derived from a sweet sorghum × sweet sorghum cross, estimated the chromosome 3 QTL effect was 56% additive and 44% dominant. Murray et al. (2008a) used a recombinant inbred-line population derived from a sweet sorghum × grain sorghum cross, so only additive effects could be calculated. We chose to follow up the stem-sugar QTL on chromosome 3 as a candidate for association mapping in a diverse panel of sorghums.
Association mapping uses diverse material to associate genetic markers with a phenotype of interest, taking advantage of lower levels of linkage disequilibrium than are present in linkage populations. Association mapping has been used to identify genes of interest in many plant species with varying degrees of success (Wilson et al., 2004; Aranzana et al., 2005; Breseghello and Sorrells, 2006). In sorghum, a diverse grain sorghum germplasm panel for association mapping was previously reported by Casa et al. (2008). However, only eight of the 377 accessions could be considered “sweet sorghum” types. Though there likely was variation for brix, the panel was mostly dwarf grain sorghum, uncharacteristic of the tall, high-biomass sorghums of interest. We therefore assembled a panel that represents historically important U.S. sweet-sorghum cultivars, important sweet-landrace progenitors, and cultivars that would serve as non-sweet controls.
In this study we were interested in addressing three questions. (i) What are the genetic relationships among sweet sorghums in the United States? (ii) What are the genetic relationships among sweet and grain sorghums across grain racial classifications? (iii) Can we confirm the major QTL for total stem sugar (brix), or any of the QTL for height previously identified using association mapping?
Materials and Methods
Plant Material and Phenotypic Analysis
Two replicates of 125 diverse accessions were planted in College Station, Texas in 2006 (CS06) and 2007 (CS07), and one replicate was planted in Ithaca, NY in 2007 (ITH07). These accessions were primarily historical and modern sweet-sorghum cultivars, though grain and forage sorghums were also included (Table 1). These accessions will subsequently be referred to as the “sweet sorghum panel.” Literature and the GRIN database (USDA-ARS, 2008) were used to identify cultivars as amber, historical sweet, modern sweet, modern sugar and energy, MN landraces (brought to Meridian, MS from Africa by C.O. Grassl), or grain types. We use the term “modern” to denote improved lines that have published pedigree information. Seed was obtained from a variety of sources for CS06 (Table 1), and seed bulked from self-pollinated plants was planted for CS07 and ITH07. In CS06 and CS07, 3-m rows with 76-cm spacing (∼160,000 plants ha⁻¹) were planted in a randomized complete block design. In ITH07, 30 seeds were hand planted in 1.5-m rows with 76-cm spacing.
Some material was photoperiod sensitive and, depending on environment, there was a wide range for time of maturity. Plants were harvested when most accessions were in the soft-dough to hard-dough stage. By harvesting without regard to specific cultivar maturity we minimized the environmental effect, but likely introduced biases in stem-sugar phenotypes due to flowering time, because stem sugar peaks right before the hard-dough stage (unpublished data). This would be expected to decrease our power but not create false positives. In each location, 1 m per row was harvested by cutting within 3 cm of the soil. Stems were separated from panicles and leaf tissue. Stem juice was extracted using a three-roller mill. Brix was measured using a handheld refractometer. Measurements were collected on 1 m of row in CS06 and CS07 and from three random plants in ITH07. HPLC was performed according to Murray et al. (2008a); no HPLC analysis was performed for CS07 or ITH07. Plant height was averaged across each row from the soil to the top of the panicle for all three locations.
Leaf tissues were collected from plants grown at the CS06 location. DNA was extracted from pooled tissue of five or more plants using a standard CTAB protocol (Doyle and Doyle, 1987). Forty-six polymorphic SSRs, used in the diverse association panel of Casa et al. (2008), were evaluated using the same equipment and published methods (Xcup19, Xtxp065, and Xtxp287 were not included). One SSR, Xcup55, was not polymorphic in the sweet-sorghum panel and was excluded from further analysis, resulting in 45 SSRs shared with Casa et al. (2008). Two additional SSRs, Xtxp120 (Menz et al., 2002) and a new SSR, Xcup75 (primer sequences: TTGCTTCATTCAACGGGAATACA, TTCGATGCAGCGAGCTTTGG), were successfully added. An additional 384 SNP genotypes were collected using an Illumina GoldenGate assay (Fan et al., 2006) at Cornell's Life Sciences Core Laboratories Center (Ithaca, NY) using recommended procedures (Illumina Inc., San Diego, CA). These 384 SNP assays were developed from SNPs discovered in previously published (Hamblin et al., 2004, 2005, 2006, 2007a) and unpublished [Murray, this study (sucrose pathways); Salas Fernandez et al., 2009 (carotenoid pathways)] resequencing studies, and were chosen both to provide genome-wide coverage and to survey variation in genes of interest. A total of 226 loci are represented in the panel, of which 39 loci are candidate genes; the remainder are distributed across all ten linkage groups. Genetically mapped loci were chosen from resequencing studies of unannotated restriction fragment length polymorphism (RFLP) probes (see Schloss et al., 2002). Supplemental Table 1 shows the GenBank accession numbers for reference sequences and map position, where available. Of the 384 Illumina SNP assays, 329 were successful, and 322 were polymorphic in the sweet-sorghum panel.
To identify candidate genes for brix, the major QTL for brix in a cross between a grain sorghum and a sweet sorghum from Murray et al. (2008a) was located on the sorghum genome sequence (Phytozome, http://www.phytozome.net/sorghum; verified 26 Jan. 2009) using BLAST analysis with sequence-based markers (Menz et al., 2002; Feltus et al., 2006). More than 100 starch and sucrose metabolism enzymes (Kanehisa et al., 2006) and sugar transport candidate genes from maize (Zea mays L.), sugarcane, tomato (Solanum lycopersicum L.), and rice (Oryza sativa L.) (NCBI, http://www.ncbi.nlm.nih.gov/; verified 26 Jan. 2009) were also placed on the sorghum genome using BLAST to identify co-localization with the chromosome 3 QTL. New SSRs within the chromosome 3 QTL were identified from Phytozome contig sequences using the program Tandem Repeats Finder (Benson, 1999). Primer3 (Rozen and Skaletsky, 2000) was used to design all primer sequences. All sequencing was performed on sweet-sorghum cultivar Rio at Cornell University's Bioresource Center using a 3730 capillary sequencer. Trace files were investigated for polymorphisms between Rio and grain sorghum ‘BTx623’ in Sequencher 4.0 (Gene Codes Corp., Ann Arbor, MI).
Genetic Distance and Principal Coordinate Analysis
The program PowerMarker version 3.0 (Liu and Muse, 2005) was used to evaluate FST (Wright, 1965) and create genetic distance matrices (Nei, 1972). Distance matrices were double-centered, and used to obtain eigenvectors, which were plotted in NTSYS-pc Version 2.02 (Rohlf, 1990).
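As an illustration of the distance measure named above, Nei's (1972) standard genetic distance can be sketched from per-locus allele frequencies as D = −ln(Jxy / √(Jx·Jy)). The function name, data layout, and example frequencies below are hypothetical and are not taken from PowerMarker, whose handling of missing data and heterogeneous accessions may differ.

```python
import math

def nei_1972_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance between two accessions or groups.

    freqs_x, freqs_y: one dict per locus mapping allele -> frequency
    (hypothetical layout; loci must be in the same order in both lists).
    """
    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        alleles = set(locus_x) | set(locus_y)
        jx += sum(locus_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(locus_y.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(locus_x.get(a, 0.0) * locus_y.get(a, 0.0) for a in alleles)
    # Averaging over loci cancels in the ratio, so sums are used directly.
    identity = jxy / math.sqrt(jx * jy)
    if identity == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(identity)

# Made-up frequencies at two loci, for illustration only.
acc_x = [{"A": 1.0}, {"T": 0.5, "C": 0.5}]
acc_y = [{"A": 1.0}, {"T": 1.0}]
print(round(nei_1972_distance(acc_x, acc_y), 3))  # roughly 0.144
```

Pairwise values computed this way would fill the symmetric distance matrix that is then double-centered.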
To compare sweet sorghums with the larger sorghum panel of Casa et al. (2008), Nei's 1972 genetic distance matrix was created in PowerMarker using the polymorphic SSRs that had been scored in all accessions in both studies. Eigenvectors were obtained implementing the cmdscale function (eig = TRUE) and then plotted using R (R Development Core Team, 2005). R cmdscale was used rather than NTSYS-pc for this analysis because the data set was so large. Using smaller test data sets, the two principal coordinate analyses (PCoA) gave identical results (Gower, 1966).
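The double-centering and eigenvector step shared by the two PCoA implementations can be sketched with NumPy as classical (Gower) scaling. This is a minimal sketch under the assumption that distances are squared before centering, as R's cmdscale does; the function name `pcoa` and the toy distances are hypothetical, and NTSYS-pc options may differ.

```python
import numpy as np

def pcoa(dist):
    """Classical principal coordinate analysis of a symmetric distance matrix.

    Returns (coordinates, proportion of variation per retained axis).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared distances (Gower, 1966).
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10 * eigvals.max()     # drop zero and negative axes
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals[keep].sum()
    return coords, explained

# Toy example: three points on a line at 0, 3, and 4 (not real data).
points = [0.0, 3.0, 4.0]
toy_dist = [[abs(a - b) for b in points] for a in points]
coords, explained = pcoa(toy_dist)
print(coords.shape, explained)
```

Plotting the first two columns of `coords` against each other gives the kind of two-axis view used for Fig. 1. The proportions here are computed over retained positive eigenvalues only, which is one of several conventions.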
Population Structure, Relatedness, and Association Mapping
To minimize false positives in association mapping it is important to control for population structure and relatedness (Falush et al., 2003; Yu et al., 2006). Three programs were used to estimate the number of populations and assign cultivars’ membership in them: Structure, version 2.1 (Pritchard et al., 2000), InStruct (Gao et al., 2007), and NTSYS-pc. Because population structure estimates assume unlinked markers, SNP assays from the same physical locus were converted into 208 haplotypic loci. Phase ambiguities were called as missing alleles, and loci with more than 20% missing alleles were eliminated. Excluding brix candidate gene markers on chromosome 3, and including SSRs, a total of 241 markers were used. In both Structure and InStruct, five independent runs having 5 × 10⁵ burn-in and sampling iterations were conducted, allowing k (number of populations) to vary between 1 and 15. For Structure, the ancestry model allowed for population admixture and correlated allele frequencies. For InStruct, population structure and individual selfing rates were inferred. Optimal k was identified based on the marginal improvements in the estimated log likelihood of the data, posterior population assignment probabilities greater than 0.5, and the consistency of the five independent runs. k was additionally inferred using the deviance information criterion (DIC) in InStruct. Once k had been determined for both Structure and InStruct, a run of 5 × 10⁶ burn-in and sampling iterations was used. PCoA eigenvectors from haplotypes were also used as population assignments.
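The missing-data filter applied to the haplotypic loci can be sketched as follows. The data layout (a dict of locus name to per-accession calls, with `None` for a missing or phase-ambiguous call) and the locus names are hypothetical; only the 20% cutoff comes from the text.

```python
def filter_loci_by_missing(genotypes, max_missing=0.20):
    """Keep loci whose fraction of missing calls does not exceed max_missing.

    genotypes: dict mapping locus name -> list of calls, one per accession;
    None marks a missing or phase-ambiguous call (assumed encoding).
    """
    kept = {}
    for locus, calls in genotypes.items():
        missing_fraction = sum(call is None for call in calls) / len(calls)
        if missing_fraction <= max_missing:  # "more than 20%" is eliminated
            kept[locus] = calls
    return kept

# Illustrative calls for five accessions at two made-up loci.
example = {
    "locus_a": ["h1", "h2", None, "h1", "h1"],   # 20% missing, kept
    "locus_b": ["h1", None, None, "h2", "h1"],   # 40% missing, dropped
}
print(sorted(filter_loci_by_missing(example)))
```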
Using the package SPAGeDi 1.2 (Hardy and Vekemans, 2002), a kinship coefficient estimation matrix was created according to J. Nason (described in Loiselle et al., 1995). Association mapping was performed using the GLM and MLM procedures in TASSEL (Bradbury et al., 2007). Six Q (population structure) matrices, with different numbers of populations, were separately tested for the percent variation in brix and height phenotypes explained by the model. Positive tests were reported using a significance threshold of p < 1.3 × 10⁻⁴, based on a stringent Bonferroni correction of 0.05 divided by 369 tests.
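The Bonferroni threshold quoted above follows directly from the stated family-wise error rate and the number of marker tests. A short check, with the helper name `is_significant` being hypothetical:

```python
alpha = 0.05      # family-wise error rate stated in the text
n_tests = 369     # number of marker tests stated in the text

threshold = alpha / n_tests
print(f"{threshold:.2e}")  # 1.36e-04, reported in the text as 1.3 x 10^-4

def is_significant(p_value, cutoff=threshold):
    """True when a single-marker p-value passes the Bonferroni cutoff."""
    return p_value < cutoff
```

Because the correction treats all 369 tests as independent, it is conservative when markers are correlated.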
Between all pair-wise comparisons of SNPs from different loci, linkage disequilibrium (LD) was minimal (Supplemental Fig. 1) in this panel, as expected with this low density of markers. Perfect LD (r² = 1) was observed between at least two SNPs within each of four genes (SB00037, SB00076, SB00114, SB00130) and between two other pairs of SNPs (SB00124 and SB00027; SB00076 and SB00103) due to close physical distance.
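For reference, the r² statistic between two biallelic SNPs can be sketched as below. This assumes homozygous (effectively haploid) calls coded 0/1 with `None` for missing or heterozygous calls, which is a simplification for inbred lines; it is not necessarily how LD was computed for Supplemental Fig. 1.

```python
def ld_r2(snp_a, snp_b):
    """r^2 between two biallelic SNPs from 0/1 calls across accessions.

    Accessions with a None call at either SNP are skipped (assumed encoding).
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    p_a = sum(a for a, _ in pairs) / n                    # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in pairs) / n                    # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom

print(ld_r2([0, 1, 1, 0], [0, 1, 1, 0]))  # identical patterns give 1.0
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # independent patterns give 0.0
```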
Seventy-seven of the 125 cultivars were heterozygous or heterogeneous at one or more marker loci. Two known to be F1 forage hybrids segregated at the most marker loci, 41% (Forage 73) and 37% (Forage 41). MN landraces as a group averaged 22% heterozygous markers, with only MN960 having no heterozygous marker loci and Mn1054 having the most (37%). Departure from 1:1 ratios of alleles in some SNP assay results suggested that levels of heterozygosity were increased by pooling tissue from multiple individuals within cultivars, as landraces are often heterogeneous.
Cultivars in the sweet sorghum panel with identical names but different seed sources all had at least one genetic polymorphism (Table 2). With Sugar Drip, of the loci that differed, almost every possible combination of allele sharing across the six lines was observed. A few cultivars had very different names but identical genotypes, potentially due to human error. ‘N110’ and ‘Sugar Drip 4’ were found to be identical except for one locus with missing data. ‘Rox,’ ‘Orange 2,’ ‘Saccaline,’ and ‘Sapling’ were also genetically identical. The phenotypes of these cultivars were very similar, so it appears possible that the seed for the CS06 planting unintentionally came from the same source.
|Cultivar||Accessions||Shared alleles at 369 markers|
To identify accessions for use in breeding, it is useful to understand the relationships within the sweet sorghums and between sweet sorghums and grain sorghum racial types. Genetic relationships were most easily seen by plotting the first two PCoA eigenvectors generated with the full SSR and SNP data set (Fig. 1). Three separate groups were observed and delineated based on historical references and breeding objectives. These three groups included a tight cluster of historical and modern syrup cultivars, modern sugar and energy sorghums with MN landraces, and amber types, which were the most diverse. Grain sorghums did not cluster in any one group. The first 12 PCoA eigenvectors explained 35.7, 21.4, 7.2, 6.3, 5.3, 4.4, 4.3, 3.6, 3.2, 3.1, 2.6, and 2.4% of the variation, respectively (99.5% cumulatively). The same three clusters seen in Fig. 1 were also observed when using only SNPs or only SSRs, though a few individuals did shift groups (data not shown). No clear relationships were observed when additional eigenvectors were plotted (data not shown).
To objectively assess sweet sorghum genetic relatedness to grain sorghum racial groups, PCoA analysis of SSR genotypes was used to compare the sweet sorghum panel to Casa et al.’s (2008) pure racial group (138 accessions, Supplemental Fig. 2). Comparing these two panels, the sweet sorghum historical and modern syrup group appeared most similar to kafir and to a lesser extent to bicolor. The modern sugar and energy sweet sorghum group appeared most similar to caudatum and possibly guinea types. The amber sweet sorghum group looked most similar to bicolor racial types but was more divergent than most of the material in the Casa et al. (2008) panel. The sweet panel had little material that was similar to durra types.
Candidate Gene Identification and Sequencing
The primary brix QTL identified in a cross between Rio and BTx623 (Murray et al., 2008a) was localized to a 15-Mb sorghum super contig (Phytozome). A sorghum homolog of maize shrunken2, which encodes the large subunit of ADP-glucose pyrophosphorylase (Hamblin et al., 2007a), and a rice hypothetical monosaccharide transporter (NM_001053738) (NCBI) were the only sugar metabolism genes found to align to this Phytozome contig. Furthermore, these sequences were both located in a 2-Mb region flanked by the SSR marker bordering the QTL on the left and an SSR marker close to the 2-LOD peak border on the right (Supplemental Fig. 3). The full-length genes (as annotated), the 5′ and 3′ ends, and genetically close non-coding sequence were sequenced in Rio (a total of ∼20,000 bp), and no polymorphisms with the BTx623 genome sequence were observed. We then identified nine SSRs spaced through the 2-Mb interval. Only one of the nine was found to be polymorphic between Rio and BTx623. This marker (Xcup75) was included in all analyses.
Brix and height values were recorded in three locations. For the sweet-sorghum panel in CS06, brix and HPLC-measured stem sugar were well correlated (r = 0.73, p < 2.2e−16), with outliers caused by bacterial degradation in HPLC samples. Height and brix were positively correlated across locations (Fig. 2). Height had higher correlations within and across locations than brix in this panel. For brix, ITH07 was more similar to CS06 than to CS07. ITH07 did not correlate well with CS locations for height, due to photoperiod sensitivity, which delayed flowering in some cultivars.
Population Structure and Association Mapping
To control for false positives in association mapping, Q (population structure) and K (kinship) matrices were first constructed (Yu et al., 2006). K is unrelated to k, the number of populations used in the model for Q. Six separate Q matrices were calculated using the two most likely population assignments in each of three programs, InStruct, Structure, and NTSYS-pc. InStruct results suggested five or eleven populations were likely with little posterior probability increase after eleven (Fig. 3). InStruct DIC criteria also found eleven populations to be most probable. Structure results suggested either four or eleven populations as most probable. Structure posterior probability continued to increase marginally past eleven populations, but consistency of runs and population assignment decreased. Because the posterior probability is calculated differently in Structure and InStruct, these cannot be directly compared (H. Gao, personal communication, 2008). Using haplotypes for PCoA resulted in eigenvectors very similar to those obtained using individual markers in Fig. 1.
Association mapping was performed for brix and height using the GLM procedure in TASSEL (Bradbury et al., 2007). Of the six Q matrices tested, models with 11 populations as inferred by InStruct and Structure explained the highest percent variation (Table 3). Models based on the smaller number of populations inferred by InStruct (k = 5) and Structure (k = 4) decreased the percent variation explained; the model with k = 4 also had a larger number of positive tests. Models using PCoA eigenvectors explained more variation than those with no Q matrix but much less than models based on Structure and InStruct analyses.
|Q matrix||Number of populations (k)||R2 model|
|InStruct + K||11||0.45||0.54|
|InStruct + K||5||0.41||0.49|
|Structure + K||11||0.49||0.55|
|Structure + K||4||0.40||0.50|
|PCoA + K||12||0.39||0.55|
|PCoA + K||5||0.39||0.55|
|None + K||0||0.37||0.48|
The MLM model, which included the kinship matrix, K, explained more variation than with Q alone. With MLM, results were nearly identical even if no Q matrix was added.
Using MLM with a Bonferroni-corrected cutoff of 0.05 (p < 1.3 × 10⁻⁴), five significant associations were detected for height, and one was detected for brix (Table 4). One marker, SB00016.1, was most significant for height and nearly significant for brix. For brix the only significant marker was SB00166.1.
FST of Populations and Markers
Wright's (1965) classical FST (θ) was used to evaluate genetic differentiation between populations in the panel (Table 4). Four separate methods were used for dividing the material into populations to address different biological questions.
1) Based on the a priori expectation of sorghum types [Table 1 (amber, historical syrup, grain, diverse)]. FST averaged 0.14 across loci (range: −0.04 to 0.47; negative FST values are likely due to imprecision in the estimation and should be interpreted as no genetic differentiation). Markers with high FST would be useful for distinguishing these a priori groups and might also be linked to traits important within only one population.
2) Using the three groups identified in the PCoA (Fig. 1). FST averaged 0.26 (range: −0.02 to 0.77). Markers had higher FST under this division than under our a priori division. Markers with the highest FST would be useful for assigning germplasm with unknown background to these groups.
3) Using a grouping based on brix. Cultivars in the top half for brix in CS06, CS07, and ITH07 were placed in Population 3, and cultivars in the bottom half at all locations were placed in Population 0. FST averaged 0.03 (range: −0.03 to 0.19).
4) Using the number of times a cultivar was in the top half of average height for a location, similar to divisions for brix. FST averaged 0.02 (range: −0.04 to 0.23). Markers with high FST when separated by brix and height may be linked to the phenotype of interest, and useful for characterizing different germplasms.
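The phenotype-based groupings in methods 3 and 4 can be read as counting, for each cultivar, the number of locations in which it fell in the top half, giving Populations 0 through 3. A minimal sketch of that reading follows; the median split, the treatment of ties, and all values are assumptions for illustration, not data from the panel.

```python
from statistics import median

def top_half_counts(values_by_location):
    """Count how many locations place each cultivar above the location median.

    values_by_location: dict location -> dict cultivar -> trait value.
    Ties with the median count as bottom half (an assumption).
    """
    counts = {}
    for values in values_by_location.values():
        cutoff = median(values.values())
        for cultivar, value in values.items():
            counts[cultivar] = counts.get(cultivar, 0) + int(value > cutoff)
    return counts

# Made-up brix values for four hypothetical cultivars.
brix = {
    "CS06": {"cv_a": 18, "cv_b": 16, "cv_c": 10, "cv_d": 8},
    "CS07": {"cv_a": 17, "cv_b": 9, "cv_c": 15, "cv_d": 7},
    "ITH07": {"cv_a": 19, "cv_b": 14, "cv_c": 9, "cv_d": 8},
}
print(top_half_counts(brix))  # {'cv_a': 3, 'cv_b': 2, 'cv_c': 1, 'cv_d': 0}
```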
A relationship between these estimates of FST and the association results could indicate incomplete correction for population structure. However, markers with high FST did not have significant associations with traits, except in the case of SB00016.1.
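To make the per-marker FST values easier to interpret, the quantity can be sketched for a single biallelic marker as (HT − HS)/HT, the proportional reduction in expected heterozygosity within groups relative to the pooled panel. This simplified form is an assumption for illustration: it is not the estimator implemented in PowerMarker, and unlike sample-size-corrected estimators of θ it cannot return the small negative values reported above.

```python
def fst_biallelic(allele_freqs, sizes):
    """Simplified FST for one biallelic marker across groups.

    allele_freqs: frequency of one allele in each group.
    sizes: number of sampled accessions in each group (used as weights).
    """
    total = sum(sizes)
    p_bar = sum(p * n for p, n in zip(allele_freqs, sizes)) / total
    h_total = 2 * p_bar * (1 - p_bar)                  # pooled expected heterozygosity
    h_within = sum(2 * p * (1 - p) * n
                   for p, n in zip(allele_freqs, sizes)) / total
    if h_total == 0:
        return 0.0                                     # marker is monomorphic overall
    return (h_total - h_within) / h_total

print(fst_biallelic([1.0, 0.0], [10, 10]))  # fixed difference between groups: 1.0
print(fst_biallelic([0.3, 0.3], [10, 10]))  # same frequency in both groups: about 0
```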
From historical publications on sweet sorghum, it initially seemed likely that sweet types might be closely related to each other and distant from grain sorghums. Two recent publications have suggested otherwise. Casa et al. (2008), using 377 diverse sorghums including eight sweet cultivars, found that while a few sweet sorghums clustered together they were generally as diverse as grain sorghums (A. Casa, personal communication, 2007). This finding was supported by Ritter et al. (2007) who, using amplified fragment length polymorphisms (AFLPs), showed that 31 sweet sorghums clustered within three of the five clusters containing 64 diverse grain sorghums.
Harlan and deWet (1972) and others have classified sorghums into five major races: bicolor, caudatum, durra, guinea, and kafir. These divisions are mostly based on panicle and grain characteristics as well as the regions of Africa and India where the races are commonly found. Sweet sorghums have not been bred for panicle or grain characteristics, and the referenced origins of sweet sorghum provide little insight. Therefore, the relationship of sweet sorghum to the traditional classification of major sorghum races has been unclear.
Our study, like that of Ritter et al. (2007), identified three separate groups of sweet sorghum that are often classified together. We classify these major types as syrup (historical and some modern), modern sugar and energy types with associated landrace parents, and amber types. These divisions were supported by PCoA, measures of FST, phenotypic observations, and structure analysis. Structure analysis and association results suggested that, within these three sweet sorghum groups, as many as eight additional subpopulation divisions exist (Supplemental Table 2). Population structure analysis is somewhat subjective and depends on the criteria used and the germplasm evaluated. Although InStruct and Structure assigned these subpopulations similarly, we did not observe a historical or biological basis for this further subdivision except where noted below.
Historical and Modern Syrup
Within the sweet sorghum panel, the historical and modern syrup population had the best representation but the least diversity. Among sweet sorghum cultivars, the historical cultivars in this group are the best known and the modern cultivars are some of the most commonly grown for syrup; the group includes Orange, Sumac, White African, Collier, Sugar Drip, ‘N98’ through ‘N110,’ ‘Della,’ and ‘Bailey.’ Phenotypically, this material generally had straight, tall, very juicy, medium-large diameter stalks. Across the cultivars the juice had high average brix, but lower than that of the sorghums developed for sugar production. Two sorghums developed for sugar and having very high brix, ‘Keller’ and ‘Wray,’ fell close to this group in the PCoA. The clustering of the syrup types reflects selections from historical material and shared pedigrees from syrup × syrup crosses. Furthermore, cultivar release notes show that most modern syrup sorghums were developed within the Meridian, MS breeding program. InStruct and Structure divided this population into four subpopulations of 19, 18, 14, and 12 individuals (Supplemental Table 2). An interesting case is Sugar Drip, which is divided into two groups. Based on polymorphism data, Sugar Drip was likely heterozygous at many loci, which became fixed as different sets of seeds were isolated and maintained separately.
Sugar and Energy
Modern sweet sorghum cultivars for sugar and energy production such as Rio, ‘Ramada,’ ‘Top76-6,’ and ‘M81E’ tended to cluster together with MN landrace cultivars. Most MN landraces in the panel were specifically chosen because they were in the pedigrees of modern sweet sorghum cultivars. These MN cultivars were also from the center of sorghum domestication around Sudan, Ethiopia, and Uganda. This population was very diverse for brix and height. Nearly all of the cultivars were photoperiod sensitive and had very thick stalks, some with hard rinds like sugarcane. The modern sugar and energy cultivars had very high brix while the MN landrace progenitors did not. Many of these cultivars, especially MN1500, produced very high biomass. We initially believed that MN1500 was ‘Grassl,’ a cultivar selected from MN1500, but the high heterozygosity suggested that it is likely the landrace MN1500 and that seed for Grassl is no longer available. In contrast to the expectation that the sweet sorghums derived from MN cultivars would have a narrow genetic base, the heterogeneity in these landraces likely contributed to the diversity seen in the modern cultivars. Population analyses (Supplemental Table 2) further divided this population into groups of 24 (most sugar energy and MN cultivars), nine (Rio, Keller, Wray), and six (grain and forage).
Amber
Amber and honey sorghums were very distinct from the other two populations but were also very diverse within the population. The weak clustering of amber may be partially the result of a limited number of cultivars being included in this study. Amber sorghums are not included in published pedigrees of modern sweet sorghum but were among the earliest sweet sorghums introduced to the U.S. Unlike most sweet sorghums, amber types tended to senesce in CS06 and CS07 locations, but did not in ITH07. Possibly as a result, amber cultivars had relatively higher brix in ITH07 than in either CS06 or CS07. Amber types among the sweet sorghums also had the least consistency of brix between environments with cultivars having a high brix in only one location. This is why no amber cultivars were identified as top sugar producers. Structure and InStruct (Supplemental Table 2) divided the ambers into subpopulations of 12 (all but one cultivar with amber in the name and ‘Sucre Drome’), six (Honey, ‘7035S’), and three (grain sorghums). PCoA suggested that Honey sorghums were most like race Durra, suggesting geographic genetic relationships, since Honey accessions and Durra are both from India. The amber population also had some of the most unusual cultivars, e.g., 7035S was the tallest cultivar in the panel, had a very large stalk, and was the only cultivar not to tiller at all and to senesce before it flowered in CS06. Sucre Drome was an interesting cultivar in this panel because it was the only one with a “dry” stalk, carrying a dominant gene that reduced stem moisture by 50% of the panel average and may be useful for cellulosic biofuel.
Sweet and Grain Sorghum Comparison
PCoA was useful to visualize genetic distances between sorghum races, between our sweet sorghum panel and the panel of Casa et al. (2008), and between individuals. Using PCoA, races tended to cluster together but were not as distinctly separated as observed in rice or maize (rice: Thomson et al., 2007; maize: Liu et al., 2003; Warburton et al., 2008; Hamblin et al., 2007b). Rio and BTx623 appeared to be closely related, and both were fairly distant from much of the other material. This suggests that variation found in the bi-parental population investigated in Murray et al. (2008a, 2008b) was more likely to be functional and not confounded by extreme divergence of genetic backgrounds.
The relationships in the sweet sorghum panel using only SSRs appeared to be similar to what was seen when the 322 SNPs were also included. In contrast, the PCoA eigenvectors explained less genetic variation using only SSRs. This discrepancy likely resulted from more rare alleles per locus, fewer loci, and a larger and more diverse germplasm set. From the combined data sets it appeared that the syrup sweet sorghums clustered best with kafirs, and that modern sugar energy sorghums and the landraces clustered best with caudatums. Amber types appeared to be poorly represented in the panel of Casa et al. (2008) but clustered most like bicolor types. In general, the SSR PCoA shows that the panels are structured very differently: the sweet sorghum panel has greater diversity from amber types, whereas the panel of Casa et al. (2008) has much more diversity from durra and caudatum types.
Population Structure and Relatedness in the Sweet Sorghum Panel
We attempted three separate methods for population assignment of cultivars: Structure, InStruct, and PCoA. Though they use different algorithms for calculation, all three methods suggested that three populations were an absolute minimum, and both four to five and 11 to 12 populations met our selection criteria. Though Structure is widely used for identifying population structure, the program was developed for natural outcrossing populations. The sweet sorghum panel violates Structure's assumption of Hardy-Weinberg equilibrium, and many lines share close pedigrees. InStruct, based on Structure, is a more valid method for a self-pollinated domesticated crop such as sorghum, because it relaxes the assumption of Hardy-Weinberg equilibrium (Gao et al., 2007). It was therefore surprising that Structure and InStruct resulted in nearly identical conclusions in this study. Finally, principal component analysis has been proposed to correct for population structure (Price et al., 2006), and similarly PCoA has been used in association mapping by Cockram et al. (2008). PCoA explained far more variation in this study than in Cockram et al., but the results of this approach were still disappointing in controlling for population structure.
Two main problems with population structure estimates are that they are subjective, on the basis of selection criteria, and they reduce very complex relationships into only a few numbers for population assignment. Thus, it is difficult to completely correct for genetic relationships using structure alone. From our results and model fit, it appears that using the kinship matrix (K matrix; Yu et al., 2006) better controlled for relatedness than any measure of population structure (Q matrix). In fact, we had better fit and fewer positive tests using K without Q than with any Q alone. It seems likely that this will be true for most bred material where admixed diverse crosses are routine, and closely related material has been selected.
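The contrast between K and Q can be made concrete with a small sketch. A Q matrix summarizes each line with a handful of population membership values, whereas a K matrix keeps one value for every pair of lines. The code below is only an illustration under stated assumptions: it computes a simple allele-sharing (identity-by-state) similarity as a stand-in for kinship, which is not the estimator of Yu et al. (2006), and the line names and genotype codes are hypothetical.

```python
def allele_sharing_kinship(genotypes):
    """Pairwise proportion of shared alleles (identity by state).

    genotypes: dict mapping a line name to a list of allele counts
    (0, 1, or 2 copies of a reference allele) per marker, with None
    for missing calls. This is an assumed stand-in for a kinship
    matrix, not the estimator used in the study.
    """
    names = sorted(genotypes)
    k = {a: {} for a in names}
    for a in names:
        for b in names:
            # Use only markers scored in both lines.
            pairs = [(x, y) for x, y in zip(genotypes[a], genotypes[b])
                     if x is not None and y is not None]
            # Per marker: 1 if codes match, 0.5 if they differ by one
            # allele, 0 for opposite homozygotes.
            shared = sum(1 - abs(x - y) / 2 for x, y in pairs)
            k[a][b] = shared / len(pairs) if pairs else float("nan")
    return k


# Hypothetical toy genotypes at four markers.
toy = {
    "line1": [0, 2, 2, 0],
    "line2": [0, 2, 2, 0],
    "line3": [2, 0, 2, 0],
    "line4": [0, 2, None, 1],
}
K = allele_sharing_kinship(toy)
print(K["line1"]["line3"])
```

Even in this toy case the pairwise matrix distinguishes an identical pair (line1, line2) from a half-sharing pair (line1, line3), a distinction that a single population label per line could not express.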
Brix and Height QTL Association
The Sorghum bicolor genome is estimated to contain 811 Mbp of DNA (Price et al., 2005). With 369 markers, the coverage in this study averaged one marker per 2.2 Mbp. Sorghum has much greater LD than maize, extending from a few kb to over 35 kb; even so, on the basis of the results of Hamblin et al. (2005), we would need at least 55,000 polymorphic markers for a saturated whole-genome scan. However, LD is expected to vary greatly across genomic regions and among the germplasm investigated. On the basis of the pairwise linkage disequilibrium between markers (Supplemental Fig. 1), the linkage blocks were not saturated in this population.
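The coverage figures above follow from simple division, sketched here for reference. The calculation assumes markers are spread evenly along the genome, which is a simplification; the constant and function names are illustrative only.

```python
GENOME_BP = 811_000_000      # genome size quoted in the text (811 Mbp)
MARKERS_USED = 369           # markers genotyped in this study
MARKERS_SATURATED = 55_000   # marker count cited for a saturated scan


def mean_spacing_bp(genome_bp, n_markers):
    """Average physical distance between evenly spread markers, in bp."""
    return genome_bp / n_markers


spacing_used = mean_spacing_bp(GENOME_BP, MARKERS_USED)
spacing_saturated = mean_spacing_bp(GENOME_BP, MARKERS_SATURATED)

print(f"{spacing_used / 1e6:.1f} Mbp per marker")       # 2.2 Mbp per marker
print(f"{spacing_saturated / 1e3:.1f} kb per marker")   # 14.7 kb per marker
```

Under this even-spacing assumption, 55,000 markers correspond to roughly one marker every 15 kb, which falls within the range of LD extent quoted above, while the 369 markers used here are spaced more than a hundred times farther apart.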
Given the average extent of LD in sorghum (Hamblin et al., 2005), it is unlikely that any marker locus tested was a causal polymorphism for phenotypic variation; each was instead likely linked to the causal polymorphism. Two of the five positive height associations, Xgap72 and Xtxp265, were on the same chromosome about 10 Mb apart. QTL for height and/or flowering time have been found in this location on chromosome 6, corresponding to the photoperiod sensitivity gene ma1 (Lin et al., 1995; Rami et al., 1998; Brown et al., 2006; Murray et al., 2008a). This gene has undergone extremely strong selection for temperate adaptation in sorghum, so detection over a long physical distance was not surprising.
The most significant QTL in this study was found on chromosome 9 for height. QTL for height in this location have been detected both by QTL linkage analysis (Pereira and Lee, 1995; Lin et al., 1995; Murray et al., 2008a) and by association analysis (Brown et al., 2008). Association analysis in the panel of Casa et al. (2008) detected a peak approximately 400 kb away, with significant locus associations on both sides of the marker (SB00016.1) used in this study (Brown et al., 2008). This locus would also be expected to have long-range LD given the strength of selection in sorghum for height.
The only significant association for brix, on chromosome 1, has not been previously reported in linkage mapping studies. However, Murray et al. (2008a) did detect a QTL peak near this region in one location (the closest marker was Txp482, 5 Mb away). This peak explained up to 9% of the variation for brix and sugar, but was slightly below the stringent threshold for significance (unpublished data). On the physical genome sequence, a sorghum homolog to glucose-6-phosphate isomerase (EC 5.3.1.9) is located ∼12 kb away, the third closest predicted gene. Although this enzyme has not previously been implicated in stem sugar accumulation, it is known to convert D-glucose 6-phosphate into D-fructose 6-phosphate, both of which are important for synthesizing sucrose (Kanehisa et al., 2006).
We also attempted to identify additional markers for association mapping to support a QTL for brix on chromosome 3 detected by Natoli et al. (2002) and Murray et al. (2008a), but were unsuccessful. Furthermore, association analysis using three SSRs and one SNP in this region did not detect any significant associations.
Implications for Germplasm Collection, Conservation, and Breeding
The results of this analysis suggest that for genetic studies and/or core collection development, as few as five cultivars from the sweet sorghum panel could be selected to represent 90% of the SNP alleles identified. Thus, within the sweet sorghum panel, many of the accessions could be considered redundant for germplasm conservation, especially in the population of syrup cultivars. This redundancy reflects close pedigrees with similar parentage.
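One way such a small representative set could be chosen is a greedy search that repeatedly adds the cultivar contributing the most alleles not yet captured. The sketch below is a hypothetical illustration of that idea, not necessarily the procedure used in this study; the cultivar labels and allele codes are invented toy data.

```python
def greedy_core(alleles_by_cultivar, target=0.90):
    """Pick cultivars until `target` fraction of all alleles is covered.

    alleles_by_cultivar: dict mapping a cultivar name to the set of
    alleles it carries (any hashable labels). Returns the chosen
    cultivars in order and the fraction of alleles they capture.
    """
    all_alleles = set().union(*alleles_by_cultivar.values())
    if not all_alleles:
        return [], 0.0
    needed = target * len(all_alleles)
    covered, chosen = set(), []
    while len(covered) < needed:
        # Take the cultivar adding the most not-yet-covered alleles;
        # sorting the names makes ties resolve deterministically.
        best = max(sorted(alleles_by_cultivar),
                   key=lambda c: len(alleles_by_cultivar[c] - covered))
        if not alleles_by_cultivar[best] - covered:
            break  # nothing left to gain
        chosen.append(best)
        covered |= alleles_by_cultivar[best]
    return chosen, len(covered) / len(all_alleles)


# Hypothetical toy panel: four cultivars scored at five markers.
toy_panel = {
    "cv1": {"m1_A", "m2_A", "m3_A", "m4_A", "m5_A"},
    "cv2": {"m1_A", "m2_A", "m3_A", "m4_A", "m5_G"},
    "cv3": {"m1_T", "m2_C", "m3_A", "m4_A", "m5_A"},
    "cv4": {"m1_A", "m2_A", "m3_G", "m4_T", "m5_G"},
}
chosen, coverage = greedy_core(toy_panel, target=0.90)
print(chosen, coverage)
```

In the toy panel, cv2 is never selected because every allele it carries is already present in the other cultivars, which mirrors the kind of redundancy described for closely related syrup cultivars. A greedy search is not guaranteed to find the smallest possible set.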
To identify the most informative markers to differentiate the three main groupings, FST for each marker was calculated between populations defined on the basis of PCoA. A few of the markers having high FST (PCoA column in Table 4) could be applied to identify a population for sweet sorghums not included in this panel.
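As a rough illustration of how a per-marker FST can rank markers by their power to separate groups, the sketch below uses Wright's definition, FST = (HT - HS) / HT, for a biallelic marker. The estimator actually used in this study may differ, and the marker names, allele frequencies, and population sizes shown are hypothetical.

```python
def fst_biallelic(freqs, sizes):
    """Wright's FST = (HT - HS) / HT for one biallelic marker.

    freqs: frequency of the reference allele in each population
    sizes: number of individuals in each population (used as weights)
    """
    total = sum(sizes)
    p_bar = sum(p * n for p, n in zip(freqs, sizes)) / total
    h_t = 2 * p_bar * (1 - p_bar)   # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                  # monomorphic across all populations
    # Size-weighted mean of within-population expected heterozygosity.
    h_s = sum(2 * p * (1 - p) * n for p, n in zip(freqs, sizes)) / total
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in three groups of equal size.
sizes = [20, 20, 20]
markers = {
    "marker_a": [0.9, 0.1, 0.5],   # frequencies differ strongly
    "marker_b": [0.5, 0.5, 0.5],   # identical in every group
}
ranked = sorted(markers,
                key=lambda m: fst_biallelic(markers[m], sizes),
                reverse=True)
print(ranked)
```

Markers at the top of such a ranking show the largest frequency differences among groups and are therefore the most useful for assigning an unclassified cultivar to a group.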
The diversity partitioned within sweet sorghum and between sweet and grain sorghum has implications for how this germplasm should be maintained. An interesting observation regarding same-named accessions, for example the six Sugar Drips, is that older cultivars were more diverse than newer ones. There are two obvious explanations: residual heterozygosity would be greater for landraces than for elite cultivars, and more outcrossing is likely to occur over time. Inexpensive DNA markers may make testing for redundancy easy, but when reducing redundancy in core collections it may be prudent to remove duplicates of modern named materials before historical landraces.
For crop improvement, understanding the diversity present within the three identified groupings and their subgroupings is important. For breeding of syrup cultivars, a larger and less diverse selection of elite material from the modern syrup cultivars would be most useful. For breeding energy types for biofuel (lignocellulose and sugar), further selections from within the sugar and energy population and hybrids across groupings would be most appropriate.
We have identified three major groupings within sweet sorghum, each with multiple subgroupings. This information is beneficial for understanding the origin of sweet sorghums and for identifying material for further improvement. These groupings showed some clustering similar to racial types within grain sorghums, but sweet and grain sorghums remain distinct in phenotype and origin. We have identified a marker with significant association for brix and a nearby candidate gene, glucose-6-phosphate isomerase, to be tested in the future. Future work within and across these populations may enable molecular cloning of genes responsible for stem-sugar accumulation in sorghum. Understanding the genetic basis for variation in stem sugar may ultimately allow genetic improvement of relatives with more complex genomes such as sugarcane, maize, switchgrass, and miscanthus.