Novel plant varieties can be generated through genetic or transgenic strategies or some combination thereof. Genetic strategies could include crosses with elite varieties, landraces, or wild accessions or mutational breeding techniques such as somaclonal variation or chemical mutagenesis. Transgenic strategies could include overexpression of some genes of interest while silencing others, preferably in elite germplasm with excellent agronomic qualities. Assessing progress can be difficult, especially for complex traits that may not be fully defined at the molecular and cellular levels. However, it is in this area that systems biology approaches can accelerate plant varietal development through filling in key gaps in knowledge (Fernie et al., 2006; Toubiana et al., 2012). Taking as comprehensive a view as possible also allows for the determination of both positive (e.g., enhancement of the trait of interest) and negative (e.g., unintended effects that may diminish the impact or acceptance of a variety) changes to crop quality and composition. While large-scale positive changes may be obvious, small-scale negative effects may be of equal importance to producers, regulators, consumers, or other stakeholders. Transgenic (genetically modified) crops continue to elicit controversy despite over a decade of widespread cultivation. Although contemporary genetic engineering techniques appear to induce small genomic perturbations relative to widely accepted conventional technologies (e.g., genetic crosses between wild relatives and cultivated varieties, somaclonal variation, and mutation breeding), enduring concern focuses on the unknown potential for transgenic technologies to generate unintended effects in ecosystems and food composition (Beachy et al., 2002; Schubert, 2002; Brookes and Barfoot, 2006; Hoekenga, 2008). The possibility of detecting such unintended effects is complicated by a lack of comprehensive knowledge as to the range of “normal” biochemical phenotypes that occur in important crop species. This is illustrated by the recent discovery of new compounds in a few very well characterized nontransgenic crops (Parr et al., 2005).
The concept of substantial equivalence attempts to bridge this divide by hypothesizing that little risk is associated with transgenic genotypes that are essentially the same as their nontransgenic relatives. Therefore, in the absence of large and unexpected differences, extensive safety testing of novel transgenic varieties may not be warranted (Kuiper et al., 2001). It is unclear, however, how different a transgenic variety can be from its nontransgenic counterparts while remaining “substantially equivalent” or what methods should be used to assess this (Kok and Kuiper, 2003). As a starting point, many studies have investigated the relative impacts of transgenes on crop phenotypes relative to other sources of variation, as measured with a broad array of molecular and physiological techniques. Such studies have consistently found that variation associated with the transgene does not exceed other common sources of variation: environment, genetic background, induced mutations, and agronomic practices (Defernez et al., 2004; Catchpole et al., 2005; Lehesranta et al., 2005; Baker et al., 2006; Baudo et al., 2006; Shewry et al., 2006; Batista et al., 2008; Cheng et al., 2008; Di Carli et al., 2008; Barros et al., 2010; Kogel et al., 2010).
While these and other studies suggest that transgenic modifications may impact phenotypes less than other common sources of variation, exploration of unexpected differences and the extent to which small yet biologically important differences may occur in the absence of large, systemwide differences has been limited. This is due, in part, to the inherent constraints of popular statistical approaches such as principal component analysis (PCA) and ANOVA to highly multivariate “omics” datasets that may contain tens of thousands of variables (Lay et al., 2006). For example, while PCA is well suited to visualizing the relative similarity of multiple genotypes or treatments within easily understood plots, such plots are dominated by the influence of those variables that differ most among observations, thus marginalizing the impact of smaller but potentially important differences between certain classes, for example, transgenic and nontransgenic genotypes (Jolliffe, 2002). This limitation also makes PCA poorly suited to combining multiple datasets from different sources, as a single dataset that describes large differences among samples can obscure smaller but potentially important differences present within other datasets. While such smaller differences may still be described in higher principal components (i.e., beyond principal components 1 and 2), the inability to directly query for them makes this approach less than ideal. Moreover, while ANOVA approaches are capable of directly identifying all variables that differ among sample classes, they commonly detect hundreds or thousands of statistically significant variables. Comprehensively investigating such long lists of variables inherently requires additional filtering steps, such as limiting further investigation only to those variables with the greatest dynamic ranges or statistical significance. As previously mentioned, such an approach that assumes biologically significant differences will also be among the most statistically significant fails to test the central assumption of substantial equivalence, that small differences are unlikely to be biologically important.
Co-expression network analysis is one such approach that we believe is especially complementary to more commonly used approaches such as PCA and ANOVA. Co-expression network analyses cluster known and unknown variables (typically but not exclusively messenger ribonucleic acid transcripts) by expression pattern, which allows differences among samples to be examined at multiple scales, thus providing a truly systems view of network structure and function (Aoki et al., 2007). Although many co-expression network analysis methods are available, we chose to use the comprehensive yet accessible weighted correlation network analysis (WGCNA) package, which has been implemented in the open source language R (R Development Core Team, 2009; Langfelder and Horvath, 2008; DiLeo et al., 2011; Shen et al., 2013).
Here, we examine the relative effects of genetic and transgenic variation on the metabolomes of field-grown tomato fruit with nontargeted liquid chromatography–mass spectrometry (LC-MS) metabolic fingerprinting. A diverse set of genotypes was selected from modern breeding lines, commercially important varieties, obsolete commercial varieties, and vintage and specialty varieties to estimate the range of metabolic diversity commonly achieved by conventional plant breeding. Single gene mutants of large effect were also used with inbred backgrounds; the influence of the ripening-inhibiting alleles rin and nor were compared in multiple genetic backgrounds to transgenic lines containing silencing constructs engineered to mimic the rin mutant phenotype (Vrebalov et al., 2002; Giovannoni, 2004). Following LC-MS analysis, peak detection, and alignment, PCA was used to illustrate the overall metabolic similarity among genotypes. Subsequently, partial least squares discriminant analysis (PLS-DA) was used to identify those metabolic features that differed most consistently between otherwise similar transgenic and nontransgenic genotypes. Finally, WGCNA was used to find additional metabolic features that were associated with identified differences, both to explore the extent of transgene-induced variation and to better annotate the identities of differing metabolic features.
MATERIALS AND METHODS
Tomato genotypes were chosen to represent the broad phenotypic and genetic diversity of vintage, modern breeding lines, and commercially grown varieties (Supplemental Table S1). Some genotype substitutions were made in 2009 due to poor performance at our site in 2008 (Homer Thompson Research Farm, Cornell University, Freeville, NY). The commercially relevant homozygous ripening mutant rin was included, as was the phenotypically similar nor/nor mutant, both in various genetic backgrounds. Transgenic plants (Alisa Craig [AC] transgenic [tg] rin red) were created with the pHellsgate12 silencing constructs targeted to the Rin 3′ untranslated region (UTR) (corresponding to 979–1133 nucleotide [nt] of LeMADS-RIN, AF338522.1) and 5′ UTR (2–236 nt of LeMADS-RIN) but failed to obviously alter fruit ripening when grown under field conditions (Wesley et al., 2004). Additional transgenic plants (AC tg rin orange) created with full-length complementary DNA (cDNA) silencing constructs targeted to Rin were obtained and displayed a semiripe phenotype when grown under field conditions (Vrebalov et al., 2002).
Tomato seeds were germinated in a climate-controlled greenhouse and then transplanted and cultivated under standard horticultural practices in Ithaca, NY, during the summers of 2008 and 2009. During 2008, two plants per genotype were planted in each of two replicated field blocks, with 2 m spacing between plants and 4 m spacing between rows. One pool of 10 fruit was obtained from the two plants in each block. Fruit were picked when field-ripe. The same strategy was used in the conventional variety field in 2009. Additionally, a second field was planted in 2009 that included all transgenic plants in addition to cultivar Ailsa Craig (wild-type, rin and nor mutants) and cultivar Debarao plants as controls. A randomized, duplicated design was used for the transgenic evaluation plot. Ten independent transgenic events were selected for both the 3′ UTR and 5′ UTR pHellsgate12 constructs; replicate plants were created by vegetative propagation. Presence of the transgene was confirmed by Southern blotting in each plant before transplanting in the field (Supplemental Table S2). In this second field, five fruit were picked from each individual plant and pooled. All of the fruit from the transgenic test plot were developmentally staged and picked 45 d past the 1 cm fruit stage. Whole tomato fruit were flash frozen in liquid N2, pulverized, pooled as indicated, and extracted in a methanolic solution as described previously (DiLeo et al., 2011).
Nontargeted Liquid Chromatography–Mass Spectrometry Metabolic Fingerprinting for 2008 Field Season
Methanolic extracts were concentrated using a SpeedVac (Labconco) and reconstituted with an equal volume of water, with 2% acetonitrile and 0.5% formic acid, resulting in a pH of approximately 3 (De Vos et al., 2007). Ten fruit were pooled for each genotype and injected as three technical replicates from each of the duplicated plots. High performance liquid chromatography (HPLC) was performed on an LC Packings UltimatePlus capillary HPLC system equipped with a Switchos valve switching unit and a Famous auto sampler (Dionex). The system was modified for bypass of the calibrator for direct pump flow at 60 μL min–1 through a Vydac C18 RP column (1 mm i.d. by 150 mm). The column temperature was maintained at 40°C. A hybrid triple quadrupole linear ion trap mass spectrometer, the 4000 Q Trap from ABI/MDS Sciex equipped with Turbo (standard) ion source, was used for mass spectrometry. Buffer A contained 0.2% formic acid and buffer B contained 80% acetonitrile with 0.2% formic acid. A 35-min gradient of 3 to 50% buffer B was used for elution followed by a ramp of buffer B to 80% within 1 min and a hold for 5 min before re-equilibration of the column at 3% buffer B. High performance liquid chromatography and mass spectrometer parameters were optimized using standard solutions of naringenin and chlorogenic acid (Sigma Chemical).
Data acquisition on the 4000 Q Trap was performed using Analyst 1.4.2 software (Applied Biosystems) in the positive ion mode with information dependent acquisition (IDA). A 5 kV spray voltage was used for all experiments. Nitrogen was used as both the curtain gas (value of 10) and collision gas (set to high), with a source temperature of 200°C. The declustering potential was set at 40 V to minimize in-source fragmentation. The ion source nebulizer gas was set to 35 (arbitrary unit) and heated gas 2 was set to 30. Information dependent acquisition analysis was set so that each survey scan at 1000 amu s–1 across m/z 100 to m/z 800 was followed by two tandem mass spectrometry (MS/MS) at 4000 amu s–1 covering m/z 50 to 800. Collision energy was fixed at 15 eV with CES at 10 eV. The IDA criteria are to exclude former targeted ions after one occurrence for 30 s with the Dynamic Background Subtraction (DBS) script on.
Markerview 1.2 (Applied Biosystems) was used to identify, align, normalize, and filter ion peaks in LC-MS spectra, producing a list of 1373 features corresponding to all ions reliably observed at each retention time, accompanied by the semiquantitative intensity of each ion (Supplemental Table S3).
Nontargeted Liquid Chromatography–Mass Spectrometry Metabolic Fingerprinting for 2009 Field Season
Methanolic extracts of whole tomato fruits were obtained as described above but resolved in 2009 on a Quantum Access triple quadrupole system (Thermo Finnegan LLC) (Zhang et al., 2009). Samples were separated on a ThermoFisher Accela HPLC equipped with a Gemini C18 reversed phase column (3 µM i.d., 150 by 4.6 mm; Phenomenex). Buffer C contained 0.1% formic acid in acetonitrile and buffer D contained 0.1% formic acid in water. A 20 min gradient of 5 to 75% buffer C was used for elution followed by 5 min elution in 75% buffer C/25% buffer D, with a 1 min ramp to 5% buffer C/95% buffer D and a 4 min hold with 5% buffer C before re-equilibration of the column. Data were acquired in positive ion mode, with 5 kV spray voltage, and survey scans across m/z 100 to m/z 1500.
MZmine version 1.97 was used to identify, align, normalize and filter ion peaks in the 2009 LC-MS spectra, producing a list of 992 features corresponding to all ions reliably observed at each retention time, accompanied by the semiquantitative intensity of each ion (Supplemental Table S4) (Pluskal et al., 2010). Metabolites were tentatively identified by comparing mass spectra to the online database, MassBank, and other available literature (Moco et al., 2006; Yamanaka et al., 2008, 2009; Horai et al., 2010).
The tentative identification of specific metabolites was accomplished by comparing pseudomolecular ions and their daughter fragments to the literature and available databases (Moco et al., 2006; Yamanaka et al., 2008, 2009; Horai et al., 2010). Daughter fragments were identified directly by tandem mass spectrometry and indirectly by exploring WGCNA clusters of highly co-expressed features with nearly identical retention times (DiLeo et al., 2011). The tentative identities of metabolites references in this study include tomatoside A (1065.4, 903.3, 741.3, 579.2 m/z [M + H]+, and 9.96 m 2009 LC-MS and 741.3 with MS/MS fragments 579.4, 434.4, 285.2, 273.3, 255.3 m/z [M + H]+, and 35–42 m 2008 liquid chromatography-tandem mass spectrometry [LC-MS/MS]), dehydrotomatoside (1063.4, 739.2, 517.0 m/z [M + H]+, and 9.86 m 2009 LC-MS and 739.2 with MS/MS fragments 577.4, 433.3, 397.3, 283.3, 271.3, 253.3 m/z [M + H]+, and 35 m 2008 LC-MS/MS), lycoperoside H (918.3, 594.2, 432.2 m/z [M + H]+, and 8.15 m 2009 LC-MS and 594.2 with MS/MS fragments 576/7.4, 558/9.4, 547/8.4, 435/6.4, 287.1, 273.4, 255.3 m/z [M + H]+, and 36–39 m 2008 LC-MS/MS), and possibly esculeoside A, F, or G or lycoperoside F or G (1270.5, 814.3, 652.2, 250.0 m/z [M + H]+, and 7.75 m 2009 LC-MS) (Eich, 2008; Yamanaka et al., 2008, 2009).
Unintended Effects on Growth Experiment
Plants were grown in the greenhouse to confirm the observations of reduced root biomass made in the 2009 transgenic evaluation field plot. Four representative transgenic lines from each of the 5′ UTR and 3′ UTR silencing constructs were grown in the greenhouse using a sandy potting mix for 28 d under standard horticultural conditions. Wild-type AC plants were sown and grown alongside. At the end of the experiment, soil was washed from roots so that fresh weight estimates could be made for both root and shoot biomass.
All statistical analyses were completed in R (version 2.11.0) (R Development Core Team, 2009). Principal component analysis and partial least squares regression were performed on Pareto-scaled metabolic fingerprints using the R packages prcomp and plsr. Boxplots and Kruskal–Wallis rank sum tests were based on autoscaled metabolic fingerprints, where technical replicates were first averaged (van den Berg et al., 2006). Weighted correlation network analysis was performed according to developer provided training materials (http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/index.html) on autoscaled metabolic fingerprints, where technical replicates were first averaged, using the R package WGCNA (Langfelder and Horvath, 2008; DiLeo et al., 2011). Weighted correlation network analysis was used to examine metabolic network topology of ripe, consumer-accepted tomato fruit and, accordingly, excluded ripening mutants AC rin, AC nor, and North Carolina State University released germplasm (NC) rin but included all transgenic genotypes. Weighted correlation network analysis was additionally used to compare the local topology surrounding metabolite “1065” between AC tg red and its nontransgenic parental genotype although the full metabolic networks of these two genotypes could not be compared due to a lack of statistical power. Associations among features were projected as nodes and edges with Cytoscape (version 2.6) (Smoot et al., 2011). Weighted correlation network analysis assigns metabolites to unsigned, co-regulated “modules,” which includes features that are both positively and negatively correlated. Due to logistic limitations, genotypes were often grouped based on genetic background and phenotype to maximize statistical power, for example, AC tc and AC Rin/rin were combined with AC normal for specific comparisons between the AC tg rin red class and the nontransgenic parental genotype. Three technical replicates from a single AC Rin/rin individual grouped with the AC rin and AC nor mutants in Fig. 1. This pool was created from a plant that produced only green fruit, possibly due to an epigenetic effect. Like AC rin and AC nor fruit, this unexpectedly nonripening heterozygote was not included in PLS-DA or Kruskal–Wallis rank tests of AC tg rin red versus its nontransgenic parental genotype or in WGCNA of AC vs. AC tg rin red.
Tomato Metabolomes Cluster by Genetic Background, Ripening Gene Alleles, and Ripening Phenotype
Nontargeted LC-MS metabolic fingerprints were generated from whole fruit methanolic extracts of a diverse panel of field-grown tomatoes over two seasons (Supplemental Table S1). One thousand seventy-three metabolic features were reliably detected from these extracts in 2008 and 992 were detected in 2009. These methanol-extracted metabolites describe a limited yet biologically important subset of the tomato fruit metabolome that includes sugars, organic acids, amino acids, phenylpropanoids, and alkaloids (Supplemental Tables S3 and S4; DiLeo et al., 2011). These metabolic features were analyzed using a progression of statistical techniques to characterize and visualize the variation in this alcohol-extractable fraction of the metabolome. Principal component analysis revealed consistent relationships among the measured genotypes over 2 yr despite the use of different field locations, LC-MS platforms, and software packages to collect the data (Fig. 1). Tomato genotypes clustered by genetic background, by Ripening-inhibitor (Rin) and nonripening (Nor) allele states and by the apparent ripeness of sampled fruit. For example, a series of modern tomato breeding lines and commercial hybrids developed at North Carolina State University (NCSU) that contained functional, wild-type alleles of Rin and Nor co-clustered in both datasets, reflecting their shared genetic background, Rin and Nor allele states, and red-ripe fruit phenotype (see Supplemental Table S1) (Gardner, 1992, 1999, 2000, 2006). However, a homozygous rin/rin mutant variety from this same collection (NC1 rin EC) produced ripening-inhibited fruit and was consistently separated from its red-ripe relatives. In commercial practice, this mutant variety is used to create heterozygous hybrid tomatoes with extended shelf life and other desirable agronomic traits (e.g., cultivar Mountain Crest, an F1 hybrid between NC84173 and NC1 rin EC) (Gardner, 2006). Mountain Crest, which produces red-ripe fruit and exhibits a Rin/rin allele state, clustered with red-ripe, Rin/Rin NCSU genotypes in both years. Likewise, the vintage variety Ailsa Craig (AC) (red-ripe, Rin/Rin, and Nor/Nor) and the obsolete commercial variety Rutgers (red-ripe, Rin/Rin, and Nor/Nor) were both separated from their isogenic counterparts that contained homozygous rin/rin or nor/nor allele states and displayed impaired ripening (AC rin, AC nor, Rutgers rin, and Rutgers nor). Unlike other rin/rin and nor/nor genotypes, however, the NC1 rin EC breeding line (rin/rin) was created specifically to maximize ripeness through selection on light-responsive ripening pathways that are not dependent on Rin (Gardner, 2006). This variety produced fruit that were often yellow or orange while other rin/rin mutants in the AC and Rutgers backgrounds produced consistently green fruit with small flecks of yellow. Accordingly, NC1 rin EC (rin/rin) was more similar to its Rin/rin and Rin/Rin relatives (Mountain Crest and NC84173, respectively) than AC rin (rin/rin) or Rutgers rin (rin/rin) were to theirs. In all three of these genetic backgrounds, fruit with Rin/Rin and Rin/rin allele states were found to co-cluster except for a singular 2009 AC × AC rin plant (Rin/rin), which produced uncharacteristically unripe fruit that co-clustered with visually similar AC rin (rin/rin) fruit.
Caro Rich, a variety descended from Rutgers and the wild relative Solanum habrochaites S. Knapp & D. M. Spooner, has an orange-fleshed fruit with high β-carotene and co-clustered with Rutgers (Tigchelaar and Tomes, 1974). Green Zebra, a modern specialty variety that produces fully ripe fruit that soften but remain green, appeared intermediate between other ripening and ripening-impaired genotypes. Other non-red fleshed, fully ripening vintage and modern specialty varieties (Banana Legs, Peach, and White Beauty) also held intermediate positions. As it was difficult to determine when these last four varieties, particularly Green Zebra, attained complete ripeness their intermediate PCA locations may be partially due to the inclusion of incompletely ripe fruit in the samples analyzed. Hank, a specialty variety that produces small, lobed, pink fruit, was located far from the other ripe varieties in both years. Tomato genotypes were not separated by the compositional differences associated with the different market classes for tomato, for example, the divergent selection for percentage of solids and epidermal thickness between fresh market and paste varieties. Principal component 2 was associated with differences in ripening in both 2008 and 2009. Principal component 1 in 2008 appeared to separate genotypes both by ripening and genetic background, particularly by clustering ripe NCSU genotypes apart from other red-ripe tomatoes. Overall, most tomatoes with functional Rin and Nor alleles were found to be very similar, with the closest associations existing among plants with similar genetic backgrounds. Genotypes with homozygous nonfunctional alleles of Rin and Nor failed to complete climacteric ripening and displayed consistently different metabolic fingerprints.
Ripening Delayed Fruits Have Similar Metabolomic Phenotypes Whether Mutant or Transgenic
Following this initial characterization of metabolomic variation among diverse, consumer-relevant tomato genotypes, specific impacts of transgenes and transformation were considered. Two transgenic rin constructs were created in the AC background so that transgenic plants could be compared directly to controls with the same genetic background and phenotype. These constructs used the pHellsgate12 vector to create hairpin RNAs to elicit gene silencing against LeMADS-RIN based on either the 5′ UTR or 3′ UTR sequences of this gene (Wesley et al., 2004). Ten transgenic events for each of the UTR silencing target sequences were selected for trials in the 2009 field season based on observed gene silencing in the 2008/2009 greenhouse. However, little or no gene silencing was observed from these constructs under field conditions and as such, this collection is referred to as “AC tg rin red.” In addition, a nontransformed control, which was exposed to Agrobacterium and regenerated through tissue culture (AC tc), was grown in the field in 2009 (Supplemental Table S3). These 21 lines representing three genotypes were not included in the PCA in Fig. 1 representing the 2009 field season. For clarity, the PCA for these varieties is shown in Fig. 2 solely with the other genotypes that were planted in an isolated test plot in 2009 for transgenic evaluation. These other varieties included the wild-type AC variety, rin and nor isogenic mutant AC lines, a Rin/rin heterozygous AC line (AC × AC rin), and a second, red-ripe vintage variety, Debarao. A third transgenic construct was used to silence LeMADS-RIN, based on a full-length cDNA antisense construct (Vrebalov et al., 2002). While previous reports for this antisense cDNA construct indicated very strong gene silencing, in the 2009 field a mild impairment in ripening was observed (“AC tg rin orange” as the fruit were orange-yellow colored and had partial softening). The metabolic fingerprints of the fully ripe AC and Debarao fruit co-clustered in the PCA for the transgenic field experiment as they did in the analysis of conventional tomato varieties (Fig. 2). As mentioned above, one of the AC × AC rin heterozygous plants produced only fruit that failed to ripen and was found to be grouped in the PCA with the ripening-inhibited AC genotypes AC rin and AC nor. AC tg rin orange was found to be intermediate to those of red-ripe and ripening-inhibited Ailsa Craig genotypes, as would be expected by the transgenic line’s semiripe phenotype. Likewise, the AC tg rin red fruit, which were not visibly inhibited in ripening, clustered with nontransgenic AC in spite of the confirmed presence of the pHellsgate12 construct in all 20 transgenic lines (Supplemental Table S2). Untransformed AC controls that were exposed to Agrobacterium and tissue culture (AC tc) fell almost completely within the measured range of the wild-type AC plants. Overall, metabolic fingerprints of transgenic rin tomato genotypes fell completely within the measured range of consumer- or breeder-acceptable tomato genotypes. According to the PCA, fruit from the transgenic plants were less divergent from wild-type plants than natural null mutants of rin and nor were, reflecting the magnitude of changes due to ripening competence had far larger effects on the metabolome than the presence of a transgene.
Unique Metabolite Accumulation Patterns are Identified in Transgenic Tomatoes
While PCA is useful for comparing the relative similarity of different individuals in an unbiased fashion and identifying the major sources of variation among them, it is not particularly effective at identifying systematic differences between classes that are relatively similar. Accordingly, the supervised approach of PLS-DA was then applied to find the major differences between red-ripe Ailsa Craig plants that did or did not contain the transgene constructs. Therefore, this comparison tested specifically for unintended effects of transgenic modification in an otherwise shared genetic background. The strength of this comparison was increased inadvertently as the construct failed to fully produce the intended effect of delayed fruit ripening. Partial least squares discriminant analysis is more effective than PCA in this case as it focuses specifically on the features that distinguish transgenic and nontransgenic fruit, rather than describing the features that vary most overall. The overwhelming similarity between metabolic fingerprints of transgenic and nontransgenic red-ripe Ailsa Craig fruit is revealed by the minimal separation achieved by PLS-DA (Fig. 3). Only 28 of the 992 chemical markers observed in the collected LC-MS data were identified by PLS-DA to exceed an arbitrary and absolute value of 0.10 for loading on component 1. None of these features were novel metabolites present uniquely in the transgenic genotypes. The semiquantitative concentrations of these 28 features were then compared between AC tg rin red fruit and the full measured range of tomato chemical diversity (Fig. 4). Fifteen of these features were found in the transgenic fruit at concentrations that were outside the range observed within the diverse conventional varieties and were pursued further (bold values in Fig. 4). Four of these features were at higher concentrations in AC tg rin red fruit than wild-type AC while the other 11 compounds were found at lower concentrations. These 15 features were then analyzed for significant differences in abundance between transgenic AC tg rin red and its nontransgenic parental genotype using the Kruskal–Wallis rank sum test. Five of these features were found at significantly different concentrations in AC tg rin red relative to its nontransgenic parental genotype at a Bonferroni-adjusted p-value of 0.00179: 579.2, 741.3, 1065.4, 1066.4, and 1067.4 m/z at 9.96 m elution time. All five features appeared to be ion fragments of a single metabolite, “1065,” which was expressed at approximately half the level in AC tg rin red as the AC wild-type controls, on average. Elution and mass fragmentation characteristics suggest that metabolite “1065” may be tomatoside A, a triterpenoid glycoside (Yamanaka et al., 2008). This difference in abundance was statistically significant by genotype, with threefold variation apparent among accessions (Fig. 5), which could be due to some combination of natural phenotypic variation, differences in degree of fruit ripeness, or the semiquantitative nature of our methods. According to a Tukey-Kramer test of means among the fully ripening capable varieties, only the comparisons between AC wild-type (wt) and AC tg rin red varieties were significant; all other comparisons were not significant. A related compound, α-tomatine, was also putatively detected (metabolite “1034”) and was found to have much greater variation in abundance among varieties (Fig. 5) (Yamanaka et al., 2008, 2009). A significant difference in mean abundance between AC wt and AC tg rin red varieties was also detected for α-tomatine; however, the greatest degree of difference observed was between the specialty varieties Hank and Black Plum, representing a 400-fold difference. Tomatoside A, the most abundant triterpenoid glycoside in tomato fruit, varied slightly among varieties while α-tomatine varied more strongly, partly in association with fruit ripeness (Fig. 5) (Kozukue et al., 2004; Yamanaka et al., 2008).
Weighted Correlation Network Analysis Visualizes the Tomato Fruit Metabolome
Principal component analysis and PLS-DA were very effective at characterizing broad trends and identifying specific differences within data sets. However, neither method provided detailed information on how variables were interconnected at multiple scales. Weighted correlation network analysis (i.e., weighted gene co-expression network analysis [WGCNA]) was created to find gene associations within transcriptomic data while integrating phenotypic trait information (Langfelder and Horvath, 2008). We have demonstrated the utility of WGCNA to analyze the tomato fruit metabolome, using a curated nuclear magnetic resonance (NMR) dataset developed from greenhouse grown plants (DiLeo et al., 2011). We have also used WGCNA to condense a larger MS-based profiling study in diverse maize lines, from >8000 metabolic markers to only 56 modules (Shen et al., 2013). To more fully investigate metabolic changes associated with the transgene, weighted correlation networks were created inclusively with all edible ripe fruit (excluding green, unripe genotypes) and individually for AC tg rin red and for its nontransgenic parental genotype. From the 992 metabolic features reliably detected by LC-MS, 443 could be assigned to one of seven arbitrarily named modules (Table 1). Membership in a module varied from 19 features in the purple module to 94 in the black module. Connectivity between the three most connected nodes was also examined (Table 2). While the purple module had the smallest number of nodes, its top three features had greater connectivity than those found in the larger orange module and similar connectivity to that seen in turquoise module, in spite of the fourfold difference in features. This analysis can be visualized both as a dendrogram and correlation heat map (Supplemental Fig. S1) and a node and edge graph (Fig. 6). A weighted correlation network created from all edible ripe fruit metabolic fingerprints revealed clustering of similar tentatively identified metabolites in distinct modules: flavonoids in the blue module, glycoalkaloids in the orange module, and amino acids in the purple module. Metabolite “1065,” the feature recognized by PLS-DA as a distinction between AC tg rin red and the AC parental variety, was located in the turquoise module of the shared network. Upon closer inspection of this module using networks constructed from all ripe tomatoes (Fig. 6), including all data for AC tg rin red and AC wild-type, metabolite “1065” was found to be well connected to a series of metabolites (Fig. 7A). An interesting implication of applying WGCNA to nontargeted metabolic fingerprint data is that ionization fragments, adducts, and associated isotopes of each given metabolite tend to appear as very tightly correlated clusters. For example, eight features were connected to metabolite “1065” with a correlation strength greater than 0.3 in either the AC tg rin red or AC wild-type network (Fig. 7B). Based on MS/MS and retention time data, several of these features were tentatively identified as ionization fragments of tomatoside A (m/z = 1065.37), dehydrotomatoside (m/z = 1063.35), and lycoperoside H (m/z = 918.31) (Yamanaka et al., 2008, 2009). While the observation of such correlated putative fragments provides intriguing hints as to the identity of unknown metabolites, we do not suggest that this should substitute for standard identification protocols. Finally, while the resolution of the networks created in this study did not permit us to directly inquire whether there were fundamental differences in network architecture between transgenic and nontransgenic plants (e.g., by consensus WGCNA), the identification of these additional metabolites, all of which accumulated to lower levels in the transgenic fruit, suggest that these transgenic plants do not exhibit a fundamentally different regulatory structure from their nontransgenic parent.
|Metabolomic marker (m/z, time)||Number of connections||Module||Connectivity||Scaled connectivity||Clustering coefficient||Maximum adjacency ratio|
Unintended Effects to Plant Growth and Development
During the course of the 2009 transgenic field evaluation, we observed that the AC tg rin red plants as a group were quick to wilt in the field and were easily uprooted at the end of season, suggesting an unintended impact on root growth or function. To test whether an unintended effect impacted plant growth or development, eight plants descended from four transgenic events (3′ UTR 2-2, 2-7, 10-1, and 10-7 and 5′ UTR 28-4, 28-6, 42-3, and 42-4) were grown in a sandy potting mix that facilitated measuring root and shoot biomass. While none of these transgenic events had a fruit ripening phenotype in the field, all produced an obvious delay in ripening in greenhouse conditions suggesting that posttranscriptional gene silencing was occurring at some level. After 28 d of growth in potting mix, no difference was observed in shoot biomass between wild-type AC and AC tg rin red plants (Supplemental Fig. S2). However, root biomass was reduced by 43% in the AC tg rin red plants relative to wild-type AC plants (p = 0.021, two-tailed t test). The partitioning of biomass between the shoots and roots was nearly doubled in the AC tg rin red plants compared to the parental line (p = 0.0029). Under both field and greenhouse conditions, AC tg rin red plants also exhibited poor flowering and fruit set (i.e., frequent flower abscission and lack of flowering). These effects were not quantified and the connections to root growth or other unintended effects are unclear.
Unintended effects occur at some frequency with every method of plant improvement (Committee on Identifying and Assessing Unintended Effects of Genetically Engineered Foods on Human Health, 2004). While biologically significant impacts on morphological and agronomic traits are often obvious, more sophisticated data collection and analysis tools may be required to identify impacts on compositional traits or in cases where the acceptance of unintended effects is very low (Hoekenga, 2008). However, there is a distinction between cataloguing every difference observed between any two varieties for biological interest versus creating a summary of relevant information for biotechnology risk assessment (Hoekenga et al., 2013). Here we describe a comprehensive metabolomic/bioinformatic approach that is able to both describe the overall characteristics of a novel transgenic plant relative to a diverse panel of conventional varieties and also to identify small but significant differences between them. We have previously described a similar approach using NMR profiling that demonstrates the utility of WGCNA to describe the connections among metabolic features and its ability to aid in the recognition of patterns within data (DiLeo et al., 2011). We have also used WGCNA to condense a large and highly informative ultra-high performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) dataset into a more manageable number of derived traits based on module eigenvalues, which were then used for genomewide association studies (Shen et al., 2013). We assert that WGCNA, paired with PCA and PLS-DA, makes for an efficient and powerful statistical pipeline to understand complex metabolomic datasets in a variety of contexts and may have specific relevance to biotechnology risk assessment.
Principal component analysis separated genotypes by genetic background and ripeness but not by exposure to or transformation by Agrobacterium. Fruits that contained allele states associated with obvious impacts on ripening (e.g., AC tg rin orange, rin/rin, and nor/nor) were well separated from their isogenic wild-type relatives along with transgenic and heterozygous genotypes that ripened apparently normally (e.g., AC tg rin red and Rin/rin). The only differences that were found between transgenic and nontransgenic plants by PCA were those associated with the intended effect of delayed fruit ripening (as shown by the position of the AC tg rin orange fruit in Fig. 2). Previous descriptions of tomato fruit metabolomes by PCA have deemphasized the role of genetic background in favor of more pronounced compositional differences between tissues or through developmental time (Moco et al., 2007). While these observations are certainly consistent with the data presented here, due to the widespread adoption of single or polygenic mutants in tomato breeding, wider phenotypic variation can be observed (Fig. 1). Within the fully ripening competent varieties, Hank was a consistent outlier according to PCA (Fig. 1). This was likely due to the importance of the glycoalkaloid tomatines assigned to PC2 for both years; tomatines are normally present in higher concentrations in unripe fruit and are broken down through the course of ripening. Hank had high levels of tomatines, similar to that seen in the nonripening mutants while the remaining fully ripe fruit surveyed were essentially lacking these compounds (Fig. 5).
Principal component analysis is, however, fundamentally limited to identifying the major sources of variation among all observed individuals and is not appropriate for finding small magnitude but potentially critical differences among different classes, such as transgenic and nontransgenic plants. Accordingly, PLS-DA was applied to identify whichever features best distinguished transgenic and nontransgenic fruit that shared an apparently identical, red-ripe phenotype and genetic background. Only a single metabolite tentatively identified as tomatoside A, “1065,” was detected at an unexpected level in the transgenic plants. However, the concentration of this compound in transgenic plants was still within the range observed within closely related conventional varieties and certainly within our diverse study panel. Thus, while the comparison between AC tg rin red and the nontransgenic parent was significant, no other differences in mean concentration were found. This suggests that the difference observed was likely of little practical importance.
Subsequently, WGCNA was used to identify additional metabolites that were co-expressed with tomatoside A in networks assembled from both transgenic and nontransgenic fruit metabolomes. Such networks reveal the natural co-regulation that exists among metabolites when plants are subjected to environmental variation and, when diverse genotypes are included, genetic variation. Such correlation networks are especially powerful as they can reveal detailed systems-level changes that may not be visible when looking at individual features. Perhaps most importantly, when individual features are discovered to differ significantly between genotypes, WGCNA allows their co-expressed neighbors to be identified to allow one to test whether biologically significant effects exist beyond the primary feature. Although there was not sufficient statistical power to make large-scale comparisons between the metabolic networks of AC tg rin red and its nontransgenic parent, it was apparent that additional metabolites, which were closely connected to tomatoside A in the turquoise module, were additionally depressed in transgenic fruit although to less of a degree than tomatoside A itself. The same metabolites were consistently associated with tomatoside A in both transgenic and nontransgenic networks, suggesting that while some biosynthetic or regulatory unit surrounding this metabolomic marker is depressed in the transgenic genotypes, there does not appear to be a fundamental topological rewiring of the network.
It is unclear why a lower average concentration of tomatoside A was observed in transgenic fruit relative to the parental variety. It is possibly caused by the intended ripening-inhibiting properties of the gene-silencing construct, an unintended effect of the gene-silencing construct, or due to a transgene insertion effect. The fact that tomatoside A is not significantly different between the AC varieties with delayed ripening (AC × AC rin and AC tg rin orange) with the AC tg rin red varieties argues against this effect being a consequence of knocking down Rin expression (Fig. 5). Additionally, as 20 independent insertion events all produce the same effect on tomatoside A levels, the effect is not likely due to a specific insertion point in the genome. Given the difference in silencing constructs between the AC tg rin orange (full length Rin cDNA in an antisense orientation) and AC tg rin red plants (pHellsgate12 derived double stranded RNAs [dsRNAs] for each UTR), it is certainly possible than the lower mean tomatoside A levels are a unintended consequence of the choice in silencing vectors. However, the point remains that tomatoside A variation was within the range observed among the study panel, suggesting that the effect was of minimal importance. It is furthermore unclear whether the choice of sequences to elicit gene silencing caused the apparent alteration of root development and flowering, which could operate through nontarget silencing of other MADS-box transcription factors that are known to play roles in these processes (Hileman et al., 2006). Although these observed differences were not completely elucidated in this study, they would be straightforward to explore further if this transgenic genotype was being considered for commercialization, but given the observed increased susceptibility to drought stress the present transgenic events are clearly not acceptable from an agronomic perspective.
Therefore, while it is certain that the methanol-extracted fraction of the tomato metabolome is a subset of the whole and our ability to draw additional connections within the data are limited due to the lack of fully annotated metabolomic markers, we assert that this approach offers a comprehensive and sensitive solution to the difficult challenge of identifying systematic differences between transgenic and nontransgenic genotypes and between both diverse and highly related conventional varieties. We hope that the statistical pipeline described here helps future studies to more fully analyze metabolomic datasets and allows substantial equivalence to be discussed in a more objective and quantitative manner. Finally, as the biological significance of identified effects is ultimately more important than their mere presence, we hope this perspective allows biotechnology risk assessment efforts to identify suites of differences that need to be assessed for relevance within already acceptable degrees of phenotypic variation approved by consumers, producers, and regulators.
Supplemental Information Available
Supplemental material is available at http://www.crops.org/publications/tpg.
Figure S1. Weighted correlation network analysis (WGCNA) of the 2009 field plots. All metabolomic data collected from the 2009 field season plots were analyzed using WGCNA. The heat map depicts the correlations among the 992 metabolomic markers detected by non-targeted liquid chromatography–mass spectrometry (LC-MS) metabolic profiling. Nearly 60% of the markers detected could be organized in modules.
Figure S2. Unintended effects on root and shoot development. To evaluate unintended effects on plant growth, a selection of Alisa Craig (AC) transgenic (tg) rin red and AC wild-type plants were grown for 28 d. Fresh weight (g) and shoot to root ratios were calculated for each class. Least squared means and standard deviations are reported for each trait. There was no difference between AC tg rin red and AC wild-type plants for shoot biomass, while root biomass was significantly decreased in the transgenic plants (2-tailed t-test p-values are reported for each comparison).
Table S1: Study panel for tomato fruit metabolomic profiling.
Table S2. Metabolomic markers detected from 2008 field grown tomatoes. A group of 1373 metabolomic markers were reliably detected among the whole fruit methanolic extracts obtained from the 2008 field. Raw ion intensities are reported for each marker (i.e. prior to pareto scaling), which are identified by mass to charge (m/z) ratio, retention time (m), and feature number. Samples are referred to by both the tomato variety and whether they were harvested from northern (n) or southern (s) field plot.
Table S3. Metabolomic markers detected from 2009 field grown tomatoes. A group of 992 metabolomic markers were reliably detected among the whole fruit methanolic extracts obtained from the 2009 field. Raw ion intensities are reported for each marker (i.e. prior to pareto scaling) identified by mass to charge (m/z) ratio, retention time (m), and feature number. Nearly 4000 metabolomic markers were detected from all data collected, but only 992 markers met quality control standards and are reported here. Alisa Craig (AC) transgenic (tg) rin red genotypes are described in Table S4.
Table S4. Field list for 2009 transgenic test plot. Twenty-two transgenic events and six conventional check varieties were evaluated in the 2009 field. Southern blotting was used to confirm transgene presence and determine allelic state. Eighteen events were evaluated with homozygous individuals, while sixteen events used heterozygous individuals. A majority of events used both homozygous and heterozygous individuals.