# Chapter 1 Introduction

Dobzhansky’s (1973) famous statement that “Nothing in biology makes sense except in the light of evolution” has become a cliché, but it is hard to find a statement that more elegantly captures the essence of biological study. In a more reductionist sense, it is perhaps appropriate to say that nothing in evolution makes sense except in the light of mutation; that is, mutation in an organism’s genetic code is the seed upon which evolution acts (Lynch et al., 2016). Of course, a seed left alone will not grow; it needs the right environment and conditions. Likewise, a mutation needs a population in which to persist and spread. A population can be defined as a group of interbreeding individuals coexisting in time and space (though, as we will see later, this is not always the best definition) (Milgroom & Fry, 1997). Within a population, if conditions are right, a new mutation can increase in frequency and become an established allele. Population genetics seeks to understand how evolution acts upon these alleles to shape populations as they grow, shrink, split, and adapt to their environment. Ultimately, these populations are what give rise to species.

## 1.1 Population Genetics and Clonality

The classical definition of a population relies on the assumption that its individuals are panmictic; that is, they mate randomly, so that individuals in the next generation carry a random assortment of alleles inherited from their parents (Hartl & Clark, 2007; Nielsen & Slatkin, 2013; Rice, 2002). This assumption of sexual reproduction holds for most major groups of organisms (Heitman et al., 2012; Rice, 2002). However, sex does not occur in all organisms: many plants, fungi, protists, bacteria, animals, and viruses are known to reproduce exclusively clonally (Arnaud-Haond et al., 2007; Arnaud-Haond et al., 2012; Heitman et al., 2012; Orive, 1993; Rice, 2002). Here, clonal reproduction means reproduction that produces offspring genetically identical to the parent (note that this can include offspring derived from the selfing of a haploid organism). Thus, a population is perhaps better defined simply as a group of individuals of the same species¹ coexisting in space and time (Milgroom & Fry, 1997).

Both clonal and sexual reproductive strategies carry evolutionary costs and benefits. For organisms that reproduce mainly sexually, outcrossing carries two costs: 1) only half of a parent’s genetic material is inherited by each offspring (the two-fold cost of sex), and 2) advantageous gene combinations are lost through recombination (Heitman et al., 2012; Nieuwenhuis & James, 2016; Rice, 2002). These costs are balanced by the benefits of recombining genetic material, which allows sexual organisms to purge deleterious mutations. Because clonally reproducing organisms do not undergo regular recombination, they do not share this advantage (Heitman et al., 2012; Nieuwenhuis & James, 2016). This is the basis of Muller’s ratchet, which predicts that the overall fitness of small populations of clonally reproducing organisms will decline over time as deleterious mutations accumulate, eventually leading to extinction (Felsenstein, 1974; Loewe, 2006; Lynch & Gabriel, 1990; Lynch et al., 1993). Despite this, clonal organisms retain the advantage of passing on 100% of their genetic material to the next generation.

The advantages of each strategy are easy to see in the context of genetic adaptation to the environment. Clonal populations that are well adapted to their environment retain the combination of genes responsible for that adaptation (Heitman et al., 2012). This strategy is advantageous when resources are scarce and competition is high but, because of low genotypic diversity, it does not lend itself well to adaptation to changing environments (Milgroom, 1996). Conversely, while sexual populations lose the competitive advantage of conserved beneficial allelic combinations, they make up for it with high genotypic diversity, which increases their chances of surviving environmental change. As we shall see in the following sections, many microbial pathogens can reproduce both sexually and asexually (Anderson & Kohn, 1995; Milgroom, 1996; Tibayrenc, 1995).

Whether microbial pathogens reproduce sexually, and how their populations are structured, are important questions for the development of rational management strategies for pathogens or beneficial microbes (Milgroom & Fry, 1997; Taylor et al., 1999; Tibayrenc, 1995; Tibayrenc et al., 1990). For example, if an antimicrobial agent were developed against a known set of strains, its effectiveness would be diminished if those strains represented only a fraction of the genotypic diversity generated by sexual recombination (Smith et al., 1993; Taylor et al., 1999; Tibayrenc et al., 1990). Because most inference of population structure relies on neutral models that assume sexual reproduction, it is important to determine whether the population in question reproduces sexually or clonally before attempting to assess population structure and infer evolutionary processes (Tibayrenc et al., 1990).

Neutral molecular markers are commonly used for population genetic analyses. Before molecular techniques became widely available for characterizing pathogens and other microbes, variation within most pathogenic microorganisms was described using phenotypic traits reflecting pathogenicity (ability to cause disease), virulence (severity of disease), or morphology, which grouped isolates into phenotypic classes (Levin, 1999; Milgroom & Fry, 1997). One of the most important criteria for any marker used in population genetic analysis is that it is heritable.

A wide variety of heritable molecular markers is available for population genetic analysis (Halkett et al., 2005; Tibayrenc, 1995). There is no need to review them all here, but it is useful to note their common features. Generally, genetic markers come in two major types: selectively neutral and selected markers (McDonald, 1997; Milgroom & Fry, 1997; Tibayrenc, 1995). Selectively neutral markers are used as independent variables that can reflect the history of descent; for this purpose, they should be independent and unlinked within the genome (McDonald, 1997).

Because neutral genetic markers are not shaped by selection and vary independently within a population, they are more appropriate than selected markers for detecting sexual reproduction. The theory behind this is simple: sexual reproduction breaks up associations between markers through recombination and independent, random assortment, whereas clonal reproduction transfers the entire genome intact to the next generation, preserving those associations (Heitman et al., 2012; McDonald, 1997; Milgroom & Fry, 1997; Nieuwenhuis & James, 2016; Orive, 1993; Tibayrenc, 1995). Consequently, not only do we expect significant associations between alleles at different loci in a clonal population but, because such populations cannot purge mutations through recombination, we also expect to see excess heterozygosity in diploids (Tibayrenc, 1996; Tibayrenc et al., 1990).
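
The contrast between clonal and sexual reproduction can be illustrated with a toy simulation. The sketch below is a hypothetical illustration, not an analysis from this dissertation: it assumes a haploid population of fixed size, two unlinked biallelic loci, and a starting population in complete association (half the individuals carry alleles 1/1, half 0/0). Under clonal reproduction each offspring copies one parent; under sexual reproduction each locus is inherited independently from one of two parents (free recombination). The correlation between the two loci serves as the measure of association, and all function names are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two 0/1 allele vectors (0.0 if either is monomorphic)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) * (x - mx) for x in xs)
    syy = sum((y - my) * (y - my) for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def next_generation(pop, sexual, rng):
    """Produce a new haploid population of the same size from `pop`.

    Clonal: each offspring is an exact copy of one randomly chosen parent.
    Sexual: two random parents mate, and each of the two unlinked loci is
    inherited independently from either parent (free recombination).
    """
    offspring = []
    for _ in range(len(pop)):
        mom = rng.choice(pop)
        if not sexual:
            offspring.append(mom)
            continue
        dad = rng.choice(pop)
        offspring.append((rng.choice((mom, dad))[0], rng.choice((mom, dad))[1]))
    return offspring


def simulate(sexual, generations=10, n=1000, seed=1):
    """Return the correlation between the two loci after `generations` rounds."""
    rng = random.Random(seed)
    # Start in complete association: half the individuals are (1, 1), half (0, 0)
    pop = [(1, 1)] * (n // 2) + [(0, 0)] * (n - n // 2)
    for _ in range(generations):
        pop = next_generation(pop, sexual, rng)
    return pearson([a for a, _ in pop], [b for _, b in pop])


if __name__ == "__main__":
    print("clonal:", simulate(sexual=False))
    print("sexual:", simulate(sexual=True))
```

In the clonal run the correlation should remain at 1, because only the two founding genotypes can ever be produced. In the sexual run the expected linkage disequilibrium halves each generation under free recombination, so the correlation should fall close to zero, leaving only noise from finite population size.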

Based on these principles, a core set of statistical analyses has been proposed to detect sexual reproduction by examining diversity within loci (for diploid organisms), linkage among loci, and the diversity of multilocus genotypes within a population (Arnaud-Haond et al., 2007; Grünwald et al., 2003; McDonald, 1997; Parks & Werth, 1993; Smith et al., 1993; Tibayrenc, 1996). None of these tests alone can definitively establish the mode of reproduction in a given population (de Meeûs & Balloux, 2004; Nieuwenhuis & James, 2016; Tibayrenc, 1995). Researchers still need biological evidence, such as the observation of sexual reproductive structures, to support the presence of sex.
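
Several of the multilocus genotype (MLG) diversity measures used in this kind of analysis can be sketched in a few lines: the number of distinct MLGs, the Shannon-Wiener index $H$, Stoddart and Taylor’s index $G$, and Simpson’s index $\lambda$ (Grünwald et al., 2003). This is a minimal illustration, not poppr’s implementation; it assumes each individual’s multilocus genotype is already encoded as a hashable tuple, treats identical tuples as one MLG, and uses the uncorrected form of Simpson’s index. The function name `genotypic_diversity` is hypothetical.

```python
import math
from collections import Counter


def genotypic_diversity(genotypes):
    """Summarize multilocus genotype (MLG) diversity for one population.

    genotypes: one hashable multilocus genotype per individual, e.g. a tuple
    of allele calls. Identical tuples are treated as the same MLG.
    """
    counts = Counter(genotypes)
    n = len(genotypes)
    props = [c / n for c in counts.values()]
    sum_p2 = sum(p * p for p in props)
    return {
        "N": n,                                      # number of individuals
        "MLG": len(counts),                          # distinct multilocus genotypes
        "H": -sum(p * math.log(p) for p in props),   # Shannon-Wiener index
        "G": 1 / sum_p2,                             # Stoddart and Taylor's index
        "lambda": 1 - sum_p2,                        # Simpson's index (uncorrected)
    }


if __name__ == "__main__":
    clonal = [("A", "B")] * 8  # one genotype repeated: no genotypic diversity
    mixed = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C")]  # all unique
    print(genotypic_diversity(clonal))
    print(genotypic_diversity(mixed))
```

A purely clonal sample collapses to a single MLG ($G = 1$, $H = 0$, $\lambda = 0$), whereas a sample in which every individual is unique reaches the maximum values for its size ($G = N$, $H = \ln N$).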

## 1.2 Population Genetics Under the Lens of Plant Pathology

One of the overarching themes in both ecology and evolution is that diversity is important for stable population and community structure (Magurran, 1988; McDonald & Linde, 2002a). Agricultural systems, however, behave very differently from natural ecosystems (Milgroom & Fry, 1997; Stukenbrock & McDonald, 2008). Within agricultural systems, plants and landscapes are homogenized, which provides a steady source of food for people; however, this homogeneity, combined with intensive management practices, also provides fertile ground for the evolution of plant pathogens (Stukenbrock & McDonald, 2008). McDonald and Linde (2002a) identified five major factors influencing the evolution of plant pathogens in agricultural systems: 1) mutation, 2) genetic drift, 3) gene/genotype flow, 4) mating system, and 5) selection due to host resistance.

The oomycete Phytophthora infestans (Mont.) de Bary, the causal agent of potato late blight and a poster child for plant pathology, clearly illustrates how a change in one of these five factors can influence plant health. P. infestans altered the demographics of human populations on both sides of the Atlantic Ocean through its role in the Irish potato famine (Erwin et al., 1996). It has a heterothallic life cycle (it cannot self-fertilize) that includes both clonally and sexually derived infectious propagules (Fig. 1.1). The clonally derived propagules are short-lived sporangia, dispersed by wind or water, that can release swimming zoospores, whereas the sexually derived propagules are double-walled oospores. While P. infestans can survive as hyphae in infected tubers, its sexually derived oospores can survive for years in the soil, highlighting the added risk posed by sexual reproduction.

From the time of the Great Famine until 1980, only a single mating type, A1, was observed in Europe, indicating that P. infestans was reproducing strictly clonally there (Goodwin et al., 1994). Sexual reproduction, which requires both the A1 and A2 mating types (Galindo & Gallegly, 1960), became possible when A2 was introduced to Europe. Because of this risk, population genetic tools and methods were needed to assess whether sexual reproduction had occurred. Although evidence for sexual reproduction was found (Sujkowski, 1994), P. infestans also appeared to be still reproducing and spreading clonally (Goodwin et al., 1994). Cooke et al. (2012) analyzed isolates collected from 1983 to 2008 and found a rapid shift in clonal population structure, with a dramatic increase in the prevalence of a particularly aggressive lineage. Although sexual reproduction was not found to be prevalent, this question could only be addressed with population genetic techniques.

## 1.3 The Need for Reproducible Research

Computational methods are increasingly important for answering questions crucial to plant pathology and are essential for population genetic analyses in the 21st century (Excoffier & Heckel, 2006). As more of our analyses become computational, we need to ensure that these methods are reproducible. Calls for standards in the analysis of clonal organisms over the past twenty years have resulted in a plethora of computer programs (Arnaud-Haond et al., 2007; Tibayrenc, 1996). It was not uncommon for a researcher to use several programs to answer a single research question; each program often required its own input format and did not necessarily run on all operating systems (Excoffier & Heckel, 2006; Kamvar et al., 2014b). Constantly reformatting data, and being unable to automate a series of analyses, not only made methods difficult to communicate but also increased the chance of error (Adamack & Gruber, 2014; Excoffier & Heckel, 2006; Goecks et al., 2010; Kamvar et al., 2014b).

Biologists often think about reproducibility in terms of wet lab or bench work. Care is taken to record all methods and data faithfully in a lab notebook, but this care does not necessarily extend to computational analyses. Reproducible research in scientific computing seeks to extend the principles of reproducibility we use in the wet lab to our downstream analyses in silico. This work introduces tools for reproducible research in clonal population genetics, developed in an open manner. When researchers have open tools that can be easily shared, they can spend more time asking questions and less time formatting data.

## 1.4 Overview

The work presented here offers tools designed to answer questions related to clonal population genetics and plant pathology (Fig. 1.2). In this context, the term ‘tool’ refers to software code used to apply mathematical models or theory to population genetic data. The merit of this work lies in the context of reproducible science, which ensures that the computational environment used to create results can be replicated (Buckheit & Donoho, 1995). The goal was not simply to develop software, but to develop software with which to analyze our own data. Serendipitously, by analyzing our own data with this software, we are able to demonstrate not only that it can be used, but also that an entire analysis can be conducted in an open and reproducible manner. We demonstrate the usefulness and flexibility of these tools by using them to show evidence for multiple introductions in the Curry County, OR outbreak of the sudden oak death pathogen, Phytophthora ramorum. We additionally assess the power of the multilocus linkage disequilibrium measure $\bar{r}_d$ to detect clonal reproduction.

### 1.4.1 Summary of Chapter 2

To address the lack of tools for reproducible research in clonal population genetics, we present the software package poppr for the R statistical computing language (Kamvar et al., 2014b; R Core Team, 2016). Previously, the tools necessary for analyzing clonal populations were spread across several stand-alone programs, each requiring a different data input format. Moreover, these programs varied in their level of documentation and had limited cross-platform support. The novelty of poppr was to provide indices of multilocus genotypic diversity, the index of association, a fast implementation of Bruvo’s genetic distance, and clone-correction over unlimited levels of user-specified population hierarchies (Agapow & Burt, 2001; Arnaud-Haond et al., 2007; Bruvo et al., 2004). Because poppr is implemented in R, these analyses can be performed reproducibly on all computing platforms.
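
Clone-correction over a population hierarchy can be pictured as keeping one representative of each MLG within every combination of the chosen hierarchical levels. The Python sketch below only illustrates that idea and is not poppr’s R code (in poppr this is handled by `clonecorrect()`); the function name `clone_correct`, the dictionary layout, and the field names `Region`, `Site`, and `genotype` are all hypothetical.

```python
def clone_correct(samples, strata):
    """Keep the first sample of each MLG within each combination of `strata`.

    samples: list of dicts, each with a "genotype" key (hashable MLG) and one
             key per hierarchical level (hypothetical field names).
    strata:  ordered hierarchy levels to correct within, e.g. ["Region", "Site"].
    """
    seen = set()
    kept = []
    for sample in samples:
        # A clone is defined relative to the chosen level of the hierarchy
        key = tuple(sample[level] for level in strata) + (sample["genotype"],)
        if key not in seen:
            seen.add(key)
            kept.append(sample)
    return kept


if __name__ == "__main__":
    data = [
        {"Region": "North", "Site": "A", "genotype": "MLG1"},
        {"Region": "North", "Site": "A", "genotype": "MLG1"},
        {"Region": "North", "Site": "B", "genotype": "MLG1"},
        {"Region": "South", "Site": "C", "genotype": "MLG2"},
    ]
    print(len(clone_correct(data, ["Region", "Site"])))  # 3: one MLG1 per site
    print(len(clone_correct(data, ["Region"])))          # 2: one MLG1 per region
    print(len(clone_correct(data, [])))                  # 2: one per MLG overall
```

The example shows why the hierarchy level matters: the same data set retains different numbers of individuals depending on whether clones are counted per site, per region, or across the whole sample.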

### 1.4.2 Summary of Chapter 3

The initial implementation of poppr contained basic tools for the analysis of clonal populations (Kamvar et al., 2014b), but lacked tools for custom definitions of multilocus genotypes and performed poorly with genomic-scale data. Chapter 3 introduces poppr version 2.0, an updated and improved release. With high-throughput sequencing (HTS) data, the amount of missing data and genotyping error increases, and the definition of a multilocus genotype becomes unclear. Moreover, computation of the index of association scales poorly as the number of loci increases. To address these limitations, we added new functionality to poppr to define multilocus genotypes based on genetic distance and to calculate the index of association over random samples or windows of SNP loci.
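
Defining multilocus genotypes by genetic distance amounts to clustering samples and treating each cluster as one MLG. The Python sketch below illustrates one possible approach under stated assumptions and is not poppr’s implementation: it uses a simple proportion-of-differing-loci distance that skips missing calls (`None`), and it merges samples by nearest-neighbor (single-linkage) clustering, only one of several possible clustering criteria. The function names and the choice to merge at distances less than or equal to `threshold` are hypothetical.

```python
def pairwise_distance(g1, g2):
    """Fraction of loci that differ, skipping loci missing (None) in either sample.

    Samples sharing no scored loci are treated as maximally distant (1.0),
    an arbitrary choice for this sketch.
    """
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        return 1.0
    return sum(a != b for a, b in shared) / len(shared)


def collapse_mlgs(genotypes, threshold):
    """Assign MLG labels by nearest-neighbor (single-linkage) clustering.

    Two samples share an MLG if they are linked by a chain of pairwise
    distances, each <= threshold. Returns one integer label per sample.
    """
    parent = list(range(len(genotypes)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(genotypes)):
        for j in range(i + 1, len(genotypes)):
            if pairwise_distance(genotypes[i], genotypes[j]) <= threshold:
                parent[find(i)] = find(j)

    # Relabel cluster roots as consecutive MLG ids in order of first appearance
    labels, out = {}, []
    for i in range(len(genotypes)):
        out.append(labels.setdefault(find(i), len(labels)))
    return out


if __name__ == "__main__":
    samples = [
        (0, 0, 0, 0),
        (0, 0, 0, 1),     # differs from the first sample at one of four loci
        (1, 1, 1, 1),
        (0, 0, None, 0),  # missing call at the third locus
    ]
    print(collapse_mlgs(samples, threshold=0.0))   # [0, 1, 2, 0]
    print(collapse_mlgs(samples, threshold=0.25))  # [0, 0, 1, 0]
```

With a threshold of zero, only the sample with a missing call is merged with its exact match. Raising the threshold absorbs the single-locus difference, which is the kind of tolerance to genotyping error that motivates distance-based MLG definitions for HTS data.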

### 1.4.3 Summary of Chapter 4

A newly emerged disease of oak, sudden oak death, spread from California to the southwest corner of Oregon in 2001. Because of intensive management, the epidemic was largely contained within Curry County for the next 15 years. In 2011, an isolated patch of disease appeared at Cape Sebastian, 12 miles from the nearest infected site. Using microsatellite genotypes generated in two labs over 15 years, we sought to describe the spread of the epidemic in a population genetic context and to determine whether there was evidence for more than one introduction event. This work provided evidence supporting at least two introductions into Curry County forests. All analyses were performed in an open-source and reproducible manner using R.

### 1.4.4 Summary of Chapter 5

The index of association is a measure of multilocus linkage disequilibrium; that is, a correlation coefficient across multiple loci. In sexual populations, loci assort randomly due to recombination, resulting in a value of the index of association near zero. In clonal populations, recombination is absent, so loci are passed from parent to offspring in a non-independent fashion, resulting in a value of the index of association significantly greater than zero. De Meeûs and Balloux (2004) demonstrated that this index has high variance at low levels of sexual reproduction but, owing to software limitations, they were unable to perform power analyses. We used poppr to investigate the power of the index of association to detect sexual reproduction in simulated data sets generated with microsatellite and genomic markers. This chapter provides novel insights into the power, sensitivity, and scope of the index of association.
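
Following Agapow and Burt (2001), for every pair of individuals let $d_j$ be 1 if the pair differs at locus $j$ and $D = \sum_j d_j$. With $V_D$ the variance of $D$ across all pairs and $\mathrm{var}_j$ the variance of $d_j$, the index of association is $I_A = \frac{V_D}{\sum_j \mathrm{var}_j} - 1$, and its standardized form is $\bar{r}_d = \frac{V_D - \sum_j \mathrm{var}_j}{2\sum_{j<k}\sqrt{\mathrm{var}_j \cdot \mathrm{var}_k}}$. The Python sketch below computes both for haploid data only. It is a minimal illustration under that assumption, not poppr’s implementation, and the function names are hypothetical.

```python
import math
import random
from itertools import combinations


def _pvar(values):
    """Population variance (divisor n); the divisor cancels in I_A and rbar_d."""
    mean = sum(values) / len(values)
    return sum((v - mean) * (v - mean) for v in values) / len(values)


def index_of_association(genotypes):
    """Return (I_A, rbar_d) for haploid multilocus genotypes.

    genotypes: list of equal-length tuples, one allele call per locus.
    Assumes at least two polymorphic loci (otherwise the denominators are zero).
    """
    n_loci = len(genotypes[0])
    pairs = list(combinations(genotypes, 2))
    # d[j][p] is 1 if the p-th pair of individuals differs at locus j, else 0
    d = [[int(a[j] != b[j]) for a, b in pairs] for j in range(n_loci)]
    # D[p] is the number of loci at which the p-th pair differs
    D = [sum(col) for col in zip(*d)]
    v_d = _pvar(D)                    # observed variance of D
    var_j = [_pvar(dj) for dj in d]   # per-locus variances
    v_e = sum(var_j)                  # expected variance of D with no linkage
    i_a = v_d / v_e - 1
    denom = 2 * sum(
        math.sqrt(var_j[j] * var_j[k]) for j, k in combinations(range(n_loci), 2)
    )
    rbar_d = (v_d - v_e) / denom
    return i_a, rbar_d


if __name__ == "__main__":
    rng = random.Random(42)
    # Clonal-like sample: two genotypes, each repeated ten times
    clonal = [(0, 0, 0)] * 10 + [(1, 1, 1)] * 10
    # Sexual-like sample: alleles drawn independently at each of five loci
    sexual = [tuple(rng.randint(0, 1) for _ in range(5)) for _ in range(200)]
    print(index_of_association(clonal))  # approximately (2.0, 1.0)
    print(index_of_association(sexual))  # both values close to zero
```

In the clonal-like sample every locus separates exactly the same pairs, so the loci are perfectly correlated ($\bar{r}_d = 1$) and $I_A$ equals the number of loci minus one. Because $I_A$ grows with the number of loci while $\bar{r}_d$ does not, $\bar{r}_d$ is the more convenient value for comparing data sets genotyped at different numbers of markers.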

### References

Dobzhansky, T. (1973). Nothing in biology makes sense except in the light of evolution. The American Biology Teacher, 35(3), 125–129.

Lynch, M., Ackerman, M. S., Gout, J.-F., Long, H., Sung, W., Thomas, W. K., & Foster, P. L. (2016). Genetic drift, selection and the evolution of the mutation rate. Nature Reviews Genetics, 17(11), 704–714. https://doi.org/10.1038/nrg.2016.104

Milgroom, M., & Fry, W. (1997). Contributions of population genetics to plant disease epidemiology and management. In Advances in botanical research (pp. 1–30). Elsevier BV. https://doi.org/10.1016/s0065-2296(08)60069-5

Hartl, D. L., & Clark, A. G. (2007). Principles of population genetics. Sunderland, MA, USA: Sinauer Associates.

Nielsen, R., & Slatkin, M. (2013). An introduction to population genetics: Theory and applications. Sinauer Associates, Incorporated. Retrieved from http://books.google.com/books?id=Iy08kgEACAAJ

Rice, W. R. (2002). Evolution of sex: Experimental tests of the adaptive significance of sexual recombination. Nature Reviews Genetics, 3(4), 241–251. https://doi.org/10.1038/nrg760

Heitman, J., Sun, S., & James, T. Y. (2012). Evolution of fungal sexual reproduction. Mycologia, 105(1), 1–27. https://doi.org/10.3852/12-253

Arnaud-Haond, S., Duarte, C. M., Alberto, F., & Serrão, E. A. (2007). Standardizing methods to address clonality in population studies. Molecular Ecology, 16(24), 5115–5139. https://doi.org/10.1111/j.1365-294X.2007.03535.x

Arnaud-Haond, S., Duarte, C. M., Diaz-Almela, E., Marbà, N., Sintes, T., & Serrão, E. A. (2012). Implications of Extreme Life Span in Clonal Organisms: Millenary Clones in Meadows of the Threatened Seagrass Posidonia oceanica. PLoS ONE, 7(2), e30454. https://doi.org/10.1371/journal.pone.0030454

Orive, M. E. (1993). Effective population size in organisms with complex life-histories. Theoretical Population Biology, 44(3), 316–340. https://doi.org/10.1006/tpbi.1993.1031

Nieuwenhuis, B. P. S., & James, T. Y. (2016). The frequency of sex in fungi. Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1706), 20150540. https://doi.org/10.1098/rstb.2015.0540

Loewe, L. (2006). Quantifying the genomic decay paradox due to Muller’s ratchet in human mitochondrial DNA. Genetical Research, 87(02), 133. https://doi.org/10.1017/s0016672306008123

Lynch, M., & Gabriel, W. (1990). Mutation load and the survival of small populations. Evolution, 44(7), 1725–1737.

Lynch, M., Bürger, R., Butcher, D., & Gabriel, W. (1993). The mutational meltdown in asexual populations. Journal of Heredity, 84(5), 339–344. Retrieved from http://jhered.oxfordjournals.org/content/84/5/339.abstract

Milgroom, M. G. (1996). Recombination and the multilocus structure of fungal populations. Annual Review of Phytopathology, 34(1), 457–477.

Anderson, J. B., & Kohn, L. M. (1995). Clonality in soilborne, plant-pathogenic fungi. Annual Review of Phytopathology, 33(1), 369–391. https://doi.org/10.1146/annurev.py.33.090195.002101

Tibayrenc, M. (1995). Population genetics of parasitic protozoa and other microorganisms. In Advances in parasitology volume 36 (pp. 47–115). Elsevier BV. https://doi.org/10.1016/s0065-308x(08)60490-x

Taylor, J. W., Geiser, D. M., Burt, A., & Koufopanou, V. (1999). The evolutionary biology and population genetics underlying fungal strain typing. Clinical Microbiology Reviews, 12(1), 126–146.

Tibayrenc, M., Kjellberg, F., & Ayala, F. J. (1990). A clonal theory of parasitic protozoa: the population structures of Entamoeba, Giardia, Leishmania, Naegleria, Plasmodium, Trichomonas, and Trypanosoma and their medical and taxonomical consequences. Proceedings of the National Academy of Sciences, 87(7), 2414–2418.

Smith, J. M., Smith, N. H., O’Rourke, M., & Spratt, B. G. (1993). How clonal are bacteria? Proceedings of the National Academy of Sciences, 90(10), 4384–4388. https://doi.org/10.1073/pnas.90.10.4384

Levin, B. R. (1999). Population biology, evolution, and infectious disease: Convergence and synthesis. Science, 283(5403), 806–809. https://doi.org/10.1126/science.283.5403.806

Halkett, F., Simon, J., & Balloux, F. (2005). Tackling the population genetics of clonal and partially clonal organisms. Trends in Ecology & Evolution, 20(4), 194–201. https://doi.org/10.1016/j.tree.2005.01.001

McDonald, B. A. (1997). The population genetics of fungi: tools and techniques. Phytopathology, 87(4), 448–453.

Tibayrenc, M. (1996). Towards a unified evolutionary genetics of microorganisms. Annual Review of Microbiology, 50(1), 401–429. https://doi.org/10.1146/annurev.micro.50.1.401

Grünwald, N. J., Goodwin, S. B., Milgroom, M. G., & Fry, W. E. (2003). Analysis of genotypic diversity data for populations of microorganisms. Phytopathology, 93(6), 738–746. https://doi.org/10.1094/phyto.2003.93.6.738

Parks, J. C., & Werth, C. R. (1993). A Study of Spatial Features of Clones in a Population of Bracken Fern, Pteridium aquilinum (Dennstaedtiaceae). American Journal of Botany, 80(5), 537. https://doi.org/10.2307/2445369

de Meeûs, T., & Balloux, F. (2004). Clonal reproduction and linkage disequilibrium in diploids: A simulation study. Infection, Genetics and Evolution, 4(4), 345–351. https://doi.org/10.1016/j.meegid.2004.05.002

Magurran, A. E. (1988). Why diversity? In Ecological diversity and its measurement (pp. 1–5). Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-015-7358-0_1

McDonald, B. A., & Linde, C. (2002a). Pathogen population genetics, evolutionary potential, and durable resistance. Annual Review of Phytopathology, 40(1), 349–379. https://doi.org/10.1146/annurev.phyto.40.120501.101443

Stukenbrock, E. H., & McDonald, B. A. (2008). The origins of plant pathogens in agro-ecosystems. Annual Review of Phytopathology, 46(1), 75–100. https://doi.org/10.1146/annurev.phyto.010708.154114

Erwin, D. C., Ribeiro, O. K., & others. (1996). Phytophthora diseases worldwide. St. Paul, Minnesota, USA: American Phytopathological Society (APS Press).

Piepenbring, M. (2015). Biologische Schemata, gezeichnet und freigegeben von M. Piepenbring. Retrieved from https://goo.gl/XO7TmS

Goodwin, S. B., Cohen, B. A., & Fry, W. E. (1994). Panglobal distribution of a single clonal lineage of the Irish potato famine fungus. Proceedings of the National Academy of Sciences, 91(24), 11591–11595.

Galindo, A., & Gallegly, M. (1960). The nature of sexuality in Phytophthora infestans. Phytopathology, 50, 123–128.

Sujkowski, L. S. (1994). Increased genotypic diversity via migration and possible occurrence of sexual reproduction of Phytophthora infestans in Poland. Phytopathology, 84(2), 201. https://doi.org/10.1094/phyto-84-201

Cooke, D. E. L., Cano, L. M., Raffaele, S., Bain, R. A., Cooke, L. R., Etherington, G. J., Deahl, K. L., Farrer, R. A., Gilroy, E. M., Goss, E. M., Grünwald, N. J., Hein, I., MacLean, D., McNicol, J. W., Randall, E., Oliva, R. F., Pel, M. A., Shaw, D. S., Squires, J. N., Taylor, M. C., Vleeshouwers, V. G. A. A., Birch, P. R. J., Lees, A. K., & Kamoun, S. (2012). Genome analyses of an aggressive and invasive lineage of the Irish potato famine pathogen. PLoS Pathogens, 8(10), e1002940. https://doi.org/10.1371/journal.ppat.1002940

Excoffier, L., & Heckel, G. (2006). Computer programs for population genetics data analysis: A survival guide. Nature Reviews Genetics, 7(10), 745–758. https://doi.org/10.1038/nrg1904

Kamvar, Z. N., Tabima, J. F., & Grünwald, N. J. (2014b). Poppr: An R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ, 2, e281. https://doi.org/10.7717/peerj.281

Adamack, A. T., & Gruber, B. (2014). PopGenReport: Simplifying basic population genetic analyses in R. Methods in Ecology and Evolution, 5(4), 384–387. https://doi.org/10.1111/2041-210x.12158

Goecks, J., Nekrutenko, A., Taylor, J., & The Galaxy Team. (2010). Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8), R86. https://doi.org/10.1186/gb-2010-11-8-r86

Buckheit, J. B., & Donoho, D. L. (1995). WaveLab and reproducible research. In Wavelets and statistics (pp. 55–81). Springer. https://doi.org/10.1007/978-1-4612-2544-7_5

R Core Team. (2016). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Agapow, P.-M., & Burt, A. (2001). Indices of multilocus linkage disequilibrium. Molecular Ecology Notes, 1(1-2), 101–102. https://doi.org/10.1046/j.1471-8278.2000.00014.x

Bruvo, R., Michiels, N. K., D’Souza, T. G., & Schulenburg, H. (2004). A simple method for the calculation of microsatellite genotype distances irrespective of ploidy level. Molecular Ecology, 13(7), 2101–2106.

1. This depends on one’s definition of a species, which is outside the scope of this dissertation and therefore best debated elsewhere.