Thesis Type: Undergraduate Senior Thesis
Bacteriophage genomes are known to have a "mosaic" structure: a given genome may contain regions highly similar to some phages, interspersed with regions highly similar to other phages and with regions that are unique. This paper describes an algorithm to identify individual mosaic segments and quantifies the extent of mosaicism among phages that infect a common host, the bacterium M. smegmatis. The results corroborate previous reports of extensive phage mosaicism: on average, about half of any given genome among the more than 800 in PhagesDB consists of short mosaic segments shared with at least one other phage. Because novel phages are still being discovered frequently, the true extent of mosaicism may be even higher. This paper also examines trends in the distribution of mosaic segments by cluster, discovery date, and geography, and explores methods for building tree-based phylogenies from the number of horizontal transfer events among phages.