(Photo Credit: Houston Zoo)
Few owls have had their genomes sequenced, but Rice University’s great horned owl is one of them.
Although Rice students affectionately call their mascot Sammy, the species is actually named Bubo virginianus and its extremely broad geographic range (from Alaska to South America) is surpassed not by other owls but only by the global footprint of Rice students themselves.
“I was very eager to work with our amazing Rice undergraduates on the sequencing and assembly of the great horned owl from the moment I arrived at Rice, four years ago,” said Todd Treangen, an assistant professor in the Computer Science Department.
While working at the University of Maryland, College Park, Treangen had the opportunity to get involved in a student project assembling the genome of their mascot, a terrapin. The students’ excitement struck him, and he found it both motivating and interesting to talk with the undergraduates about this project.
“As soon as I got to Rice, I was convinced assembling the great horned owl genome was a unique opportunity to bring a computationally challenging project to students at Rice. It was scientifically significant because the genome had not yet been assembled, and it felt like a fun project that would appeal to undergraduates.
“This project started several years ago when I called up Kevin Hodge, VP of Animal Programs at the Houston Zoo, and asked for some great horned owl blood. It understandably took some time to get all the approvals lined up even though the Houston Zoo was very excited about the prospect of a collaborative genome assembly project with my research group, the Treangen Lab, Rice’s Data to Knowledge (D2K) Lab, and Baylor College of Medicine (BCM). Kevin helped guide me through the process, and thanks to his efforts back in 2019 we now have a draft genome assembly.”
“We were very excited to collaborate with our neighbors at Rice University as they work to sequence the great horned owl genome, giving us better insight into the diversity of bird species,” said Hodge.
While the Houston Zoo was processing the research approvals and determining practical opportunities to collect a blood sample, Treangen was fundraising and building consensus with other potential collaborators. The D2K Lab had access to undergraduate students who could tackle deep research as a spring capstone project. BCM owned a critically important piece of equipment, a PacBio Sequel II that could produce long-read sequencing data.
“The reason we need the longest possible read has to do with repetitive content,” said Treangen. “Think of a very complex jigsaw puzzle, one that has 10,000 pieces, many of which look exactly the same. Now think of the same puzzle but with only 100 pieces, inherently making the puzzle much easier to correctly solve. This is analogous to the benefits of long-read sequencing for the assembly of the roughly 1.5 billion base-pair great horned owl genome.”
The puzzle explanation is currently a popular analogy to help non-scientific audiences better grasp the idea of the long strings of characters necessary to accurately depict genomes.
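For readers who like to see the puzzle in code, here is a minimal Python sketch of the same idea. It is an illustration only, not the pipeline the students used: the toy "genome," the repeated segment, and the function name `ambiguous_read_starts` are all invented for this example. It counts how many read positions produce a sequence that also appears somewhere else in the genome, which is the situation where an assembler cannot tell where a piece belongs.

```python
from collections import Counter


def ambiguous_read_starts(genome: str, read_len: int) -> int:
    """Count start positions whose read of length read_len also occurs elsewhere."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    # A read is ambiguous if the identical sequence starts at more than one position.
    return sum(1 for read in reads if counts[read] > 1)


# Toy "genome" (invented for illustration): one 10-base repeat placed twice
# between stretches of other sequence.
REPEAT = "GATTACAGGC"
TOY_GENOME = "ACGTACCTGA" + REPEAT + "TTCGGAACTC" + REPEAT + "CAGTCCGTAA"

if __name__ == "__main__":
    for read_len in (5, 8, 12, 20):
        n = ambiguous_read_starts(TOY_GENOME, read_len)
        print(f"read length {read_len:>2}: {n} ambiguous read positions")
```

Reads shorter than the repeat can fall entirely inside it, so they look identical in both copies. A read that covers the whole repeat plus some of the sequence on either side carries enough context to be placed. Real genomes and real assemblers are far more complex, but the count of ambiguous positions can only shrink as reads get longer.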
Fritz Sedlazeck is a genomics-trained BCM associate professor whose research covers structural variations, and an adjunct associate professor within the Rice Computer Science Department, so Treangen approached him with the idea of students assembling the great horned owl genome. In BCM’s Human Genome Sequencing Center (HGSC), Sedlazeck would extract strings of DNA from the blood sample, perform quality control, and set up the DNA for the PacBio sequencing.
Sedlazeck said, “Over the recent years, I have been active in helping people to assemble a genome of their interest, and quality control the outcomes. As such, I also contributed to the latest Telomere-to-Telomere (T2T) efforts to fully resolve the human genome. The horned owl genome struck me as compelling because there are interesting microchromosomes to study. In bird genomes, larger chromosomes and many smaller microchromosomes co-exist; mammal genomes do not have microchromosomes.
“Another factor of the project that appealed to me was indeed the prospect to take this material to students, to give them a better sense about computational biology and genomics and hopefully just have some fun with data exploration.”
Treangen knew Sedlazeck’s work in sequencing the Zoo sample’s DNA and performing quality control would complete only the second stage of the process. In the third stage, the raw data would have to be interpreted; at this stage, Treangen planned to get Rice students involved. He approached Genevera Allen, founder and faculty director of the D2K Lab, as well as Jennifer Sanders, D2K program administrator, with funding and a proposal for students to work on the data in a semester-long research project.
Allen approved the great horned owl genome project, and it was added to the slate of projects students could request during a matching process. The D2K capstone teams are composed of 4-6 students from a variety of backgrounds and years of study, including both graduate students and undergraduates. If a student team and the owl genome project were matched, the students would require a mentor well-versed in the challenges of genome sequencing. Treangen recruited Huw Ogilvie, an assistant research professor specializing in computational biology in the field of phylogenetics.
Ogilvie, whose research spans evolutionary biology and cancer genomics, brought his expertise to both the evolutionary and genomic aspects of the project. He and Treangen agreed to share responsibility for guiding the students in weekly meetings, assigning relevant readings, teaching them how to understand and use the tools required to assemble a genome, and mentoring them throughout the semester.
According to Ogilvie, “Watching and mentoring this group of Rice students coming together from different fields to assemble this genome is to see research and teaching at its finest. The resulting great horned owl genome assembly is phenomenal in its quality, and will stand out among all currently sequenced vertebrates for its completeness. The combination of Todd’s persistence and deep background in genomics, along with the passion for learning and data science on the part of the students, saw this project fly across the finish line once we had the sequence data without cutting any corners.”
Treangen added, “Partnering with Huw was such a key part of the vision I had for this project as he provides complementary and necessary expertise, crucial for both mentoring the students but also for elucidating biology within the newly assembled genome.”
Each stage of the process had committed collaborators, and Treangen’s vision started to unfold. The Zoo completed its due diligence and found an appropriate time to collect a sample (during a routine health exam). Treangen delivered the sample to Sedlazeck, and the BCM HGSC lab began the detailed process of extracting multiple, lengthy strands of DNA from the sample and preparing it for analysis by the PacBio sequencer.
“In the second week of January 2022, I got an email from BCM HGSC saying my data was ready. They shared a file containing billions of base pairs pertaining to the great horned owl’s genome. I felt like a kid in a candy store. Then, Huw and I presented our project idea to D2K students and were matched with a team. The timing was perfect,” said Treangen.
It was an exciting time to be working on sequencing. Two months after the D2K team began analyzing data to map a great horned owl genome, the 20-year-old human genome mapping project was completed. For two decades, about 8% of the human genome sequence had been filled with gaps. Distilling the repetitive content had proved too difficult for available technology, but scientists and engineers persevered; with the help of long-read sequencing, they announced a ‘gapless’ human genome on March 31, 2022.
In addition to the now gapless human genome, about 3,300 animals, typically those that most closely resemble humans, have been sequenced. Around 600 plant genome assemblies are available in public repositories, and just over 360 species of birds have been sequenced.
Sedlazeck said, “I hope that the data produced here at the BCM and the work done together with Rice will enable new insights into the genomic complexity of this owl. Nevertheless, I think it is more important that this year’s students obtain insights into genomics and computational biology and maybe we sparked interest for these fascinating fields.
“Another thing that this project demonstrated is that the maturity of the sequencing technology (PacBio) and algorithms have come a long way; a similar project just five years ago would have been a massive endeavor for multiple graduate students if not postdocs.”
Treangen was equally impressed with the outcomes achieved by their diverse capstone trio: Anthony Kang (Computational and Applied Math ’22), Bill Huang (MCS ’23), and Hannah Yin, a Ph.D. student in the Ecology and Evolutionary Biology graduate program.
“The students did a really great job,” said Treangen. “It was exciting to watch them learn to use all the tools and see them put together the genome. There is no auto-correct or spell-checker when you are assembling a new genome. It takes a lot of care and time. The students had to dig in, embracing both the computational and data science side of this project, and pay close attention to detail via plots and visualization of the genome assembly.
“After generating the draft assembly, the students spent significant time ‘testing’ the assembly with existing validation tools that could reveal stretches that were likely to be mis-assembled. We knew we wouldn’t fully close every single gap in a single semester, but the work the students have done is remarkable. They created an excellent genome assembly all within the Spring 2022 semester, and we can launch additional projects from this foundation.”
Participating in conference calls when his lab work permitted, Sedlazeck also felt the students’ growing delight with their progress. He said, “Science is serious enough; it is good to have fun with it every now and then.
“I think it is important to invest the time and resources in projects like these, so that students experience more hands-on learning about the current state of algorithms and the algorithmic limitations, to spark their interest and to have fresh minds thinking about an ‘old’ problem. Thus, teaching students and leading workshops is something that I truly enjoy and is, I think, an essential part of being a scientist. What good are our results if we cannot communicate or fascinate people with them?”
Treangen fully agreed with Sedlazeck’s assessment and said he always envisioned the great horned owl genome assembly as a student-driven project for outreach and engagement. The research can also contribute to the bird tree of life, in which scientists are attempting to sequence every bird species on the planet to help better understand their diversity.
“It has been extremely rewarding to launch this student research project alongside Huw, and in close partnership with the Houston Zoo, BCM HGSC, and Rice D2K,” said Treangen, whose fundraising paid for the sequencing at BCM. “I’ve been passionate about bringing collaborative research opportunities to students, many of whom might not even realize they enjoy research until a project captures their interest. Give them an opportunity, surround them with a great team, help them experience real-world problems and let them solve those problems.”
Written by Carlyn Chatfield on Rice CS news.