Zoogle Suggested We Try a Sea Squirt, So We Did
“All models are wrong, but some are useful” is a common saying in statistics. We think about this a lot when it comes to picking research organisms to model human disease biology. It’s true that no model is perfect, but how can we find ones that are less wrong?
Today, more than ever, we can use data to make model organism selection more systematic. We built our Zoogle application by comparing molecular properties across thousands of genes in 60 tractable species, quantitatively assessing their similarity to human biology. Zoogle was our first stab at enabling more experimentation around this question with the scientific community.
To that end, we’re excited to announce two research partnerships investigating predicted matches between diseases and research organisms that our partners can uniquely tackle. We landed on two species that aren’t intuitively at the top of most lists of human disease models: a tunicate (aka sea squirt!) and a single-celled choanoflagellate (found in pond scum!). We’re funding the groups of Alberto Stolfi (sea squirts) at Georgia Tech and David Booth (choanos) at UCSF to see how our predictions play out in the lab.
Stay tuned for updates from our partners soon. In the meantime, we’re sharing some of our high-level takeaways from our process with them. You can also read more about our underlying technical approach and experimental plans for tunicates and choanoflagellates.
Figuring out how to apply Zoogle
When we first released Zoogle, we hoped it would be useful to researchers: you’d plug in your favorite organism or gene and get a ranked list of potential starting points. But when we tested it ourselves, we found the number of options overwhelming. Unsurprisingly, moving from predictions to actual experimental directions required a lot more analysis, especially around the biology of different organisms.
To enable more in-depth creative exploration, we narrowed our options by filtering on organisms, not genes. The next big filter was finding the right scientists to do this with. We wanted experts who were both game to try our predictions and willing to share results quickly, in accordance with our open publishing model. Partnering with organism experts would also be a good test of Zoogle’s usefulness, identifying steps that might help other scientists apply it in their own work.
Lessons for moving from predictions to projects
In the process of exploring and deciding on concrete directions, we learned a lot. Here are the four major lessons that we think need consideration, especially as automation and AI become more central to biology:
Organismal advantages must be matched with translational gaps
Each organism has its own set of practical superpowers in the lab, giving it a special edge on certain roadblocks. You have to ask: does this model offer a clear tactical advantage over human cell lines or organoids (the instinctive first choice for those focused on human disease)? It’s important to remember that in vitro models, despite being human-derived, have drawbacks: missing tissue-level context, irrelevant artifacts from immortalization or growth conditions, and sheer cost.
This is where organismal models come into play. Choanoflagellates enable efficient high-throughput screens. Tunicates offer complex in vivo tissue models. It wasn’t enough to look for similarities — we had to find where these experimental edges solved the key bottlenecks in studying particular diseases. In some cases, the models complemented human or mouse tools to create a more holistic portfolio of tractable options.
This helped us quickly narrow our search. Instead of sifting through roughly 27,000 or 50,000 gene matches for choanos and sea squirts, respectively, we focused on where there was strategic alignment.
Implicit knowledge is key, but hard to find
A related point: the distinctive strengths and weaknesses of organisms aren’t always explicitly documented or compared. So the first issue we encountered was that our scientific corpus was missing information. Communities naturally tend to showcase what their organisms do well, not where they struggle or inherently fall short. Both aspects are necessary to identify where an organism has a true edge.
The second issue is that there was often a gap between what’s explicitly documented and what experts in the field know to be true, leaving incomplete or inaccurate information to sift through. There’s a lot of valuable “folklore” or tacit knowledge around what actually works (or what definitely doesn’t!): growth conditions, nuances around transformations, or even the specific vendor to buy a key media ingredient from. It was impossible to bridge this gap without talking to organismal experts.
Involving scientists more isn’t a bad thing; in fact, it’s a great thing. Those conversations were invaluable and fun. However, poor access to such information can be confusing and demotivating to the point of preventing ideation or collaboration. Some aspects are also unnecessarily time-intensive and tough to scale. This is to the great detriment of organismal communities and all the resources they’ve built up, and we’d love to think more about how to bring this information out.
Connecting basic science with disease is easier than ever before
Effectively evaluating gene–disease matches means taking into account more information from clinical and disease databases. But tools like ClinVar and OMIM are often totally foreign to organism-focused researchers. For many of the scientists we worked with, these resources were effectively black boxes, although they don’t have to be. We shared our approach to make this public data more accessible.
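For readers new to these databases, ClinVar can be queried programmatically through NCBI’s public E-utilities API. The sketch below is a minimal illustration (not our actual pipeline) that builds an esearch URL for records associated with a gene of interest; the gene symbol shown is just a placeholder:

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def clinvar_search_url(gene_symbol: str, retmax: int = 20) -> str:
    """Build an NCBI E-utilities esearch URL for ClinVar records.

    This only constructs the query URL; fetching and parsing the XML
    response (e.g., with urllib and xml.etree) is left to the caller.
    """
    params = urlencode({
        "db": "clinvar",                 # search the ClinVar database
        "term": f"{gene_symbol}[gene]",  # restrict hits to this gene symbol
        "retmax": retmax,                # cap the number of returned IDs
    })
    return f"{EUTILS_BASE}/esearch.fcgi?{params}"

# Example: a candidate disease gene surfaced by a Zoogle-style comparison
url = clinvar_search_url("PAX6")  # gene choice is illustrative
```

Fetching that URL returns a list of ClinVar record IDs, which can then be retrieved in bulk with the companion esummary endpoint.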
It’s exciting to note that more tools are also emerging to automate data extraction from these databases, making it easier for scientists to connect their work to human health. We encourage researchers to explore these resources to surface new questions and guide more impactful experiments from a translational perspective.
Human judgment and creativity are essential for leveraging predictions
Parts of our process, like pulling in disease association information, are well-suited for automation. But a lot of the critical work is still human. Figuring out what’s truly worth testing takes creative synthesis: weighing partial data, gut-checking feasibility, spotting unexpected connections, and shaping a hypothesis that’s actually interesting (and possible) to pursue. AI can help, but it still takes a scientist’s judgment to ask the right questions and follow the most promising leads.
Not to mention, we had so much fun connecting with more of the scientific community through this process. It gave us a chance to learn more about organisms, research efforts, and scientists at all stages. The joy of biological discovery and the beauty of the natural world are what keep us motivated in our work.
To celebrate this, we transformed some of our awe and learnings into short videos and open-source illustrations that we hope can be useful to organismal communities. We hope these visuals help fellow scientists tell the story of their work. Check out our first video highlighting the tunicate Ciona intestinalis below and stay tuned for a public repository where we’ll make our illustrations available to the community. If you’re eager to use these, make sure you’re subscribed to our Substack to get the latest updates (use the orange button at the end of this post!).
Growing Zoogle
Our work on research organism selection is just getting started. We’ve learned a ton from our first internal validations and external partnerships. But making this maximally useful and impactful, which has always been our long-term goal, will require participation from the whole community.
You can run the pipeline yourself and share your findings in an open pub or as a comment on ours; be sure to let us know about it. Maybe we’ll even feature your organism next in our spotlight series! If you want to join our team and help us expand this effort through further testing and automation, check out our open roles.
