Drausin Wulsin, Director of ML at Immunai, joins Daniel & Chris to talk about the role of AI in immunotherapy, and why immunotherapy is proving to be the foremost approach in fighting cancer, autoimmune disease, and infectious diseases.
The large amount of high-dimensional biological data that is available today, combined with advanced machine learning techniques, creates unique opportunities to push the boundaries of what is possible in biology.
To that end, Immunai has built AMICA, the largest immune database, which contains tens of millions of cells. The company uses cutting-edge transfer learning techniques to transfer knowledge across different cell types, studies, and even species.
Matched from the episode's transcript:
Drausin Wulsin: So in many ways, the single-cell world (and again, we're going deep on the single-cell world, because it's what I know the best, and it's what Immunai focuses on)… In the single-cell world, if you look at the trajectory of models and techniques, let's say over the last five years, the field literally didn't exist five years ago, or it barely existed five years ago. Five years ago people published papers on 200 cells; now we're publishing on 2 million. So just the scale of the data in the last five years has enabled certain new flavors of models that just didn't exist, or that we couldn't do before.
But early on, in the first two or three years, people trained auto-encoders, paralleling sort of the earlier work, both in vision and in text. So initially, you just train your auto-encoder where basically each cell is an observation, and maybe you select the 5,000 genes that are the most variable or the most active for a cell, or maybe you use all 20,000, depending on how much data you have… And you run that through a bottleneck where you're just trying to reconstruct the gene expression. Then you take that middle bottleneck layer and you do something with it; maybe you fine-tune it for a specific task. Often, people use it for data exploration in an unsupervised way, maybe visualization, or clustering, and things like this.
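To make that concrete, here is a minimal sketch of the kind of single-cell auto-encoder described above: each cell is one observation, the input is a vector of highly variable gene expression values, and the bottleneck layer becomes the embedding used later for clustering, visualization, or fine-tuning. The gene count, layer sizes, and toy data are illustrative assumptions, not Immunai's actual model.

```python
# Sketch of a single-cell auto-encoder: reconstruct gene expression through
# a bottleneck, then reuse the bottleneck embedding for downstream analysis.
import torch
import torch.nn as nn

N_GENES = 5000   # e.g. the 5,000 most variable genes per cell (assumption)
LATENT = 32      # bottleneck size (illustrative)

class CellAutoencoder(nn.Module):
    def __init__(self, n_genes=N_GENES, latent=LATENT):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_genes, 512), nn.ReLU(),
            nn.Linear(512, latent),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, 512), nn.ReLU(),
            nn.Linear(512, n_genes),
        )

    def forward(self, x):
        z = self.encoder(x)          # bottleneck embedding per cell
        return self.decoder(z), z    # reconstruction + embedding

# Toy training loop on random "expression" data standing in for a real
# cells-x-genes matrix (e.g. log-normalized counts).
model = CellAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
cells = torch.rand(256, N_GENES)

for epoch in range(5):
    recon, _ = model(cells)
    loss = nn.functional.mse_loss(recon, cells)  # reconstruct gene expression
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, model.encoder(cells) gives the latent representation used
# for unsupervised exploration (clustering, visualization) or fine-tuning.
```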
So this is sort of where it started. It's just now, in the last year, starting to happen where people are saying, "Huh, now we have enough data, and there's this fancy transformer thing that I've been hearing so much about… Maybe we can start building some of those." And in that, the task can vary a lot, but probably the one most analogous to the language world is just masking, right? So instead of masking words, as we often do in language, or parts of sentences, or the second sentence after the first sentence, you're masking individual genes, and you say, "Hey, model, I've masked 15%, or 25%, or 50% of the genes, and I'm gonna give you the other genes, and I want you to reconstruct the rest." So that's probably the simplest formulation, but there are a lot of alternatives that you can do.
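By analogy with masked language modeling, that masked-gene objective can be sketched roughly as follows: treat each gene as a token carrying an expression value, mask a fraction of the genes, and train a transformer encoder to reconstruct the masked values from the unmasked ones. The gene count, model dimensions, masking rate, and loss here are illustrative assumptions, not any specific published model.

```python
# Sketch of masked gene modeling with a transformer encoder: mask ~15% of
# genes per cell and predict their expression from the remaining genes.
import torch
import torch.nn as nn

N_GENES, D_MODEL, MASK_FRAC = 1024, 64, 0.15  # illustrative sizes

class MaskedGeneModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.gene_emb = nn.Embedding(N_GENES, D_MODEL)   # which gene (token id)
        self.expr_proj = nn.Linear(1, D_MODEL)           # its expression value
        self.mask_token = nn.Parameter(torch.zeros(D_MODEL))
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, 1)                # predict expression

    def forward(self, expr, mask):
        # expr: (batch, N_GENES) expression values; mask: (batch, N_GENES) bool
        gene_ids = torch.arange(N_GENES, device=expr.device)
        tok = self.gene_emb(gene_ids) + self.expr_proj(expr.unsqueeze(-1))
        tok = torch.where(mask.unsqueeze(-1), self.mask_token, tok)  # hide masked genes
        return self.head(self.encoder(tok)).squeeze(-1)

model = MaskedGeneModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
expr = torch.rand(8, N_GENES)                    # toy expression matrix
mask = torch.rand(8, N_GENES) < MASK_FRAC        # mask ~15% of genes per cell

pred = model(expr, mask)
loss = nn.functional.mse_loss(pred[mask], expr[mask])  # score only masked genes
loss.backward()
opt.step()
```

The same skeleton accommodates the alternatives mentioned above: change which positions are masked, how many, or what the head predicts, and the rest of the setup stays the same.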
[36:17] And the cool thing now is that, you know, two years ago, if you wanted to build some big transformer or big foundation model, you sort of had BERT as your template. But now even just the transformer world has completely blown up, and so we have BERTs, and GPT-3, and the Perceiver, and lots of options to sort of choose from and customize. At Immunai we're not on the leading edge of brand-new transformer architectures; we're benefiting from other people doing this, like OpenAI, and Facebook, and Google, and DeepMind, and we sort of get to say, "Okay, yeah, this is the one I think most benefits our application."