biotransfer: A repository for designing sub-nanomolar antibodies using a machine learning-driven approach

Please refer to our paper Machine Learning Optimization of Candidate Antibodies Yields Highly Diverse Sub-nanomolar Affinity Antibody Libraries for additional information on the method and design of scFv sequences.

The initial training data (Dataset 1) and designed antibody dataset (Dataset 2) can be found here. Additional information about the design of Dataset 1 and the experimental setup for quantitative binding measurements can be found in our Data Descriptor Paper.

Overview

Therapeutic antibodies are an important and rapidly growing drug modality. However, the design and discovery of early-stage antibody therapeutics remain a time- and cost-intensive endeavor. Machine learning has demonstrated potential in accelerating drug discovery. We implement a Bayesian, language model-based method for designing large and diverse libraries of target-specific, high-affinity scFvs.

The software includes four major components:

Large-scale antibody and protein language model training

Functional property prediction leveraging pretrained language models to construct a probabilistic antibody fitness function

ScFv optimization and design

Results analysis
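
To make the idea of a probabilistic fitness function concrete, here is a minimal sketch. It is not the repository's implementation. It assumes that a predictor returns a Gaussian mean and standard deviation for each sequence's binding score, that lower scores mean tighter binding, and that fitness is the probability of beating a chosen threshold. The function names (`prob_better_than`, `rank_candidates`) and the example values are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_better_than(mean: float, std: float, threshold: float) -> float:
    """P(score < threshold) for score ~ N(mean, std^2).

    Assumption: lower predicted score means tighter binding.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    return normal_cdf((threshold - mean) / std)


def rank_candidates(predictions, threshold):
    """Sort (name, mean, std) tuples by descending probability of beating threshold."""
    scored = [
        (name, prob_better_than(mean, std, threshold))
        for name, mean, std in predictions
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical predictions: (sequence id, predicted mean, predicted std).
    candidates = [("seq_a", 1.0, 0.5), ("seq_b", -1.0, 0.5), ("seq_c", 0.0, 2.0)]
    for name, fitness in rank_candidates(candidates, threshold=0.0):
        print(f"{name}: {fitness:.3f}")
```

Scoring by probability rather than by the predicted mean alone lets an uncertain prediction near the threshold rank between a confident good candidate and a confident poor one.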