In Silico Randomized Antibody Variant Generator
Scripts that locate the complementarity-determining regions (CDRs) in light- and heavy-chain sequences (given as strings) and generate variants by introducing k in-place mutations across the CDRs.
Overview
This package lets the user efficiently extract the complementarity-determining regions (CDRs) from the heavy and light chains of an antibody sequence, and design large sets of antibody variants in silico for a given seed sequence by introducing random mutations in the CDRs. Labeled antibody data are scarce in the public domain, which holds back progress in antibody engineering. To help address this, one can first use the generator to quickly produce a large set of antibody variant designs. These designs can then be used in downstream high-throughput methods (e.g., AlphaSeq [1]) to generate large-scale quantitative datasets, such as measurements of antibody-antigen binding interactions.
Several schemes exist for numbering the residues and locating the CDRs in an antibody sequence, and each has limitations because CDR lengths vary. The Martin scheme attempts to overcome shortcomings of several earlier approaches [2]; the generator implements this rule set to extract the CDRs from their approximate positions in each chain. In silico randomization then uses a set of permutations and combinations to introduce a user-defined number, k, of mutations in the CDRs.
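The randomization step can be illustrated with a minimal Python sketch. It is not the package's own code: the function names, the half-open 0-based span format, and the toy seed sequence are all assumptions made for illustration, and the CDR spans are supplied by hand rather than derived from the Martin rules. Each variant is built by picking k distinct CDR positions and substituting a different standard amino acid at each one.

```python
import random
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cdr_positions(cdr_spans):
    """Flatten half-open (start, end) spans into sorted, unique 0-based indices."""
    return sorted({i for start, end in cdr_spans for i in range(start, end)})


def count_variants(n_positions, k):
    """Distinct k-point variants: choose k positions, then 19 alternatives at each.

    Assumes every CDR residue in the seed is one of the 20 standard residues.
    """
    return comb(n_positions, k) * 19 ** k


def mutate_cdrs(seq, cdr_spans, k, rng):
    """Return a copy of seq with exactly k substitutions, all inside the CDRs."""
    positions = cdr_positions(cdr_spans)
    if not 0 <= k <= len(positions):
        raise ValueError("k must be between 0 and the number of CDR positions")
    residues = list(seq)
    for pos in rng.sample(positions, k):  # k distinct positions
        # Exclude the current residue so every pick is a real mutation.
        alternatives = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(alternatives)
    return "".join(residues)


def generate_variants(seq, cdr_spans, k, n_variants, seed=0):
    """Collect n_variants unique k-point variants of seq (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    limit = count_variants(len(cdr_positions(cdr_spans)), k)
    if n_variants > limit:
        raise ValueError(f"only {limit} distinct variants exist for this k")
    variants = set()
    while len(variants) < n_variants:
        variants.add(mutate_cdrs(seq, cdr_spans, k, rng))
    return sorted(variants)


if __name__ == "__main__":
    # Toy sequence and spans, made up for illustration (not a real antibody).
    seed_seq = "ACDEFGHIKLMNPQRSTVWY"
    spans = [(2, 5), (10, 13)]
    for variant in generate_variants(seed_seq, spans, k=2, n_variants=5):
        print(variant)
    print(count_variants(6, 2))  # 15 position pairs * 19**2 substitutions = 5415
```

Rejection sampling into a set keeps the sketch short and works well when the requested number of variants is small relative to the total. If nearly all possible variants are needed, enumerating position combinations with itertools.combinations would likely be the better choice.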
[1] D. Younger et al., "High-throughput characterization of protein-protein interactions by reprogramming yeast mating," Proceedings of the National Academy of Sciences, vol. 114, no. 46, pp. 12166-12171, 2017.
[2] A. Martin, "How to identify the CDRs by looking at a sequence," UCL, [Online]. Available: http://www.bioinf.org.uk/abs/info.html#martinnum. [Accessed 13 October 2021].