Antibodies protect against pathogens and are important diagnostic and therapeutic agents. An antibody repertoire is the collection of distinct B cell receptors and secreted antibodies, each represented by a sequence of amino acids. The theoretical diversity of such sequences amounts to 10^140 distinct sequences. B cell generation and kinetics result in an ever-changing, personalized antibody repertoire and a dynamic immune status.
High-throughput sequencing technologies have recently made it possible to record the sequence diversity of antibody repertoires. Antibody repertoires can now be represented as large-scale networks in which antibodies are sequence-nodes connected by similarity-edges; these networks have an exponential structure. Thus, network analysis can capture sequence relations in this complex system, track the potential proliferation of certain sequence features, and predict their disappearance.
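As an illustration of the network representation, the following minimal sketch builds a similarity network from a handful of sequences. It assumes that a similarity-edge joins two sequences whose edit (Levenshtein) distance is at most 1 a.a.; the distance metric, the function names and the toy sequences are illustrative assumptions, not the pipeline used in this work.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def build_similarity_network(sequences, max_distance=1):
    """Return {sequence: set of neighbours} for distinct sequences.

    Assumption: an edge joins two sequences within `max_distance` edits.
    The pairwise comparison is quadratic and only suitable for small inputs.
    """
    nodes = sorted(set(sequences))
    network = {s: set() for s in nodes}
    for s, t in combinations(nodes, 2):
        # The length difference is a cheap lower bound on the edit distance.
        if abs(len(s) - len(t)) > max_distance:
            continue
        if levenshtein(s, t) <= max_distance:
            network[s].add(t)
            network[t].add(s)
    return network


# Toy, made-up sequences (not real repertoire data).
toy = ["CARDYW", "CARDFW", "CARNYW", "CARDYYW", "CTTTTW"]
network = build_similarity_network(toy)
degrees = {s: len(neighbours) for s, neighbours in network.items()}
print(degrees)
```

The degree of a node is the number of antibodies that differ from it by one amino acid; the distribution of these degrees across the repertoire is what characterizes the structure of the network.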
We used high-throughput sequencing data combined with network analysis and machine learning to measure, track and predict changes in the sequence space of the antibody repertoire. Naïve antibody repertoire networks showed an exponential structure: the number of similar sequences per antibody followed an exponential distribution. Furthermore, antibody repertoires presented intrinsic redundancy across similarity layers: the number of antibodies differing from a given antibody in the network by 2 or 3 amino acids (a.a.) could be predicted from the number of antibodies differing from it by 1 a.a. We show how this network model can serve as a basis for tracking entire personalized antibody repertoires in the theoretical antibody sequence space, and thus for predicting immune status scenarios.
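The redundancy across similarity layers can be illustrated with a small sketch: for each sequence, count the other sequences at exactly 1, 2 and 3 a.a. differences, then relate the higher-layer counts to the 1 a.a. count. The ordinary least-squares line used here is a stand-in assumption, since the abstract does not specify the predictive model; the edit-distance metric, the function names and the toy sequences are likewise hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def layer_counts(sequences, max_layer=3):
    """Map each distinct sequence to [n_1, ..., n_max_layer], where n_d is the
    number of other sequences at exactly d a.a. differences from it."""
    nodes = sorted(set(sequences))
    counts = {s: [0] * max_layer for s in nodes}
    for i, s in enumerate(nodes):
        for t in nodes[i + 1:]:
            d = levenshtein(s, t)
            if d <= max_layer:  # d >= 1 because the nodes are distinct
                counts[s][d - 1] += 1
                counts[t][d - 1] += 1
    return counts


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy, made-up sequences (not real repertoire data).
toy = ["CARDYW", "CARDFW", "CARNYW", "CARDYYW", "CTTTTW"]
counts = layer_counts(toy)
n1 = [c[0] for c in counts.values()]  # neighbours at 1 a.a. difference
n2 = [c[1] for c in counts.values()]  # neighbours at 2 a.a. differences
slope, intercept = fit_line(n1, n2)
print(counts)
print(slope, intercept)
```

With five toy sequences the fitted line carries no biological meaning; the sketch only shows the shape of the computation, which would be applied to the per-sequence counts of a full repertoire.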