Quick Start: Select the HLA locus and enter the mature protein sequence (pre-aligned to the reference sequence) for serology prediction. The sequence currently in the text box, the recombinant allele A*24:24, is provided as an example.
This web tool is based on the random forest machine learning models built to predict serological specificity for the HLA Dictionary 2022 project. In that project, the models were used to predict the serological specificity of alleles that had not previously been characterized by serological methods. Here, we make the models available for use with any novel HLA sequence.
For each HLA locus, there is an independent random forest model for each serological specificity. When a sequence is entered into this tool, it is encoded and restricted to the positions that the random forest model determined to be "important" (based on Gini impurity). The random forest then outputs the probability that the sequence matches the serological label in question. This is repeated for every serological specificity at the selected locus.
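The per-specificity flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: it assumes one-hot encoding over the 20 standard amino acids plus a gap symbol "-", assumes "important" features are stored as 1-based alignment positions, and models each fitted tree as a plain function returning a probability. The names `encode`, `stump`, `SpecificityModel` and `predict_locus` are hypothetical, and the labels, positions, trees and sequence in the example are toy values, not real HLA data. The forest probability is taken as the mean of the per-tree probabilities, which is how scikit-learn's `RandomForestClassifier.predict_proba` combines its trees.

```python
from typing import Callable, Dict, List, Sequence

# Assumed alphabet: the 20 standard amino acids plus "-" for alignment gaps.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

# Hypothetical: a fitted tree is modelled as a function from features to P(label).
Tree = Callable[[List[int]], float]


def encode(aligned_seq: str, positions: Sequence[int]) -> List[int]:
    """One-hot encode only the residues at the given 1-based positions.

    Residues outside ALPHABET (e.g. "X") encode as an all-zero block.
    """
    features: List[int] = []
    for pos in positions:
        if not 1 <= pos <= len(aligned_seq):
            raise ValueError(f"position {pos} is outside the aligned sequence")
        residue = aligned_seq[pos - 1]
        features.extend(1 if residue == aa else 0 for aa in ALPHABET)
    return features


def stump(feature_index: int, p_if_set: float, p_if_unset: float) -> Tree:
    """Hypothetical one-split tree, used only to make the sketch runnable."""
    return lambda x: p_if_set if x[feature_index] else p_if_unset


class SpecificityModel:
    """One forest per serological label, with its own important positions."""

    def __init__(self, label: str, positions: Sequence[int], trees: List[Tree]):
        self.label = label
        self.positions = list(positions)
        self.trees = trees

    def predict_proba(self, aligned_seq: str) -> float:
        # Restrict the sequence to this label's positions before scoring.
        x = encode(aligned_seq, self.positions)
        # Forest probability = mean of the per-tree probabilities.
        return sum(tree(x) for tree in self.trees) / len(self.trees)


def predict_locus(aligned_seq: str, models: List[SpecificityModel]) -> Dict[str, float]:
    """Query every label's model independently for the selected locus."""
    return {m.label: m.predict_proba(aligned_seq) for m in models}


# Toy example: made-up labels, positions, trees and sequence.
toy_models = [
    SpecificityModel("label_1", [2, 4], [stump(1, 0.875, 0.1), stump(24, 0.625, 0.2)]),
    SpecificityModel("label_2", [1], [stump(0, 0.25, 0.75), stump(0, 0.5, 1.0)]),
]
print(predict_locus("ACDE", toy_models))  # {'label_1': 0.75, 'label_2': 0.375}
```

Keeping a separate position list per model mirrors the description that each specificity's forest uses its own set of important positions, so the same input sequence is encoded differently for each label.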
For the HLA Dictionary 2022, a probability threshold of 0.42 was used to make a positive call, chosen after assessing the accuracy of a range of thresholds. To give users a more complete picture, this tool reports the prediction probability for every serological label.
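Applying the 0.42 threshold to the per-label probabilities might look like the sketch below. It assumes that a probability equal to the threshold counts as positive (">="); the page does not say whether the comparison is inclusive. The function name `summarize` and the example probabilities are hypothetical and for illustration only.

```python
from typing import Dict, List, Tuple

# Threshold used for the HLA Dictionary 2022; treating ">=" as positive is an assumption.
HLA_DICT_2022_THRESHOLD = 0.42


def summarize(
    probabilities: Dict[str, float], threshold: float = HLA_DICT_2022_THRESHOLD
) -> List[Tuple[str, float, bool]]:
    """Return (label, probability, positive_call) rows, most likely label first."""
    rows = [(label, p, p >= threshold) for label, p in probabilities.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Made-up probabilities for illustration only.
for label, p, call in summarize({"label_1": 0.75, "label_2": 0.375, "label_3": 0.42}):
    print(f"{label}\t{p:.3f}\t{'positive' if call else '-'}")
```

Because each label is scored by an independent model, the probabilities need not sum to 1, so a sequence can have more than one label above the threshold, or none. Reporting every probability alongside the call lets users judge borderline cases such as `label_3` above.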
Manuscript in preparation
Prototype Tool for Research Use Only