The immune system is a defense system present in vertebrates that protects them from invading pathogens and cancer. Protection against foreign molecules (antigens) can be both specific and non-specific. Non-specific immunity includes, for example, the skin (as a physical barrier), mucous membranes and inflammation. Specific immunity allows the organism to recognize antigens specifically and eliminate them selectively. Four types of molecules in the immune system are responsible for this specific recognition: antibodies produced by B cells, T-cell receptors (TCRs), Major Histocompatibility Complex (MHC) class I molecules on all nucleated cells, and MHC class II molecules on antigen-presenting cells [38]. The MHC is a group of genes on a single chromosome that encodes the MHC antigens; it is a dense region of immune genes with high levels of polymorphism [34]. In the second line of defense, each cell continuously breaks down some of its old, obsolete proteins and displays the pieces on its surface. These small peptides are held by MHC molecules, which grip the peptides and allow the immune system to examine them. In this way, the immune system can monitor what is going on inside the cell [42]. The physiological function of MHC molecules is the presentation of peptide antigens to T lymphocytes. The MHC antigens and their genes can be divided into three major classes: class I, class II and class III. Class I antigens are expressed on all nucleated cells (except those of the central nervous system) and on platelets. Class II antigens are expressed on antigen-presenting cells such as B lymphocytes, dendritic cells, macrophages, monocytes, Langerhans cells, endothelial cells and thymic epithelial cells. Cytokines, especially interferon gamma (IFN-γ), increase the level of expression of class I and class II MHC molecules.
Class III antigens are associated with proteins in serum and other body fluids (e.g. C4, C2, factor B, TNF) and have no role in graft rejection. Of the three classes, it is the class II MHC molecules on antigen-presenting cells that present peptide fragments to helper T cells, which in turn stimulate an immune reaction from other cells [42]. The identification of MHC class II restricted peptide epitopes is therefore an important goal in immunological research. Different computational methods have been developed; however, each has its own strengths and weaknesses. To provide reliable predictions, it is important to design a system that integrates the outcomes of various predictors. Our aim is therefore to develop a novel algorithm and a prediction server that use several machine learning algorithms to predict MHC binding sites in an antigenic protein, and to evaluate their performance.
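
One simple way to integrate the outcomes of several predictors is a rank-based consensus, in the spirit of the consensus approach assessed in [37]. The sketch below is only an illustration of that idea, not the algorithm proposed here: the predictor names, peptide sequences and scores are invented, and it assumes that every predictor reports a score for which higher means stronger predicted binding.

```python
from statistics import median

def to_ranks(scores):
    """Map each peptide to its rank (1 = strongest predicted binder).

    Assumes a higher score means stronger predicted binding.
    Tied scores keep their input order, which is arbitrary.
    """
    ordered = sorted(scores, key=lambda pep: scores[pep], reverse=True)
    return {pep: i + 1 for i, pep in enumerate(ordered)}

def consensus_rank(predictions):
    """Combine predictors by taking the median rank of each peptide.

    predictions: dict of predictor name -> dict of peptide -> score.
    Only peptides scored by every predictor are kept.
    """
    rank_tables = [to_ranks(scores) for scores in predictions.values()]
    shared = set.intersection(*(set(table) for table in rank_tables))
    return {pep: median(table[pep] for table in rank_tables) for pep in shared}

# Invented scores on different scales; ranking makes them comparable.
predictions = {
    "predictor_a": {"PKYVKQNTLKLAT": 0.91, "GELIGILNAAKVPAD": 0.40, "AAAAAAAAAAAAA": 0.05},
    "predictor_b": {"PKYVKQNTLKLAT": 7.2, "GELIGILNAAKVPAD": 8.1, "AAAAAAAAAAAAA": 1.0},
    "predictor_c": {"PKYVKQNTLKLAT": 0.80, "GELIGILNAAKVPAD": 0.30, "AAAAAAAAAAAAA": 0.20},
}

consensus = consensus_rank(predictions)
for peptide, rank in sorted(consensus.items(), key=lambda item: item[1]):
    print(peptide, rank)
```

Working on ranks rather than raw scores avoids having to calibrate predictors whose outputs are on different scales, at the cost of discarding the size of the score differences.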

Hypothesis and Origin of the proposal

Major histocompatibility complex (MHC) proteins, known in humans as human leukocyte antigens (HLA), are glycoproteins that bind short peptides (epitopes) within the cell, derived from host and/or pathogen proteins, and present them at the cell surface for inspection by T cells. T-cell recognition is a fundamental mechanism of the adaptive immune system by which the host identifies and responds to foreign antigens [5]. T cells are key players in regulating a specific immune response. Activation of cytotoxic T cells requires recognition of specific peptides bound to MHC class I molecules. MHC-peptide complexes are potential tools for the diagnosis and treatment of pathogen infections and cancer, as well as for the development of peptide vaccines. Only one in 100 to 200 potential binders actually binds to a given MHC molecule; a good prediction method for MHC class I binding peptides can therefore reduce the number of candidate binders that need to be synthesized and tested [25]. MHC class II molecules are highly polymorphic proteins that bind peptides derived from the processing of antigens and present them to T cells [9]. They primarily present peptides derived from endocytosed extracellular proteins (the exogenous processing pathway). More than 3500 molecules are listed in the IMGT/HLA database [13]. MHC class I proteins are encoded by three loci: HLA-A, HLA-B and HLA-C. MHC class II proteins are likewise encoded by three loci: HLA-DR, HLA-DQ and HLA-DP. The peptide-binding site of class I proteins is a closed cleft formed by a single protein chain (the α-chain) [4]. Usually, only short peptides of 8 to 11 amino acids bind, in an extended conformation. In contrast, the cleft of class II proteins is open-ended, allowing much longer peptides to bind, although only 9 amino acids actually occupy the site. The class II cleft is formed by two separate protein chains, α and β [4].
Both clefts have binding pockets that correspond to primary and secondary anchor positions on the bound peptide. The combination of two or more anchors is called a motif. The experimental determination of motifs for every allele is prohibitively expensive in terms of labor, time and resources. The only practical alternative is a bioinformatics approach, and many bioinformatics methods exist to predict peptide-MHC binding. Experimentally determined affinity data have formed the basis of many peptide-MHC binding prediction methods that can effectively discriminate binding from non-binding peptides. Such methods include motifs, as well as sophisticated computer science algorithms and machine learning techniques [18] such as artificial neural networks [16], hidden Markov models (HMMs) [11] and support vector machines (SVMs) [14], and methods derived from computational chemistry, such as QSAR analysis [1] and structure-based approaches [19].
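
To make the notions of a 9-residue core and an anchor motif concrete, the following minimal sketch slides a 9-residue window along a longer peptide and keeps the windows whose anchor positions carry allowed residues. The motif shown is hypothetical and chosen only for illustration; it is not the published motif of any allele, and the function names are ours.

```python
# Hypothetical motif: core position (1-based) -> residues allowed there.
# These anchors are illustrative only and do not describe a real allele.
HYPOTHETICAL_MOTIF = {
    1: set("FWYILVM"),
    9: set("ILVM"),
}

def candidate_cores(peptide, core_len=9):
    """Return (start, core) for every window of core_len residues.

    A class II peptide can sit in the open cleft in several registers,
    so each window is a possible binding core. Peptides shorter than
    core_len yield an empty list.
    """
    return [(i, peptide[i:i + core_len])
            for i in range(len(peptide) - core_len + 1)]

def matches_motif(core, motif):
    """True if every anchor position of the core holds an allowed residue."""
    return all(core[pos - 1] in allowed for pos, allowed in motif.items())

def scan(peptide, motif, core_len=9):
    """Return the candidate cores of a peptide that satisfy the motif."""
    return [(start, core)
            for start, core in candidate_cores(peptide, core_len)
            if matches_motif(core, motif)]

print(scan("PKYVKQNTLKLAT", HYPOTHETICAL_MOTIF))
```

A pure motif match is a yes/no filter; the matrix and machine learning methods cited above refine it by weighting every position of the core rather than the anchors alone.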

Computational prediction of MHC class II epitopes is of considerable theoretical and practical value, because experimental identification is costly and time-consuming [17, 29, 32]. The basis of a successful computational prediction is a sufficiently large set of high-quality training data [37]. Several databases host MHC epitope-related data, such as SYFPEITHI [26], MHCBN [2], AntiJen [33], FIMM [28], MPID [10] and HLA Ligand [27]. The information in these databases is, for the most part, extracted from the literature. They typically combine data from different sources and different experimental approaches, which can complicate the generation of consistent training and evaluation datasets.

Major histocompatibility complex (MHC) molecules play an essential role in host-pathogen interactions and determine the outcome of many host immune responses. Only a small fraction of the peptides that can be generated by the proteases of the host cell actually elicit an immune response. MHC class II molecules present peptides derived from proteins taken up from the extracellular environment, and identifying the peptide epitopes for MHC class II molecules has long been a task of considerable interest [22]. A number of methods are publicly available for quantitative MHC class II prediction, namely ARB [3], BiodMHC [36], SVRMHC [35, 21], MHCpred [8] and NetMHCII [22, 24]. Other methods, such as SVMHC [7] and Propred [30], are implementations of the TEPITOPE method [31] and provide prediction scores that are not directly related to the peptide-binding affinity. Jorgensen et al. also proposed a method for the prediction of MHC ligands that combines MHC class II binding predictions with local structure prediction [15]. Despite recent progress in method development, the predictive performance for MHC-II remains significantly lower than what can be obtained for MHC-I. One reason is that the MHC-II molecule is open at both ends, allowing bound peptides to extend out of the groove [20]. To build an accurate prediction system, we therefore have to gather a large dataset of antigen proteins for which binding to MHC is already known. A system will be developed using a novel computational algorithm. In the last decade, numerous MHC/peptide complex structures have been solved. Structural information such as the binding surface (MHC/peptide), affinities and complex stability will be added to the system. To reduce false positive predictions, a similar system will be developed for the protease-antigen reaction inside the cell.


Objectives

1. To study the interaction between proteasomes and foreign pathogenic proteins
2. To study the site-specific structural conformation of the antigen before binding
3. To carry out computational simulation and performance analysis of MHC-peptide binding
4. To develop a prediction server for MHC-peptide binding for accurate prediction of antigenic sites

Review of Current Status of research and development in the subject

All the prediction tools discussed above work on the basis of the amino acid sequence properties of the peptide segment. To make prediction more accurate and useful, we will use sequence, structure and dynamics information on the MHC/peptide complex, which together are more informative than sequence alone.
To build an accurate prediction system, we have to gather a large dataset of antigen proteins for which binding to MHC is already known. A system will be developed using HMM profiles built on the basis of sequence similarities for each type. In the last decade, numerous MHC/peptide complex structures have been solved. Structural information such as the binding surface (MHC/peptide), affinities and complex stability will be added to the system. To reduce false positive predictions, a similar system will be developed for the protease-antigen reaction inside the cell.

Scope / International and National Status

Epitope prediction is an active topic of research in immunology because it supports vaccine development. Computational prediction of MHC class II epitopes is of considerable theoretical and practical value, because experimental identification is costly and time-consuming. In the last few decades, many new approaches have emerged from computational biology and bioinformatics. The basis of a successful computational prediction is a sufficiently large set of high-quality training data. Several databases hosting MHC epitope-related data have already been published on the web, such as SYFPEITHI [26], MHCBN [2], AntiJen [33], FIMM [28] and HLA Ligand [27].

A number of computational tools have been developed for this purpose, such as SVMHC [7], NetMHCII [22], NetMHCIIpan, Tepitope/Propred [30], SYFPEITHI [26], IEDB_ARB [43], IEDB_Comblib [43], IEDB_SMM-align [43], IEDB_Cons [43], Rankpep [39], HLA-DR4pred [41] and EpiToolKit [40]. These tools are based on average relative binding (ARB) matrices, pocket profiles, stabilized matrices, position-specific scoring matrices (PSSMs), quantitative structure-activity relationship (QSAR) regression and SVMs. However, large-scale systematic evaluation of their performance is lacking because of the limited availability of experimental data.
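
Several of the matrix-based approaches named above rest on position-specific scoring. As a minimal sketch of that idea (not the implementation of any of the listed tools), the code below builds a log-odds PSSM from a set of aligned binding cores and scores a peptide against it. The training cores are toy sequences, the uniform background frequency and the pseudocount of 1 are simplifying assumptions, and only the 20 standard residues are handled.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(binders, pseudocount=1.0):
    """Build a log-odds PSSM from aligned binders of equal length.

    Assumes a uniform background frequency (1/20) and adds a
    pseudocount so that unseen residues get a finite score.
    Non-standard residues raise KeyError.
    """
    length = len(binders[0])
    if any(len(b) != length for b in binders):
        raise ValueError("all binders must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for binder in binders:
            counts[binder[pos]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm

def score(peptide, pssm):
    """Sum the per-position log-odds scores of a peptide."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, peptide))

# Toy aligned cores, invented for illustration only.
toy_binders = ["YVKQNTLKL", "FVKQNTLKL", "YIRANSLRL"]
pssm = build_pssm(toy_binders)
print(score("YVKQNTLKL", pssm), score("GGGGGGGGG", pssm))
```

A peptide that resembles the training cores receives a positive total, while one built from residues never seen at any position receives a negative total; a real matrix would be trained on many experimentally validated binders.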

Utility / Importance of the proposed project in the context of current status

Vaccines continue to have an enormous and unprecedented positive impact on humanity and its wellbeing. Hundreds of millions of human lives have been saved since the first vaccine was discovered: Edward Jenner’s smallpox vaccine in 1796 [5]. Historically, vaccines have been attenuated whole-pathogen vaccines such as BCG for TB or Sabin’s polio vaccine. Safety issues have led to other strategies for vaccine development, focusing separately on antigen and epitope vaccines. The epitope is the minimal structure able to evoke an immune response; it is the immunological quantum that lies at the heart of immunity. Epitope-based vaccines have the advantage that many sequences able to induce autoimmunity or adverse reactions can be eliminated. Such vaccines are intrinsically safer: they contain no viable microorganisms and cannot induce microbial disease. MHC binding prediction illustrates the usefulness of computational tools for the everyday work of the immunologist [6]. However, several significant obstacles must be overcome before epitope-based vaccines can reach the market en masse. One such obstacle is MHC polymorphism [12]. An accurate MHC prediction tool will free vaccinologists and immunologists from the drudgery of uninformed experimentation, allowing them to design better, faster and smarter ways of discovering new reagents, diagnostics and vaccines.

Methods and Screening

Building an accurate MHC prediction tool from a theoretical model is a difficult task. Most of the models discussed above use only amino acid sequence information in their algorithms, which captures little of the real MHC/peptide interaction. The datasets available for study are neither adequately validated nor complete. Our proposal is to build a proper dataset containing sequence, structure (binding surface motif, interaction free energy, topology, stability etc.) and dynamics information on MHC/peptide complexes, classified by MHC class, and to design the algorithm on the basis of this dataset. To reduce false positive predictions, a similar system will be developed for the protease-antigen reaction inside the cell. We believe that this approach will yield a highly accurate MHC epitope prediction tool.
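
One possible shape for an entry in such a dataset is sketched below. Every field name and unit is our assumption for illustration, the allele label is a placeholder and the numeric values are not measured data; the point is only that sequence, structure and dynamics information sit in one record, that structural fields may be missing, and that basic validation can be automated.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

@dataclass(frozen=True)
class ComplexRecord:
    """One MHC/peptide complex; all field names are hypothetical."""
    mhc_class: str                                        # "I" or "II"
    allele: str                                           # allele label
    peptide: str                                          # sequence level
    affinity_nm: Optional[float] = None                   # measured affinity
    structure_id: Optional[str] = None                    # solved structure, if any
    interface_area_a2: Optional[float] = None             # binding surface
    interaction_energy_kcal_mol: Optional[float] = None   # interaction free energy
    peptide_rmsf_a: Optional[float] = None                # dynamics summary

    def problems(self) -> List[str]:
        """Return a list of validation problems (empty if none)."""
        issues = []
        if self.mhc_class not in ("I", "II"):
            issues.append("mhc_class must be 'I' or 'II'")
        if not self.peptide or set(self.peptide) - STANDARD_RESIDUES:
            issues.append("peptide must be a non-empty string of standard residues")
        if self.affinity_nm is not None and self.affinity_nm <= 0:
            issues.append("affinity_nm must be positive")
        return issues

    def is_complete(self) -> bool:
        """True if sequence, structure and dynamics fields are all filled."""
        return all(getattr(self, f.name) is not None for f in fields(self))

# Placeholder entries, not real measurements.
sequence_only = ComplexRecord("II", "ALLELE-X", "PKYVKQNTLKLAT")
full = ComplexRecord("II", "ALLELE-X", "PKYVKQNTLKLAT", affinity_nm=50.0,
                     structure_id="STRUCT-1", interface_area_a2=900.0,
                     interaction_energy_kcal_mol=-10.0, peptide_rmsf_a=1.2)
print(sequence_only.is_complete(), full.is_complete())
```

Keeping the structural and dynamics fields optional lets sequence-only entries from the literature coexist with fully characterized complexes, and `is_complete` makes it easy to select the subset usable for structure-aware training.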

Phase I
In this phase we shall study the interaction between the different enzymes present in antigen-presenting cells and pathogenic proteins, as well as the composition and properties of the antigenic peptides known so far and their interaction with MHC. We shall also study the tools and servers developed so far for MHC-peptide binding, and analyze the algorithms they use and their drawbacks. From this study, we expect to identify the specific patterns of interaction between antigenic peptides and the MHC groove using computational algorithms.

Phase II
Based on the study in Phase I, we shall design a novel algorithm that first recognizes antigenic fragments and then predicts the MHC binding site. Using this algorithm, we shall then develop a prediction server for MHC-peptide binding sites.
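
The two-stage structure described in this phase can be sketched as a small pipeline in which the fragment generator and the binding scorer are interchangeable functions. The sketch below is an assumption about how such a pipeline might be organized, not the algorithm to be developed: `sliding_fragments` and `hydrophobic_fraction` are deliberately crude, hypothetical stand-ins for a real antigen-processing model and a real binding predictor, and the protein sequence is invented.

```python
def predict_epitopes(protein, fragmenter, binding_score, threshold):
    """Stage 1: generate fragments; stage 2: keep those scoring >= threshold.

    fragmenter(protein) yields (start, fragment) pairs.
    binding_score(fragment) returns a number (higher = better).
    Returns (start, fragment, score) tuples, best score first.
    """
    hits = []
    for start, fragment in fragmenter(protein):
        value = binding_score(fragment)
        if value >= threshold:
            hits.append((start, fragment, value))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)

def sliding_fragments(protein, length=9):
    """Stand-in for stage 1: every window of the given length."""
    for i in range(len(protein) - length + 1):
        yield i, protein[i:i + length]

def hydrophobic_fraction(fragment):
    """Stand-in for stage 2: fraction of hydrophobic residues."""
    return sum(aa in "AFILMVWY" for aa in fragment) / len(fragment)

# Invented sequence, used only to exercise the pipeline.
hits = predict_epitopes("MKTLLVAAAGGDE", sliding_fragments,
                        hydrophobic_fraction, threshold=0.7)
print(hits)
```

Passing the two stages in as functions means a processing model and a binding model can be developed and evaluated separately and then combined without changing the pipeline itself.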

Phase III
In this phase we shall test our prediction server on different datasets and compare its performance with that of the existing tools and servers.
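
A common way to compare binary binder/non-binder predictions is the area under the ROC curve (AUC). The sketch below computes it directly from its pairwise definition; the choice of AUC as the metric is our assumption rather than a stated part of the plan, and the labels and scores are invented.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from its pairwise definition.

    labels: 1 for binder, 0 for non-binder.
    scores: predicted values, higher = more likely to bind.
    Returns the fraction of (binder, non-binder) pairs in which the
    binder scores higher, counting ties as one half.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented benchmark: the same peptides scored by two hypothetical tools.
labels = [1, 1, 0, 0, 1, 0]
tool_a = [0.9, 0.8, 0.3, 0.2, 0.4, 0.6]
tool_b = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

print("tool_a AUC:", roc_auc(labels, tool_a))
print("tool_b AUC:", roc_auc(labels, tool_b))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the metric allows tools whose scores are on different scales to be compared on the same benchmark. The pairwise loop is quadratic in the number of peptides and is meant for clarity rather than speed.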

1. I. A. Doytchinova, V. Walshe, P. Borrow, and D. R. Flower, "Towards the chemometric dissection of peptide - HLA-A*0201 binding affinity: comparison of local and global QSAR models," Journal of Computer-Aided Molecular Design, vol. 19, no. 3, pp. 203-212, 2005.
2. Bhasin M, Singh H, Raghava GP (2003) MHCBN: a comprehensive database of MHC binding and non-binding peptides. Bioinformatics 19: 665-666.
3. Bui HH, Sidney J, Peters B et al. Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics 2005; 57:304-14.
4. C. A. Janeway, P. Travers, M. Walport, and J. D. Capra, "The recognition of antigen," in Immunobiology: The Immune System in Health and Disease, pp. 79-194, 1999.
5. D. R. Flower, "Vaccines: how they work," in Bioinformatics for Vaccinology, pp. 73-112, Wiley-Blackwell, Oxford, UK, 2008.
6. Dimitrov I, Garnev P, Flower DR, Doytchinova I. MHC Class II Binding Prediction-A Little Help from a Friend. J Biomed Biotechnol. 2010;2010:705821.
7. Donnes P, Elofsson A. Prediction of MHC class I binding peptides, using SVMHC. BMC Bioinformatics 2002; 3:25.
8. Doytchinova IA, Flower DR. Towards the in silico identification of class II restricted T-cell epitopes: a partial least squares iterative self-consistent algorithm for affinity prediction. Bioinformatics 2003; 19:2263-70.
9. Germain, R.N. 1994. MHC-dependent antigen processing and peptide presentation: providing ligands for T-lymphocyte activation. Cell. 76:287.
10. Govindarajan KR, Kangueane P, Tan TW, Ranganathan S. MPID: MHC-Peptide Interaction Database for sequence-structure-function information on peptides binding to MHC molecules. Bioinformatics. 2003 Jan 22;19(2):309-10.
11. H. Noguchi, R. Kato, T. Hanai, et al., "Hidden Markov model-based prediction of antigenic peptides that interact with MHC class II molecules," Journal of Bioscience and Bioengineering, vol. 94, no. 3, pp. 264-270, 2002.
12. Ivan Dimitrov, Panayot Garnev, Darren R. Flower, and Irini Doytchinova. MHC Class II Binding Prediction - A Little Help from a Friend. Journal of Biomedicine and Biotechnology.
13. J. Robinson, M. J. Waller, P. Parham, et al., "IMGT/HLA and IMGT/MHC: sequence databases for the study of the major histocompatibility complex," Nucleic Acids Research, vol. 31, no. 1, pp. 311-314, 2003.
14. J. Wan, W. Liu, Q. Xu, Y. Ren, D. R. Flower, and T. Li, "SVRMHC prediction server for MHC-binding peptides," BMC Bioinformatics, vol. 7, article 463, 2006.
15. Jorgensen KW, Buus S, Nielsen M. Structural properties of MHC class II ligands, implications for the prediction of MHC class II epitopes. PLoS One. 2010 Dec 30;5(12):e15877.
16. K. Gulukota and C. DeLisi, "Neural network method for predicting peptides that bind major histocompatibility complex molecules," Methods in Molecular Biology, vol. 156, pp. 201-209, 2001.
17. Lafuente EM, Reche PA. Prediction of MHC-peptide binding: a systematic and comprehensive overview. Curr Pharm Des. 2009;15(28):3209-20.
18. Lata S, Bhasin M, Raghava GP. Application of machine learning techniques in predicting MHC binders. Methods Mol Biol. 2007;409:201-15.
19. M. N. Davies, C. K. Hattotuwagama, D. S. Moss, M. G. B. Drew, and D. R. Flower, "Statistical deconvolution of enthalpic energetic contributions to MHC-peptide binding affinity," BMC Structural Biology, vol. 6, article 5, 2006.
20. Morten Nielsen, Ole Lund, Soren Buus and Claus Lundegaard. MHC Class II epitope predictive algorithms. Immunology, 130, 319-328.
21. Murugan N, Dai Y. Prediction of MHC class II binding peptides based on an iterative learning model. Immunome research 2005; 1:6.
22. Nielsen M, Lund O, Buus S, Lundegaard C. MHC class II epitope predictive algorithms. Immunology. 2010 Jul;130(3):319-28.
23. Nielsen M, Lund O. NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction. BMC Bioinformatics 2009; 10:296.
24. Nielsen M, Lundegaard C, Lund O. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics 2007; 8:238.
25. Pierre Donnes and Arne Elofsson. Prediction of MHC class I binding peptides, using SVMHC. BMC Bioinformatics 2002, 3:25
26. Rammensee H, Bachmann J, Emmerich NP, Bachor OA, Stevanovic S (1999) SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 50: 213-219.
27. Sathiamurthy M, Hickman HD, Cavett JW, Zahoor A, Prilliman K, et al. (2003) Population of the HLA ligand database. Tissue Antigens 61: 12-19.
28. Schonbach C, Koh JL, Sheng X, Wong L, Brusic V (2000) FIMM, a database of functional molecular immunology. Nucleic Acids Res 28: 222-224.
29. Sidney J, Southwood S, Oseroff C, Del Guercio M, Grey H, et al. (1998) Measurement of MHC/peptide interactions by gel filtration. Current Protocols in Immunology. New York: John Wiley & Sons, Inc. pp 18.13.11-18.13.19.
30. Singh H, Raghava GP. ProPred: prediction of HLA-DR binding sites. Bioinformatics 2001; 17:1236-7.
31. Sturniolo T, Bono E, Ding J et al. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nat Biotechnol 1999; 17:555-61.
32. Sylvester-Hvid C, Kristensen N, Blicher T, Ferre H, Lauemoller SL, et al. (2002) Establishment of a quantitative ELISA capable of determining peptide - MHC class I interaction. Tissue Antigens 59: 251-258.
33. Toseland CP, Clayton DJ, McSparron H, Hemsley SL, Blythe MJ, et al. (2005) AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data. Immunome Res 1: 4.
34. Van Oosterhout C. A new theory of MHC evolution: beyond selection on the immune genes. Proc Biol Sci. 2009 Feb 22;276(1657):657-65.
35. Wan J, Liu W, Xu Q, Ren Y, Flower DR, Li T. SVRMHC prediction server for MHC-binding peptides. BMC Bioinformatics. 2006 Oct 23;7:463.
36. Wang L, Pan D, Hu X, Xiao J, Gao Y, Zhang H, Zhang Y, Liu J, Zhu S. BiodMHC: an online server for the prediction of MHC class II-peptide binding affinity. J Genet Genomics. 2009 May;36(5):289-96.
37. Wang P, Sidney J, Dow C, Mothe B, Sette A, Peters B. A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comput Biol 2008; 4:e1000048.

About Author / Additional Info:
Prof. Sachin N. Dharurkar,
Deogiri College, Dept. Bioinformatics,
Aurangabad. Maharashtra (India)