Gene-Disease Association

Document Type: Original Research Articles.


Faculty of Computers and Information, Computer Science Dept., Mansoura University, Egypt


Disease susceptibility prediction is defined as follows: given a training set S and a test case t ∉ S in the form of a
tuple (known SNP, unknown disease), predict the unknown disease with maximum accuracy. DisGeNET is a prominent dataset in
disease susceptibility research. This paper reviews the comprehensive information in DisGeNET before introducing a proposed
system that operates on top of it. First, the dataset is vetted by consolidation and by removing genes with effects beyond a
certain threshold. Second, the empirical cumulative distribution function is computed and used to plot and print gene
associations for many diseases, including but not limited to Alzheimer's disease, anemia, and brain and breast cancer. The
proposed methods, such as applying C4.5 and naïve Bayes, give better accuracy than previous works.
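
As a rough illustration of the vetting and empirical cumulative distribution function steps, here is a minimal Python sketch. It is not the authors' implementation. The row layout (gene, disease, score), the toy values, the function names, and the reading of "beyond a certain threshold" as "score above the threshold" are all assumptions made for the example.

```python
from bisect import bisect_right


def filter_by_threshold(associations, threshold):
    """Keep (gene, disease, score) rows whose score is at or below threshold.

    Assumption: "beyond a certain threshold" is read as "score > threshold".
    """
    return [row for row in associations if row[2] <= threshold]


def ecdf(scores):
    """Return F, where F(x) is the fraction of scores less than or equal to x."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one score")

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return F


# Hypothetical toy rows, not taken from DisGeNET
associations = [
    ("GENE_A", "Alzheimer", 0.10),
    ("GENE_B", "Alzheimer", 0.40),
    ("GENE_C", "Anemia", 0.40),
    ("GENE_D", "Anemia", 0.90),
]

kept = filter_by_threshold(associations, threshold=0.5)
F = ecdf([score for _, _, score in kept])

for gene, disease, score in kept:
    print(gene, disease, score, round(F(score), 3))
```

The step function F can be plotted per disease by evaluating it at the sorted scores; the plotting library is left open here because the source does not name one.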
