
Cloud Based Big Data Solution for Cancer Classification: Using Databricks on Large Scale Genomic Data

Publication Type : Conference Paper

Publisher : IEEE

Source : 2024 1st International Conference on Communications and Computer Science (InCCCS)

Url : https://doi.org/10.1109/incccs60947.2024.10593322

Campus : Bengaluru

School : School of Computing

Year : 2024

Abstract : This study explores the application of big data technologies and cloud computing to genomic cancer classification. Its primary objective is to advance medical diagnostics by efficiently processing extensive genomic datasets with Apache Spark and the Databricks platform, which address the computational challenges such data pose. Several machine learning algorithms, including decision trees, logistic regression, and random forests, are used to classify cancer types. Logistic regression is the top-performing model, achieving an accuracy and precision of 98.72%, but it also has a higher computational time of 182.65 seconds. Where time complexity is a critical factor, decision trees are a viable option, offering a favourable balance between accuracy and computational efficiency. The results emphasize the importance of selecting the appropriate model for the specific use case. The study also highlights the impact of cloud-based platforms on large-scale genomic analysis: integrating advanced computational techniques into medical research opens promising avenues for the future of healthcare.
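
The trade-off described in the abstract, between the most accurate model and one that trains within a time budget, can be sketched as a small selection rule. This is an illustrative sketch only and is not taken from the paper: `ModelResult` and `pick_model` are hypothetical names, and the only figures used are the logistic regression accuracy and time quoted in the abstract. Results for other models would have to be filled in from actual measurements.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ModelResult:
    """One measured result (hypothetical container, not from the paper)."""
    name: str
    accuracy: float       # fraction in [0, 1]
    train_seconds: float  # wall-clock computational time


def pick_model(results: Iterable[ModelResult],
               max_seconds: Optional[float] = None) -> ModelResult:
    """Return the most accurate model that fits the optional time budget."""
    candidates = [
        r for r in results
        if max_seconds is None or r.train_seconds <= max_seconds
    ]
    if not candidates:
        raise ValueError("no model fits the given time budget")
    # Highest accuracy wins; on a tie, prefer the faster model.
    return max(candidates, key=lambda r: (r.accuracy, -r.train_seconds))


# The only figures stated in the abstract: 98.72% accuracy, 182.65 seconds.
logistic_regression = ModelResult("logistic_regression", 0.9872, 182.65)
best = pick_model([logistic_regression])
print(best.name)
```

With no time budget the rule returns the most accurate model; passing `max_seconds` restricts the choice to models that trained within that limit, which is the situation in which the abstract suggests decision trees.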

Cite this Research Publication : A Amruth, R Ramanan, Rhea Paul, C Vimal, B.M Beena, Cloud Based Big Data Solution for Cancer Classification: Using Databricks on Large Scale Genomic Data, 2024 1st International Conference on Communications and Computer Science (InCCCS), IEEE, 2024, https://doi.org/10.1109/incccs60947.2024.10593322
