Journal of Data Mining and Management

A Comparative Evaluation of Machine Learning Architectures for Early Multi-Disease Prediction and Clinical Diagnostics

2026-06-16T06:57:24+00:00

Early disease detection is very important in modern healthcare because many serious illnesses are diagnosed only after they become severe. Diseases such as heart disease, diabetes, cancer, and kidney disease are responsible for millions of deaths every year across the world. Traditional medical diagnosis methods are often time-consuming, expensive, and sometimes unable to identify diseases at an early stage. Because of this, researchers are now focusing on machine learning techniques to help doctors make faster, more accurate decisions. This study examines the role of machine learning in predicting diseases using patient health data. Different machine learning models, including Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbor, Logistic Regression, Gradient Boosting, and Neural Network, were applied and compared. Publicly available healthcare datasets were used for model training and testing. Before model development, data preprocessing methods such as handling missing values, normalization, and feature selection were performed to improve prediction quality. The models were evaluated using accuracy, precision, recall, and F1-score measures. The experimental results showed that the Neural Network model achieved the highest prediction accuracy of 93.1%, while Gradient Boosting and Random Forest also produced strong results with accuracies of 90.7% and 89.4%, respectively. The study demonstrates that machine learning can become an effective support system for early disease diagnosis and may help healthcare professionals provide timely treatment and reduce mortality rates.
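
The four evaluation measures named above can be illustrated with a minimal sketch. It assumes a binary task (disease present or absent) and uses invented toy labels; the function name `binary_metrics` and the data are hypothetical and do not reproduce the study's experiments or figures.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1  # correctly flagged as diseased
        elif pred == positive:
            counts["fp"] += 1  # flagged, but actually healthy
        elif truth == positive:
            counts["fn"] += 1  # missed case
        else:
            counts["tn"] += 1  # correctly cleared
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels invented for illustration only (1 = disease present).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
```

In a screening setting, recall is often read alongside accuracy because a false negative corresponds to a missed diagnosis.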

Smart Expense Tracker: Data Mining Intelligence in Areas of Intelligent Personal Finance Management

2026-07-14T04:41:30+00:00

The growth of online transactions and the increasing complexity of personal financial management have increased the need for intelligent, automated tools that help people track, categorize, and simplify their expenditures. This paper describes the design and development of a smart expense tracker that relies on data mining techniques to provide insight into individual spending behaviour. The proposed system combines classification, clustering, and association rule mining to identify spending patterns, anticipate future spending, and give individual budgetary advice. Financial information obtained from users is pre-processed and then passed through several data mining steps: decision tree classification using the C4.5 algorithm, K-Means clustering, and association analysis using the Apriori algorithm. Results are presented in user-friendly dashboards that show at a glance how money is being spent. Unlike traditional expense management applications, which rely on manual input and bare summaries, the Smart Expense Tracker relies on machine learning and automated categorization to reduce human input and improve accuracy. User data is stored securely, and measures are taken to ensure that its handling complies with data protection laws. System trials on a collection of 4,800 real expense items showed high classification accuracy, averaging 91.4%, and the clusters formed by the system were meaningful and corresponded closely to common categories of expenditure, including food, transport, entertainment, and utilities. The paper contributes to research on the development of personal finance technology by offering a repeatable, empirical model that could be built on in subsequent academic and business ventures.
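
The association analysis step can be sketched as a small frequent-itemset miner in the style of the Apriori algorithm. This is an illustration under stated assumptions, not the paper's implementation: each transaction is assumed to be the set of expense categories a user touched in one period, and the function name `apriori`, the toy transactions, and the 0.6 support threshold are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for itemsets whose support (a fraction
    of all transactions) is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Toy spending periods invented for illustration only.
transactions = [
    {"food", "transport"},
    {"food", "utilities"},
    {"food", "transport", "entertainment"},
    {"transport", "entertainment"},
    {"food", "transport"},
]
frequent = apriori(transactions, min_support=0.6)
```

On this toy data, food and transport appear together in three of the five periods, so the pair clears the threshold while the rarer categories are pruned at the first level.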

SmartDTI: Deep Feature-based Prediction of Drug-Target Interactions

2026-06-24T12:16:37+00:00

Drug-target interaction (DTI) is critical in understanding how drugs interact with biological molecules, such as proteins and nucleic acids, to produce therapeutic effects. Accurate prediction of DTIs is a cornerstone in drug discovery and development, influencing both the effectiveness and safety of therapeutic agents. Branch Chain Mining-Drug Target Interaction (BCM-DTI) represents an innovative approach that leverages deep learning techniques to predict DTIs with enhanced accuracy and computational efficiency. By integrating structural, chemical, and sequence-based features through a branch chain mining architecture, BCM-DTI captures complex, non-linear relationships between drugs and targets that traditional methods may overlook. Notably, BCM-DTI exhibits superior performance compared to existing state-of-the-art methods on publicly available benchmark datasets, achieving these results with significantly reduced training time. This improvement in efficiency not only lowers computational resource demands but also holds the promise of expediting the drug discovery pipeline. Faster and more accurate DTI prediction could lead to earlier identification of viable therapeutic candidates, accelerating the development of novel treatments and potentially offering life-saving interventions to patients more quickly. The robust generalization ability of BCM-DTI also suggests its potential adaptability across diverse biomedical applications, including drug repurposing and personalized medicine.

AI-Based Real-Time Heart Stroke Prediction System with Chatbot Integration

2026-06-06T09:51:51+00:00

Stroke is a life-threatening medical emergency that requires early detection to prevent permanent disability or death. Traditional healthcare systems rely on periodic hospital visits, limiting real-time monitoring for high-risk individuals. This paper proposes an intelligent, real-time heart stroke prediction system integrated with an AI-driven health chatbot. The system continuously collects physiological data from wearable sensors—including heart rate, blood pressure, and SpO₂ levels—along with clinical and lifestyle inputs such as age, BMI, hypertension history, smoking status, and glucose level. A trained deep learning model classifies stroke risk into Low, Medium, or High categories. High-risk cases trigger emergency notifications and email alerts. The integrated AI chatbot interprets prediction results, provides medically relevant explanations in simple language, and offers personalized lifestyle recommendations. A historical health dashboard and downloadable PDF reports further support preventive care. An administrative module enables dataset management and algorithm comparison using metrics such as accuracy, precision, recall, F1-score, and support. The proposed system functions as a proactive, user-friendly, intelligent decision-support platform that empowers individuals and healthcare providers with real-time stroke risk awareness and timely intervention capabilities.
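
The step from model output to alert can be sketched as follows. The abstract does not say how the three tiers are derived, so this sketch assumes the model emits one score per class and that the highest score wins; the name `decide` and the example scores are hypothetical, and the email and notification delivery is left out.

```python
LABELS = ("Low", "Medium", "High")


def decide(class_scores):
    """Map per-class scores to a risk tier and an alert flag.

    Assumption: class_scores holds one score per tier, in the order
    given by LABELS, and the largest score determines the tier.
    """
    if len(class_scores) != len(LABELS):
        raise ValueError("expected one score per risk tier")
    index = max(range(len(LABELS)), key=lambda i: class_scores[i])
    tier = LABELS[index]
    # Only the High tier triggers an emergency notification.
    return tier, tier == "High"


# Example scores invented for illustration only.
tier, send_alert = decide([0.10, 0.20, 0.70])
```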

Automated Traffic Violation Detection Using Deep Learning and Computer Vision

2026-07-07T06:05:33+00:00

Road traffic violations such as triple riding on two-wheelers and failure to wear helmets are among the leading contributors to accident fatalities in India. Traditional enforcement methods relying on human observation are prone to error, inconsistency, and scale limitations. This paper presents a real-time automated traffic violation detection system that leverages deep learning and computer vision techniques to identify and report such violations from road imagery. The proposed solution employs a two-stage detection pipeline built on the YOLOv8s architecture. The first stage detects motorcycles, triple-riding instances, helmet status, and number plates simultaneously. The second stage performs targeted helmet verification on individual rider crops using a dedicated fine-tuned helmet model, reducing false positives for ambiguous cases. Experimental results demonstrate that the v2 system achieves precision of 82.7%, recall of 74.4%, and mAP@50 of 71.6% on the triple ride model, representing improvements of over 81 percentage points in precision compared to the heuristic-based v1 baseline. The helmet model achieves mAP@50 of 80.8% and precision of 81.2%. The complete pipeline operates at approximately 58 frames per second on a Tesla T4 GPU, making it suitable for real-time deployment. A Gradio-based web interface was developed to provide a user-friendly demonstration and inference platform.
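
The mAP@50 figures above rest on intersection over union (IoU): a predicted box counts as a correct detection only if it has the right class and overlaps a ground-truth box with IoU of at least 0.5. A minimal sketch of that overlap test, assuming boxes are given as `(x_min, y_min, x_max, y_max)`; the function names and example boxes are hypothetical and are not taken from the paper's code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(pred_box, truth_box):
    """True when the overlap meets the 0.5 threshold used by mAP@50."""
    return iou(pred_box, truth_box) >= 0.5


# Example boxes invented for illustration only.
truth = (0, 0, 10, 10)
close_pred = (2, 0, 12, 10)   # overlap 80, union 120
loose_pred = (5, 0, 15, 10)   # overlap 50, union 150
```

Here `close_pred` passes the threshold and `loose_pred` does not, although both overlap the ground-truth box.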