



Abstract:Smoking continues to be a major preventable cause of death worldwide, affecting millions through damage to the heart, metabolism, liver, and kidneys. However, current medical screening methods often miss the early warning signs of smoking-related health problems, leading to late-stage diagnoses when treatment options become limited. This study presents a systematic comparative evaluation of machine learning approaches for smoking-related health risk assessment, emphasizing clinical interpretability and practical deployment over algorithmic innovation. We analyzed health screening data from 55,691 individuals, examining various health indicators, including body measurements, blood tests, and demographic information. We tested three advanced prediction algorithms - Random Forest, XGBoost, and LightGBM - to determine which could most accurately identify people at high risk. This study employed a cross-sectional design to classify current smoking status based on health screening biomarkers, not to predict future disease development. Our Random Forest model performed best, achieving an Area Under the Curve (AUC) of 0.926, meaning it could reliably distinguish between high-risk and lower-risk individuals. Using SHAP (SHapley Additive exPlanations) analysis to understand what the model was detecting, we found that key health markers played crucial roles in prediction: blood pressure levels, triglyceride concentrations, liver enzyme readings, and kidney function indicators (serum creatinine) were the strongest signals of declining health in smokers.




Abstract:The advent of the Internet of Things (IoT) has brought forth additional intricacies and difficulties to computer networks. These gadgets are particularly susceptible to cyber-attacks because of their simplistic design. Therefore, it is crucial to recognise these devices inside a network for the purpose of network administration and to identify any harmful actions. Network traffic fingerprinting is a crucial technique for identifying devices and detecting anomalies. Currently, the predominant methods for this depend heavily on machine learning (ML). Nevertheless, machine learning (ML) methods need the selection of features, adjustment of hyperparameters, and retraining of models to attain optimal outcomes and provide resilience to concept drifts detected in a network. In this research, we suggest using locality-sensitive hashing (LSH) for network traffic fingerprinting as a solution to these difficulties. Our study focuses on examining several design options for the Nilsimsa LSH function. We then use this function to create unique fingerprints for network data, which may be used to identify devices. We also compared it with ML-based traffic fingerprinting and observed that our method increases the accuracy of state-of-the-art by 12% achieving around 94% accuracy in identifying devices in a network.