Analyzing PCOS Symptom Interrelations Using Association Rule Mining
Abstract
Social media is increasingly used as a data source for healthcare applications. Polycystic Ovarian Syndrome (PCOS) is a common condition affecting women of reproductive age, typically between 15 and 35 years. PCOS is characterized by a variety of symptoms, including hormonal disturbances, delayed periods, excess weight, numerous follicles in the ovaries, excessive hair growth, hair thinning, acne, skin pigmentation, and psychological concerns such as depression. While earlier research analyzed PCOS predominantly through clinical texts and medical records using machine learning, the present work extends this perspective by examining social media data to identify symptom prevalence and establish symptom patterns with the Apriori algorithm. The data, collected from Reddit, is unstructured; it is pre-processed and analyzed to extract PCOS-related symptoms through a Bag-of-Words approach. Subsequently, Apriori-based Association Rule Mining is used to discover frequent symptoms, derive meaningful rule sets, and establish unique symptom patterns. After experimenting with several support and confidence thresholds, a minimum support of 0.02 combined with a minimum confidence of 0.1 was found to produce the strongest rule sets and symptom associations. The distinctive aspect of this research is that symptom patterns are derived from the feature extraction results rather than from the raw data, which reduces dimensionality and improves scalability.
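To make the pipeline concrete, the following is a minimal Python sketch of the two stages described above: symptom extraction from post text, followed by Apriori-based rule mining over the extracted symptom sets. It is an illustration under stated assumptions, not the implementation used in this work. The symptom lexicon `SYMPTOM_TERMS`, the helper names (`extract_symptoms`, `apriori`, `association_rules`), and the four sample posts are all hypothetical, and the simple phrase matching only approximates a Bag-of-Words extraction. Only the thresholds 0.02 and 0.1 are taken from the abstract.

```python
from itertools import combinations

# Hypothetical symptom lexicon (phrase -> canonical symptom label).
# The lexicon actually used in the study is not given in the abstract.
SYMPTOM_TERMS = {
    "acne": "acne",
    "hair loss": "hair_thinning",
    "hair thinning": "hair_thinning",
    "facial hair": "excess_hair",
    "irregular periods": "delayed_periods",
    "missed period": "delayed_periods",
    "weight gain": "overweight",
    "depressed": "depression",
    "depression": "depression",
}


def extract_symptoms(post):
    """Map one post to the set of symptom labels whose phrases occur in it."""
    text = post.lower()
    return frozenset(label for term, label in SYMPTOM_TERMS.items() if term in text)


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        support = {c: sum(1 for t in transactions if c <= t) / n for c in candidates}
        level = {c: s for c, s in support.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), r):
                antecedent = frozenset(subset)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


# Toy posts invented for illustration only; they are not Reddit data.
posts = [
    "I have acne and irregular periods since my diagnosis",
    "Weight gain and acne are the worst part",
    "Irregular periods, weight gain, and I feel depressed",
    "Hair thinning plus acne and irregular periods",
]

transactions = [extract_symptoms(p) for p in posts]
frequent_itemsets = apriori(transactions, min_support=0.02)
rules = association_rules(frequent_itemsets, min_confidence=0.1)

for antecedent, consequent, support, confidence in sorted(rules, key=lambda r: -r[3]):
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={support:.2f} confidence={confidence:.2f}")
```

Mining over the extracted symptom sets rather than over raw tokens keeps each transaction small, which is the dimensionality reduction the abstract refers to. On a corpus of realistic size, a minimum support of 0.02 would filter out rare symptom combinations; on four toy posts it retains every combination that occurs at least once.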