\n\n 0<\/p>\n<\/td>\n | \n 075<\/p>\n<\/td>\n | 0 = Patients without Diabetes symptoms<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n <\/p>\n What is a Rare Event?<\/b><\/h3>\nAn event is said to be rare if it occurs far less often than the alternative outcome.<\/p>\n In both scenarios mentioned above \u2013 Telecom and Healthcare \u2013 the management was interested in predicting (modelling) CHURN customers and PATIENTS without Diabetes symptoms.\u00a0 These two events are called RARE EVENTs, since each occurs far less often than the other level of the TARGET VARIABLE (Y).<\/p>\n How will you statistically evaluate whether the Target Variable is imbalanced \/ skewed?<\/b><\/h3>\nPerform a Chi-Square Test using the command below (here it is evaluated using R, the open-source statistical software).<\/p>\n Chi-Square Test conducted using R-Software<\/b><\/p>\n Patient.Count<\/p>\n Diabetes\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 925<\/p>\n Without Diabetes\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 75<\/p>\n Chi-squared test for given probabilities<\/p>\n Null Hypothesis: Data is uniformly distributed<\/p>\n Alternative Hypothesis: Data is not uniformly distributed<\/p>\n data:\u00a0 Clinical.Test[, 1]<\/p>\n X-squared = 722.5, df = 1, p-value < 0.00000000000000022<\/p>\n \u00a0<\/b><\/p>\n Chi-Square Test conducted using Minitab<\/b><\/p>\n Chi-Square Goodness-of-Fit Test for Observed Counts in Variable: Count<\/b><\/p>\n <\/p>\n Using category names in Disease<\/h3>\n\n\n\nCategory<\/th>\n | Observed<\/th>\n | Test Proportion<\/th>\n | Expected<\/th>\n | Contribution to Chi-Sq<\/th>\n<\/tr>\n | \nY<\/td>\n | 925<\/td>\n | 0.5<\/td>\n | 500<\/td>\n | 361.25<\/td>\n<\/tr>\n | \nN<\/td>\n | 75<\/td>\n | 0.5<\/td>\n | 500<\/td>\n | 361.25<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n N\u00a0 DF\u00a0 Chi-Sq\u00a0 P-Value<\/p>\n 1000\u00a0\u00a0 1\u00a0\u00a0 722.5\u00a0\u00a0\u00a0 
0.000<\/p>\n <\/p>\n Since the \u2018p-value\u2019 is below 0.05 (a commonly chosen alpha level), we can reject the null hypothesis and conclude that the data is not uniformly distributed, i.e. the target variable is imbalanced.<\/p>\n How to overcome this problem?<\/b><\/p>\n This problem can be overcome by two main methods:<\/p>\n | |