The application of the weight of evidence and information value approach in predictive modeling for binning continuous variables
Keywords:
weight of evidence, information value, uplift modeling, logistic regression
Abstract
AIM OF THE PAPER
The aim of this study is to discuss the concepts of weight of evidence and information value, to give an overview of their theoretical background and possible practical applications, and to promote the use of R, a free software environment for statistical computing and graphics.
METHODOLOGY
The study deals with the use of weight of evidence (WOE) and information value (IV) for binning continuous predictors in logistic regression. The overview of the theoretical background and the practical applications is based on a literature study. The effect of the binning is examined with a logistic regression model that predicts stock exchange trends.
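To make the two quantities concrete, the following is a minimal Python sketch (the study itself works in R) of how WOE and IV could be computed for one binned predictor. It assumes the common convention WOE_i = ln(share of non-events in bin i / share of events in bin i) and IV = sum over bins of (share of non-events minus share of events) times WOE_i. Some authors use the opposite sign. The function name `woe_iv`, the cut-point convention, and the toy data are hypothetical and are not taken from the study.

```python
import math


def woe_iv(values, targets, edges):
    """Return (per-bin WOE list, total IV) for one continuous predictor.

    values  : continuous predictor values
    targets : 0/1 outcomes (1 = event, 0 = non-event)
    edges   : sorted interior cut points; a value x falls into the bin
              whose index is the number of cut points below x
              (a hypothetical convention chosen for this sketch)
    """
    n_bins = len(edges) + 1
    events = [0] * n_bins
    non_events = [0] * n_bins
    for x, y in zip(values, targets):
        b = sum(1 for e in edges if x > e)  # bin index of x
        if y == 1:
            events[b] += 1
        else:
            non_events[b] += 1

    total_events = sum(events)
    total_non_events = sum(non_events)
    woe = []
    iv = 0.0
    for ev, ne in zip(events, non_events):
        if ev == 0 or ne == 0:
            # WOE is undefined for a bin with an empty cell; this sketch
            # does not apply any smoothing and simply refuses such bins.
            raise ValueError("each bin needs at least one event and one non-event")
        share_events = ev / total_events
        share_non_events = ne / total_non_events
        w = math.log(share_non_events / share_events)
        woe.append(w)
        iv += (share_non_events - share_events) * w
    return woe, iv


# Toy data: two bins split at 4.5
values = [1, 2, 3, 4, 5, 6, 7, 8]
targets = [0, 0, 0, 1, 0, 1, 1, 1]
woe, iv = woe_iv(values, targets, edges=[4.5])
print(woe, iv)
```

In this toy example the lower bin holds three non-events and one event and the upper bin the reverse, so the two WOE values are ln 3 and -ln 3 and the IV equals ln 3. Comparing the IV obtained under different candidate cut points is one way to compare alternative binnings.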
RESULTS
The result of the WOE binning is relatively stable, but the information value of the binned predictors, and the ranking of the predictors based on their information value, differ by sample size. The values of the information criteria are influenced by the use of the binned predictors, but the optimal WOE binning does not necessarily give the best AIC or BIC result.
IMPLICATIONS
The WOE and IV method can also be used in exploratory data analysis. It is worth calculating WOE and IV even when a binning based on business logic is preferred but it is unclear which categorization is better. WOE can also be used to categorize variables intended for segmentation. When evaluating logistic regression models, several factors have to be considered, because the effect of changes in the predictors is not always clear. The relationships between the information criteria would be worth examining further.