| Type: | Package |
| Title: | Collapses Levels, Computes Information Value and WoE |
| Version: | 0.3.0 |
| Author: | Krishanu Mukherjee |
| Maintainer: | Krishanu Mukherjee <toton1181@gmail.com> |
| Description: | Contains functions to help in selecting and exploring features (or variables) in binary classification problems. Provides functions to compute and display information value and weight of evidence (WoE) of the variables, and to convert numeric variables to categorical variables by binning. Functions are also provided to determine which levels (or categories) of a categorical variable can be collapsed (or combined) based on their response rates. The functions provided only work for binary classification problems. |
| License: | GPL-2 |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.1.0 |
| Imports: | dplyr, lazyeval, ggplot2 |
| Depends: | magrittr |
| Suggests: | knitr, rmarkdown |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2020-06-04 13:09:20 UTC; User |
| Repository: | CRAN |
| Date/Publication: | 2020-06-04 13:20:02 UTC |
German Credit data set
Description
This data set classifies customers as "Good" or "Bad" according to their credit risk. It was contributed by Professor Dr. Hans Hofmann and can be downloaded from the UCI Machine Learning Repository.
Usage
data("German_Credit")
Format
A data frame with 1000 observations on the following 21 variables.
Account_Balance |
a factor with levels A11 A12 A13 A14 |
Duration |
a numeric vector |
Credit_History |
a factor with levels A30 A31 A32 A33 A34 |
Purpose |
a factor with levels A40 A41 A410 A42 A43 A44 A45 A46 A48 A49 |
Credit_Amount |
a numeric vector |
Saving_Accounts_Bonds |
a factor with levels A61 A62 A63 A64 A65 |
Current_Employment_Length |
a factor with levels A71 A72 A73 A74 A75 |
Installment_Rate |
a numeric vector |
MaritalStatusnGender |
a factor with levels A91 A92 A93 A94 |
Guarantors |
a factor with levels A101 A102 A103 |
‘Duration in Current Address’ |
a numeric vector |
Valuable_Asset |
a factor with levels A121 A122 A123 A124 |
Age |
a numeric vector |
Other_Credit |
a factor with levels A141 A142 A143 |
Housing |
a factor with levels A151 A152 A153 |
Existing_Credits |
a numeric vector |
Job |
a factor with levels A171 A172 A173 A174 |
Dependents |
a numeric vector |
Telephone |
a factor with levels A191 A192 |
ForeignWorker |
a factor with levels A201 A202 |
Good_Bad |
a numeric vector |
Source
https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
Examples
data(German_Credit)
str(German_Credit)
IVCalc
Description
This function displays the Information Values by the levels of an attribute. This information is displayed for all attributes in the data set.
Usage
IVCalc(dset, resp = "y", bins = 10, adjFactor = 0.5)
Arguments
dset |
The data frame containing the data set |
resp |
A character string giving the name of the binary outcome variable. The outcome may be a factor with two levels, or an integer (or numeric) vector with two unique values |
bins |
The number of bins. The default value is 10 |
adjFactor |
A number added to the count of responses (binary outcome variable is 1) or to the count of non-responses (binary outcome variable is 0) when either count is zero for any level of the attribute |
Value
A list containing the tables of Information Values by levels for every attribute
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
l <- list()
# Call the function as follows
l <- IVCalc(German_Credit, resp = "Good_Bad", bins = 10)
# Information Value for the attribute Account_Balance in the German_Credit data
l$Account_Balance
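The manual does not spell out the formulas behind these tables, so the following is only a minimal base-R sketch of the usual definitions, not the package's own code. It assumes WoE for a level is the log of (share of responses in that level) over (share of non-responses in that level), that the IV contribution is the difference of those shares times the WoE, and that a zero count is replaced by adjFactor as the argument description suggests. The helper iv_by_level and the toy vectors are hypothetical; the package may use the opposite sign convention for WoE, which does not change IV.

```r
# Hypothetical helper (not part of the package): per-level WoE and IV for one
# categorical attribute `x` against a 0/1 outcome `y`.
iv_by_level <- function(x, y, adjFactor = 0.5) {
  tab <- table(x, y)                      # rows: levels; columns: "0" and "1"
  non_resp <- as.numeric(tab[, "0"])
  resp     <- as.numeric(tab[, "1"])
  # Assumed adjustment: a zero count becomes adjFactor so the log stays finite
  non_resp[non_resp == 0] <- adjFactor
  resp[resp == 0]         <- adjFactor
  pct_resp     <- resp / sum(resp)          # share of all responses in each level
  pct_non_resp <- non_resp / sum(non_resp)  # share of all non-responses in each level
  woe <- log(pct_resp / pct_non_resp)
  iv  <- (pct_resp - pct_non_resp) * woe
  data.frame(level = rownames(tab), resp, non_resp, woe, iv)
}

# Toy data: level "c" has no responses, so the adjustment is applied to it
x <- c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c")
y <- c(1, 0, 0, 0, 1, 1, 0, 0, 0, 0)
iv_table <- iv_by_level(x, y)
iv_table
sum(iv_table$iv)   # total IV of the attribute under these assumptions
```

Each per-level IV term is non-negative by construction, because the difference of shares and the log of their ratio always have the same sign.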
IVCalc2
Description
This function displays the Information Values of all the attributes in the data set
Usage
IVCalc2(dset, resp = "y", bins = 10, adjFactor = 0.5)
Arguments
dset |
The data frame containing the data set |
resp |
A character string giving the name of the binary outcome variable. The outcome may be a factor with two levels, or an integer (or numeric) vector with two unique values |
bins |
The number of bins. The default value is 10 |
adjFactor |
A number added to the count of responses (binary outcome variable is 1) or to the count of non-responses (binary outcome variable is 0) when either count is zero for any level of the attribute |
Value
A data frame containing the Information Values for every attribute
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
d <- data.frame()
# Call the function as follows
d <- IVCalc2(German_Credit, resp = "Good_Bad", bins = 10)
# Information Value for all the attributes in the German_Credit data
d
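To illustrate what a one-row-per-attribute summary of this kind could look like, here is a small base-R sketch that computes a total IV for each column of a toy data frame and sorts the result. It is an illustration under assumptions, not the package's implementation: total_iv, toy and iv_summary are hypothetical names, the IV definition is the common sum over levels of (share of responses minus share of non-responses) times the log of their ratio, and only categorical columns are handled.

```r
# Hypothetical helper: total IV of one categorical attribute against a 0/1 outcome
total_iv <- function(x, y, adjFactor = 0.5) {
  counts <- table(x, factor(y, levels = c(0, 1)))  # columns "0" and "1"
  counts[counts == 0] <- adjFactor                 # assumed zero-count adjustment
  p <- prop.table(counts, margin = 2)              # column-wise shares per level
  sum((p[, "1"] - p[, "0"]) * log(p[, "1"] / p[, "0"]))
}

# Toy data: `segment` separates the outcome, `region` carries no information
toy <- data.frame(
  segment = c("a", "a", "a", "a", "b", "b", "b", "b"),
  region  = c("n", "s", "n", "s", "n", "s", "n", "s"),
  y       = c(1, 1, 1, 0, 0, 0, 0, 1)
)

attrs <- setdiff(names(toy), "y")
iv_summary <- data.frame(
  attribute = attrs,
  IV = vapply(attrs, function(a) total_iv(toy[[a]], toy$y), numeric(1),
              USE.NAMES = FALSE)
)
iv_summary[order(-iv_summary$IV), ]   # strongest attribute first
```

Sorting by IV is a common way to shortlist candidate predictors; an attribute whose levels all have the same mix of responses and non-responses gets an IV of zero.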
displayIV
Description
This function displays the Information Values of the levels of an attribute.
Usage
displayIV(dset, col = "xyz", resp = "y", adjFactor = 0.5, bins = 10)
Arguments
dset |
The data frame containing the data set |
col |
A character string giving the name of the attribute. The attribute can be either numeric or categorical |
resp |
A character string giving the name of the binary outcome variable. The outcome may be a factor with two levels, or an integer (or numeric) vector with two unique values |
adjFactor |
A number added to the count of responses (binary outcome variable is 1) or to the count of non-responses (binary outcome variable is 0) when either count is zero for any level of the attribute |
bins |
The number of bins. The default value is 10 |
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
displayIV(German_Credit, col = "Credit_History", resp = "Good_Bad")
displayResponseRatebyLevels
Description
This function displays the response percents of the levels of an attribute.
Usage
displayResponseRatebyLevels(
dset,
col = "job",
resp = "Good_Bad",
bins = 10,
adjFactor = 0.5
)
Arguments
dset |
The data frame containing the data set |
col |
A character string giving the name of the attribute. The attribute can be either numeric or categorical |
resp |
A character string giving the name of the binary outcome variable. The outcome may be a factor with two levels, or an integer (or numeric) vector with two unique values |
bins |
The number of bins. The default value is 10 |
adjFactor |
A number added to the count of responses (binary outcome variable is 1) or to the count of non-responses (binary outcome variable is 0) when either count is zero for any level of the attribute |
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
displayResponseRatebyLevels(German_Credit, col = "Credit_History", resp = "Good_Bad")
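For readers who want to see the underlying quantity, the sketch below computes a response percentage per level in base R on a toy data frame. It assumes the response percent of a level is simply the share of rows in that level whose outcome is 1; the names toy, grade and resp_rate are hypothetical and the layout of the package's own display may differ.

```r
# Toy data: one categorical attribute and a 0/1 outcome
toy <- data.frame(
  grade = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
  y     = c(0, 0, 1, 0, 1, 1, 1, 1, 0, 1)
)

rate <- tapply(toy$y, toy$grade, mean)     # share of responses per level
n    <- tapply(toy$y, toy$grade, length)   # number of rows per level

resp_rate <- data.frame(
  level        = names(rate),
  n            = as.integer(n),
  response_pct = round(100 * as.numeric(rate), 1)
)
resp_rate
```

Showing the row count next to the percentage helps when judging levels: a rate computed from only a handful of rows is less reliable than one computed from hundreds.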
displayWOE
Description
This function displays the Weight of Evidence of the levels of an attribute.
Usage
displayWOE(dset, col = "xyz", resp = "y", adjFactor = 0.5, bins = 10)
Arguments
dset |
The data frame containing the data set |
col |
A character string giving the name of the attribute. The attribute can be either numeric or categorical |
resp |
A character string giving the name of the binary outcome variable. The outcome may be a factor with two levels, or an integer (or numeric) vector with two unique values |
adjFactor |
A number added to the count of responses (binary outcome variable is 1) or to the count of non-responses (binary outcome variable is 0) when either count is zero for any level of the attribute |
bins |
The number of bins. The default value is 10 |
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
displayWOE(German_Credit, col = "Credit_History", resp = "Good_Bad")
levelsCollapser
Description
This function displays the response rates by the levels of an attribute. Levels with similar response rates may be combined.
Usage
levelsCollapser(dset, resp = "y", bins = 10)
Arguments
dset |
The data frame containing the data set |
resp |
A character string giving the name of the binary outcome variable. The outcome may be a factor with two levels, or an integer (or numeric) vector with two unique values |
bins |
The number of bins. The default value is 10 |
Value
A list containing the tables of response rate by levels for every attribute
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
# Create an empty list
l <- list()
# Call the function as follows
l <- levelsCollapser(German_Credit, resp = "Good_Bad", bins = 10)
# response rate by levels of the Account_Balance in the German_Credit data
l$Account_Balance
# Collapse levels with similar response percentages.
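Once the response-rate table shows which levels behave alike, the actual merging can be done by hand. The following base-R sketch shows one way to do that with the `levels<-` replacement function on a toy factor; the data, the level names L1 to L4, and the choice of which levels to merge are all hypothetical and not taken from German_Credit.

```r
# Toy data: L1 and L2 have the same response rate, L3 and L4 differ
toy <- data.frame(
  acct = factor(rep(c("L1", "L2", "L3", "L4"), each = 4)),
  y    = c(1, 1, 1, 0,   # L1
           1, 1, 0, 1,   # L2
           0, 0, 0, 1,   # L3
           0, 0, 0, 0)   # L4
)

rates_before <- tapply(toy$y, toy$acct, mean)
rates_before

# Merge L1 and L2 into one level; keep L3 and L4 as they are
toy$acct_collapsed <- toy$acct
levels(toy$acct_collapsed) <- list(L1_L2 = c("L1", "L2"), L3 = "L3", L4 = "L4")

rates_after <- tapply(toy$y, toy$acct_collapsed, mean)
rates_after
```

Assigning a named list to levels() maps every old level listed on the right to the new name on the left, so the original column can be kept alongside the collapsed one for comparison.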
numericToCategorical
Description
This function categorizes a numerical variable by binning
Usage
numericToCategorical(dset, col = "job", resp = "y", bins = 10, adjFactor = 0.5)
Arguments
dset |
The data frame containing the data set |
col |
A character string giving the name of the numeric attribute to be categorized |
resp |
A character string giving the name of the binary outcome variable. The outcome may be a factor with two levels, or an integer (or numeric) vector with two unique values |
bins |
The number of bins. The default value is 10 |
adjFactor |
A number added to the count of responses (binary outcome variable is 1) or to the count of non-responses (binary outcome variable is 0) when either count is zero for any level of the attribute |
Value
A list containing the categorized attribute, a table of Information Values for the levels of the categorized attribute, the Information Value for the entire attribute, and a table showing the response rates of the levels of the categorized attribute
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
# Create an empty list
l <- list()
# Call the function as follows.
# This will categorize the numeric variable Duration in the German_Credit dataset.
l <- numericToCategorical(German_Credit, col = "Duration", resp = "Good_Bad")
# To view the categorized variable
l$categoricalVariable
# To view the IV table of the levels of the categorized variable
l$IVTable
# To view the total IV value of the categorized variable
l$IV
# To view the response rates of the levels of the categorized variable
l$collapseLevels
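The manual does not say how the bin boundaries are chosen. As a rough illustration of binning in general, the sketch below uses equal-frequency (quantile) bins built with base R's quantile() and cut(); this is an assumed technique chosen for the example, and the package may well bin differently. The vector duration and the bin count of 4 are made up for the toy case.

```r
# Toy numeric attribute (hypothetical values)
duration <- c(4, 6, 9, 12, 12, 15, 18, 24, 24, 30, 36, 48)
bins <- 4

# Equal-frequency break points; unique() guards against repeated quantiles,
# which would otherwise make cut() fail on duplicate breaks
breaks <- unique(quantile(duration, probs = seq(0, 1, length.out = bins + 1)))

# include.lowest = TRUE keeps the minimum value inside the first bin
duration_cat <- cut(duration, breaks = breaks, include.lowest = TRUE)

table(duration_cat)   # number of observations per bin
```

With heavily tied data, unique() can leave fewer break points than requested, so the resulting factor may have fewer levels than bins.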