





ISYE 6501 Course homework assignment one solution
Question 2. Describe a situation or problem from your job, everyday life, current events, etc., for which a classification model would be appropriate. List some (up to 5) predictors that you might use.

While working on a self-serve retention project, I was tasked with improving customer retention for the company's e-commerce platform. The goal was to identify customers who were likely to churn. What we observed in the backend data source was that some customers stopped purchasing midway or disengaged from the platform after interacting with a specific campaign, so I was asked to design useful interventions to retain these customers. For the classification model, I aimed to predict whether a customer was likely to churn within the next 30 days; the time threshold might sometimes extend to quarters, depending on the start date and the length of each campaign involved in the customer retention metrics.

Predictors:

1. Purchase Frequency: how often each customer purchases our service online within a specific period, which shows how actively the user interacts with us. This can be monitored through the "amplify" and "drift" channels, and the time range can be extended to improve accuracy.
2. Purchase Date Information: the number of days or hours since the customer made their last purchase with us. This metric also reflects how active the user is.
3. Average Order Value: the average amount the customer spends per order, which helps gauge the service level the company is providing to this customer. Some conglomerates, such as Allstate and Meta, have customized plans with us, so we may exclude those outliers.
4. Engagement with Marketing Emails: the percentage of marketing emails opened or clicked by the customer. We need to assess whether, and to what extent, the customer engaged with our email campaigns.
5. Customer Support Interactions: the number of times the customer contacted support within a specified period. These customers may not have made a purchase, or may have churned, during that time, but their active engagement with us is still a positive indicator.

Question 2. The files credit_card_data.txt (without headers) and credit_card_data-headers.txt (with headers) contain a dataset with 654 data points, 6 continuous and 4 binary predictor variables. It has anonymized credit card applications with a binary response variable (last column) indicating if the application was positive or negative. The dataset is the "Credit Approval Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Credit+Approval) without the categorical variables and without data points that have missing values.
Using the support vector machine function ksvm contained in the R package kernlab, find a good classifier for this data. Show the equation of your classifier, and how well it classifies the data points in the full data set. (Don't worry about test/validation data yet; we'll cover that topic soon.)

Notes on ksvm

- You can use scaled=TRUE to get ksvm to scale the data as part of calculating a classifier.
- The term λ we used in the SVM lesson to trade off the two components of correctness and margin is called C in ksvm. One of the challenges of this homework is to find a value of C that works well; for many values of C, almost all predictions will be "yes" or almost all predictions will be "no".
- ksvm does not directly return the coefficients a0 and a1…am. Instead, you need to do the last step of the calculation yourself. Here's an example of the steps to take (assuming your data is stored in a matrix called data):¹
model <- ksvm(data[,1:10], data[,11], type = "C-svc",
              kernel = "vanilladot", C = 100, scaled = TRUE)

a <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a

a0 <- -model@b
a0

pred <- predict(model, data[,1:10])
pred

¹ I know I said I wouldn't give you exact R code to copy, because I want you to learn for yourself. In general, that's definitely true, but in this case, because it's your first R assignment and because the ksvm function leaves you in the middle of a mathematical calculation that we haven't gotten into in this course, I'm giving you the code.
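Two small pieces complete the picture: producing the data matrix the code above assumes, and scoring the predictions against the actual responses in column 11. A minimal sketch, using the file name from the question statement and assuming the file sits in the working directory:

# Run first: load the headered file into the matrix the code above calls "data"
data <- as.matrix(read.table("credit_card_data-headers.txt", header = TRUE))

# Run after predicting: fraction of the 654 applications classified correctly
sum(pred == data[,11]) / nrow(data)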
It's also good to try a moderate parameter first to see what happens: the output balances classification accuracy against margin size. I expected a balanced model before running the results; the final C might end up higher, but a balanced test is a useful baseline, and the result is roughly a fifty-fifty split. The reason we are less likely to settle on the moderate C approach here is that a bank cannot gamble on clients who have even a small chance of not paying back their credit card balance, or of rolling the loan into future months, which would sabotage the bank's cash flow. A moderate C is most useful when the credit card dataset has very little noise and I expect the model to classify most of the data points correctly. We could get:

f(x) = 0.08148382 − 0.0011026642⋅A1 − 0.0008980539⋅A2 − 0.0016074557⋅A3 + 0.0029041700⋅A8 + 1.6⋅A9 − 0.0029852110⋅A10 − 0.0002035179⋅A11 − 0.0005504803⋅A12 − 0.0012519187⋅A14 + 0.1064404601⋅A15.

With a larger parameter, say 3 to 20, the focus shifts to minimizing classification errors, which can lead to overfitting. It's also important not to pick a very large parameter, so that we avoid severe overfitting; an overly strict classifier might filter out the majority of the clients in the banking client data (credit_card_data-headers in our case). That approach is not ideal here, but it can be useful in controlled environments where the model's predictions are tightly aligned with the training data and the test data is expected to be very similar; in that case, even larger C values are worth trying. Setting the tested parameter to C = 1.45, we can follow this process:
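A sketch of that process, reusing the provided steps with the C argument changed to 1.45:

# Refit the linear (vanilladot) SVM with the trade-off parameter C = 1.45
model <- ksvm(data[,1:10], data[,11], type = "C-svc",
              kernel = "vanilladot", C = 1.45, scaled = TRUE)

# Recover the coefficients a1...am and the intercept a0 for the classifier equation
a  <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a0 <- -model@b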
We could get:

f(x) = 0.08142887 − 0.0011602821⋅A1 − 0.0009655854⋅A2 − 0.0016017220⋅A3 + 0.0028041792⋅A8 + 1.8⋅A9 − 0.0028790514⋅A10 − 0.0001536397⋅A11 − 0.0006017963⋅A12 − 0.0011218422⋅A14 + 0.1064268605⋅A15.

By choosing this parameter, we minimize the risk of classification errors by maximizing the size of the margin while still reducing error to a large extent. The accuracy is 86.39%. Evaluating the accuracy at the very end serves as a checking step that shows how well the model classifies the data points. If the accuracy is still not satisfactory, I can gradually increase C and observe how the accuracy changes to fine-tune the model. To sum up, testing a range of values for C, the parameter we want to find, is time-consuming, but after multiple tries we can get very close to a good C and see that our data is split up in a reasonable way.
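Rather than adjusting C by hand one run at a time, the search can be scripted so each candidate is fitted and scored in one pass. A sketch mirroring the provided ksvm call; this particular grid of C values is illustrative, not prescribed:

library(kernlab)

# Candidate C values spanning several orders of magnitude
C_values <- c(0.01, 0.1, 1, 1.45, 10, 100, 1000)

for (C in C_values) {
  model <- ksvm(data[,1:10], data[,11], type = "C-svc",
                kernel = "vanilladot", C = C, scaled = TRUE)
  pred <- predict(model, data[,1:10])
  cat("C =", C, " accuracy =", sum(pred == data[,11]) / nrow(data), "\n")
}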
I randomly tested different values of k and evaluated the model's results. The accuracy of the k-NN model is very low, so this model might not be classifying the data points effectively. There are a few ways to improve the k-NN classification. I can check the data scaling and make sure all predictor variables are properly scaled; if any categorical variable was skipped among the predictors, that could also affect model performance. The choice of k matters as well: k = 1 can lead to overfitting, while a large k can smooth over important distinctions. Since the current k values are not working well, I can try a wider range, and I can also try odd values of k to avoid ties. Because we have a fixed data source, we cannot increase the set size to add training data. This dataset has 358 instances of class 0 and 296 instances of class 1. It has a slight imbalance, with the negative class being more prevalent, but the distribution isn't heavily skewed, so the accuracy issue may instead be caused by inadequate predictive features. We can do a full check of the scaling by inspecting the structure of the dataset, and we can also check for null and infinite values, as sketched below:
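A couple of one-liners cover those checks, assuming the same data matrix as before:

# Structure check: every predictor column should be numeric and on a sensible scale
str(data)

# Missing or non-finite values would silently distort the distance calculations
sum(is.na(data))
sum(!is.finite(data))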
Let's try 25 values of k to see if there is any difference. Even as k gets larger, the accuracy remains quite small compared to an acceptable value.
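A sketch of that sweep with the kknn package. Two assumptions worth flagging: the response column is named R1 (taken here from the headered file; rename if it differs), and each point is predicted from all the other points, leave-one-out style, so a point never votes with its own label:

library(kknn)

df <- as.data.frame(data)   # kknn's formula interface wants a data frame
accs <- rep(0, 25)

for (k in 1:25) {
  preds <- rep(0, nrow(df))
  for (i in 1:nrow(df)) {
    # Fit on every row except i, then predict row i; scale = TRUE normalizes predictors
    model <- kknn(R1 ~ ., train = df[-i, ], test = df[i, ], k = k, scale = TRUE)
    preds[i] <- as.integer(fitted(model) + 0.5)  # round the neighbor average to 0 or 1
  }
  accs[k] <- sum(preds == df$R1) / nrow(df)
  cat("k =", k, " accuracy =", accs[k], "\n")
}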