The aim of this ML project was to use different machine learning algorithms to predict which customers are most likely to subscribe to the bank’s new product, a term deposit. I started with an exploratory analysis to better understand the data. Next, I one-hot encoded the columns with the ‘object’ data type, since most ML models only work with numeric input. I then split the data into training and test sets, put the candidate models in a list, and wrote a function that fits a model and scores it. A for loop applied that function to each model in the list.
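The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the tiny dataset, the column names (`age`, `job`, `marital`, `subscribed`), the `evaluate` helper, and the choice of the two models besides the random forest are all assumptions made for the example.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-in for the bank data; column names are hypothetical.
df = pd.DataFrame({
    "age": [25, 40, 33, 58, 47, 29, 61, 36, 52, 44, 38, 27],
    "job": ["admin", "technician", "admin", "retired", "services", "student",
            "retired", "admin", "technician", "services", "admin", "student"],
    "marital": ["single", "married", "single", "married", "divorced", "single",
                "married", "married", "divorced", "married", "single", "single"],
    "subscribed": [0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0],
})

# One-hot encode the 'object' columns; numeric columns pass through unchanged.
X = pd.get_dummies(df.drop(columns="subscribed"))
y = df["subscribed"]

# Hold out part of the data for testing, keeping the class ratio similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model on the training set and return its test accuracy."""
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)

# Candidate models (assumed for illustration).
models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=42),
    RandomForestClassifier(random_state=42),
]

# Apply the same function to every model in the list.
results = {}
for model in models:
    results[type(model).__name__] = evaluate(model, X_train, X_test, y_train, y_test)

for name, accuracy in results.items():
    print(f"{name}: {accuracy:.2f}")
```

Wrapping fit-and-score in one function keeps the comparison fair, since every model sees exactly the same split.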
RandomForestClassifier turned out to be the most accurate model tested, so I used it in the next step: extracting feature importances. I loaded the model's feature importances into a pandas DataFrame indexed by feature name and sorted it to find the top 10 client attributes, which indicate the demographics the product should be marketed to. Using only these top 10 features, I refit the random forest and re-tested its accuracy on the test set. I also measured precision (the share of predicted positives that are actually positive) and recall (the share of actual positives that the model detects). The results show that the model is 88% accurate, with a precision of 19% and a recall of 41% for the positive class.
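A minimal sketch of the feature-importance step and the precision/recall check is shown here. It runs on synthetic, imbalanced data from `make_classification`, so the feature names (`feature_0`, ...) and every number it prints are illustrative assumptions and will not match the figures reported for the project.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the encoded bank data (assumption).
X_arr, y = make_classification(
    n_samples=500, n_features=20, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(20)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit on all features, then rank features by importance.
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
importances = pd.DataFrame(
    {"importance": rf.feature_importances_}, index=X.columns
).sort_values("importance", ascending=False)
top10 = importances.head(10).index.tolist()

# Refit using only the 10 most important features and predict on the test set.
rf_top = RandomForestClassifier(random_state=0).fit(X_train[top10], y_train)
y_pred = rf_top.predict(X_test[top10])

accuracy = accuracy_score(y_test, y_pred)
# Precision: of the predicted positives, how many are truly positive.
precision = precision_score(y_test, y_pred, zero_division=0)
# Recall: of the true positives, how many the model found.
recall = recall_score(y_test, y_pred, zero_division=0)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On imbalanced data, accuracy can be high while precision and recall for the minority class stay low, which is why all three are worth reporting together.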
The libraries used in this project were NumPy and pandas for data analysis, Matplotlib and Seaborn for data visualization, and, most importantly, scikit-learn for modeling.
This project was originally uploaded to GitHub, along with the Jupyter notebook, a PowerPoint presentation, and a video that walks through the whole project. You can find it here: