Machine Learning Models in Stock Market Prediction (2202.09359v1)

Published 6 Feb 2022 in q-fin.ST and cs.LG

Abstract: The paper focuses on predicting the Nifty 50 Index using 8 supervised machine learning models. The techniques used in the empirical study are Adaptive Boost (AdaBoost), k-Nearest Neighbors (kNN), Linear Regression (LR), Artificial Neural Network (ANN), Random Forest (RF), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM) and Decision Trees (DT). Experiments are based on historical data of the Nifty 50 Index of the Indian stock market from 22nd April, 1996 to 16th April, 2021, a time series of around 25 years. The period contains 6220 trading days, excluding all non-trading days. The entire dataset was divided into 4 subsets of different sizes: 25%, 50%, 75% and 100% of the data. Each subset was further divided into 2 parts: training data and testing data. Three tests were applied to each subset (test on training data, test on testing data, and cross-validation), after which the prediction performance of the models was compared. The evaluation results indicate that Adaptive Boost, k-Nearest Neighbors, Random Forest and Decision Trees underperformed as the size of the data set increased. Among all the models, Linear Regression and Artificial Neural Network showed almost identical prediction results, but the Artificial Neural Network took more time to train and validate. Of the remaining models, Support Vector Machine performed best, but as the size of the data set increased, Stochastic Gradient Descent performed better than Support Vector Machine.

Summary

The paper reports that Linear Regression and ANN achieve high prediction accuracy, that SVM leads the remaining models on the smaller subsets, and that SGD overtakes SVM as the dataset grows.
It employs a systematic methodology with varied data subsets and cross-validation to assess models using metrics like MSE, RMSE, MAE, and R².
It reports that AdaBoost, Random Forest, kNN, and DT lose accuracy as the dataset grows, a pattern attributed to overfitting, underscoring the need for careful algorithm selection.

Evaluating Supervised Machine Learning Models for Stock Market Prediction

The paper "Machine Learning Models in Stock Market Prediction" by Gurjeet Singh aims to forecast the Nifty 50 Index of the Indian stock market using eight supervised machine learning algorithms: Adaptive Boosting (AdaBoost), k-Nearest Neighbors (kNN), Linear Regression (LR), Artificial Neural Network (ANN), Random Forest (RF), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), and Decision Trees (DT). The experiments use historical Nifty 50 data covering roughly 25 years, from April 22, 1996, to April 16, 2021. The dataset comprises 6220 trading days, which lets the models be compared across a long and varied stretch of market history.

Methodological Framework

This research employed a structured approach, segmenting the dataset into four distinct subsets: 25%, 50%, 75%, and 100% of the complete data. Each subset underwent further division into training and testing datasets, maintaining an 80/20 split. These subsets facilitated a nuanced exploration of model performance across varied data volumes.
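The subset construction described above can be sketched as follows. This is a minimal illustration rather than the paper's procedure: it assumes each subset takes the earliest rows in chronological order and that the 80/20 split is also chronological (no shuffling), neither of which is stated in the text. The function name `make_subsets` and its parameters are hypothetical.

```python
def make_subsets(n_rows, subset_pcts=(25, 50, 75, 100), train_pct=80):
    """Return {subset_pct: (train_indices, test_indices)}.

    Assumption (not confirmed by the source): rows are in chronological
    order, each subset is a prefix of the data, and the train/test split
    is also chronological, so no future row leaks into training.
    """
    splits = {}
    for pct in subset_pcts:
        # Integer arithmetic avoids floating-point rounding surprises.
        subset_size = n_rows * pct // 100
        cut = subset_size * train_pct // 100
        splits[pct] = (range(0, cut), range(cut, subset_size))
    return splits


# 6220 is the number of trading days reported for the dataset.
splits = make_subsets(6220)
for pct, (train_idx, test_idx) in splits.items():
    print(f"{pct}% subset: {len(train_idx)} train rows, {len(test_idx)} test rows")
```

Under these assumptions the smallest subset has 1244 training rows and 311 test rows, and the full dataset has 4976 and 1244. A shuffled split would give the same counts but would mix future and past observations, which matters for time series.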

The paper utilized cross-validation, testing on training data, and testing on test data to evaluate the models' predictive efficiency. Critical metrics examined included Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R-Square (R²), alongside execution times for training and testing phases.
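For reference, the four error metrics named above can be computed from their standard definitions as in the sketch below. The function name `regression_metrics` and the sample values are illustrative only and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared from their standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}


# Made-up index levels and predictions, purely for illustration.
actual = [100, 110, 120, 130]
predicted = [102, 108, 123, 129]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

MSE and RMSE penalize large errors more heavily than MAE does, and R² compares the model against a baseline that always predicts the mean of the observed values.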

Summary of Findings

Performance varied considerably across models. Linear Regression and Artificial Neural Network produced nearly identical results, both with high accuracy, although ANN took longer to train and validate. SVM performed best among the remaining models on the smaller subsets, but as the dataset grew, SGD outperformed it.

In contrast, the ensemble methods (AdaBoost and Random Forest), along with kNN and DT, performed worse as the dataset size increased, which points to possible overfitting on the larger subsets. This was most evident in the negative R² values recorded for AdaBoost, Random Forest, kNN, and DT in some tests: a negative R² means the model's predictions were worse than simply predicting the mean of the observed values.
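To make the meaning of a negative R² concrete, the sketch below uses a made-up toy series. It shows one way a negative value can arise, namely predictions that stay stuck near the training range while the test values keep rising. It is an illustration of the metric only, not a claim about why the paper's models scored as they did; all names and numbers are hypothetical.

```python
def r_squared(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical test-period values that lie above anything seen in training.
test_values = [200, 210, 220, 230]

# A predictor stuck at a (hypothetical) training-period maximum of 150.
stuck_predictions = [150, 150, 150, 150]

# The baseline that R-squared compares against: the mean of the test values.
mean_value = sum(test_values) / len(test_values)
baseline_predictions = [mean_value] * len(test_values)

r2_stuck = r_squared(test_values, stuck_predictions)
r2_baseline = r_squared(test_values, baseline_predictions)
print(f"stuck predictor R2: {r2_stuck:.1f}")
print(f"mean baseline R2:   {r2_baseline:.1f}")
```

The mean baseline scores exactly 0 by construction, so any model with a negative R² has a larger squared error than that baseline on the same data.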

Implications for Stock Market Prediction

The paper's analysis underscores the importance of matching the machine learning algorithm to the dataset's size and characteristics. Linear Regression and ANN provide reliable benchmarks, while SVM and SGD are worth considering alongside them: SVM on smaller datasets, and SGD as the dataset grows, since neither showed the degradation seen in AdaBoost, Random Forest, kNN, and DT.

Future Directions

While this paper provides a comprehensive evaluation of traditional machine learning models, future work could explore newer architectures like deep learning and ensemble techniques that adaptively combine multiple models to enhance prediction accuracy. Additionally, integrating sentiment analysis data or real-time updates could offer richer insights, potentially improving the predictive capabilities of these models within the dynamic financial markets.

In conclusion, this research shows how supervised machine learning models behave when applied to stock index prediction, emphasizing that model selection should account for data size and that some models degrade as the dataset grows. These findings provide a baseline for more sophisticated predictive approaches in financial analytics.
