- The paper reports that Linear Regression and ANN achieve high prediction accuracy, that SVM outperforms most other models on smaller subsets, and that SGD overtakes SVM as the dataset grows.
- It employs a systematic methodology with varied data subsets and cross-validation to assess models using metrics like MSE, RMSE, MAE, and R².
- It highlights that ensemble methods and models like kNN and DT risk overfitting with extensive data, underscoring the need for careful algorithm selection.
Evaluating Supervised Machine Learning Models for Stock Market Prediction
The paper "Machine Learning Models in Stock Market Prediction" by Gurjeet Singh aims to forecast the Nifty 50 Index of the Indian stock market using eight supervised machine learning algorithms: Adaptive Boosting (AdaBoost), k-Nearest Neighbors (kNN), Linear Regression (LR), Artificial Neural Network (ANN), Random Forest (RF), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), and Decision Trees (DT). The experiments use historical Nifty 50 data spanning nearly 25 years, from April 22, 1996, to April 16, 2021. The dataset comprises 6220 trading days, which allows the models to be evaluated across a long and varied stretch of market history.
Methodological Framework
This research employed a structured approach, segmenting the dataset into four distinct subsets: 25%, 50%, 75%, and 100% of the complete data. Each subset underwent further division into training and testing datasets, maintaining an 80/20 split. These subsets facilitated a nuanced exploration of model performance across varied data volumes.
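The subset construction described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function name `make_subsets` is hypothetical, and two details are assumptions the summary does not confirm, namely that each subset takes the earliest observations and that the 80/20 split is chronological rather than random.

```python
def make_subsets(series, percents=(25, 50, 75, 100), train_percent=80):
    """Return {percent: (train, test)} for each data-volume subset.

    Assumptions (not stated in the paper summary): each subset is the
    first `percent` of the series, and the train/test split preserves
    time order.
    """
    subsets = {}
    for pct in percents:
        n = len(series) * pct // 100        # size of this subset
        subset = series[:n]                 # assumed: earliest n observations
        cut = n * train_percent // 100      # boundary between train and test
        subsets[pct] = (subset[:cut], subset[cut:])
    return subsets


# Placeholder values standing in for 6220 daily index observations.
prices = list(range(6220))
splits = make_subsets(prices)
for pct, (train, test) in splits.items():
    print(f"{pct}% subset: {len(train)} train rows, {len(test)} test rows")
```

Integer arithmetic is used for the sizes so the boundaries do not depend on floating-point rounding.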
The paper evaluated each model's predictive performance in three ways: cross-validation, testing on the training data, and testing on the held-out test data. The metrics examined were Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R²), alongside execution times for the training and testing phases.
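For reference, the four error metrics follow their standard definitions, which can be written out directly. The sketch below uses a hypothetical helper, `regression_metrics`, and made-up sample values; it shows how the metrics are computed, not how the paper's tooling computed them.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and R² from paired actual/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(r * r for r in residuals) / n      # mean of squared errors
    rmse = math.sqrt(mse)                        # same units as the target
    mae = sum(abs(r) for r in residuals) / n     # mean of absolute errors

    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot   # undefined if y_true is constant (ss_tot == 0)

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}


# Illustrative values only; these are not taken from the paper.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

MSE and RMSE penalize large errors more heavily than MAE, which is why the three are usually reported together.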
Summary of Findings
Performance varied considerably across models. Linear Regression and Artificial Neural Network produced comparable results, both achieving high accuracy, although ANN required more computation time. SVM outperformed most other models, but its advantage diminished on the larger subsets, where SGD outperformed it.
In contrast, the ensemble methods (AdaBoost and Random Forest), along with kNN and DT, performed worse as the dataset grew, which points to possible overfitting at larger data scales. The clearest sign was the negative R² these four models recorded in some tests. A negative R² means the model's predictions were less accurate than simply predicting the mean of the test targets, so in those tests the models did not generalize to the unseen data.
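A small numeric example shows how R² can fall below zero. The values are invented for illustration and the helper `r_squared` is hypothetical; the point is only the arithmetic: whenever the squared error of the predictions exceeds the squared deviation of the targets from their own mean, R² is negative.

```python
def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, using the mean of y_true as the baseline."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented test targets (not from the paper).
y_test = [10.0, 11.0, 12.0, 13.0]

# Baseline: always predict the mean of the test targets, which gives R² = 0.
mean_baseline = [11.5] * len(y_test)

# Predictions stuck at a level far from the test targets.
off_level = [5.0] * len(y_test)

r2_baseline = r_squared(y_test, mean_baseline)
r2_off = r_squared(y_test, off_level)
print(r2_baseline, r2_off)
```

Here the second set of predictions scores far below zero because every prediction misses by more than the targets vary around their mean.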
Implications for Stock Market Prediction
The paper's analysis underscores the importance of matching the machine learning algorithm to the size and characteristics of the dataset. Linear Regression and ANN provide reliable benchmarks across data volumes. SVM is a strong choice on the smaller subsets, while SGD holds up better on the larger ones and avoids the overfitting seen in AdaBoost, Random Forest, kNN, and DT.
Future Directions
While this paper provides a comprehensive evaluation of traditional machine learning models, future work could explore newer architectures like deep learning and ensemble techniques that adaptively combine multiple models to enhance prediction accuracy. Additionally, integrating sentiment analysis data or real-time updates could offer richer insights, potentially improving the predictive capabilities of these models within the dynamic financial markets.
In conclusion, the research shows that model selection for stock market prediction should account for data size and for the risk of overfitting, and it lays groundwork for more sophisticated predictive approaches in financial analytics.