Deep Learning for Stock Market Sentiment Analysis
The term Big Data describes massive data sets that contain a large, continuously growing volume of varied data with a complex structure. With the current prevalence of big data, it has become crucial for fields such as banking, agriculture, chemistry, cloud computing, finance, marketing, stocks and healthcare to predict the outcomes that will shape their future. Much of today's big data is generated by social media, which is considered one of the greatest assets of companies and can be integrated into effective decision making. Because this data is generated directly by social media users, it can be used to analyze public sentiment.
Stock market price prediction is one of the most crucial topics in the financial sector, since individuals and companies can use it to earn substantial profits. For this reason, researchers have for many years focused on accurately predicting stock price fluctuations in order to decide whether to sell or buy stocks. Many studies show a positive correlation between public sentiment and the stock market. Sentiment analysis of massive, highly fluctuating social media big data, using data mining, machine learning and deep learning techniques, can therefore be used to address the non-linear stock market. Financial social networks like StockTwits and non-financial social networks like Twitter produce a great deal of unstructured big data that can be integrated into decision making regarding stock market movements.
This article argues that deep learning techniques are better suited to sentiment analysis than other Big Data analytic methods such as data mining and machine learning techniques.
Data mining based approaches are mainly used together with lexicon-based approaches. They use the textual data in company-generated annual and quarterly financial reports and in content-rich financial news articles. A probabilistic rule-based prediction system uses data mining techniques and keyword tuple counting to produce periodic stock market forecasts. The tuple counts are transformed into weights, and these weights, together with the training data set, are used to generate rules. The rules are then applied to predict the stock indices. Accuracies of 60% to 70% were recorded using data mining approaches. However, these approaches require predefined lists of positive and negative words to extract the sentiment of new documents. Building such lists is time-consuming, and no single list can be formed because the words differ with the context in which they are used. In data mining, feature extraction is the most challenging task in dealing with Big Data. Deep learning techniques can address this problem.
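The lexicon-based idea described above can be illustrated with a minimal sketch. The word lists and the function names `lexicon_score` and `lexicon_label` are hypothetical and chosen only for illustration. This is not the rule-based system from the cited studies, just the basic step of counting predefined positive and negative words.

```python
import re

# Hypothetical word lists for illustration only; a real lexicon is much
# larger and depends on the context in which the words are used.
POSITIVE = {"gain", "growth", "profit", "strong", "beat"}
NEGATIVE = {"loss", "decline", "weak", "lawsuit", "miss"}


def lexicon_score(text):
    """Return (positive hits - negative hits) for one document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos - neg


def lexicon_label(text):
    """Map the raw score to a coarse sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The weakness mentioned above is visible here: any word missing from the two sets is silently ignored, so the result is only as good as the hand-built lists.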
With machine learning techniques, features extracted from the text, such as unigrams, bigrams and part-of-speech (POS) tags, are fed to a classifier model that is trained using supervised machine learning algorithms such as Support Vector Machine (SVM), Naive Bayes (NB), Maximum Entropy (ME), Random Forest, Logistic Regression and LibSVM. This classifier acts as the sentiment analyzer that identifies whether the sentiment of a tweet is positive, negative or neutral. A post analysis of price changes and tweets was then expected to reveal any correlation between Twitter sentiment and stock prices, and which words in tweets correlate with stock prices. Among these algorithms, LibSVM (Library for Support Vector Machines) showed the highest accuracy, at 71.82%.
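As a rough sketch of this pipeline, the following pure-Python example extracts unigram and bigram features and trains a small multinomial Naive Bayes classifier, one of the algorithms listed above. The class name `NaiveBayesSentiment`, the helper `ngram_features` and the training tweets are all made up for illustration. A real study would use a proper library and far more data.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text):
    """Unigrams plus bigrams from a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    bigrams = [a + "_" + b for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # class prior
            for feat in ngram_features(text):
                if feat in self.vocab:  # skip features never seen in training
                    logp += math.log((counts[feat] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Tiny made-up training set (hypothetical tweets, not real data).
train_texts = [
    "great earnings stock will rise",
    "strong buy great results",
    "terrible earnings stock will fall",
    "weak results sell now",
    "market opens at nine",
    "company holds annual meeting",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
```

Calling `model.predict("great results strong buy")` returns one of the three labels. Note that the feature extraction step is written by hand here, which is exactly the manual work that the deep learning approach discussed later aims to avoid.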
A neural network, the building block of deep learning, is a network generated by examining a database and identifying and mapping all significant patterns and relationships that exist among its attributes. As shown in Fig. 1, a neural network has layers whose nodes are connected end to end. There are three types of layers: the input layer, the hidden layers and the output layer. The patterns are fed to the network through the input layer and passed through a number of hidden layers, and the result is produced by the output layer. The actual processing occurs inside the hidden layers. The network then uses the learned patterns to predict an outcome.
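The flow from input layer through hidden layers to output layer can be sketched as a simple forward pass. The weights below are hypothetical and hand-picked for illustration. In practice they are learned from data during training, and the sigmoid activation is just one common choice assumed here.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the input pattern through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical weights: 3 inputs -> 2 hidden nodes -> 1 output node.
hidden = ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1])
output = ([[1.2, -0.7]], [0.05])

prediction = forward([1.0, 0.0, 1.0], [hidden, output])
```

Here `prediction` is a one-element list holding the output node's activation, which could be read as, for example, the estimated probability of a positive sentiment.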
In machine learning and data mining methodologies, feature extraction must be done first, using additional techniques, to reduce the complexity of the data so that the classifier can identify patterns more easily. Deep learning, in contrast, performs feature extraction and classification together, learning high-level features and complex patterns incrementally. Owing to these facts, deep learning has become the leading approach in terms of accuracy for predicting stock market prices. Many studies have applied deep learning techniques to stock market sentiment analysis, including doc2vec, Recurrent Neural Networks (RNN), LSTM and Convolutional Neural Networks (CNN). These studies found that the best model for stock market price prediction using sentiment analysis of social media big data is the CNN. This is due to certain benefits of CNNs compared with other neural networks. First, each neuron in the first hidden layer is connected only to a small region of the input neurons, which reduces the complexity and, in turn, the computational burden. Second, the same feature can be identified at different locations because the same weights are shared across positions within a hidden layer. Together, these properties help a CNN extract only the core details effectively.
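The two CNN properties mentioned above, local connectivity and weight sharing, can be seen in a minimal one-dimensional convolution. The kernel and the input sequences are hypothetical numbers chosen for illustration. In a real text CNN the inputs would typically be word embeddings and the kernel weights would be learned.

```python
def conv1d(sequence, kernel):
    """Slide one shared kernel over the sequence (valid positions only).

    Each output value depends only on a small window of the input
    (local connectivity), and every window uses the same weights
    (weight sharing).
    """
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, sequence[i:i + k]))
        for i in range(len(sequence) - k + 1)
    ]


def global_max_pool(feature_map):
    """Keep only the strongest response, wherever it occurred."""
    return max(feature_map)


# Hypothetical kernel that responds most strongly to the pattern (1, 2, 1).
kernel = [1.0, 2.0, 1.0]

# The same pattern placed at the start and at the end of a sequence.
early = [1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0]

early_map = conv1d(early, kernel)
late_map = conv1d(late, kernel)
```

The peak response moves with the pattern, but `global_max_pool(early_map)` and `global_max_pool(late_map)` are equal, so the feature is detected regardless of where it appears. The layer also needs only three weights instead of one weight per input position.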
In conclusion, deep learning is considered the best Big Data analytic method for handling the dynamic, complicated and non-linear stock market. It solves many of the problems encountered in Big Data and offers several advantages over the other Big Data analytic methods. Owing to its ability to deal with fuzzy, uncertain and insufficient data that may fluctuate rapidly within a very short period of time, deep learning is especially well suited to stock market prediction. In addition, deep learning models are simpler to apply because they learn features during training, which leads to more accurate predictions.
On behalf of team LiveRoom, Written by Poojani Athukorala