Using support vector machine with a
  hybrid feature selection method
    to the stock trend prediction


                  Ming-Chi Lee
     Expert Systems with Applications, 2009

             Presenter: Yu Hsiang Huang
                  Date: 2012-05-17



Outline
•   Introduction
•   Feature selection
•   Research design
•   Experimental results and analysis
•   Conclusion



Introduction
• Stock market
   – Highly nonlinear dynamic system
• Applications of AI
   – Expert systems, fuzzy systems, neural networks
   – Back-propagation neural network (BPNN)
       •   Better predictive power than the other approaches
       •   Requires a large amount of training data to estimate the distribution of input patterns
       •   Prone to over-fitting
       •   Depends entirely on the researcher's experience and knowledge to preprocess the data and set parameters
             –   relevant input variables, hidden layer size, learning rate, momentum, etc.




Introduction
• In this paper
   – Support vector machine (SVM)
       • Captures the geometric characteristics of the feature space without deriving network
         weights from the training data
       • Can extract the optimal solution from a small training set
       • Finds a global optimum, whereas BPNN training may settle in a local optimum
       • Less prone to over-fitting
       • Classification performance is influenced by the dimensionality (number of feature variables)
   – Feature selection
       • Addresses the dimensionality reduction problem by determining the subset of available
         features that is most essential for classification
       • Hybrid feature selection: filter method + wrapper method → F_SSFS
       • F_SSFS: F-score + supported sequential forward search
       • Optimal parameter search
   – Compare the performance of BPNN and SVM

SVM-based model with F_SSFS
Hybrid feature selection:
   Original feature variables
     → Filter part: feature pruning using F-score
   Pre-selected features
     → Wrapper part: the SSFS algorithm finds the best feature variables
   Best feature variables
     → SVM (with the data): training, testing, and evaluating the classification accuracy

Feature selection
• Filter method:
    – No feedback from the classifier
    – Estimates classification performance through indirect assessments
         • Distance: reflects how well the classes separate from each other

[Figure: filter method, in which classification performance is estimated by a distance measure, with no feedback from the classifier]

Feature selection
• F-score and Supported Sequential Forward Search (F_SSFS)
   – F-score
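For reference, the F-score commonly used as an SVM feature-selection filter is sketched below; the slides do not print the formula, so this standard definition is an assumption about what the paper uses. For feature i, n₊ and n₋ are the numbers of positive and negative training instances, x̄ᵢ, x̄ᵢ⁽⁺⁾ and x̄ᵢ⁽⁻⁾ are the means of feature i over the whole, positive and negative sets, and x_{k,i}^{(+)} is the value of feature i in the k-th positive instance:

```latex
F(i) = \frac{\left(\bar{x}_i^{(+)} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{(-)} - \bar{x}_i\right)^2}
            {\frac{1}{n_+ - 1}\sum_{k=1}^{n_+}\left(x_{k,i}^{(+)} - \bar{x}_i^{(+)}\right)^2
             + \frac{1}{n_- - 1}\sum_{k=1}^{n_-}\left(x_{k,i}^{(-)} - \bar{x}_i^{(-)}\right)^2}
```

The numerator measures how far the two class means lie from the overall mean; the denominator sums the within-class variances, so a larger F(i) suggests a more discriminative feature.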

   Original feature variables
     → Calculate the F-score of each feature
     → Sort the features by F-score
     → Select the top K features by F-score

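The filter steps above (calculate, sort, select top K) can be sketched in NumPy as follows. This is a minimal sketch, assuming labels in {+1, -1} as in this study and the standard F-score definition; `f_scores` and `select_top_k` are hypothetical helper names, and the data in the usage part is synthetic.

```python
import numpy as np


def f_scores(X, y):
    """F-score of every column of X for binary labels y in {+1, -1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == -1]
    mean_all = X.mean(axis=0)
    mean_pos, mean_neg = pos.mean(axis=0), neg.mean(axis=0)
    # Numerator: how far each class mean lies from the overall mean
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: within-class sample variances (ddof=1 gives the 1/(n-1) factor)
    denominator = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        scores = numerator / denominator
    # 0/0 (a constant feature) becomes 0; x/0 (classes perfectly separated) becomes a very large score
    return np.nan_to_num(scores, nan=0.0)


def select_top_k(X, y, k):
    """Calculate F-scores, sort them in descending order, and keep the top k feature indices."""
    scores = f_scores(X, y)
    order = np.argsort(scores)[::-1]
    return order[:k], scores


# Usage on synthetic data: feature 3 is made strongly class-dependent
rng = np.random.default_rng(0)
y = np.where(rng.random(200) > 0.5, 1, -1)
X = rng.normal(size=(200, 5))
X[:, 3] += 2.0 * y
top, scores = select_top_k(X, y, k=2)
print(top, np.round(scores, 3))
```

Because the F-score looks at one feature at a time, it is cheap but ignores interactions between features; that gap is what the wrapper part is meant to cover.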
Feature selection
• Wrapper method:
  – Classifier-dependent
      • Evaluates the “goodness” of the selected feature subset directly, using the classifier itself
      • Should intuitively yield better performance
  – Limited applicability
      • Due to the high computational complexity involved




[Figure: wrapper method, in which feedback from the classifier guides the feature-subset search]

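A wrapper criterion of the kind described above can be sketched with scikit-learn. This is a minimal sketch, assuming an RBF-kernel `SVC` and 5-fold cross-validated accuracy as the "goodness" measure; the slides specify neither the kernel nor the evaluation protocol, and `wrapper_score` is a hypothetical helper name.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def wrapper_score(X, y, subset, cv=5, C=1.0):
    """Wrapper criterion: mean cross-validated SVM accuracy using only the given feature columns."""
    model = SVC(kernel="rbf", C=C, gamma="scale")
    # The classifier itself provides the feedback; every candidate subset costs `cv` SVM trainings,
    # which is where the high computational complexity of wrappers comes from
    scores = cross_val_score(model, X[:, list(subset)], y, cv=cv, scoring="accuracy")
    return scores.mean()


# Usage on synthetic data: only features 0 and 1 carry the class signal
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
good = wrapper_score(X, y, [0, 1])
bad = wrapper_score(X, y, [2, 3])
print(f"informative subset: {good:.3f}, noise subset: {bad:.3f}")
```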
Feature selection
• F-score and Supported Sequential Forward Search (F_SSFS)
   – Supported sequential forward search (SSFS)
       • Plays the role of the wrapper
       • A variation of the sequential forward search (SFS) algorithm, specially tailored to SVM
         to speed up the feature search
       • Key property of support vectors: training samples other than the support vectors
         do not contribute to determining the decision boundary
       • Dynamically maintains an active subset of training samples as candidate support vectors
       • Trains the SVM on this reduced subset rather than the entire training set, lowering the computational cost
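The support-vector property that SSFS relies on can be checked numerically: retraining an SVM on its own support vectors should reproduce the same decision boundary, up to solver tolerance. This is a minimal sketch on synthetic data, assuming scikit-learn's `SVC`, whose `support_` attribute holds the indices of the support vectors; it illustrates the general SVM property, not the paper's code.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)

# Train on the full training set
full = SVC(kernel="linear", C=1.0).fit(X, y)

# Retrain using only the support vectors found above (the "reduced subset")
sv_idx = full.support_
reduced = SVC(kernel="linear", C=1.0).fit(X[sv_idx], y[sv_idx])

# Compare the two models on fresh points
X_test = rng.normal(size=(500, 2))
agreement = np.mean(full.predict(X_test) == reduced.predict(X_test))
print(f"{len(sv_idx)} of {len(X)} samples are support vectors; agreement = {agreement:.3f}")
```

The non-support vectors have zero dual weight, so dropping them leaves the optimum unchanged while the retraining works on far fewer samples.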




Feature selection
• F-score and Supported Sequential Forward Search (F_SSFS)
   – Supported sequential forward search (SSFS)


                        f1   f2    f3   f4    …   fk-2   fk-1   fk   label
                  r1    …    …     …    …     …    …      …     …     +
                  r2    …    …     …    …     …    …      …     …      -
                  …     …    …     …    …     …    …      …     …      -
                  rN    …    …     …    …     …    …      …     …     +




Feature selection
• F-score and Supported Sequential Forward Search (F_SSFS)
    – Supported sequential forward search (SSFS)
[Figure: SSFS iterations, from iteration 1 through iteration n+1 until termination]

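The iterate-until-termination loop can be sketched as a simplified SSFS. This is an illustrative sketch only, and several details are assumptions rather than the paper's algorithm: candidate subsets are scored by the accuracy, on the full training set, of an SVM trained only on the active subset; after each step the active subset is replaced by that SVM's support vectors; and the search terminates when adding a feature no longer improves accuracy. `ssfs` is a hypothetical name.

```python
import numpy as np
from sklearn.svm import SVC


def ssfs(X, y, max_features=None, C=1.0):
    """Simplified supported sequential forward search (hypothetical sketch)."""
    n_samples, n_features = X.shape
    if max_features is None:
        max_features = n_features
    selected, remaining, history = [], list(range(n_features)), []
    active = np.arange(n_samples)  # iteration 1: every training sample is active
    while remaining and len(selected) < max_features:
        best_acc, best_feat, best_model = -1.0, None, None
        for f in remaining:
            cols = selected + [f]
            model = SVC(kernel="rbf", C=C, gamma="scale")
            # Train on the reduced active subset only, which is the source of the speed-up
            model.fit(X[np.ix_(active, cols)], y[active])
            acc = model.score(X[:, cols], y)
            if acc > best_acc:
                best_acc, best_feat, best_model = acc, f, model
        if history and best_acc <= history[-1]:
            break  # termination: the best new feature does not improve accuracy
        selected.append(best_feat)
        remaining.remove(best_feat)
        history.append(best_acc)
        # Iteration n+1: keep only the current support vectors as the active subset
        active = active[best_model.support_]
    return selected, history


# Usage on synthetic data: only feature 0 carries the class signal
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.where(X[:, 0] > 0, 1, -1)
selected, history = ssfs(X, y)
print(selected, history)
```

A fuller implementation would also re-admit training samples that violate the margin under the new feature subset; this sketch only ever shrinks the active subset.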
Feature selection
• F-score and Supported Sequential Forward Search (F_SSFS)
   – F_SSFS
      •   Uses the F-score measure to pre-select the candidate features (filter part)
      •   Uses the SSFS algorithm to select the final best feature subset (wrapper part)
      •   Reduces the number of features that have to be tested by training the SVM
      •   Reduces the computation time the wrapper method would otherwise spend testing
          non-informative features
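The filter-then-wrapper structure of F_SSFS can be sketched end to end. This sketch does not implement SSFS: as a plainly labelled stand-in for the wrapper it uses scikit-learn's `SequentialFeatureSelector` (ordinary forward SFS with cross-validation, available from scikit-learn 0.24). `hybrid_select`, `k` and `n_final` are hypothetical names, and `n_final` must be smaller than `k`.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC


def f_scores(X, y):
    """F-score per feature for labels in {+1, -1}."""
    pos, neg = X[y == 1], X[y == -1]
    m, mp, mn = X.mean(axis=0), pos.mean(axis=0), neg.mean(axis=0)
    num = (mp - m) ** 2 + (mn - m) ** 2
    den = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.nan_to_num(num / den, nan=0.0)


def hybrid_select(X, y, k, n_final, cv=5):
    """Filter to the top-k F-score features, then run the wrapper search on those k only."""
    top_k = np.argsort(f_scores(X, y))[::-1][:k]  # filter part
    sfs = SequentialFeatureSelector(
        SVC(kernel="rbf", gamma="scale"),
        n_features_to_select=n_final,
        direction="forward",
        scoring="accuracy",
        cv=cv,
    )
    sfs.fit(X[:, top_k], y)  # wrapper part, restricted to the pre-selected columns
    return top_k[sfs.get_support()]  # map back to original column indices


# Usage on synthetic data: features 2 and 5 carry the class signal
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))
y = np.where(X[:, 2] + X[:, 5] > 0, 1, -1)
chosen = hybrid_select(X, y, k=4, n_final=2)
print(sorted(chosen.tolist()))
```

The wrapper only ever sees the k pre-selected columns, which is how the filter part cuts the number of SVM trainings.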




Research design
• Data collection and preprocessing
   –   Prediction target: the direction of change in the daily NASDAQ index
   –   Index futures lead the spot index
   –   30 technical indices are used as the whole feature set:
       20 futures contracts, 9 spot indexes and the 1-day-lagged NASDAQ index
   –   “1” and “-1” denote that the next day's index is higher or lower than today's, respectively
   –   From Nov 8, 2001 to Nov 8, 2007, with 1065 observations per feature
   –   The original data are scaled into the range (0, 1)


                    f1   f2    f3     …    …     f28   f29   f30   label
              1     …     …     …     …     …    …     …     …      1
              2     …     …     …     …     …    …     …     …      -1
              …     …     …     …     …     …    …     …     …      -1
            1065    …     …     …     …     …    …     …     …      1
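The labelling and scaling steps above can be sketched as follows. This is a minimal sketch with hypothetical prices and feature values; treating an unchanged index as "-1" and using min-max scaling are assumptions, since the slides only state the label meaning and the target range. `make_labels` and `minmax_scale` are hypothetical helper names.

```python
import numpy as np


def make_labels(close):
    """+1 if the next day's close is higher than today's, else -1 (ties count as -1: an assumption)."""
    close = np.asarray(close, dtype=float)
    return np.where(close[1:] > close[:-1], 1, -1)


def minmax_scale(X):
    """Scale each column linearly into [0, 1]; constant columns are mapped to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


# Hypothetical example: 5 trading days, 2 features
close = np.array([100.0, 101.5, 101.0, 102.0, 102.0])
features = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0], [5.0, 50.0]])
y = make_labels(close)           # one label per day that has a "next day"
X = minmax_scale(features[:-1])  # drop the last day, which has no next-day label
print(y, X, sep="\n")
```

In a real back-test the minimum and maximum should be taken from the training period only, so that no information from the test period leaks into the scaling.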
Experimental results and analysis
• Experimental result of F_SSFS
   – The threshold K determines how many features are kept after filtering
       • If K equals the total number of original features → the filter part contributes nothing
       • If K equals 1 → the wrapper part is unnecessary
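The cost side of this trade-off can be made concrete with a small calculation. Assuming a plain forward search that runs until all K pre-selected features are added (so step j tries K − j + 1 candidates, each costing one SVM training or one cross-validation), the number of candidate subsets is K(K+1)/2. K = 30 is the full feature set in this study and K = 22 is the value chosen below; SSFS's early termination and reduced active subset would lower the real cost further.

```python
def sfs_evaluations(k):
    """Candidate subsets evaluated by a complete forward search over k features: k + (k-1) + ... + 1."""
    return k * (k + 1) // 2


for k in (30, 22, 10, 1):
    print(f"K = {k:2d}: {sfs_evaluations(k)} candidate subsets")
# K = 30: 465, K = 22: 253, K = 10: 55, K = 1: 1
```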




Experimental results and analysis
• Experimental result of F_SSFS – wrapper part
   – With K = 22, after the wrapper part:
   – 17 features remain, with an average accuracy rate of 81.7%




Experimental results and analysis
• Experimental result of SVM




• Experimental result of BPNN




Experimental results and analysis
• Experimental result of feature selection
   – Key deficiency of neural-network models for stock trend prediction
       • Difficulty in selecting the discriminative features
         and explaining the rationale for the stock trend prediction

   – Relative importance of each feature




Experimental results and analysis
• Conclusion
  – Stock trend prediction
  – Support vector machine with a hybrid feature selection method (F_SSFS)
  – Reduces the high computational cost and the risk of over-fitting
  – Future work: find the optimal values of the SVM parameters for the best prediction performance
  – Future work: study how SVM generalization depends on the training set size, and give a
    guideline for measuring generalization performance




 
IT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information TechnologyIT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information Technology
SHEHABALYAMANI
 
Master Data Management - Enterprise Application Integration
Master Data Management - Enterprise Application IntegrationMaster Data Management - Enterprise Application Integration
Master Data Management - Enterprise Application Integration
Sherif Rasmy
 
Cybersecurity Tools and Technologies - Microsoft Certificate
Cybersecurity Tools and Technologies - Microsoft CertificateCybersecurity Tools and Technologies - Microsoft Certificate
Cybersecurity Tools and Technologies - Microsoft Certificate
VICTOR MAESTRE RAMIREZ
 
Slack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teamsSlack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teams
Nacho Cougil
 
Refactoring meta-rauc-community: Cleaner Code, Better Maintenance, More Machines
Refactoring meta-rauc-community: Cleaner Code, Better Maintenance, More MachinesRefactoring meta-rauc-community: Cleaner Code, Better Maintenance, More Machines
Refactoring meta-rauc-community: Cleaner Code, Better Maintenance, More Machines
Leon Anavi
 
Top Hyper-Casual Game Studio Services
Top  Hyper-Casual  Game  Studio ServicesTop  Hyper-Casual  Game  Studio Services
Top Hyper-Casual Game Studio Services
Nova Carter
 
Secondary Storage for a microcontroller system
Secondary Storage for a microcontroller systemSecondary Storage for a microcontroller system
Secondary Storage for a microcontroller system
fizarcse
 
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptxIn-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
aptyai
 
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Wonjun Hwang
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
ICDCC 2025: Securing Agentic AI - Eryk Budi Pratama.pdf
ICDCC 2025: Securing Agentic AI - Eryk Budi Pratama.pdfICDCC 2025: Securing Agentic AI - Eryk Budi Pratama.pdf
ICDCC 2025: Securing Agentic AI - Eryk Budi Pratama.pdf
Eryk Budi Pratama
 
論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...
論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...
論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...
Toru Tamaki
 
Computer Systems Quiz Presentation in Purple Bold Style (4).pdf
Computer Systems Quiz Presentation in Purple Bold Style (4).pdfComputer Systems Quiz Presentation in Purple Bold Style (4).pdf
Computer Systems Quiz Presentation in Purple Bold Style (4).pdf
fizarcse
 
Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)Design pattern talk by Kaya Weers - 2025 (v2)
Design pattern talk by Kaya Weers - 2025 (v2)
Kaya Weers
 
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Crazy Incentives and How They Kill Security. How Do You Turn the Wheel?
Christian Folini
 
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdfKit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Wonjun Hwang
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
IT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information TechnologyIT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information Technology
SHEHABALYAMANI
 
IT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information TechnologyIT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information Technology
SHEHABALYAMANI
 
Master Data Management - Enterprise Application Integration
Master Data Management - Enterprise Application IntegrationMaster Data Management - Enterprise Application Integration
Master Data Management - Enterprise Application Integration
Sherif Rasmy
 
Cybersecurity Tools and Technologies - Microsoft Certificate
Cybersecurity Tools and Technologies - Microsoft CertificateCybersecurity Tools and Technologies - Microsoft Certificate
Cybersecurity Tools and Technologies - Microsoft Certificate
VICTOR MAESTRE RAMIREZ
 
Slack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teamsSlack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teams
Nacho Cougil
 
Refactoring meta-rauc-community: Cleaner Code, Better Maintenance, More Machines
Refactoring meta-rauc-community: Cleaner Code, Better Maintenance, More MachinesRefactoring meta-rauc-community: Cleaner Code, Better Maintenance, More Machines
Refactoring meta-rauc-community: Cleaner Code, Better Maintenance, More Machines
Leon Anavi
 

Using support vector machine with a hybrid feature selection method to the stock trend prediction

  • 1. Using support vector machine with a hybrid feature selection method to the stock trend prediction
      Ming-Chi Lee, Expert Systems with Applications, 2009
      Presenter: Yu Hsiang Huang
      Date: 2012-05-17
  • 2. Outline
      • Introduction
      • Feature selection
      • Research design
      • Experimental results and analysis
      • Conclusion
  • 3. Introduction
      • Stock market
          – A highly nonlinear dynamic system
      • Applications of AI
          – Expert systems, fuzzy systems, neural networks
          – Back-propagation neural network (BPNN)
              • Better predictive power than the other approaches
              • Requires a large amount of training data to estimate the distribution of input patterns
              • Prone to over-fitting
              • Depends entirely on the researcher's experience and knowledge to preprocess the data and choose settings
                  – Relevant input variables, hidden layer size, learning rate, momentum, etc.
  • 4. Introduction
      • In this paper
          – Support vector machine (SVM)
              • Captures the geometric characteristics of the feature space without deriving network weights from the training data
              • Can find the optimal solution even with a small training set
              • Global optimum vs. local optimum: SVM training reaches the global optimum, whereas BPNN may converge to a local optimum
              • Less prone to over-fitting
              • Classification performance is influenced by the dimensionality, i.e. the number of feature variables
          – Feature selection
              • Addresses the dimensionality reduction problem by determining the subset of available features that is most essential for classification
              • Hybrid feature selection: filter method + wrapper method → F_SSFS
              • F_SSFS: F-score + supported sequential forward search (SSFS)
              • Optimal parameter search
          – Compare the performance of BPNN and SVM
  • 5. SVM-based model with F_SSFS (flowchart)
      – Original feature variables
      – Hybrid feature selection
          • Filter part: feature pruning using F-score → pre-selected features
          • Wrapper part: SSFS algorithm finds the best feature variables
      – Best feature variables → data → SVM: training, testing, and evaluating the classification accuracy
  • 6. Feature selection
      • Filter method
          – No feedback from the classifier
          – Estimates classification performance through indirect assessments
              • Distance: reflects how well the classes are separated from each other
      [Figure: filter method; no feedback from the classifier, performance estimated by distance]
  • 8. Feature selection
      • F-score and Supported Sequential Forward Search (F_SSFS)
          – F-score: original feature variables → calculate F-scores → sort by F-score → select the top-K features
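The filter step above can be sketched in Python. The slides do not give the F-score formula, so this sketch assumes the widely used definition of Chen and Lin: for feature i, the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. Larger values mean a more discriminative feature. The function names `f_scores` and `select_top_k` and the toy data are hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence


def f_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Return the F-score of every feature (column) of X for labels +1/-1.

    Assumed definition: F(i) = ((m_pos - m)^2 + (m_neg - m)^2) / (v_pos + v_neg),
    where m is the mean of feature i over all samples, m_pos / m_neg are its
    means within the positive / negative class, and v_pos / v_neg are the
    unbiased (n - 1) sample variances within each class.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two samples in each class")
    scores = []
    for i in range(len(X[0])):
        col = [row[i] for row in X]
        col_pos = [row[i] for row in pos]
        col_neg = [row[i] for row in neg]
        m = mean(col)
        between = (mean(col_pos) - m) ** 2 + (mean(col_neg) - m) ** 2
        within = variance(col_pos) + variance(col_neg)  # statistics.variance divides by n - 1
        if within == 0:
            # Constant inside both classes: fully separable only if the means differ.
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features (ties keep column order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data: feature 0 separates the classes, feature 1 is mostly noise.
X = [[1.0, 0.5], [2.0, 0.4], [8.0, 0.6], [9.0, 0.5]]
y = [-1, -1, 1, 1]
scores = f_scores(X, y)
print(scores, select_top_k(scores, 1))  # scores[0] is 24.5; the top-1 feature is [0]
```

Because the score is computed per feature, the filter never consults the classifier; that is what makes it cheap compared with the wrapper part.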
  • 9. SVM-based model with F_SSFS (same flowchart as slide 5)
  • 10. Feature selection
      • Wrapper method
          – Classifier-dependent
              • Evaluates the "goodness" of the selected feature subset directly, using feedback from the classifier
              • Should intuitively yield better performance
          – Limited applicability
              • Due to the high computational complexity involved
      [Figure: wrapper method; feedback from the classifier]
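A minimal sketch of a wrapper in Python: plain sequential forward search (SFS) that asks a classifier for the accuracy of every candidate subset and stops when adding a feature no longer helps. To stay dependency-free, a nearest-centroid classifier scored on a held-out split stands in for the SVM. That stand-in classifier, the toy data, and all function names are assumptions for illustration, not the paper's setup.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def nearest_centroid_accuracy(train_X: Sequence[Row], train_y: Sequence[int],
                              test_X: Sequence[Row], test_y: Sequence[int],
                              features: Sequence[int]) -> float:
    """Stand-in classifier (NOT an SVM): held-out accuracy of a
    nearest-centroid rule that only looks at the given feature columns."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        return [sum(r[f] for r in rows) / len(rows) for f in features]

    c_pos, c_neg = centroid(1), centroid(-1)
    correct = 0
    for x, t in zip(test_X, test_y):
        v = [x[f] for f in features]
        d_pos = sum((a - b) ** 2 for a, b in zip(v, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(v, c_neg))
        pred = 1 if d_pos < d_neg else -1
        correct += int(pred == t)
    return correct / len(test_y)


def sequential_forward_search(n_features: int,
                              evaluate: Callable[[List[int]], float]
                              ) -> Tuple[List[int], float]:
    """Greedy wrapper search: each iteration adds the feature whose addition
    gives the highest classifier score; stop when the score stops improving."""
    selected: List[int] = []
    best = float("-inf")
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        # Feedback from the classifier: score every one-feature extension.
        # Ties go to the lower feature index.
        score, feature = max(((evaluate(selected + [f]), f) for f in candidates),
                             key=lambda sf: (sf[0], -sf[1]))
        if score <= best:  # termination: no candidate improves the score
            break
        selected.append(feature)
        best = score
    return selected, best


train_X = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2]]
train_y = [-1, -1, 1, 1]
test_X = [[0.15, 0.2], [0.85, 0.9]]
test_y = [-1, 1]
subset, acc = sequential_forward_search(
    2, lambda s: nearest_centroid_accuracy(train_X, train_y, test_X, test_y, s))
print(subset, acc)  # feature 0 alone already reaches accuracy 1.0
```

Every candidate subset costs one full train-and-evaluate round, which is the computational burden the slide refers to.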
  • 11. Feature selection
      • F-score and Supported Sequential Forward Search (F_SSFS)
          – Supported sequential forward search (SSFS)
              • Plays the role of the wrapper
              • A variation of the sequential forward search (SFS) algorithm, specially tailored to SVM to speed up the feature search
              • Support vectors: training samples other than the support vectors make no contribution to the decision boundary
              • Dynamically maintains an active subset of candidate support vectors
              • Trains the SVM on this reduced subset rather than on the entire training set, which lowers the computational cost
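The active-subset idea can be sketched in Python. A linear SVM is trained by full-batch sub-gradient descent on the regularized hinge loss (a simple stand-in for a proper SVM solver). The active subset is then taken to be the samples whose functional margin y·f(x) is at most 1 + `tol`, because samples comfortably outside the margin cannot be support vectors. This margin rule, `tol`, the step size, and the toy data are assumptions; the exact bookkeeping SSFS uses to maintain its active subset may differ.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X: Sequence[Vector], y: Sequence[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 300) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) by
    full-batch sub-gradient descent. A teaching stand-in, not a QP solver."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, t in zip(X, y):
            if t * (dot(w, x) + b) < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= t * x[j] / n
                grad_b -= t / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def active_subset(X: Sequence[Vector], y: Sequence[int], w: Vector, b: float,
                  tol: float = 0.1) -> List[int]:
    """Indices of samples that may be support vectors (margin <= 1 + tol)."""
    return [i for i, (x, t) in enumerate(zip(X, y)) if t * (dot(w, x) + b) <= 1 + tol]


X = [[-3.0], [-1.0], [1.0], [3.0]]
y = [-1, -1, 1, 1]
w, b = train_linear_svm(X, y)
active = active_subset(X, y, w, b)  # the far points -3 and 3 drop out
# Retrain on the reduced subset only, then check it on the full training set.
w2, b2 = train_linear_svm([X[i] for i in active], [y[i] for i in active])
predictions = [1 if dot(w2, x) + b2 > 0 else -1 for x in X]
print(active, predictions)
```

Retraining on two samples instead of four still yields a rule that classifies the whole training set correctly, which is the saving SSFS exploits at every search step.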
  • 12. Feature selection
      • F-score and Supported Sequential Forward Search (F_SSFS)
          – Supported sequential forward search (SSFS)
      [Figure: training data matrix with samples r1 … rN as rows, features f1 … fk as columns, and a class label (+ or -) for each sample]
  • 13. Feature selection
      • F-score and Supported Sequential Forward Search (F_SSFS)
          – Supported sequential forward search (SSFS)
      [Figure: the SSFS search at iteration 1, at iteration n+1, and at termination]
  • 14. Feature selection
      • F-score and Supported Sequential Forward Search (F_SSFS)
          – F_SSFS
              • Uses the F-score measure to pre-select the most promising features (filter part)
              • Uses the SSFS algorithm to select the final best feature subset (wrapper part)
              • Reduces the number of features that must be tested by training the SVM
              • Reduces the computation time the wrapper method would otherwise spend testing non-informative features
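Putting the two parts together, the filter-then-wrapper control flow can be sketched in Python. The classifier is abstracted as a hypothetical `evaluate(subset)` callable: in the paper this role is played by SVM accuracy inside SSFS, while here a toy gain table stands in for it. A call counter shows how pre-selecting the top-K features shrinks the number of wrapper evaluations. Setting K to the total number of features turns the sketch back into a plain forward search over everything, matching the remark on slide 19 that the filter then contributes nothing. All names and numbers are illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def f_ssfs(scores: Sequence[float], k: int,
           evaluate: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Filter: keep the top-k features by F-score.
    Wrapper: greedy forward search restricted to those k features."""
    preselected = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    selected: List[int] = []
    best = float("-inf")
    remaining = list(preselected)
    while remaining:
        score, feature = max(((evaluate(selected + [f]), f) for f in remaining),
                             key=lambda sf: (sf[0], -sf[1]))
        if score <= best:  # no remaining candidate improves accuracy: stop
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


def make_counting_evaluate(gain: Dict[int, float]):
    """Hypothetical stand-in for 'train the classifier on this subset and
    report accuracy', plus a counter of how often it was called."""
    calls = [0]

    def evaluate(subset: List[int]) -> float:
        calls[0] += 1
        return 0.5 + sum(gain[f] for f in subset)

    return evaluate, calls


# Features 0 and 2 help, feature 1 is useless, feature 3 hurts.
GAIN = {0: 0.3, 1: 0.0, 2: 0.1, 3: -0.05}
filter_scores = [0.9, 0.1, 0.5, 0.05]  # pretend F-scores, one per feature

results = {}
for k in (len(filter_scores), 2):
    evaluate, calls = make_counting_evaluate(GAIN)
    subset, acc = f_ssfs(filter_scores, k, evaluate)
    results[k] = (subset, calls[0])
    print(f"K={k}: subset={subset}, accuracy={acc:.2f}, evaluations={calls[0]}")
```

In this toy run both settings end with features 0 and 2, but K = 2 needs 3 classifier evaluations instead of 9.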
  • 15. Research design
      • Data collection and preprocessing
          – Prediction target: the direction of change in the daily NASDAQ index
          – Index futures lead the spot index
          – 30 technical indices form the full feature set: 20 futures contracts, 9 spot indexes, and the 1-day lagged NASDAQ index
          – "1" and "-1" denote that the next day's index is higher or lower than today's, respectively
          – Period: Nov 8, 2001 to Nov 8, 2007, with 1065 observations per feature
          – The original data are scaled into the range (0,1)
      [Table: 1065 observations (rows) × features f1 … f30 (columns), each with a label of 1 or -1]
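The labelling and scaling steps can be sketched in Python. Two details are assumptions not stated on the slide: a day on which the index is unchanged is labelled -1, and scaling is plain min-max normalisation, which maps each feature onto [0, 1] with the endpoints included. The function names are hypothetical.

```python
from typing import List, Sequence


def direction_labels(closes: Sequence[float]) -> List[int]:
    """label[t] = 1 if the index closes higher on day t+1 than on day t,
    otherwise -1 (an unchanged close counts as -1 here, an assumption).
    The last day has no next-day close, so it gets no label."""
    return [1 if nxt > cur else -1 for cur, nxt in zip(closes, closes[1:])]


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Linearly rescale one feature column: its minimum maps to 0, its maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: avoid dividing by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


print(direction_labels([100.0, 101.5, 101.0, 102.0]))  # [1, -1, 1]
print(min_max_scale([2.0, 4.0, 6.0]))                  # [0.0, 0.5, 1.0]
```

For time-series data, computing `lo` and `hi` on the training period only and reusing them on the test period keeps future information out of the features.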
  • 18. SVM-based model with F_SSFS (same flowchart as slide 5, with the filter output labeled as the pre-selected K features)
  • 19. Experimental results and analysis
      • Experimental result of F_SSFS
          – The threshold K determines how many features are kept after filtering
              • If K equals the number of original features → the filter part does not contribute at all
              • If K equals 1 → the wrapper method is unnecessary
  • 21. Experimental results and analysis
      • Experimental result of F_SSFS: wrapper part
          – With K = 22, 17 features remain after the wrapper part, with an average accuracy rate of 81.7%
  • 23. Experimental results and analysis
      • Experimental result of SVM
      • Experimental result of BPNN
  • 24. Experimental results and analysis
      • Experimental result of feature selection
          – Key deficiency of neural-network models for stock trend prediction
              • Difficulty in selecting discriminative features and in explaining the rationale behind the stock trend prediction
          – Relative importance of each feature
  • 25. Experimental results and analysis
      • Conclusion
          – Stock trend prediction
          – Support vector machine with a hybrid feature selection method (F_SSFS)
          – Reduces the high computational cost and the risk of over-fitting
          – Further work is needed to determine the optimal SVM parameter values for the best prediction performance
          – Further work is needed on how SVM generalization depends on an appropriate training-set size, and on a guideline for measuring generalization performance