Symbolic Regression

Symbolic regression is a machine learning technique that aims to identify the underlying mathematical expression that best describes a relationship between variables in a dataset. It uses genetic programming and evolutionary algorithms to discover non-linear relationships without imposing a predetermined model. The algorithms generate a population of possible solutions and evolve them over multiple generations, using processes like crossover and mutation, to find the expression that best fits the data. An example shows symbolic regression closely approximating the function x^4 + x^3 + x^2 + x from sample data points without being told the target function beforehand.
Symbolic Regression

Subhashree Rautray

Group: 3540901/91701
School, Institute: Peter the Great St. Petersburg Polytechnic University
Introduction

• Symbolic regression is a machine learning technique that aims to identify the underlying mathematical expression that best describes a relationship between variables in a dataset.

• The term "symbolic regression" (SR) refers to the process of fitting measured data with a suitable mathematical formula, such as x^2 + C.

• The goal is to generate a function that describes a given set of data points, that is, to discover hidden, non-linear relationships in the data.

Evolutionary Algorithms

• Symbolic regression is based on so-called evolutionary algorithms.

• With Genetic Programming, regression is performed without a predefined model: the model is discovered from the data during the analysis. By comparison, classic regression techniques are closer to optimization, since they only try to find the best coefficients for variables that are predetermined.

Evolutionary Algorithms

• This class of algorithms is based on the Darwinian theory of evolution. One of its main attributes is that it does not compute a single solution, but a whole set of possible solutions at once.

• This set of possible and acceptable solutions is called the "population". Members of the population are called "individuals"; mathematically speaking, each one represents a possible solution, i.e. a solution that can be realised in a real-world application.
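
The idea of a population can be sketched in a few lines of Python. This is only an illustration: the four candidate formulas, the assumed target y = x^2 + x and the helper name total_abs_error are hypothetical and not taken from the slides. The point is that several individuals are kept and scored at once, instead of a single solution being computed.

```python
# A hypothetical population: each individual is one candidate formula.
# Real GP would generate these randomly; here they are written by hand.
population = {
    "x": lambda x: x,
    "x*x": lambda x: x * x,
    "x*x + x": lambda x: x * x + x,
    "x*x*x + x": lambda x: x ** 3 + x,
}

def total_abs_error(f, cases):
    """Sum of absolute errors of candidate f over all (x, y) data points."""
    return sum(abs(f(x) - y) for x, y in cases)

# Data points from an assumed target y = x^2 + x (illustration only).
cases = [(i / 2, (i / 2) ** 2 + i / 2) for i in range(-4, 5)]

# Rank all individuals at once, best (lowest error) first.
ranked = sorted(population, key=lambda name: total_abs_error(population[name], cases))
print(ranked[0])
```

In an evolutionary algorithm, the better-ranked individuals would then be more likely to be selected as parents for the next generation.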

Conventional vs Symbolic Regression

• Regression analysis is a process for modeling the relationship between variables. There are many types of regression, such as linear regression, logistic regression, quantile regression, ridge regression, and more. Each type conforms to a specific model, e.g. linear vs. logistic. The biggest part of using regression analysis is knowing which model to use.

• While conventional regression techniques seek to optimize the parameters for a pre-specified model structure, symbolic regression avoids imposing prior assumptions, and instead infers the model from the data. In other words, it attempts to discover both model structures and model parameters.
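
For contrast, here is a minimal sketch of the conventional case, assuming a simple linear model y = a*x + b fitted by ordinary least squares. The function name fit_line and the sample data are illustrative assumptions. The model structure is fixed in advance and only the two coefficients are estimated; symbolic regression would have to find the structure as well.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the fixed model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution: slope = cov(x, y) / var(x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

If the data had actually come from a quadratic, this fit would still return the best straight line, because the choice of model was made before looking at the data.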

How Does Symbolic Regression Work?

• Using Genetic Programming, we first determine which mathematical functions we want to use (e.g. +, -, *, /, sin(), cos(), ln()).

• Then we determine the terminals, which are our variables and constants (e.g. x, e, PI, {1..5}). Functions and terminals together serve as the building blocks for constructing the GP parse trees.

• The Genetic Programming algorithm then searches for the function that best fits the data, using an error measure such as root mean square error as part of the fitness function.
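
The building blocks above can be sketched as follows. This is a hedged illustration, not the implementation used for the slides: the nested-tuple tree representation, the names random_tree, evaluate, rmse and protected_div, and the 0.3 early-stop probability are all assumptions. ln is left out of the function set here because it would need extra protection against non-positive arguments.

```python
import math
import operator
import random

def protected_div(a, b):
    """Division that returns 1.0 instead of failing when b is (almost) zero."""
    return a / b if abs(b) > 1e-9 else 1.0

# Function set: name -> (callable, number of arguments).
FUNCTIONS = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
    "/": (protected_div, 2),
    "sin": (math.sin, 1),
    "cos": (math.cos, 1),
}
# Terminal set: the variable x plus constants e, PI and 1..5.
TERMINALS = ["x", math.e, math.pi, 1.0, 2.0, 3.0, 4.0, 5.0]

def random_tree(depth, rng):
    """Build a random parse tree as nested tuples, e.g. ("+", "x", 2.0)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    arity = FUNCTIONS[name][1]
    return (name,) + tuple(random_tree(depth - 1, rng) for _ in range(arity))

def evaluate(tree, x):
    """Recursively evaluate a parse tree at the point x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    name, *args = tree
    func = FUNCTIONS[name][0]
    return func(*(evaluate(arg, x) for arg in args))

def rmse(tree, cases):
    """Root mean square error of a tree over (x, y) fitness cases."""
    return math.sqrt(sum((evaluate(tree, x) - y) ** 2 for x, y in cases) / len(cases))

tree = random_tree(3, random.Random(0))
print(tree, evaluate(tree, 0.5))
```

A lower RMSE means a fitter individual, so a GP run would use a score like this to decide which trees survive and reproduce.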

Example

• A simple example of symbolic regression

• The target function: x^4 + x^3 + x^2 + x

• Parameters:

• Function Set: +, -, *, /, sin, cos, ln
• Terminal Set: X
• Generations: 150
• Population Size: 500
• Crossover rate: 90%
• Mutation rate: 10%
• Fitness cases: 20 data points from the target function, evenly sampled from the interval [-1, 1].
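
A minimal sketch of how this setup could be written down in Python. The GPConfig class and the fitness_cases helper are hypothetical names, and the sketch assumes that "evenly sampled" means equally spaced points that include both endpoints of the interval, which the slides do not state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GPConfig:
    """Run parameters from the example (field names are illustrative)."""
    generations: int = 150
    population_size: int = 500
    crossover_rate: float = 0.90
    mutation_rate: float = 0.10
    n_cases: int = 20
    lo: float = -1.0
    hi: float = 1.0

def target(x):
    """The function the GP run is supposed to rediscover."""
    return x**4 + x**3 + x**2 + x

def fitness_cases(cfg):
    """Equally spaced (x, y) pairs on [lo, hi]; endpoints included by assumption."""
    step = (cfg.hi - cfg.lo) / (cfg.n_cases - 1)
    xs = [cfg.lo + i * step for i in range(cfg.n_cases)]
    return [(x, target(x)) for x in xs]

cfg = GPConfig()
cases = fitness_cases(cfg)
print(len(cases), cases[0])
```

Only these (x, y) pairs are handed to the algorithm; the target function itself stays hidden from it.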

Example

• Log output of the first generation:

• This simplifies to (X^3)+X
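
To get a feel for how far (X^3)+X still is from the target, its error can be computed directly. This sketch assumes 20 equally spaced fitness cases on [-1, 1] with both endpoints included; the names target and first_gen_best are illustrative. The residual is exactly the two missing terms, x^4 + x^2.

```python
import math

def target(x):
    return x**4 + x**3 + x**2 + x

def first_gen_best(x):
    # Simplified form of the first-generation individual from the slide.
    return x**3 + x

# 20 equally spaced points on [-1, 1] (endpoints included by assumption).
xs = [-1 + 2 * i / 19 for i in range(20)]

# What the individual still misses at each point: x^4 + x^2.
residuals = [target(x) - first_gen_best(x) for x in xs]
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
print(rmse)
```

Later generations have to recover the x^4 and x^2 terms to bring this error down.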

Example

• At around generation 39, we get our best-of-run individual, which hits all 20 fitness cases with a very small overall error.

• This is not a perfect solution, but it is pretty close: the algorithm determined on its own a function that approximately matches the data.

Example 2

• Facebook metrics data set

• The data set, called Facebook metrics, comes from the UCI Machine Learning Repository.
• It was created from the Facebook page of an undisclosed cosmetics brand.

• The target, Total Interactions, is the sum of all likes, shares and comments a given post received after it was published.

• To keep things simple, only the default binary operations are enabled: add, sub, mul, div. The fittest solution after 20 generations is the following.

Output : Tree Structure
