Symbolic Regression

This is a tutorial on symbolic regression with RGP via the formula interface. A basic understanding of RGP is assumed; see Getting Started for a quick introduction to RGP fundamentals.

Step 1: Preparing Data

Assuming you have a dataset you want to run symbolic regression on, the first thing you need is an R data frame. For this example, we use a simple, artificially generated data frame.

  x1 <- seq(0, 4*pi, length.out=201)
  x2 <- seq(0, 4*pi, length.out=201)
  y <- sin(x1) + cos(2*x2)

To define the data frame, call the data.frame() function:

  data1 <- data.frame(y=y, x1=x1, x2=x2)
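
As a quick sanity check before the run, the data frame can be inspected with base R. The sketch below rebuilds the example data so that it runs on its own; it uses only base R functions and nothing from RGP.

```r
# Rebuild the example data from this step
x1 <- seq(0, 4*pi, length.out=201)
x2 <- seq(0, 4*pi, length.out=201)
y <- sin(x1) + cos(2*x2)
data1 <- data.frame(y=y, x1=x1, x2=x2)

str(data1)       # column names and types
head(data1, 3)   # first three rows
summary(data1$y) # range of the target variable

# Symbolic regression needs complete numeric columns, so check for both
stopifnot(all(sapply(data1, is.numeric)), !anyNA(data1))
```

If the final check fails on your own data, clean or convert the offending columns before continuing.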

Step 2: Defining Function Sets

You can use one of the predefined function sets:
arithmeticFunctionSet (+, -, *, /),
expLogFunctionSet (sqrt, exp, ln),
trigonometricFunctionSet (sin, cos, tan),
mathFunctionSet (all of the above),
or define your own set.

To define your own set, call the functionSet() function with the names of the functions to include:

  newFuncSet <- functionSet("+","-","*","sin")

Step 3: Performing the Symbolic Regression Run

To run the regression, call the symbolicRegression() function:

  result1 <- symbolicRegression(y ~ x1 + x2,
                                data=data1, functionSet=mathFunctionSet,
                                stopCondition=makeStepsStopCondition(2000))

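The R formula y ~ x1 + x2 sets y as the target variable and x1 and x2 as the variables it may depend on. The stop condition makeStepsStopCondition(2000) ends the run after 2000 evolution steps.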
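
To see which variables a formula names, it can be taken apart with base R. This is a minimal sketch of how R itself parses the formula; how RGP processes it internally is not shown here, and the names f, response and inputs are only illustrative.

```r
f <- y ~ x1 + x2

# All variable names in the formula, left-hand side first
all.vars(f)

# f[[2]] is the left-hand side, f[[3]] the right-hand side
response <- all.vars(f[[2]])
inputs <- all.vars(f[[3]])

response
inputs
```

Every column named in the formula must exist in the data frame passed as data.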

Step 4: Obtaining Results and Plots

To plot the result against the original data, use:

  plot(data1$y, col=1, type="l")                         # original data in black
  lines(predict(result1, newdata = data1), col=2)        # model prediction in red
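
A plot gives a visual impression of the fit; a single error number can complement it. The sketch below defines a hypothetical helper, rmseSketch, in base R and applies it to a stand-in prediction that captures only the sin(x1) term, so that it runs without RGP. With a real run, the stand-in vector would be replaced by the output of predict(result1, newdata = data1).

```r
# Rebuild the example data so the sketch is self-contained
x1 <- seq(0, 4*pi, length.out=201)
x2 <- seq(0, 4*pi, length.out=201)
data1 <- data.frame(y=sin(x1) + cos(2*x2), x1=x1, x2=x2)

# Hypothetical helper (not part of RGP): root mean squared error
rmseSketch <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Stand-in prediction: a candidate model that misses the cos(2*x2) term
standIn <- sin(data1$x1)

errPartial <- rmseSketch(data1$y, standIn)   # error of the partial model
errPerfect <- rmseSketch(data1$y, data1$y)   # a perfect model has zero error
errPartial
errPerfect
```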

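The best and worst individuals of the final population can be extracted with these commands. Lower fitness values are better, so which.min selects the best individual:

  fitness <- sapply(result1$population, result1$fitnessFunction)  # evaluate each individual once
  bf <- result1$population[[which.min(fitness)]]                  # best individual
  wf <- result1$population[[which.max(fitness)]]                  # worst individual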
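
The selection idiom used here (apply a fitness function to every individual, then index with which.min or which.max) can be tried without RGP. The following is a toy sketch in base R; the population, the target and the fitness function are hypothetical stand-ins and not RGP objects.

```r
# Hypothetical stand-in population: three candidate functions of x
population <- list(
  function(x) x,
  function(x) sin(x),
  function(x) rep(0, length(x))
)

x <- seq(0, 4*pi, length.out=201)
target <- sin(x)

# Hypothetical fitness function: mean absolute error, lower is better
fitnessFunction <- function(ind) mean(abs(ind(x) - target))

# One fitness value per individual
fitness <- sapply(population, fitnessFunction)

best <- population[[which.min(fitness)]]    # smallest error
worst <- population[[which.max(fitness)]]   # largest error
```

Computing the fitness vector once and reusing it avoids evaluating every individual twice.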

Next Steps

See the other tutorials to learn about other application domains of RGP.