Exporting `gplearn` Output as a Readable Expression
`gplearn` is a Python library for genetic programming, best known for symbolic regression. The best program of a fitted model prints as a nested prefix-notation string such as `add(X0, mul(X1, X2))`, which becomes hard to read and interpret as programs grow. This article outlines ways to convert `gplearn` output into more readable forms: a SymPy expression, a custom infix string, and a diagram of the expression tree.
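1. Using `sympy.sympify`
SymPy, a symbolic mathematics library, can build an expression from a string with `sympify`. A fitted `gplearn` program prints in prefix form, for example `add(x1, mul(x2, x3))`, so we pass `sympify` a `locals` mapping that translates each `gplearn` function name into the matching SymPy operation. (The parser in `sympy.parsing.mathematica` is not a good fit here, because it expects Mathematica syntax such as `Sin[x]`.)

    from gplearn.genetic import SymbolicRegressor
    from sympy import sympify

    # Define your features (one name per column of X_train)
    features = ['x1', 'x2', 'x3']

    # Create and fit a SymbolicRegressor; X_train and y_train are your own data
    est = SymbolicRegressor(population_size=1000, generations=20,
                            feature_names=features)
    est.fit(X_train, y_train)

    # Get the best individual
    best_program = est._program

    # Map gplearn's function names to SymPy-compatible operations
    converter = {
        'add': lambda x, y: x + y,
        'sub': lambda x, y: x - y,
        'mul': lambda x, y: x * y,
        'div': lambda x, y: x / y,
        'neg': lambda x: -x,
    }

    # Convert the best program to a SymPy expression
    sympy_expression = sympify(str(best_program), locals=converter)
    print(sympy_expression)

This code prints the best program found by `gplearn` as a SymPy expression, making it easier to analyze and manipulate. Extend the mapping if your `function_set` contains other functions. Note that `gplearn` uses protected division (it returns 1 when the denominator is close to zero), so the SymPy expression can differ from the fitted program near such points.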
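As a self-contained sketch of the kind of manipulation this enables, the snippet below builds a SymPy expression from a hard-coded program string and then expands it and exports LaTeX. The string is a hypothetical example written in the prefix form `gplearn` prints, and the `converter` mapping covers only the four arithmetic functions plus `neg`; both are assumptions made for illustration, not part of any fitted model.

```python
import sympy

# Hypothetical program string in gplearn's printed prefix form (an assumption)
program_str = "sub(add(-0.999, X1), mul(sub(X1, X0), add(X0, X1)))"

# Translate gplearn-style function names into ordinary arithmetic
converter = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
    "div": lambda x, y: x / y,
    "neg": lambda x: -x,
}

# Parse the string; unknown names such as X0 and X1 become SymPy symbols
expr = sympy.sympify(program_str, locals=converter)

# Multiply out the products to get a flat polynomial
expanded = sympy.expand(expr)
print(expanded)

# Export the same expression as LaTeX source, e.g. for a report
latex_str = sympy.latex(expanded)
print(latex_str)
```

From here the usual SymPy tools apply, for example `sympy.lambdify` to turn the expression back into a numeric function.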
2. Custom Function for Converting to String Representation
To have greater control over the format of the output, a custom function can traverse the `gplearn` program and generate a string representation. Internally, `gplearn` stores a program as a flat list, `best_program.program`, in prefix order: function objects (with `name` and `arity` attributes), integers for feature indices, and floats for constants. Because this is an internal representation, it may change between versions.

    # Infix symbols for gplearn's built-in binary arithmetic functions
    INFIX = {'add': '+', 'sub': '-', 'mul': '*', 'div': '/'}

    def to_string(nodes, feature_names=None):
        """Convert a prefix-ordered gplearn node list into an infix string."""
        nodes = iter(nodes)

        def build():
            node = next(nodes)
            if isinstance(node, int):    # variable: index of a feature column
                return feature_names[node] if feature_names else f'X{node}'
            if isinstance(node, float):  # numeric constant
                return f'{node:.3f}'
            # Otherwise a gplearn function object: build its arguments first
            args = [build() for _ in range(node.arity)]
            if node.name in INFIX:
                return f'({args[0]} {INFIX[node.name]} {args[1]})'
            return f"{node.name}({', '.join(args)})"

        return build()

    string_representation = to_string(best_program.program, features)
    print(string_representation)

This custom function recursively consumes the prefix list, replacing each function node with its corresponding mathematical operator and each terminal with a variable name or constant. The code prints the best program in a human-readable infix format.
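If you would rather work only from the printed form of a program, the same idea can be sketched as a small recursive parser over the string itself. The function name `prefix_to_infix`, the token pattern, and the sample string are all assumptions made for this illustration; the pattern only handles simple identifiers and decimal constants.

```python
import re

# Infix symbols for the four binary arithmetic functions
INFIX = {"add": "+", "sub": "-", "mul": "*", "div": "/"}

# Tokens: identifiers, (possibly negative) numbers, parentheses and commas
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|-?\d+\.?\d*(?:[eE][-+]?\d+)?|[(),])")

def prefix_to_infix(text):
    """Rewrite a prefix string such as add(X0, X1) in infix form."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        # A name followed by "(" is a function call; anything else is a leaf
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1  # consume "("
            args = [parse()]
            while tokens[pos] == ",":
                pos += 1  # consume ","
                args.append(parse())
            pos += 1  # consume ")"
            if tok in INFIX and len(args) == 2:
                return f"({args[0]} {INFIX[tok]} {args[1]})"
            return f"{tok}({', '.join(args)})"
        return tok

    return parse()

result = prefix_to_infix("sub(add(-0.999, X1), mul(sub(X1, X0), sin(X0)))")
print(result)  # prints: ((-0.999 + X1) - ((X1 - X0) * sin(X0)))
```

Every binary operation is parenthesized, which is verbose but avoids having to reason about operator precedence.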
3. Visualizing the Expression Tree
In addition to textual representations, you can visualize the expression tree to gain a better understanding of its structure. `gplearn` programs provide an `export_graphviz()` method that returns the tree in Graphviz’s DOT language, and the `graphviz` Python package can render it.

    import graphviz

    # Export the best program's tree as a DOT-language string
    dot_data = best_program.export_graphviz()

    # Wrap the DOT source, render it to a file, and open the result in a viewer
    graph = graphviz.Source(dot_data)
    graph.render('best_program_tree.gv', view=True)

This code writes a Graphviz DOT file describing the expression tree of the best program and renders it; rendering requires the Graphviz binaries to be installed. The resulting diagram gives an intuitive view of the program’s structure and the relationships between its nodes.
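To make the DOT format itself less opaque, here is a minimal hand-rolled sketch that builds DOT source from a prefix-ordered list of `(label, arity)` pairs. The function `prefix_to_dot` and the sample node list are hypothetical and written only for illustration; the styling of the output that `gplearn` itself produces will differ.

```python
def prefix_to_dot(nodes):
    """Build DOT source from a prefix-ordered list of (label, arity) pairs."""
    lines = ["digraph program {"]
    counter = 0

    def visit(it):
        nonlocal counter
        label, arity = next(it)
        node_id = counter
        counter += 1
        # Declare this node, then recurse into each of its children
        lines.append(f'  n{node_id} [label="{label}"];')
        for _ in range(arity):
            child_id = visit(it)
            lines.append(f"  n{node_id} -> n{child_id};")
        return node_id

    visit(iter(nodes))
    lines.append("}")
    return "\n".join(lines)

# Hypothetical program add(X0, mul(X1, 2.0)) written in prefix order
nodes = [("add", 2), ("X0", 0), ("mul", 2), ("X1", 0), ("2.0", 0)]
dot_source = prefix_to_dot(nodes)
print(dot_source)
```

The resulting string can be passed to `graphviz.Source(dot_source)` for rendering, or saved to a `.gv` file and processed with the `dot` command-line tool.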
Summary
By utilizing methods like SymPy’s parsing capabilities, custom functions, or visualization tools, we can effectively transform the output of `gplearn` into readable and analyzable forms. This allows for better understanding, debugging, and further development of the generated programs.