Nemeth, Zsolt and Faulkner Rainford, Penn and Porter, Barry (2025) Revisiting the Fitness Landscape of Genetic Improvement for Source Code: A Phenotypic Speciation Approach. ACM Transactions on Evolutionary Learning and Optimization. ISSN 2688-3007
Accepted Version
Available under License Creative Commons Attribution.
Abstract
Emergent software systems are composed of elementary building blocks, many of which have variants that are better or worse in different deployment contexts. Genetic Improvement (GI) for source code has been proposed for creating and curating collections of such blocks, but combining new code synthesis with genetic mutation and crossover results in large, complex search spaces. A range of methods has been proposed to aid such a search. Among them, the notion of species appeared in the context of Genetic Algorithms (GAs) to identify individuals with similar genotypes, with the aim of controlling competition, encouraging the exploration of distant local optima, maintaining diversity, and avoiding premature convergence. In this paper we examine a species definition for GI for source code, a domain with specific attributes: genotype similarity is largely irrelevant; distance between individuals is otherwise undefined; and the fitness landscape is extremely rugged. To support higher levels of explainability, and the ability to find novelty in the search space, we propose a phenotypic species definition that captures an algorithm's functional phenotypic characteristics while excluding its non-functional phenotypic characteristics (and its particular representation in source code). We introduce our proposal in a scenario that applies GI to a hash table, where species are characterised by divergence in probability distributions.
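The idea of characterising species by divergence in probability distributions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's method: the abstract does not say which distributions or which divergence measure are used. Here the "functional phenotype" of a candidate hash function is taken to be the empirical distribution of keys over buckets, the divergence is Jensen-Shannon, and the hypothetical helpers `bucket_distribution`, `js_divergence` and `assign_species` group candidates greedily under an arbitrary threshold.

```python
import math
from collections import Counter

def bucket_distribution(hash_fn, keys, n_buckets):
    """Empirical probability of a key landing in each bucket (assumed phenotype)."""
    counts = Counter(hash_fn(k) % n_buckets for k in keys)
    total = len(keys)
    return [counts.get(b, 0) / total for b in range(n_buckets)]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical distributions, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def assign_species(distributions, threshold):
    """Greedy grouping: join the first species whose representative is close enough."""
    representatives = []
    labels = []
    for d in distributions:
        for i, rep in enumerate(representatives):
            if js_divergence(d, rep) <= threshold:
                labels.append(i)
                break
        else:
            # No existing species is close enough, so this candidate founds a new one.
            representatives.append(d)
            labels.append(len(representatives) - 1)
    return labels

# Three hypothetical candidates: two differ in source code but spread keys
# identically; the third sends every key to bucket 0.
keys = range(80)
candidates = [lambda k: k, lambda k: k + 3, lambda k: 0]
dists = [bucket_distribution(f, keys, 8) for f in candidates]
labels = assign_species(dists, threshold=0.1)
print(labels)  # [0, 0, 1]
```

In this toy setting the first two candidates fall into one species despite their different source representations, while the degenerate constant hash forms its own. The threshold of 0.1 and the greedy assignment are arbitrary choices for the sketch.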