Summary
A model of strategy formulation is used to study how an adaptive attacker learns to overcome a moving target cyber defense. The attacker-defender interaction is modeled as a game in which the defender deploys a temporal platform migration defense. Against this defense, a population of attackers develops strategies specifying the temporal ordering of resource investments that bring targeted zero-day exploits into existence. Attacker responses to two defender temporal platform migration scheduling policies are examined. Under the first policy, the defender selects the active platform in each match uniformly at random from a pool of available platforms. Under the second policy, the defender schedules each successive platform to maximize the diversity of the source code presented to the attacker. Adaptive attacker response strategies are modeled as finite state machines (FSMs) that evolve during simulated play against defender strategies via an evolutionary algorithm. It is demonstrated that the attacker learns to invest heavily in exploit creation for the platform least similar to the other platforms when faced with the diversity defense, while avoiding investment in exploits for this platform when facing the randomization defense. Additionally, it is demonstrated that the diversity-maximizing defense is superior in shorter attacker-defender engagements but performs sub-optimally in extended ones.
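The two defender scheduling policies can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the platform names, the pairwise similarity scores, and the greedy rule that reads "maximize diversity" as "pick the platform least similar to the one currently active". Under these assumptions, the least similar platform (here `D`) appears in about half of the matches of a diversity schedule, against roughly a quarter under uniform randomization, which is consistent with an attacker concentrating exploit investment on that platform when facing the diversity defense.

```python
import random

# Hypothetical platform pool and pairwise code-similarity scores
# (0 = no shared code, 1 = identical). Values are illustrative only.
PLATFORMS = ["A", "B", "C", "D"]
SIMILARITY = {
    ("A", "B"): 0.80,
    ("A", "C"): 0.70,
    ("B", "C"): 0.75,
    ("A", "D"): 0.10,
    ("B", "D"): 0.15,
    ("C", "D"): 0.05,
}


def similarity(p, q):
    """Symmetric lookup of the assumed similarity between two platforms."""
    if p == q:
        return 1.0
    return SIMILARITY[(p, q)] if (p, q) in SIMILARITY else SIMILARITY[(q, p)]


def random_schedule(n_matches, rng):
    """Randomization policy: each match's platform is drawn uniformly."""
    return [rng.choice(PLATFORMS) for _ in range(n_matches)]


def diversity_schedule(n_matches, rng):
    """Assumed diversity policy: start at a random platform, then greedily
    migrate to the platform least similar to the currently active one."""
    if n_matches <= 0:
        return []
    schedule = [rng.choice(PLATFORMS)]
    while len(schedule) < n_matches:
        current = schedule[-1]
        schedule.append(min(PLATFORMS, key=lambda p: similarity(current, p)))
    return schedule


def platform_frequency(schedule):
    """Fraction of matches in which each platform is active."""
    return {p: schedule.count(p) / len(schedule) for p in PLATFORMS}


if __name__ == "__main__":
    rng = random.Random(0)
    print("random:   ", platform_frequency(random_schedule(1000, rng)))
    print("diversity:", platform_frequency(diversity_schedule(1000, rng)))
```

The greedy rule is only one possible reading of a diversity-maximizing schedule; a policy that scores diversity over a longer window of past platforms would yield different platform frequencies.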