Summary
A Competing Cognitive Radio Network (CCRN) coalesces communicator (comm) nodes and jammers to achieve maximal networking efficiency in the presence of adversarial threats. We have previously developed two contrasting approaches for CCRN based on the multi-armed bandit (MAB) and Q-learning frameworks. Despite their differences, both approaches have been shown to achieve optimal throughput performance. This paper addresses a harder class of problems in which channel rewards are time-varying, so that learning based on stochastic assumptions cannot guarantee optimal performance. This new problem is important because an intelligent adversary will likely introduce dynamic changepoints, which can render our previous approaches ineffective. We propose a new, faster learning algorithm based on online convex programming that is computationally simpler and stateless. According to our empirical results, the new algorithm can almost instantly find an optimal strategy that achieves the best steady-state channel rewards.
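The summary does not spell out the algorithm itself. As a rough, hedged illustration of the online-convex-programming style it invokes, the sketch below runs online gradient ascent with Euclidean projection onto the probability simplex (Zinkevich, 2003) to adapt a channel-selection distribution under time-varying rewards. The function names (`project_to_simplex`, `ocp_channel_selection`), the step size `eta`, and the full-information reward feedback are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (sort-based algorithm of Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                 # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def ocp_channel_selection(reward_fn, n_channels, n_rounds, eta=0.1):
    """Stateless online gradient ascent over a channel-selection
    distribution: each round, sample a channel, observe the reward
    vector, take a gradient step, and project back onto the simplex."""
    w = np.full(n_channels, 1.0 / n_channels)  # start uniform
    for t in range(n_rounds):
        channel = np.random.choice(n_channels, p=w)  # play a channel
        rewards = reward_fn(t)            # per-channel rewards at round t
        w = project_to_simplex(w + eta * rewards)    # ascent step + projection
    return w
```

Since the only quantity carried between rounds is the weight vector `w`, the update is stateless in the sense the summary suggests: no value table or arm statistics are maintained, and each round costs only a gradient step plus an O(n log n) projection.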