Cognitive radar networks were proposed by Simon Haykin in 2006 to address problems with large legacy radar implementations, primarily single-point vulnerabilities and lack of adaptability. This work proposes to leverage the adaptability of cognitive radar networks to trade between active radar observation, which uses high power and risks interception, and passive signal parameter estimation, which uses target emissions to gain side information and lower the power needed to track multiple targets accurately. The goal of the network is to learn, over many target tracks, both the characteristics of the targets and the optimal action choices for each type of target. To select between the available actions, we use a multi-armed bandit model, with the current class information serving as prior information. When the active radar action is selected, the node estimates the physical behavior of targets through its radar emissions. When the passive action is selected, the node estimates the radio behavior of targets through passive sensing. Over many target tracks, the network collects the observed behavior of targets and forms clusters of similarly behaved targets. In this way, the network meta-learns the target class distributions while learning the optimal mode selection for each target class.
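
To make the mode-selection step concrete, the following is a minimal sketch of a bandit that chooses between the active and passive actions while keeping separate statistics for each target class. It is an illustration under stated assumptions, not the algorithm used in this work: the UCB1 index, the class labels, the reward values, and the names `ClassConditionedUCB` and `run_tracks` are all hypothetical, and the class label is assumed to be supplied by a separate clustering step.

```python
import math
from collections import defaultdict

# The two sensing modes described in the text.
ACTIONS = ("active", "passive")


class ClassConditionedUCB:
    """UCB1 bandit with separate arm statistics per target class.

    Assumption: UCB1 stands in for whatever bandit policy is actually used;
    the target class acts as context that selects which statistics apply.
    """

    def __init__(self, actions=ACTIONS, exploration=2.0):
        self.actions = tuple(actions)
        self.exploration = exploration
        # Per-class pull counts and running mean rewards for each action.
        self.counts = defaultdict(lambda: {a: 0 for a in self.actions})
        self.means = defaultdict(lambda: {a: 0.0 for a in self.actions})

    def select(self, target_class):
        counts = self.counts[target_class]
        # Try every action once before relying on the confidence bound.
        for action in self.actions:
            if counts[action] == 0:
                return action
        total = sum(counts.values())
        means = self.means[target_class]

        def score(action):
            bonus = math.sqrt(self.exploration * math.log(total) / counts[action])
            return means[action] + bonus

        return max(self.actions, key=score)

    def update(self, target_class, action, reward):
        # Incremental update of the running mean reward.
        self.counts[target_class][action] += 1
        n = self.counts[target_class][action]
        mean = self.means[target_class][action]
        self.means[target_class][action] = mean + (reward - mean) / n


def run_tracks(bandit, reward_table, rounds):
    """Simulate `rounds` tracks per class with fixed (hypothetical) rewards."""
    for _ in range(rounds):
        for target_class, rewards in reward_table.items():
            action = bandit.select(target_class)
            bandit.update(target_class, action, rewards[action])
    return bandit
```

In this sketch, a class whose emissions are informative would be given a higher hypothetical reward for the passive action, and the per-class counts then concentrate on that action as tracks accumulate. Keying the statistics by class is one simple way to let class information act as a prior for a newly observed target.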