Abstract:We introduce a group of related methods for binary classification tasks using probes of the hidden state activations in large language models (LLMs). Performance is on par with the largest and most advanced LLMs currently available, but requiring orders of magnitude fewer computational resources and not requiring labeled data. This approach involves translating class labels into a semantically rich description, spontaneous symmetry breaking of multilayer perceptron probes for unsupervised learning and inference, training probes to generate confidence scores (prior probabilities) from hidden state activations subject to known constraints via entropy maximization, and selecting the most confident probe model from an ensemble for prediction. These techniques are evaluated on four datasets using five base LLMs.
Abstract:A new approach to data compression is developed and applied to multimedia content. This method separates messages into components suitable for both lossless coding and 'lossy' or statistical coding techniques, compressing complex objects by separately encoding signals and noise. This is demonstrated by compressing the most significant bits of data exactly, since they are typically redundant and compressible, and either fitting a maximally likely noise function to the residual bits or compressing them using lossy methods. Upon decompression, the significant bits are decoded and added to a noise function, whether sampled from a noise model or decompressed from a lossy code. This results in compressed data similar to the original. For many test images, a two-part image code using JPEG2000 for lossy coding and PAQ8l for lossless coding produces less mean-squared error than an equal length of JPEG2000. Computer-generated images typically compress better using this method than through direct lossy coding, as do many black and white photographs and most color photographs at sufficiently high quality levels. Examples applying the method to audio and video coding are also demonstrated. Since two-part codes are efficient for both periodic and chaotic data, concatenations of roughly similar objects may be encoded efficiently, which leads to improved inference. Applications to artificial intelligence are demonstrated, showing that signals using an economical lossless code have a critical level of redundancy which leads to better description-based inference than signals which encode either insufficient data or too much detail.
Abstract:The theoretical limits of 'lossy' data compression algorithms are considered. The complexity of an object as seen by a macroscopic observer is the size of the perceptual code which discards all information that can be lost without altering the perception of the specified observer. The complexity of this macroscopically observed state is the simplest description of any microstate comprising that macrostate. Inference and pattern recognition based on macrostate rather than microstate complexities will take advantage of the complexity of the macroscopic observer to ignore irrelevant noise.