Abstract: We present and experimentally evaluate transfer learning as a way to address experimental data scarcity when training neural network (NN) models for Mach-Zehnder interferometer mesh-based optical matrix multipliers. Our approach pre-trains the model on synthetic data generated from a less accurate analytical model and then fine-tunes it with experimental data. This method significantly reduces modeling errors compared with using an analytical model alone, or a standalone NN model when training data is limited. Using regularization techniques and ensemble averaging, we achieve < 1 dB root-mean-square error on the matrix weights implemented by a photonic chip while using only 25% of the available data.
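The pre-train-then-fine-tune idea described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a linear model stands in for the NN, the "analytical model" is a hypothetical mis-scaled copy of the true response, the data are synthetic and noise-free, and fine-tuning is done with a ridge-style penalty that pulls the weights toward the pre-trained ones rather than with the training procedure actually used in the work.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                                   # number of model inputs (toy value)
w_true = rng.normal(size=d)              # stands in for the real chip response
w_analytic = 0.8 * w_true                # hypothetical mis-calibrated analytical model

# Step 1: pre-train on plentiful synthetic data from the analytical model.
X_syn = rng.normal(size=(2000, d))
y_syn = X_syn @ w_analytic
w_pre, *_ = np.linalg.lstsq(X_syn, y_syn, rcond=None)

# Step 2: fine-tune on a handful of "experimental" samples (fewer than d).
X_exp = rng.normal(size=(5, d))
y_exp = X_exp @ w_true

def fit_with_prior(X, y, w0, lam=1e-2):
    """Minimise ||X w - y||^2 + lam * ||w - w0||^2 in closed form."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return w0 + np.linalg.solve(A, X.T @ (y - X @ w0))

w_finetuned = fit_with_prior(X_exp, y_exp, w_pre)        # start from pre-trained weights
w_scratch = fit_with_prior(X_exp, y_exp, np.zeros(d))    # same data, no pre-training

def param_error(w):
    """Euclidean distance between fitted and true weights."""
    return float(np.linalg.norm(w - w_true))

err_analytic = param_error(w_pre)
err_finetuned = param_error(w_finetuned)
err_scratch = param_error(w_scratch)
print(err_analytic, err_finetuned, err_scratch)
```

In this toy setting the fine-tuned weights end up closer to the true ones than either the analytical starting point or the model fitted from scratch on the same five samples, which mirrors the qualitative claim of the abstract without reproducing its experiment.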
Abstract: We demonstrate transfer learning-assisted neural network models for optical matrix multipliers with scarce measurement data. Our approach uses <10% of the experimental data needed for best performance and outperforms analytical models for a Mach-Zehnder interferometer mesh.
Abstract: Photonic integrated circuits are facilitating the development of optical neural networks, which have the potential to be both faster and more energy efficient than their electronic counterparts, since optical signals are especially well suited to implementing matrix multiplications. However, accurately programming photonic chips for optical matrix multiplication remains challenging. Here, we describe both simple analytical models and data-driven models for offline training of optical matrix multipliers. We train and evaluate the models using experimental data obtained from a fabricated chip featuring a Mach-Zehnder interferometer mesh implementing 3-by-3 matrix multiplication. The neural network-based models outperform the simple physics-based models in terms of prediction error. Furthermore, the neural network models are also able to predict the spectral variations in the matrix weights for up to 100 frequency channels covering the C-band. Using neural network models to program the chip for optical matrix multiplication yields increased performance on multiple machine learning tasks.
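For context on what a simple analytical model of a Mach-Zehnder interferometer (MZI) can look like, the sketch below builds the textbook 2-by-2 transfer matrix of a single ideal MZI. It is an idealization and not necessarily the model used in the work: the couplers are assumed lossless and exactly 50:50, the phase-shifter placement is one common convention, and the names `mzi_transfer`, `theta`, and `phi` are hypothetical.

```python
import numpy as np

# Ideal 50:50 directional coupler (assumed lossless).
COUPLER = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)

def phase_shifter(x):
    """Phase shift x (radians) applied to the upper arm only."""
    return np.diag([np.exp(1j * x), 1.0])

def mzi_transfer(theta, phi):
    """Transfer matrix of one ideal MZI.

    phi   : external phase shifter at the input
    theta : internal phase shifter between the two couplers
    """
    return COUPLER @ phase_shifter(theta) @ COUPLER @ phase_shifter(phi)

theta, phi = 0.7, 1.3                    # arbitrary example phases
T = mzi_transfer(theta, phi)

# Power transmitted from input port 0 to output ports 0 and 1.
bar_power = abs(T[0, 0]) ** 2
cross_power = abs(T[1, 0]) ** 2
print(bar_power, cross_power)
```

Under these assumptions the matrix is unitary and the power in the bar port follows sin²(θ/2), so the internal phase sets the splitting ratio. Effects such as thermal crosstalk between phase shifters are absent from this idealization, which is the kind of gap a data-driven model can absorb.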
Abstract: We experimentally compare simple physics-based vs. data-driven neural-network-based models for offline training of programmable photonic chips using Mach-Zehnder interferometer meshes. The neural-network model outperforms physics-based models for a chip with thermal crosstalk, yielding increased testing accuracy.