Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on the DeepSeek LLM Base models, resulting in the creation of the DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.
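As a rough illustration of the preference-tuning step mentioned in the abstract, the sketch below shows the standard DPO objective in PyTorch. This is not DeepSeek's training code; the function name `dpo_loss`, the summed log-probability inputs, and the default `beta` value are assumptions made for illustration only.

```python
# Minimal sketch of the DPO objective (Rafailov et al., 2023), assuming we already
# have per-response summed log-probabilities under the policy and a frozen reference
# model. Hypothetical helper, not DeepSeek's actual implementation.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratios of policy to reference for the preferred and dispreferred responses.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Encourage a positive margin between the two log-ratios, scaled by beta.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```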
Abstract: Imaging through scattering media is encountered in many disciplines, ranging from biology and mesoscopic physics to astronomy. However, it remains a major challenge because light undergoes multiple scattering in such media and can be completely decorrelated. Here, we propose a deep-learning-based method that can retrieve the image of a target behind a thick scattering medium. The method uses a trained deep neural network to fit the mapping from objects on one side of a thick scattering medium to the corresponding speckle patterns observed on the other side. For demonstration, we retrieve the images of a set of objects hidden behind a 3 mm-thick white polystyrene slab whose optical depth is 13.4 times the scattering mean free path. Our work opens up a new way to tackle this longstanding challenge using deep learning.
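The abstract does not specify the network architecture, so the sketch below is only a hedged illustration of the general idea: a small convolutional encoder-decoder trained on paired (speckle, object) data to recover the object image from a measured speckle pattern. The class name `SpeckleToImageNet`, the layer sizes, and the loss are assumptions, not the authors' actual model.

```python
# Hypothetical illustration of learning a speckle -> object mapping with a generic
# convolutional encoder-decoder; not the architecture used in the paper.
import torch
import torch.nn as nn

class SpeckleToImageNet(nn.Module):
    """Maps a single-channel speckle pattern to a reconstructed object image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # downsample the speckle pattern
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(  # upsample back to the object image
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, speckle: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(speckle))

# Training reduces to supervised regression on paired data, e.g.:
#   loss = nn.MSELoss()(model(speckle_batch), object_batch)
```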