LightSeq is a high-performance inference library for sequence processing and generation, implemented in CUDA. To the best of our knowledge, it is the first open-source inference library that fully supports highly efficient computation of modern NLP models such as BERT, GPT, and Transformer. The library is efficient, functional, and convenient. A usage demo is available at https://github.com/bytedance/lightseq/blob/master/example.

Title: LightSeq: A High Performance Inference Library for Sequence Processing and Generation

Paper and Code