Abstract:The root raised-cosine pulse commonly used in linear digital modulations yields exactly two intersymbol interference components from the preceding and the subsequent data symbols, provided that the roll-off factor is $100\%$ and the modulation packing factor is set to $50\%$. This can be exploited to symmetrically multiplex two data streams of transmitted symbols. Hence, the proposed scheme is referred to as pulse-shape binary multiplex modulation. The demodulation of the two multiplexed data streams at the receiver can be aided by making the streams mutually orthogonal. It can be achieved by superposition modulation with symbol-by-symbol interference cancellation, proper design of transmission sequences interleaving pilot and data symbols in order to also enable channel estimation, and using orthogonal spreading sequences. The presented numerical results indicate that the proposed modulation scheme can outperform Nyquist signaling in terms of transmission reliability or the time required for transmitting the whole sequence of data symbols. For instance, differentially encoded modulation symbols can be transmitted twice as fast by the proposed modulation scheme with a 3 dB penalty in signal-to-noise ratio over additive white Gaussian noise channels.
Abstract:Biomedical research is intensive in processing information in the previously published papers. This motivated a lot of efforts to provide tools for text mining and information extraction from PDF documents over the past decade. The *nix (Unix/Linux) operating systems offer many tools for working with text files, however, very few such tools are available for processing the contents of PDF files. This paper reports our effort to develop shell script utilities for *nix systems with the core functionality focused on viewing and searching multiple PDF documents combining logical and regular expressions, and enabling more reliable text extraction from PDF documents with subsequent manipulation of the resulting blocks of text. Furthermore, a procedure for extracting the most frequently occurring multi-word phrases was devised and then demonstrated on several scientific papers in life sciences. Our experiments revealed that the procedure is surprisingly robust to deficiencies in text extraction and the actual scoring function used to rank the phrases in terms of their importance or relevance. The keyword relevance is strongly context dependent, the word stemming did not provide any recognizable advantage, and the stop-words should only be removed from the beginning and the end of phrases. In addition, the developed utilities were used to convert the list of acronyms and the index from a PDF e-book into a large list of biochemical terms which can be exploited in other text mining tasks. All shell scripts and data files are available in a public repository named \pp\ on the Github. The key lesson learned in this work is that semi-automated methods combining the power of algorithms with the capabilities of research experience are the most promising for improving the research efficiency.
Abstract:Estimating hidden processes from non-linear noisy observations is particularly difficult when the parameters of these processes are not known. This paper adopts a machine learning approach to devise variational Bayesian inference for such scenarios. In particular, a random process generated by the autoregressive moving average (ARMA) linear model is inferred from non-linearity noise observations. The posterior distribution of hidden states are approximated by a set of weighted particles generated by the sequential Monte carlo (SMC) algorithm involving sampling with importance sampling resampling (SISR). Numerical efficiency and estimation accuracy of the proposed inference method are evaluated by computer simulations. Furthermore, the proposed inference method is demonstrated on a practical problem of estimating the missing values in the gene expression time series assuming vector autoregressive (VAR) data model.