Abstract:An experienced human Observer reading a document -- such as a crime report -- creates a succinct plot-like $\textit{``Working Memory''}$ comprising different actors, their prototypical roles and states at any point, their evolution over time based on their interactions, and even a map of missing Semantic parts anticipating them in the future. $\textit{An equivalent AI Observer currently does not exist}$. We introduce the $\textbf{[G]}$enerative $\textbf{[S]}$emantic $\textbf{[W]}$orkspace (GSW) -- comprising an $\textit{``Operator''}$ and a $\textit{``Reconciler''}$ -- that leverages advancements in LLMs to create a generative-style Semantic framework, as opposed to a traditionally predefined set of lexicon labels. Given a text segment $C_n$ that describes an ongoing situation, the $\textit{Operator}$ instantiates actor-centric Semantic maps (termed ``Workspace instance'' $\mathcal{W}_n$). The $\textit{Reconciler}$ resolves differences between $\mathcal{W}_n$ and a ``Working memory'' $\mathcal{M}_n^*$ to generate the updated $\mathcal{M}_{n+1}^*$. GSW outperforms well-known baselines on several tasks ($\sim 94\%$ vs. FST, GLEN, BertSRL - multi-sentence Semantics extraction, $\sim 15\%$ vs. NLI-BERT, $\sim 35\%$ vs. QA). By mirroring the real Observer, GSW provides the first step towards Spatial Computing assistants capable of understanding individual intentions and predicting future behavior.
Abstract:Despite growing interest in object detection, very few works address the extremely practical problem of cross-domain robustness especially for automative applications. In order to prevent drops in performance due to domain shift, we introduce an unsupervised domain adaptation method built on the foundation of faster-RCNN with two domain adaptation components addressing the shift at the instance and image levels respectively and apply a consistency regularization between them. We also introduce a family of adaptation layers that leverage the squeeze excitation mechanism called SE Adaptors to improve domain attention and thus improves performance without any prior requirement of knowledge of the new target domain. Finally, we incorporate a center loss in the instance and image level representations to improve the intra-class variance. We report all results with Cityscapes as our source domain and Foggy Cityscapes as the target domain outperforming previous baselines.