Diogo Almeida

Training language models to follow instructions with human feedback

Mar 04, 2022

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray (+10 more)

Abstract:Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
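
The abstract does not say how the rankings become a training signal. One common choice, assumed here purely for illustration, is to split each ranking into (preferred, rejected) pairs and score a reward model with a pairwise logistic loss. The function names and the toy scores are hypothetical, not taken from the paper's code.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn a best-first ranking into (preferred, rejected) pairs.

    combinations() preserves input order, so the first element of each
    pair is always ranked above the second.
    """
    return list(combinations(ranked_outputs, 2))


def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """Compute -log(sigmoid(r_preferred - r_rejected)) in a stable way.

    This equals softplus(-(r_preferred - r_rejected)). The loss is small
    when the preferred output gets the higher reward.
    """
    diff = reward_preferred - reward_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Toy example: a labeler ranked three outputs, best first.
ranking = ["output_a", "output_b", "output_c"]
# Hypothetical scalar rewards from a reward model (assumed values).
rewards = {"output_a": 2.0, "output_b": 0.5, "output_c": -1.0}

pairs = ranking_to_pairs(ranking)
mean_loss = sum(
    pairwise_ranking_loss(rewards[better], rewards[worse])
    for better, worse in pairs
) / len(pairs)
```

A ranking of K outputs yields K(K-1)/2 pairs, so a single labeler judgment supplies several comparisons to the reward model.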

A Generalizable Approach to Learning Optimizers

Jun 07, 2021

Diogo Almeida, Clemens Winter, Jie Tang, Wojciech Zaremba

Abstract:A core issue with learning to optimize neural networks has been the lack of generalization to real world problems. To address this, we describe a system designed from a generalization-first perspective, learning to update optimizer hyperparameters instead of model parameters directly using novel features, actions, and a reward function. This system outperforms Adam at all neural network tasks including on modalities not seen during training. We achieve 2x speedups on ImageNet, and a 2.5x speedup on a language modeling task using over 5 orders of magnitude more compute than the training tasks.

A Methodology for Approaching the Integration of Complex Robotics Systems Illustrated through a Bi-manual Manipulation Case-Study

Mar 18, 2021

Pavlos Triantafyllou, Rafael Afonso Rodrigues, Sirapoab Chaikunsaeng, Diogo Almeida, Graham Deacon, Jelizaveta Konstantinova, Giuseppe Cotugno

Abstract:The multidisciplinarity of robotics creates a need for robust integration methodologies that can facilitate the adoption of state-of-the-art research components in an industrial application. Unfortunately, there are no clear, community accepted guidelines or standards that define the integration of such components in a single robotic system. In this paper, we propose a methodology that assesses the software components of a candidate system on the basis of the effort required to integrate them and the impact their integration will have on a target system. We demonstrate how this methodology can be applied using an industrial tool packing system as an example. The system integrates a wide range of both in-house and third-party research outputs and software components. We prove the effectiveness of our approach by evaluating system performance with an experimental benchmark that assesses the robustness, reliability and operational speed of the system for the given packing task. We also demonstrate how our methodology can be used to predict the amount of integration time required for a component. The proposed integration methodology can be applied to any robotic system to facilitate its transition from the research to an industrial environment.

Asymmetric Dual-Arm Task Execution using an Extended Relative Jacobian

May 17, 2019

Diogo Almeida, Yiannis Karayiannidis

Abstract:Coordinated dual-arm manipulation tasks can be broadly characterized as possessing absolute and relative motion components. Relative motion tasks, in particular, are inherently redundant in the way they can be distributed between end-effectors. In this work, we analyse cooperative manipulation in terms of the asymmetric resolution of relative motion tasks. We discuss how existing approaches enable the asymmetric execution of a relative motion task, and show how an asymmetric relative motion space can be defined. We leverage this result to propose an extended relative Jacobian to model the cooperative system, which allows a user to set a concrete degree of asymmetry in the task execution. This is achieved without the need for prescribing an absolute motion target. Instead, the absolute motion remains available as a functional redundancy to the system. We illustrate the properties of our proposed Jacobian through numerical simulations of a novel differential Inverse Kinematics algorithm.

* Submitted to ISRR19. 16 Pages

A Lyapunov-Based Approach to Exploit Asymmetries in Robotic Dual-Arm Task Resolution

May 03, 2019

Diogo Almeida, Yiannis Karayiannidis

Abstract:Dual-arm manipulation tasks can be prescribed to a robotic system in terms of desired absolute and relative motion of the robot's end-effectors. These can represent, e.g., jointly carrying a rigid object or performing an assembly task. When both types of motion are to be executed concurrently, the symmetric distribution of the relative motion between arms prevents task conflicts. Conversely, an asymmetric solution to the relative motion task will result in conflicts with the absolute task. In this work, we address the problem of designing a control law for the absolute motion task together with updating the distribution of the relative task among arms. Through a set of numerical results, we contrast our approach with the classical symmetric distribution of the relative motion task to illustrate the advantages of our method.

* Submitted to CDC 2019

Towards Blended Reactive Planning and Acting using Behavior Trees

Sep 14, 2018

Michele Colledanchise, Diogo Almeida, Petter Ögren

Abstract:In this paper, we show how a planning algorithm can be used to automatically create and update a Behavior Tree (BT), controlling a robot in a dynamic environment. The planning part of the algorithm is based on the idea of back chaining. Starting from a goal condition, we iteratively select actions to achieve that goal, and if those actions have unmet preconditions, they are extended with actions to achieve them in the same way. The fact that BTs are inherently modular and reactive makes the proposed solution blend acting and planning in a way that enables the robot to efficiently react to external disturbances. If an external agent undoes an action, the robot re-executes it without replanning, and if an external agent helps the robot, it skips the corresponding actions, again without replanning. We illustrate our approach in two different robotics scenarios.
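
The back-chaining idea in this abstract can be sketched in a few lines. The version below is a simplification under stated assumptions: it expands the whole tree up front rather than iteratively, every action succeeds instantly (there is no Running status), and the `Action` class, the tuple-based tree encoding, and the pick-up scenario are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: tuple
    effects: tuple


def expand(goal, actions, depth=0, max_depth=10):
    """Back-chain from a goal condition.

    Returns Fallback(goal?, Sequence(<precondition subtrees>, action)),
    or just the bare condition if no action achieves the goal.
    """
    condition = ("cond", goal)
    if depth >= max_depth:
        return condition
    for action in actions:
        if goal in action.effects:
            subtrees = [
                expand(p, actions, depth + 1, max_depth)
                for p in action.preconditions
            ]
            sequence = ("sequence", *subtrees, ("act", action.name))
            return ("fallback", condition, sequence)
    return condition


def tick(node, state, actions_by_name, log):
    """Evaluate the tree once against the current world state."""
    kind = node[0]
    if kind == "cond":
        return node[1] in state
    if kind == "act":
        action = actions_by_name[node[1]]
        state.update(action.effects)  # assumed: actions always succeed
        log.append(action.name)
        return True
    children = (tick(c, state, actions_by_name, log) for c in node[1:])
    # Fallback stops at the first success, Sequence at the first failure.
    return any(children) if kind == "fallback" else all(children)


actions = [
    Action("move_to_object", (), ("near_object",)),
    Action("pick", ("near_object",), ("holding",)),
]
by_name = {a.name: a for a in actions}
tree = expand("holding", actions)

log_from_scratch = []
tick(tree, set(), by_name, log_from_scratch)

# An external agent already brought the robot to the object:
log_with_help = []
tick(tree, {"near_object"}, by_name, log_with_help)
```

Because every subtree is guarded by its own condition check, the same tree skips `move_to_object` when `near_object` already holds, and would run it again if that condition were undone, without any replanning step.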

Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units

Jul 19, 2016

Wenling Shang, Kihyuk Sohn, Diogo Almeida, Honglak Lee

Abstract:Recently, convolutional neural networks (CNNs) have been used as a powerful tool to solve many problems of machine learning and computer vision. In this paper, we aim to provide insight on the property of convolutional neural networks, as well as a generic method to improve the performance of many CNN architectures. Specifically, we first examine existing CNN models and observe an intriguing property that the filters in the lower layers form pairs (i.e., filters with opposite phase). Inspired by our observation, we propose a novel, simple yet effective activation scheme called concatenated ReLU (CRelu) and theoretically analyze its reconstruction property in CNNs. We integrate CRelu into several state-of-the-art CNN architectures and demonstrate improvement in their recognition performance on CIFAR-10/100 and ImageNet datasets with fewer trainable parameters. Our results suggest that better understanding of the properties of CNNs can lead to significant performance improvement with a simple modification.

* ICML 2016
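
As a minimal sketch of the concatenated ReLU described in the abstract above: the activation keeps both the positive part and the negated negative part of its input, so the output has twice as many entries as the input. The code works on a flat Python list for clarity. In a CNN the concatenation would normally run along the channel dimension, which is an assumption about usage rather than something the abstract states.

```python
def relu(x):
    """Standard rectified linear unit for a single float."""
    return max(0.0, x)


def crelu(values):
    """Concatenated ReLU: [relu(v) for each v] followed by [relu(-v) for each v]."""
    return [relu(v) for v in values] + [relu(-v) for v in values]


def reconstruct(activated):
    """Recover the original input: v = relu(v) - relu(-v)."""
    half = len(activated) // 2
    return [pos - neg for pos, neg in zip(activated[:half], activated[half:])]


pre_activation = [1.5, -2.0, 0.0]
activated = crelu(pre_activation)
recovered = reconstruct(activated)
```

Unlike a plain ReLU, which discards negative responses, this form loses no information about the input: subtracting the second half from the first recovers it exactly.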

Genetic Architect: Discovering Genomic Structure with Learned Neural Architectures

May 23, 2016

Laura Deming, Sasha Targ, Nate Sauder, Diogo Almeida, Chun Jimmie Ye

Abstract:Each human genome is a 3 billion base pair set of encoding instructions. Decoding the genome using deep learning fundamentally differs from most tasks, as we do not know the full structure of the data and therefore cannot design architectures to suit it. As such, architectures that fit the structure of genomics should be learned not prescribed. Here, we develop a novel search algorithm, applicable across domains, that discovers an optimal architecture which simultaneously learns general genomic patterns and identifies the most important sequence motifs in predicting functional genomic outcomes. The architectures we find using this algorithm succeed at using only RNA expression data to predict gene regulatory structure, learn human-interpretable visualizations of key sequence motifs, and surpass state-of-the-art results on benchmark genomics challenges.

* 10 pages, 4 figures

Folding Assembly by Means of Dual-Arm Robotic Manipulation

Apr 22, 2016

Diogo Almeida, Yiannis Karayiannidis

Abstract:In this paper, we consider folding assembly as an assembly primitive suitable for dual-arm robotic assembly that can be integrated into a higher-level assembly strategy. The system composed of two pieces in contact is modelled as an articulated object, connected by a prismatic-revolute joint. Different grasping scenarios were considered in order to model the system, and a simple controller based on feedback linearisation is proposed, using force-torque measurements to compute the contact point kinematics. The folding assembly controller has been experimentally tested with two sample parts, in order to showcase folding assembly as a viable assembly primitive.

* 7 pages, accepted for ICRA 2016

Resnet in Resnet: Generalizing Residual Architectures

Mar 25, 2016

Sasha Targ, Diogo Almeida, Kevin Lyman

Abstract:Residual networks (ResNets) have recently achieved state-of-the-art on challenging computer vision tasks. We introduce Resnet in Resnet (RiR): a deep dual-stream architecture that generalizes ResNets and standard CNNs and is easily implemented with no computational overhead. RiR consistently improves performance over ResNets, outperforms architectures with similar amounts of augmentation on CIFAR-10, and establishes a new state-of-the-art on CIFAR-100.
