Internship: Learning to grasp with visual guidance


The internship will take place at Inria Grenoble and will be supervised by Cordelia Schmid, Inria research director and head of the Thoth team, and Alexander Pashevich, a PhD student in the Thoth team. Grenoble lies in the French Alps and offers ideal conditions for skiing, hiking, climbing, etc.


Grasping is the problem of finding the best candidate grasping configuration for an object with an under-actuated robotic system and in the presence of semantic constraints. A wide range of techniques has been proposed in the literature, spanning from common-sense heuristics to recent advances in the field of deep learning [1]. Classical approaches usually rely on knowledge of the exact object geometry and on optimal control theory; these methods are specific to a given object [2]. More recently, approaches to grasping have built on advances in machine learning and, especially, deep learning [3]. Such methods differ from analytical ones in how grasp candidates are sampled from the infinite candidate grasping space and in how they are evaluated. However, most of these methods still rely either on careful hand-engineering [4] or on the collection of large and expensive databases [5,6].
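The sample-and-evaluate pattern mentioned above can be illustrated with a minimal sketch. Everything here is hypothetical: the grasp parameterisation, the candidate count, and the quality function (a toy stand-in for a trained network) are invented for illustration only.

```python
import random
import math

def sample_grasp_candidates(n):
    """Sample n candidate grasps as (x, y, gripper_angle) tuples.
    The parameterisation is a hypothetical planar one."""
    return [(random.uniform(-0.1, 0.1),   # lateral offset from object centre (m)
             random.uniform(-0.1, 0.1),   # depth offset (m)
             random.uniform(0, math.pi))  # gripper rotation (rad)
            for _ in range(n)]

def grasp_quality(grasp):
    """Toy stand-in for a learned grasp-quality model:
    it simply favours grasps centred on the object."""
    x, y, _ = grasp
    return math.exp(-(x ** 2 + y ** 2) / 0.001)

def select_best_grasp(n_candidates=64):
    """Data-driven grasp selection: sample a finite set of candidates
    from the infinite configuration space and keep the highest-scoring one."""
    candidates = sample_grasp_candidates(n_candidates)
    return max(candidates, key=grasp_quality)

best = select_best_grasp()
print(best)
```

In a real system the quality function would be a model trained on labelled or self-collected grasp outcomes, which is precisely where the large databases cited above come in.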

Figure: Example of a robot (KUKA iiwa arm and WSG-50 gripper) performing a tower building task in a simulated environment.

Reinforcement learning provides a framework for sequential decision making in which an agent is trained by trial and error. Once successfully applied to grasping, reinforcement learning approaches can therefore avoid the drawbacks of the existing methods. The recent progress of deep reinforcement learning [7] and its ability to tackle challenging continuous-control problems without extensive hand-engineering [8] make this approach especially appealing for robotics. Despite several open problems related to high-dimensional continuous action spaces, partial observability of the environment, and the sample efficiency required of the learning method, recent reinforcement learning approaches have shown significant progress on a number of robotic problems [9, 10].


An initial approach, developed in the Thoth team, showed the effectiveness of applying a state-of-the-art deep reinforcement learning algorithm to the problems of grasping, stacking and tower building (see figure). This approach has so far been developed in a simulation environment only and relies on features extracted from the simulation. The goal of this internship is to extend it as follows: (a) integrate visual information into the simulation environment to guide the system; (b) implement and evaluate the approach on a real robot; and (c) extend the grasping to a large set of objects.

Skills and profile

We are looking for a strongly motivated master's student (preferably in Computer Science or Applied Mathematics) with an interest in robotics, computer vision, reinforcement learning and deep learning. The project requires a strong background in applied mathematics and excellent programming skills (Python). Prior courses or knowledge in computer vision, robotics, reinforcement learning, deep learning or machine learning are a plus. A successful project can lead to a PhD in the Thoth team at Inria Grenoble.


Please send a CV, letter of motivation, the name of two referees and transcripts of grades by e-mail to and .


[1] Jeannette Bohg, Antonio Morales, Tamim Asfour, and Danica Kragic. “Data-Driven Grasp Synthesis - A Survey”. In: IEEE Transactions on Robotics (2014).
[2] A. Sahbani, S. El-Khoury, and P. Bidaud. “An overview of 3D object grasp synthesis algorithms”. In: Robotics and Autonomous Systems (2012).
[3] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks”. In: NIPS (2012).
[4] Miao Li, Kaiyu Hang, Danica Kragic, and Aude Billard. “Dexterous grasping under shape uncertainty”. In: Robotics and Autonomous Systems (2016).
[5] Ian Lenz, Honglak Lee, and Ashutosh Saxena. “Deep Learning for Detecting Robotic Grasps”. In: ICLR (2013).
[6] Ekaterina Nikandrova and Ville Kyrki. “Category-based task specific grasping”. In: Robotics and Autonomous Systems (2015).
[7] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. “Human-level control through deep reinforcement learning”. In: Nature (2015).
[8] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. “Proximal Policy Optimization Algorithms”. In: CoRR (2017).
[9] Shixiang Gu, Ethan Holly, Timothy Lillicrap, and Sergey Levine. “Deep Reinforcement Learning for Robotic Manipulation”. In: ICML (2016).
[10] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. “End-to-End Training of Deep Visuomotor Policies”. In: The Journal of Machine Learning Research (2015).