Daniel DeTone's Academic Website

SuperGlue: Learning Feature Matching with Graph Neural Networks

This paper introduces SuperGlue, a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. Assignments are estimated by solving a differentiable optimal transport problem, whose costs are predicted by a graph neural network. We introduce a flexible context aggregation mechanism based on attention, enabling SuperGlue to reason about the underlying 3D scene and feature assignments jointly.
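To make the optimal transport step more concrete, here is a minimal sketch of a Sinkhorn normalization over a score matrix that has been extended with a "dustbin" row and column for non-matchable points. It is not the authors' implementation: the function names, the fixed dustbin score, the choice of marginals, and the iteration count are all assumptions made for illustration, and the scores are hard-coded here rather than predicted by a graph neural network.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sinkhorn_with_dustbin(scores, dustbin_score=0.0, iters=100):
    """Turn an M x N score matrix into an (M+1) x (N+1) soft assignment.

    The extra row and column act as dustbins that absorb unmatched points.
    Assumed marginals: every real point carries mass 1, the dustbin row
    carries mass N and the dustbin column carries mass M.
    """
    m, n = len(scores), len(scores[0])
    # Append the dustbin column to every row, then the dustbin row.
    z = [list(row) + [dustbin_score] for row in scores]
    z.append([dustbin_score] * (n + 1))
    log_a = [0.0] * m + [math.log(n)]  # target row sums, in log space
    log_b = [0.0] * n + [math.log(m)]  # target column sums, in log space
    u = [0.0] * (m + 1)
    v = [0.0] * (n + 1)
    for _ in range(iters):
        # Alternate row and column normalization in the log domain.
        u = [log_a[i] - logsumexp([z[i][j] + v[j] for j in range(n + 1)])
             for i in range(m + 1)]
        v = [log_b[j] - logsumexp([z[i][j] + u[i] for i in range(m + 1)])
             for j in range(n + 1)]
    return [[math.exp(z[i][j] + u[i] + v[j]) for j in range(n + 1)]
            for i in range(m + 1)]


# Two points in image A, three in image B; the third B point has no partner.
assignment = sinkhorn_with_dustbin([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
```

Every step is built from exponentials, logarithms, and sums, which is why a normalization of this kind can sit inside a network and be trained end to end.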

Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

CVPR 2020 (Oral)

Paper
Code

Deep ChArUco: Dark ChArUco Marker Pose Estimation

We present a real-time pose estimation system which combines two custom deep networks, ChArUcoNet and RefineNet, with the Perspective-n-Point algorithm to estimate the marker's 6DoF pose. ChArUcoNet is a convolutional neural network which jointly outputs ID-specific classifiers and 2D point locations. The 2D point locations are further refined into subpixel coordinates using RefineNet. We evaluate Deep ChArUco in challenging scenarios and demonstrate that our approach is superior to a traditional OpenCV-based method.

Danying Hu, Daniel DeTone, Tomasz Malisiewicz

CVPR 2019

Self-Improving Visual Odometry

We propose a self-supervised learning framework that uses unlabeled monocular video sequences to generate large-scale supervision for training a Visual Odometry (VO) frontend. Our proposed frontend consists of a single multi-task CNN which outputs 2D keypoint locations, keypoint descriptors, and a novel point stability score. When trained using VO at scale on 2.5 million images, the stability classifier automatically discovers a ranking for keypoints that are not likely to help in VO, such as T-junctions across depth discontinuities, features on shadows and highlights, and dynamic objects like people.

Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

arXiv 2018

Paper

SuperPoint: Self-Supervised Interest Point Detection and Description

This work presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. Our model, when trained on the MS-COCO image dataset, is able to repeatedly detect a rich set of interest points and stably track them over time.

Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

CVPR 2018 Deep Learning for Visual SLAM Workshop

Toward Geometric Deep SLAM

We present a point tracking system powered by two CNNs. The first network, MagicPoint, operates on single images and extracts salient 2D points. Because transformation estimation is simpler when the detected points are geometrically stable, we designed a second network, MagicWarp, which operates on pairs of point images and estimates the homography that relates the inputs.

Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

arXiv 2017

Paper
Press

Deep Image Homography Estimation

We present a deep convolutional neural network called HomographyNet for estimating the relative homography between a pair of images. We use a 4-point homography parameterization which maps the four corners from one image into the second image. The network is trained end-to-end using warped MS-COCO images, allowing the use of large-scale training without time-consuming data collection. HomographyNet does not require separate local feature detection and transformation estimation stages and outperforms a traditional homography estimator based on ORB.
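As an illustration of the 4-point parameterization, the sketch below converts four corner displacements into the equivalent 3x3 homography by solving the standard 8x8 linear system. It is a plain-Python sketch, not the HomographyNet code: the helper names, the 128-pixel square, and the offset values are assumptions chosen for the example.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def four_point_to_homography(corners, offsets):
    """corners: four (x, y) points in image one.
    offsets: four (dx, dy) displacements that move them into image two.
    Returns the 3x3 homography (bottom-right entry fixed to 1) as nested lists.
    """
    a, b = [], []
    for (x, y), (dx, dy) in zip(corners, offsets):
        u, v = x + dx, y + dy
        # Two linear equations per correspondence, eight unknowns in total.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map a single (x, y) point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical 128-pixel square and made-up corner offsets.
corners = [(0, 0), (127, 0), (127, 127), (0, 127)]
offsets = [(5, -3), (-8, 4), (10, 7), (-2, -6)]
H = four_point_to_homography(corners, offsets)
```

The eight offset values and the eight free entries of the matrix carry the same information, so either form can be recovered from the other as long as no three corners are collinear.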

Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

RSS 2016 Workshop: Limits and Potentials of Deep Learning in Robotics

3D Spatial Convnets for Semantic Segmentation

By training a 3D spatial convnet to recognize 127,915 CAD models in 662 different categories, we can develop a rich feature hierarchy for performing 3D semantic segmentation.

Daniel DeTone, Matthew Johnson-Roberson

Winter 2015

ModelNet Webpage

Structure Sensor SDK

We built an SDK for developers to use with the Structure Sensor that includes sample code for 3D object capture, 3D room mapping, and augmented reality gaming.

Occipital

Summer 2014

Simultaneous Environment Discovery & Annotation

SEDA is a project for enhancing human learning with state-of-the-art techniques from AI. The long-term goal, setting aside current technical constraints, is to create an overlay on human vision that helps with tasks humans are inherently bad at, such as memory, calculations, and abstractions, and that speeds up tasks such as looking up information and referencing material.

Michigan Student AI Lab (MSAIL)

Winter 2014

Scene Text Detection and Recognition

We built an end-to-end scene text detection and recognition framework that builds on recently published work by Lukas Neumann, using an extremal region (ER) classifier and efficient exhaustive search.

Michigan Student AI Lab (MSAIL)

Winter 2014

Robust Locally Weighted Regression for Aesthetically Pleasing Region-of-Interest Video Generation

We provide a method that takes the output from an object tracker and creates a smoothed RoI to be viewed as the final output video. To accomplish this, we use a variation of linear regression, namely, robust locally weighted linear regression (rLWLR-Smooth).
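For readers unfamiliar with robust locally weighted regression, the sketch below shows a textbook variant: tricube distance weights, a local line fit at each sample, and bisquare reweighting to suppress outliers. It is not the rLWLR-Smooth method itself; the function names, the fixed bandwidth in input units, and the iteration count are assumptions for illustration.

```python
from statistics import median


def tricube(d):
    """Distance kernel: 1 at d = 0, falling smoothly to 0 at |d| >= 1."""
    d = abs(d)
    return (1.0 - d ** 3) ** 3 if d < 1.0 else 0.0


def bisquare(u):
    """Robustness weight: large scaled residuals get weight 0."""
    u = abs(u)
    return (1.0 - u * u) ** 2 if u < 1.0 else 0.0


def local_line_value(xs, ys, ws, x0):
    """Weighted least-squares line through (xs, ys), evaluated at x0."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx < 1e-12:  # too little spread to estimate a slope
        return my
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def robust_lowess(xs, ys, bandwidth, robust_iters=3):
    """Smooth ys over xs; robust_iters=0 gives plain locally weighted regression."""
    robust_w = [1.0] * len(xs)
    fitted = list(ys)
    for iteration in range(robust_iters + 1):
        fitted = []
        for x0 in xs:
            kernel = [tricube((x - x0) / bandwidth) for x in xs]
            ws = [k * rw for k, rw in zip(kernel, robust_w)]
            if sum(ws) <= 0.0:
                ws = kernel  # every neighbour was rejected; use the plain kernel
            fitted.append(local_line_value(xs, ys, ws, x0))
        if iteration == robust_iters:
            break
        residuals = [abs(y - f) for y, f in zip(ys, fitted)]
        scale = 6.0 * median(residuals)
        if scale == 0.0:
            break
        robust_w = [bisquare(r / scale) for r in residuals]
    return fitted
```

In an RoI setting, xs would be frame indices and ys one coordinate of the tracker output (for example the box center), so a single bad detection is down-weighted instead of jerking the virtual camera.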

ATLAS Collaboratory Project

AAAI-14

Parallel Tracking and Mapping for Outdoor Localization

By removing some of the long-term pose optimizations and by limiting the allowed number of bundle adjustment iterations, I was able to modify PTAM to work in an outdoor localization setting. This work was used to help improve the accuracy of a multi-target tracking system.

Daniel DeTone, Yu Xiang, Silvio Savarese

Summer 2013

Robotics Competition for Autonomous SLAM and Path Planning

We entered a mobile robot, equipped with a fisheye camera and laser pointer, in a robotics competition. To win, the robot had to autonomously map a small area, shoot green triangles, and return to a starting point. We implemented a fast agglomerative line fitting algorithm, a graph-based SLAM algorithm, and a memory-efficient quad-tree for map storage. Our team finished 2nd out of 8 teams.

Daniel DeTone, Ibrahim Musba, Jonathan Bendes, Andrew Segavac

Winter 2013

Projectile Prediction and Robotic Retrieval using Kinect RGBD Video

We developed a fully automated projectile-catching robot by affixing a small basket to a mobile robot and predicting the projectile's landing position in real time. We implemented a detection algorithm using RGBD video from a Kinect and an estimation algorithm using linear regression. Once the landing position was calculated, we used dead reckoning and a PID controller to navigate the mobile robot.
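One way linear regression can yield a landing position is sketched below. It assumes gravity is known and drag is negligible, so that adding 0.5*g*t^2 to each observed height turns the vertical motion into a straight line in time. This is an illustrative reconstruction, not the project's code; the function names and the single horizontal axis are assumptions.

```python
import math

GRAVITY = 9.81  # m/s^2, assumed known


def fit_line(ts, ys):
    """Ordinary least squares for y = intercept + slope * t."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    stt = sum((t - mt) ** 2 for t in ts)
    slope = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / stt
    return my - slope * mt, slope


def predict_landing(ts, xs, zs, ground_z=0.0):
    """ts: sample times, xs: horizontal positions, zs: heights.
    Returns (landing_time, landing_x) under a drag-free ballistic model.
    """
    x0, vx = fit_line(ts, xs)
    # z(t) + 0.5*g*t^2 = z0 + vz*t is linear in t, so plain regression applies.
    z0, vz = fit_line(ts, [z + 0.5 * GRAVITY * t * t for t, z in zip(ts, zs)])
    disc = vz * vz + 2.0 * GRAVITY * (z0 - ground_z)
    if disc < 0.0:
        raise ValueError("fitted trajectory never reaches ground_z")
    t_land = (vz + math.sqrt(disc)) / GRAVITY  # later root of the quadratic
    return t_land, x0 + vx * t_land
```

A real system would repeat the same fit for the second horizontal axis and refit as each new depth frame arrives, so the target handed to the controller tightens while the projectile is in flight.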

Daniel DeTone, Rohan Thomare, Max Keener

Winter 2013

Tracking-by-detection in a Lecture Hall Setting

We present a framework for tracking a single human (person-of-interest) in a lecture hall environment. It is a tracking-by-detection framework that uses a generic person detector, a novel scoring function to solve the data association problem, and a Kalman filter that provides reliable state estimation. In our scoring function, we introduce two novel subcomponents: a subscore based on the target’s width and a subscore based on the target’s color histogram at the first time step.
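The state estimation step can be illustrated with a minimal constant-velocity Kalman filter over a single coordinate, for example the horizontal center of the selected detection. The state layout, noise values, and class name below are assumptions for the sketch; the project's actual filter design is not described here.

```python
class ConstantVelocityKalman1D:
    """Tracks [position, velocity] from noisy position measurements."""

    def __init__(self, pos=0.0, vel=0.0, var=1000.0, q=1e-4, r=1.0):
        self.x, self.v = pos, vel
        # 2x2 state covariance, stored element by element.
        self.p00, self.p01, self.p10, self.p11 = var, 0.0, 0.0, var
        self.q = q  # process noise added to both diagonal terms (assumed)
        self.r = r  # measurement noise variance (assumed)

    def predict(self, dt=1.0):
        """Propagate the state with F = [[1, dt], [0, 1]]: P <- F P F^T + Q."""
        self.x += self.v * dt
        p00 = self.p00 + dt * (self.p10 + self.p01) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11

    def update(self, z):
        """Fuse a position measurement z using H = [1, 0]."""
        s = self.p00 + self.r                   # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s     # Kalman gain
        innovation = z - self.x
        self.x += k0 * innovation
        self.v += k1 * innovation
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11


# Typical loop: predict every frame, update only when a detection is accepted.
kf = ConstantVelocityKalman1D()
for detection_x in [0.0, 2.0, 4.0, 6.0]:
    kf.predict()
    kf.update(detection_x)
```

In a tracking-by-detection loop the predicted position is also useful before the update, since it gives the scoring function a reference for judging candidate detections.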

ATLAS Collaboratory Project

Fall 2013

Paper
Video

Particle Filter Tracking in a Lecture Hall Setting

Proof of concept for using a deformable parts model in conjunction with a particle filter and efficient MCMC sampling.

ATLAS Collaboratory Project

Fall 2013

Video

Linear array of photodiodes to track a human speaker for video recording

We present a human lecturer tracking and recording system that consists of a pan/tilt/zoom (PTZ) color video camera, a necklace of infrared LEDs, and a linear photodiode array detector. Electronic output from the photodiode array is processed to generate the location of the LED necklace, which is worn by a human speaker. The LED necklace is flashed at 70 Hz at a 50% duty cycle to provide noise-filtering capability.
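One common way a known flash frequency helps with noise is synchronous (lock-in style) detection: each photodiode's signal is correlated with 70 Hz sine and cosine references, which cancels steady ambient light. Whether the system processes its signal this way is not stated above, so the sketch below is purely illustrative; the sample rate, function names, and data layout are assumptions.

```python
import math


def demodulate(samples, sample_rate_hz, flash_hz=70.0):
    """Amplitude of the flash_hz component in one photodiode's samples.

    Constant (ambient) light cancels when the window spans whole periods.
    """
    i_sum = q_sum = 0.0
    for k, s in enumerate(samples):
        phase = 2.0 * math.pi * flash_hz * k / sample_rate_hz
        i_sum += s * math.cos(phase)
        q_sum += s * math.sin(phase)
    return 2.0 * math.hypot(i_sum, q_sum) / len(samples)


def locate_led(frames, sample_rate_hz, flash_hz=70.0):
    """frames[k][i] is the reading of photodiode i at sample k.

    Returns (index of the element with the strongest flashing signal,
    list of per-element strengths).
    """
    n_elements = len(frames[0])
    strengths = [
        demodulate([frame[i] for frame in frames], sample_rate_hz, flash_hz)
        for i in range(n_elements)
    ]
    best = max(range(n_elements), key=strengths.__getitem__)
    return best, strengths
```

Using both the sine and cosine references means the detector does not need to know the phase of the LED driver, only its frequency.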

Daniel DeTone, Homer Neal, Bob Lougheed

JoP:CS 2012

Publications