Sibo Zhu (朱思博)

sibozhu AT mit.edu

Massachusetts Institute of Technology

Welcome to my personal website!

My name is Sibo Zhu, and I am currently a research assistant in the MIT Department of Electrical Engineering and Computer Science (EECS). I am excited to be working with Prof. Song Han in the HAN Lab (Hardware, AI and Neural-nets) on robotic perception and efficient deep learning.

I am also the perception lead at MIT Driverless, a student-led high-speed autonomous racing team that develops full-scale vehicles and autonomy software to compete in driverless racing competitions.

Before coming to MIT, I received my M.S. in Computer Science from Brandeis University, where I was fortunate to work with Prof. Hongfu Liu. Prior to that, I received a B.A. in Computer Science and a B.A. in Pure & Applied Mathematics from Boston University, where I worked closely with Prof. Sang (“Peter”) Chin.

Outside of research, I enjoy snowboarding, skydiving, running, hiking, working out, cooking, movies, and music, and I especially love traveling to balance work and life.

Interests

  • Robotics
  • Perception
  • Efficient Deep Learning
  • Data Mining

Education

  • M.S. in Computer Science, 2020

    Brandeis University

  • B.A. in Computer Science, 2018

    Boston University

  • B.A. in Pure & Applied Mathematics, 2018

    Boston University

Experience

Perception Lead

MIT Driverless

May 2020 – Present Cambridge, MA

Summary:

  • Objective: Design the sensor-fusion and dual-redundant perception system for the Indy Autonomous Challenge and Roborace.
  • Lead a team of ~8 engineers, organized into research, infrastructure, and deployment subteams.
  • With my team, replicated the state-of-the-art sensor-fusion perception model “PointPainting” in PyTorch (see the sketch below).
  • With my team, proposed and developed a framework that uses both camera and LiDAR history information to predict future LiDAR frames.
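
For context, the core of PointPainting is a simple decoration step: project each LiDAR point into the camera image and append the per-pixel semantic scores to that point. Below is a minimal NumPy sketch of that step; the function name, argument shapes, and conventions are illustrative assumptions, not our actual codebase.

    import numpy as np

    def paint_points(points, seg_scores, T_cam_lidar, K):
        # points: (N, 3) LiDAR xyz; seg_scores: (H, W, C) per-pixel class scores
        # from a 2D segmentation network; T_cam_lidar: (4, 4) extrinsics;
        # K: (3, 3) camera intrinsics. Returns "painted" points, shape (M, 3 + C).
        H, W, _ = seg_scores.shape
        pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous xyz
        cam = (T_cam_lidar @ pts_h.T).T[:, :3]                  # camera frame
        front = cam[:, 2] > 0                 # keep points ahead of the camera
        uv = (K @ cam[front].T).T
        uv = uv[:, :2] / uv[:, 2:3]           # pinhole projection to pixels
        u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
        ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)
        return np.hstack([points[front][ok],          # original coordinates
                          seg_scores[v[ok], u[ok]]])  # appended class scores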

Research Assistant

MIT HAN Lab

Jan 2020 – Present Cambridge, MA

End-To-End Camera and LiDAR Extrinsic Calibration (PyTorch)

Summary:

  • Objective: Use deep neural networks to solve camera-to-LiDAR intrinsic and extrinsic calibration in real time, end to end.
  • Designed an offline calibration algorithm that employs 2D and 3D semantic segmentation networks as the backbone and SGD as the optimizer (a sketch follows this list).
  • Reduced calibration time from ~3 hours on average for human experts using the traditional checkerboard method to a ~10-second optimization process.
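
As a rough illustration of the approach (a sketch under assumed names and shapes, not the actual implementation), the extrinsics can be treated as a learnable 6-DoF pose and refined with SGD so that projected 3D points of a given semantic class land on image pixels that the 2D network assigns to the same class:

    import torch
    import torch.nn.functional as F

    def skew(k):
        # (3,) vector -> (3, 3) skew-symmetric matrix, built with autograd-safe ops.
        zero = torch.zeros((), dtype=k.dtype)
        return torch.stack([
            torch.stack([zero, -k[2], k[1]]),
            torch.stack([k[2], zero, -k[0]]),
            torch.stack([-k[1], k[0], zero]),
        ])

    def axis_angle_to_matrix(r):
        # Rodrigues' formula, differentiable w.r.t. the axis-angle vector r.
        theta = r.norm() + 1e-8
        K = skew(r / theta)
        return torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

    def calibration_loss(rot, trans, points, labels, prob_maps, K_intr):
        # points: (N, 3) LiDAR xyz; labels: (N,) semantic class of each 3D point;
        # prob_maps: (C, H, W) 2D segmentation probabilities; K_intr: (3, 3).
        cam = points @ axis_angle_to_matrix(rot).T + trans   # into camera frame
        uv = cam @ K_intr.T
        uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-3)          # pinhole projection
        C, H, W = prob_maps.shape
        grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,      # grid_sample wants
                            uv[:, 1] / (H - 1) * 2 - 1], -1) # coords in [-1, 1]
        sampled = F.grid_sample(prob_maps.unsqueeze(0),      # (1, C, H, W)
                                grid.view(1, -1, 1, 2), align_corners=True)
        probs = sampled[0, labels, torch.arange(len(labels)), 0]
        return -probs.mean()  # high probability at projected pixels = aligned

    # Dummy data so the sketch runs: 1000 points, 5 classes, a 480x640 image.
    pts = torch.randn(1000, 3) + torch.tensor([0.0, 0.0, 10.0])
    lbls = torch.randint(0, 5, (1000,))
    seg_probs = torch.rand(5, 480, 640)
    K_mat = torch.tensor([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])

    rot = torch.tensor([1e-3, 1e-3, 1e-3], requires_grad=True)  # near-identity start
    trans = torch.zeros(3, requires_grad=True)
    opt = torch.optim.SGD([rot, trans], lr=1e-3, momentum=0.9)
    for _ in range(200):
        opt.zero_grad()
        calibration_loss(rot, trans, pts, lbls, seg_probs, K_mat).backward()
        opt.step()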

LiDAR-based end-to-end autonomous driving (PyTorch, ROS)

Summary:

  • Objective: Employing only LiDAR and map information as input, realize end-to-end autonomous driving from raw 3D point clouds to vehicle control.
  • Applied imitation learning to data collected on a real vehicle during a human expert’s driving, regressing the vehicle trajectory from raw 3D point clouds.
  • Used Dijkstra’s algorithm to compute the shortest path on a pre-loaded map, realizing navigation (see the sketch after this list).
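
The navigation step is textbook Dijkstra. Here is a self-contained Python sketch over a plain adjacency-dict graph (our real map format differs):

    import heapq

    def dijkstra(graph, start, goal):
        # graph: {node: [(neighbor, edge_cost), ...]}. Returns (cost, path).
        dist = {start: 0.0}
        prev = {}
        pq = [(0.0, start)]
        done = set()
        while pq:
            d, u = heapq.heappop(pq)
            if u in done:
                continue
            done.add(u)
            if u == goal:                    # reconstruct the shortest path
                path = [u]
                while u in prev:
                    u = prev[u]
                    path.append(u)
                return d, path[::-1]
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(pq, (nd, v))
        return float("inf"), []

    # Example: shortest route on a tiny waypoint graph.
    g = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.5)], "C": []}
    print(dijkstra(g, "A", "C"))   # (2.5, ['A', 'B', 'C'])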

Deploy PVCNN on Real-World Self-Driving Car (PyTorch, ROS)

Summary:

  • Objective: Deploy PVCNN, a state-of-the-art 3D deep learning framework, for LiDAR-based traffic-landmark detection on an autonomous racing vehicle.
  • Redesigned the Point-Voxel CNN architecture into a classification network that performs geometry and color classification from the x, y, z, and intensity values provided by the Velodyne 32-channel LiDAR mounted on the MIT Driverless racing vehicle (sketched below).
  • Reduced the classification error rate substantially (accuracy improved from 95% to 99.99%+).
  • Reduced LiDAR detection pipeline latency by ~1.5x (from 5 ms to 3.4 ms).
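
Schematically, converting a per-point segmentation backbone into a whole-cluster classifier amounts to pooling per-point features into a global descriptor and attaching a small classification head. A hypothetical PyTorch sketch follows; the backbone below is a trivial stand-in, not PVCNN itself (see github.com/mit-han-lab/pvcnn for the real model).

    import torch
    import torch.nn as nn

    class PointCloudClassifier(nn.Module):
        def __init__(self, backbone, feat_dim, num_classes):
            super().__init__()
            self.backbone = backbone          # maps (B, 4, N) -> (B, feat_dim, N)
            self.head = nn.Sequential(
                nn.Linear(feat_dim, 128), nn.ReLU(inplace=True),
                nn.Linear(128, num_classes),  # e.g. cone geometry/color classes
            )

        def forward(self, points):
            # points: (B, 4, N) with x, y, z, intensity channels per point.
            feats = self.backbone(points)            # per-point features
            global_feat = feats.max(dim=-1).values   # symmetric pooling over N
            return self.head(global_feat)

    # Trivial stand-in backbone so the sketch runs (PVCNN would go here).
    backbone = nn.Sequential(nn.Conv1d(4, 64, 1), nn.ReLU(inplace=True))
    model = PointCloudClassifier(backbone, feat_dim=64, num_classes=4)
    logits = model(torch.randn(2, 4, 1024))   # two clouds of 1024 points each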

Machine Learning Lead

MIT Driverless

Sep 2019 – May 2020 Cambridge, MA

LiDAR Perception System (PyTorch, C++, ROS)

Summary:

  • Objective: A machine learning pipeline that detects and localizes landmarks on racing tracks through the LiDAR sensor.
  • Designed the LiDAR-based perception pipeline and led a team of 4 engineers.
  • Deployed the state-of-the-art “Point-Voxel CNN” with non-trivial customization, improving the LiDAR detection system’s accuracy from 94% to 99.99%+ with a 1.5x speed-up.

Perception Core Engineer

MIT Driverless

Jan 2019 – Sep 2019 Cambridge, MA

Camera Perception System (PyTorch, C++, ROS)

Summary:

  • Objective: A machine learning system that detects and localizes landmarks on racing tracks through cameras.
  • Optimized the performance of the YOLOv3 network with a custom data loader and Bayesian hyperparameter tuning, improving mAP from 66.97% to 89.35%.
  • Implemented a ResNet-inspired network for racetrack landmark keypoint detection and localization with 93% accuracy.
  • Compressed the YOLOv3 model through filter pruning, cutting inference time from 120 ms to 30 ms by pruning 80% of redundant connections (a pruning sketch follows this list).
  • Deployed all networks on a ROS-based real autonomous vehicle with a whole-stack latency of 200 ms.
  • Open-sourced the codebase and dataset, potentially used by 100+ Formula Student teams from all over the world.
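
One common filter-pruning criterion, shown here purely as a flavor of the technique (not necessarily the exact criterion used above), ranks a convolution's filters by L1 norm and keeps only the strongest:

    import torch
    import torch.nn as nn

    def prune_conv_filters(conv, keep_ratio=0.2):
        # Keep the top `keep_ratio` filters of a Conv2d by L1 norm. Returns the
        # smaller layer plus the kept indices (needed to slice the next layer's
        # input channels accordingly).
        n_keep = max(1, int(conv.out_channels * keep_ratio))
        l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
        keep = torch.argsort(l1, descending=True)[:n_keep]
        new_conv = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                             stride=conv.stride, padding=conv.padding,
                             bias=conv.bias is not None)
        with torch.no_grad():
            new_conv.weight.copy_(conv.weight[keep])
            if conv.bias is not None:
                new_conv.bias.copy_(conv.bias[keep])
        return new_conv, keep

    # Example: keep 20% of a layer's 64 filters (i.e., prune 80%).
    smaller, kept = prune_conv_filters(nn.Conv2d(32, 64, 3, padding=1))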

Research Assistant

Hongfu Liu’s Lab, Brandeis University

Sep 2018 – Jan 2020 Waltham, MA

iPOF: An Extremely and Excitingly Simple Outlier Detector via Infinite Propagation (Python)

Summary:

  • Objective: Enhance state-of-the-art outlier detectors through a post-detection ensemble.
  • Developed an outlier detection algorithm that is aware of the direction of each data point’s K nearest neighbors (a toy sketch follows this list).
  • Achieved average improvements ranging from 2% to 46%; in some cases, iPOF boosts performance by over 3000% relative to the original outlier detection algorithm.
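
The exact iPOF formulation is described in the paper; purely as a toy illustration of direction-aware KNN scoring, a point whose K nearest neighbors all lie on one side of it is more outlier-like than a point surrounded on all sides:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def directional_knn_score(X, k=10):
        # X: (n, d) data. Returns scores in [0, 1]; higher = more outlier-like.
        nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nbrs.kneighbors(X)               # column 0 is the point itself
        diffs = X[idx[:, 1:]] - X[:, None, :]     # vectors to the k neighbors
        units = diffs / (np.linalg.norm(diffs, axis=-1, keepdims=True) + 1e-12)
        # The mean direction has norm ~0 when neighbors surround the point and
        # norm ~1 when they all lie on one side of it.
        return np.linalg.norm(units.mean(axis=1), axis=-1)

    # Example: the far-away point gets the highest score.
    X = np.vstack([np.random.randn(200, 2), [[8.0, 8.0]]])
    print(directional_knn_score(X).argmax())      # usually 200, the outlier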

Research Assistant

Boston University LISP Lab

Jun 2017 – May 2018 Boston, MA

High-Speed Camera with Custom Exposure (TensorFlow, Keras)

Summary:

  • Objective: Detect and segment motion-blurred areas in camera images.
  • Developed a CNN-based network for motion-blur detection, achieving 92% accuracy on blurry-patch detection.
  • Open-sourced the project, which received over 100 GitHub stars.

Director of Tech Department

Boston University Chinese Students and Scholars Association

Sep 2014 – May 2018 Boston, MA

Achievements:

  • Founded the Tech Department and secured over $10,000 in sponsorship for BUCSSA.
  • Led a team of over 20 engineers to develop a JavaScript and React Native based mobile application for BU students on both iOS and Android.
  • Acquired over 1,000 student users within BU.

Skills

  • Python: Advanced
  • C++: Intermediate
  • ROS: Intermediate
  • PyTorch: Advanced
  • TensorFlow: Intermediate
  • Linux/Ubuntu: 100% Addicted
  • Running: Jogging/Marathon
  • Snowboarding: Carving/Freestyle
  • Photography: Digital/Film

Projects

Faster LiDAR Based on Predictive Deep Learning Model

Typical LiDARs on the market run at 10 Hz, which is sufficient for a state-of-the-art autonomous road vehicle but not for an autonomous racing vehicle traveling at 180 mph. To build a “faster LiDAR”, and inspired by the fact that cameras (30+ Hz) and LiDARs (10 Hz) operate at different frequencies, we propose a method that uses both camera and LiDAR history information to predict future LiDAR frames.

Replication of the “PointPainting”

By replicating the state-of-the-art sensor-fusion detection model “PointPainting”, we gained a tool for testing and evaluating our 3D point cloud prediction model and our end-to-end extrinsic sensor calibration model.

“Point-Voxel CNN” Deployment on a Full-Scale Autonomous Racing Vehicle

We deployed the state-of-the-art LiDAR perception model “Point-Voxel CNN” on MIT Driverless’s full-scale autonomous racing vehicle. The deployment included converting the model’s task from segmentation to classification, ROS integration, and full-scale vehicle testing. The full story is in the video provided here.

Grad-CAM with Object Detection (YOLOv3)

For my first project at MIT Driverless, my task was to find a visual explanation of the CNN-based object detection model the perception team was using, YOLOv3. After reviewing the results, we concluded that the network pays most of its attention to the bottom part of the object (the traffic cone) and, in some cases, to the boundary between the cone and the ground.
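
For reference, Grad-CAM weights a convolutional layer’s activation maps by the global-average-pooled gradients of a score with respect to those activations. A minimal PyTorch sketch; the choice of target layer and the detection score function are assumptions, not YOLOv3 specifics:

    import torch

    def grad_cam(model, image, target_layer, score_fn):
        # image: (1, 3, H, W). score_fn reduces the model output to one scalar
        # (e.g. the objectness of a chosen detection). Returns an (h, w) map.
        acts, grads = [], []
        h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
        h2 = target_layer.register_full_backward_hook(
            lambda m, gi, go: grads.append(go[0]))
        try:
            score = score_fn(model(image))
            model.zero_grad()
            score.backward()
        finally:
            h1.remove(); h2.remove()
        A, dA = acts[0], grads[0]                     # both (1, C, h, w)
        weights = dA.mean(dim=(2, 3), keepdim=True)   # pooled gradients per map
        cam = torch.relu((weights * A).sum(dim=1))[0] # weighted sum of maps
        return cam / (cam.max() + 1e-8)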

Motion Blur Detection

To build custom high-speed cameras that can deal with small patches of motion blur, I proposed a custom convolutional model that detects motion-blurred patches within images, achieving two-sigma accuracy.
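
As a sketch of the kind of model involved (a hypothetical stand-in, not the original architecture), a small CNN can classify fixed-size image patches as sharp or blurry:

    import torch
    import torch.nn as nn

    # Binary patch classifier: logits for [sharp, blurry] per image patch.
    blur_classifier = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 2),
    )
    logits = blur_classifier(torch.randn(8, 3, 64, 64))   # a batch of patches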

Awards

Graduate Research Award in Computer Science

Awarded to only one graduate student each academic year for research excellence

Merit Scholarship

$15,000, awarded to the top 5% of students

3rd Place of Overall, Driverless Division

Overall ranking in an autonomous racing contest among 20+ student teams worldwide (such as ETH and TUM) at one of the most challenging student autonomous racing competitions in the world

1st Place of Cost & Manufacturing, Driverless Division

Cost and manufacturing design contest among 20+ student teams worldwide (such as ETH and TUM) at one of the most challenging student autonomous racing competitions in the world

3rd Place of Engineering Design, Driverless Division

Software and hardware design contest among 20+ student teams worldwide (such as ETH and TUM) at one of the most challenging student autonomous racing competitions in the world

1st Place of Engineering Design, Driverless Division

Software and hardware vehicle design contest among 10+ student autonomous racing teams (such as TUM) from all over the world

2nd Place of Overall, Driverless Division

Overall ranking in an autonomous racing contest among 10+ student autonomous racing teams (such as TUM) from all over the world

Merit Scholarship

$15,000, awarded to the top 5% of students

UROP Student Research Award

$3,000, awarded to top student researchers

Contact

Maybe a coffee chat?