Version 8 (modified by Jdl217, 2 years ago)

Multi-cam Fusion for Smart Intersection

    WINLAB Summer Internship 2022

    Group Members: Peter Wilmot, Jesse Lerner, Yue Wang, Varun Kota, Arnuv Batra

    Project Objective

    Combine input from multiple cameras in the ORBIT smart city environment to create a fused model that can be used to detect cars and pedestrians. This project requires some knowledge of computer vision techniques. Students will work with feeds from both RGB and depth cameras and will be responsible for writing code and developing procedures to synchronize and calibrate the cameras. The project comprises several tasks:

    • Calibration and fusion of multiple depth camera feeds: Four Intel RealSense cameras are placed in the smart intersection environment, providing views of the intersection from different angles. Because the RealSense cameras are RGB-D cameras, each provides a point cloud data stream. These point clouds can be combined to form a more complete 3D data stream of the intersection environment. The combination is straightforward if the poses (positions and orientations) of the cameras relative to each other in 3D space are known. Students will research and implement a routine for “calibrating” the depth camera deployment, thereby finding the poses of all of the cameras and allowing the point cloud data to be combined.
    • Image stitching of multiple top-down RGB cameras: The smart intersection includes several overlapping top-down views meant to allow for remote control of the vehicles in the environment. The video from these cameras needs to be stitched in real time to create a single video stream with the full top-down view.
    • (Optional) Fusion of all views in the intersection: If both the depth camera fusion and the top-down stitching tasks are finished before the end of the internship, the next step is to combine both views of the intersection with any available views from vehicles driving through the intersection.
    • (Optional) Low-latency video streaming: Develop FPGA/GPU-based low-latency video streaming.
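
    The first task above depends on mapping each camera's point cloud into one shared frame once the camera poses are known. The following is a minimal sketch of that step, not the project's actual code. It assumes each cloud is an (N, 3) NumPy array and each camera pose is a 4x4 homogeneous matrix that maps camera coordinates into the shared frame; the names transform_points, fuse_point_clouds, and make_pose are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def transform_points(points, T):
    """Apply a 4x4 homogeneous transform T to an (N, 3) array of points."""
    points = np.asarray(points, dtype=float)
    R, t = T[:3, :3], T[:3, 3]
    # Row-vector form of R @ p + t for every point p.
    return points @ R.T + t


def fuse_point_clouds(clouds, extrinsics):
    """Map every cloud into the shared frame and stack them into one cloud."""
    return np.vstack([transform_points(c, T) for c, T in zip(clouds, extrinsics)])


def make_pose(yaw_deg, translation):
    """Hypothetical helper: pose built from a rotation about z and a translation."""
    a = np.radians(yaw_deg)
    T = np.eye(4)
    T[:3, :3] = [[np.cos(a), -np.sin(a), 0.0],
                 [np.sin(a), np.cos(a), 0.0],
                 [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


# Two single-point "clouds" (assumed data), each expressed in its own camera frame.
cloud_a = np.array([[1.0, 0.0, 0.0]])
cloud_b = np.array([[1.0, 0.0, 0.0]])

# Camera A defines the shared frame; camera B is rotated 90 degrees and raised 1 unit.
T_shared_a = np.eye(4)
T_shared_b = make_pose(90.0, [0.0, 0.0, 1.0])

fused = fuse_point_clouds([cloud_a, cloud_b], [T_shared_a, T_shared_b])
```

    With real data, the clouds would come from the depth cameras and the poses from the calibration routine; the fusion step itself stays the same.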

    Students should get started by using ROS (Robot Operating System) to get video streams from all of the depth cameras in the intersection. The RealSense library for ROS should simplify collecting all of the camera streams in one place.

    Weekly Progress:

    Week One: Discussed theoretical strategies for camera fusion. Started training on ORBIT nodes and ROS.

    Week Two: Connected to the intersection cameras and used graphical programs such as RViz to view the point clouds being generated.

    Week Three: Started getting the separate nodes and RealSense cameras working in tandem and publishing to a single place by connecting the nodes through a ROS master node.

    Week Four: Found the necessary transforms and succeeded in overlaying the point clouds using manual methods.

    Week Five: Learned to use ROS nodes to provide services and send messages.

    Week Six: Created a ROS server and client to make use of the PyTorch3D Iterative Closest Point (ICP) method, and began trying to get PyTorch3D working.
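
    For readers unfamiliar with Iterative Closest Point, the idea can be sketched in plain NumPy. This is not the PyTorch3D implementation the team tried to use; it is a minimal point-to-point variant with brute-force nearest-neighbour matching and a Kabsch (SVD) alignment step, and the function names and example data are assumptions made for illustration.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src_i + t ~= dst_i (Kabsch algorithm)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(src, dst, iterations=20):
    """Return a 4x4 transform that aligns src to dst by point-to-point ICP."""
    T = np.eye(4)
    current = np.asarray(src, dtype=float).copy()
    for _ in range(iterations):
        # Brute-force nearest neighbour in dst for every point in current.
        d2 = ((current[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(current, matched)
        current = current @ R.T + t
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T  # accumulate the incremental alignment
    return T


# Assumed test data: a 3x3x3 grid moved by a small, known rigid transform.
grid = np.array([[x, y, z] for x in range(3) for y in range(3) for z in range(3)],
                dtype=float)
angle = np.radians(3.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.05, -0.03, 0.02])
target = grid @ R_true.T + t_true

T_est = icp(grid, target)
```

    ICP only converges reliably when the clouds start close to alignment, as they do in this example, so it is usually paired with a rough initial estimate of the transform.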

    Week Seven: Spent much of the week trying to get PyTorch3D to work, ultimately without success because the WINLAB nodes don't work well with CUDA (which is required for PyTorch3D).

    Week Eight: Switched gears and started to work on using ArUco tags with OpenCV to calibrate the cameras.
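
    The reasoning behind marker-based calibration is that two cameras that see the same tag each obtain the tag's pose in their own frame, and chaining those two poses gives the transform between the cameras. The sketch below shows only that chaining step, under the assumption that marker poses are already available as 4x4 matrices (for example, built from OpenCV's rvec/tvec output using cv2.Rodrigues). The names invert_pose, camera_to_camera, and rot_z_pose are hypothetical, and the poses are made-up values rather than measurements from the intersection.

```python
import numpy as np


def invert_pose(T):
    """Invert a rigid 4x4 transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def camera_to_camera(T_a_marker, T_b_marker):
    """Transform mapping camera-B coordinates into camera-A coordinates.

    T_a_marker maps marker coordinates into camera A's frame, and
    T_b_marker maps marker coordinates into camera B's frame.
    """
    return T_a_marker @ invert_pose(T_b_marker)


def rot_z_pose(yaw_deg, translation):
    """Hypothetical helper: pose from a rotation about z and a translation."""
    a = np.radians(yaw_deg)
    T = np.eye(4)
    T[:3, :3] = [[np.cos(a), -np.sin(a), 0.0],
                 [np.sin(a), np.cos(a), 0.0],
                 [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


# Assumed ground truth: where camera B sits relative to camera A.
T_a_b_true = rot_z_pose(40.0, [2.0, -1.0, 0.5])

# Assumed marker pose as seen from camera A, and the view camera B would then have.
T_a_marker = rot_z_pose(-15.0, [0.3, 0.2, 3.0])
T_b_marker = invert_pose(T_a_b_true) @ T_a_marker

# Recover the camera-to-camera transform from the two marker observations.
T_a_b = camera_to_camera(T_a_marker, T_b_marker)
```

    In practice each marker pose is noisy, so estimates from several tags or frames would typically be combined rather than relying on a single observation.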

    Week Nine: Continued working on finding the correct transform using ArUco, and started work on the camera latency with PTP project.
