Spatio-Contextual Deep Network Based Multimodal Pedestrian Detection For Autonomous Driving


Pedestrian detection is a critical component of an autonomous driving system. Although a camera is commonly used for this purpose, image quality degrades significantly in low-light conditions, such as night driving. In contrast, the quality of a thermal image is unaffected by illumination. Using RGB and thermal images, this paper presents an end-to-end multimodal fusion model for pedestrian detection. Its novel spatio-contextual deep network architecture exploits the multimodal input effectively. Two separate deformable ResNeXt-50 encoders extract features from the two modalities. These encoded features are combined by a multimodal feature embedding module (MuFEm), which consists of several groups of a pair of Graph Attention Networks and a feature fusion unit. The output of the final feature fusion unit of MuFEm is then passed to two CRFs for spatial refinement. The features are further enhanced by channel-wise attention and by contextual information extracted with the help of four RNNs traversing in four different directions. Finally, a single-stage decoder uses these feature maps to produce the score map and the bounding box for each pedestrian. The proposed framework has been evaluated extensively on three publicly available multimodal pedestrian detection benchmark datasets, namely KAIST, CVC-14, and UTokyo, and it improves the state-of-the-art performance on all of them.
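To make the fusion idea concrete, here is a minimal, illustrative sketch of channel-wise attentive fusion of two modality feature maps. It is not the paper's method: MuFEm uses Graph Attention Networks and dedicated fusion units, and the channel-wise attention in the paper operates inside a deep network. All function names and shapes below are hypothetical, chosen only to show the general pattern of gating fused channels by their global statistics.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_fuse(rgb_feat, thermal_feat):
    """Toy channel-wise attentive fusion of two modality feature maps.

    rgb_feat, thermal_feat: arrays of shape (C, H, W).
    Returns a fused (C, H, W) map. Illustrative only: the paper's MuFEm
    fuses modalities with Graph Attention Networks, not this simple gating.
    """
    stacked = rgb_feat + thermal_feat        # naive element-wise pre-fusion
    squeeze = stacked.mean(axis=(1, 2))      # global average pool -> (C,)
    weights = sigmoid(squeeze)               # per-channel gates in (0, 1)
    return stacked * weights[:, None, None]  # re-weight each channel

rgb = np.random.rand(8, 16, 16)      # pretend RGB encoder output
thermal = np.random.rand(8, 16, 16)  # pretend thermal encoder output
fused = channel_attention_fuse(rgb, thermal)
print(fused.shape)  # (8, 16, 16)
```

The gating step mirrors squeeze-and-excitation-style channel attention: channels whose pooled response is weak are suppressed, which is one common way to let the network prefer the more informative modality per channel.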
A brief video overview of this work and its qualitative results is available online. We will make our source code publicly available once the paper is published.
