Constraint-Driven Multimodal Edge Intelligence
Author: Md Fahim Khan
Published: 2024
Artificial Intelligence (AI) has recently achieved tremendous success, performing critical tasks at or beyond human-level accuracy. Among the branches of AI, the major breakthroughs have come from Deep Learning, in which multiple layers of processing (neural networks) extract progressively higher-level features from data. Deep Learning has driven advances in many domains, such as classification, object detection, and natural language processing. The two most prominent factors behind this success are data and the availability of computational power; in short, almost any complex problem can be tackled with AI given enough data and enough compute. This leads us to consider scenarios in which either of these factors is constrained.

In parallel with the success story of AI, devices and sensors keep getting smaller, producing a vast network of connected hardware with a much smaller form factor, known as the Internet of Things (IoT). Each of these devices is essentially a computer with functional capabilities similar to a traditional desktop, but at far lower capacity. They can be viewed as edge computing nodes equipped with sensors of different modalities, and AI can help them make intelligent decisions from the available sensory inputs. However, traditional deep neural networks require substantial memory and power to run, which makes intelligence on the edge difficult.

Our first work addresses this issue with a layer-wise dynamic quantization scheme. Neural networks typically rely on full-precision floating-point arithmetic for training and inference, and these floating-point computations demand extensive computing power and memory.
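As a rough illustration of why quantization shrinks memory and compute requirements (this is a generic symmetric linear quantizer, not the CCQ scheme described below; the function names are illustrative), the sketch maps a float32 weight tensor onto a small signed integer range and back:

```python
import numpy as np

def quantize_linear(x, bits):
    """Symmetric uniform quantization of a float tensor to `bits` (<= 8) precision.

    Returns the integer codes and the scale needed to dequantize.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed
    peak = np.max(np.abs(x))
    scale = peak / qmax if peak > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Toy weight matrix: 4-bit codes need 8x less storage than float32.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
q4, s4 = quantize_linear(w, 4)
w4 = dequantize(q4, s4)
```

The reconstruction `w4` differs from `w` by at most half a quantization step per element, which is the accuracy/footprint trade-off that layer-wise schemes like the one proposed here navigate.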
Quantization reduces a deep network to a lower-precision representation in which computation can be performed with a much smaller memory footprint. We propose an iterative, accuracy-driven learning framework of competitive-collaborative quantization (CCQ) that gradually adapts the bit-precision of each individual layer. Orthogonal to prior quantization policies, which keep the first and last layers of the network at full precision, CCQ offers layer-wise competition under any target quantization policy, so that state-of-the-art networks can be quantized entirely without significant accuracy degradation. In this work, while quantizing different layers to lower precision, the optimization factor was their corresponding sizes.

The second work dives deeper into the edge computing scenario. Non-volatile memory (NVM) based crossbar arrays have recently gained popularity because their in-memory-computing capability and low power requirements make them well suited for edge deployment. However, only a limited number of bits can be realized on these crossbar fabrics, so neural networks must be quantized before any model is deployed onto them. To make edge nodes self-sustaining, energy harvesting has shown great promise; however, the power delivered by energy harvesting sources is not constant, which is problematic because deep learning workloads typically demand constant power. This work addresses the issue by tuning network precision at layer granularity for the variable power budgets predicted under different energy harvesting scenarios.

The third work looks at a different scenario, in which the constraint is induced by a sensor. Accurate dense depth prediction is essential for 3D scene perception in use cases such as autonomous driving and robotics, yet state-of-the-art time-of-flight sensors provide only very sparse depth data.
Dense depth-completion deep learning methods recover the true depth by combining RGB images with the sparse sensor data. However, a reliable RGB image is not always available, especially in low-light environments. We propose a generative adversarial network that can recover dense depth using only the sparse depth samples provided by time-of-flight sensors such as LiDAR. Our proposed technique achieves competitive performance and produces visually appealing reconstructed dense-depth images.

The fourth work delves further into sensor failure scenarios. We first propose a multimodal sensor fusion strategy using transformer-based self-attention models, trained in a generative setting to obtain the best results. Our proposed models outperform existing studies in reconstruction accuracy and achieve competitive throughput. We then investigate how to make these models robust to different sensor asymmetry scenarios: we propose a novel training recipe that makes the model inherently robust to certain sensor failures, so that models trained with this strategy deliver reasonably good outputs even when one input modality is noisy or unavailable.
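One common way to build this kind of inherent robustness is to randomly blank out whole input modalities during training so the fusion model learns not to depend on any single sensor. The sketch below illustrates that general idea; the function name, probabilities, and zero-masking scheme are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def modality_dropout(modalities, p_drop=0.3, rng=None):
    """Randomly blank entire input modalities during training.

    modalities: dict mapping sensor name -> feature array.
    Each modality is independently dropped with probability p_drop,
    but at least one modality is always kept intact.
    """
    rng = rng or np.random.default_rng()
    names = list(modalities)
    dropped = [n for n in names if rng.random() < p_drop]
    if len(dropped) == len(names):          # never drop every sensor
        dropped.remove(rng.choice(dropped))
    return {
        n: (np.zeros_like(x) if n in dropped else x)
        for n, x in modalities.items()
    }

# Example training batch with two sensor streams.
batch = {"rgb": np.ones((4, 8)), "lidar": np.ones((4, 8))}
masked = modality_dropout(batch, p_drop=0.5, rng=np.random.default_rng(0))
```

Applying this mask on each training step exposes the fusion model to missing-sensor conditions, which is the mechanism that lets it degrade gracefully when one modality is noisy or absent at inference time.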