Deep-Learning-based Video Analysis for Human Action Evaluation

Author: Chen Du

Published: 2022

As video analysis provides an automatic way to extract meaningful information from video content, it can be applied in healthcare to evaluate human action patterns for purposes such as biometrics estimation and performance assessment. In recent years, the rapid development of deep learning and portable medical sensors has made computer vision-based measurement of human action patterns more affordable and accurate, enabling more efficient video analysis systems for action evaluation in home and clinic environments. We investigate novel uses of video analysis for healthcare monitoring, including objective biometrics estimation and subjective action quality assessment. We propose a deep learning framework that extracts spatiotemporal features and estimates biometrics or performance scores from 3D body landmarks using a graph convolutional neural network. This offers a portable way to obtain gold-standard biometrics capturing the 3D multi-joint coordination underlying body movements, and can provide real-time feedback on movement performance for rehabilitation exercises.

For biometrics estimation, in Chapter 2 we propose two single-task models for video-level and frame-level estimation, respectively, and a multi-task learning approach that estimates center-of-pressure (CoP) metrics at both temporal levels in parallel. To facilitate this line of research, we collect and release a novel computer vision-based 3D body landmark dataset built with pose estimation. We also extend our framework, via adaptive graph convolution, to a traditional kinematics dataset collected with on-body reflective markers. For action quality assessment, in Chapter 3 we propose a deep learning framework for automatic assessment of physical rehabilitation exercises using a graph convolutional network with self-supervised regularization.

To further improve the accessibility of the real-time CoP metrics estimation system, in Chapter 4 we investigate a view-invariant, video-level CoP metrics estimation framework that uses a single RGB camera, which could significantly benefit data collection in home and clinic environments. In Chapter 5, we explore a semi-supervised learning framework for video-level CoP metrics estimation from partially labeled data with only a small portion of labels. Our proposed methods could enable a more affordable, comprehensive, and portable virtual therapy system than existing tools allow.
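To make the core idea concrete: a graph convolutional network propagates each joint's features to its skeletal neighbours through a normalized adjacency matrix. The following is a minimal, dependency-free sketch of one such layer on a made-up three-joint chain; the edge list, input coordinates, and weight matrix are illustrative only and are not the dissertation's actual model:

```python
import math

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency matrix (with self-loops)
    for a skeleton graph, as commonly used in GCN layers."""
    A = [[0.0] * num_joints for _ in range(num_joints)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    for i in range(num_joints):
        A[i][i] = 1.0  # self-loop so each joint keeps its own features
    deg = [sum(row) for row in A]
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]

def graph_conv(X, A_hat, W):
    """One graph-convolution layer: H = A_hat @ X @ W (no nonlinearity)."""
    n, f_in, f_out = len(X), len(X[0]), len(W[0])
    AX = [[sum(A_hat[i][k] * X[k][f] for k in range(n)) for f in range(f_in)]
          for i in range(n)]
    return [[sum(AX[i][f] * W[f][o] for f in range(f_in)) for o in range(f_out)]
            for i in range(n)]

# Toy 3-joint chain (hip-knee-ankle) with 3D coordinates as input features.
edges = [(0, 1), (1, 2)]
X = [[0.0, 0.9, 0.0], [0.0, 0.5, 0.1], [0.0, 0.1, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # maps 3 input features to 2
A_hat = normalized_adjacency(edges, 3)
H = graph_conv(X, A_hat, W)
print(len(H), len(H[0]))  # 3 joints, 2 output features each
```

Stacking such layers (with nonlinearities, and temporal convolutions across frames) yields the kind of spatiotemporal feature extractor the abstract describes.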


Machine Learning for Vision-Based Motion Analysis

Author: Liang Wang

Publisher: Springer Science & Business Media

Published: 2010-11-18

Total Pages: 377

ISBN-13: 0857290576

Techniques of vision-based motion analysis aim to detect, track, identify, and generally understand the behavior of objects in image sequences. With the growth of video data in applications ranging from visual surveillance to human-machine interfaces, the ability to automatically analyze and understand object motions from video footage is of increasing importance. Among the latest developments in this field is the application of statistical machine learning algorithms for object tracking, activity modeling, and recognition. Developed from expert contributions to the first and second International Workshop on Machine Learning for Vision-Based Motion Analysis, this important text/reference highlights the latest algorithms and systems for robust and effective vision-based motion understanding from a machine learning perspective. Highlighting the benefits of collaboration between the object motion understanding and machine learning communities, the book discusses the most active fronts of research, including current challenges and potential future directions. Topics and features:

- provides a comprehensive review of the latest developments in vision-based motion analysis, presenting numerous case studies on state-of-the-art learning algorithms;
- examines algorithms for clustering and segmentation, and manifold learning for dynamical models;
- describes the theory behind mixed-state statistical models, with a focus on mixed-state Markov models that take into account spatial and temporal interaction;
- discusses object tracking in surveillance image streams, discriminative multiple-target tracking, and guidewire tracking in fluoroscopy;
- explores modeling for saliency detection, human gait modeling, modeling of extremely crowded scenes, and behavior modeling from video surveillance data;
- investigates methods for automatic recognition of gestures in Sign Language, and human action recognition from small training sets.

Researchers, professional engineers, and graduate students in computer vision, pattern recognition, and machine learning will all find this text an accessible survey of machine learning techniques for vision-based motion analysis. The book will also be of interest to all who work with specific vision applications, such as surveillance, sport event analysis, healthcare, video conferencing, and motion video indexing and retrieval.


Deep Learning for Human Activity Recognition

Author: Xiaoli Li

Publisher: Springer Nature

Published: 2021-02-17

Total Pages: 139

ISBN-13: 9811605750

This book constitutes the refereed proceedings of the Second International Workshop on Deep Learning for Human Activity Recognition, DL-HAR 2020, held in conjunction with IJCAI-PRICAI 2020 in Kyoto, Japan, in January 2021. Due to the COVID-19 pandemic, the workshop was postponed to 2021 and held in a virtual format. The 10 presented papers were thoroughly reviewed and included in the volume. They present recent research on applications of human activity recognition in areas such as healthcare services and smart home applications.


Machine Learning for Human Motion Analysis: Theory and Practice

Author: Wang, Liang

Publisher: IGI Global

Published: 2009-12-31

Total Pages: 318

ISBN-13: 1605669016

"This book highlights the development of robust and effective vision-based motion understanding systems, addressing specific vision applications such as surveillance, sport event analysis, healthcare, video conferencing, and motion video indexing and retrieval"--Provided by publisher.


Demystifying Human Action Recognition in Deep Learning with Space-Time Feature Descriptors

Author: Mike Nkongolo

Publisher: GRIN Verlag

Published: 2018-02-21

Total Pages: 39

ISBN-13: 3668642591

Research Paper (postgraduate) from the year 2018 in the subject Computer Science - Internet, New Technologies, course: Machine Learning, language: English, abstract: Human Action Recognition is the task of recognizing a set of actions being performed in a video sequence. Reliably and efficiently detecting and identifying actions in video could have a vast impact in the surveillance, security, healthcare, and entertainment spaces. This paper explores different engineered spatial and temporal image and video features (and combinations thereof) for Human Action Recognition, as well as different Deep Learning architectures for non-engineered features (and classification) that may be used in tandem with the handcrafted features. Comparisons between the different combinations of features are made, and the best, most discriminative feature set is identified. The paper proposes the development and implementation of a robust framework for Human Action Recognition. The motivation behind the proposed research is, firstly, the high effectiveness of gradient-based descriptors - such as HOG, HOF, and N-Jets - for video-based human action recognition: they capture both the salient spatial and temporal information in the video sequences while removing much of the redundant information that is not pertinent to the action. Combining these features in a hierarchical fashion further increases performance.
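For concreteness, a HOG-style cell descriptor boils down to magnitude-weighted voting of gradient orientations into histogram bins. Below is a minimal stdlib-only sketch of that idea on a tiny synthetic patch; the patch values and bin count are illustrative, not taken from the paper:

```python
import math

def hog_cell_histogram(patch, bins=9):
    """Gradient-orientation histogram for one cell of a grayscale patch,
    in the spirit of HOG: central-difference gradients, magnitude-weighted
    votes into unsigned orientation bins over [0, 180) degrees."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(ang // (180.0 / bins)) % bins] += mag
    return hist

# A vertical edge: all gradients point horizontally,
# so every vote lands in the 0-degree bin.
patch = [[0, 0, 10, 10]] * 4
hist = hog_cell_histogram(patch)
print(hist.index(max(hist)))  # 0
```

HOF applies the same binning idea to optical-flow vectors instead of image gradients, which is how the temporal counterpart of this descriptor arises.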


Structured Deep Learning for Video Analysis

Author: Fabien Baradel

Published: 2020

Total Pages: 171

With the massive increase of video content on the Internet and beyond, automatic understanding of visual content could impact many application fields, such as robotics, health care, content search, and filtering. The goal of this thesis is to provide methodological contributions in Computer Vision and Machine Learning for automatic content understanding from videos. We focus on two problems: fine-grained human action recognition and visual reasoning from object-level interactions.

In the first part of this manuscript, we tackle fine-grained human action recognition. We introduce two trained attention mechanisms that operate on the visual content and are guided by the articulated human pose. The first method automatically draws attention to important pre-selected points of the video, conditioned on learned features extracted from the articulated human pose. We show that this mechanism improves performance on the final task and provides a good way to visualize the most discriminative parts of the visual content. The second method goes beyond pose-based human action recognition: it automatically identifies unstructured feature clouds of interest in the video using contextual information. Furthermore, we introduce a learned distributed system that aggregates the features in a recurrent manner and takes decisions in a distributed way. We demonstrate that we can achieve better performance than previously obtained, without using articulated pose information at test time.

In the second part of the thesis, we investigate video representations from an object-level perspective. Given a set of detected persons and objects in the scene, we develop a method that learns to infer the important object interactions through space and time using only video-level annotation. This allows us to identify important objects and object interactions for a given action, as well as potential dataset bias.

Finally, in a third part, we go beyond classification and supervised learning from visual content by tackling causality in interactions, in particular the problem of counterfactual learning. We introduce a new benchmark, CoPhy, in which, after watching a video, the task is to predict the outcome after modifying the initial state of the video. We develop a method based on object-level interactions that can infer object properties without supervision, as well as future object locations after the intervention.
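The attention mechanisms described above share a common core: score each local feature against a learned query, normalize the scores with a softmax, and pool the features by their weights. A minimal sketch of that core, with made-up feature vectors and query (not the thesis's actual architecture):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attention_pool(features, query):
    """Soft attention: dot-product score of each feature vector against a
    query, softmax the scores, return the weighted sum and the weights."""
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    pooled = [sum(w * f[d] for w, f in zip(weights, features))
              for d in range(len(features[0]))]
    return pooled, weights

# Three per-point feature vectors; the query favours the second one.
features = [[1.0, 0.0], [0.0, 5.0], [0.5, 0.5]]
query = [0.0, 1.0]
pooled, weights = attention_pool(features, query)
print(max(range(3), key=lambda i: weights[i]))  # 1
```

The weights themselves are what make such mechanisms interpretable: visualizing them over the video shows which points or regions the model considered discriminative.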


Applied Video Processing in Surveillance and Monitoring Systems

Author: Dey, Nilanjan

Publisher: IGI Global

Published: 2016-10-11

Total Pages: 321

ISBN-13: 1522510230

Video monitoring has become a vital part of modern society, helping to prevent crime, promote safety, and track daily activities such as traffic. As technology in the area continues to improve, it is necessary to evaluate how video is processed in order to improve image quality. Applied Video Processing in Surveillance and Monitoring Systems investigates emergent techniques in video and image processing, covering topics such as segmentation, noise elimination, encryption, and classification. Featuring real-time applications, empirical research, and vital frameworks within the field, this publication is a critical reference source for researchers, professionals, engineers, academicians, advanced-level students, and technology developers.


Human Action Recognition with Depth Cameras

Author: Jiang Wang

Publisher: Springer Science & Business Media

Published: 2014-01-25

Total Pages: 65

ISBN-13: 331904561X

Action recognition technology has many real-world applications in human-computer interaction, surveillance, video retrieval, retirement home monitoring, and robotics. The commoditization of depth sensors has also opened up further applications that were not feasible before. This text focuses on feature representation and machine learning algorithms for action recognition from depth sensors. After presenting a comprehensive overview of the state of the art, the authors then provide in-depth descriptions of their recently developed feature representations and machine learning techniques, including lower-level depth and skeleton features, higher-level representations to model the temporal structure and human-object interactions, and feature selection techniques for occlusion handling. This work enables the reader to quickly familiarize themselves with the latest research, and to gain a deeper understanding of recently developed techniques. It will be of great use for both researchers and practitioners.
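As a toy example of the lower-level skeleton features such a pipeline might start from, pairwise distances between 3D joint positions form a simple descriptor that is invariant to translation and rotation of the body. The joint coordinates below are invented for illustration and do not come from the book:

```python
import math

def pairwise_joint_distances(skeleton):
    """Simple skeleton descriptor: Euclidean distances between all pairs
    of 3D joint positions. Rigid motions of the whole skeleton leave the
    descriptor unchanged, which helps with viewpoint variation."""
    n = len(skeleton)
    feats = []
    for i in range(n):
        for j in range(i + 1, n):
            feats.append(math.dist(skeleton[i], skeleton[j]))
    return feats

# Four joints from a depth sensor (x, y, z in metres).
skeleton = [(0.0, 1.7, 2.0), (0.0, 1.4, 2.0), (0.2, 1.1, 2.1), (0.0, 0.8, 2.0)]
feats = pairwise_joint_distances(skeleton)
print(len(feats))  # 6 = C(4, 2)
```

The higher-level representations the book describes then model how descriptors like this evolve over time and interact with objects in the scene.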


Learning to Recognize Human Actions

Author: Albert Clapés i Sintes

Published: 2019

Total Pages: 126

"Action recognition is a very challenging and important problem in computer vision. Researchers working on this field aspire to provide computers with the ability to visually perceive human actions - that is, to observe, interpret, and understand human-related events that occur in the physical environment merely from visual data. The applications of this technology are numerous: human-machine interaction, e-health, monitoring/surveillance, and content-based video retrieval, among others. Hand-crafted methods dominated the field until the apparition of the first successful deep learning-based action recognition works. Although earlier deep-based methods underperformed with respect to hand-crafted approaches, these slowly but steadily improved to become state-of-the-art, eventually achieving better results than hand-crafted ones. Still, hand-crafted approaches can be advantageous in certain scenarios, specially when not enough data is available to train very large deep models or simply to be combined with deep-based methods to further boost the performance. Hence, showing how hand-crafted features can provide extra knowledge the deep networks are not able to easily learn about human actions. This Thesis concurs in time with this change of paradigm and, hence, reflects it into two distinguished parts. In the first part, we focus on improving current successful hand-crafted approaches for action recognition and we do so from three different perspectives. Using the dense trajectories framework as a backbone: first, we explore the use of multi-modal and multi-view input data to enrich the trajectory descriptors. Second, we focus on the classification part of action recognition pipelines and propose an ensemble learning approach, where each classifier learns from a different set of local spatiotemporal features to then combine their outputs following an strategy based on the Dempster-Shaffer Theory. 
And third, we propose a novel hand-crafted feature extraction method that constructs a mid-level feature description to better model long-term spatiotemporal dynamics within action videos. Moving to the second part of the Thesis, we start with a comprehensive study of the current deep-learning based action recognition methods. We review both fundamental and cutting edge methodologies reported during the last few years and introduce a taxonomy of deep-learning methods dedicated to action recognition. In particular, we analyze and discuss how these handle the temporal dimension of data. Last but not least, we propose a residual recurrent network for action recognition that naturally integrates all our previous findings in a powerful and promising framework." -- TDX.
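The Dempster-Shafer combination used in the ensemble step can be illustrated with a deliberately simplified sketch that restricts mass functions to singleton hypotheses (the full theory also allows mass on compound sets of hypotheses); the class labels and mass values below are invented:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination, simplified to mass functions over
    singleton hypotheses: multiply agreeing masses, discard the
    conflicting mass, and renormalize."""
    labels = set(m1) | set(m2)
    combined = {h: m1.get(h, 0.0) * m2.get(h, 0.0) for h in labels}
    conflict = 1.0 - sum(combined.values())
    if conflict >= 1.0:
        raise ValueError("total conflict: masses cannot be combined")
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

# Two classifiers' beliefs over action classes; both lean towards "walk".
m_a = {"walk": 0.6, "run": 0.3, "jump": 0.1}
m_b = {"walk": 0.5, "run": 0.4, "jump": 0.1}
fused = dempster_combine(m_a, m_b)
print(max(fused, key=fused.get))  # walk
```

An ensemble can fold in one classifier at a time this way, letting agreeing classifiers reinforce each other while conflicting evidence is down-weighted.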


Deep Learning for Human Motion Analysis

Author: Natalia Neverova

Published: 2016

The research goal of this work is to develop learning methods that advance the automatic analysis and interpretation of human motion from different perspectives and various sources of information, such as images, video, depth, mocap data, audio, and inertial sensors. For this purpose, we propose several deep neural models and associated training algorithms for supervised classification and semi-supervised feature learning, as well as for modelling temporal dependencies, and show their efficiency on a set of fundamental tasks, including detection, classification, parameter estimation, and user verification.

First, we present a method for human action and gesture spotting and classification based on multi-scale and multi-modal deep learning from visual signals (such as video, depth, and mocap data). Key to our technique is a training strategy which exploits, first, careful initialization of individual modalities and, second, gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving the uniqueness of each modality-specific representation.

Moving from one-to-N mapping to continuous evaluation of gesture parameters, we then address the problem of hand pose estimation and present a new method for regression on depth images, based on semi-supervised learning with convolutional deep neural networks, where raw depth data is fused with an intermediate representation in the form of a segmentation of the hand into parts.

In separate but related work, we explore convolutional temporal models for human authentication based on motion patterns. In this project, the data is captured by inertial sensors (such as accelerometers and gyroscopes) built into mobile devices. We propose an optimized shift-invariant dense convolutional mechanism and incorporate the discriminatively trained dynamic features into a probabilistic generative framework that takes temporal characteristics into account. Our results demonstrate that human kinematics convey important information about user identity and can serve as a valuable component of multi-modal authentication systems.
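The ModDrop idea of randomly dropping whole modality channels during fusion can be sketched in a few lines. This is an illustrative data-side simplification (the actual method drops channels during network training), and the feature vectors are made up:

```python
import random

def mod_drop(modalities, p_drop, rng):
    """ModDrop-style regularizer: with probability p_drop, zero out an
    entire modality's feature vector so the fusion model learns
    cross-modal correlations without over-relying on any single stream."""
    out = []
    for feats in modalities:
        if rng.random() < p_drop:
            out.append([0.0] * len(feats))  # drop the whole modality
        else:
            out.append(list(feats))  # keep the modality unchanged
    return out

rng = random.Random(0)  # seeded for reproducibility
modalities = [[0.2, 0.4], [1.0, 1.0], [0.5, 0.1]]  # e.g. video, depth, mocap
dropped = mod_drop(modalities, p_drop=0.5, rng=rng)
print(sum(1 for m in dropped if any(m)))  # number of surviving modalities
```

At test time all modalities are kept; the random dropping applies only while training, forcing each modality-specific representation to remain useful on its own.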