The Importance of the Instantaneous Phase in Detecting Faces with Convolutional Neural Networks
Large-scale training of deep learning methods requires significant computational resources. Transfer learning tends to speed up learning, but it produces complex networks that are very hard to interpret. This paper uses a low-complexity image processing system to investigate the advantages of AM-FM representations over raw images for face detection. Instead of raw images, we consider AM, FM, or AM-FM representations derived from a low-complexity filterbank and processed through a reduced LeNet-5. The results show significant advantages for FM representations. FM images enabled very fast training over a few epochs, while neither instantaneous amplitude (IA) nor raw images produced any meaningful training for such a low-complexity network. Furthermore, training with FM images was 7× to 11× faster per epoch while using 123× fewer parameters than a reduced-complexity MobileNetV2, at comparable performance (AUC of 0.79 vs 0.80).
Fast and Scalable 2D Convolutions and Cross-correlations for Processing Image Databases and Videos on CPUs
The dominant use of Convolutional Neural Networks (CNNs) in several image and video analysis tasks necessitates a careful re-evaluation of the underlying software libraries used to compute them on large-scale image and video databases. We focus on methods that can be applied to large image databases or to videos with large image sizes. We develop a method that maximizes throughput through the use of vector-based memory I/O and optimized 2D FFT libraries that run on all available physical cores. We also show how to decompose arbitrarily large images into smaller, optimal blocks that can be processed effectively through overlap-and-add. Our approach outperforms TensorFlow for 5x5 kernels and significantly outperforms TensorFlow for 11x11 kernels.
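The block decomposition described above can be illustrated with a minimal overlap-and-add sketch. It assumes NumPy's FFT as a stand-in for the optimized, multi-core FFT libraries mentioned in the abstract; the function name and the default block size are hypothetical, and the sketch does not attempt to choose optimal blocks.

```python
import numpy as np

def overlap_add_conv2d(image, kernel, block=(64, 64)):
    """Full 2D linear convolution computed block by block with FFTs.

    The block size is an illustrative assumption, not the optimal
    block size derived in the paper.
    """
    H, W = image.shape
    kh, kw = kernel.shape
    bh, bw = block
    # Each block convolved with the kernel fits in this FFT size,
    # so the circular convolution equals the linear one.
    fft_shape = (bh + kh - 1, bw + kw - 1)
    K = np.fft.rfft2(kernel, s=fft_shape)  # kernel spectrum, reused for every block
    out = np.zeros((H + kh - 1, W + kw - 1))
    for r in range(0, H, bh):
        for c in range(0, W, bw):
            tile = image[r:r + bh, c:c + bw]  # may be smaller at the borders
            th, tw = tile.shape
            y = np.fft.irfft2(np.fft.rfft2(tile, s=fft_shape) * K, s=fft_shape)
            # Add the block result; overlapping tails of neighbors sum up.
            out[r:r + th + kh - 1, c:c + tw + kw - 1] += y[:th + kh - 1, :tw + kw - 1]
    return out
```

The result has the "full" convolution size (H + kh - 1) by (W + kw - 1). For cross-correlation, the same routine can be called with the kernel flipped along both axes.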
Long Term Object Detection and Tracking in Collaborative Learning Environments, 2021
Long-term object detection requires the integration of frame-based
results over several seconds. For non-deformable objects, long-term
detection is often addressed using object detection followed by video
tracking. Unfortunately, tracking is inapplicable to objects that undergo
dramatic changes in appearance from frame to frame. As a related
example, we study hand detection over long video recordings in
collaborative learning environments. More specifically, we develop
long-term hand detection methods that can deal with partial occlusions
and dramatic changes in appearance.
Our approach integrates object detection, followed by time projections,
clustering, and small region removal to provide effective hand detection
over long videos. The hand detector achieved an average precision (AP) of
72% at 0.5 intersection over union (IoU). The detection results
improved to 81% with our optimized approach for data augmentation.
The method runs at 4.7× real-time with an AP of 81% at 0.5 IoU.
Our method reduced the number of false-positive hand
detections by 80% by improving IoU ratios from 0.2 to 0.5. The overall
hand detection system runs at 4× real-time.
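The IoU threshold used in the reported AP values can be made concrete with a short sketch. The box format (x_min, y_min, x_max, y_max) and the function name are assumptions made for illustration; they are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this layout is an
    assumption made for the example.
    """
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as correct at the 0.5 threshold when
# iou(detected_box, ground_truth_box) >= 0.5.
```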
The Importance of the Instantaneous Phase in Detecting Faces with Convolutional Neural Networks, 2019
This thesis considers the problem of detecting faces
in the AOLME video dataset. It examines the
impact of using the instantaneous phase for block-based
face detection on AOLME. For comparison, the thesis evaluates three inputs:
the frequency modulation (FM) image based on the instantaneous phase,
the instantaneous amplitude (AM) image, and the original grayscale image. To generate the FM and
AM inputs, the thesis uses dominant component analysis, which aims to decrease the training overhead
while maintaining interpretability.
The results indicate that the FM image yielded about the same performance as the
MobileNet V2 architecture (AUC of 0.78 vs 0.79), with vastly reduced training times.
Training was 7x faster on a desktop with an Intel Xeon and a GTX 1080, and 11x faster on a
laptop with an Intel i5 and a GTX 1050. Furthermore, the proposed
architecture trains 123x fewer parameters than MobileNet V2 requires.
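As a rough illustration of how FM and AM inputs can be derived with dominant component analysis, the sketch below filters an image with a small bank of one-sided Gaussian band-pass filters and keeps, at each pixel, the channel with the largest amplitude. The filterbank design, its parameters (freqs, n_orient, sigma), and the function names are all assumptions for this example and are not the filterbank used in the thesis.

```python
import numpy as np

def gaussian_bank(shape, freqs=(0.1, 0.2), n_orient=4, sigma=0.05):
    """One-sided Gaussian band-pass filters defined in the frequency domain.

    Center frequencies (cycles/pixel), orientations, and bandwidth are
    hypothetical values chosen for illustration.
    """
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    bank = []
    for f0 in freqs:
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            cx, cy = f0 * np.cos(theta), f0 * np.sin(theta)
            bank.append(np.exp(-((fx - cx) ** 2 + (fy - cy) ** 2) / (2 * sigma ** 2)))
    return np.stack(bank)

def dominant_component(image, freqs=(0.1, 0.2), n_orient=4, sigma=0.05):
    """Return (IA, IP, FM image) estimated from the dominant channel."""
    bank = gaussian_bank(image.shape, freqs, n_orient, sigma)
    spectrum = np.fft.fft2(image - image.mean())
    # Factor 2 compensates for keeping only one side of the spectrum.
    responses = 2.0 * np.fft.ifft2(bank * spectrum)
    # Dominant component: the channel with the largest amplitude per pixel.
    idx = np.abs(responses).argmax(axis=0)
    dom = np.take_along_axis(responses, idx[None], axis=0)[0]
    ia = np.abs(dom)    # instantaneous amplitude (AM input)
    ip = np.angle(dom)  # instantaneous phase
    fm = np.cos(ip)     # FM image based on the instantaneous phase
    return ia, ip, fm
```

For a pure sinusoidal grating at one of the filter center frequencies, this sketch returns an approximately constant IA and an FM image that matches the grating, which is the behavior expected of an AM-FM demodulation.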
Hand Movement Detection in Collaborative Learning Environment Videos, 2018
This thesis explores detection of hand movement using color and optical flow.
Exploratory analysis considered the problem component-wise, on components created
from thresholds applied to motion and color. The proposed approach uses patch
color classification, space-time patches of video, and histogram of optical flow. The
approach was validated on video patches extracted from 15 AOLME video clips. The
approach achieved an average accuracy of 84% and an average receiver operating
characteristic area under curve (ROC AUC) of 89%.
UNM Digital repository Download Software
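A histogram of optical flow can be sketched as a magnitude-weighted histogram of flow directions. The exact formulation used in the thesis is not given in the abstract, so the bin count, the magnitude threshold, and the function name below are assumptions.

```python
import numpy as np

def flow_histogram(flow, n_bins=8, min_mag=1e-3):
    """Magnitude-weighted orientation histogram of a dense flow field.

    flow has shape (H, W, 2) holding (dx, dy) per pixel. n_bins and
    min_mag are illustrative choices. The histogram is normalized to
    sum to 1 when any motion is present.
    """
    dx = flow[..., 0].ravel()
    dy = flow[..., 1].ravel()
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx) % (2 * np.pi)  # directions in [0, 2*pi)
    keep = mag > min_mag                     # ignore near-static pixels
    hist, _ = np.histogram(ang[keep], bins=n_bins,
                           range=(0, 2 * np.pi), weights=mag[keep])
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Applied to each space-time patch, a vector like this could serve as a motion feature alongside the patch color classification.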
Context-Sensitive Human Activity Classification in Video Utilizing Object Recognition and Motion Estimation, 2017
This thesis explores the use of color-based object detection in conjunction with
contextualization of object interaction to isolate motion vectors specific to an activity
sought within uncropped video. Feature extraction in this thesis differs significantly
from other methods by using geometric relationships between objects to infer
context. The approach avoids the need for video cropping or substantial preprocessing
by significantly reducing the number of features analyzed in a single frame. The
method was tested using 43 uncropped video clips with 620 video frames for writing,
1050 for typing, and 1755 frames for talking. Using simple KNN classification, the
method gave accuracies of 72.6% for writing, 71% for typing, and 84.6% for
talking. Classification accuracy improved to 92.5% (writing), 82.5% (typing), and 99.7%
(talking) with the use of a trained Deep Neural Network.
UNM Digital repository Download Software
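The simple KNN classification step mentioned above can be sketched as a majority vote among the nearest training samples. The feature vectors, the Euclidean distance, the value of k, and the function name are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=3):
    """Classify each query by majority vote among its k nearest neighbors.

    Euclidean distance and k=3 are illustrative assumptions.
    """
    train_x = np.asarray(train_x, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    train_y = np.asarray(train_y)
    # Pairwise distances, shape (n_queries, n_train).
    dist = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    preds = []
    for labels in train_y[nearest]:
        values, counts = np.unique(labels, return_counts=True)
        preds.append(values[counts.argmax()])
    return np.array(preds)
```

In this setting, each row of train_x would hold the motion features extracted for one frame, and train_y would hold the activity labels (writing, typing, or talking).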
Human Attention Detection Using AM-FM Representations, 2016
The thesis explores phase-based solutions for (i) detecting faces,
(ii) detecting the backs of heads, (iii) jointly detecting faces and the backs of heads, and (iv)
determining whether the head is looking to the left or the right, using standard video cameras
without any control over the imaging geometry. The proposed phase-based approach
is based on the development of simple and robust methods that rely on the use of
Amplitude Modulation - Frequency Modulation (AM-FM) models. For the students facing the camera,
the method was able to correctly classify 97.1% of them looking to the left and 95.9%
of them looking to the right. For the students facing away from the camera, the
method was able to correctly classify 87.6% of them looking to the left and 93.3%
of them looking to the right. The results indicate that AM-FM based methods hold
great promise for analyzing human activity videos.
UNM Digital repository
Distributed and Scalable Video Analysis Architecture for Human Activity Recognition Using Cloud Services, 2016
This thesis proposes an open-source, maintainable system for detecting human
activity in large video datasets using scalable hardware architectures. The system
is validated by detecting writing and typing activities that were collected as part of
the Advancing Out of School Learning in Mathematics and Engineering (AOLME)
project. The implementation of the system using Amazon Web Services (AWS)
is shown to be both horizontally and vertically scalable. The software associated
with the system was designed to be robust so as to facilitate reproducibility and
extensibility for future research.
UNM Digital repository
Lesson Plan and Workbook for Introducing Python Game Programming to Support the Advancing Out-of-School Learning in Mathematics and Engineering (AOLME) Project, 2014
The Advancing Out-of-School Learning in Mathematics and Engineering
(AOLME) project was created specifically for providing integrated mathematics and
engineering experiences to middle-school students from under-represented groups.
The thesis presents a new approach to introducing game programming to
middle-school students who have undergone AOLME training while still maintaining a fun
and relaxed environment. The thesis provides a discussion of three different
educational, visual-programming environments that are also designed for younger
programmers and provides motivation for the proposed approach based on Python.
The thesis details interactive activities that are intended for supporting the students
to develop their own games in Python.
UNM Digital repository
Some of the material is based upon work supported by the National Science Foundation under Grants No. 1613637 and No. 1842220. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.