Publications

Kinematic motion representation in Cine-MRI to support cardiac disease classification

Abstract

Cine-MRI sequences provide detailed anatomical and motion information of the heart over one full cardiac cycle, which is fundamental to support diagnosis and to follow personalised treatments. From such sequences, expert cardiologists can estimate cardiac performance indices by manually delineating shapes and evaluating temporal geometrical changes. These estimates, nevertheless, depend on proper manual delineation of the ventricles and restrict the analysis to standard dynamic indices, losing sight of hidden dynamic relationships that could be related to certain cardiac diseases. This work introduces a cardiac motion descriptor that fully describes kinematic heart patterns computed from local velocity fields along the cycle. Firstly, a velocity field is recovered between consecutive basal slices and is thereafter characterised with differential kinematic measures such as velocity, acceleration, and divergence, among others. Then, a regional multiscale partition recovers regional motion patterns, coding the incidence of motion measures as kinematic occurrence histograms. The set of regional motion patterns forms a motion descriptor that fully describes heart dynamics and allows automatic classification of cardiac pathologies. The motion descriptor, mapped to a Random Forest classifier, was evaluated over two different datasets of Cine-MRI volumes, achieving average accuracies of 80.58% (45 cases, four conditions) and 75.23% (100 cases, five conditions).
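
A minimal, illustrative sketch of this kind of pipeline (not the paper's implementation) is shown below: dense optical flow between consecutive frames, simple differential kinematic measures, regional occurrence histograms, and a Random Forest classifier. The grid size, bin count, and flow parameters are assumptions.

    # Illustrative sketch only: optical flow, kinematic measures, regional
    # histograms, and a Random Forest classifier over the stacked descriptors.
    import cv2
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def kinematic_descriptor(frames, grid=4, bins=8):
        """frames: list of grayscale uint8 slices; returns one feature vector
        (assumes a fixed number of frames per sequence)."""
        feats = []
        for prev, curr in zip(frames[:-1], frames[1:]):
            flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
            div = np.gradient(flow[..., 0], axis=1) + np.gradient(flow[..., 1], axis=0)
            h, w = mag.shape
            for i in range(grid):
                for j in range(grid):
                    r = (slice(i * h // grid, (i + 1) * h // grid),
                         slice(j * w // grid, (j + 1) * w // grid))
                    hist, _ = np.histogram(ang[r], bins=bins, range=(0, 2 * np.pi),
                                           weights=mag[r])
                    feats.append(hist)
                    feats.append([div[r].mean()])  # one divergence statistic per region
        return np.concatenate(feats)

    # X: one descriptor per sequence, y: cardiac condition labels
    # clf = RandomForestClassifier(n_estimators=200).fit(X, y)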

Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization

Deep learning representations to support COVID-19 diagnosis on CT slices

Abstract

This work explores deep learning representations, trained on thoracic CT slices, to automatically distinguish COVID-19 disease from control samples.

Biomédica


A COVID-19 patient severity stratification using a 3D convolutional strategy on CT-scans

Abstract

This work introduces a 3D deep learning methodology to stratify patients according to the severity of lung infection caused by COVID-19 on computed tomography (CT) images. A set of volumetric attention maps was also obtained to explain the results and support the diagnostic task. The approach was validated on a dataset of 350 patients diagnosed by RT-PCR assay as either negative (control, 175) or positive (COVID-19, 175). Additionally, the patients were graded (0-25) by two expert radiologists according to the extent of lobar involvement, and these gradings were used to define five COVID-19 severity categories. The model yields an average accuracy of 60% on the multi-severity classification task. Additionally, a set of Mann-Whitney U significance tests was conducted to compare the severity groups. Results show that the predicted severity scores differ significantly (p < 0.01) between all compared severity groups.
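
As an illustration of the group comparison, the sketch below runs a Mann-Whitney U test on two sets of severity scores; the score values and group sizes are synthetic placeholders, not data from the study.

    # Illustrative sketch: Mann-Whitney U test between the severity scores of
    # two patient groups. The scores below are synthetic placeholders.
    import numpy as np
    from scipy.stats import mannwhitneyu

    rng = np.random.default_rng(0)
    scores_group_a = rng.normal(5, 2, size=35)    # e.g. a low-severity group
    scores_group_b = rng.normal(15, 3, size=35)   # e.g. a high-severity group

    stat, p_value = mannwhitneyu(scores_group_a, scores_group_b,
                                 alternative='two-sided')
    print(f"U = {stat:.1f}, p = {p_value:.3g}")   # p < 0.01 indicates a significant difference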

IEEE International Symposium on Biomedical Imaging (ISBI), 2021.


How important is motion in sign language translation?

Abstract

More than 70 million people use at least one Sign Language (SL) as their main channel of communication. Nevertheless, the absence of effective mechanisms to translate massive amounts of information among sign, written, and spoken languages is a main cause of the negligible inclusion of deaf people in society. Therefore, automatic SL recognition systems have been widely proposed to support the characterization of the sign structure. Today, natural and continuous SL recognition remains an open research problem due to multiple spatio-temporal shape variations, challenging visual sign characterization, and the non-linear correlation among signs used to express a message. This work introduces a compact sign-to-text architecture that explores motion as an alternative to support sign translation. Such a characterization is robust to appearance variance and partially tolerant to geometrical variations. The proposed representation focuses on the main spatio-temporal regions associated with each word. The architecture was evaluated on a purpose-built SL dataset (LSCDv1) dedicated to the study of motion and on the state-of-the-art RWTH-Phoenix dataset. On LSCDv1, the best configuration reports a BLEU-4 score of 63.04 on the test set. On RWTH-Phoenix, the proposed strategy achieved a test BLEU-4 score of 4.56, improving previous results under similar reduced conditions.
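
For reference, the sketch below shows how a corpus-level BLEU-4 score of the kind reported above can be computed with NLTK; the reference and hypothesis sentences are made-up examples, not data from the paper.

    # Illustrative sketch: corpus-level BLEU-4 over tokenized translations.
    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

    references = [[["the", "weather", "will", "be", "sunny", "tomorrow"]]]  # list of reference lists per sample
    hypotheses = [["the", "weather", "is", "sunny", "tomorrow"]]

    bleu4 = corpus_bleu(references, hypotheses,
                        weights=(0.25, 0.25, 0.25, 0.25),
                        smoothing_function=SmoothingFunction().method1)
    print(f"BLEU-4: {100 * bleu4:.2f}")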

IET Computer Vision Journal, 2020

Understanding Motion in Sign Language: A New Structured Translation Dataset

Abstract

Sign languages are the main mechanism of communication and interaction in the Deaf community. These languages are highly variable, with divergences in gloss representation, sign configuration, and multiple variants, among others, due to cultural and regional aspects. Current methods for automatic and continuous sign translation include robust deep learning models that encode the visual representation of signs. Despite significant progress, the convergence of such models requires huge amounts of data to exploit the sign representation, resulting in very complex models. This fact is associated with the high variability of signs, but also with the limited exploration of many language components that support communication. For instance, gesture motion and grammatical structure are fundamental components of communication that can help resolve visual and geometrical sign misinterpretations during video analysis. This work introduces a new Colombian Sign Language Translation Dataset (CoL-SLTD) that focuses on motion and structural information and could be a significant resource to determine the contribution of several language components. Additionally, an encoder-decoder deep strategy is introduced to support automatic translation, including attention modules that capture short, long, and structural kinematic dependencies and their respective relationships with sign recognition. The evaluation on CoL-SLTD demonstrates the relevance of the motion representation, allowing compact deep architectures to perform the translation. Also, the proposed strategy shows promising results, achieving BLEU-4 scores of 35.81 and 4.65 on the signer-independent and unseen-sentence tasks, respectively.
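
A minimal sketch of an encoder-decoder with attention over per-frame motion features is shown below; it assumes GRU recurrent layers and dot-product attention, and illustrates the general idea only, not the architecture proposed in the paper.

    # Minimal sketch (PyTorch): GRU encoder-decoder with dot-product attention
    # over per-frame motion features. Names and sizes are illustrative only.
    import torch
    import torch.nn as nn

    class MotionSeq2Text(nn.Module):
        def __init__(self, feat_dim, vocab_size, hidden=256):
            super().__init__()
            self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
            self.embed = nn.Embedding(vocab_size, hidden)
            self.decoder = nn.GRU(hidden, hidden, batch_first=True)
            self.out = nn.Linear(2 * hidden, vocab_size)

        def forward(self, motion_feats, target_tokens):
            enc_out, h = self.encoder(motion_feats)                            # (B, T, H)
            dec_out, _ = self.decoder(self.embed(target_tokens), h)           # (B, L, H)
            attn = torch.softmax(dec_out @ enc_out.transpose(1, 2), dim=-1)   # (B, L, T)
            context = attn @ enc_out                                           # (B, L, H)
            return self.out(torch.cat([dec_out, context], dim=-1))            # (B, L, vocab)

    # model = MotionSeq2Text(feat_dim=128, vocab_size=2000)
    # logits = model(torch.randn(2, 60, 128), torch.randint(0, 2000, (2, 12)))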

Asian Conference on Computer Vision, Kyoto, Japan, 2020

Sign language translation using motion filters and attention models

Latin American Meeting In Artificial Intelligence, Montevideo, Uruguay, 2019 (Poster)

Regional multiscale motion representation for cardiac disease prediction

Abstract

Heart characterization is a challenging task due to the non-linear dynamic performance and the strong shape deformation during the cardiac cycle. This work presents a regional multiscale motion representation of cardiac structures that is able to recognize pathologies on cine-MRI sequences. Firstly, a dense optical flow that considers large displacements is computed to obtain a velocity field representation. Then, regional dynamic patterns are coded into a multiscale scheme, from coarse to fine, highlighting the most relevant cardiac patterns that persist across the different scales. The resulting motion descriptor is formed by a set of flow orientation occurrences computed over the multiscale regions. This descriptor is mapped to a previously trained Random Forest classifier to obtain a prediction of the cardiac condition. The proposed strategy was evaluated over a set of 45 cine-MRI volumes, achieving an average F1-score of 77.83% on the task of binary classification among four cardiac conditions.
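
The sketch below illustrates the general idea of a coarse-to-fine regional partition of a velocity field with a flow-orientation histogram per region; the grid sizes and bin count are assumptions, and the function name is hypothetical.

    # Illustrative sketch: coarse-to-fine regional flow-orientation histograms
    # computed from a single velocity field.
    import numpy as np

    def multiscale_flow_histogram(flow, scales=(1, 2, 4), bins=8):
        """flow: (H, W, 2) velocity field; returns concatenated regional histograms."""
        mag = np.hypot(flow[..., 0], flow[..., 1])
        ang = np.arctan2(flow[..., 1], flow[..., 0]) % (2 * np.pi)
        h, w = mag.shape
        feats = []
        for s in scales:                              # 1x1 (coarse) ... 4x4 (fine)
            for i in range(s):
                for j in range(s):
                    r = (slice(i * h // s, (i + 1) * h // s),
                         slice(j * w // s, (j + 1) * w // s))
                    hist, _ = np.histogram(ang[r], bins=bins, range=(0, 2 * np.pi),
                                           weights=mag[r])
                    feats.append(hist / (hist.sum() + 1e-8))
        return np.concatenate(feats)

    # Per-frame descriptors are then stacked and mapped to a Random Forest classifier.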

XXII Symposium on Image, Signal Processing and Artificial Vision (STSIVA), Bucaramanga, Colombia, 2019

Towards On-Line Sign Language Recognition Using Cumulative SD-VLAD Descriptors

Abstract

On-line prediction of sign language gestures is nowadays a fundamental task to support multimedia interpretation for deaf communities. This work presents a novel approach to recognize partial sign language gestures by cumulatively coding different intervals of the video sequences. The method starts by computing volumetric patches that contain kinematic information from different appearance flow primitives. Then, several sequential intervals are learned to carry out the task of partial recognition. For each new video, a cumulative shape difference VLAD (SD-VLAD) representation is obtained at different intervals of the video. Each SD-VLAD descriptor recovers mean and variance motion information as the signature of the gesture. Along the video, each partial representation is mapped to a support vector machine model to obtain a gesture prediction, making the approach usable in on-line scenarios. The proposed approach was evaluated on a public dataset with 64 different classes recorded in 3200 samples. The approach recognizes sign gestures using only 20% of the sequence with an average accuracy of 53.8%, and reaches 80% accuracy with 60% of the information. For complete sequences, the proposed approach achieves 85% accuracy on average.
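
A minimal sketch of the cumulative, partial-observation prediction scheme is shown below; the sdvlad_descriptor helper and the evaluation fractions are hypothetical placeholders, not the paper's code.

    # Illustrative sketch: classify a gesture from growing fractions of a video
    # with a linear SVM. `sdvlad_descriptor` is a hypothetical encoder that
    # would return the cumulative SD-VLAD vector of the observed frames.
    import numpy as np
    from sklearn.svm import SVC

    def cumulative_predict(frames, model, fractions=(0.2, 0.4, 0.6, 0.8, 1.0)):
        predictions = {}
        for f in fractions:
            partial = frames[: max(1, int(f * len(frames)))]
            desc = sdvlad_descriptor(partial)            # hypothetical helper
            predictions[f] = model.predict(desc.reshape(1, -1))[0]
        return predictions

    # model = SVC(kernel='linear').fit(train_descriptors, train_labels)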

Colombian Conference on Computing, Cartagena, Colombia, 2018

A kinematic gesture representation based on shape difference VLAD for sign language recognition

Abstract

Automatic sign language recognition (SLR) is a fundamental task to help with the inclusion of the deaf community in society, nowadays facilitating many conventional multimedia interactions. This work proposes a novel approach that represents gestures in SLR as a shape difference VLAD (SD-VLAD) mid-level coding of kinematic primitives captured along the videos. This representation captures locally salient motions together with regional dominant patterns developed by the articulators along utterances. Also, the special VLAD representation not only quantifies local motion patterns but also captures the shape of the motion descriptors, achieving a proper regional gesture characterization. The proposed approach achieved an average accuracy of 85.45% on a corpus of 64 sign words captured in 3200 videos. Additionally, on the Boston sign dataset, the proposed approach achieves competitive results with 82% accuracy on average.
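
For context, the sketch below shows standard VLAD coding with a k-means codebook; the shape-difference variant described above additionally aggregates second-order (variance-like) statistics, which this simplified version omits.

    # Illustrative sketch: standard VLAD coding of local kinematic descriptors
    # with a k-means codebook (first-order residuals only).
    import numpy as np
    from sklearn.cluster import KMeans

    def vlad_encode(local_descriptors, codebook):
        """local_descriptors: (N, D) array; codebook: fitted KMeans model."""
        centers = codebook.cluster_centers_               # (K, D)
        assignments = codebook.predict(local_descriptors)
        vlad = np.zeros_like(centers)
        for k in range(len(centers)):
            members = local_descriptors[assignments == k]
            if len(members):
                vlad[k] = (members - centers[k]).sum(axis=0)
        vlad = vlad.flatten()
        return vlad / (np.linalg.norm(vlad) + 1e-12)      # L2 normalisation

    # codebook = KMeans(n_clusters=64, n_init=10).fit(training_descriptors)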

International Conference on Computer Vision and Graphics, Warsaw, Poland, 2018

* Equal authorship statement