Research
My work at GTI-UPM, Universidad Politécnica de Madrid.
Research Lines
- Egocentric Video Understanding
Active. First-person video perception and action recognition from wearable cameras, with a focus on understanding human activity from the camera wearer's perspective.
- VLM Optimization for Edge Devices
Active. Quantization and efficient inference of vision-language models on constrained hardware such as Raspberry Pi boards, embedded systems, and other resource-limited devices.
- Embodied Intelligence
Active. Connecting perception models to physical-world interaction and action understanding, bridging the gap between what a model sees and what it can do.
Publications
- CV4Animals Workshop, CVPR 2024 · 2024
AnimalMotionCLIP: Embedding Motion in CLIP for Animal Behavior Analysis
We extend CLIP for animal behavior recognition by interleaving video frames with optical flow, adding motion awareness to a model designed for static images. We compare several temporal aggregation strategies (dense, semi-dense, sparse) and report state-of-the-art results on the Animal Kingdom dataset.
- MDPI Sensors · 2023
Real-time monocular skeleton-based hand gesture recognition using 3D-Jointsformer
Automatic hand gesture recognition in video sequences has widespread applications, ranging from home automation to sign language interpretation and clinical operations. A hybrid approach combining 3D Convolutional Neural Networks (3D-CNNs) and Transformers is proposed: a 3D-CNN computes high-level semantic skeleton embeddings capturing local spatial and temporal characteristics, while a Transformer with self-attention efficiently captures long-range temporal dependencies. On the Briareo and Multimodal Hand Gesture datasets, the method achieves accuracies of 95.49% and 97.25%, respectively, while running in real time on a standard CPU.