Enmin Zhong

Currently Building

EgoAgent

Building a wearable egocentric AI agent on a Raspberry Pi 5 with the Hailo AI HAT+ 2, from hardware assembly to real-time first-person vision-language interaction.

Latest update 3 May 2026

Assembly: Raspberry Pi 5 + AI HAT+ 2

Follow the build

Publications

What's next?

Work in progress; stay tuned.

CV4Animals Workshop, CVPR 2024 · 2024

AnimalMotionCLIP: Embedding Motion in CLIP for Animal Behavior Analysis

Enmin Zhong, Carlos R. Del-Blanco, Daniel Berjón, Fernando Jaureguizar, Narciso García

We extend CLIP for animal behavior recognition by interleaving video frames with optical flow, adding motion awareness to a model designed for static images. We compare multiple temporal aggregation strategies (dense, semi-dense, sparse) and achieve state-of-the-art results on the Animal Kingdom dataset.
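To make the idea of interleaving and temporal sampling concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the stride of 2 for "semi-dense", and the exact meaning given to each strategy are assumptions chosen for illustration only.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_indices(num_frames: int, num_samples: int, strategy: str) -> List[int]:
    """Pick frame indices from a clip (hypothetical strategy definitions)."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    num_samples = min(num_samples, num_frames)

    if strategy == "dense":
        # Assumption: consecutive frames taken from the centre of the clip.
        start = (num_frames - num_samples) // 2
        return list(range(start, start + num_samples))

    if strategy == "semi-dense":
        # Assumption: every second frame, centred; indices are clamped
        # to the last frame when the clip is too short.
        stride = 2
        span = (num_samples - 1) * stride + 1
        start = max(0, (num_frames - span) // 2)
        return [min(start + i * stride, num_frames - 1) for i in range(num_samples)]

    if strategy == "sparse":
        # Assumption: one frame from the middle of each equal-length segment.
        return [int((i + 0.5) * num_frames / num_samples) for i in range(num_samples)]

    raise ValueError(f"unknown strategy: {strategy!r}")


def interleave(frames: Sequence[T], flows: Sequence[T]) -> List[T]:
    """Alternate RGB frames and optical-flow fields: f0, o0, f1, o1, ..."""
    if len(frames) != len(flows):
        raise ValueError("frames and flows must have the same length")
    out: List[T] = []
    for frame, flow in zip(frames, flows):
        out.append(frame)
        out.append(flow)
    return out
```

In a real pipeline the sampled indices would select image tensors and their precomputed flow fields, and the interleaved sequence would be passed to the CLIP visual encoder; here strings or any other objects can stand in for them.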

PDF Poster

MDPI Sensors · 2023

Real-time monocular skeleton-based hand gesture recognition using 3D-Jointsformer

Enmin Zhong, Carlos R. Del-Blanco, Daniel Berjón, Fernando Jaureguizar, Narciso García

Automatic hand gesture recognition in video sequences has widespread applications, ranging from home automation to sign language interpretation and clinical operations. We propose a hybrid approach combining 3D Convolutional Neural Networks (3D-CNNs) and Transformers: a 3D-CNN computes high-level semantic skeleton embeddings that capture local spatial and temporal characteristics, while a Transformer with self-attention efficiently captures long-range temporal dependencies. On the Briareo and Multimodal Hand Gesture datasets, the model achieves accuracies of 95.49% and 97.25%, respectively, while running in real time on a standard CPU.

PDF Code

All Research

Featured Projects

Side projects at the intersection of AI and the physical world.

View All

EgoAgent
