Recognizing Action at a Distance
Source:
ICCV'03: Proceedings of the 2003 IEEE International
Conference on Computer Vision, pp. 726-733 (2003)
Abstract:
Our goal is to recognize human action at a distance, at
resolutions where a whole person may be, say, 30 pixels tall. We
introduce a novel motion descriptor based on optical flow
measurements in a spatio-temporal volume for each stabilized human
figure, and an associated similarity measure to be used in a
nearest-neighbor framework. Making use of noisy optical flow
measurements is the key challenge, which is addressed by treating
optical flow not as precise pixel displacements, but rather as a
spatial pattern of noisy measurements which are carefully smoothed
and aggregated to form our spatio-temporal motion descriptor. To
classify the action being performed by a human figure in a query
sequence, we retrieve nearest neighbor(s) from a database of
stored, annotated video sequences. The retrieved exemplars can also
be used to transfer 2D/3D skeletons onto the figures in the query
sequence, and to drive two forms of data-based action synthesis:
"do as I do" and "do as I say". Results are demonstrated on ballet,
tennis, and football datasets.
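
The pipeline described in the abstract (smooth and aggregate noisy flow into a spatio-temporal descriptor, then classify by nearest neighbor) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: flow fields are given as rows of (fx, fy) pairs for an already stabilized figure, each field is split into four non-negative channels, a 3x3 mean filter stands in for the smoothing step, and normalized correlation stands in for the similarity measure. All function names are hypothetical.

```python
import math


def half_wave_channels(flow):
    """Split a flow field (rows of (fx, fy) pairs) into four non-negative
    channels: +x, -x, +y, -y. Assumed representation for this sketch."""
    channels = [[], [], [], []]
    for row in flow:
        fx_pos = [max(fx, 0.0) for fx, _ in row]
        fx_neg = [max(-fx, 0.0) for fx, _ in row]
        fy_pos = [max(fy, 0.0) for _, fy in row]
        fy_neg = [max(-fy, 0.0) for _, fy in row]
        for channel, values in zip(channels, (fx_pos, fx_neg, fy_pos, fy_neg)):
            channel.append(values)
    return channels


def box_blur(channel):
    """3x3 mean filter with clamped borders; a simple stand-in for smoothing."""
    h, w = len(channel), len(channel[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += channel[yy][xx]
            row.append(total / 9.0)
        out.append(row)
    return out


def motion_descriptor(flow_frames):
    """Concatenate the blurred channels of every frame into one flat vector."""
    descriptor = []
    for flow in flow_frames:
        for channel in half_wave_channels(flow):
            for row in box_blur(channel):
                descriptor.extend(row)
    return descriptor


def similarity(a, b):
    """Normalized correlation between two descriptors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(query_frames, database):
    """Return the label of the stored sequence most similar to the query."""
    query = motion_descriptor(query_frames)
    best_label, best_score = None, -math.inf
    for label, frames in database:
        score = similarity(query, motion_descriptor(frames))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def uniform_flow(fx, fy, size=4, frames=3):
    """Toy data: every pixel in every frame moves by the same (fx, fy)."""
    return [[[(fx, fy)] * size for _ in range(size)] for _ in range(frames)]


database = [
    ("move right", uniform_flow(1.0, 0.0)),
    ("move up", uniform_flow(0.0, -1.0)),
]
query = uniform_flow(0.8, 0.1)
print(classify(query, database))  # prints "move right"
```

In this toy setup the query is matched to the stored sequence whose dominant flow direction agrees with it. Splitting the flow by sign before blurring keeps opposite motions from cancelling each other out when they are averaged, which is one way to make a smoothed descriptor tolerant of noisy measurements.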