The idea of using a wearable camera to gather visual data from a first-person perspective dates back to the 1970s, when Steve Mann invented the "Digital Eye Glass", a device that, when worn, causes the human eye itself to effectively become both an electronic camera and a television display. Subsequently, wearable cameras were used for health-related applications in the context of Humanistic Intelligence and Wearable AI. Egocentric vision is most naturally captured from the point of eye, but it may also be captured by a neck-worn camera when eyeglasses would be in the way. This neck-worn variant was popularized by the
Microsoft SenseCam in 2006, which was used in experimental health research. The computer vision community's interest in the egocentric paradigm arose slowly in the early 2010s and has grown rapidly in recent years, boosted both by impressive advances in the field of
wearable technology and by the increasing number of potential applications. The prototypical first-person vision system described by Kanade and Hebert in 2012 is composed of three basic components: a localization component that estimates the user's surroundings, a recognition component that identifies objects and people, and an activity recognition component that provides information about the current activity of the user. Together, these three components provide complete situational awareness of the user, which in turn can be used to assist the user or their caregiver.
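The composition of these three components can be illustrated with a short sketch. The following Python code is only an illustrative outline of such a pipeline under assumed interfaces; the names (Localizer, Recognizer, ActivityRecognizer, analyze_frame, SituationalAwareness) are hypothetical and do not correspond to any published implementation.

<syntaxhighlight lang="python">
# Illustrative sketch of the three-component first-person vision pipeline
# described by Kanade and Hebert. All interfaces here are hypothetical.
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class SituationalAwareness:
    """Fused output used to assist the wearer or a caregiver."""
    location: str            # where the user is (localization)
    entities: Sequence[str]  # objects and people in view (recognition)
    activity: str            # what the user is doing (activity recognition)


class Localizer(Protocol):
    def estimate_surroundings(self, frame: bytes) -> str: ...


class Recognizer(Protocol):
    def identify(self, frame: bytes) -> Sequence[str]: ...


class ActivityRecognizer(Protocol):
    def current_activity(self, frame: bytes) -> str: ...


def analyze_frame(frame: bytes,
                  localizer: Localizer,
                  recognizer: Recognizer,
                  activities: ActivityRecognizer) -> SituationalAwareness:
    """Fuse the three components' outputs for one egocentric frame."""
    return SituationalAwareness(
        location=localizer.estimate_surroundings(frame),
        entities=recognizer.identify(frame),
        activity=activities.current_activity(frame),
    )


# Example with trivial stand-in components (placeholders, not real models).
class StubLocalizer:
    def estimate_surroundings(self, frame: bytes) -> str:
        return "kitchen"


class StubRecognizer:
    def identify(self, frame: bytes) -> Sequence[str]:
        return ["cup", "person"]


class StubActivity:
    def current_activity(self, frame: bytes) -> str:
        return "making coffee"


print(analyze_frame(b"frame-bytes", StubLocalizer(), StubRecognizer(), StubActivity()))
</syntaxhighlight>

In such a design, each component could be developed and evaluated independently, while the fused output drives downstream assistance applications.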
Following this idea of situational awareness, the first computational techniques for egocentric analysis focused on hand-related activity recognition and social interaction analysis. In addition, given the unconstrained nature of the video and the huge amount of data generated, temporal segmentation and summarization were among the first problems addressed. After almost ten years of egocentric vision (2007–2017), the field is still undergoing diversification. Emerging research topics include:
• Social saliency estimation
• Multi-agent egocentric vision systems
• Privacy-preserving techniques and applications
• Attention-based activity analysis
• Social interaction analysis
• Ego graphical User Interfaces (EUI)
• Understanding social dynamics and attention
• Revisiting robotic vision and machine vision as egocentric sensing
• Activity forecasting
• Gaze prediction

== Technical challenges ==