Enhancing Engagement in Online Courses: AI-Powered Gaze Tracking Technology for Improved Eye Contact

Imagine taking an online course where most content is delivered via pre-recorded videos. What would a participant think if the instructor on screen looked elsewhere instead of at the camera, their gaze shifting from place to place as though distracted or anxious? This scenario often occurs when online instructors prepare their content. Most of them do not have the training or experience to look consistently at the camera recording them, which is entirely understandable: most instructors are used to the in-person classroom. A standard engagement technique there is to look around the room, addressing the whole class rather than fixing on any one point. Therefore, when they record in a studio setting, being on camera can be challenging, and the resulting video often lacks the direct eye contact that creates the feeling of being looked at in a conversation. As a result, instructors either avoid appearing on screen altogether or appear on screen looking distracted. Either way, the video's capacity to engage viewers is reduced.

There are methods to mitigate this, but the most common one, using a teleprompter, is rather challenging. It requires a script written ahead of time, a heavy pre-production burden for instructors who are used to presenting on the fly. The method discussed in this presentation is non-intrusive, flexible, and scalable.

Using a software development kit (SDK) called Nvidia Gaze, video can be analyzed by an artificial intelligence (AI) engine that uses the power of Nvidia’s RTX line of graphics processors to track the presenter’s body posture, face, and eye direction. The method is non-intrusive because it can, in real time and within certain limitations, redirect the instructor’s gaze so that it points directly at the camera at all times. The result is remarkably realistic and imperceptible to the audience, who see only the processed, enhanced video. Presenters appreciate this method because they are freed from the conscious effort of looking straight at the camera and can focus entirely on their delivery.
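Conceptually, the real-time path is a simple per-frame loop: capture a frame, hand it to the gaze-redirection engine, and emit the corrected frame. The sketch below illustrates that flow in Python; `GazeRedirector` and its `redirect` method are hypothetical stand-ins for the SDK's actual interface (which this abstract does not quote), and the "frames" are plain dictionaries rather than real video data.

```python
# Hypothetical sketch of the real-time gaze-redirection loop.
# GazeRedirector is a stub standing in for the Nvidia SDK engine,
# which runs on an RTX GPU; only the structure of the loop is shown.

class GazeRedirector:
    """Stub engine: estimates eye direction, re-renders the eye region."""

    def redirect(self, frame):
        # The real engine tracks posture, face, and eye direction,
        # then synthesizes a frame whose gaze points at the camera.
        return {"pixels": frame["pixels"], "gaze": "camera"}


def process_stream(frames, engine):
    """Apply gaze redirection frame by frame, as in a live capture loop."""
    return [engine.redirect(frame) for frame in frames]


# Simulated input: three frames where the presenter looks away.
raw = [{"pixels": i, "gaze": "off-camera"} for i in range(3)]
corrected = process_stream(raw, GazeRedirector())
print(all(frame["gaze"] == "camera" for frame in corrected))  # True
```

The same loop structure serves both the live (real-time) and post-production cases; only the frame source changes.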

Instructional designers and content producers appreciate the flexibility and scalability of this method, as existing videos can also be processed: the SDK can be run as a post-production tool to enhance pre-recorded materials. Moreover, depending on the project’s needs, multiple GPUs can process existing materials in batches.
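As an illustration of the batch workflow, the sketch below walks a directory of recordings and builds one processing command per video file. The tool name `gaze_redirect` and its `--in`/`--out` flags are hypothetical placeholders for whatever sample application or wrapper is built around the SDK, not an actual Nvidia CLI.

```python
import pathlib

# Hypothetical CLI name and flags; substitute the actual sample
# application or your own wrapper around the SDK.
TOOL = "gaze_redirect"


def build_batch(input_dir, output_dir, exts=(".mp4", ".mov")):
    """Return one command line per video found in input_dir."""
    in_dir = pathlib.Path(input_dir)
    out_dir = pathlib.Path(output_dir)
    commands = []
    for video in sorted(in_dir.iterdir()):
        if video.suffix.lower() in exts:
            out_file = out_dir / video.name
            commands.append([TOOL, "--in", str(video), "--out", str(out_file)])
    return commands


# Each command could then be run with subprocess.run(cmd, check=True),
# serially on one GPU or distributed across several for a large back catalog.
```

Splitting the command list across machines or GPUs is what makes the approach scalable for existing course libraries.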

During the presentation, we will demonstrate before-and-after results of applying this method to a one-minute video and walk through the steps we followed to use the SDK. We will close by explaining the limitations of this technology, which is still in beta, and the workarounds needed to use it.

This is how the finished product looks: