You Have a Familiar Face, So Let’s Not Waste CPU Cycles



As long as there have been computers, our expectations for them have outstripped their capabilities. The fact that their memory and processing capacities have increased by orders of magnitude over the past few decades has not changed this one bit. No matter how much computing power we can get our hands on, we will always find bigger problems to solve that push past the present limits of the hardware.

Today, many of these problems exist in the world of artificial intelligence (AI). In recent years, models with extremely large parameter counts (and with them, proportionally large computational requirements) have been found to deliver excellent performance across many applications. But due to factors like cost, latency, and privacy, this path forward looks like a dead end for many use cases. To overcome these issues, algorithms must run on-device, or at least at the edge.

But if these workloads are already pushing the limits of cutting-edge equipment, how in the world are we going to run them on much more constrained hardware? Unless we want to be patient (we don’t) and wait for hardware to catch up, we are going to have to get creative. And boy did a pair of engineers at Khalifa University ever come up with a creative solution for more efficient facial recognition. They squeezed every last drop of performance out of an NVIDIA Jetson AGX Orin edge computer to get a blazing-fast frame rate and top-notch energy efficiency.

Traditional approaches generally assign tasks exclusively to either CPUs or GPUs, which leaves much of the available hardware underutilized. In their work, the team tapped into the full suite of processing elements embedded in the Jetson AGX Orin, including the GPU, the CPU cores, the Deep Learning Accelerators (DLAs), the Video Image Compositor (VIC), and the dedicated video decode/encode engines, to achieve superior performance.
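The article does not include the team’s code, but as a rough illustration of how work can be steered onto a specific accelerator on a Jetson, the sketch below uses TensorRT’s Python API to build an engine pinned to a DLA core with GPU fallback, leaving the GPU free for other pipeline stages. The ONNX file name, precision, and core index are illustrative assumptions, not details from the project.

```python
# Hypothetical sketch: compiling one model of the pipeline for a DLA core on a
# Jetson AGX Orin so the GPU stays free for other stages. The ONNX file name,
# precision, and core index are illustrative assumptions, not project details.
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)

def build_dla_engine(onnx_path, dla_core=0):
    """Parse an ONNX model and build a serialized TensorRT engine targeting a DLA."""
    builder = trt.Builder(LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("Failed to parse " + onnx_path)

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)          # DLA runs in FP16 or INT8
    config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # unsupported layers fall back to the GPU
    config.default_device_type = trt.DeviceType.DLA
    config.DLA_core = dla_core

    return builder.build_serialized_network(network, config)

# Example: serialized = build_dla_engine("face_detector.onnx", dla_core=0)
```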

In addition to maximizing the use of the available hardware, they also integrated a face tracking module into the recognition pipeline. Normally, systems attempt to recognize every face in every single video frame, which is both computationally expensive and redundant. But by tracking faces across frames, a far cheaper operation, the system only needs to trigger recognition when a new face appears. This drastically reduces unnecessary processing.
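The team’s tracker isn’t published in the article, but the gating idea is easy to sketch: keep a simple IoU-based tracker across frames and only run the expensive recognition model when a detection cannot be matched to an existing track. The `detect_faces` and `recognize_face` callables below are hypothetical stand-ins for whatever detector and embedding model the system actually uses.

```python
# Minimal sketch of tracking-gated recognition: the recognition model only runs
# for faces that cannot be matched to a track from the previous frame.
# detect_faces() and recognize_face() are hypothetical stand-ins for the
# system's actual detector and embedding model.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def process_frame(frame, tracks, detect_faces, recognize_face, iou_thresh=0.5):
    """Update tracks with this frame's detections; recognize only new faces."""
    new_tracks = []
    for box in detect_faces(frame):
        # Try to match the detection to an existing track by bounding-box overlap.
        match = max(tracks, key=lambda t: iou(t["box"], box), default=None)
        if match and iou(match["box"], box) >= iou_thresh:
            # Known face: reuse the identity, just update its position.
            new_tracks.append({"box": box, "identity": match["identity"]})
        else:
            # New face: trigger the expensive recognition model once.
            new_tracks.append({"box": box, "identity": recognize_face(frame, box)})
    return new_tracks
```

A production tracker would also have to handle faces that leave the frame and re-associate identities after brief occlusions; the point of the sketch is simply that the embedding model runs once per new track rather than once per frame.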

This dual optimization of hardware and software yielded some impressive results. The system achieved a throughput of 290 frames per second on full-HD video streams containing an average of six faces per frame, a substantial improvement over traditional GPU-only methods. Furthermore, by offloading tasks across multiple hardware units and integrating tracking, the researchers shaved roughly 800 milliwatts off the power consumption, a meaningful saving in the world of edge computing.
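As a rough sanity check on those numbers (using only the figures quoted above), an 800-milliwatt reduction spread over 290 frames per second amounts to a few millijoules saved for every frame processed:

```python
# Back-of-the-envelope estimate using only the figures quoted above.
power_saved_w = 0.8      # ~800 mW reduction in power draw
throughput_fps = 290     # full-HD frames processed per second

energy_saved_mj_per_frame = power_saved_w / throughput_fps * 1000
print(f"~{energy_saved_mj_per_frame:.1f} mJ saved per frame")  # ~2.8 mJ
```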

With the demand for edge computing only increasing as time goes by, this type of approach may prove important for the next generation of computer vision systems. In any case, this work shows that when hardware limitations rear their ugly head, the solution is not always adding more power; it is making better use of every ounce of what you already have.
