This post has been republished via RSS; it originally appeared at: Microsoft Research.
HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations
Generating full-body avatar motion that is both plausible and accurate is essential for creating high-quality immersive experiences in mixed reality scenarios. Head-mounted devices (HMDs) typically provide only a few input signals, such as the 6-DoF poses of the head and hands, where 6-DoF refers to the six degrees of freedom of movement of a rigid body in three-dimensional space. Recent approaches have achieved impressive performance in generating full-body motion given only head and hand signals. However, all known approaches rely on full hand visibility. While this holds when using motion controllers, for example, a considerable proportion of mixed reality experiences do not involve motion controllers and instead rely on egocentric hand tracking. This introduces the challenge of partial hand visibility, owing to the restricted field of view of the HMD.
In a recent paper: HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations, researchers from Microsoft propose HMD-NeMo, the first unified approach that addresses plausible and accurate full-body motion generation even when the hands may be only partially visible. HMD-NeMo is a lightweight neural network that predicts full-body motion online and in real time. At the heart of HMD-NeMo is a spatio-temporal encoder with novel temporally adaptable mask tokens that encourage plausible motion in the absence of hand observations. The researchers perform an extensive analysis of the impact of the different components of HMD-NeMo and, through their evaluation, establish a new state of the art on AMASS, a large database of human motion that unifies different optical marker-based motion capture datasets.
Will Code Remain a Relevant User Interface for End-User Programming with Generative AI Models?
The research field of end-user programming has largely been concerned with helping non-experts learn to code well enough to achieve their own tasks. Generative AI stands to obviate this entirely by allowing users to generate code from naturalistic language prompts.
In a recent essay: Will Code Remain a Relevant User Interface for End-User Programming with Generative AI Models?, researchers from Microsoft explore the relevance of “traditional” programming languages for non-expert end-user programmers in a world with generative AI. They posit the “generative shift hypothesis”: that generative AI will create qualitative and quantitative expansions in the traditional scope of end-user programming. They outline some reasons that traditional programming languages may still be relevant and useful for end-user programmers, and speculate on whether each of these reasons might endure or disappear with further improvements and innovations in generative AI. Finally, they articulate a set of implications for end-user programming research, including the possibility of needing to revisit many well-established core concepts, such as Ko’s learning barriers and Blackwell’s attention investment model.
LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup
On-device deep neural network (DNN) inference, widely used in mobile devices such as smartphones and smartwatches, offers unparalleled intelligent services, but also stresses the limited hardware resources on those devices.
In a recent paper: LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup, researchers at Microsoft propose a system that reduces latency, memory, disk, and power consumption for more efficient DNN inference. LUT-NN learns the typical features for each operator, known as centroids, and precomputes the results for these centroids to save in lookup tables. During inference, the results for the centroids closest to the inputs can be read directly from the tables and used as the approximate outputs, without further computation.
LUT-NN integrates two major novel techniques: (1) differentiable centroid learning through backpropagation, which adapts three levels of approximation to minimize the accuracy loss introduced by the centroids; (2) table lookup inference execution, which comprehensively considers different levels of parallelism, memory access reduction, and dedicated hardware units for optimal performance.
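To make the table-lookup idea above concrete, here is a minimal pure-Python sketch of replacing a matrix-vector product with centroid lookups. It is an illustration under stated assumptions, not LUT-NN's implementation: the input is split into fixed-size sub-vectors with one small codebook each (a product-quantization-style layout that the summary does not spell out), the centroids are hand-picked rather than learned through backpropagation, and the names `nearest`, `build_tables`, `lut_matvec`, `CODEBOOKS`, and `WEIGHT` are hypothetical.

```python
def nearest(centroids, v):
    """Return the index of the centroid closest to v (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((c - x) ** 2 for c, x in zip(centroids[k], v)),
    )


def build_tables(codebooks, weight, sub_dim):
    """Precompute, offline, the product of every centroid with its weight slice.

    tables[s][k][j] is the dot product of centroid k of sub-vector s with
    column j of the weight rows that sub-vector s would multiply.
    """
    n_out = len(weight[0])
    tables = []
    for s, centroids in enumerate(codebooks):
        rows = weight[s * sub_dim:(s + 1) * sub_dim]
        tables.append([
            [sum(c[d] * rows[d][j] for d in range(sub_dim)) for j in range(n_out)]
            for c in centroids
        ])
    return tables


def lut_matvec(x, codebooks, tables, sub_dim):
    """Approximate x @ weight using only nearest-centroid search and table reads."""
    n_out = len(tables[0][0])
    out = [0.0] * n_out
    for s, centroids in enumerate(codebooks):
        k = nearest(centroids, x[s * sub_dim:(s + 1) * sub_dim])
        for j in range(n_out):
            out[j] += tables[s][k][j]  # read a precomputed result, no multiply
    return out


# Hypothetical toy setup: a 4-dimensional input split into two 2-dimensional
# sub-vectors, each with a hand-picked 2-entry codebook (assumption: in
# LUT-NN the centroids are learned, not chosen by hand).
SUB_DIM = 2
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
WEIGHT = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]

tables = build_tables(CODEBOOKS, WEIGHT, SUB_DIM)
x = [0.9, 1.1, 0.1, 0.8]  # close to the centroids [1, 1] and [0, 1]
approx = lut_matvec(x, CODEBOOKS, tables, SUB_DIM)
print(approx)  # [15.0, 18.0, 21.0]
```

The printed result is the exact product for the centroid-aligned input `[1, 1, 0, 1]`, so the approximation error depends entirely on how well the centroids represent real inputs. In this sketch, that is the motivation for learning the centroids together with the model rather than picking them by hand.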