Motion Recruitment | Jobspring | Workbridge

Senior Machine Learning AI Engineer / Remote / Entertainment / Audio ML expert

Los Angeles, California

100% Remote

Full Time

$225k - $275k

Job Description
This is a full-time opportunity based in Los Angeles (hybrid or remote is an option for the right candidate) with a stealth-mode startup that’s building foundational AI technology for the connected home. The team is rethinking what it means for hardware, software, and machine learning to work together at the OS level, delivering intelligent, real-time responsiveness through multimodal input (voice, video, and context). The tech stack is cutting-edge, and this role centers on applied machine learning at the intersection of speech, audio, and video understanding.
Join a world-class team of engineers, scientists, and designers reimagining the interface between people and their environments. In this role, you’ll help build a system that doesn’t just respond; it anticipates. This is an incredible opportunity for an Audio ML or Intent Recognition expert to shape a product that integrates AI into daily life in a meaningful, consumer-facing way. The role offers ownership, autonomy, and the chance to work with advanced ML architectures, foundational model development, and real-time sequence processing.
Required Skills & Experience
· 4+ years of experience in applied ML focused on audio, speech, or intent classification
· Strong foundation in training sequence models (RNNs, Transformers, BERT, Whisper, CLIP, Wav2Vec, etc.)
· Experience working with speech/audio ML, embedded systems, or multimodal inputs
· Proven track record of building and training models from scratch, not just relying on APIs
· Deep familiarity with tools like PyTorch, TensorFlow, Hugging Face Transformers, torchaudio, librosa, OpenCV, etc.
Desired Skills & Experience
· Experience with signal processing, voice activity detection, audio embeddings, and speaker identification
· Comfort with model evaluation, real-time inference, labeling, augmentation, and tuning
· Background with visual sequence modeling, action recognition, or video context detection
· Prior experience in environments like OpenAI, DeepMind, Amazon Alexa, Dolby, Sonos, Roku, AssemblyAI, etc., is a strong plus
· Familiarity with ONNX, FFmpeg, and NVIDIA Triton is also welcome
What You Will Be Doing
Tech Breakdown
· Audio ML and signal processing
· Intent recognition and sequence classification
· Multimodal fusion (audio + video input modeling)
· System optimization and integration
Daily Responsibilities
· 80% Hands-on model development, training, and tuning
· 20% Cross-functional team collaboration, experimentation, and architecture discussions
The Offer
· Equity eligible
You will receive the following benefits:
· Medical, Dental, and Vision Insurance
· Paid Vacation & Holidays
· Generous Equity Package

Applicants must be currently authorized to work in the US on a full-time basis now and in the future.

Posted by: Casey Ryan