Applied Scientist - AI Inference (Agentic AI startup)

NinjaTech AI | Sydney, NSW, AU

Posted 8 days ago

Description

We invite you to join NinjaTech AI as an Applied Scientist specializing in AI inference and distributed systems, helping to optimize and scale our AI models for production environments.

You will work at the intersection of deep learning and systems engineering, focusing on optimizing our inference infrastructure that powers millions of user interactions daily.

About Us

NinjaTech AI is a generative AI startup (B2C and B2B) with headquarters in Silicon Valley and offices in Sydney and Vancouver. Our Engineering team consists mostly of former AWS engineers; our Product and UX team is ex‑Google. Backed by Alexa Fund and Samsung Ventures, we're on track to raise Series A funding in 2025.

Our flagship product, SuperNinja, is an advanced agentic AI platform that offers full OS capabilities, performing website creation, end‑to‑end coding, advanced data analysis, and more.

This is a full‑time onsite position, based at our Sydney office (great office space, free meals). We work in a fast‑paced collaborative environment and iterate quickly.

Why Join NinjaTech AI

This is a unique opportunity for a motivated Applied Scientist to join our LLM inference optimization team and help us build the foundation of next‑generation AI agent systems.

The role focuses on critical infrastructure and optimization techniques that enable autonomous agents to operate efficiently at scale, with significant contributions to an emerging field.

What Makes You a Strong Match

  • Experience in practical R&D (prototyping based on publications and literature review)
  • Proven ability to solve real‑world problems using cutting‑edge ideas and independent research
  • Strong problem‑solving skills for design, creation, and testing of custom inference systems
  • Adept at adapting academic ideas and theoretical algorithms into production systems
  • Experience with hardware accelerators and specialized AI chips
  • Knowledge of model serving frameworks and inference optimization techniques
  • Experience with large language models and their deployment challenges

Key Challenges You Will Work On

  • Research and develop novel techniques for optimizing LLM inference, focusing on latency, throughput, and resource efficiency
  • Design and implement distributed inference architectures that scale efficiently across multiple GPUs and nodes
  • Develop and optimize memory management techniques for large language models, including attention mechanisms and KV cache strategies
  • Research and implement quantization methods to reduce model size while preserving quality
  • Explore speculative decoding and other algorithmic optimizations to improve inference speed
  • Collaborate with engineering to integrate innovations into production systems
  • Benchmark and evaluate different optimization approaches against key performance metrics
  • Stay current with the latest research in LLM inference optimization and contribute to the field through publications and open‑source contributions

Experience Requirements

  • Master's or PhD in Computer Science, Machine Learning, or a related field
  • Strong publication record (preferred)
  • 1+ years of industry experience (can be before or after PhD)
  • Strong proficiency in Python and PyTorch
  • Experience with GPU programming and optimization
  • Track record of solving complex technical problems with innovative approaches

Day-to-Day Responsibilities

  • Research, design, and build high-performance inference systems for autonomous AI agents
  • Reproduce cutting‑edge innovations from academic literature in model optimization and build upon research findings
  • Build rapid prototypes and proofs of concept to turn ideas into product features and infrastructure
  • Stay up-to-date with the latest advancements in AI inference, quantization, pruning, and distributed systems
  • Design and implement efficient inference pipelines that operate at scale
  • Collaborate with cross‑functional teams including engineering, product, and design

Benefits

  • Annual health insurance subsidy
  • Superannuation
  • Paid time off (vacation, sick & holidays)
  • Paid lunches when you work on‑site
  • Stock Option Plan

Seniority Level

  • Entry level

Employment Type

  • Full‑time

Job Function

  • Research, Analyst, and Information Technology

Industries

  • Technology, Information and Internet