
Nabla: Differentiable Programming in Mojo

A Research Preview

Diagram illustrating Nabla's Imperative and Functional API modes

Nabla Engine

Hello World

Nabla brings JIT-accelerated Automatic Differentiation (AD), a vital technique for gradient-based optimization and physics simulations, to the Mojo programming language 🔥. Nabla is currently limited to CPU execution; our plan is to achieve full GPU integration by Q3 2025.

Explore Usage

Documentation & Examples

Currently, Nabla executes all programs lazily. In combination with Mojo's unique memory management capabilities, this allows for two quite different programming styles within one framework: functional programming with JAX-like transformations (e.g. vmap, grad, jit) as well as PyTorch-like imperative programming.
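As a sketch of the functional side (the imperative side is shown in the Hello World example below): following the JAX analogy drawn here, grad is assumed to take a pure function and return a new function that computes its gradient with respect to the first argument. The exact signature is an assumption, not confirmed API.

import nabla

# Functional style: no requires_grad flags, no in-place mutation.
def loss_fn(weight, x, y):
    logits = nabla.relu(x @ weight)
    return nabla.sum((logits - y) ** 2)

# Assumed JAX-like behavior: grad(loss_fn) returns a function computing the
# gradient of the scalar loss with respect to the first argument (weight).
grad_fn = nabla.grad(loss_fn)

weight = nabla.randn((3, 4), DType.float32)
x = nabla.randn((2, 3), DType.float32)
y = nabla.randn((2, 4), DType.float32)
print("dLoss/dWeight:", grad_fn(weight, x, y))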


Engineered for general-purpose Scientific Computing

Familiar PyTorch-like API
Powerful function transforms (vmap, jit, grad)
High performance via MAX (initially CPU, with GPU support as a key development focus)
Educative, open-source & lean codebase
import nabla

# Init params with gradient computation enabled
weight = nabla.randn((3, 4), DType.float32, requires_grad=True)
bias = nabla.randn((2, 4), DType.float32, requires_grad=True)
label = nabla.randn((2, 4), DType.float32)
input = nabla.randn((2, 3), DType.float32)

# Compute forward pass (single layer MLP)
logits = nabla.relu(input @ weight + bias)
loss = nabla.sum((logits - label) ** 2)
print("Loss:", loss)

# Backward pass to compute gradients
loss.backward()

# Update parameters à la SGD
weight -= 0.01 * weight.grad()
bias -= 0.01 * bias.grad()
print("weight:", weight, "bias:", bias)

'Why another ML framework?'

Nabla leverages Mojo's unique combination of performance and type/memory safety to address limitations inherent to Python-based SciComp libraries. Built on top of MAX (Modular's hardware-agnostic, high-performance Graph compiler), Nabla empowers researchers to tackle the most complex challenges in scientific simulation, mechanistic interpretability, and large-scale training.

Unlike frameworks that retrofit JIT onto eager systems (like PyTorch’s Dynamo), Nabla adopts a slightly different approach: we started this project by building a dynamic compilation system on top of MAX (initially for CPU targets), then added full AD support (forward and reverse modes), and are now integrating eager execution. This order avoids architectural dead ends and yields a uniquely modular and performant system; GPU support for Nabla in Mojo via MAX is actively under development on this foundation.
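As a sketch of what the compilation-first design means in practice: under the lazy-execution model described above, wrapping a function in jit should let the first call trace and compile the graph through MAX, with later calls reusing the compiled program. The caching behavior spelled out in the comments is an assumption, not documented behavior.

import nabla

def predict(weight, x):
    return nabla.relu(x @ weight)

# Assumption: the first call traces the lazily built graph and compiles it via
# MAX; subsequent calls with matching shapes reuse the compiled program.
fast_predict = nabla.jit(predict)

weight = nabla.randn((3, 4), DType.float32)
x = nabla.randn((2, 3), DType.float32)

out = fast_predict(weight, x)  # trace + compile (assumed)
out = fast_predict(weight, x)  # reuse compiled graph (assumed)
print(out)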

Abstract graphic representing neural network training

Train Complex Models

Train models with a familiar PyTorch-like style, leveraging Nabla's dynamic compilation for high performance.
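A minimal sketch of such a training loop, using only operations from the Hello World example above (randn, relu, sum, backward, grad()); hyperparameters and shapes are illustrative, and whether gradients need to be reset between iterations depends on Nabla's semantics and is not shown here.

import nabla

# Illustrative single-layer regression trained with plain SGD.
weight = nabla.randn((3, 4), DType.float32, requires_grad=True)
bias = nabla.randn((2, 4), DType.float32, requires_grad=True)
input = nabla.randn((2, 3), DType.float32)
label = nabla.randn((2, 4), DType.float32)

for step in range(100):
    # Forward pass and scalar loss
    logits = nabla.relu(input @ weight + bias)
    loss = nabla.sum((logits - label) ** 2)

    # Reverse-mode AD, then an SGD update (gradient resetting, if required, is omitted)
    loss.backward()
    weight -= 0.01 * weight.grad()
    bias -= 0.01 * bias.grad()

    if step % 10 == 0:
        print("step", step, "loss:", loss)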

Abstract graphic representing function transformations

Compose Transformations

Use composable transforms (vmap, jit, grad), powerful tools familiar from frameworks like JAX.
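A hedged sketch of such a composition: grad builds the gradient of a scalar function, vmap lifts it over a batch, and jit compiles the composed program. The nesting order and the batched output shape follow the JAX convention and are assumptions here.

import nabla

# A simple scalar-valued function of one vector.
def f(x):
    return nabla.sum(x ** 2)

# Compose the transforms: per-example gradients, compiled as one program.
batched_grad = nabla.jit(nabla.vmap(nabla.grad(f)))

xs = nabla.randn((8, 3), DType.float32)
print(batched_grad(xs))  # expected to equal 2 * xs, one gradient per row (assumed)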

Abstract graphic representing custom kernel integration

Integrate Custom Ops

Integrate specialized differentiable CPU kernels for a level of fine-grained control that is hard to reach from Python (GPU kernel integration via MAX is planned).

Mojo language logo representing ecosystem integration

Leverage a Broad Ecosystem

Benefit from seamless integration with Mojo's growing ecosystem for cutting-edge high-performance computing.


Join us

Connect with researchers and developers. Discuss features, share use cases, report issues, and contribute to the future of high-performance Scientific Computing.

Discussions on GitHub
Nabla 2025