AI & HPC Consulting

AI & HPC systems,
engineered to scale.

Voxel is an independent consultancy building the infrastructure behind modern AI — GPU clusters, SLURM scheduling, and high-performance inference systems that hold up under real load.

Start a project · View our work

What we do

Deep expertise across the AI compute stack.

From bare-metal GPU clusters to the dashboards your team lives in, we cover the full path from hardware to serving.

AI Infrastructure

Design and deploy GPU clusters, model fine-tuning pipelines, and model-serving platforms built to scale from prototype to production.

GPU cluster architecture
Model fine-tuning
vLLM / inference serving

HPC & Scheduling

Stand up and tune SLURM-based high-performance computing environments with the observability and reliability research teams depend on.

SLURM deployment & tuning
Job scheduling strategy
Cluster observability

Performance & Scale

Squeeze every cycle out of your hardware. We profile, benchmark, and re-architect systems for low latency and high throughput.

Latency optimization
Throughput at scale
High-availability gateways

Platform Engineering

From CI/CD to internal tooling and dashboards, we build the developer-facing layer that makes complex infrastructure usable.

Internal tooling
Monitoring dashboards
Automation & IaC

Also in our wheelhouse

Beyond the core stack.

Infrastructure is where we go deep — but we also build the software and workflows that sit on top of it.

Science gateway design & creation
AI workflow orchestration
Dashboard design & creation
API design & implementation
Data pipelines & ETL
Cluster observability & monitoring
CI/CD & automation
Internal developer tooling

Selected work

Tools we build & maintain.

@thediymaker

Open Source · 75★

Slurm Node Dashboard

A Next.js dashboard for monitoring SLURM-based HPC clusters, with real-time CPU/GPU utilization, job history and analytics, Prometheus metrics, and an AI-powered chat assistant.

Next.js · TypeScript · SLURM · Prometheus

View repository · slurmdash.com

Source Available · 7★

Obleth

A fairshare-first AI gateway for shared LLM clusters: multi-tenant auth, weighted admission under contention, token-accurate usage accounting, automatic model routing, and an operator dashboard. OpenAI-compatible; composes with vLLM, AIBrix, and any upstream provider.

Rust · Fairshare · Multi-tenant · OpenAI Compatible

View repository · obleth.com

How we work

A focused process, no wasted cycles.

Discover

We start with your workloads, constraints, and goals — mapping the real bottlenecks before writing a line of code.

Architect

A pragmatic plan for compute, scheduling, and serving — designed around your hardware budget and reliability targets.

Build

Hands-on implementation with the tooling, observability, and automation your team needs to operate it confidently.

Scale

We tune for throughput and cost, then hand off clean, documented systems — or stay on to help you grow.

Let's build something that scales.

Whether you're standing up your first GPU cluster or scaling inference to thousands of requests per second — let's talk.

support@voxellc.com · GitHub
