lukasGPT — char-level demo

How it works

This page runs a small generative language model entirely in your browser: no inference server, no API calls, and no data leaving your device.

The model

A character-level transformer with ~12M parameters: 6 layers, 6 attention heads, a 384-dimensional residual stream, and a 1024-character context window. Trained from scratch on a subset of Project Gutenberg books published before 1919 — that's why the prose tends to sound vaguely Victorian.
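The ~12M figure can be roughly checked by counting weights. The sketch below assumes a GPT-2-style decoder layout: fused Q/K/V attention, a 4x-wide MLP, learned positional embeddings, and biases on every linear layer. That layout, the vocabulary size, and weight tying are all assumptions. The page states only the layer count, width, and context length.

```python
def gpt_param_count(n_layer: int, n_embd: int, block_size: int, vocab_size: int,
                    tied_head: bool = True) -> int:
    """Estimate the parameter count of a GPT-2-style decoder (biases and LayerNorms included).

    The number of attention heads does not appear: heads split the n_embd-wide
    projections between them rather than adding weights.
    """
    d = n_embd
    attn = (d * 3 * d + 3 * d) + (d * d + d)      # fused Q/K/V projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # 4x-wide feed-forward: up and down projections
    norms = 2 * (2 * d)                           # two LayerNorms per block, weight + bias each
    blocks = n_layer * (attn + mlp + norms)
    embeddings = vocab_size * d + block_size * d  # token table + learned position table
    final_norm = 2 * d
    head = 0 if tied_head else vocab_size * d     # a tied output head reuses the token table
    return blocks + embeddings + final_norm + head


# vocab_size=96 is a hypothetical stand-in; the page does not state the real vocabulary size.
total = gpt_param_count(n_layer=6, n_embd=384, block_size=1024, vocab_size=96)
print(f"{total:,}")  # 11,077,632
```

Under these assumptions the total lands near 11M, with roughly 96% of it in the transformer blocks (about 12·d² per layer). The exact number shifts with vocabulary size, weight tying, and whether the layers carry biases, so a figure in the 11 to 12M range is consistent with the stated shape.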

Tokens

Each colored box above is one token — the model's atomic unit of input and output. Tokens are character-level here, so every box is a single letter, digit, punctuation mark, or piece of whitespace. The underlined ones are what you typed; the rest were sampled one at a time by the model. Adjacent tokens get cycling background colors so the boundaries are visible.
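In a character-level model the tokenizer is just a lookup table. Below is a minimal sketch of the usual approach: take the sorted set of characters in the training text and give each one an integer id. The function names and the toy corpus are hypothetical and not taken from this project.

```python
def build_vocab(corpus: str) -> tuple[dict[str, int], dict[int, str]]:
    """Assign one integer id to every distinct character in the training text."""
    chars = sorted(set(corpus))
    stoi = {ch: i for i, ch in enumerate(chars)}  # character -> token id
    itos = {i: ch for ch, i in stoi.items()}      # token id -> character
    return stoi, itos


def encode(text: str, stoi: dict[str, int]) -> list[int]:
    return [stoi[ch] for ch in text]              # one token per character, spaces included


def decode(ids: list[int], itos: dict[int, str]) -> str:
    return "".join(itos[i] for i in ids)


stoi, itos = build_vocab("the cat sat.")
ids = encode("a cat", stoi)
print(ids)                # [2, 0, 3, 2, 7]
print(decode(ids, itos))  # a cat
```

Each id in the encoded list corresponds to one colored box: five characters of input, five tokens. A character that never appeared in the training text has no id (`encode` raises `KeyError` here). That is one trade-off of a fixed character vocabulary.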

Browser-side inference

The trained PyTorch model is exported to ONNX and loaded by ONNX Runtime Web. The browser runs the actual matrix multiplications via WebGPU (when available) or WASM SIMD. Sampling — temperature, top-k, optional lookahead — is implemented in plain JavaScript and drives the model in a tight autoregressive loop, one token per forward pass.
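The page's sampler is written in JavaScript. The same standard recipe, temperature plus top-k inside an autoregressive loop, can be sketched in Python. In this sketch `toy_model` is a hypothetical stand-in for the ONNX Runtime session, and the function names are illustrative rather than the page's actual code.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def sample_next(logits: Sequence[float], temperature: float = 1.0,
                top_k: Optional[int] = None, rng: Optional[random.Random] = None) -> int:
    """Draw one token id from temperature-scaled, optionally top-k-truncated logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # Rank token ids by logit and keep the k best (all of them when top_k is None).
    ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        ids = ids[:top_k]
    # Softmax over the survivors at the given temperature; subtracting the max avoids overflow.
    scaled = [logits[i] / temperature for i in ids]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(ids, weights=weights, k=1)[0]


def generate(model: Callable[[List[int]], Sequence[float]], prompt: List[int], n_new: int,
             block_size: int = 1024, **sampling) -> List[int]:
    """Autoregressive loop: one forward pass per new token, context cropped to block_size."""
    ids = list(prompt)
    for _ in range(n_new):
        logits = model(ids[-block_size:])  # next-token logits for the latest position
        ids.append(sample_next(logits, **sampling))
    return ids


# Hypothetical stand-in for the ONNX model: ignores its context and always favors id 3.
def toy_model(context: List[int]) -> List[float]:
    return [0.0, 1.0, 2.0, 3.0]


print(generate(toy_model, [0], 5, top_k=1))  # [0, 3, 3, 3, 3, 3]
```

With `top_k=1` the sampler always takes the single most likely token, which is greedy decoding. A larger `top_k` or a higher `temperature` trades predictability for variety. Cropping to the last `block_size` ids keeps each input within the 1024-character window the model was trained on.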

Lookahead sampling

When the lookahead checkbox is on, the sampler stops committing to one token at a time. It expands a depth-N tree with branching factor K, ranks all K^N candidate paths by their joint probability (the product of the per-token probabilities along the path), and picks one at random, weighted by temperature. This trades wall-clock time for more coherent output, which is useful for a small char-level model whose token-by-token samples drift quickly.
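Here is a Python sketch of the lookahead idea, built on three stated assumptions. First, each node's K children are its K most likely next tokens. Second, "weighted by temperature" is read as sampling a path with probability proportional to p(path)^(1/T). Third, the whole path is returned. The page does not say how children are chosen or whether it emits the full path or only its first token, so treat those details and all names here as hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def lookahead_path(model: Callable[[List[int]], Sequence[float]], ids: List[int],
                   depth: int, branch: int, temperature: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Expand a depth-`depth`, branch-`branch` tree and sample one complete path.

    Assumption: each node's children are its `branch` most likely next tokens.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    frontier: List[Tuple[List[int], float]] = [([], 0.0)]  # (path so far, joint log-prob)
    for _ in range(depth):
        children = []
        for path, joint in frontier:
            logp = log_softmax(model(ids + path))  # one forward pass per expanded node
            best = sorted(range(len(logp)), key=lambda t: logp[t], reverse=True)[:branch]
            children.extend((path + [t], joint + logp[t]) for t in best)
        frontier = children  # after the loop: branch ** depth complete paths
    # Rank by joint probability, then sample with weights p(path) ** (1 / temperature).
    frontier.sort(key=lambda item: item[1], reverse=True)
    top = frontier[0][1]
    weights = [math.exp((joint - top) / temperature) for _, joint in frontier]
    return rng.choices([path for path, _ in frontier], weights=weights, k=1)[0]


# Hypothetical toy model over ids {0, 1, 2}: next-token probabilities depend on the last id.
NEXT = {2: [0.5, 0.4, 0.1], 0: [0.34, 0.33, 0.33], 1: [0.9, 0.05, 0.05]}


def toy_model(context: List[int]) -> List[float]:
    return [math.log(p) for p in NEXT[context[-1]]]


print(lookahead_path(toy_model, [2], depth=2, branch=2, temperature=1e-6))  # [1, 0]
```

Greedy decoding would take id 0 first (0.5 beats 0.4) and end with joint probability 0.5 × 0.34 = 0.17. Looking two steps ahead reveals that 1 → 0 scores 0.4 × 0.9 = 0.36. The price is forward passes: a full tree costs 1 + K + ... + K^(N-1) model evaluations, so cost grows exponentially with depth.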

Smart learning rate

Training uses a linear warmup over the first 100 iterations, followed by ReduceLROnPlateau: the learning rate drops only when validation loss has stopped improving for several consecutive eval intervals, rather than following a predetermined decay schedule. The loss curve and the resulting learning-rate staircase are both logged to TensorBoard during training.
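The schedule logic can be sketched in plain Python. It uses a linear ramp over the first 100 iterations, then a rule modeled on ReduceLROnPlateau's default `mode='min'`, relative-threshold behavior, simplified to drop cooldown and per-parameter-group handling. The class name and the `factor`, `patience`, and `min_lr` values are hypothetical, since the page does not give the actual settings.

```python
import math


class WarmupThenPlateau:
    """Linear warmup, then a plateau rule modeled on PyTorch's ReduceLROnPlateau (mode='min').

    Simplified stand-in: no cooldown, no per-param-group handling, relative threshold only.
    """

    def __init__(self, base_lr: float, warmup_iters: int = 100, factor: float = 0.5,
                 patience: int = 3, threshold: float = 1e-4, min_lr: float = 0.0):
        self.base_lr = base_lr
        self.warmup_iters = warmup_iters
        self.factor = factor        # multiply the LR by this at each plateau
        self.patience = patience    # evals without improvement tolerated before a drop
        self.threshold = threshold  # relative improvement needed to count as "better"
        self.min_lr = min_lr
        self.plateau_lr = base_lr   # LR used once warmup is over
        self.best = math.inf
        self.bad_evals = 0

    def lr_at(self, it: int) -> float:
        """Learning rate for 0-based training iteration `it`."""
        if it < self.warmup_iters:
            return self.base_lr * (it + 1) / self.warmup_iters  # linear ramp up to base_lr
        return self.plateau_lr

    def on_eval(self, val_loss: float) -> None:
        """Call once per eval interval with the validation loss."""
        if val_loss < self.best * (1 - self.threshold):
            self.best, self.bad_evals = val_loss, 0
        else:
            self.bad_evals += 1
            if self.bad_evals > self.patience:  # plateau detected: step down one stair
                self.plateau_lr = max(self.plateau_lr * self.factor, self.min_lr)
                self.bad_evals = 0


sched = WarmupThenPlateau(base_lr=1e-3, warmup_iters=100, factor=0.5, patience=2)
for loss in [2.0, 1.5, 1.5, 1.5, 1.5]:  # improvement, then three evals stuck at 1.5
    sched.on_eval(loss)
print(sched.lr_at(5000))  # 0.0005
```

Each drop is a discrete multiplication by `factor`, which is what produces the staircase shape in the logged learning rate. In an actual PyTorch loop, one common arrangement is to set the optimizer's learning rate by hand during warmup and then call `ReduceLROnPlateau.step(val_loss)` after each eval.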

Tech stack

Training: PyTorch, a plateau LR schedule, and TensorBoard for monitoring; the training code handles mixed char/BPE tokenization, though the model on this page is character-level
Export: torch.onnx.export with a dynamic time (sequence-length) axis
Runtime: ONNX Runtime Web (WebGPU + multi-threaded WASM SIMD, with a service worker injecting COOP/COEP headers so SharedArrayBuffer works on GitHub Pages)
Hosting: GitHub Pages — static, free, no backend
Dashboard: Streamlit, for inspecting checkpoint embeddings and token vocabularies during training