
Fine-Tuned SLM

Live demo: http://localhost:7860/gradio

Key Capabilities

  • Fine-tuned Mistral 7B Instruct v0.3 with LoRA (rank 16, 4-bit quantization)
  • Training data: 53 PDFs, 575 pages, 2,276 Q&A pairs extracted and synthesized
  • GGUF quantization (Q5_K_M) for efficient local inference via llama.cpp
  • Runs on Apple Silicon (M4 Max) — no cloud GPU required
  • Complete pipeline: PDF extraction, Q&A synthesis, training, quantization, deployment
  • Replicable for any customer domain: supply their documents, produce their SLM
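To give a sense of how light the rank-16 adaptation is, here is a back-of-envelope trainable-parameter count. It assumes LoRA targets the four attention projections (q/k/v/o); the actual target modules depend on the training config, which is not stated above.

```python
# Estimate LoRA trainable parameters for rank 16 on Mistral 7B's attention
# projections. Dimensions are from Mistral 7B's published config: hidden
# size 4096, 32 layers, 8 key/value heads of dim 128 (grouped-query attention).

RANK = 16
HIDDEN = 4096      # model hidden size
KV_DIM = 8 * 128   # key/value projection output size
LAYERS = 32

def lora_params(d_in: int, d_out: int, r: int = RANK) -> int:
    # A LoRA adapter on a (d_in x d_out) weight adds r * (d_in + d_out) params.
    return r * (d_in + d_out)

per_layer = (
    lora_params(HIDDEN, HIDDEN)    # q_proj
    + lora_params(HIDDEN, KV_DIM)  # k_proj
    + lora_params(HIDDEN, KV_DIM)  # v_proj
    + lora_params(HIDDEN, HIDDEN)  # o_proj
)
total = per_layer * LAYERS
print(f"{total:,} trainable LoRA parameters (~{total / 7e9:.2%} of 7B)")
```

Roughly 13.6M trainable parameters, about 0.2% of the base model, which is why the adapter trains comfortably on a single machine.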

Overview

A Mistral 7B Instruct model fine-tuned with LoRA on 53 internal policy PDFs (575 pages, 2,276 Q&A pairs), quantized to GGUF Q5_K_M. Runs locally via llama.cpp on Apple Silicon. Data never leaves the environment — a core differentiator for regulated industries.
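At training time, each extracted Q&A pair has to be rendered into Mistral Instruct's `[INST] … [/INST]` prompt template before tokenization. A minimal sketch follows; the helper name, the sample Q&A text, and the BOS/EOS handling are illustrative, not the project's actual code.

```python
# Render one synthesized Q&A pair into Mistral Instruct's template.
# Most tokenizers prepend <s> themselves, so only the closing </s> is
# appended here; adjust to match your tokenizer's settings.
def format_example(question: str, answer: str) -> str:
    return f"[INST] {question.strip()} [/INST] {answer.strip()}</s>"

# Hypothetical pair, standing in for one of the 2,276 extracted from the PDFs.
sample = format_example(
    "What does the remote-work policy cover?",
    "It defines eligibility, equipment, and security requirements.",
)
print(sample)
```

The same template is applied at inference, so the fine-tuned weights see prompts shaped exactly like the training examples.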

By the Numbers

  • Data leakage: Zero
  • Inference: On-prem
  • Pipeline: End-to-end

Tech Stack

Mistral 7B · LoRA / PEFT · llama.cpp · GGUF Q5_K_M · Gradio + FastAPI · Sentence Transformers