
Fine-Tuned SLM

Live demo: http://localhost:7860/gradio

Key Capabilities

  • Fine-tuned Mistral 7B Instruct v0.3 with LoRA (rank 16, 4-bit quantization)
  • Training data: 53 PDFs, 575 pages, 2,276 Q&A pairs extracted and synthesized
  • GGUF quantization (Q5_K_M) for efficient local inference via llama.cpp
  • Runs on Apple Silicon (M4 Max) — no cloud GPU required
  • Complete pipeline: PDF extraction, Q&A synthesis, training, quantization, deployment
  • Replicable for any customer domain: supply their documents, produce their SLM
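To give a sense of how light the rank-16 adaptation is, here is a back-of-envelope trainable-parameter count. It assumes LoRA targets the four attention projections (q/k/v/o); the actual target modules depend on the training config, which is not stated above.

```python
# Estimate LoRA trainable parameters for rank 16 on Mistral 7B's attention
# projections. Dimensions are from Mistral 7B's published config: hidden
# size 4096, 32 layers, 8 key/value heads of dim 128 (grouped-query attention).

RANK = 16
HIDDEN = 4096      # model hidden size
KV_DIM = 8 * 128   # key/value projection output size
LAYERS = 32

def lora_params(d_in: int, d_out: int, r: int = RANK) -> int:
    # A LoRA adapter on a (d_in x d_out) weight adds r * (d_in + d_out) params.
    return r * (d_in + d_out)

per_layer = (
    lora_params(HIDDEN, HIDDEN)    # q_proj
    + lora_params(HIDDEN, KV_DIM)  # k_proj
    + lora_params(HIDDEN, KV_DIM)  # v_proj
    + lora_params(HIDDEN, HIDDEN)  # o_proj
)
total = per_layer * LAYERS
print(f"{total:,} trainable LoRA parameters (~{total / 7e9:.2%} of 7B)")
```

Roughly 13.6M trainable parameters, about 0.2% of the base model, which is why the adapter trains comfortably on a single machine.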

Overview

A Mistral 7B Instruct model fine-tuned with LoRA on 53 internal policy PDFs (575 pages, 2,276 Q&A pairs), quantized to GGUF Q5_K_M. Runs locally via llama.cpp on Apple Silicon. Data never leaves the environment — a core differentiator for regulated industries.
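At training time, each extracted Q&A pair has to be rendered into Mistral Instruct's `[INST] … [/INST]` prompt template before tokenization. A minimal sketch follows; the helper name, the sample Q&A text, and the BOS/EOS handling are illustrative, not the project's actual code.

```python
# Render one synthesized Q&A pair into Mistral Instruct's template.
# Most tokenizers prepend <s> themselves, so only the closing </s> is
# appended here; adjust to match your tokenizer's settings.
def format_example(question: str, answer: str) -> str:
    return f"[INST] {question.strip()} [/INST] {answer.strip()}</s>"

# Hypothetical pair, standing in for one of the 2,276 extracted from the PDFs.
sample = format_example(
    "What does the remote-work policy cover?",
    "It defines eligibility, equipment, and security requirements.",
)
print(sample)
```

The same template is applied at inference, so the fine-tuned weights see prompts shaped exactly like the training examples.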

By the Numbers

  • Data leakage: Zero
  • Inference: On-prem
  • Pipeline: End-to-end

Tech Stack

Mistral 7B · LoRA / PEFT · llama.cpp · GGUF Q5_K_M · Gradio + FastAPI · Sentence Transformers