
Company / 2025–Present

AI / LLM Testing

Designing and executing structured evaluation frameworks for LLM-powered features in production, testing AI behaviour, reliability, and hallucination risk.

The Mission

To bring rigorous, structured quality engineering to AI systems, where traditional test approaches break down and new evaluation methodologies are required.

With AI features increasingly entering the production stack at my current company, I introduced a structured approach to LLM evaluation covering prompt consistency, output reliability, edge case behaviour, and hallucination detection. This involved building custom evaluation harnesses, defining quality criteria for non-deterministic outputs, and collaborating with ML engineers to establish feedback loops that improve model behaviour over time.
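A custom evaluation harness of the kind described above could look like the following minimal sketch. Here `generate` is a hypothetical callable standing in for the model API, consistency is measured as agreement with the majority answer across repeated runs, and the grounding check is a naive token-overlap heuristic, not a production hallucination detector.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class EvalResult:
    consistency: float  # share of runs agreeing with the majority answer
    grounded: bool      # True if every token of the answer appears in the context


def normalise(text: str) -> str:
    """Strip punctuation, lowercase, and collapse whitespace so trivially
    different outputs compare equal."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def evaluate(generate, prompt: str, context: str, runs: int = 5) -> EvalResult:
    """Call the model several times and score output consistency and grounding.

    `generate` is a hypothetical wrapper around the model call; in a real
    harness it would set temperature, retries, and timeouts.
    """
    outputs = [normalise(generate(prompt)) for _ in range(runs)]
    majority, count = Counter(outputs).most_common(1)[0]
    consistency = count / runs

    # Naive hallucination heuristic: flag any token in the majority answer
    # that never appears in the supplied source context.
    context_tokens = set(normalise(context).split())
    answer_tokens = set(majority.split())
    grounded = answer_tokens <= context_tokens

    return EvalResult(consistency=consistency, grounded=grounded)
```

In practice the consistency threshold and grounding check would be tuned per feature; semantic-similarity scoring or an LLM-as-judge step could replace the token heuristic, but the repeated-run structure stays the same for any non-deterministic output.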


Impact

Established a repeatable LLM testing practice from scratch, improving stakeholder confidence in AI feature releases and reducing unpredictable-output incidents in production.


Core Technologies

LLM Evaluation

AI Testing

Prompt Engineering

Hallucination Detection

Python

Test Harness Design

Non-Deterministic Testing