Company / 2025–Present
AI / LLM Testing
Designing and executing structured evaluation frameworks for LLM-powered features in production — testing AI behaviour, reliability, and hallucination risk.
The Mission
To bring rigorous, structured quality engineering into AI systems — where traditional test approaches break down and new evaluation methodologies are required.
Overview
With AI features increasingly entering the production stack at my current company, I introduced a structured approach to LLM evaluation covering prompt consistency, output reliability, edge case behaviour, and hallucination detection. This involved building custom evaluation harnesses, defining quality criteria for non-deterministic outputs, and collaborating with ML engineers to establish feedback loops that improve model behaviour over time.
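A minimal sketch of what such an evaluation harness could look like: each case defines required and forbidden phrases, and the harness runs the model several times to score pass rate and consistency across non-deterministic outputs. All names here (`EvalCase`, `run_case`, `stub_model`) are illustrative assumptions, not the project's actual code, and a stub stands in for the real LLM call.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    required_phrases: list[str]   # output must contain these
    forbidden_phrases: list[str]  # hallucination red flags

def run_case(model: Callable[[str], str], case: EvalCase, n_runs: int = 5) -> dict:
    """Run the same prompt repeatedly to measure reliability of a non-deterministic model."""
    outputs = [model(case.prompt) for _ in range(n_runs)]
    passes = [
        all(p.lower() in out.lower() for p in case.required_phrases)
        and not any(f.lower() in out.lower() for f in case.forbidden_phrases)
        for out in outputs
    ]
    return {
        "pass_rate": sum(passes) / n_runs,          # fraction of runs meeting criteria
        "consistent": len(set(outputs)) == 1,       # identical output across runs
    }

# Stub standing in for a real LLM API call, so the sketch is runnable offline.
def stub_model(prompt: str) -> str:
    return "Refunds are processed within 5 business days."

case = EvalCase(
    prompt="How long do refunds take?",
    required_phrases=["5 business days"],
    forbidden_phrases=["instantly"],
)
print(run_case(stub_model, case))
```

In practice the pass-rate threshold, run count, and phrase lists would be tuned per feature; the point is that non-deterministic outputs are scored statistically rather than asserted exactly.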
Impact
Established a repeatable LLM testing practice from scratch — improving stakeholder confidence in AI feature releases and reducing unpredictable output incidents in production.