Learn to evaluate AI methodically.
Two-day workshop with a concrete procedure model for test leads who need to evaluate AI-assisted features or complete AI systems. Together, we set up a local AI environment, use Claude Code to build a small demo application with an integrated AI assistant, and work through four test categories on it: functional, guardrails, adversarial and localisation. Prerequisite: a solid understanding of AI, development environments and test methodology (more on this in the FAQ).
Request the workshop
Four reasons to learn the procedure model.
Concrete procedure model
From setup through demo application to reporting, step by step. You take a reproducible model back to your team and put it to use right away.
Your own local AI
Ollama or LM Studio set up locally, with a model running on your own machine. You test without API limits, data-protection concerns or drift caused by changes on an external provider's side.
Test categories hands-on
Functional, guardrails, adversarial and localisation tests are practised on your own demo system. No dry theory: you apply everything to a real AI setup.
Audit-ready reporting
Structured documentation that cleanly justifies a release decision. Templates with references to the AI Act and ISO 42001 are part of the takeaway.
Lifetime access to the online course material.
After the workshop your login keeps working. Whenever we update the material, you get the new version automatically. No expiry, no re-booking.
What participants can do after the workshop.
Seven building blocks over two days.
Procedure model & eval mindset
What makes AI testing structurally different, and what a coherent procedure model from setup to reporting looks like. Risk-based test thinking and how it relates to the AI Act and ISO 42001.
- Distinguish AI testing from classical testing and understand the role of non-determinism
- Understand the procedure phases: setup, demo, strategy, execution and reporting
- Apply risk-based test prioritisation to your own domain
- Use the AI Act and ISO 42001 as the regulatory frame for the procedure
Set up a local AI environment
Hands-on setup of a local AI environment with Ollama or LM Studio. Run a local model, understand the inference layer and prepare the API connection for the demo app.
- Install and configure Ollama or LM Studio on your own machine
- Load and configure a local model such as Llama or Mistral
- Use the local inference API from test scripts
- Weigh the data-protection benefits of local AI against external APIs
Demo application with Claude Code
Together, we use Claude Code to build a small demo homepage with an integrated AI assistant. Understand the layered architecture, connect the assistant to the local model and prepare test hooks for the blocks that follow.
- Work productively with Claude Code in your own development environment
- Build a demo homepage with chat UI and backend endpoints
- Connect the AI assistant to the local model and structure system messages
- Set up test hooks and logging for the following test blocks
Test strategy & eval concept
Before testing comes the concept: what to test, which risks rank highest, which metrics and thresholds apply, and how to make all of this repeatable.
- Derive an eval set from real requirements and edge cases
- Select a metric mix for functional, security, quality and performance dimensions
- Define thresholds and acceptance criteria for release decisions
- Use test-automation frameworks such as Promptfoo, deepeval or RAGAS to make evaluations repeatable
- Understand model, prompt and eval-set versioning as the basis for reproducibility
Functional tests & guardrails
First tests on the demo app: does the AI work as expected and do the protective layers hold? Use-case coverage, groundedness of answers, guardrails and policy checks.
- Run functional tests against the demo application in a structured way
- Measure answer quality in terms of groundedness and factual accuracy
- Treat system prompts and guardrails as testable components
- Probe policy checks and filter layers deliberately and document the results
Adversarial scenarios & red teaming
Deliberately put the AI under stress: prompt injection, jailbreak attempts, manipulative inputs, bias probing. Red teaming as a repeatable process rather than a one-off check.
- Reproduce and document prompt-injection and jailbreak patterns
- Set up red teaming as a repeatable process with clear escalation triggers
- Probe bias systematically against protected attributes
- Use manipulated inputs to expose weak input validation in the backend
Localisation, drift & reporting
The final test dimensions and the wrap-up: localisation across languages, non-functional tests, drift monitoring after release, and eval reporting in audit-ready form.
- Treat multilingual behaviour and cultural adaptation as their own test dimension
- Treat performance, latency and cost as non-functional test dimensions
- Sketch drift monitoring after release with triggers for re-evaluation
- Produce audit-ready eval reports with references to the AI Act and ISO 42001
- Take templates and sample reports home for further use
What it costs.
What is included
- Live workshop with trainer
- Eval framework templates for your use cases
- Hallucination and bias check templates
- Confirmation of attendance
Discounts
Early-bird discount: 10% when you book more than 30 days before the workshop date. Group discount: 10% for five or more participants registering together.
In-house training?
We also deliver this workshop in-house, as an on-site learning environment and in a format you can run repeatedly for your colleagues. Get in touch for a tailored offer.
What we are often asked.
What prior knowledge do I need?
We assume a solid understanding of AI, development environments and test methodology. An ISTQB Foundation Level certification is ideal but not formally required. If you are still unsure about the AI part, the "AI fundamentals" intensive workshop gives you the right foundation first. Without this prior knowledge, the workshop moves too fast for you to benefit from its depth.
What do I need on my machine technically?
Your own laptop with a current development environment (Node.js and a code editor are enough), internet access, and enough memory for a local AI model: 8 GB of RAM is the minimum, 16 GB is more comfortable. Ollama and LM Studio are free. Claude Code is provided for the workshop.
Do I need programming skills?
A basic understanding of web development helps, for example HTML, JavaScript and working with simple APIs. With Claude Code we work conversationally, so you do not need deep coding skills. You should, however, be able to find your way around a development environment and understand concepts such as API calls, backend and frontend.
How does this relate to the "Testing AI" consulting offer?
The consulting engagement develops a test concept for a specific product together with your team. This workshop builds your team's capability to do this independently going forward, with a concrete procedure model as the anchor.
From setup to audit reporting.
Practical procedure model for 4 to 12 participants, remote or on site. Your own local AI environment, a demo application built with Claude Code, all test categories hands-on, and ready-to-use reporting templates as a takeaway.
Request the workshop
Maybe a different pillar fits your situation better.
Quality Consulting
Strategy, methodology and frameworks for resilient quality. Audits, concepts, AI compliance.
Quality Services
Operational testing capacity, interim test management and placement of specialists from our professional network.
Quality Education
Workshops, training courses and 1:1 coaching on testing, project and AI compliance topics.
CT Map
Overview of all three QCT pillars, with guidance to the entry point that suits you.