Learn to evaluate AI methodically.

Two-day workshop with a concrete procedure model for test leads who have to evaluate AI-assisted features or full AI systems. Together we set up a local AI environment, build a small demo application with an integrated AI assistant using Claude Code, and run through the test categories functional, guardrails, adversarial and localisation on it. Prerequisite: a solid understanding of AI, development environments and test methodology (more on this in the FAQ).

Request the workshop
Why this workshop

Four reasons to learn the procedure model.

01

Concrete procedure model

From setup through demo application to reporting, step by step. You take a reproducible model back to your team and put it to use right away.

02

Your own local AI

oLLama or LM Studio set up locally, a model running on your own machine. You test free from API limits, data-protection questions and external drift.

03

Test categories hands-on

Functional, guardrails, adversarial and localisation tests are practised on your own demo system. Not dry theory, but application on a real AI setup.

04

Audit-ready reporting

Structured documentation that can justify a release decision in a clean way. Templates with AI-Act and ISO-42001 anchors are part of the takeaway.

What stays with you

Lifetime access to the online course material.

After the workshop your login keeps working. Whenever we update the material, you get the new version automatically. No expiry, no re-booking.

Learning outcomes

What participants can do after the workshop.

Apply a coherent procedure model for testing AI systems and transfer it to your own projects
Set up a local AI environment with oLLama or LM Studio and use it for structured tests
Plan and run test activities methodically in the categories functional, guardrails, adversarial and localisation
Rehearse security-relevant scenarios systematically: prompt injection, red teaming, bias and input manipulation
Document eval results in a way that release decisions rest on them and stay traceable in an AI-Act audit
Audience
Test/quality leads, QA leads, product owners for AI features
Duration
2 days as a block
Participants
max. 12 people
Format
On site or hybrid
Workshop content

Seven building blocks over two days.

Block 01 · Day 1

Procedure model & eval mindset

What makes AI testing structurally different and what a coherent procedure model from setup to reporting looks like. Risk-based test thinking and the relation to the AI Act and ISO 42001.

  • Distinguish AI testing from classical testing and place non-determinism
  • Understand the procedure phases setup, demo, strategy, execution and reporting
  • Apply risk-based test prioritisation to your own domain
  • Place AI-Act and ISO-42001 anchors as the framing of the procedure
Block 02 · Day 1

Set up a local AI environment

Hands-on setup of a local AI environment with oLLama or LM Studio. Run a local model, understand the inference layer, prepare the API connection for the demo app.

  • Install and configure oLLama or LM Studio on your own machine
  • Load and steer a local model like Llama or Mistral
  • Use the local inference API from test scripts
  • Place the data-protection benefits of local AI against external APIs
Block 03 · Day 1

Demo application with Claude Code

With Claude Code we build a small demo homepage with an integrated AI assistant together. Understand the layered architecture, connect to the local model, prepare test hooks for the following blocks.

  • Work productively with Claude Code in your own development environment
  • Build a demo homepage with chat UI and backend endpoints
  • Connect the AI assistant to the local model and structure system messages
  • Set up test hooks and logging for the following test blocks
Block 04 · Day 2

Test strategy & eval concept

Before we test, the concept: what do I test, which risks sit at the top, which metrics and thresholds apply, and how does all of this become repeatable.

  • Derive an eval set from real requirements and edge cases
  • Select a metric mix for functional, security, quality and performance dimensions
  • Define thresholds and acceptance criteria for release decisions
  • Use test-automation frameworks such as Promptfoo, deepeval or RAGAS as a repeatability lever
  • Understand model, prompt and eval-set versioning as the basis for reproducibility
Block 05 · Day 2

Functional tests & guardrails

First tests on the demo app: does the AI work as expected and do the protective layers hold? Use-case coverage, groundedness of answers, guardrails and policy checks.

  • Run functional tests against the demo application in a structured way
  • Measure answer quality against groundedness and factual fidelity
  • Treat system prompts and guardrails as testable components
  • Probe policy checks and filter layers deliberately and document the results
Block 06 · Day 2

Adversarial scenarios & red teaming

Put the AI deliberately under stress: prompt injection, jailbreak attempts, manipulative inputs, bias probing. Red teaming as a repeatable process rather than a one-off check.

  • Reproduce and document prompt-injection and jailbreak patterns
  • Set up red teaming as a repeatable process with clear escalation triggers
  • Probe bias systematically against protected attributes
  • Test input manipulation against weak validations in the backend
Block 07 · Day 2

Localisation, drift & reporting

Last test dimensions and the close: localisation across languages, non-functional tests, drift monitoring after release, and eval reporting in audit-ready form.

  • Treat multilingual behaviour and cultural adaptation as their own test dimension
  • Place performance, latency and cost as non-functional test dimensions
  • Sketch drift monitoring after release with triggers for re-evaluation
  • Store eval reporting in audit-ready form, with AI-Act and ISO-42001 anchors
  • Take templates and sample reports along for further use
Investment

What it costs.

€ 999
per participant, net
Duration
2 days as a block
Group size
bis 12 participants
Format
On site or remote

What is included

  • Live workshop with trainer
  • Eval framework templates for your use cases
  • Hallucination and bias check templates
  • Confirmation of attendance

Discounts

Early-bird discount 10% when booking more than 30 days before the date. Group discount 10% from 5 participants registered together.

In-house training?

For an on-site learning environment and a format that you can run repeatedly for your colleagues, we deliver this workshop in-house as well. Reach out for a tailored offer.

Questions

What we are often asked.

What prior knowledge do I need?

We assume a solid understanding of AI, development environments and test methodology. An ISTQB Foundation Level certification is ideal but not required formally. If you are still unsure about the AI part, the Workshop: AI fundamentals — intensive gives you the right foundation first. Without this prior knowledge the workshop moves too fast and the depth does not land.

What do I need on my machine technically?

Your own laptop with a current development environment (Node.js and a code editor are enough), internet access, and enough memory for a local AI model. 8 GB RAM is the minimum, 16 GB is more comfortable. oLLama and LM Studio are free. Claude Code is provided in the workshop.

Do I need programming skills?

A basic understanding of web development helps, for example HTML, JavaScript and the handling of simple APIs. With Claude Code we work conversationally, so you do not need to sit deep in code. You should find your way in a development environment and place concepts like API calls, backend and frontend.

How does this relate to the "Testing AI" consulting offer?

The consulting develops a test concept for a specific product together with your team. The workshop here builds the team's capability to do this on its own going forward, with a concrete procedure model as the anchor.

From setup to audit reporting.

Practical procedure model for 4 to 12 participants, remote or on site. Your own local AI environment, a demo application built with Claude Code, all test categories hands-on, ready-to-use reporting templates as takeaway.

Request the workshop
info@qct.de · +49 (2826) 999 3201
More from the portfolio

Maybe a different pillar fits your situation better.

QCT – Dein Experte für Testmanagement, Softwarequalität und digitale Transformation

QCT Logo in Negativ-Darstellung für dunkle Hintergründe