QA in the age of non-determinism

What streusel cake has to do with testing AI systems

“To infinity and beyond!” Systems with artificial intelligence raise high expectations. But those expectations have to be tested too.

While plenty of voices already proclaim that AI will soon do everything on its own, let’s stay calm for the moment: we are not there yet. Testing these systems (by actual humans) remains an essential part of the development cycle, perhaps more than ever. But the way these systems work means we have to test differently, and for that, QA professionals first need to redefine what “expected result” means.

When you test classic software, the procedure resembles a mathematical equation. You feed the program the same defined set of input values and, if everything works correctly, you get exactly the result you expected. Every time. Test case passed, status green, product done, bring on the release date.
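To make the “equation” concrete, here is a minimal Python sketch of such a classic test. The function `gross_price` and its values are hypothetical and only serve as an example; the point is that the assertion compares against one exact expected value, and repeating the call changes nothing.

```python
# A classic, deterministic unit under test (hypothetical example):
# the same input always produces the same output.
def gross_price(net_cents: int, vat_percent: int) -> int:
    """Return the gross price in cents, rounded half up to the nearest cent."""
    return (net_cents * (100 + vat_percent) + 50) // 100


def test_gross_price_is_exact() -> None:
    # The expected result is a single exact value; any deviation is a failure.
    # Repeating the call five times yields the identical result each time.
    for _ in range(5):
        assert gross_price(1000, 19) == 1190


test_gross_price_is_exact()
print("passed")
```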

AI works differently.

The challenge is called non-determinism.

What does that mean?

Non-determinism means the system does not deliver an exactly reproducible result, even if you repeat the same input under apparently identical conditions. Each result can nonetheless be semantically correct, meaning it fulfils the intent behind the request.

With modern AI systems, especially assistants and agent-based solutions running on large language models, it is more like baking. You can bake a streusel cake several times, with the same recipe or with different ones. One turns out moister, another has bigger streusel. In each case “a streusel cake” is the correct expected result, but no two are 100 percent identical. Does it matter? If you test the cake against a specific streusel shape and size, your test result will probably read failed: doesn’t match expectations. If you test against “looks like a streusel cake”, “has streusel covering the surface” and “tastes like a streusel cake”, you’ll almost certainly get a passed result and be satisfied.
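The difference between the two test approaches can be sketched in Python. Everything here is a hypothetical model for illustration: a `Cake` is described by a few observed properties, `bake` simulates a recipe that varies slightly from run to run, and the thresholds in `criteria_check` are illustrative assumptions, not recommended values.

```python
import random
from dataclasses import dataclass


@dataclass
class Cake:
    """Observed properties of one bake (hypothetical model for illustration)."""
    kind: str
    streusel_coverage: float  # fraction of the surface covered, 0.0 to 1.0
    streusel_size_mm: float
    moisture: float  # arbitrary scale from 0.0 (dry) to 1.0 (wet)


def bake(rng: random.Random) -> Cake:
    # Same recipe, slightly different result every time: a stand-in for
    # a non-deterministic system under test.
    return Cake(
        kind="streusel cake",
        streusel_coverage=rng.uniform(0.85, 1.0),
        streusel_size_mm=rng.uniform(4.0, 12.0),
        moisture=rng.uniform(0.4, 0.7),
    )


def exact_spec_check(cake: Cake) -> bool:
    # Brittle: expects one specific streusel size.
    return cake.streusel_size_mm == 8.0


def criteria_check(cake: Cake) -> bool:
    # Robust: checks the properties that define "a streusel cake".
    # Moisture is allowed to vary and is deliberately not checked.
    return (
        cake.kind == "streusel cake"
        and cake.streusel_coverage >= 0.8
        and 3.0 <= cake.streusel_size_mm <= 15.0
    )


rng = random.Random(42)  # fixed seed so the simulation itself is repeatable
cakes = [bake(rng) for _ in range(5)]
print("exact spec:", [exact_spec_check(c) for c in cakes])
print("criteria:  ", [criteria_check(c) for c in cakes])
```

The exact-spec check fails for practically every bake, while the criteria check passes for all of them, even though the cakes themselves are the same in both runs. Only the definition of “expected result” differs.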

What does this have to do with AI? Ask your AI five times to create an image of a streusel cake and you will get five outputs that look similar but are never identical, yet almost always show a streusel cake. Is that result correct for you? It depends on how you defined the expected result, and by extension on the test strategy you decided on.
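One way to express “almost always” as a test verdict is to repeat the same input several times and judge the pass rate instead of a single run. The following is a minimal sketch, assuming a hypothetical stub `generate_image_description` in place of a real image generator and a simple subject check in place of a human reviewer or classifier; the 80 percent threshold is an assumption chosen for illustration.

```python
import random
from typing import Callable


def generate_image_description(prompt: str, rng: random.Random) -> dict:
    """Hypothetical stand-in for an image generator (the prompt is ignored).

    Returns the attributes a reviewer or classifier might note for one
    output. Most outputs show a streusel cake; occasionally one does not.
    """
    shows_cake = rng.random() < 0.9
    return {
        "subject": "streusel cake" if shows_cake else "crumble pie",
        "angle": rng.choice(["top", "side", "slice"]),
    }


def pass_rate(run: Callable[[], dict], check: Callable[[dict], bool], n: int) -> float:
    """Repeat the same input n times and return the fraction of passing outputs."""
    passes = sum(1 for _ in range(n) if check(run()))
    return passes / n


rng = random.Random(7)
rate = pass_rate(
    run=lambda: generate_image_description("a streusel cake", rng),
    check=lambda out: out["subject"] == "streusel cake",
    n=5,
)
THRESHOLD = 0.8  # assumed acceptance level; in practice part of the test strategy
print(f"pass rate {rate:.0%}, verdict: {'passed' if rate >= THRESHOLD else 'failed'}")
```

Five runs give only a rough picture; more repetitions give a more stable estimate of the pass rate, at the cost of more test time.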

And that’s exactly what my new article series is about:

How do you test systems whose results aren’t reliably identical and yet meet a correct expectation?

How do you turn apparent unpredictability into manageable, demonstrable quality?

Because that is what counts.

A typical example: imagine an AI assistant that understands speech, infers intent and then triggers functions in connected systems, via APIs, workflows, databases, ticket systems, devices, services or other business logic. This kind of agentic AI setup currently shows up in many people’s LinkedIn feeds several times a day. Vendors promise AI helpers for customer service, banking, healthcare, industry, internal IT, HR and procurement. The message is the same everywhere: the assistant is the one-stop solution for your business case and will handle most tasks autonomously. The catch: it doesn’t just talk, it acts. And, curse or blessing, it acts non-deterministically. This is where one of the great new challenges for QA practitioners begins, and it lifts the discipline to a new level.
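Because such an assistant acts, a test can target the action rather than the wording. A minimal Python sketch, assuming a hypothetical assistant whose responses are recorded as `AgentTurn` objects holding a reply text and a list of tool calls. The tool name `create_ticket` and its arguments are invented for illustration and do not refer to any real API.

```python
from dataclasses import dataclass, field


@dataclass
class AgentTurn:
    """One response of a hypothetical AI assistant: what it says and what it does."""
    reply: str
    tool_calls: list[dict] = field(default_factory=list)


# Three recorded runs of the same request, e.g. "My laptop won't boot, please
# open a ticket." The wording differs each time; the triggered action should not.
runs = [
    AgentTurn("I've opened a ticket for your laptop.",
              [{"name": "create_ticket", "args": {"category": "hardware", "priority": "high"}}]),
    AgentTurn("Done! A support ticket is on its way.",
              [{"name": "create_ticket", "args": {"priority": "high", "category": "hardware"}}]),
    AgentTurn("Sure, ticket created. IT will contact you.",
              [{"name": "create_ticket", "args": {"category": "hardware", "priority": "high"}}]),
]


def action_matches(turn: AgentTurn, expected_name: str, expected_args: dict) -> bool:
    """Pass if exactly one tool call was made with the expected name and arguments.

    The reply text is deliberately ignored here; it would need its own,
    semantic check. Dict comparison ignores key order.
    """
    if len(turn.tool_calls) != 1:
        return False
    call = turn.tool_calls[0]
    return call["name"] == expected_name and call["args"] == expected_args


expected = {"category": "hardware", "priority": "high"}
print("replies identical:", len({t.reply for t in runs}) == 1)
print("actions correct:  ", all(action_matches(t, "create_ticket", expected) for t in runs))
```

Run as is, the script reports that the three replies differ while all three actions are correct: an exact-wording test would fail every run, whereas the action-level test passes.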

Stay tuned: article 1 goes live on Monday, 19/01/2026, at 12:00, and new articles follow weekly.

In article 1 we cover the basics you need before any sensible discussion of a test strategy for an AI product: what is non-determinism? What does “semantics” mean in a testing context? And how do you build evaluation logic for “expected result” when “exact wording” is no longer a usable acceptance criterion?

The second article deals with the theoretical infinity of scenarios in end-to-end testing of such AI assistants, and how to keep the black-box parts of an LLM under control.

Article 3 gets restrictive and maybe slightly philosophical: guard rails and ethics are at the centre of this part.

If that’s not enough and you think globally, the fourth article will catch your interest: it adds the localisation multiplier (languages and markets) on top of an already near-infinite scenario space.

To wrap things up, in article 5 I lay out strategies and approaches to make this infinity manageable. Buzz Lightyear can pack his bags: we don’t need to push to infinity, and certainly not beyond. It just isn’t efficient.

Once published, the articles will be linked here directly. Enjoy reading.

Yours, Chrizz 😉

QCT – Your expert for test management, software quality and digital transformation