Individual scenarios let you validate one behaviour at a time. A Suite groups
related scenarios so you can run them all with a single await, get a unified
pass/fail report, and pinpoint exactly which scenario and which check broke.
Write one scenario per behaviour you want to verify. Keeping each scenario
focused on a single capability makes failure reports precise: when order_lookup
fails, you know immediately which feature broke.
from giskard.checks import Scenario, FnCheck, StringMatching
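The scenarios in this section call a chatbot function that this excerpt does not define. The sketch below is a purely hypothetical rule-based stand-in, not the tutorial's actual chatbot. Its replies are shaped only to be consistent with the inputs and outputs shown in the reports on this page; the matching rules and the fallback reply are assumptions.

```python
import re


def chatbot(message: str) -> str:
    """Hypothetical rule-based stand-in, not the tutorial's real chatbot."""
    text = message.strip()
    if not text:
        return "I didn't receive a message. Could you please try again?"

    lowered = text.lower()

    # Echo whatever token follows '#', punctuation included, which is
    # consistent with the 'Order #12345?' reply seen in the report.
    order_match = re.search(r"#(\S+)", text)
    if order_match and "order" in lowered:
        order_id = order_match.group(1)
        return f"Order #{order_id} is on its way and will arrive in 2–3 days."

    if "return" in lowered:
        return "You can return any item within 30 days for a full refund."

    # Match whole words so that 'something' does not count as 'hi'.
    words = set(re.findall(r"[a-z]+", lowered))
    if words & {"hello", "hi", "hey"}:
        return "Hello! How can I help you today?"

    # Assumed fallback; the source does not show one.
    return "Sorry, I didn't understand that."


print(chatbot("Hello there"))
```

Because the stand-in is deterministic, keyword checks against its replies give the same result on every run.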
Use Suite to group the four scenarios and run them in one call. The suite
runs scenarios serially and returns a SuiteResult with a unified pass/fail
summary, per-scenario results, and a total duration.
from giskard.checks import Suite
suite = (
    Suite(name="chatbot_suite")
    .append(greeting_scenario)
    .append(order_lookup_scenario)
    .append(return_policy_scenario)
    .append(empty_input_scenario)
)
result = await suite.run()
result.print_report()
Output
──────────────────── Suite Results ────────────────────
....
───────────────────────────────────────────────────────
Summary: 4 total, 4 passed | Pass Rate: 100.0% | Total Duration: 16ms
| Attribute | Type | What it contains |
| ----------------- | ----------------------- | ---------------------------------------- |
| results | list[ScenarioResult] | One entry per scenario, in order |
| pass_rate | float | Fraction of scenarios that passed |
| duration_ms | int | Total wall-clock time in milliseconds |
Iterate over results to build a readable report:
passed = sum(1 for r in result.results if r.passed)
total = len(result.results)
print(f"Suite: {passed}/{total} passed ({result.pass_rate:.0%}) in {result.duration_ms} ms\n")
──────────────────── PASSED ────────────────────
responds_with_greeting  PASS
──────────────────── Trace ────────────────────
──────────────────── Interaction 1 ────────────────────
Inputs: 'Hello there'
Outputs: 'Hello! How can I help you today?'
──────────────────── 1 step in 0ms | runs: 1/1 ────────────────────

──────────────────── PASSED ────────────────────
order_id_echoed  PASS
delivery_estimate_given  PASS
──────────────────── Trace ────────────────────
──────────────────── Interaction 1 ────────────────────
Inputs: 'Where is my order #12345?'
Outputs: 'Order #12345? is on its way and will arrive in 2–3 days.'
──────────────────── 1 step in 9ms | runs: 1/1 ────────────────────

──────────────────── PASSED ────────────────────
mentions_30_days  PASS
──────────────────── Trace ────────────────────
──────────────────── Interaction 1 ────────────────────
Inputs: 'Can I return an item?'
Outputs: 'You can return any item within 30 days for a full refund.'
──────────────────── 1 step in 4ms | runs: 1/1 ────────────────────

──────────────────── PASSED ────────────────────
handles_empty_input  PASS
──────────────────── Trace ────────────────────
──────────────────── Interaction 1 ────────────────────
Inputs: ''
Outputs: "I didn't receive a message. Could you please try again?"
──────────────────── 1 step in 0ms | runs: 1/1 ────────────────────
When a scenario fails, you need to know which check broke and what it saw. Each
ScenarioResult has a steps list, with one StepResult per .interact() call.
Each step has a results list of CheckResult objects.
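The nesting described above (suite result, then scenario results, then steps, then check results) can be walked with a small helper. The sketch below is illustrative only: it uses SimpleNamespace stand-ins rather than real giskard objects, so it runs on its own. The .name and .passed attributes on each check result are assumptions, not confirmed API, and failed_checks is a hypothetical helper name.

```python
from types import SimpleNamespace


def failed_checks(suite_result):
    """Yield (scenario_index, step_index, check_name) for each failed check.

    Assumes suite_result.results holds scenario results, each with .passed
    and .steps, and that each step has .results whose items expose .name
    and .passed (the last two attribute names are assumptions).
    """
    for scenario_idx, scenario in enumerate(suite_result.results):
        if scenario.passed:
            continue
        for step_idx, step in enumerate(scenario.steps):
            for check in step.results:
                if not check.passed:
                    yield scenario_idx, step_idx, check.name


# Stand-in data mirroring the structure described in the text.
demo = SimpleNamespace(
    results=[
        SimpleNamespace(
            passed=True,
            steps=[SimpleNamespace(results=[
                SimpleNamespace(name="responds_with_greeting", passed=True),
            ])],
        ),
        SimpleNamespace(
            passed=False,
            steps=[SimpleNamespace(results=[
                SimpleNamespace(name="wrong_keyword", passed=False),
            ])],
        ),
    ]
)

for scenario_idx, step_idx, name in failed_checks(demo):
    print(f"scenario {scenario_idx}, step {step_idx}: {name} failed")
```

Before using this against a real SuiteResult, confirm the attribute names on the check results against the library's own reference.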
To see this in action, build a scenario with a deliberate bug. The expected
keyword is wrong, so the check always fails:
buggy_scenario = (
    Scenario("buggy_greeting")
    .interact(
        inputs="Hello there",
        outputs=lambda inputs: chatbot(inputs),
    )
    .check(
        StringMatching(
            name="wrong_keyword",
            keyword="Howdy",  # chatbot never says this
            text_key="trace.last.outputs",
        )
    )
)
debug_suite = Suite(name="debug_suite")
debug_suite.append(buggy_scenario)
debug_result = await debug_suite.run()
debug_result.print_report()
Output
──────────────────── Suite Results ────────────────────
F
======================= FAILURES =======================
╭─────────────────── buggy_greeting ───────────────────╮
──────────────────── FAILED ────────────────────
wrong_keyword  FAIL  The answer does not contain the keyword 'Howdy'
──────────────────── Trace ────────────────────
──────────────────── Interaction 1 ────────────────────
Inputs: 'Hello there'
Outputs: 'Hello! How can I help you today?'
──────────────────── 1 step in 4ms | runs: 1/1 ────────────────────
╰──────────────────────────────────────────────────────╯
======================= SUMMARY ========================
buggy_greeting  FAIL
wrong_keyword  FAIL  The answer does not contain the keyword 'Howdy'
───────────────────────────────────────────────────────
Summary: 1 total, 1 failed | Pass Rate: 0.0% | Total Duration: 4ms
Real projects often have many similar test cases that differ only in their
inputs and expected outputs. Writing one scenario per case by hand doesn't
scale. Instead, keep your test data in a list and generate scenarios
programmatically.
Here is the data-driven pattern: define a list of (name, input, keyword)
tuples and build a Scenario for each one in a loop:
test_cases = [
    ("greeting_hello", "Hello!", "Hello"),
    ("greeting_hi", "Hi there", "Hello"),
    ("greeting_hey", "Hey!", "Hello"),
    ("order_99", "Status of order #99?", "99"),
    ("order_777", "Track order #777 please", "777"),
    ("return_query", "I want to return something", "30 days"),
──────────────────── Suite Results ────────────────────
.......
───────────────────────────────────────────────────────
Summary: 7 total, 7 passed | Pass Rate: 100.0% | Total Duration: 98ms
This pattern scales to hundreds of cases without any extra boilerplate. You can
load test_cases from a CSV, a YAML file, or a database; the suite-building
loop stays the same.
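As a rough sketch of the CSV option, the helper below reads (name, input, keyword) tuples using only the standard library. The column names name, input, and keyword and the load_test_cases helper are assumptions made for illustration. An in-memory string stands in for a real file so the example runs on its own.

```python
import csv
import io


def load_test_cases(csv_file):
    """Read (name, input, keyword) tuples from a CSV that has a header row.

    The column names are hypothetical; adapt them to your own file.
    """
    reader = csv.DictReader(csv_file)
    return [(row["name"], row["input"], row["keyword"]) for row in reader]


# In-memory CSV standing in for a real file opened with newline="".
sample = io.StringIO(
    "name,input,keyword\n"
    "greeting_hello,Hello!,Hello\n"
    "order_99,Status of order #99?,99\n"
    "return_query,I want to return something,30 days\n"
)

test_cases = load_test_cases(sample)
for name, user_input, keyword in test_cases:
    print(f"{name}: input={user_input!r}, keyword={keyword!r}")
```

Since the result has the same tuple shape as the hand-written list, the loop that builds scenarios does not need to change.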
You now know how to organise scenarios into suites and debug failures. The next
step is integrating suites into your CI pipeline so they run automatically on
every pull request: