
Checks

Ready-to-use validation checks for common testing scenarios, including function-based checks, string matching, comparisons, and LLM-powered semantic validation.


FnCheck creates a check from a callable function. It suits quick prototyping and simple validation logic. It is not reliably serializable, so it is intended for programmatic and test use only.

Module: giskard.checks.builtin.fn

fn Callable Required

Function taking a Trace, returning bool or CheckResult.

name str | None Default: None

Optional check name.

description str | None Default: None

Optional description.

success_message str | None Default: None

Message when check passes.

failure_message str | None Default: None

Message when check fails.

details dict[str, Any] Default: {}

Additional details to include in result.

from giskard.checks import FnCheck

# Simple boolean check
check = FnCheck(
    fn=lambda trace: trace.last.outputs is not None,
    name="has_output",
    success_message="Output was provided",
    failure_message="No output found",
)


# Async check; external_validator is a placeholder for your own async validator
async def validate_response(trace):
    is_valid = await external_validator(trace.last.outputs)
    return is_valid


check = FnCheck(fn=validate_response, name="async_validation")
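
The example above passes both a plain and an async callable. As a rough illustration of how a runner can support both, here is a minimal standard-library sketch. `run_fn` and the dictionary used in place of a Trace are hypothetical stand-ins, not part of giskard.checks, and the library's own dispatch may differ.

```python
import asyncio
import inspect
from typing import Any, Callable


async def run_fn(fn: Callable[[Any], Any], trace: Any) -> bool:
    """Call fn with the trace, awaiting the result if it is awaitable.

    Hypothetical helper for illustration; not part of giskard.checks.
    """
    result = fn(trace)
    if inspect.isawaitable(result):
        result = await result
    return bool(result)


def has_output(trace: dict) -> bool:
    # Synchronous check: passes when the last output is present.
    return trace["last"]["outputs"] is not None


async def has_long_output(trace: dict) -> bool:
    # Asynchronous check: the sleep stands in for a call to an external service.
    await asyncio.sleep(0)
    return len(trace["last"]["outputs"]) > 3


# A plain dict stands in for a Trace object here.
fake_trace = {"last": {"outputs": "Paris"}}
sync_passed = asyncio.run(run_fn(has_output, fake_trace))
async_passed = asyncio.run(run_fn(has_long_output, fake_trace))
```

Testing the result with `inspect.isawaitable` instead of inspecting the function itself also covers lambdas and partials that return coroutines.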

StringMatching validates that a keyword appears as a substring of a value extracted from the trace.

Module: giskard.checks.builtin.text_matching

keyword str | None

Substring to search for (or use keyword_key to extract from trace).

keyword_key str | None

JSONPath to extract keyword from trace.

text str | None

JSONPath-resolved text to search in. If unset, falls back to text_key.

text_key str Default: "trace.last.outputs"

JSONPath to extract text to search in.

normalization_form Literal["NFC", "NFD", "NFKC", "NFKD"] | None Default: "NFKC"

Unicode normalization applied before matching.

case_sensitive bool Default: True

Whether matching is case-sensitive.

from giskard.checks import StringMatching

check = StringMatching(keyword="success", text_key="trace.last.outputs")

# Case-insensitive
check = StringMatching(
    keyword="error", text_key="trace.last.outputs", case_sensitive=False
)
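
To see why the normalization_form and case_sensitive parameters matter, the following sketch reproduces the idea with the standard library only. The `contains` function is a hypothetical illustration, not the library's implementation, and using `casefold()` for case-insensitive comparison is an assumption.

```python
import unicodedata
from typing import Optional


def contains(
    text: str,
    keyword: str,
    normalization_form: Optional[str] = "NFKC",
    case_sensitive: bool = True,
) -> bool:
    """Substring match after optional Unicode normalization and case folding."""
    if normalization_form is not None:
        text = unicodedata.normalize(normalization_form, text)
        keyword = unicodedata.normalize(normalization_form, keyword)
    if not case_sensitive:
        text, keyword = text.casefold(), keyword.casefold()
    return keyword in text


# U+FB01 is a single "fi" ligature character; NFKC expands it to the letters "fi".
ligature_text = "The \ufb01le was saved"

with_nfkc = contains(ligature_text, "file")
without_normalization = contains(ligature_text, "file", normalization_form=None)
```

With NFKC the ligature text matches the keyword "file"; with normalization disabled it does not, because the ligature is a different code point from the two separate letters.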

RegexMatching checks a value extracted from the trace against a regular expression pattern.

Module: giskard.checks.builtin.text_matching

pattern str | None

Regular expression pattern.

pattern_key str | None

JSONPath to extract the regex pattern from the trace. Provide exactly one of pattern or pattern_key.

text str | None

JSONPath-resolved text to match against (alternative to text_key).

text_key str Default: "trace.last.outputs"

JSONPath to extract text to match against.

from giskard.checks import RegexMatching

check = RegexMatching(
    pattern=r"\d{3}-\d{3}-\d{4}",
    text_key="trace.last.outputs.phone",
)
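
This page does not state whether the pattern must match the whole text or only part of it. The sketch below contrasts the two behaviors using Python's `re` module, so the difference is clear when writing patterns. The helper names are hypothetical; adding explicit `^` and `$` anchors to a pattern removes the ambiguity in either case.

```python
import re

PHONE_PATTERN = r"\d{3}-\d{3}-\d{4}"


def matches_anywhere(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


def matches_entirely(pattern: str, text: str) -> bool:
    """Return True only if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


sentence = "Call 555-123-4567 today"
found_in_sentence = matches_anywhere(PHONE_PATTERN, sentence)
sentence_is_phone = matches_entirely(PHONE_PATTERN, sentence)
# Explicit anchors make a search behave like a full match.
anchored = matches_anywhere(r"^\d{3}-\d{3}-\d{4}$", sentence)
```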

Validate numeric and comparable values against expected thresholds.

Module: giskard.checks.builtin.comparison

All comparison checks share these parameters:

expected_value Any | None

Static expected value.

expected_value_key JSONPathStr | NotProvided

JSONPath to extract expected value from trace.

key str Required

JSONPath to extract the actual value from the trace.

normalization_form str | None Default: "NFKC"

Unicode normalization: "NFC", "NFD", "NFKC", "NFKD".

Provide exactly one of expected_value or expected_value_key.
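
The "exactly one of" rule can be sketched as follows. Since None may be a legitimate expected value, the sketch uses a sentinel object to detect an omitted argument, which is one plausible reading of the NotProvided type above. The helper names are hypothetical, and the dotted-path lookup is a simplification that handles only nested dictionaries, not full JSONPath.

```python
from typing import Any

_MISSING = object()  # sentinel: distinguishes "not provided" from None


def resolve(path: str, data: dict) -> Any:
    """Follow a dotted path such as 'trace.last.outputs.count' through nested dicts."""
    current: Any = data
    for part in path.split("."):
        current = current[part]
    return current


def expected_for(
    data: dict,
    expected_value: Any = _MISSING,
    expected_value_key: Any = _MISSING,
) -> Any:
    """Return the expected value, enforcing that exactly one source is given."""
    if (expected_value is _MISSING) == (expected_value_key is _MISSING):
        raise ValueError(
            "Provide exactly one of expected_value or expected_value_key"
        )
    if expected_value is not _MISSING:
        return expected_value
    return resolve(expected_value_key, data)


data = {"trace": {"last": {"outputs": {"count": 42, "baseline": 40}}}}
static_expected = expected_for(data, expected_value=42)
dynamic_expected = expected_for(
    data, expected_value_key="trace.last.outputs.baseline"
)
```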

Equals checks that the extracted value equals the expected value.

from giskard.checks import Equals

check = Equals(expected_value=42, key="trace.last.outputs.count")
check = Equals(expected_value="success", key="trace.last.outputs.status")

# Compare against another trace value
check = Equals(
    expected_value_key="trace.interactions[0].outputs.baseline",
    key="trace.last.outputs.result",
)

NotEquals checks that the extracted value does not equal the expected value.

from giskard.checks import NotEquals

check = NotEquals(expected_value="error", key="trace.last.outputs.status")

# Greater-than (strict) and greater-than-or-equal comparisons
from giskard.checks import GreaterThan, GreaterEquals

check = GreaterThan(
    expected_value=0.8, key="trace.last.metadata.confidence_score"
)
check = GreaterEquals(expected_value=100, key="trace.last.outputs.user_count")

# Less-than (strict) and less-than-or-equal comparisons
from giskard.checks import LesserThan, LesserThanEquals

check = LesserThan(expected_value=500, key="trace.last.metadata.latency_ms")
check = LesserThanEquals(
    expected_value=1000, key="trace.last.metadata.token_count"
)
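
The six comparison checks differ only in the operator applied and in how they treat the boundary value. The sketch below maps the check names from this page to functions in Python's `operator` module. It assumes the extracted value is the left operand and the expected value the right, which matches the examples above but is not confirmed here; `compare` is a hypothetical helper.

```python
import operator
from typing import Any, Callable, Dict

# Check names from this page mapped to standard-library comparison functions.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "Equals": operator.eq,
    "NotEquals": operator.ne,
    "GreaterThan": operator.gt,
    "GreaterEquals": operator.ge,
    "LesserThan": operator.lt,
    "LesserThanEquals": operator.le,
}


def compare(check_name: str, actual: Any, expected: Any) -> bool:
    """Apply the comparison as: actual <operator> expected."""
    return COMPARATORS[check_name](actual, expected)


# Boundary behavior: strict comparisons fail on equality, inclusive ones pass.
strict_at_boundary = compare("GreaterThan", 0.8, 0.8)
inclusive_at_boundary = compare("GreaterEquals", 100, 100)
```

Choosing between the strict and inclusive variants matters mainly when the measured value can land exactly on the threshold, for example an integer token count.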

Validation checks powered by Large Language Models for semantic understanding.

BaseLLMCheck is the abstract base class for custom LLM-powered checks. It handles LLM interaction, prompt rendering, and result parsing, so subclasses only need to define the evaluation prompt.

Module: giskard.checks.judges.base

BaseLLMCheck
generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation. Falls back to the global default if not provided.

name str | None Default: None

Optional check name.

description str | None Default: None

Optional description.

.get_prompt() str | Message | MessageTemplate | TemplateReference

Returns the prompt to send to the LLM. Subclasses must implement this method.

.get_inputs() dict[str, Any]

Provides template variables for prompt rendering. Override to customize available variables. Default: {"trace": trace}.

trace Trace Required
The trace containing interaction history.
.run() CheckResult

Execute the LLM-based check (inherited, usually doesn’t need overriding).

trace Trace Required
The trace to evaluate.
from giskard.checks.judges.base import BaseLLMCheck


@BaseLLMCheck.register("custom_llm_check")
class CustomLLMCheck(BaseLLMCheck):
    custom_instruction: str

    def get_prompt(self):
        return f"""
        Evaluate based on: {self.custom_instruction}
        Input: {{{{ trace.last.inputs }}}}
        Output: {{{{ trace.last.outputs }}}}
        Return passed=true if criteria are met, passed=false otherwise.
        """


check = CustomLLMCheck(
    custom_instruction="Response must be concise and helpful"
)
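
The quadrupled braces in the prompt above are easy to misread. Because the prompt is a Python f-string, each `{{` or `}}` collapses to a single literal brace, so four braces are needed to leave the two-brace template placeholder in the returned string. This standalone snippet shows the effect without any library code:

```python
custom_instruction = "Response must be concise and helpful"

# Single braces interpolate a Python value immediately.
# Quadrupled braces survive as "{{ ... }}" for the template engine to fill later.
prompt = f"""
Evaluate based on: {custom_instruction}
Input: {{{{ trace.last.inputs }}}}
Output: {{{{ trace.last.outputs }}}}
"""

has_template_placeholder = "{{ trace.last.inputs }}" in prompt
has_interpolated_value = "Response must be concise and helpful" in prompt
```

If the prompt does not need to interpolate Python values, a plain (non-f) string with ordinary `{{ ... }}` placeholders avoids the escaping altogether.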

Module: giskard.checks.judges.base

Default result model for LLM-based checks. This is the structured output format expected from the LLM.

LLMCheckResult
passed bool

Whether the check passed.

reason str | None

Optional explanation for the result.
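
As an illustration of what a structured result with these two fields looks like in practice, the sketch below parses a JSON reply into a small dataclass. `ParsedResult` and `parse_result` are hypothetical stand-ins; the library's own parsing is not described here and may work differently.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedResult:
    """Hypothetical stand-in mirroring the passed and reason fields."""

    passed: bool
    reason: Optional[str] = None


def parse_result(raw: str) -> ParsedResult:
    """Parse a JSON reply, rejecting a non-boolean 'passed' value."""
    payload = json.loads(raw)
    passed = payload["passed"]
    if not isinstance(passed, bool):
        raise ValueError("'passed' must be a JSON boolean")
    return ParsedResult(passed=passed, reason=payload.get("reason"))


with_reason = parse_result('{"passed": true, "reason": "Grounded in context"}')
without_reason = parse_result('{"passed": false}')
```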


Groundedness validates that answers are grounded in the provided context documents. It is essential for RAG systems, where it helps detect responses that contain hallucinated information.

Module: giskard.checks.judges.groundedness

answer str | None

The answer text to evaluate (static).

answer_key str Default: "trace.last.outputs"

JSONPath to extract answer from trace.

context str | list[str] | None

Context document(s) that should support the answer (static).

context_key str Default: "trace.last.metadata.context"

JSONPath to extract context from trace.

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation.

from giskard.checks import Groundedness

# Static values
check = Groundedness(
    answer="The Eiffel Tower is in Paris.",
    context=[
        "Paris is the capital of France.",
        "The Eiffel Tower is a famous landmark.",
    ],
)

# Extract from trace
check = Groundedness(
    answer_key="trace.last.outputs.answer",
    context_key="trace.last.metadata.retrieved_docs",
)

Conformity validates that interactions conform to a specified rule or requirement. The rule supports Jinja2 templating, so it can reference trace data dynamically.

Module: giskard.checks.judges.conformity

rule str Required

The conformity rule to evaluate. Supports Jinja2 templating with access to the trace object.

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation (falls back to default).

from giskard.checks import Conformity

# Static rule
check = Conformity(rule="The response must be professional and polite")

# Dynamic rule with templating
check = Conformity(
    rule="The response must contain the keywords '{{ trace.last.inputs.required_keywords }}'"
)

# Reference metadata
check = Conformity(
    rule="Use a {{ trace.last.metadata.tone }} tone in the response"
)
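
To make the effect of a dynamic rule concrete, the sketch below fills `{{ ... }}` placeholders from a nested dictionary. It is deliberately not Jinja2: it is a simplified regex-based stand-in that handles only dotted paths, written with the standard library so it runs anywhere. `render` and the dictionary used in place of a trace are hypothetical.

```python
import re
from typing import Any

# Matches placeholders such as "{{ trace.last.metadata.tone }}".
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(rule: str, context: dict) -> str:
    """Replace each {{ dotted.path }} placeholder with its value from context."""

    def lookup(match: "re.Match[str]") -> str:
        value: Any = context
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)

    return PLACEHOLDER.sub(lookup, rule)


context = {"trace": {"last": {"metadata": {"tone": "formal"}}}}
rendered = render(
    "Use a {{ trace.last.metadata.tone }} tone in the response", context
)
```

Here `rendered` is the text the evaluating LLM would judge against: "Use a formal tone in the response".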

LLMJudge provides general-purpose LLM-based validation with custom prompts. It is the most flexible LLM check; use it when the specialized checks (Groundedness, Conformity) don't fit your needs.

Module: giskard.checks.judges.judge

prompt str | None

Inline prompt content with Jinja2 templating support.

prompt_path str | None

Path to a template file (e.g. "checks::my_template.j2").

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation.

Exactly one of prompt or prompt_path must be provided.

Template variables available in prompts:

| Variable | Description |
| --------------------- | ----------------------------------------- |
| trace | Full trace object with all interactions |
| trace.interactions | List of all interactions in order |
| trace.last | Most recent interaction |
| trace.last.inputs | Inputs from the most recent interaction |
| trace.last.outputs | Outputs from the most recent interaction |
| trace.last.metadata | Metadata from the most recent interaction |

from giskard.checks import LLMJudge

# Inline prompt
check = LLMJudge(
    prompt="""
    Evaluate if the response is helpful and accurate.
    User Input: {{ trace.last.inputs }}
    AI Response: {{ trace.last.outputs }}
    Return passed=true if helpful and accurate, passed=false otherwise.
    """,
)

# Multi-turn evaluation
check = LLMJudge(
    prompt="""
    Evaluate the multi-turn conversation quality.
    {% for interaction in trace.interactions %}
    User: {{ interaction.inputs }}
    Assistant: {{ interaction.outputs }}
    {% endfor %}
    Criteria: consistency, relevance, professional tone.
    Return passed=true if all criteria are met.
    """,
)

SemanticSimilarity validates that an output is semantically similar to expected content by comparing embeddings.

Module: giskard.checks.builtin.semantic_similarity

reference_text str | None

Reference text to compare against (static).

reference_text_key str Default: "trace.last.metadata.reference_text"

JSONPath to extract reference text from trace.

actual_answer_key str Default: "trace.last.outputs"

JSONPath to extract actual value from trace.

threshold float Default: 0.95

Similarity threshold (0.0 to 1.0).

embedding_model BaseEmbeddingModel Default: get_default_embedding_model()

Embedding model used to compute similarity scores.

from giskard.checks import SemanticSimilarity

check = SemanticSimilarity(
    reference_text="The capital of France is Paris.",
    actual_answer_key="trace.last.outputs",
    threshold=0.8,
)
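
To give a feel for how a threshold behaves, the sketch below computes cosine similarity between small hand-written vectors. Cosine similarity is an assumption here, since this page does not name the metric, and so is treating a score equal to the threshold as passing. The vectors are toy values, not real embeddings, and `passes` is a hypothetical helper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def passes(a: Sequence[float], b: Sequence[float], threshold: float = 0.95) -> bool:
    """Assumed rule: pass when the similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for embeddings of a reference and two answers.
reference = [0.9, 0.1, 0.3]
close_answer = [0.8, 0.2, 0.3]
unrelated_answer = [0.1, 0.9, 0.0]

close_passes = passes(reference, close_answer)
unrelated_passes = passes(reference, unrelated_answer, threshold=0.8)
```

Real embedding models rarely produce scores near 1.0 for paraphrases, so a lower threshold such as the 0.8 used in the example above is often a more practical starting point than the 0.95 default.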

# Combining several LLM-based checks in one scenario
from giskard.checks import Groundedness, Conformity, LLMJudge, Scenario

scenario = (
    Scenario()
    .interact(
        inputs="What is the capital of France?",
        outputs=lambda inputs: "Paris is the capital of France.",
    )
    .check(
        Groundedness(
            context=["France is a country in Europe.", "Paris is the capital."]
        )
    )
    .check(Conformity(rule="The response must be a complete sentence"))
    .check(
        LLMJudge(
            prompt="Is the response educational? Return passed=true/false."
        )
    )
)
# Configuring a default generator for all LLM-based checks
from giskard.agents.generators import Generator
from giskard.checks import (
    Conformity,
    Groundedness,
    LLMJudge,
    set_default_generator,
)

# Set once, use everywhere
set_default_generator(Generator(model="openai/gpt-5", temperature=0.1))

# No need to pass generator anymore
check1 = Groundedness(answer="...", context=["..."])
check2 = Conformity(rule="...")
check3 = LLMJudge(prompt="...")
# Writing a fully custom check by subclassing Check
from giskard.checks import Check, CheckResult, Trace


@Check.register("custom_business_logic")
class CustomBusinessCheck(Check):
    threshold: float = 0.9
    allowed_categories: list[str] = []

    async def run(self, trace: Trace) -> CheckResult:
        output = trace.last.outputs
        category = output.get("category")
        confidence = output.get("confidence", 0)

        # Reject categories outside the allow-list
        if category not in self.allowed_categories:
            return CheckResult.failure(
                message=f"Invalid category: {category}",
                details={
                    "category": category,
                    "allowed": self.allowed_categories,
                },
            )

        # Reject low-confidence outputs
        if confidence < self.threshold:
            return CheckResult.failure(
                message=f"Confidence {confidence} below threshold {self.threshold}",
            )

        return CheckResult.success(message="Validation passed")


check = CustomBusinessCheck(
    threshold=0.85, allowed_categories=["sports", "news"]
)

  • Core API — Base classes and fundamental types
  • Scenarios — Multi-step workflow testing