Checks

Ready-to-use validation checks for common testing scenarios, including function-based checks, string matching, comparisons, and LLM-powered semantic validation.

Function-based Checks

`FnCheck`

Create a check from a callable. Well suited to quick prototyping and simple validation logic. `FnCheck` is not reliably serializable, so use it only in programmatic or test code.

Module: giskard.checks.builtin.fn

fn Callable Required

Function (sync or async) that takes a Trace and returns a bool or a CheckResult.

name str | None Default: None

Optional check name.

description str | None Default: None

Optional description.

success_message str | None Default: None

Message when check passes.

failure_message str | None Default: None

Message when check fails.

details dict[str, Any] Default: {}

Additional details to include in result.

from giskard.checks import FnCheck

# Simple boolean check
check = FnCheck(
    fn=lambda trace: trace.last.outputs is not None,
    name="has_output",
    success_message="Output was provided",
    failure_message="No output found",
)


# Async check
# `external_validator` is a placeholder for your own async validation function.
async def validate_response(trace):
    is_valid = await external_validator(trace.last.outputs)
    return is_valid


check = FnCheck(fn=validate_response, name="async_validation")
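
The example above passes both a plain lambda and an `async def` function as `fn`. As a rough illustration of how a runner could accept either kind of callable, here is a stdlib-only sketch. It is not the library's implementation: `run_fn` is a hypothetical helper, the trace is a plain dict standing in for a `Trace`, and only boolean results are handled (not `CheckResult`).

```python
import asyncio
import inspect


async def run_fn(fn, trace):
    """Call fn(trace) and await the result if it turns out to be awaitable."""
    result = fn(trace)
    if inspect.isawaitable(result):
        result = await result
    return bool(result)


# Hypothetical stand-in for a trace: a plain dict instead of a Trace object.
trace = {"outputs": "hello"}


def sync_check(t):
    return t["outputs"] is not None


async def async_check(t):
    await asyncio.sleep(0)  # stands in for a call to an external validator
    return t["outputs"] == "hello"


sync_result = asyncio.run(run_fn(sync_check, trace))
async_result = asyncio.run(run_fn(async_check, trace))
print(sync_result, async_result)
```

Checking the returned value with `inspect.isawaitable` rather than inspecting the function itself also covers callables that return a coroutine without being declared `async def`.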

String Matching

`StringMatching`

Checks whether a keyword occurs as a substring of a text value taken from the trace.

Module: giskard.checks.builtin.text_matching

keyword str | None

Substring to search for (or use keyword_key to extract from trace).

keyword_key str | None

JSONPath to extract keyword from trace.

text str | None

JSONPath-resolved text to search in. If unset, falls back to text_key.

text_key str Default: "trace.last.outputs"

JSONPath to extract text to search in.

normalization_form Literal["NFC", "NFD", "NFKC", "NFKD"] | None Default: "NFKC"

Unicode normalization applied before matching.

case_sensitive bool Default: True

Whether matching is case-sensitive.

from giskard.checks import StringMatching

check = StringMatching(keyword="success", text_key="trace.last.outputs")

# Case-insensitive
check = StringMatching(
    keyword="error", text_key="trace.last.outputs", case_sensitive=False
)
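
To show why `normalization_form` and `case_sensitive` matter, the following stdlib-only sketch mimics the matching behavior described by the parameters above. It is an assumption about the general technique, not the library's code; `contains` is a hypothetical helper.

```python
import unicodedata


def contains(text, keyword, normalization_form="NFKC", case_sensitive=True):
    """Return True if keyword occurs in text after optional normalization."""
    if normalization_form is not None:
        text = unicodedata.normalize(normalization_form, text)
        keyword = unicodedata.normalize(normalization_form, keyword)
    if not case_sensitive:
        # casefold() is a more aggressive lower() intended for caseless matching
        text, keyword = text.casefold(), keyword.casefold()
    return keyword in text


# "\ufb01" is the single-code-point ligature "fi"; NFKC expands it to "f" + "i".
ligature_text = "The \ufb01le was saved"

print(contains(ligature_text, "file"))                           # True
print(contains(ligature_text, "file", normalization_form=None))  # False
print(contains("ERROR: disk full", "error", case_sensitive=False))  # True
```

Without normalization, visually identical strings can fail to match because they are encoded with different code points.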

`RegexMatching`

Checks whether a regular expression pattern matches a text value taken from the trace.

Module: giskard.checks.builtin.text_matching

pattern str | None

Regular expression pattern.

pattern_key str | None

JSONPath to extract the regex pattern from the trace. Provide exactly one of pattern or pattern_key.

text str | None

JSONPath-resolved text to match against (alternative to text_key).

text_key str Default: "trace.last.outputs"

JSONPath to extract text to match against.

from giskard.checks import RegexMatching

check = RegexMatching(
    pattern=r"\d{3}-\d{3}-\d{4}",
    text_key="trace.last.outputs.phone",
)
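
The phone-number pattern above can be tried out with Python's `re` module before wiring it into a check. This section does not say whether `RegexMatching` searches anywhere in the text or requires the whole text to match, so the sketch below shows both behaviors; `found_anywhere` and `matches_entirely` are hypothetical helpers.

```python
import re

PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


def found_anywhere(pattern, text):
    """True if the pattern matches some substring of text."""
    return pattern.search(text) is not None


def matches_entirely(pattern, text):
    """True only if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None


print(found_anywhere(PHONE, "Call 555-123-4567 today"))    # True
print(matches_entirely(PHONE, "Call 555-123-4567 today"))  # False
print(matches_entirely(PHONE, "555-123-4567"))             # True
print(found_anywhere(PHONE, "555-1234"))                   # False
```

If the check turns out to use search semantics and you need a strict match, anchoring the pattern with `^` and `$` gives the stricter behavior either way.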

Comparison Checks

Validate numeric and comparable values against expected thresholds.

Module: giskard.checks.builtin.comparison

All comparison checks share these parameters:

expected_value Any | None

Static expected value.

expected_value_key JSONPathStr | NotProvided

JSONPath to extract expected value from trace.

key str Required

JSONPath to extract the actual value from the trace.

normalization_form str | None Default: "NFKC"

Unicode normalization: "NFC", "NFD", "NFKC", "NFKD".

Provide exactly one of expected_value or expected_value_key.
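
The shared parameters follow a common pattern: the expected value is either given directly or looked up in the trace by path, and exactly one of the two must be supplied. The sketch below illustrates that pattern with the standard library only. It is not the library's code: `resolve` handles just dotted names and `[n]` indices (real JSONPath is richer), the trace is a nested dict, and `_MISSING` is a hypothetical sentinel playing the role of `NotProvided`.

```python
import re

_MISSING = object()  # sentinel: distinguishes "not provided" from an explicit None

# Matches either a name segment (group 1) or a bracketed index (group 2).
_TOKEN = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")


def resolve(path, root):
    """Resolve a dotted path with optional [n] indices against dicts and lists."""
    current = root
    for name, index in _TOKEN.findall(path):
        current = current[name] if name else current[int(index)]
    return current


def expected_for(trace, expected_value=_MISSING, expected_value_key=_MISSING):
    """Return the expected value, enforcing the exactly-one-of rule."""
    provided = [v is not _MISSING for v in (expected_value, expected_value_key)]
    if sum(provided) != 1:
        raise ValueError(
            "Provide exactly one of expected_value or expected_value_key"
        )
    if expected_value is not _MISSING:
        return expected_value
    return resolve(expected_value_key, {"trace": trace})


# Hypothetical dict-based trace with two interactions.
first = {"outputs": {"baseline": 10}}
second = {"outputs": {"result": 10}}
trace = {"interactions": [first, second], "last": second}

expected = expected_for(
    trace, expected_value_key="trace.interactions[0].outputs.baseline"
)
actual = resolve("trace.last.outputs.result", {"trace": trace})
print(actual == expected)
```

Using a sentinel instead of `None` keeps `None` available as a legitimate expected value.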

`Equals`

Check that extracted values equal an expected value.

from giskard.checks import Equals

check = Equals(expected_value=42, key="trace.last.outputs.count")
check = Equals(expected_value="success", key="trace.last.outputs.status")

# Compare against another trace value
check = Equals(
    expected_value_key="trace.interactions[0].outputs.baseline",
    key="trace.last.outputs.result",
)

`NotEquals`

Check that extracted values do not equal an expected value.

from giskard.checks import NotEquals

check = NotEquals(expected_value="error", key="trace.last.outputs.status")

`GreaterThan` / `GreaterEquals`

Check that extracted values are greater than, or greater than or equal to, an expected value.

from giskard.checks import GreaterThan, GreaterEquals

check = GreaterThan(
    expected_value=0.8, key="trace.last.metadata.confidence_score"
)
check = GreaterEquals(expected_value=100, key="trace.last.outputs.user_count")

`LesserThan` / `LesserThanEquals`

Check that extracted values are less than, or less than or equal to, an expected value.

from giskard.checks import LesserThan, LesserThanEquals

check = LesserThan(expected_value=500, key="trace.last.metadata.latency_ms")
check = LesserThanEquals(
    expected_value=1000, key="trace.last.metadata.token_count"
)
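
As a mental model, each comparison check can be read as one function from Python's `operator` module applied to the extracted value and the expected value. The mapping below is an illustrative assumption, including the argument order (actual value on the left), and is not taken from the library's source; `compare` is a hypothetical helper.

```python
import operator

# Assumed correspondence between check names and comparison operators.
COMPARATORS = {
    "Equals": operator.eq,
    "NotEquals": operator.ne,
    "GreaterThan": operator.gt,
    "GreaterEquals": operator.ge,
    "LesserThan": operator.lt,
    "LesserThanEquals": operator.le,
}


def compare(check_name, actual, expected):
    """Apply the operator for check_name as: actual <op> expected."""
    return COMPARATORS[check_name](actual, expected)


print(compare("GreaterThan", 0.93, 0.8))          # True
print(compare("GreaterEquals", 100, 100))         # True
print(compare("LesserThan", 500, 500))            # False
print(compare("LesserThanEquals", 1000, 1000))    # True
```

The boundary cases show the difference between the strict and inclusive variants: a value equal to the threshold passes `GreaterEquals` and `LesserThanEquals` but not `GreaterThan` or `LesserThan`.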

LLM-based Checks

Validation checks powered by Large Language Models for semantic understanding.

`BaseLLMCheck`

Abstract base class for creating custom LLM-powered checks. It handles LLM interaction, prompt rendering, and result parsing, so subclasses only need to define the evaluation prompt.

Module: giskard.checks.judges.base

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation. Falls back to the global default if not provided.

name str | None Default: None

Optional check name.

description str | None Default: None

Optional description.

.get_prompt() → str | Message | MessageTemplate | TemplateReference

Returns the prompt to send to the LLM. Subclasses must implement this method.

.get_inputs() → dict[str, Any]

Provides template variables for prompt rendering. Override to customize available variables. Default: {"trace": trace}.

trace Trace Required

The trace containing interaction history.

.run() → CheckResult

Execute the LLM-based check (inherited, usually doesn't need overriding).

trace Trace Required

The trace to evaluate.

from giskard.checks.judges.base import BaseLLMCheck


@BaseLLMCheck.register("custom_llm_check")
class CustomLLMCheck(BaseLLMCheck):
    custom_instruction: str

    def get_prompt(self):
        return f"""
        Evaluate based on: {self.custom_instruction}

        Input: {{{{ trace.last.inputs }}}}
        Output: {{{{ trace.last.outputs }}}}

        Return passed=true if criteria are met, passed=false otherwise.
        """


check = CustomLLMCheck(
    custom_instruction="Response must be concise and helpful"
)
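
The quadruple braces in `get_prompt` above are an f-string escaping detail: inside an f-string, `{{` produces a literal `{`, so `{{{{ ... }}}}` leaves a `{{ ... }}` placeholder in the returned string, presumably for the later template rendering step. The snippet below demonstrates only the Python string behavior and makes no claim about how the library renders the result.

```python
custom_instruction = "Response must be concise and helpful"

# Single braces interpolate now; quadruple braces survive as double braces.
prompt = (
    f"Evaluate based on: {custom_instruction}\n"
    f"Output: {{{{ trace.last.outputs }}}}"
)

print(prompt)
# Evaluate based on: Response must be concise and helpful
# Output: {{ trace.last.outputs }}
```

If the prompt needs no Python-side interpolation, a plain (non-f) string with ordinary `{{ ... }}` placeholders avoids the extra escaping.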

`LLMCheckResult`

Module: giskard.checks.judges.base

Default result model for LLM-based checks. This is the structured output format expected from the LLM.

passed bool

Whether the check passed.

reason str | None

Optional explanation for the result.
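
To make the expected shape concrete, here is a stdlib-only sketch that parses a JSON reply into a structure with the same two fields. It assumes the LLM reply arrives as a JSON object, which this section does not state; `JudgeVerdict` and `parse_verdict` are hypothetical stand-ins, not the library's `LLMCheckResult` or its parsing logic.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgeVerdict:  # hypothetical stand-in mirroring the fields listed above
    passed: bool
    reason: Optional[str] = None


def parse_verdict(raw: str) -> JudgeVerdict:
    """Parse a JSON object with a boolean 'passed' and an optional 'reason'."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    if not isinstance(data.get("passed"), bool):
        raise ValueError("'passed' must be a boolean")
    reason = data.get("reason")
    if reason is not None and not isinstance(reason, str):
        raise ValueError("'reason' must be a string or null")
    return JudgeVerdict(passed=data["passed"], reason=reason)


verdict = parse_verdict('{"passed": false, "reason": "Answer is off-topic."}')
print(verdict.passed, verdict.reason)
```

Rejecting replies where `passed` is missing or not a boolean avoids silently treating a malformed answer as a pass or a fail.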

`Groundedness`

Validates that answers are grounded in the provided context documents. Useful in RAG systems for detecting answers that contain information not supported by the retrieved context.

Module: giskard.checks.judges.groundedness

answer str | None

The answer text to evaluate (static).

answer_key str Default: "trace.last.outputs"

JSONPath to extract answer from trace.

context str | list[str] | None

Context document(s) that should support the answer (static).

context_key str Default: "trace.last.metadata.context"

JSONPath to extract context from trace.

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation.

from giskard.checks import Groundedness

# Static values
check = Groundedness(
    answer="The Eiffel Tower is in Paris.",
    context=[
        "Paris is the capital of France.",
        "The Eiffel Tower is a famous landmark.",
    ],
)

# Extract from trace
check = Groundedness(
    answer_key="trace.last.outputs.answer",
    context_key="trace.last.metadata.retrieved_docs",
)

`Conformity`

Validates that interactions conform to a specified rule or requirement. The rule supports Jinja2 templating, allowing dynamic rules that reference trace data.

Module: giskard.checks.judges.conformity

rule str Required

The conformity rule to evaluate. Supports Jinja2 templating with access to the trace object.

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation (falls back to default).

from giskard.checks import Conformity

# Static rule
check = Conformity(rule="The response must be professional and polite")

# Dynamic rule with templating
check = Conformity(
    rule="The response must contain the keywords '{{ trace.last.inputs.required_keywords }}'"
)

# Reference metadata
check = Conformity(
    rule="Use a {{ trace.last.metadata.tone }} tone in the response"
)

`LLMJudge`

General-purpose LLM-based validation with custom prompts. This is the most flexible LLM check; use it when the specialized checks (Groundedness, Conformity) don't fit your needs.

Module: giskard.checks.judges.judge

prompt str | None

Inline prompt content with Jinja2 templating support.

prompt_path str | None

Path to a template file (e.g. "checks::my_template.j2").

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation.

Exactly one of prompt or prompt_path must be provided.

Template variables available in prompts:

| Variable | Description |
| --- | --- |
| trace | Full trace object with all interactions |
| trace.interactions | List of all interactions in order |
| trace.last | Most recent interaction |
| trace.last.inputs | Inputs from the most recent interaction |
| trace.last.outputs | Outputs from the most recent interaction |
| trace.last.metadata | Metadata from the most recent interaction |

from giskard.checks import LLMJudge

# Inline prompt
check = LLMJudge(
    prompt="""
    Evaluate if the response is helpful and accurate.

    User Input: {{ trace.last.inputs }}
    AI Response: {{ trace.last.outputs }}

    Return passed=true if helpful and accurate, passed=false otherwise.
    """,
)

# Multi-turn evaluation
check = LLMJudge(
    prompt="""
    Evaluate the multi-turn conversation quality.

    {% for interaction in trace.interactions %}
    User: {{ interaction.inputs }}
    Assistant: {{ interaction.outputs }}
    {% endfor %}

    Criteria: consistency, relevance, professional tone.
    Return passed=true if all criteria are met.
    """,
)

`SemanticSimilarity`

Validate semantic similarity between outputs and expected content using embeddings.

Module: giskard.checks.builtin.semantic_similarity

reference_text str | None

Reference text to compare against (static).

reference_text_key str Default: "trace.last.metadata.reference_text"

JSONPath to extract reference text from trace.

actual_answer_key str Default: "trace.last.outputs"

JSONPath to extract actual value from trace.

threshold float Default: 0.95

Similarity threshold (0.0 to 1.0).

embedding_model BaseEmbeddingModel Default: get_default_embedding_model()

Embedding model used to compute similarity scores.

from giskard.checks import SemanticSimilarity

check = SemanticSimilarity(
    reference_text="The capital of France is Paris.",
    actual_answer_key="trace.last.outputs",
    threshold=0.8,
)
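
The `threshold` parameter is easier to tune with a feel for the underlying score. The sketch below assumes the score is cosine similarity between two embedding vectors and that a score equal to the threshold passes; neither detail is stated in this section. The vectors are made-up toy values, and `cosine_similarity` and `passes` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def passes(reference_vec, actual_vec, threshold=0.95):
    """True if the similarity reaches the threshold (inclusive, by assumption)."""
    return cosine_similarity(reference_vec, actual_vec) >= threshold


# Toy 3-dimensional "embeddings"; real models produce far more dimensions.
reference = [1.0, 2.0, 3.0]
close = [1.1, 2.0, 2.9]
unrelated = [3.0, -1.0, 0.0]

print(passes(reference, close))      # True
print(passes(reference, unrelated))  # False
```

A default as strict as 0.95 tends to accept only near-paraphrases, which is why the example above lowers it to 0.8.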

Common patterns

Combining multiple checks

from giskard.checks import Groundedness, Conformity, LLMJudge, Scenario

scenario = (
    Scenario()
    .interact(
        inputs="What is the capital of France?",
        outputs=lambda inputs: "Paris is the capital of France.",
    )
    .check(
        Groundedness(
            context=["France is a country in Europe.", "Paris is the capital."]
        )
    )
    .check(Conformity(rule="The response must be a complete sentence"))
    .check(
        LLMJudge(
            prompt="Is the response educational? Return passed=true/false."
        )
    )
)

Reusing generators

from giskard.agents.generators import Generator
from giskard.checks import (
    Conformity,
    Groundedness,
    LLMJudge,
    set_default_generator,
)

# Set once, use everywhere
set_default_generator(Generator(model="openai/gpt-5", temperature=0.1))

# No need to pass generator anymore
check1 = Groundedness(answer="...", context=["..."])
check2 = Conformity(rule="...")
check3 = LLMJudge(prompt="...")

Creating custom checks

from giskard.checks import Check, CheckResult, Trace


@Check.register("custom_business_logic")
class CustomBusinessCheck(Check):
    threshold: float = 0.9
    allowed_categories: list[str] = []

    async def run(self, trace: Trace) -> CheckResult:
        output = trace.last.outputs
        category = output.get("category")
        confidence = output.get("confidence", 0)

        if category not in self.allowed_categories:
            return CheckResult.failure(
                message=f"Invalid category: {category}",
                details={
                    "category": category,
                    "allowed": self.allowed_categories,
                },
            )

        if confidence < self.threshold:
            return CheckResult.failure(
                message=f"Confidence {confidence} below threshold {self.threshold}",
            )

        return CheckResult.success(message="Validation passed")


check = CustomBusinessCheck(
    threshold=0.85, allowed_categories=["sports", "news"]
)
