Configuring a Registry-based agent

uv run llamon agent my-agent --template agent-general --yes

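Which template to choose

agent-general: general conversation, search, tool calls
agent-structured: classification, extraction, scorers, and other cases where a structured result matters

When starting out, agent-general is usually the right choice. Choose agent-structured when you need to reliably produce the output_data that the next stage reads.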

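Key files to edit

In practice, the three files below are the ones you edit most often.

Order	File	Role
①	`app/config.py`	Build `ExtensionConfig` from Registry IDs
②	`app/agent_card.py`	Card information and skill declarations
③	`.env`	Environment values such as `LLAMON_REGISTRY_HOST` and the port

With the default scaffold, main.py and app/tools.py usually do not need to be modified.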

1) `app/config.py`

A Registry-based agent wires up the LLM, Prompt, Guardrail, and MCP by putting Registry IDs into ExtensionConfig.

from llamon_agent.config import (
    ExtensionConfig,
    GuardrailConfig,
    GuardrailJudgeLLMConfig,
    GuardrailSetConfig,
    LLMConfig,
    MCPConfig,
    PromptConfig,
    PromptSetConfig,
    VariableBindingConfig,
)


def build_extension(max_retry: int = 3) -> ExtensionConfig:
    return ExtensionConfig(
        llm=LLMConfig(
            id="<YOUR_MODEL_ID>",
            temperature=0.1,
        ),
        prompts=PromptSetConfig(
            system=PromptConfig(
                id="<YOUR_PROMPT_ID>",
                # bindings={
                #     "job": VariableBindingConfig(source="input"),
                #     "user": VariableBindingConfig(source="context.userId", default="anonymous"),
                # },
            ),
        ),
        # guardrails=GuardrailSetConfig(
        #     input=GuardrailConfig(
        #         id="<YOUR_INPUT_GUARDRAIL_ID>",
        #         # [Optional] LLM dedicated to prompt-based judging. When set, it takes
        #         # precedence over the Registry Guardrail's model_id and ExtensionConfig.llm.
        #         # judge_llm=GuardrailJudgeLLMConfig(id="<YOUR_GUARDRAIL_JUDGE_MODEL_ID>"),
        #     ),
        #     output=GuardrailConfig(id="<YOUR_OUTPUT_GUARDRAIL_ID>"),
        # ),
        # mcp=[
        #     MCPConfig(id="<YOUR_MCP_ID>"),
        # ],
        max_retry=max_retry,
        # [Optional] A2A artifact metadata. For details, see the
        # "A2A artifact metadata" section at the bottom of this page.
        # artifact_name="<YOUR_ARTIFACT_NAME>",
        # artifact_description="<one-line description of this agent's response>",
    )

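Key points:

The default agent-general scaffold uses the LLM + system prompt directly.
guardrails and mcp are supported, but are commented out in the default scaffold.
MCPConfig, GuardrailConfig, and PromptConfig all take an id and an optional version.
GuardrailConfig.judge_llm specifies a model dedicated to prompt-based judging; when it is not set, the Registry Guardrail metadata or LLMConfig is used as a fallback. For the detailed priority order, see the skip_llm section below.
artifact_name / artifact_description are the artifact identifiers carried in the final A2A response. They are optional, and the SDK decides them automatically when they are not set.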

Prompt variable binding

With PromptConfig(..., bindings=...), you can inject the request input or metadata into prompt variables.

bindings={
    "job": VariableBindingConfig(source="input"),
    "user": VariableBindingConfig(source="context.userId", default="anonymous"),
    "session": VariableBindingConfig(source="context.sessionId"),
}

Main sources:

source	Meaning
`input`	User input text
`context.userId`	`userId` from the request metadata
`context.sessionId`	`sessionId` from the request metadata
`state.metadata.<key>`	A specific field of the state metadata
`env.<VAR>`	An allowed environment variable

context.* reads the request metadata at runtime.
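
To make the source table concrete, here is a minimal sketch of how a binding source string could be resolved against a request. The `resolve_binding` function, the `ALLOWED_ENV` allow-list, and the plain-dict shape of `request` are hypothetical stand-ins for illustration, not the SDK's actual implementation.

```python
import os

# Hypothetical allow-list; the SDK's real rule for "allowed" env vars is not shown in this guide.
ALLOWED_ENV = {"APP_REGION"}


def resolve_binding(source, request, default=None):
    """Resolve a binding source string against a plain-dict request (illustrative only)."""
    if source == "input":
        value = request.get("input")
    elif source.startswith("context."):
        # context.<key> reads from the request metadata.
        value = request.get("metadata", {}).get(source[len("context."):])
    elif source.startswith("state.metadata."):
        key = source[len("state.metadata."):]
        value = request.get("state", {}).get("metadata", {}).get(key)
    elif source.startswith("env."):
        name = source[len("env."):]
        value = os.environ.get(name) if name in ALLOWED_ENV else None
    else:
        raise ValueError(f"unknown binding source: {source}")
    # Fall back to the binding's default when nothing was found.
    return default if value is None else value


request = {"input": "summarize this", "metadata": {"sessionId": "s-1"}}
job = resolve_binding("input", request)
user = resolve_binding("context.userId", request, default="anonymous")
session = resolve_binding("context.sessionId", request)
```

In this sketch a missing `userId` falls back to the binding's `default`, which mirrors the `default="anonymous"` binding shown above.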

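Bypassing the LLM call: `skip_llm`

Some agents effectively do not use the LLM output. For example:

SchemaValidatedRuntimeAdapter + 100% passthrough: apply_business_rules passes every key of a2a_data through unchanged and also overwrites summary with the input text.
Guardrail + passthrough-only agent: it only validates input and routes, and the response is assembled by another agent.

In these cases, turning on skip_llm=True changes the behavior as follows. The SDK handles all of it, so no additional configuration is needed.

You do not have to specify a model ID: LLM metadata is not loaded from the Registry, so validation passes even when LLMConfig.id is empty. (You may still set id as a fallback for guardrail prompt checks; see the prompt-rule priority later in this section.)
No LLM call happens at all: the primary LLM response is always returned immediately as an empty string. This is what shortens response latency and brings the token cost to zero.
The response must be built somewhere other than the LLM: because the primary LLM output is empty, the actual response is assembled by one of the following:
- RuntimeAdapter.postprocess (e.g. SchemaValidatedRuntimeAdapter + a2a_data passthrough)
- A guardrail-only runtime
- Passthrough business logic
Guardrails work as usual: input/output guardrails are applied to the final response as-is.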

ExtensionConfig(
    llm=LLMConfig(
        # When skip_llm=True, id / temperature / max_tokens / response_format are all ignored.
        # Recommended for agents that do not use the LLM output.
        skip_llm=True,
    ),
    prompts=PromptSetConfig(
        # prompts may be left as-is (they are ignored because no LLM call is made).
        system=PromptConfig(id="<YOUR_PROMPT_ID>"),
    ),
    guardrails=GuardrailSetConfig(
        # Guardrails work normally; they are applied to the RuntimeOutput built by postprocess.
        output=GuardrailConfig(id="<YOUR_OUTPUT_GUARDRAIL_ID>"),
    ),
    artifact_name="<YOUR_ARTIFACT_NAME>",
)

When skip_llm=True:

Field	Behavior
`id`, `temperature`, `max_tokens`, `response_format`	Ignored. The Studio UI also disables these fields
`prompts`	Ignored because no LLM call is made (no need to remove them)
`guardrails`	Works normally. Input/output guardrails are applied to `RuntimeOutput` as-is
`mcp`	Ignored because autonomous LLM tool selection does not occur. `MCPHandle.call(...)` in business nodes is unaffected

Guardrail regex rules always run regardless of skip_llm. However, if a guardrail has prompt rules (LLM-judge based) enabled, a separate LLM is required. RegistryGuardrail._get_prompt_llm decides the model in the following priority order:

GuardrailConfig.judge_llm
model_id / provider_id / model from the Registry guardrail metadata
Then LLMConfig.id / LLMConfig.model (fallback to the primary LLM configuration)

GuardrailConfig(
    id="<YOUR_GUARDRAIL_ID>",
    judge_llm=GuardrailJudgeLLMConfig(
        id="<YOUR_GUARDRAIL_JUDGE_MODEL_ID>",
        temperature=0.0,
        max_tokens=128,
    ),
)

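skip_llm=True and id can coexist (the @model_validator allows setting both). If you want guardrail prompt checks to run on the main LLM model, set them together as follows:

llm=LLMConfig(
    id="<YOUR_MODEL_ID>",   # fallback for guardrail prompt checks (lazily resolved, called only when a violation occurs)
    skip_llm=True,          # the main LLM call is still bypassed → zero cost for normal traffic
)

If you leave id empty and set only skip_llm=True, prompt checks are silently skipped and only a "prompt 검사 skip: model_id 없음" ("prompt check skipped: no model_id") warning is logged, unless a model_id is separately bound to the Registry guardrail.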
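
The priority order and the silent-skip case can be sketched as a simple first-match lookup. This is an illustrative stand-in, not the body of RegistryGuardrail._get_prompt_llm; the function name `pick_prompt_check_model`, its parameters, and the logger name are assumptions.

```python
import logging

logger = logging.getLogger("guardrail-sketch")  # hypothetical logger name


def pick_prompt_check_model(judge_llm_id=None, registry_model_id=None, llm_id=None):
    """Return the first configured model ID in priority order, or None to skip the prompt check."""
    # 1) GuardrailConfig.judge_llm, 2) Registry guardrail metadata, 3) primary LLMConfig.
    for candidate in (judge_llm_id, registry_model_id, llm_id):
        if candidate:
            return candidate
    # Nothing configured: the prompt check is skipped and only a warning is left behind.
    logger.warning("prompt check skipped: no model_id")
    return None
```

Under these assumptions, an agent with skip_llm=True and no id anywhere ends up with `None`, which corresponds to the skipped prompt check described above.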

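The capabilities.extensions.llm area of /.well-known/agent-card.json exposes only the LLM that is actually called. Even when the primary LLM is bypassed with skip_llm=True, the guardrail still calls a real model through judge_llm, so that information is exposed on the card in the same format.

`skip_llm`	`judge_llm`	`llm` exposed on the card
`False`	Any	primary LLM (existing behavior)
`True`	Set	`judge_llm` (new)
`True`	Not set	Not exposed

guardrails.input.judge_llm takes precedence over guardrails.output.judge_llm. When the provider information is resolved from the Registry, it is serialized as {id: provider_uuid, model, provider}, the same as on the primary LLM path; when the provider cannot be resolved, it is exposed as {id: judge_llm.id, model}.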
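
The exposure table and the input-over-output rule reduce to a small decision function. The sketch below is only a reading aid; `card_llm_source` and its arguments are hypothetical names, and the real card serialization is not reproduced here.

```python
def card_llm_source(skip_llm, input_judge_llm=None, output_judge_llm=None):
    """Return what the card would expose: "primary", a judge_llm value, or None."""
    if not skip_llm:
        # The primary LLM is called, so it is the one exposed.
        return "primary"
    # With the primary LLM bypassed, the input guardrail's judge_llm wins over the output one.
    return input_judge_llm or output_judge_llm or None
```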

Registry provider selection

LLMConfig(id=...) first reads the model metadata from the Registry, then picks an internal adapter based on that model's provider type.

PROVIDER_OLLAMA → OllamaAdapter
PROVIDER_OPENAI, PROVIDER_VLLM → OpenAIAdapter
PROVIDER_ANTHROPIC → AnthropicAdapter

In other words, with the Registry-based setup, the default is to put the Registry model ID in config.py rather than writing the model name directly.
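
As a rough illustration of that mapping, the lookup can be pictured as a dictionary keyed by provider type. The adapter names come from the list above, but the `select_adapter` function and the `provider_type` metadata key are assumptions made for this sketch.

```python
# Provider type → adapter name, as listed in this section.
ADAPTER_BY_PROVIDER = {
    "PROVIDER_OLLAMA": "OllamaAdapter",
    "PROVIDER_OPENAI": "OpenAIAdapter",
    "PROVIDER_VLLM": "OpenAIAdapter",  # vLLM is served through the OpenAI adapter
    "PROVIDER_ANTHROPIC": "AnthropicAdapter",
}


def select_adapter(model_meta):
    """Pick an adapter name from model metadata (the "provider_type" key is hypothetical)."""
    provider_type = model_meta["provider_type"]
    if provider_type not in ADAPTER_BY_PROVIDER:
        raise ValueError(f"unsupported provider type: {provider_type}")
    return ADAPTER_BY_PROVIDER[provider_type]
```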

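Calling MCP directly from a business node

When you need to call a specific MCP tool deterministically from a business node, rather than letting the LLM choose tools autonomously, the MCPHandle pattern is the simplest.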

from llamon_agent import MCPHandle

mcp = MCPHandle()
await mcp.bind_registry(settings, mcp_ids=["<YOUR_MCP_ID>"], mcp_id="<YOUR_MCP_ID>")

result = await mcp.call(
    "lookup_customer",
    customer_id="12345",
)
# Or in dict form: await mcp.call("lookup_customer", params={"customer_id": "12345"})
# (kwargs and params= are mutually exclusive; using both raises ValueError)

if result.status != "ok":
    raise ValueError(result.summary)

If there are multiple MCPs, keeping one handle per MCP is the safest approach.

from llamon_agent import MCPHandle

crm_mcp = MCPHandle()
web_mcp = MCPHandle()

await crm_mcp.bind_registry(settings, mcp_ids=["<CRM_MCP_ID>"], mcp_id="<CRM_MCP_ID>")
await web_mcp.bind_registry(settings, mcp_ids=["<WEB_MCP_ID>"], mcp_id="<WEB_MCP_ID>")

customer = await crm_mcp.call("lookup_customer", customer_id="12345")
search = await web_mcp.call("search_web", query="customer 12345 news")

Here, mcp_id is the MCP identifier attached to the tool metadata.

Registry MCP: Registry ID/UUID
Local MCPToolLoader: the server name passed to add_server("name", ...)

Typically you call bind_registry(...) or bind(...) just once at the startup/build stage, and use only call(...) in nodes.

2) `app/agent_card.py`

Edit the card information and skills here.

from llamon_agent.config import Settings
from llamon_agent.server import AgentCardBuilder


def build_card(settings: Settings):
    return (
        AgentCardBuilder(
            name="my-agent",
            description="General-purpose Registry-based agent",
            url=settings.AGENT_URL,
            version="1.0.0",
        )
        .add_skill(
            id="chat",
            name="General chat",
            description="Answers questions",
            tags=["chat", "registry"],
            examples=["What is the capital of France?"],
        )
        .set_capabilities(streaming=True, push_notifications=False)
        .build()
    )

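To change skills, add or edit .add_skill(...) blocks directly.

Notes on skill routing

When there are multiple skills, the SDK routes in the following order.

Token-based pre-routing
LLM structured output
JSON fallback

tags and name weigh most heavily in pre-routing, so keep them short and use the terms likely to appear in queries.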
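
To show why short, query-like tags help, here is a minimal sketch of token-overlap pre-routing. The SDK's actual scoring is not described in this guide, so `tokenize`, `pre_route`, the overlap count, and the second example skill are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def pre_route(query, skills):
    """Return the id of the skill whose name and tags share the most tokens with the query."""
    query_tokens = tokenize(query)
    best_id, best_score = None, 0
    for skill in skills:
        skill_tokens = tokenize(skill["name"])
        for tag in skill["tags"]:
            skill_tokens |= tokenize(tag)
        score = len(query_tokens & skill_tokens)
        if score > best_score:
            best_id, best_score = skill["id"], score
    # None means no token matched, so a later routing stage would have to decide.
    return best_id


skills = [
    {"id": "chat", "name": "General chat", "tags": ["chat", "registry"]},
    {"id": "refund", "name": "Refund lookup", "tags": ["refund", "order"]},  # hypothetical skill
]
routed = pre_route("where is my refund for order 42", skills)
unmatched = pre_route("hello there", skills)
```

In this sketch a query with no overlapping token returns `None`, which is where the later stages in the list above would take over.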

3) `.env`

With the Registry-based scaffold, check the following values first.

LLAMON_REGISTRY_HOST=http://<registry-host>:7860
HOST=0.0.0.0
PORT=8000
AGENT_URL=http://localhost:8000
# Maximum number of iterations of the ReAct tool-calling loop (recursion_limit = N*2+1)
REACT_MAX_ITERATIONS=3

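v0.2: MAX_RETRY env removed

The old MAX_RETRY env was completely removed in v0.2.0; if it is set, the server exits at startup with an explicit error. You must rename it to REACT_MAX_ITERATIONS.

# Replace in one pass (BSD/macOS sed syntax; with GNU sed, use `sed -i` without the empty '' argument)
sed -i '' 's/^MAX_RETRY=/REACT_MAX_ITERATIONS=/' .env

The ExtensionConfig(max_retry=...) argument in the Python API is unchanged (only the env key was renamed).

If LLAMON_REGISTRY_HOST is empty, the Registry-based LLM/Prompt/Guardrail/MCP cannot be resolved.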
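
The two rules for the iteration setting, the recursion_limit = N*2+1 formula and the hard failure on a leftover MAX_RETRY, can be sketched as follows. The function name, the error type, and the default of 3 when the key is unset are assumptions; only the formula and the rename come from this guide.

```python
def react_recursion_limit(env):
    """Compute the recursion limit from an env mapping (illustrative, not SDK code)."""
    if "MAX_RETRY" in env:
        # The guide says startup fails explicitly when the old key is still present.
        raise RuntimeError("MAX_RETRY was removed in v0.2.0; rename it to REACT_MAX_ITERATIONS")
    # Assumed default of 3, matching the scaffolded .env value shown above.
    iterations = int(env.get("REACT_MAX_ITERATIONS", "3"))
    return iterations * 2 + 1


limit = react_recursion_limit({"REACT_MAX_ITERATIONS": "3"})
```

With the scaffolded value of 3 the limit works out to 3*2+1 = 7.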

Reading env with `RuntimeEnv`

The default pattern is to use the SDK-provided RuntimeEnv instead of os.getenv. The scaffolded HTTP and PostgreSQL nodes are generated with this pattern as well.

from llamon_agent.config import RuntimeEnv

# Declare once at the top of the module; multiple node functions share this env object.
env = RuntimeEnv(source_file=__file__)

async def postgres_lookup(state) -> dict:
    dsn = env.get("POSTGRES_URL", "").strip()
    # ...

async def http_fetch(state) -> dict:
    url = env.get("API_ENDPOINT", "https://api.example.com/v1/data")
    # ...

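.env auto-discovery: walks up from the project root through parent directories to find the nearest .env, then merges in the order .env → actual OS env.
source_file tracking: the file from which a value was read is recorded on the Langfuse span, which makes debugging easier.
Single declaration at module top: the same object is reused, with no need to recreate env on every function call. The HTTP/PostgreSQL nodes generated by Studio follow this pattern too.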
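
The "nearest .env" search can be pictured with a short standard-library sketch. This is not RuntimeEnv's implementation; `find_nearest_dotenv` and `parse_dotenv` are hypothetical helpers, the parser only handles simple KEY=VALUE lines, and the merge with the OS environment is left out.

```python
from pathlib import Path


def find_nearest_dotenv(start):
    """Walk from `start` up through its parents and return the first .env found, else None."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def parse_dotenv(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values
```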

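If several nodes use the same env key, promote it to a constant in app/config.py and share it via from app.config import POSTGRES_URL. Studio automatically detects this kind of sharing pattern and shows a banner suggesting the promotion.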

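Registry registration info: `AGENT_ID` and `PUBLIC_AGENT_URL`

When registering an agent in the Registry UI, you enter the two items below. These values must exactly match the two env variables in .env for A2A calls to work correctly.

Registry UI input	`.env` variable	Auto-generated?
AGENT ID (optional)	`AGENT_ID`	✅ Injected automatically at scaffold time
A2A access path (required)	Path part of `PUBLIC_AGENT_URL`	✅ Injected automatically at scaffold time

# Registry internal identifier. Auto-generated; if you set it manually, replace it with the same value.
AGENT_ID=<auto-generated value>

# Proxy/gateway host + the A2A access path shown in the Registry UI
# The path prefix may differ depending on the proxy configuration
PUBLIC_AGENT_URL=http://<proxy-host>:<port><a2a-path>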
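
A quick way to check the "path part must match" rule before deploying is to compare the URL's path with the path shown in the Registry UI. The helper below is a hypothetical pre-flight check, and the host, port, and path in the example are made-up values.

```python
from urllib.parse import urlsplit


def public_url_matches(public_agent_url, registry_a2a_path):
    """Return True when the path part of PUBLIC_AGENT_URL equals the Registry A2A access path."""
    return urlsplit(public_agent_url).path == registry_a2a_path


# Example values are placeholders, not real endpoints.
ok = public_url_matches("http://proxy.example.com:8080/a2a/my-agent", "/a2a/my-agent")
mismatch = public_url_matches("http://proxy.example.com:8080/agents/my-agent", "/a2a/my-agent")
```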

`agent-general` vs `agent-structured`

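agent-general: for general conversation, Q&A, and tool calls
agent-structured: for cases where a structured result matters
The structured template additionally requires editing app/runtime_adapter.py
The structured template adjusts the final output_text and output_data in code

Pydantic schema-based (`SchemaValidatedRuntimeAdapter`)

The structured adapter can be built on one of two bases.

Base	When to use
`StructuredOutputAgent`	When you implement `extract_payload` / `build_summary` yourself for fine-grained control
`SchemaValidatedRuntimeAdapter[PayloadT]`	When you delegate JSON parsing, validation, casting, and the default summary to the SDK with a single Pydantic schema (recommended)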

from pydantic import BaseModel, Field
from typing import Literal
from llamon_agent import SchemaValidatedRuntimeAdapter


class IntentPayload(BaseModel):
    intentType: Literal["simple_query", "document_verification", "unclassified"]
    confidence: float = Field(ge=0, le=1, default=0.0)
    originalQuery: str = ""


class MyAgent(SchemaValidatedRuntimeAdapter[IntentPayload]):
    payload_schema = IntentPayload

    def apply_business_rules(self, payload, *, query, a2a_files, **_):
        # Force the intent when files are attached
        if a2a_files and payload.intentType != "document_verification":
            payload.intentType = "document_verification"
            payload.confidence = max(payload.confidence, 0.6)
        return payload

    def format_summary(self, payload):
        return f"Classified the query as {payload.intentType} (confidence {payload.confidence})"

Internal pipeline (automatic):

Extract the raw dict
Validate and cast with payload_schema.model_validate(raw)
On failure, delegate to on_validation_error() (default: a safe guidance response)
Call apply_business_rules(...) (optional, no-op by default)
Call format_summary(...) (default: auto-discovers the payload's summary → output_text → message fields)
Return RuntimeOutput(text=summary, data=[payload_dict], files=...)
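
The default summary lookup in the format_summary step can be approximated as an ordered field search over the validated payload. This sketch works on a plain dict; `default_summary` and its `fallback` argument are assumptions, since the guide does not say what the SDK returns when none of the fields is present.

```python
def default_summary(payload, fallback=""):
    """Return the first non-empty string among summary, output_text, message (in that order)."""
    for field in ("summary", "output_text", "message"):
        value = payload.get(field)
        if isinstance(value, str) and value:
            return value
    # Assumed behavior when no candidate field is set.
    return fallback


text = default_summary({"output_text": "classified as simple_query", "message": "ignored"})
```

Overriding format_summary, as in the IntentPayload example above, replaces this lookup entirely.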

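Handling validation failures (`on_validation_error`)

When schema validation fails, the SDK automatically calls on_validation_error. The default implementation returns a safe guidance message; override it only when you need a domain-specific fallback.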

from llamon_agent.runtime import RuntimeOutput

class MyAgent(SchemaValidatedRuntimeAdapter[IntentPayload]):
    payload_schema = IntentPayload

    def on_validation_error(self, raw, error, *, query, a2a_files=None, **_):
        # Keyword fallback: if the query contains "예금" (deposit), treat it as the default intent
        if "예금" in (query or ""):
            payload = IntentPayload(intentType="simple_query", confidence=0.3, originalQuery=query)
            return self.build_fallback_output(payload)
        return super().on_validation_error(raw, error, query=query, a2a_files=a2a_files)

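The build_fallback_output(payload, ...) helper cuts down the boilerplate of assembling a RuntimeOutput.

A2A artifact metadata (`artifact_name` / `artifact_description`)

An agent can directly override the artifact.name / artifact.description fields of the A2A response. When they are not set, the SDK automatically chooses among structured-response / agent-response / generated-files based on whether data/files are present.

When to set them

Situation	Effect
An orchestrator agent receives responses from several sub-agents	`name` identifies which agent produced the result and supports routing
Browsing the artifact list in the Studio UI	`name` + `description` appear as tooltips/labels
Readability of Langfuse/audit logs	Trace spans are identified by a meaningful name
Several artifacts in one response	A unique name is required to tell the artifacts apart

The A2A protocol works fine without them. They are purely optional.

Standard: `ExtensionConfig` fields (common to all agent types)

v0.2.0+: declared in one place, config.py. Works the same for every agent type: Registry single LLM (create_server(agent=None)), Flow, and Adapter.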

def build_extension(max_retry: int = 3) -> ExtensionConfig:
    return ExtensionConfig(
        llm=LLMConfig(id="59"),
        prompts=PromptSetConfig(...),
        # [Optional] Default A2A artifact metadata
        artifact_name="support-response",
        artifact_description="User-facing natural-language answer",
        max_retry=max_retry,
    )

Dynamic override: `RuntimeOutput` or the graph output dict

Use this when you want the exit output of a Flow node to emit a different artifact_name per request. It overrides the ExtensionConfig default.

# nodes.py: exit node of a Flow agent
async def aggregate_results(state):
    # summary and result are assumed to be computed earlier in this node (omitted here).
    return {
        "output": {
            "output_text": summary,
            "output_data": result,
            "artifact_name": "subject-management-result",   # ← per-request override
            "artifact_description": "Subject management processing result",
        },
    }

from llamon_agent import RuntimeOutput

return RuntimeOutput(
    text="Summary of the classification result...",
    data=[{"intentType": "simple_query", ...}],
    artifact_name="intent-classification",
    artifact_description="Result of classifying the user query into three intents",
)

Priority chain

Rank	Where it is set	Purpose
1 (highest)	result-level (`RuntimeOutput.artifact_name` or `"artifact_name"` in the graph output dict)	Per-request dynamic override
2	`ExtensionConfig.artifact_name`	Agent-wide standard default
3 (lowest)	fallback heuristic (`structured-response` / `agent-response` / `response-summary`)	Automatic
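
The chain amounts to "first value that is set wins". A minimal sketch, assuming the heuristic name is computed elsewhere and passed in as `fallback_name` (the function and parameter names are hypothetical):

```python
def resolve_artifact_name(result_name=None, config_name=None, fallback_name="agent-response"):
    """Pick the artifact name: result-level first, then ExtensionConfig, then the heuristic."""
    return result_name or config_name or fallback_name


per_request = resolve_artifact_name("intent-classification", "support-response")
from_config = resolve_artifact_name(None, "support-response")
automatic = resolve_artifact_name()
```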

What it looks like in the response JSON

Before setting:

{
  "artifacts": [{
    "artifactId": "...",
    "name": "structured-response",
    "parts": [...]
  }]
}

After setting:

{
  "artifacts": [{
    "artifactId": "...",
    "name": "intent-classification",
    "description": "Result of classifying the user query into three intents",
    "parts": [...]
  }]
}

Files you do not need to edit

With the default scaffold:

main.py: entry point that assembles and runs the server
app/tools.py: edit only when additional tools are needed
