# llm_chat 装饰器

本文档介绍 SimpleLLMFunc 库中的聊天装饰器 `llm_chat`。该装饰器专门用于实现与大语言模型的对话功能，支持多轮对话、历史记录管理和工具调用。

## llm_chat 装饰器概述

### 装饰器作用

`llm_chat` 装饰器用于构建对话式应用，特别适合以下场景：

- **多轮对话**: 自动管理对话历史，支持上下文连续性
- **流式响应**: 支持实时流式返回响应内容
- **智能助手**: 集成工具调用能力，让 LLM 可以执行外部操作
- **聊天机器人**: 适合构建实时交互的聊天应用

### 主要功能特性

- **多轮对话支持**: 自动管理对话历史记录，保持上下文
- **流式响应**: 返回异步生成器，支持实时流式输出
- **工具集成**: 支持在对话中调用工具，扩展 LLM 的能力范围
- **灵活参数处理**: 智能处理历史记录参数和用户消息
- **完整的日志记录**: 与框架日志系统集成，自动追踪对话

## 装饰器用法

> ⚠️ **重要说明**：`llm_chat` 只能装饰 `async def` 定义的异步函数，调用后返回 **异步生成器**；请使用 `async for` 消费输出。

### 基本语法

```python
from typing import AsyncGenerator, List, Dict, Tuple
from SimpleLLMFunc import llm_chat

@llm_chat(
    llm_interface=llm_interface,           # LLM interface instance (required)
    toolkit=None,                          # Tool list (optional)
    max_tool_calls=None,                   # Max tool calls (optional)
    stream=True,                           # Stream mode (optional)
    self_reference=None,                   # Shared SelfReference object (optional)
    self_reference_key=None,               # SelfReference memory key (optional)
    **llm_kwargs                           # Other LLM kwargs
)
async def your_chat_function(
    message: str,
    history: List[Dict[str, str]] | None = None,
) -> AsyncGenerator[Tuple[str, List[Dict[str, str]]], None]:
    """
    Describe assistant role and behavior here.
    This docstring is used as the system prompt.
    """
    pass
```

> 提示：函数体不会被执行，DocString 才是 Prompt；建议直接使用 `pass`。

### 参数说明

- **llm_interface** (必需): LLM 接口实例，用于与大语言模型通信
- **toolkit** (可选): 工具列表，可以是 Tool 对象或被 @tool 装饰的函数
- **max_tool_calls** (可选): 最大工具调用次数；默认为 `None`，表示框架不主动施加工具调用上限。如需更严格保护，请显式传入较小整数。
- **stream** (可选): 是否启用流式模式，默认为 False
- **return_mode** (可选): 返回模式，可选值为 "text"（默认）或 "raw"。
  - 仅在 `enable_event=False` 时生效
  - 当 `enable_event=True` 时，`ResponseYield.response` 始终为原始响应对象或流式 chunk
- **enable_event** (可选): 是否启用事件流，默认为 False
  - `False`: 返回 `(response, messages)` 元组（向后兼容模式）
  - `True`: 返回 `ReactOutput`（`ResponseYield` 或 `EventYield`）
  - 详细说明请参考 [事件流文档](event_stream.md)
- **strict_signature** (可选): 当为 True 时强制 `agent(history, message: str, _template_params=None)` 规范签名
- **self_reference** (可选): 共享的 `SelfReference` 实例；若未显式传入，`llm_chat` 会从 `PyRepl` runtime backend 中自动探测
- **self_reference_key** (可选): 本聊天函数的记忆键，默认为函数名
- ****llm_kwargs**: 额外的关键字参数，将直接传递给 LLM 接口（如 temperature、top_p 等）

### 运行时中断（AbortSignal）

你可以在调用时传入 `_abort_signal` 来中断正在执行的回合（停止流式输出并取消正在执行的工具调用）。

推荐使用常量 `ABORT_SIGNAL_PARAM`，避免硬编码参数名：

```python
from SimpleLLMFunc.hooks import AbortSignal, ABORT_SIGNAL_PARAM

abort_signal = AbortSignal()

async for output in your_chat_function(
    "你好",
    history=[],
    **{ABORT_SIGNAL_PARAM: abort_signal},
):
    ...

# 在其他协程中触发中断
abort_signal.abort("user_interrupt")
```

当 `enable_event=True` 时，`ReactEndEvent.extra` 会包含 `aborted: true` 和可选的 `abort_reason`。
详见 [中断与取消](abort.md)。

### 返回值

当 `enable_event=False`（默认）时，`llm_chat` 装饰的函数返回一个异步生成器，每次迭代返回：

- `chunk` (str): 响应内容的一部分（流式模式）或完整响应（非流式）
- `updated_history` (List[Dict[str, str]]): 更新后的对话历史

> 提示：当 `stream=True` 且 `return_mode="text"` 时，流结束会额外 yield 一个空字符串作为结束标记。

当 `enable_event=True` 时，返回 `ReactOutput`，可以是：
- `ResponseYield`: 包含响应和消息列表
- `EventYield`: 包含 ReAct 循环中的事件（如工具调用开始/结束、LLM 调用等）

> 注意：历史参数名应为 `history` 或 `chat_history`。若未提供符合格式的历史，框架会忽略历史并发出警告。若历史中包含 `system` 消息，最新的 `system` 会覆盖 DocString 作为系统提示，其余 `system` 会被过滤。

## 使用示例

### 示例 1: 基础聊天助手

最简单的对话助手实现：

```python
import asyncio
from typing import AsyncGenerator, Dict, List, Tuple
from SimpleLLMFunc import llm_chat, OpenAICompatible

# 初始化 LLM 接口
llm = OpenAICompatible.load_from_json_file("provider.json")["openai"]["gpt-3.5-turbo"]

# 创建聊天函数
@llm_chat(llm_interface=llm, stream=True)
async def simple_chat(
    message: str,
    history: List[Dict[str, str]] | None = None,
) -> AsyncGenerator[Tuple[str, List[Dict[str, str]]], None]:
    """你是一个友好的聊天助手，善于回答各种问题。"""
    pass

# 使用示例
async def main():
    history = []
    user_message = "你好，请介绍一下你自己"

    print(f"用户: {user_message}")
    print("助手: ", end="", flush=True)

    # 流式获取响应
    async for chunk, updated_history in simple_chat(user_message, history):
        if chunk:
            print(chunk, end="", flush=True)
        history = updated_history

    print()  # 换行

asyncio.run(main())
```

### 示例 2: 带工具调用的聊天助手

展示如何在对话中使用工具：

```python
import asyncio
from typing import AsyncGenerator, Dict, List, Tuple
from SimpleLLMFunc import llm_chat, tool, OpenAICompatible

# 定义工具
@tool(name="get_weather", description="获取指定城市的天气信息")
async def get_weather(city: str) -> Dict[str, str]:
    """
    获取指定城市的天气信息

    Args:
        city: 城市名称

    Returns:
        包含温度、湿度和天气状况的字典
    """
    # 模拟天气数据
    weather_data = {
        "北京": {"temperature": "25°C", "humidity": "60%", "condition": "晴朗"},
        "上海": {"temperature": "28°C", "humidity": "75%", "condition": "多云"},
        "广州": {"temperature": "30°C", "humidity": "80%", "condition": "小雨"}
    }
    return weather_data.get(city, {"temperature": "20°C", "humidity": "50%", "condition": "未知"})

# 初始化 LLM
llm = OpenAICompatible.load_from_json_file("provider.json")["openai"]["gpt-3.5-turbo"]

# 创建带工具的聊天函数
@llm_chat(llm_interface=llm, toolkit=[get_weather], stream=True)
async def weather_chat(
    message: str,
    history: List[Dict[str, str]] | None = None,
) -> AsyncGenerator[Tuple[str, List[Dict[str, str]]], None]:
    """
    你是一个天气助手，可以查询城市天气信息。
    当用户询问天气时，使用 get_weather 工具来获取实时信息。
    """
    pass

# 使用示例
async def main():
    history = []
    query = "北京今天天气怎么样？"

    print(f"用户: {query}")
    print("助手: ", end="", flush=True)

    async for chunk, updated_history in weather_chat(query, history):
        if chunk:
            print(chunk, end="", flush=True)
        history = updated_history

    print()

asyncio.run(main())
```

### 示例 3: 交互式多轮对话

展示如何维护完整的对话会话：

```python
import asyncio
from typing import AsyncGenerator, Dict, List, Tuple
from SimpleLLMFunc import llm_chat, OpenAICompatible

llm = OpenAICompatible.load_from_json_file("provider.json")["openai"]["gpt-3.5-turbo"]

@llm_chat(llm_interface=llm, stream=True)
async def multi_turn_chat(
    message: str,
    history: List[Dict[str, str]] | None = None,
) -> AsyncGenerator[Tuple[str, List[Dict[str, str]]], None]:
    """你是一个专业的编程助手，精通 Python 和 JavaScript。"""
    pass

async def interactive_chat_session():
    """运行交互式聊天会话"""
    history = []

    print("=== 编程助手（输入 'quit' 退出）===\n")

    # 这里使用 input() 只是为了演示，实际应用中应使用异步输入
    while True:
        # 在实际应用中，应该使用更好的异步输入方法
        user_input = input("你: ").strip()

        if user_input.lower() == "quit":
            break

        if not user_input:
            continue

        print("助手: ", end="", flush=True)

        response_text = ""
        async for chunk, updated_history in multi_turn_chat(user_input, history):
            if chunk:
                print(chunk, end="", flush=True)
                response_text += chunk
            history = updated_history

        print("\n")

# 非交互式演示（避免阻塞 input()）
async def demo():
    """演示版本，不使用交互式输入"""
    history = []

    messages = [
        "Python 中什么是列表推导式？",
        "如何使用异步编程？",
    ]

    for user_message in messages:
        print(f"\n用户: {user_message}")
        print("助手: ", end="", flush=True)

        async for chunk, updated_history in multi_turn_chat(user_message, history):
            if chunk:
                print(chunk, end="", flush=True)
            history = updated_history

        print()

asyncio.run(demo())
```

## 高级特性

### SelfReference + runtime primitives

当挂载了带 runtime 的工具（例如 `PyRepl`）时，框架会在 system prompt 顶部注入一段去重后的工具最佳实践块；runtime primitive 的指引会包含在这些工具自己的 best practices 中。

这段 guidance 会在每个回合按当前 runtime 状态重新构建，因此不会把临时运行时说明写进持久化 system-prompt memory；需要持久修改提示词时，仍然使用 `set_system_prompt(...)` / `append_system_prompt(...)`。

这段 guidance 会告诉 agent：

- 如何发现当前挂载的 runtime 能力（`runtime.list_primitives()`）
- 如何查看单个契约（`runtime.get_primitive_spec(name)`）或按条件筛选契约（`runtime.list_primitive_specs(names=[...], contains="...")`）
- 如何查看 selfref 命名空间 guidance（`runtime.selfref.guide()`）
- `reset_repl` 会清理 REPL 变量，并继续保留当前 runtime backend 状态
- 当绑定了 `SelfReference` 记忆时，本 chat 函数对应的 memory key 作用域

`PyRepl()` 默认会安装 builtin `selfref` pack；`llm_chat` 会直接从 toolkit 的 runtime backend 解析并复用这份默认 backend。

示例：

```python
from SimpleLLMFunc import llm_chat
from SimpleLLMFunc.builtin import PyRepl, SelfReference

repl = PyRepl()
self_reference = repl.get_runtime_backend("selfref")
assert isinstance(self_reference, SelfReference)

@llm_chat(
    llm_interface=llm,
    toolkit=repl.toolset,
    self_reference_key="agent_main",
)
async def agent(message: str, history=None):
    """You are a practical coding assistant."""
```

单次调用的高级覆盖（可选）：

```python
await agent(
    "task",
    _template_params={
        "__self_reference_key_override": "agent_alt",
        "__self_reference_toolkit_override": repl.toolset,
    },
)
```

在 `execute_code` 中，优先通过 runtime primitives 处理记忆：

```python
runtime.selfref.history.append_system_prompt(
    "User preference: answer in concise bullet points.",
)
```

Runtime self-reference 原语参考：

- `runtime.selfref.guide()`: 返回命名空间概览与 fork / 记忆最佳实践清单。
- `runtime.selfref.history.keys()`: 列出所有已绑定的 memory key。
- `runtime.selfref.history.active_key()`: 获取当前上下文解析出的 active key。
- `runtime.selfref.history.count(key=None)`: 返回解析后 key 的消息数量。
- `runtime.selfref.history.all(key=None)`: 返回完整消息列表的深拷贝。
- `runtime.selfref.history.get(index, key=None)`: 读取指定索引的消息。
- `runtime.selfref.history.append(message, key=None)`: 追加一条消息。
- `runtime.selfref.history.insert(index, message, key=None)`: 在指定索引插入一条消息。
- `runtime.selfref.history.update(index, message, key=None)`: 替换指定索引消息。
- `runtime.selfref.history.delete(index, key=None)`: 删除指定索引消息。
- `runtime.selfref.history.replace(messages, key=None)`: 用校验后的消息列表替换整段历史。
- `runtime.selfref.history.clear(key=None)`: 清理非 system 消息并保留当前 system prompt。
- `runtime.selfref.history.get_system_prompt(key=None)`: 读取最新 system prompt。
- `runtime.selfref.history.set_system_prompt(text, key=None)`: 覆盖当前 system prompt。
- `runtime.selfref.history.append_system_prompt(text, key=None)`: 向现有 system prompt memory 追加文本。
- `runtime.selfref.fork.spawn(message, ...)`: 异步创建子 self-fork（chat 形态）。
- `runtime.selfref.fork.gather_all(fork_id_or_list=None, include_history=False)`: 聚合 fork 结果，返回 `dict[fork_id -> ForkResult]`（用 `.items()` / `.values()` 遍历）。

默认情况下 fork 结果是紧凑模式（`history_included=False`，并提供 `history_count` 元数据），这样主上下文可以保持简洁。
确实需要完整子历史时，再显式使用 `include_history=True`。

当 `enable_event=True` 时，你可以通过 origin 元数据区分主链路事件与 fork 事件：

```python
from SimpleLLMFunc.hooks import is_event_yield

async for output in agent("analyze and split"):
    if not is_event_yield(output):
        continue
    if output.origin.fork_id:
        print(f"fork={output.origin.fork_id} type={output.event.event_type}")
    else:
        print(f"main type={output.event.event_type}")
```

清理记忆：

- 使用 `reset_repl` 清理 Python runtime 变量。
- 使用 selfref history primitives 清理记忆记录，例如 `runtime.selfref.history.delete`、`runtime.selfref.history.replace`。
- 需要保留当前 system prompt 时，使用 `runtime.selfref.history.clear`。

### 返回模式

`return_mode` 参数控制返回的数据类型：

> 提示：`return_mode` 仅在 `enable_event=False` 时生效；事件流模式始终返回原始响应对象或流式 chunk。

```python
# 返回文本（默认）
@llm_chat(llm_interface=llm, stream=True, return_mode="text")
async def text_mode_chat(message: str, history=None):
    """聊天函数"""
    pass

# 返回原始响应对象（用于获取 token 使用量等详细信息）
@llm_chat(llm_interface=llm, stream=True, return_mode="raw")
async def raw_mode_chat(message: str, history=None):
    """聊天函数"""
    pass
```

### 并发聊天会话

使用 `asyncio.gather` 处理多个并发的聊天会话：

```python
async def concurrent_chats():
    """并发处理多个聊天会话"""

    @llm_chat(llm_interface=llm, stream=True)
    async def chat(message: str, history=None):
        """通用聊天函数"""
        pass

    # 定义多个会话
    sessions = [
        {"user_id": "user_1", "message": "你好"},
        {"user_id": "user_2", "message": "如何学习Python？"},
        {"user_id": "user_3", "message": "告诉我一个笑话"},
    ]

    async def handle_session(session):
        """处理单个会话"""
        history = []
        results = []

        async for chunk, updated_history in chat(session["message"], history):
            if chunk:
                results.append(chunk)
            history = updated_history

        return session["user_id"], "".join(results)

    # 并发执行所有会话
    results = await asyncio.gather(
        *[handle_session(session) for session in sessions]
    )

    for user_id, response in results:
        print(f"{user_id}: {response}\n")

asyncio.run(concurrent_chats())
```

## 最佳实践

### 1. 错误处理

```python
async def robust_chat():
    history = []
    try:
        async for chunk, updated_history in multi_turn_chat("测试", history):
            if chunk:
                print(chunk, end="", flush=True)
            history = updated_history
    except Exception as e:
        print(f"聊天出错: {e}")
```

### 2. 超时控制

```python
async def chat_with_timeout():
    history = []
    try:
        async with asyncio.timeout(30):  # Python 3.11+
            async for chunk, updated_history in multi_turn_chat("测试", history):
                if chunk:
                    print(chunk, end="", flush=True)
                history = updated_history
    except asyncio.TimeoutError:
        print("聊天超时")
```

### 3. 历史记录限制

为避免上下文过长，限制历史记录长度：

```python
MAX_HISTORY_LENGTH = 10

def trim_history(history: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """保留最近的 N 条消息"""
    if len(history) > MAX_HISTORY_LENGTH:
        return history[-MAX_HISTORY_LENGTH:]
    return history

async def chat_with_limited_history():
    history = []

    messages = ["第一条消息", "第二条消息", "第三条消息"]

    for msg in messages:
        # 限制历史记录
        history = trim_history(history)

        async for chunk, updated_history in multi_turn_chat(msg, history):
            if chunk:
                print(chunk, end="", flush=True)
            history = updated_history
        print()

asyncio.run(chat_with_limited_history())
```

### 4. 日志与调试

```python
import logging

# 启用详细日志
logging.basicConfig(level=logging.DEBUG)

# SimpleLLMFunc 日志
logger = logging.getLogger("SimpleLLMFunc")
logger.setLevel(logging.DEBUG)
```

### 5. 事件流（Event Stream）

事件流是 SimpleLLMFunc v0.5.0+ 引入的高级特性，允许你实时观察 ReAct 循环的完整执行过程。

通过设置 `enable_event=True`，你可以：

- **实时监控**：观察 LLM 调用、工具调用的实时状态
- **性能分析**：获取详细的执行统计和性能指标
- **自定义 UI**：基于事件构建丰富的用户界面
- **调试支持**：深入了解 ReAct 循环的执行细节

**基本用法**：

```python
@llm_chat(llm_interface=llm, enable_event=True)
async def chat(message: str, history=None):
    """智能助手"""
    pass

# 处理事件和响应
from SimpleLLMFunc.hooks import ResponseYield, EventYield

async for output in chat("查询天气"):
    if isinstance(output, ResponseYield):
        # 原始响应对象（流式时为 chunk）。
        # 文本渲染建议使用 LLMChunkArriveEvent 的 accumulated_content。
        pass
    elif isinstance(output, EventYield):
        print(f"事件: {output.event.event_type}")
```

**详细文档**：请参考 [事件流文档](event_stream.md) 了解完整的事件类型、使用示例和最佳实践。

## 常见问题

### Q: 如何保存和恢复对话历史？

```python
import json

def save_history(history: List[Dict[str, str]], filename: str):
    """保存对话历史到文件"""
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(history, f, ensure_ascii=False, indent=2)

def load_history(filename: str) -> List[Dict[str, str]]:
    """从文件加载对话历史"""
    try:
        with open(filename, 'r', encoding='utf-8') as f:
            return json.load(f)
    except FileNotFoundError:
        return []

# 使用
history = load_history("chat_history.json")
# ... 继续对话 ...
save_history(history, "chat_history.json")
```

### Q: 如何处理 LLM 拒绝或无效响应？

```python
async def robust_chat_with_retry():
    history = []
    max_retries = 3

    for attempt in range(max_retries):
        try:
            collected = ""
            async for chunk, updated_history in multi_turn_chat("测试", history):
                if chunk:
                    collected += chunk
                history = updated_history

            if collected.strip():
                print(f"成功: {collected}")
                break
            else:
                print(f"尝试 {attempt + 1}: 收到空响应，重试...")
        except Exception as e:
            print(f"尝试 {attempt + 1} 失败: {e}")
            if attempt == max_retries - 1:
                raise
```

---

通过这些示例和最佳实践，你可以构建功能强大的对话应用。`llm_chat` 装饰器提供了简洁而强大的方式来实现复杂的对话逻辑。