API Reference

This document specifies the Agentic FinSearch OpenAI-compatible REST API. The API is synchronous (no streaming). All request and response bodies are JSON.

—

Connection 

Base URL 

The API is served by a Django backend on port 8000.

Production (Fedora droplet at 134.122.1.153, IPv4 only):

https://agenticfinsearch.org:8000

Local development:

http://localhost:8000

All endpoint paths below are relative to this base URL.

Authentication 

The API uses Bearer token authentication.

Authorization: Bearer <FINGPT_API_KEY>

The API key is set via the FINGPT_API_KEY environment variable on the server.
If FINGPT_API_KEY is not set, authentication is disabled (development mode) and all requests are accepted.
When authentication is enabled, every request to every endpoint must include the Authorization header.

Error responses (401):

{
  "error": {
    "message": "Missing Authorization header. Use: Authorization: Bearer <api_key>",
    "type": "authentication_error"
  }
}

{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error"
  }
}
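The header rule above can be wrapped in a small client-side helper. This is a minimal sketch, not part of the API: `build_headers` is a hypothetical name, and it assumes the client simply omits the Authorization header when it has no key (which only works against a server running with authentication disabled).

```python
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict:
    """Return JSON request headers, adding Authorization only when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Bearer scheme, as documented in the Authentication section.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


print(build_headers("example-key"))
print(build_headers())
```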

Rate Limiting 

Default: 600 requests per hour per client, configurable via the API_RATE_LIMIT environment variable.

Format: <count>/<period>, where <period> is s (second), m (minute), h (hour), or d (day). The default corresponds to 600/h.
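To make the rate-limit format concrete, the sketch below parses a `<count>/<period>` string and derives the average spacing between requests that stays under the limit. It is a client-side illustration only: `parse_rate_limit` and `min_interval` are hypothetical helpers, and the server's own parser is not shown in this document and may behave differently.

```python
# Seconds per period letter, following the documented s/m/h/d format.
PERIOD_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_rate_limit(spec: str) -> tuple:
    """Split '<count>/<period>' into (count, period_in_seconds)."""
    count_text, sep, period = spec.strip().partition("/")
    if not sep or period not in PERIOD_SECONDS or not count_text.isdigit():
        raise ValueError(f"expected '<count>/<period>', got {spec!r}")
    count = int(count_text)
    if count == 0:
        raise ValueError("count must be positive")
    return count, PERIOD_SECONDS[period]


def min_interval(spec: str) -> float:
    """Average number of seconds between requests that keeps a client within the limit."""
    count, seconds = parse_rate_limit(spec)
    return seconds / count


print(parse_rate_limit("600/h"))  # (600, 3600)
print(min_interval("600/h"))      # 6.0
```

With the default limit, a benchmark loop that sleeps about six seconds between requests should stay within the quota.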

CORS 

CORS restrictions only apply to browser-based requests. HTTP clients (curl, requests, httpx, Postman) are unaffected.

—

Endpoints 

List Models 

Returns all available models in OpenAI-compatible format.

Method	`GET`
Path	`/v1/models`
Auth	Required (when `FINGPT_API_KEY` is set)

Response (200):

{
  "object": "list",
  "data": [
    {
      "id": "FinGPT",
      "object": "model",
      "created": 1740000000,
      "owned_by": "google",
      "permission": [],
      "root": "FinGPT",
      "parent": null
    },
    {
      "id": "FinGPT-Light",
      "object": "model",
      "created": 1740000000,
      "owned_by": "openai",
      "permission": [],
      "root": "FinGPT-Light",
      "parent": null
    },
    {
      "id": "Buffet-Agent",
      "object": "model",
      "created": 1740000000,
      "owned_by": "buffet",
      "permission": [],
      "root": "Buffet-Agent",
      "parent": null
    }
  ]
}

Response fields:

Field	Type	Description
`object`	string	Always `"list"`.
`data`	array	Array of model objects.
`data[].id`	string	Model identifier. Use this value in the `model` field of chat completion requests.
`data[].owned_by`	string	Provider name: `"google"`, `"openai"`, or `"buffet"`.

Example:

curl -H "Authorization: Bearer $API_KEY" \
     https://agenticfinsearch.org:8000/v1/models

Error responses:

401: Authentication error (see Authentication).
405: Wrong HTTP method (must be GET).
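Once the response body has been decoded, picking out model IDs takes only a few lines. The sketch below works on a trimmed copy of the example response above rather than a live call; `model_ids` and `ids_by_provider` are hypothetical helper names.

```python
from collections import defaultdict


def model_ids(listing: dict) -> list:
    """Return the `id` of every entry in a /v1/models response body."""
    return [entry["id"] for entry in listing.get("data", [])]


def ids_by_provider(listing: dict) -> dict:
    """Group model IDs by their `owned_by` value."""
    grouped = defaultdict(list)
    for entry in listing.get("data", []):
        grouped[entry["owned_by"]].append(entry["id"])
    return dict(grouped)


# Trimmed copy of the documented example response.
sample = {
    "object": "list",
    "data": [
        {"id": "FinGPT", "object": "model", "owned_by": "google"},
        {"id": "FinGPT-Light", "object": "model", "owned_by": "openai"},
        {"id": "Buffet-Agent", "object": "model", "owned_by": "buffet"},
    ],
}

print(model_ids(sample))
print(ids_by_provider(sample))
```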

—

Available Models 

Model ID	Provider	Underlying Model	Description
`FinGPT`	google	`gemini-3-flash-preview`	Default model. 1M token context. No streaming.
`FinGPT-Light`	openai	`gpt-5.1-chat-latest`	Faster, lighter. 128k token context.
`Buffet-Agent`	buffet	Custom (Hugging Face endpoint)	Fine-tuned financial model.

All models support both thinking (MCP) and research (deep search) modes.

—

Python Benchmarking Quick Start 

Below is a complete, copy-paste-ready Python script for benchmarking the API (it requires the requests package). It checks the health and models endpoints and the validation errors, then runs all three modes and times each one.

"""Agentic FinSearch API Benchmark Script."""
import json
import time

import requests

BASE_URL = "https://agenticfinsearch.org:8000"
API_KEY = "<YOUR_API_KEY>"  # omit Authorization header if auth is disabled

HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}


def call_completions(mode: str, question: str, model: str = "FinGPT", **kwargs) -> tuple[dict, float]:
    """Send a chat completion request and return (response_dict, elapsed_seconds)."""
    payload = {
        "model": model,
        "mode": mode,
        "messages": [{"role": "user", "content": question}],
        **kwargs,
    }

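    # perf_counter is monotonic, so the measurement is unaffected by clock adjustments.
    start = time.perf_counter()
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers=HEADERS,
        json=payload,
        timeout=120,
    )
    elapsed = time.perf_counter() - start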

    resp.raise_for_status()
    data = resp.json()
    return data, elapsed


def test_health():
    """Verify the server is running."""
    resp = requests.get(f"{BASE_URL}/health/", timeout=10)
    assert resp.status_code == 200
    data = resp.json()
    assert data["status"] == "healthy"
    print(f"[PASS] Health check: {data['version']}")


def test_models():
    """Verify the models endpoint returns expected models."""
    resp = requests.get(f"{BASE_URL}/v1/models", headers=HEADERS, timeout=10)
    assert resp.status_code == 200
    data = resp.json()
    model_ids = [m["id"] for m in data["data"]]
    assert "FinGPT" in model_ids
    assert "FinGPT-Light" in model_ids
    print(f"[PASS] Models: {model_ids}")


def test_thinking_mode():
    """Benchmark thinking mode (MCP tools)."""
    data, elapsed = call_completions(
        mode="thinking",
        question="What is the current price of AAPL?",
    )
    content = data["choices"][0]["message"]["content"]
    sources = data["sources"]
    print(f"[PASS] Thinking mode ({elapsed:.1f}s)")
    print(f"  Response length: {len(content)} chars")
    print(f"  Sources: {json.dumps(sources, indent=2)}")
    assert len(content) > 0
    return elapsed


def test_research_mode():
    """Benchmark research mode (deep search)."""
    data, elapsed = call_completions(
        mode="research",
        question="What are analysts saying about NVIDIA earnings?",
        search_domains=["reuters.com", "cnbc.com"],
    )
    content = data["choices"][0]["message"]["content"]
    sources = data["sources"]
    print(f"[PASS] Research mode ({elapsed:.1f}s)")
    print(f"  Response length: {len(content)} chars")
    print(f"  Sources: {len(sources)} URLs")
    for s in sources[:3]:
        print(f"    - {s.get('url', s.get('title', 'N/A'))}")
    assert len(content) > 0
    return elapsed


def test_normal_mode():
    """Benchmark normal mode (no tools, no search)."""
    data, elapsed = call_completions(
        mode="normal",
        question="Explain what a dividend yield is.",
    )
    content = data["choices"][0]["message"]["content"]
    print(f"[PASS] Normal mode ({elapsed:.1f}s)")
    print(f"  Response length: {len(content)} chars")
    assert len(content) > 0
    return elapsed


def test_error_handling():
    """Verify the API returns proper errors for bad requests."""
    # Missing mode
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers=HEADERS,
        json={"model": "FinGPT", "messages": [{"role": "user", "content": "test"}]},
        timeout=30,
    )
    assert resp.status_code == 400
    assert "mode is required" in resp.json()["error"]["message"]

    # Invalid model
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers=HEADERS,
        json={
            "model": "nonexistent",
            "mode": "thinking",
            "messages": [{"role": "user", "content": "test"}],
        },
        timeout=30,
    )
    assert resp.status_code == 404

    # Empty messages
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers=HEADERS,
        json={"model": "FinGPT", "mode": "thinking", "messages": []},
        timeout=30,
    )
    assert resp.status_code == 400

    print("[PASS] Error handling: all validation errors returned correctly")


if __name__ == "__main__":
    print("=" * 60)
    print("Agentic FinSearch API Benchmark")
    print("=" * 60)

    test_health()
    test_models()
    test_error_handling()

    timings = {}
    timings["thinking"] = test_thinking_mode()
    timings["research"] = test_research_mode()
    timings["normal"] = test_normal_mode()

    print("\n" + "=" * 60)
    print("Timing Summary")
    print("=" * 60)
    for mode, t in timings.items():
        print(f"  {mode:12s}: {t:.1f}s")
    print(f"  {'TOTAL':12s}: {sum(timings.values()):.1f}s")

—

Behavioral Notes 

Statelessness 

The API is fully stateless. Each request creates a fresh session context. To maintain conversation history, the client must send the full messages array with every request.
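Because the server keeps no history, a multi-turn client has to accumulate the messages itself. The following is a minimal sketch of that bookkeeping with no network calls; the `Conversation` class and its method names are hypothetical, and the reply used in the usage lines is invented for illustration.

```python
class Conversation:
    """Client-side message history for a stateless chat API."""

    def __init__(self, mode: str, model: str = "FinGPT"):
        self.mode = mode
        self.model = model
        self.messages = []

    def build_payload(self, question: str) -> dict:
        """Append the user's question and return the full request body."""
        self.messages.append({"role": "user", "content": question})
        return {"model": self.model, "mode": self.mode, "messages": list(self.messages)}

    def record_reply(self, response: dict) -> str:
        """Store the assistant's answer so the next request carries it."""
        content = response["choices"][0]["message"]["content"]
        self.messages.append({"role": "assistant", "content": content})
        return content


chat = Conversation(mode="normal")
first = chat.build_payload("Explain what a dividend yield is.")
# Invented reply, standing in for the JSON body returned by the API.
chat.record_reply({"choices": [{"message": {"role": "assistant", "content": "It is ..."}}]})
second = chat.build_payload("How is it calculated?")
print(len(first["messages"]), len(second["messages"]))  # 1 3
```

The second payload carries the first question, the recorded answer, and the follow-up, which is what the API needs to answer in context.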

Response Times 

Thinking mode: 5-30 seconds (depends on number of MCP tool calls).
Research mode: 15-90 seconds (depends on search depth, number of sub-queries).
Normal mode: 2-10 seconds.

Set the timeout in your HTTP client accordingly (recommended: 120 seconds).
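When benchmarking, it can help to flag runs that fall outside these typical ranges. The sketch below encodes the ranges listed above and labels a measured latency; `classify_latency` and the label strings are illustrative choices, not something the API provides.

```python
# Typical response-time ranges in seconds, copied from the list above.
EXPECTED_SECONDS = {
    "thinking": (5, 30),
    "research": (15, 90),
    "normal": (2, 10),
}


def classify_latency(mode: str, elapsed: float) -> str:
    """Label a measured latency relative to the documented range for its mode."""
    low, high = EXPECTED_SECONDS[mode]
    if elapsed < low:
        return "faster than typical"
    if elapsed <= high:
        return "typical"
    return "slower than typical"


print(classify_latency("normal", 5.0))
print(classify_latency("research", 100.0))
```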

Token Usage

The usage field provides approximate token counts. prompt_tokens comes from the context manager’s internal counter. completion_tokens is estimated as len(response_text) // 4. These are useful for relative benchmarking but are not exact billing-grade counts.
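The completion-token estimate is simple enough to reproduce on the client, for example to sanity-check a response. The sketch below applies the stated `len(response_text) // 4` rule; the function names are hypothetical, and `prompt_tokens` is passed in because its counter lives on the server.

```python
def estimate_completion_tokens(response_text: str) -> int:
    """Approximate completion tokens as one token per four characters."""
    return len(response_text) // 4


def estimate_usage(prompt_tokens: int, response_text: str) -> dict:
    """Build a usage-style dict from a known prompt count and the response text."""
    completion = estimate_completion_tokens(response_text)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion,
        "total_tokens": prompt_tokens + completion,
    }


print(estimate_completion_tokens("a" * 10))  # 2
print(estimate_usage(100, "a" * 40))
```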

URL Scraping 

When a url is provided, the backend scrapes it using Playwright (headless browser). The scraped content is injected into the agent’s context before response generation. This adds 2-5 seconds to the response time.

Error Safety 

The API never exposes internal error details (stack traces, file paths) to clients. All 500 errors return a generic message. Full error details are logged server-side only.

Chat completion response fields:

Field	Type	Description
`id`	string	Unique completion ID, prefixed with `chatcmpl-`.
`object`	string	Always `"chat.completion"`.
`created`	integer	Unix timestamp of when the response was generated.
`model`	string	The model ID used.
`choices`	array	Always contains exactly one choice (index 0).
`choices[0].message.role`	string	Always `"assistant"`.
`choices[0].message.content`	string	The generated response text (Markdown-formatted).
`choices[0].finish_reason`	string	Always `"stop"`.
`usage.prompt_tokens`	integer	Approximate prompt token count from the context manager.
`usage.completion_tokens`	integer	Approximate completion tokens (`len(content) // 4`).
`usage.total_tokens`	integer	Sum of `prompt_tokens` and `completion_tokens`.
`sources`	array	Agentic FinSearch extension. List of source objects. Structure varies by mode (see below).
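The fields in this table can be pulled into a compact summary for logging or benchmarking. This is a sketch under assumptions: `summarize_completion` is a hypothetical helper, and the sample response uses invented values that only follow the documented field names.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the commonly used fields from a chat completion response."""
    choice = response["choices"][0]  # the table states there is exactly one choice
    usage = response.get("usage", {})
    return {
        "id": response["id"],
        "model": response["model"],
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
        "source_count": len(response.get("sources", [])),
    }


# Invented sample shaped like the documented response body.
sample_response = {
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "created": 1740000000,
    "model": "FinGPT",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Example answer."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 120, "completion_tokens": 3, "total_tokens": 123},
    "sources": [],
}

print(summarize_completion(sample_response))
```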

Error responses:

Code	Type	Cause
400	`invalid_request_error`	Missing `messages`, missing `mode`, invalid `mode` value, or malformed JSON body.
401	`authentication_error`	Missing/invalid `Authorization` header or API key.
404	`invalid_request_error`	Model ID does not exist (use `GET /v1/models` to list valid IDs).
405	(plain)	Wrong HTTP method (e.g., `GET` on `/v1/chat/completions`).
500	`server_error`	Internal error. The `message` field will be generic (no stack traces are exposed). Check server logs.
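A client can turn these responses into readable one-line messages by parsing the JSON error envelope and falling back to the raw body when there is none (the 405 row is listed as plain). The helper below is a minimal sketch; `describe_error` is a hypothetical name and the plain-text 405 body in the usage lines is an assumed example.

```python
import json


def describe_error(status_code: int, body_text: str) -> str:
    """Turn an error response into a one-line description."""
    try:
        error = json.loads(body_text)["error"]
        return f"{status_code} {error['type']}: {error['message']}"
    except (ValueError, KeyError, TypeError):
        # No JSON error envelope (for example a plain 405), so report the raw text.
        return f"{status_code}: {body_text.strip() or 'no body'}"


print(describe_error(401, '{"error": {"message": "Invalid API key", "type": "authentication_error"}}'))
print(describe_error(405, "Method Not Allowed"))
```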

Chat completion request fields:

Field	Type	Required	Description
`messages`	array	Yes	Array of message objects (see Message Format below). Must contain at least one message. The last message should be the user’s current question.
`mode`	string	Yes	Agent mode. One of: `"thinking"`, `"research"`, `"normal"`. See Modes below.
`model`	string	No	Model ID from `/v1/models`. Default: `"FinGPT"`. Must be an exact match (case-sensitive).
`url`	string	No	A URL to scrape and inject as page context before generating the response. Used for site-specific analysis (e.g., analyzing a Yahoo Finance stock page).
`search_domains`	array	No	List of domain strings to scope research to (research mode only). Bare domains like `"reuters.com"` are auto-prefixed with `https://`. Merged into `preferred_links`.
`preferred_links`	array	No	List of full URLs to prioritize in research (research mode only).
`user_timezone`	string	No	IANA timezone string (e.g., `"America/New_York"`). Helps the agent give time-aware responses.
`user_time`	string	No	ISO 8601 timestamp of the user’s current time (e.g., `"2026-02-22T10:30:00-05:00"`).
`user`	string	No	An opaque user identifier. When provided, the session ID is derived from it (`api_user_<user>`). When absent, each request gets a unique session.
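The `search_domains` row says bare domains are prefixed with `https://` and merged into `preferred_links`. The sketch below shows one way that merge could look. It is an illustration, not the backend's code: `merge_links` is a hypothetical name, the ordering and de-duplication are assumptions, and the URLs in the usage line are examples.

```python
def merge_links(search_domains=None, preferred_links=None) -> list:
    """Prefix bare domains with https:// and merge them with the preferred links."""
    merged = list(preferred_links or [])
    for domain in search_domains or []:
        # Leave values that already carry a scheme untouched.
        url = domain if domain.startswith(("http://", "https://")) else f"https://{domain}"
        if url not in merged:  # assumed: duplicates are dropped
            merged.append(url)
    return merged


print(merge_links(["reuters.com", "cnbc.com"], ["https://www.sec.gov/edgar"]))
```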

Message fields:

Field	Type	Description
`role`	string	One of `"system"`, `"user"`, `"assistant"`.
`content`	string	The message text.

Modes:

Mode	Behavior
`thinking`	Agentic mode. The agent uses MCP tools (SEC-EDGAR, Yahoo Finance) to gather data before responding. Best for specific financial questions. `sources` in the response will list MCP tools used (e.g., `get_stock_info`, `sec_full_text_search`).
`research`	Deep research mode. The agent decomposes the question into sub-queries, performs parallel web searches, synthesizes a comprehensive answer. Best for broad research questions. `sources` in the response will list web URLs used. Supports `search_domains` and `preferred_links` to scope research.
`normal`	Direct mode. The agent responds using its training data and any injected page context (`url` parameter) without performing web searches or using MCP tools.
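Checking a request on the client before sending it avoids a round trip that would end in a 400. The sketch below validates the mode, the message list, and each message's role against the tables above, then assembles the body. `build_request` is a hypothetical helper, and the server may apply further checks not covered here.

```python
VALID_MODES = {"thinking", "research", "normal"}
VALID_ROLES = {"system", "user", "assistant"}


def build_request(messages: list, mode: str, model: str = "FinGPT", **optional) -> dict:
    """Validate the required fields and return a chat completion request body."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if message.get("role") not in VALID_ROLES:
            raise ValueError(f"invalid role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError("message content must be a string")
    payload = {"model": model, "mode": mode, "messages": messages}
    # Optional fields such as url or search_domains are included only when set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


body = build_request(
    [{"role": "user", "content": "What are analysts saying about NVIDIA earnings?"}],
    mode="research",
    search_domains=["reuters.com", "cnbc.com"],
    url=None,
)
print(body)
```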