
Data Sources

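Data sources provide knowledge for your AI agents through RAG (Retrieval-Augmented Generation): content is indexed once, and the most relevant passages are retrieved at query time to ground the agent's response.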

Overview

Source types:

  • File - Documents, PDFs, text files
  • URL - Web pages, APIs
  • Database - External databases
  • API - REST/GraphQL endpoints

Adding Sources

Via Dashboard

  1. Navigate to Context
  2. Click Add Source
  3. Select source type
  4. Configure the source (for files, upload the document)

Via API

bash
# File upload
curl -X POST https://your-domain.com/api/agents/{id}/sources \
  -F "type=file" \
  -F "name=Product Manual" \
  -F "file=@/path/to/manual.pdf"

# URL source
curl -X POST https://your-domain.com/api/agents/{id}/sources \
  -H "Content-Type: application/json" \
  -d '{
    "type": "url",
    "name": "Documentation",
    "url": "https://docs.example.com"
  }'
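
The same URL-source request can be built from code. The following is a minimal Python sketch using only the standard library; the function name build_url_source_request and the agent ID are illustrative, and your-domain.com is the placeholder host from the curl examples, so the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://your-domain.com"  # placeholder host, as in the curl examples


def build_url_source_request(agent_id: str, name: str, url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that registers a URL source."""
    payload = {"type": "url", "name": name, "url": url}
    return urllib.request.Request(
        f"{BASE_URL}/api/agents/{agent_id}/sources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "agent-123" is a made-up ID used only for illustration.
    req = build_url_source_request("agent-123", "Documentation", "https://docs.example.com")
    print(req.get_method(), req.full_url)
    # Once the host is real, sending is a single call:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Keeping request construction separate from sending makes the payload easy to inspect before anything reaches the server.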

Source Types

File

Supported formats:

  • PDF (.pdf)
  • Text (.txt)
  • Markdown (.md)
  • Word (.docx)
  • JSON (.json)
  • CSV (.csv)

json
{
  "type": "file",
  "name": "User Guide",
  "file": "<binary>"
}

The file itself is sent as multipart form data, as in the curl upload example above; <binary> stands for the file contents.

URL

Crawl web pages:

json
{
  "type": "url",
  "name": "Help Center",
  "url": "https://help.example.com",
  "config": {
    "depth": 2,
    "max_pages": 100
  }
}
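
To make the two crawl limits concrete, here is a small Python sketch of a breadth-first crawl plan over an in-memory link graph. It assumes that depth counts link hops from the starting URL and that max_pages caps the total number of pages, which is the usual reading of such settings but is not confirmed by this page. The function plan_crawl and the sample graph are hypothetical.

```python
from collections import deque


def plan_crawl(start: str, links: dict[str, list[str]], depth: int, max_pages: int) -> list[str]:
    """Return pages in breadth-first order, bounded by link depth and page count."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < max_pages:
        page, level = queue.popleft()
        if level == depth:
            continue  # pages at the depth limit are kept, but their links are not followed
        for target in links.get(page, []):
            if target not in seen and len(visited) < max_pages:
                seen.add(target)
                visited.append(target)
                queue.append((target, level + 1))
    return visited


if __name__ == "__main__":
    # Hypothetical site structure: each key links to the pages in its list.
    site = {"/": ["/a", "/b"], "/a": ["/a1"], "/a1": ["/deep"], "/b": []}
    print(plan_crawl("/", site, depth=2, max_pages=100))
```

With depth set to 2 the page /deep is never reached, because it sits three hops from the start.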

Database

Connect to databases:

json
{
  "type": "database",
  "name": "Product Data",
  "config": {
    "connection_string": "postgresql://...",
    "query": "SELECT * FROM products"
  }
}
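
How query results become searchable text is not described here. One plausible approach, sketched below in Python, is to turn each row into a small "column: value" document. The sketch uses an in-memory SQLite database so it runs anywhere, whereas the example above targets PostgreSQL; rows_to_documents is a hypothetical helper, not part of the product.

```python
import sqlite3


def rows_to_documents(conn: sqlite3.Connection, query: str) -> list[str]:
    """Run the query and render each row as one 'column: value' text document."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [
        "\n".join(f"{name}: {value}" for name, value in zip(columns, row))
        for row in cursor
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.execute("INSERT INTO products VALUES ('Widget', 9.5)")
    for doc in rows_to_documents(conn, "SELECT * FROM products"):
        print(doc)
```

Selecting only the columns worth searching, rather than SELECT *, keeps the resulting documents short and focused.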

API

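Fetch from APIs on a schedule. The schedule field is a cron expression; the example below runs at the start of every hour: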

json
{
  "type": "api",
  "name": "CRM Contacts",
  "config": {
    "url": "https://api.crm.com/contacts",
    "method": "GET",
    "headers": {
      "Authorization": "Bearer {{api_key}}"
    },
    "schedule": "0 * * * *"
  }
}

Processing Pipeline

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    Source    │────▶│   Extract    │────▶│    Chunk     │
│    Input     │     │   Content    │     │    Text      │
└──────────────┘     └──────────────┘     └──────┬───────┘
                                                 │
                                                 ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Store in   │◀────│   Generate   │◀────│   Clean &    │
│  Vectorize   │     │  Embeddings  │     │   Normalize  │
└──────────────┘     └──────────────┘     └──────────────┘
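
The stages in the diagram can be sketched end to end in Python. This is a toy illustration under stated assumptions, not the actual implementation: extraction handles plain text only, the embedding is a hashed word count rather than a real model, and the final list stands in for the records that would be written to Vectorize. All function names are hypothetical.

```python
import hashlib
import re


def extract(raw: bytes) -> str:
    """Extract: decode the raw source bytes into text (plain text only here)."""
    return raw.decode("utf-8", errors="replace")


def chunk(text: str, max_size: int = 1000) -> list[str]:
    """Chunk: split on blank lines, then cap each piece at max_size characters."""
    pieces = []
    for paragraph in re.split(r"\n\s*\n", text):
        for start in range(0, len(paragraph), max_size):
            pieces.append(paragraph[start:start + max_size])
    return pieces


def clean(piece: str) -> str:
    """Clean & normalize: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", piece).strip()


def embed(piece: str, dims: int = 8) -> list[float]:
    """Toy embedding: hash each word into one of `dims` buckets and count."""
    vector = [0.0] * dims
    for word in piece.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dims] += 1.0
    return vector


def process(raw: bytes) -> list[dict]:
    """Run every stage and return the records a vector store would receive."""
    records = []
    for piece in chunk(extract(raw)):
        text = clean(piece)
        if text:  # skip pieces that are empty after cleaning
            records.append({"text": text, "vector": embed(text)})
    return records


if __name__ == "__main__":
    for record in process(b"First  paragraph.\n\nSecond\nparagraph here."):
        print(record["text"], record["vector"])
```

Each stage is a plain function, so any one of them can be swapped (for example, a real embedding model in place of embed) without touching the others.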

Source Status

Status       Description
pending      Waiting to process
processing   Currently processing
ready        Available for search
error        Processing failed
updating     Refreshing content
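
Because a new source passes through several states before it becomes searchable, a client typically polls until it settles. The Python sketch below treats ready and error as final and everything else as still in progress. The get_status callable is an assumption: in practice it would wrap a call to the Get Source Details endpoint and return the status field, whose exact name is not shown on this page.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"ready", "error"}


def wait_until_processed(
    get_status: Callable[[], str],
    interval: float = 2.0,
    max_checks: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the source is ready or has failed; give up after max_checks."""
    status = "unknown"
    for attempt in range(max_checks):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_checks - 1:
            sleep(interval)  # pending, processing, or updating: wait and retry
    raise TimeoutError(f"source still '{status}' after {max_checks} checks")


if __name__ == "__main__":
    # Simulated status sequence standing in for real API responses.
    fake = iter(["pending", "processing", "ready"])
    print(wait_until_processed(lambda: next(fake), interval=0.0))
```

Passing sleep as a parameter keeps the loop easy to test without real delays.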

Chunking Strategy

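Content is split into searchable chunks. max_size caps the size of each chunk, and overlap sets how much adjacent chunks share, so text cut at a boundary still appears whole in one of them: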

json
{
  "chunking": {
    "method": "semantic",
    "max_size": 1000,
    "overlap": 200
  }
}

Methods:

  • semantic - Paragraph-aware splitting that keeps related text together
  • fixed - Fixed character count
  • sentence - Sentence boundaries
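
As an illustration of the fixed method, the Python sketch below slices text into windows of max_size characters, each starting max_size minus overlap characters after the previous one. It assumes both settings are measured in characters, which matches the description of the fixed method but is not stated for the other two; fixed_chunks is a hypothetical name.

```python
def fixed_chunks(text: str, max_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of max_size characters that share `overlap` characters."""
    if not 0 <= overlap < max_size:
        raise ValueError("overlap must be >= 0 and smaller than max_size")
    step = max_size - overlap  # distance between the starts of consecutive chunks
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_size])
        if start + max_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for piece in fixed_chunks("abcdefghij", max_size=4, overlap=2):
        print(piece)
```

With the default values, a 2,500-character text yields three chunks starting at offsets 0, 800, and 1,600, with lengths 1,000, 1,000, and 900.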

Managing Sources

List Sources

bash
curl https://your-domain.com/api/agents/{id}/sources

Get Source Details

bash
curl https://your-domain.com/api/agents/{id}/sources/{sourceId}

Delete Source

bash
curl -X DELETE https://your-domain.com/api/agents/{id}/sources/{sourceId}

Refresh Source

bash
curl -X POST https://your-domain.com/api/agents/{id}/sources/{sourceId}/refresh

Storage

R2 Storage

Files are stored in Cloudflare R2:

  • Automatic replication
  • No egress fees
  • Unlimited storage

Vectorize Index

Embeddings are stored in Vectorize:

  • Fast similarity search
  • Automatic indexing
  • Scales to millions of vectors

Integration

With Chat

Sources are searched automatically during chat:

User: "What's the return policy?"

Agent: [Searches sources] → [Finds relevant chunks] → [Generates response]

With Workflows

Search sources from a workflow step:

json
{
  "type": "search-sources",
  "data": {
    "query": "{{input.question}}",
    "limit": 5
  }
}
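
Conceptually, a source search ranks stored chunks by how close their embeddings are to the query embedding and returns the top limit results. The Python sketch below shows that ranking with cosine similarity over a small in-memory list. It is an assumption about the general technique, not a description of how Vectorize is queried, and the vectors are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: list[float],
    index: list[tuple[str, list[float]]],
    limit: int = 5,
) -> list[tuple[float, str]]:
    """Return up to `limit` (score, text) pairs, best match first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    # Hypothetical two-dimensional embeddings for three chunks.
    index = [
        ("Returns are accepted within 30 days.", [1.0, 0.0]),
        ("Shipping takes 2 business days.", [0.0, 1.0]),
        ("Refunds go to the original payment method.", [0.9, 0.1]),
    ]
    for score, text in search([1.0, 0.0], index, limit=2):
        print(f"{score:.3f}  {text}")
```

A smaller limit keeps the prompt short; a larger one gives the agent more context at the cost of including weaker matches.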

Best Practices

1. Organize Sources

Group related content:

  • Product documentation
  • FAQ and support
  • Policies and terms

2. Keep Content Fresh

Schedule regular updates with a cron expression (the example below refreshes once a day at midnight):

json
{
  "refresh_schedule": "0 0 * * *"
}
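
For readers unfamiliar with cron syntax, the five fields are minute, hour, day of month, month, and day of week. The Python sketch below checks whether a given moment matches an expression. It is deliberately minimal: it supports only * and plain integers (no ranges, lists, or steps) and requires every field to match, which differs from full cron when both day fields are restricted. The function names are hypothetical.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; this sketch supports only '*' and plain integers."""
    return field == "*" or int(field) == value


def is_due(expression: str, moment: datetime) -> bool:
    """Return True if `moment` matches the five-field cron expression."""
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = (moment.weekday() + 1) % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


if __name__ == "__main__":
    midnight = datetime(2024, 1, 1, 0, 0)
    noon = datetime(2024, 1, 1, 12, 0)
    print(is_due("0 0 * * *", midnight), is_due("0 0 * * *", noon))
    print(is_due("0 * * * *", noon))
```

So 0 0 * * * fires once a day at 00:00, while 0 * * * * fires at minute 0 of every hour.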

3. Optimize Chunk Size

Balance context and precision:

  • Larger chunks: More context
  • Smaller chunks: Higher precision

4. Use Metadata

Add descriptive metadata:

json
{
  "metadata": {
    "category": "support",
    "version": "2.0",
    "language": "en"
  }
}
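
One use for metadata is narrowing down which sources are relevant, for example only English support content. Whether the platform filters on metadata at search time is not stated here, so the Python sketch below simply shows client-side filtering over a list of source records shaped like the example above; filter_sources and the sample records are hypothetical.

```python
def matches(metadata: dict, filters: dict) -> bool:
    """True if every filter key is present in metadata with the same value."""
    return all(metadata.get(key) == value for key, value in filters.items())


def filter_sources(sources: list[dict], **filters: str) -> list[dict]:
    """Keep only the sources whose metadata satisfies all given filters."""
    return [s for s in sources if matches(s.get("metadata", {}), filters)]


if __name__ == "__main__":
    sources = [
        {"name": "FAQ", "metadata": {"category": "support", "language": "en"}},
        {"name": "FAQ (DE)", "metadata": {"category": "support", "language": "de"}},
        {"name": "Terms", "metadata": {"category": "legal", "language": "en"}},
    ]
    for source in filter_sources(sources, category="support", language="en"):
        print(source["name"])
```

Consistent keys and values across sources (always "en", never "EN" or "english") are what make filters like this reliable.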

5. Monitor Quality

Review search results:

  • Check relevance
  • Update stale content
  • Remove duplicates
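
Duplicate removal can be partly automated. A simple approach, sketched below in Python, fingerprints each chunk after normalizing case and whitespace and keeps only the first chunk per fingerprint. This catches exact and near-exact copies but not paraphrases; the helper names are hypothetical.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(chunks: list[str]) -> list[str]:
    """Keep the first occurrence of each distinct chunk, preserving order."""
    seen: set[str] = set()
    unique = []
    for piece in chunks:
        key = fingerprint(piece)
        if key not in seen:
            seen.add(key)
            unique.append(piece)
    return unique


if __name__ == "__main__":
    chunks = ["Free returns.", "free  returns.", "Ships in 2 days."]
    print(drop_duplicates(chunks))
```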

API Reference

See Sources API for complete endpoint documentation.

Released under the MIT License.