Data Sources

Data sources supply the knowledge your AI agents retrieve at answer time through RAG (retrieval-augmented generation).

Overview

Source types:

File - Documents, PDFs, text files
URL - Web pages, APIs
Database - External databases
API - REST/GraphQL endpoints

Adding Sources

Via Dashboard

1. Navigate to Context
2. Click Add Source
3. Select source type
4. Configure and upload

Via API

bash

# File upload (sent as multipart form data; replace {id} with your agent ID)
curl -X POST https://your-domain.com/api/agents/{id}/sources \
  -F "type=file" \
  -F "name=Product Manual" \
  -F "file=@/path/to/manual.pdf"

# URL source
curl -X POST https://your-domain.com/api/agents/{id}/sources \
  -H "Content-Type: application/json" \
  -d '{
    "type": "url",
    "name": "Documentation",
    "url": "https://docs.example.com"
  }'

Source Types

File

Supported formats:

PDF (.pdf)
Text (.txt)
Markdown (.md)
Word (.docx)
JSON (.json)
CSV (.csv)

json

{
  "type": "file",
  "name": "User Guide",
  "file": "<binary>"
}

URL

Crawl web pages starting from the given URL. The `depth` and `max_pages` settings in `config` bound the crawl:

json

{
  "type": "url",
  "name": "Help Center",
  "url": "https://help.example.com",
  "config": {
    "depth": 2,
    "max_pages": 100
  }
}
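
The sketch below illustrates how limits like `depth` and `max_pages` can bound a crawl. It is not the platform's crawler: it assumes `depth` counts link hops from the start URL and `max_pages` caps the total number of pages, and it uses a hypothetical in-memory `links` map in place of real fetching and link extraction.

```python
from collections import deque

def crawl_order(start, links, depth=2, max_pages=100):
    """Return pages in breadth-first order, bounded by depth and max_pages.

    `links` is a hypothetical map of page -> outgoing links that stands in
    for real HTTP fetching and link extraction.
    """
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, level = queue.popleft()
        visited.append(url)
        if level == depth:
            continue  # do not follow links past the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, level + 1))
    return visited
```

Under these assumptions, a lower `depth` keeps the crawl close to the start page, while `max_pages` acts as a hard ceiling regardless of how many links are found.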

Database

Connect to an external database and pull content with a query:

json

{
  "type": "database",
  "name": "Product Data",
  "config": {
    "connection_string": "postgresql://...",
    "query": "SELECT * FROM products"
  }
}

API

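Fetch from APIs, optionally on a schedule. In standard cron syntax, the `schedule` value below runs at the top of every hour: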

json

{
  "type": "api",
  "name": "CRM Contacts",
  "config": {
    "url": "https://api.crm.com/contacts",
    "method": "GET",
    "headers": {
      "Authorization": "Bearer {{api_key}}"
    },
    "schedule": "0 * * * *"
  }
}

Processing Pipeline

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    Source    │────▶│   Extract    │────▶│    Chunk     │
│    Input     │     │   Content    │     │    Text      │
└──────────────┘     └──────────────┘     └──────┬───────┘
                                                  │
                                                  ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Store in   │◀────│   Generate   │◀────│   Clean &    │
│  Vectorize   │     │  Embeddings  │     │   Normalize  │
└──────────────┘     └──────────────┘     └──────────────┘
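
As an illustration of the "Clean & Normalize" stage, here is a minimal sketch of the kind of text cleanup such a step might perform. The function name and the specific rules (Unicode NFKC normalization, collapsing spaces and blank lines) are assumptions, not the platform's actual behavior.

```python
import re
import unicodedata

def clean_text(text):
    """Normalize Unicode forms and collapse redundant whitespace."""
    text = unicodedata.normalize("NFKC", text)  # unify compatibility characters
    text = re.sub(r"[ \t]+", " ", text)         # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)      # keep at most one blank line
    return text.strip()
```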

Source Status

Status	Description
`pending`	Waiting to process
`processing`	Currently processing
`ready`	Available for search
`error`	Processing failed
`updating`	Refreshing content
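
Because processing is asynchronous, a client usually polls until the source reaches `ready` or `error`. The sketch below shows one way to do that. It assumes the Get Source Details endpoint returns JSON with a top-level `status` field and needs no extra headers; both are assumptions, so adjust to the real response shape.

```python
import json
import time
import urllib.request

TERMINAL_STATUSES = {"ready", "error"}

def fetch_status(base_url, agent_id, source_id):
    """Read a source's status from the Get Source Details endpoint.

    Assumption: the response body is JSON with a top-level "status" field.
    """
    url = f"{base_url}/api/agents/{agent_id}/sources/{source_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["status"]

def wait_for_source(get_status, timeout=60.0, interval=2.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns 'ready' or 'error'."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"source still {status!r} after {timeout}s")
        sleep(interval)
```

A call might look like `wait_for_source(lambda: fetch_status("https://your-domain.com", agent_id, source_id))`, where `agent_id` and `source_id` are your own identifiers.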

Chunking Strategy

Content is split into searchable chunks:

json

{
  "chunking": {
    "method": "semantic",
    "max_size": 1000,
    "overlap": 200
  }
}

Methods:

semantic - Splits at paragraph boundaries so related text stays together
fixed - Fixed character count
sentence - Sentence boundaries
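
To make `max_size` and `overlap` concrete, here is a minimal sketch of the `fixed` method. It assumes both values are measured in characters and that each chunk starts `max_size - overlap` characters after the previous one; the platform's actual splitter may differ.

```python
def chunk_fixed(text, max_size=1000, overlap=200):
    """Split text into chunks of up to max_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the two chunks.
    """
    if not 0 <= overlap < max_size:
        raise ValueError("overlap must be >= 0 and smaller than max_size")
    step = max_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_size])
        if start + max_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

With the defaults above, a 2,500-character document yields three chunks starting at offsets 0, 800, and 1600.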

Managing Sources

List Sources

bash

curl https://your-domain.com/api/agents/{id}/sources

Get Source Details

bash

curl https://your-domain.com/api/agents/{id}/sources/{sourceId}

Delete Source

bash

curl -X DELETE https://your-domain.com/api/agents/{id}/sources/{sourceId}

Refresh Source

bash

curl -X POST https://your-domain.com/api/agents/{id}/sources/{sourceId}/refresh
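
The same four calls can be wrapped in a small client. This is a sketch using only the Python standard library; the `SourcesClient` class is hypothetical, and it sends no authentication headers because none are shown above.

```python
import urllib.request

class SourcesClient:
    """Builds requests for the source management endpoints shown above."""

    def __init__(self, base_url, agent_id):
        self.root = f"{base_url.rstrip('/')}/api/agents/{agent_id}/sources"

    def _request(self, method, path=""):
        return urllib.request.Request(self.root + path, method=method)

    def list(self):
        return self._request("GET")

    def get(self, source_id):
        return self._request("GET", f"/{source_id}")

    def delete(self, source_id):
        return self._request("DELETE", f"/{source_id}")

    def refresh(self, source_id):
        return self._request("POST", f"/{source_id}/refresh")
```

Each method returns a prepared request; pass it to `urllib.request.urlopen` to send it.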

Storage

R2 Storage

Files are stored in Cloudflare R2:

Automatic replication
No egress fees
Unlimited storage

Vectorize Index

Embeddings are stored in Cloudflare Vectorize:

Fast similarity search
Automatic indexing
Scales to millions of vectors
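
For intuition about what similarity search computes, the sketch below ranks stored vectors by cosine similarity to a query vector. It is a brute-force, in-memory illustration with hypothetical names, not how Vectorize is implemented or called.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, index, k=5):
    """Return the keys of the k vectors most similar to the query."""
    scored = [(cosine(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]
```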

Integration

With Chat

During a chat, the agent automatically searches its sources before responding:

User: "What's the return policy?"

Agent: [Searches sources] → [Finds relevant chunks] → [Generates response]
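
The final step of that flow, turning retrieved chunks into a model prompt, can be sketched as follows. The prompt wording and the `build_prompt` helper are illustrative assumptions; the agent's real prompt template is not documented here.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user's question into one prompt."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the chunks makes it possible for the response to cite which chunk supported the answer.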

With Workflows

In a workflow, a `search-sources` step queries sources explicitly:

json

{
  "type": "search-sources",
  "data": {
    "query": "{{input.question}}",
    "limit": 5
  }
}

Best Practices

1. Organize Sources

Group related content:

Product documentation
FAQ and support
Policies and terms

2. Keep Content Fresh

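Schedule regular updates with a cron expression. In standard cron syntax, the example below refreshes once a day at midnight: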

json

{
  "refresh_schedule": "0 0 * * *"
}

3. Optimize Chunk Size

Balance context and precision:

Larger chunks: More context per result, but matches are less precise
Smaller chunks: More precise matches, but less surrounding context

4. Use Metadata

Add descriptive metadata:

json

{
  "metadata": {
    "category": "support",
    "version": "2.0",
    "language": "en"
  }
}
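
One common use of metadata is narrowing results to a category, version, or language. The sketch below shows exact-match filtering over in-memory records; the record shape and helper names are hypothetical, and whether the platform supports metadata filters at query time is not stated here.

```python
def matches(metadata, filters):
    """True when every filter key is present with an equal value."""
    return all(metadata.get(key) == value for key, value in filters.items())

def filter_chunks(chunks, filters):
    """Keep only chunks whose metadata satisfies all filters."""
    return [c for c in chunks if matches(c.get("metadata", {}), filters)]
```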

5. Monitor Quality

Review search results:

Check relevance
Update stale content
Remove duplicates
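
Duplicate detection can be partly automated. This sketch flags sources whose text is identical after whitespace and case normalization, assuming you have each source's extracted text available locally; `find_duplicates` is a hypothetical helper and it does not catch near-duplicates.

```python
import hashlib

def find_duplicates(sources):
    """Return (duplicate_name, original_name) pairs for identical content.

    `sources` maps a source name to its extracted text.
    """
    seen = {}
    duplicates = []
    for name, text in sources.items():
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
    return duplicates
```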

API Reference

See Sources API for complete endpoint documentation.
