web-augmented-generation

This Node.js application performs web-augmented generation using various LLM providers and web search results from SearXNG.

A Node.js application that performs web-augmented generation using web search results from SearXNG and various LLM providers via OpenAI-compatible API calls. It rephrases user queries for better web searching, searches with SearXNG, then fetches and summarizes content from the results before generating a response. It supports streaming responses, content similarity checking and repetition detection, detailed logging, and an interactive CLI. Multiple LLM providers are supported, including Ollama, together.ai, and llama.cpp, and it can apply semantic chunking to scraped page content for higher-quality answers.

Setup

Clone the repository:

git clone https://github.com/jparkerweb/web-augmented-generation.git
cd web-augmented-generation

Install dependencies:
```
npm ci
```
Copy the .env.example file to .env:
```
cp .env.example .env
```
Edit the .env file and update the values as needed:

######################
## General Settings ##
######################
NUM_URLS=10                                                           # Number of URLs to fetch
SEARXNG_URL=https://searx.be/                                         # URL of the SearXNG server
SEARXNG_URL_EXTRA_PARAMETER="key=optional_auth_key_here&language=en"  # Extra parameter for SearXNG URL
SEARXNG_FORMAT=html                                                   # Format for SearXNG results (html or json)
FETCH_TIMEOUT_MS=5000                                                 # Timeout for fetching URLs
DISABLE_SSL_VALIDATION=true                                           # Whether to disable SSL validation

##################
## LLM Settings ##
##################
LLM_STREAM_RESPONSE=true                             # Whether to stream the LLM response

# Ollama Local Configuration
LLM_BASE_URL=http://localhost:11434/v1               # Base URL for the LLM API (OpenAI format)
LLM_API_KEY=ollama!!!                                # API key for the LLM (use 'ollama' for Ollama)
LLM_MODEL=llama3.2:1b                                # Model to use with the LLM API

####################################
## Scraped Page Content Settings ##
####################################

# Semantic Chunking Settings
CHUNK_CONTENT=true                                   # Enable semantic chunking for better quality answers
CHUNK_CONTENT_USE_HYBRID_FALLBACK=true               # Enable hybrid mode to fallback to summarization if no chunks found
## The following parameters are only used by the `chunk-match` library (if CHUNK_CONTENT is set to true)
CHUNK_CONTENT_MAX_RESULTS=10
CHUNK_CONTENT_MIN_SIMILARITY=0.375
CHUNK_CONTENT_MAX_TOKEN_SIZE=500
CHUNK_CONTENT_SIMILARITY_THRESHOLD=0.4
CHUNK_CONTENT_DYNAMIC_THRESHOLD_LOWER_BOUND=0.3
CHUNK_CONTENT_DYNAMIC_THRESHOLD_UPPER_BOUND=0.5
CHUNK_CONTENT_NUM_SIMILARITY_SENTENCES_LOOKAHEAD=3
CHUNK_CONTENT_COMBINE_CHUNKS=true
CHUNK_CONTENT_COMBINE_CHUNKS_SIMILARITY_THRESHOLD=0.5
CHUNK_CONTENT_ONNX_EMBEDDING_MODEL="Xenova/all-MiniLM-L6-v2"
CHUNK_CONTENT_DTYPE="q8"

# Raw Content Settings (used when CHUNK_CONTENT=false)
WEB_PAGE_CONTENT_MAX_LENGTH=1000                     # Maximum length of raw page content to send to LLM

Alternative LLM Provider Configurations:

# together.ai Configuration
LLM_BASE_URL=https://api.together.xyz/v1
LLM_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LLM_MODEL=meta-llama/Llama-3.2-3B-Instruct-Turbo

# llama.cpp Configuration
LLM_BASE_URL=http://localhost:8080/v1
LLM_API_KEY=not-needed
LLM_MODEL=not-needed

# OpenRouter Configuration
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LLM_MODEL=google/gemini-pro-1.5-exp

# Google AI Studio Configuration
LLM_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/
LLM_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LLM_MODEL=gemini-exp-1121

The configuration includes:

General settings for web search and content fetching
LLM provider settings with support for multiple providers
Content processing settings with semantic chunking options
Raw content handling parameters

web-augmented-generation

Setup

Related \\

bedrock-proxy-endpoint \\

bedrock-wrapper \\

down-craft \\

bedrock-proxy-endpoint

bedrock-wrapper

down-craft