# TRAINING GUIDE: ORBIT INDUSTRIAL RAG SYSTEMS

## 📋 Architectural Overview
The Orbit AI Co-Pilot uses a **Retrieval-Augmented Generation (RAG)** pipeline to ground its answers in retrieved regulatory and operational specifications, which reduces the risk of hallucinated output.
By combining physical sensor telemetry with the local knowledge-base repositories, the co-pilot answers high-priority enterprise queries with source-backed context.

---

## 🛠️ Ingestion & Document Pre-processing Pipeline
To ingest new manuals, intervention records, or compliance standards, follow this pipeline:
1. **Document Formats**: Supported inputs include `.pdf`, `.md`, `.json`, `.csv`, `.txt`, and `.xlsx`.
2. **Text Extraction**:
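   - For Markdown and plain-text files, decode as UTF-8 rather than plain ASCII, so that engineering symbols such as `Ω`, `°`, and `µ` survive extraction.
   - For PDFs, run OCR on scanned pages to recover text that exists only inside images, such as labels in wiring schematics and charts.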
3. **Chunking Strategy**:
   - Enforce **Hierarchical Chunking** with a chunk size of **512 tokens** and a **10% overlap (about 51 tokens)**.
   - Prepend a context-aware prefix (e.g., `[CONTEXT: Power Quality / Harmonics / Feeder A]`) to every chunk before it is embedded.
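
The overlap and prefix rules above can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation: it uses a plain sliding window rather than a full hierarchical splitter, it assumes the text has already been tokenized into a list of strings, and the names `chunk_tokens` and `add_prefix` are hypothetical.

```python
from typing import List

CHUNK_SIZE = 512  # tokens per chunk, as specified in this guide
OVERLAP = 51      # roughly 10% of 512, as specified in this guide


def chunk_tokens(tokens: List[str], size: int = CHUNK_SIZE,
                 overlap: int = OVERLAP) -> List[List[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the current window already reaches the end
    return chunks


def add_prefix(chunk: List[str], path: List[str]) -> str:
    """Prepend a [CONTEXT: ...] prefix built from the section path.

    Joining tokens with spaces is an assumption made for this sketch;
    a real tokenizer would provide its own detokenization step.
    """
    return f"[CONTEXT: {' / '.join(path)}] " + " ".join(chunk)
```

With 1,000 tokens, each window advances by 461 tokens, so the sketch yields three chunks, and the last 51 tokens of one chunk reappear as the first 51 tokens of the next.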

---

## 🛰️ Embeddings & Vector Storage
- **Embedding Model**: `orbit-industrial-embeddings-v4` (384-dimensional vector space optimized for thermodynamic and electrical engineering terminology).
- **Vector Store**: Local secure index that ranks candidates by cosine similarity between the query embedding $A$ and a chunk embedding $B$:
  $$\text{cosine similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
- **Telemetry Hybrid Search**: The retriever combines sparse keyword search (BM25) with dense vector search, so that relevant technical manuals are retrieved alongside live, sub-second telemetry anomalies.
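
As a rough illustration of the two ideas above, the sketch below computes cosine similarity directly from its definition and merges a sparse (BM25) ranking with a dense ranking. The guide does not state how the two result lists are combined, so reciprocal rank fusion is used here purely as an assumption; the function names and the constant `k = 60` are likewise illustrative choices, not part of the Orbit system.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (a . b) / (|a| * |b|), or 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector
    return dot / (norm_a * norm_b)


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document ids into one list.

    Each document scores 1 / (k + rank) per list it appears in (rank is
    1-based); documents are returned by descending total score, with ties
    broken alphabetically by id.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the sparse and dense retrievers.
bm25_hits = ["manual-a", "manual-b", "manual-c"]
dense_hits = ["manual-b", "manual-c", "manual-a"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Rank-based fusion avoids having to normalize BM25 scores and cosine similarities onto a common scale, which is one reason it is a common default for hybrid retrieval.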
