How OwnAI Works.

From first call to a permanent AI asset on your infrastructure — in 8 weeks.

Architecture, deployment flow, security model, and the AMC lifecycle. Designed to be readable by the architect who will defend it in an audit.

READ TIME8–12 min LAST REVIEWED2026-Q2 FORWARD-FRIENDLYyes
§2 · Architecture

A 3-layer architecture, deliberately separated.

Layer 1 is what runs on your hardware. Layer 2 is ephemeral cloud compute we tear down after every fine-tuning run. Layer 3 is the artefact — adapter plus eval report — shipped from Layer 2 to Layer 1.

Layer 3 Vertical packs Deployed on your hardware

Pre-fine-tuned adapters for the customer's use cases. Versioned in your model registry, swappable per workflow.

Pharma GMP pack NBFC pack Additional verticals · roadmap
Adapter + eval report shipped, signed
Layer 2 Tuning factory Reyatech-internal · ephemeral cloud · never co-resident

Runs in cloud, off-customer-hardware. Spun up per run, torn down after handover. Customer data deleted within 7 days.

Axolotl LoRA DeepSpeed / FSDP Eval harness Model registry
Docker bundle on your hardware
Layer 1 Deployment runtime Customer hardware · no outbound calls

The Docker bundle that ships to your server room. Every component is open-source, replaceable, and independently observable.

Ollama / vLLM LiteLLM Qdrant Langfuse Open WebUI Keycloak Prometheus Grafana Caddy Restic

Layer 1 is what runs on your hardware. Layer 2 is ephemeral cloud compute we tear down after every fine-tuning run. Layer 3 is the artefact (adapter + eval report) shipped from Layer 2 to Layer 1. Each layer is auditable independently.

Forwarding this internally? Get the architecture PDF.
Email-gated. Single-page, print-friendly. No marketing follow-up.
Download Architecture PDF
§3 · Deployment flow

Eight weeks. Four phases. Named deliverables at each gate.

Each phase declares customer responsibility, Reyatech responsibility, and a deliverable artefact. If a phase doesn't produce its deliverable, the next phase doesn't start.

01 Weeks 1–2

Discovery + data intake

Customer

Provides sample SOPs / batch records / credit memos under NDA. Names champion, auditor, IT contact.

Reyatech

Scopes use cases. Defines eval criteria with customer. Finalises pilot SOW with eval pass-bar.

Deliverable

Signed pilot SOW + eval rubric.

02 Weeks 3–4

Fine-tuning + evaluation

Customer

Reviews intermediate eval reports.

Reyatech

Runs LoRA training on cloud GPU. Tears down compute. Deletes training data per §7-day policy.

Deliverable

Adapter file + eval report + certificate of training-data destruction.

03 Weeks 5–6

Deploy + integrate

Customer

Provisions hardware (or accepts shipped bundle). Opens internal firewall to allowed sources.

Reyatech

Deploys Docker bundle. Configures Keycloak with the customer's IdP. Wires Langfuse to the customer's SIEM.

Deliverable

Production system reachable from customer LAN + IQ/OQ artefacts.

04 Weeks 7–8

UAT + go-live

Customer

Runs user-acceptance testing against the agreed eval rubric.

Reyatech

Triages UAT findings. Runs PQ. Signs go-live cert.

Deliverable

PQ certificate · runbook · handover dossier · AMC start date.

§4 · Stack

The complete component map, by function.

Every component is open-source, replaceable, and independently observable. No closed-source artefacts run in production.

Inference02 · runtimes
Ollama macOS · Mac Mini / Studio. Single-binary, GGUF format.
vLLM Linux GPU, single-node. PagedAttention, continuous batching.
API gateway01 · component
LiteLLM OpenAI-compatible REST. /v1/chat/completions, /v1/embeddings. Routing, fallbacks, rate limits.
Vector DB / RAG01 · component
Qdrant On-disk, snapshot-friendly. Hybrid search. Payload filtering for per-RBAC retrieval.
Audit / tracing01 · component
Langfuse Self-hosted. SIEM-forwarding via webhook. Hash-anchored entries. ALCOA+ ready.
UI02 · surfaces
Open WebUI Chat surface. Source citations. Per-conversation context windows.
Vertical pack UIs Custom workflow UIs per vertical pack — deviation triage, KYC review, etc.
Auth / authz01 · component
Keycloak SSO via SAML / OIDC. RBAC. MFA. Federates to your existing IdP.
Observability02 · components
Prometheus + Grafana Latency, error rate, token-throughput, GPU temp. Pre-built dashboards. Alertmanager-ready.
Reverse proxy / TLS01 · component
Caddy Auto-cert against internal CA or Let's Encrypt. HTTP/3. Forward-friendly config.
Backup01 · component
Restic Incremental, encrypted, off-site to customer-chosen target (S3 / NAS / cold storage).
§5 · Hardware

Three configurations. Sized to concurrency, not headcount.

All measurements taken on Qwen 3 32B with 8 K context. Capex shown pre-GST in INR.

Hardware options · Qwen 3 32B @ 8 K context · 10 tok/s user baseline
Configuration Users Model fit Sustained tok/s Power (W typ / peak) Annual electricity (₹ @ ₹8/kWh) Capex (pre-GST)
Mac Mini M4 Pro 64 GBapple-silicon · inference only 1–5 32B Q4 14–18 35 / 80 ~₹3.5 K ₹2.4 L
RTX 4090 Workstation 64 GBsingle-gpu · workstation form-factor 5–15 32B Q4–Q8 38–55 320 / 600 ~₹26 K ₹4.2 L
L40S 48 GB Server (1U)rack-mount · datacentre-ready 15–50 32B FP16 60–85 350 / 700 ~₹30 K ₹8 L

Sustained tok/s measured on Qwen 3 32B with 8 K context — batch 1 for Mac, batch 4 for 4090, batch 8 for L40S. Source: research/Reyatech on-prem LLM TCO.md.

§6 · Fine-tuning

Your documents go in. A model that speaks your language comes out.

For non-ML evaluators. Three claims, each with the engineering detail behind it.

01

We use LoRA (low-rank adaptation). The base model stays unchanged; only a small adapter file (~50–200 MB) is trained. Efficient, reversible, auditable.

02

Training runs on a cloud GPU we rent per-run (currently RunPod A100 80 GB; vendor-substitutable). Your data is encrypted in transit (TLS 1.3), used only for the run, and deleted within 7 days. A signed certificate of destruction is part of every engagement.

03

Typical fine-tuning cost: ₹500–₹3,000 per training run, paid by us as part of the project fee. You see the eval report from every run — pass-rate per task, regression deltas, sample outputs.

The adapter is yours under §7 of the SOW. The base model is open source. We retain no weights after handover.

§7 · Security

Two phases, two threat models, two diagrams.

Training is ephemeral and cloud-bound. Production is steady-state and on-prem. The two never touch.

Training phase EPHEMERAL · CLOUD

Data leaves only on encrypted upload. Cloud volume destroyed within 7 days. Adapter exported back to customer; certificate of destruction signed.

Training-phase data flow diagram CUSTOMER storage/ Source docs storage/ Adapter file REYATECH CLOUD · EPHEMERAL runpod/a100-80g LoRA training Axolotl · DeepSpeed Encrypted volume Destroyed ≤ 7 days TLS 1.3 upload adapter
cert-of-destruction.pdf · signed within 7 days of adapter delivery

Production phase STEADY-STATE · ON-PREM

All traffic stays on the customer LAN. Zero outbound calls to the internet are required for inference. Air-gap deployment is supported.

Production-phase request flow CUSTOMER LAN · AIR-GAP READY End user caddy/ TLS proxy litellm/ Gateway inference/ Ollama / vLLM + your adapter langfuse/ Audit log NO EGRESS
100% on-prem · zero outbound calls · air-gap supported
Security architecture, single PDF for your CISO.
Diagrams, threat model, control mappings. Email-gated, no follow-up sequence.
Download Security PDF
§8 · AMC

Annual maintenance — what's actually in it.

Two columns: included by default, and out-of-scope items that we price separately so there's no ambiguity.

Included

Per AMC tier · 18–22% of setup fee / year
  • Quarterly model refresh. New customer data → re-train → eval → staged rollout with rollback plan.
  • Stack upgrades. Tracked Qwen / DeepSeek / Phi releases evaluated and migrated within one quarter of upstream GA.
  • Priority response. Severity 1 / 2 / 3 per AMC SLA. Named on-call engineer during business hours.
  • Quarterly review. Usage, performance, eval-rubric drift. Recommendations documented.
  • CVE monitoring + patching for every open-source dependency in the bundle.

Out of scope

Priced separately · no surprise invoices
  • New use cases beyond the original scope — separate SOW.
  • New verticals — separate pricing track.
  • Hardware replacement beyond manufacturer warranty.
  • Integration with new internal systems — separate SOW.
Next step

Talk to engineering, not sales.

A 30-minute technical deep-dive with the founder. Bring your architecture questions; we'll bring the runbook, IQ/OQ template, and the SOW redlines.