2026-Q2 · In discovery with 6 regulated Indian enterprises

Your AI.
Your Hardware.
Your Compliance. Forever.

OwnAI fine-tunes open-source LLMs on your regulated data and deploys them on your own hardware. No data leaves your building. No tokens to meter. No vendor lock-in.

Data Stays On-Prem · Zero Token Cost · DPDPA + RBI Aligned
[Diagram: deployment.svg — OwnAI air-gapped production topology. Your building hosts the on-prem hardware (Mac Mini, 4090 workstation, or L40S server) running Qwen 3 32B plus your adapter, a Langfuse audit log (ALCOA+ trail), and a Qdrant RAG corpus of your docs. Your QA, Compliance, and Ops teams connect over the LAN with SSO via Keycloak. The public internet and cloud LLM APIs sit outside the building: no outbound calls.]
The problem

Cloud AI was built for the internet — not for regulated India.

Data sovereignty

Every Cloud LLM Call Crosses a Border

Commercial AI APIs route your regulated data through US-hosted servers. Under DPDPA 2023, exposure can reach ₹250 Cr per significant data breach. Phase‑3 obligations bind on 13 May 2027.

₹250 Cr · Maximum penalty per significant breach (DPDPA 2023 §33(1), Schedule).


Cost runaway

Per-Seat AI Bills Don't Stop

ChatGPT Enterprise for 50 users adds up to roughly ₹1.78 Cr over 5 years. OwnAI is a one-time ₹15–25 L setup, plus hardware, plus an AMC at 18–22% of setup per year.

76% · 5-year cost reduction in the default 50-seat scenario.

See ROI Calculator for your inputs.

How it works

Train once in the cloud. Run forever on your hardware.

Three phases. No telemetry. No call-home. Eight weeks from first call to production go-live.

01 · WEEKS 1–4

Fine-tune in cloud

Your documents → encrypted upload → cloud GPU (LoRA adapter) → adapter shipped to you → cloud data deleted within 7 days.

RunPod A100 80 GB · TLS 1.3 · signed certificate of destruction

02 · WEEKS 5–6

Deploy on your hardware

Docker bundle delivered to your server room — Mac Mini, workstation, or 1U server. We configure on-site or remotely.

Ollama / vLLM · Caddy TLS · Keycloak SSO · IQ/OQ artefacts

03 · FOREVER

Use forever, on-prem

All requests stay on your network. Audit logs in your Langfuse instance. Quarterly refreshes via AMC — customer-pulled, never pushed.

Zero outbound calls · Langfuse audit · 18–22% AMC

Read the full technical breakdown

Built on the Most Permissive Open-Source Stack

Apache 2.0 and MIT base models only. No Llama licence flow-down. No Google PUP clauses. No surprise restrictions.

Verticals

Two regulated wedges, both fully scoped.

Pre-built vertical packs — adapters, prompts, eval rubrics — for the two regulators we know best.

For Pharma & Life Sciences

AI for GMP Compliance Workflows

  • Deviation analysis
  • Batch record review
  • SOP question-answering
  • Regulatory filings (CTD / eCTD)
  • Audit preparation
  • Change control assessment
See Pharma Solution
For NBFCs & Financial Services

AI for RBI Compliance Workflows

  • Credit memo drafting
  • KYC review
  • RBI returns drafting
  • Compliance monitoring
  • Audit preparation
  • Customer query triage
See NBFC Solution
By the numbers

Engineering-grade, not marketing-grade.

0 · Data packets leaving the building in production.
8 wks · From first discovery call to production go-live.
76% · Typical 5-year cost reduction vs cloud LLM seats.
100% · Open-source stack. Weights and adapter are yours.
ROI calculator

See exactly how much you save.

Default scenario: 50 seats vs ChatGPT Enterprise over 5 years, saving ₹1.36 Cr (76%).

Methodology and assumptions disclosed on the calculator page. Numbers update annually.
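The shape of the calculation can be sketched in a few lines. The inputs below are illustrative assumptions (a ₹20 L setup, the 4090 workstation tier, a 20% AMC, an assumed per-seat cloud price), not the calculator's published figures — plug in your own numbers on the calculator page.

```python
# Illustrative 5-year TCO comparison. All inputs are assumptions
# for demonstration, not the calculator's published defaults.

SEATS = 50
YEARS = 5

# Cloud: per-seat subscription (assumed ~Rs 5,900/seat/month).
cloud_total = SEATS * 5_900 * 12 * YEARS

# On-prem: one-time setup + hardware + AMC as a % of setup per year.
setup = 2_000_000            # Rs 20 L (quoted range: 15-25 L)
hardware = 420_000           # Rs 4.2 L (4090 workstation tier)
amc_rate = 0.20              # quoted range: 18-22% of setup per year
onprem_total = setup + hardware + setup * amc_rate * YEARS

saving = cloud_total - onprem_total
reduction = saving / cloud_total
print(f"cloud: Rs {cloud_total:,}  on-prem: Rs {onprem_total:,.0f}")
print(f"saving: Rs {saving:,.0f} ({reduction:.0%})")
```

With these assumed inputs the reduction lands near the 76% headline figure; the real driver is that on-prem cost is dominated by one-time spend, while seat pricing scales with both headcount and time.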

Technology

100% open source. Zero vendor lock-in. You own everything.

Every component below is independently observable, replaceable, and licensed permissively. No closed-source artefacts run in production.

Qwen 3 32B
Default base model — strong multilingual reasoning and instruction-following.
Apache 2.0
DeepSeek-R1-Distill-Qwen-32B
Reasoning lane — chain-of-thought for deviation root-cause work.
MIT
Ollama / vLLM
Inference runtime — Ollama on Apple Silicon, vLLM on GPU servers.
MIT / Apache 2.0
LiteLLM
OpenAI-compatible gateway. /v1/chat/completions, /v1/embeddings.
MIT
Qdrant
Vector DB and RAG. On-disk, snapshot-friendly, your documents.
Apache 2.0
Langfuse
Audit and tracing — every prompt, every response, hash-anchored.
MIT
Open WebUI
Chat UI plus per-vertical workflow surfaces.
MIT
Keycloak
SSO via SAML/OIDC. RBAC. MFA. Your IdP, your roles.
Apache 2.0
Prometheus + Grafana
Latency, error rate, token throughput, GPU temp on dashboards you own.
Apache 2.0
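As a sketch of how the gateway slots into the stack above, a minimal LiteLLM proxy config can route the OpenAI-style endpoints to a local Ollama runtime. File name, model tag, and ports here are assumptions for illustration, not shipped defaults:

```yaml
# config.yaml — hypothetical routing for the on-prem gateway
model_list:
  - model_name: qwen3-32b           # name clients request via /v1/chat/completions
    litellm_params:
      model: ollama/qwen3:32b       # local Ollama runtime on this box
      api_base: http://localhost:11434   # LAN-only, no outbound calls
```

Clients then speak the standard OpenAI API to the gateway while inference stays on your network, which is what makes the stack drop-in replaceable.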
Compliance

Aligned with the four frameworks your auditors will ask about.

DPDPA 2023
Processor by default. Phase 3 obligations binding on 13 May 2027.
Schedule M / ALCOA+
Immutable prompt/response log. Frozen production models.
21 CFR Part 11
RBAC, audit-trail integrity, IQ/OQ/PQ artefacts shipped.
RBI IT Outsourcing MD
Source-code escrow. 6-hour incident reporting. Audit rights.
What we will never do

Four promises we put in writing.

We never use your data to improve any other customer's model.

We never call home from production. No telemetry. No licence pings.

We never retain your weights or adapter post-handover.

We never lock you in to a proprietary base model.

These are clauses, not slogans — they live in §6 of every SOW.

Frequently asked

Questions evaluators ask in the first call.

Is my data used to train other models?
No. Never. Customer data is processed under your DPA only; using it to improve any other model would make us a Data Fiduciary in our own right rather than your Processor, which we explicitly avoid.
Which base models do you use, and why?
Qwen 3 32B (Apache 2.0) by default; DeepSeek-R1-Distill-Qwen-32B (MIT) for reasoning lanes; Phi-4 (MIT) for low-RAM deployments. We deliberately do not use Llama, Mistral MRL, or Gemma — their licences impose flow-down obligations or restrictions that complicate paid redistribution.
What about hallucinations?
Three layers of mitigation: (a) fine-tuning on your authoritative data narrows behaviour; (b) Qdrant RAG grounds answers in your document corpus with citations; (c) every prompt/response is logged to Langfuse on your hardware, so anomalies are auditable. We publish per-task evaluation scores during the pilot.
What if the hardware fails?
Standard manufacturer warranty applies. AMC includes a spare-parts plan with named SLA. Models and adapters are version-controlled — a replacement box is operational within the SLA window once hardware arrives.
How long does deployment take?
4-week pilot, 8 weeks from pilot kickoff to production go-live.
What's the minimum team size?
1–5 users on a Mac Mini M4 Pro 64 GB (₹2.4 L). 5–15 on a 4090 workstation (₹4.2 L). 15–50 on an L40S server (₹8 L). We size hardware to your concurrency target, not your headcount.
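The sizing ladder above can be sketched as a lookup. The tiers and prices come from this answer; the function itself is illustrative, not our actual sizing tool, and real sizing is done against a measured concurrency target:

```python
def size_tier(concurrent_users: int) -> str:
    """Map a concurrency target to a hardware tier (illustrative only)."""
    if concurrent_users <= 5:
        return "Mac Mini M4 Pro 64 GB (~Rs 2.4 L)"
    if concurrent_users <= 15:
        return "RTX 4090 workstation (~Rs 4.2 L)"
    if concurrent_users <= 50:
        return "L40S 1U server (~Rs 8 L)"
    return "multi-GPU: scoped case by case"

print(size_tier(12))   # -> RTX 4090 workstation (~Rs 4.2 L)
```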
Can our auditors inspect the system?
Yes. The system runs on your infrastructure. RBI, WHO, FDA, or CDSCO auditors can inspect it directly. We provide IQ/OQ/PQ artefacts, data flow maps, and access-control reports.
What happens if Reyatech Systems ceases to operate?
Source-code escrow with a neutral third party is included from the SOW. The base models are open source. Adapter weights are yours under §7 of the SOW. The entire stack runs without any outbound dependency on us.
Pilot terms

Ready to own your AI?

4-week pilot. Objective eval criteria agreed up front. If the pilot doesn't meet the eval bar we set together, you pay zero — documented in §4 of the pilot SOW.

Free PoC Guarantee · Data Stays On-Prem · Apache 2.0 / MIT only