AI Smart Chat for Retail: AWS Cloud Architecture

Enterprise RAG-based AI chatbot powered by OpenViking context database & NVIDIA NemoClaw agent framework with Amazon Bedrock (Claude Sonnet 4.6) — Omnichannel retail deployment across Web, Mobile & WhatsApp

Region: ap-southeast-5 (Malaysia) | Services: 22 | Date: March 2026 | Architecture: Multi-AZ High Availability | AI Stack: OpenViking + NemoClaw + Bedrock

External Actors

  • Retail Customers: Web, Mobile & WhatsApp
  • WhatsApp Business: Meta WhatsApp API
  • POS / ERP Systems: Inventory & Pricing Feeds
  • Admin & Content Team: Knowledge Base Management

AWS Cloud

Edge Services

  • Amazon Route 53: DNS & Health Checks
  • Amazon CloudFront: CDN & WebSocket
  • AWS WAF: Web Application Firewall

AWS Region (ap-southeast-5, Malaysia)

API Layer

  • Amazon API Gateway: REST & WebSocket APIs
  • Amazon Cognito: Customer Identity

Amazon VPC (10.0.0.0/16)

Availability Zone 1 (apse5-az1)

  • Public Subnet (10.0.1.0/24): Application LB (Internal Load Balancer)
  • Private Subnet (10.0.10.0/24): NemoClaw on EKS (AI Agent Framework), OpenViking on EKS (Context Database)
  • Data Subnet (10.0.20.0/24): Amazon Aurora (PostgreSQL Database), Amazon ElastiCache (Redis Session Cache)

Availability Zone 2 (apse5-az2)

  • Public Subnet (10.0.2.0/24): NAT Gateway (Outbound Internet)
  • Private Subnet (10.0.11.0/24), HA Replicas: NemoClaw (AZ2) (HA Agent Replicas), OpenViking (AZ2) (HA Context Replicas)

AI / ML Services: Inference & Search

  • Amazon Bedrock: Claude Sonnet 4.6 LLM
  • Amazon OpenSearch: Vector & Text Search

Integration & Data Ingestion

  • Amazon S3: Knowledge Base Store
  • Amazon EventBridge: Event Bus
  • AWS Lambda: Data Processing
  • Amazon SQS: Message Queue
  • Amazon SNS: Notifications

Security, Encryption & Monitoring

  • AWS KMS: Encryption Keys
  • Secrets Manager: Secret Storage
  • CloudWatch: Monitoring & Logs
  • GuardDuty: Threat Detection
  • CloudTrail: API Audit Trail

Architecture Flow

Step | Source | Target | Description
1 | End Users (Web/Mobile/WhatsApp) | Amazon Route 53 | DNS resolution: customers access chat via web, mobile, or WhatsApp
2 | WhatsApp Business API | Amazon API Gateway | WhatsApp Business API sends webhooks to API Gateway
3 | Amazon Route 53 | Amazon CloudFront | Route 53 routes to CloudFront CDN for static assets & WebSocket upgrade
4 | Amazon CloudFront | AWS WAF | WAF inspects all incoming requests for threats and bot protection
5 | AWS WAF | Amazon API Gateway | Clean traffic forwarded to API Gateway (REST + WebSocket)
6 | Amazon API Gateway | Application Load Balancer | API Gateway routes chat requests to internal ALB
7 | Application Load Balancer | NemoClaw on EKS | ALB distributes to NemoClaw agent framework on EKS
8 | NemoClaw on EKS | OpenViking on EKS | NemoClaw queries OpenViking context DB for agent memory, skills & resources
9 | OpenViking on EKS | Amazon S3 | OpenViking retrieves retail documents from S3 knowledge store
10 | NemoClaw on EKS | Amazon Bedrock (Claude Sonnet 4.6) | NemoClaw sends enriched prompt to Amazon Bedrock (Claude Sonnet 4.6)
11 | Amazon Bedrock | NemoClaw on EKS | Bedrock returns LLM response; NemoClaw applies safety guardrails
12 | NemoClaw on EKS | Amazon ElastiCache Redis | Session state & conversation cache stored in ElastiCache Redis
13 | NemoClaw on EKS | Amazon Aurora PostgreSQL | Chat history, user profiles & analytics persisted to Aurora PostgreSQL
14 | POS / ERP Systems | Amazon EventBridge | POS/ERP systems push real-time inventory & pricing events
15 | Amazon EventBridge | AWS Lambda | EventBridge triggers Lambda for data ingestion & transformation
16 | AWS Lambda | Amazon S3 | Lambda processes and stores documents in S3 knowledge base
17 | Amazon OpenSearch Serverless | OpenViking on EKS | OpenSearch provides vector search for OpenViking hybrid retrieval
18 | Amazon SQS | AWS Lambda | SQS queues async tasks: bulk ingestion, analytics, notifications

Architecture Deep-Dive

The sections below provide detailed architecture documentation for CTO-level review.

Section 1: NemoClaw Agent Framework — Architecture & Deployment

What is NemoClaw?

NVIDIA NemoClaw is an open-source AI agent platform that adds enterprise-grade privacy and security controls to OpenClaw. Announced at GTC 2026, NemoClaw simplifies running always-on autonomous AI assistants with a single command. As part of the NVIDIA Agent Toolkit, it installs the NVIDIA OpenShell runtime — a secure environment for running autonomous agents.

NemoClaw Stack Components on EKS

  • OpenShell Gateway: Secure entry point that authenticates incoming chat requests and routes to the correct agent sandbox. Deployed as a Kubernetes Service with ALB ingress.
  • Agent Sandbox: Isolated execution environment per chat session. Each retail customer conversation gets its own sandboxed agent with restricted network policies — can only reach OpenViking, Bedrock, and ElastiCache endpoints.
  • Inference Provider: Configured to route all LLM calls to Amazon Bedrock (Claude Sonnet 4.6). NemoClaw abstracts the inference backend, allowing future model swaps without code changes.
  • Network Policy Engine: NemoClaw enforces Kubernetes NetworkPolicies that restrict agent egress to approved endpoints only. This blocks data exfiltration and stops prompt injection attacks from making external callbacks.
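
The egress restriction described in the list above can be illustrated as a simple allowlist check. This is a minimal sketch, not NemoClaw's actual policy engine: the host names, `ALLOWED_EGRESS`, and `is_egress_allowed` are hypothetical, and in the deployment itself enforcement would happen in Kubernetes NetworkPolicies rather than in application code.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of (host, port) pairs an agent sandbox may reach.
# Host names are placeholders, not real endpoints from this deployment.
ALLOWED_EGRESS = {
    ("openviking.internal", 9090),
    ("bedrock-runtime.ap-southeast-5.amazonaws.com", 443),
    ("sessions.cache.internal", 6379),
}

# Fallback ports used when the URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443, "redis": 6379, "rediss": 6379}


def is_egress_allowed(url: str) -> bool:
    """Return True only if the URL targets an approved (host, port) pair."""
    parsed = urlparse(url)
    host = parsed.hostname  # already lower-cased by urlparse
    if host is None:
        return False
    port = parsed.port or DEFAULT_PORTS.get(parsed.scheme)
    return (host, port) in ALLOWED_EGRESS
```

A default-deny check like this means a tool call that tries to post conversation data to an unknown host is refused even if the prompt instructing it looked legitimate.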

Retail-Specific Agent Configuration

The NemoClaw agent is configured with retail-specific skills and tools:

  • Product Search Tool: Calls OpenViking L2 context (OpenSearch-backed) to find products matching customer queries with semantic understanding.
  • Order Lookup Tool: Queries Aurora PostgreSQL via a secured API to retrieve order status, tracking information, and delivery estimates.
  • Inventory Check Tool: Real-time inventory availability check via EventBridge-synced data in ElastiCache for instant response.
  • Return/Refund Tool: Initiates return workflows by creating tickets in the ERP system via EventBridge, with customer confirmation flow.
  • Loyalty Points Tool: Queries and updates loyalty program balance, tier status, and available rewards from Aurora.
Request flow: Customer Query → OpenShell Gateway → Agent Sandbox → OpenViking Context → Bedrock LLM → Safety Guardrails → Response
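
One way to picture how the retail tools above are exposed to the agent is a small registry that maps tool names to handlers. The sketch below is an assumption for illustration only: `TOOLS`, `tool`, and `dispatch` are hypothetical names, the handlers are stubs, and NemoClaw's real tool interface may look quite different.

```python
from typing import Callable, Dict

# Hypothetical tool registry. Names mirror the tools listed above,
# but the handlers are stubs that return placeholder data.
TOOLS: Dict[str, Callable[..., dict]] = {}


def tool(name: str):
    """Register a function as an agent tool under the given name."""
    def decorator(fn):
        TOOLS[name] = fn
        return fn
    return decorator


@tool("order_lookup")
def order_lookup(order_id: str) -> dict:
    # A real handler would query Aurora PostgreSQL via the secured API.
    return {"order_id": order_id, "status": "unknown (stub)"}


@tool("inventory_check")
def inventory_check(sku: str, store_id: str) -> dict:
    # A real handler would read EventBridge-synced data from ElastiCache.
    return {"sku": sku, "store_id": store_id, "in_stock": None}


def dispatch(tool_name: str, arguments: dict) -> dict:
    """Run a tool requested by the LLM, rejecting unregistered names."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        return {"error": f"unknown tool: {tool_name}"}
    return handler(**arguments)
```

Rejecting unregistered names keeps the model limited to the tools the retailer has explicitly configured.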
Section 2: OpenViking Context Database — RAG Architecture

Why OpenViking Over Traditional RAG?

Traditional RAG architectures fragment context across multiple vector stores, creating management complexity and inconsistent retrieval quality. OpenViking (by Volcengine/ByteDance) introduces a file-system paradigm that unifies all agent context — memory, resources, and skills — into a single hierarchical system. This dramatically simplifies the retail knowledge management pipeline.

Three-Tier Context Loading (L0/L1/L2)

  • L0 — Always Loaded (System Context, ~5MB): Retail brand identity, system prompt, greeting templates, language settings (Malay, English, Mandarin), response tone guidelines, compliance rules (Malaysia PDPA). Loaded once at agent initialization.
  • L1 — Session Loaded (Customer Context, ~50MB): Customer profile (name, loyalty tier, preferred language, purchase history), active shopping cart, recent orders (last 30 days), ongoing support tickets, previous chat transcripts (last 5 sessions). Loaded when customer authenticates.
  • L2 — On-Demand (Knowledge Base, unlimited): Full product catalog (100K+ SKUs), return/refund policies, promotional offers, store locations, supplier documentation, training manuals, compliance documents. Retrieved via OpenSearch hybrid search only when needed.
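
The loading order of the three tiers can be sketched as follows. This is an illustrative model under stated assumptions, not OpenViking's API: the `TieredContext` class and its injected loader callables are hypothetical stand-ins for the system-context store, the customer profile lookup, and the OpenSearch-backed knowledge search.

```python
class TieredContext:
    """Illustrative L0/L1/L2 loader; every data source is an injected callable."""

    def __init__(self, load_system, load_customer, search_knowledge):
        self._load_customer = load_customer
        self._search_knowledge = search_knowledge
        # L0 is loaded once, at agent initialization.
        self.l0 = load_system()
        # L1 stays empty until the customer authenticates.
        self.l1 = None

    def authenticate(self, customer_id):
        # L1 is loaded when the customer authenticates.
        self.l1 = self._load_customer(customer_id)

    def build_context(self, query, needs_knowledge, top_k=3):
        # L2 is fetched only when the query needs knowledge-base content.
        l2 = self._search_knowledge(query, top_k) if needs_knowledge else []
        return {"system": self.l0, "customer": self.l1, "knowledge": l2}
```

A greeting such as "hello" would be answered from L0 and L1 alone, while a question about the return policy would trigger an L2 search.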

File-System Organization

OpenViking organizes retail knowledge in a familiar file-system structure:

  • /memory/sessions/ — Conversation history per customer, auto-pruned after 90 days
  • /memory/preferences/ — Learned customer preferences (size, color, brand affinities)
  • /resources/catalog/ — Product catalog with daily sync from ERP
  • /resources/policies/ — Return, shipping, warranty, loyalty program policies
  • /resources/promotions/ — Active promotions with start/end dates
  • /resources/stores/ — Store locations, hours, inventory per location
  • /skills/order-lookup/ — Tool definition for order status retrieval
  • /skills/inventory-check/ — Tool definition for real-time stock check
  • /skills/return-initiate/ — Tool definition for return workflow
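
The 90-day pruning rule for /memory/sessions/ could be expressed roughly as below. The per-customer sub-layout (`/memory/sessions/<customer>/<session>.json`) and the helper names are assumptions made for illustration; the source does not specify how OpenViking names individual session files or how it performs pruning.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath

SESSION_ROOT = PurePosixPath("/memory/sessions")
RETENTION = timedelta(days=90)  # auto-prune window stated above


def session_path(customer_id: str, session_id: str) -> PurePosixPath:
    """Hypothetical layout: /memory/sessions/<customer>/<session>.json"""
    return SESSION_ROOT / customer_id / f"{session_id}.json"


def expired_sessions(last_modified: dict, now: datetime) -> list:
    """Return paths whose last-modified time is older than the retention window."""
    cutoff = now - RETENTION
    return sorted(str(path) for path, ts in last_modified.items() if ts < cutoff)
```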

Self-Evolving Capability

OpenViking learns from successful interactions. When a customer rates a chat session positively, OpenViking analyzes the context selection that led to that success and adjusts its retrieval weightings. Over time, the system becomes increasingly accurate at selecting the right context for each type of retail query — whether it's a product recommendation, order tracking, or policy clarification.

Section 3: Omnichannel Integration — Web, Mobile & WhatsApp

Channel Architecture

The AI Smart Chat platform supports three primary channels, all converging to the same NemoClaw agent backend. This ensures consistent responses regardless of channel while respecting channel-specific UI constraints.

Web Chat Widget

  • Technology: React-based widget bundle served from CloudFront (S3 origin). Single JavaScript embed tag for retailer's website.
  • Connection: WebSocket via API Gateway for real-time bidirectional messaging. Automatic reconnection with exponential backoff.
  • Features: Rich media (product cards with images, carousels), typing indicators, read receipts, file upload for returns (photos), multilingual UI (EN/MS/ZH).
  • Authentication: Cognito-backed login (social + email) or anonymous guest sessions with progressive profiling.
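
The reconnection behaviour mentioned above follows a standard exponential backoff schedule. The widget itself would implement this in JavaScript; the Python sketch below only illustrates the delay calculation, and the base delay, cap, and jitter strategy are assumed values rather than settings taken from this deployment.

```python
import random


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                    jitter: bool = True) -> float:
    """Delay in seconds before reconnect attempt `attempt` (0-based)."""
    # Double the delay on every attempt, but never exceed the cap.
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        # Full jitter: pick uniformly in [0, delay] so that many clients
        # dropped at the same moment do not all reconnect together.
        return random.uniform(0.0, delay)
    return delay
```

Without jitter the schedule is 1, 2, 4, 8, 16 seconds and then stays at the 30-second cap.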

Mobile SDK (iOS & Android)

  • Technology: Native SDKs wrapping the WebSocket API. Push notifications via Amazon SNS for message delivery when app is backgrounded.
  • Connection: Same WebSocket API as web, with JWT authentication from Cognito mobile SDK.
  • Features: Native rich message rendering, deep links to product pages, biometric authentication support, offline message queuing.

WhatsApp Business Integration

  • Technology: Meta WhatsApp Business API → API Gateway REST endpoint → Lambda adapter → NemoClaw.
  • Connection: Webhook-based (not persistent). WhatsApp sends POST to /webhook/whatsapp, Lambda translates to internal format, queues response via SQS FIFO for ordered delivery.
  • Features: Text messages, product catalog messages (WhatsApp Commerce), quick reply buttons, list messages for product selection, location sharing for nearest store.
  • Compliance: WhatsApp 24-hour messaging window respected. Template messages for outbound (order updates, delivery notifications) pre-approved by Meta.
Channel flows:

  • Web Widget (WS) → API Gateway → NemoClaw
  • Mobile SDK (WS) → API Gateway → NemoClaw
  • WhatsApp (Webhook) → API Gateway (REST) → Lambda Adapter → NemoClaw
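
The Lambda adapter's translation step for WhatsApp can be sketched as below. It assumes the standard WhatsApp Cloud API webhook shape (`entry` → `changes` → `value` → `messages`) and handles text messages only. The internal message fields and both function names are hypothetical, and the second function only builds the arguments for boto3's `sqs.send_message` rather than calling AWS.

```python
import json


def to_internal_messages(webhook_body: dict) -> list:
    """Flatten a WhatsApp webhook payload into internal chat messages."""
    messages = []
    for entry in webhook_body.get("entry", []):
        for change in entry.get("changes", []):
            value = change.get("value", {})
            for msg in value.get("messages", []):
                if msg.get("type") != "text":
                    continue  # this sketch handles text messages only
                messages.append({
                    "channel": "whatsapp",
                    "customer_id": msg["from"],  # sender phone number
                    "message_id": msg["id"],
                    "text": msg["text"]["body"],
                    "timestamp": int(msg["timestamp"]),
                })
    return messages


def to_sqs_fifo_params(queue_url: str, message: dict) -> dict:
    """Build keyword arguments for sqs.send_message on a FIFO queue."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(message),
        # One message group per customer keeps each conversation in order.
        "MessageGroupId": message["customer_id"],
        # Reusing the WhatsApp message ID deduplicates webhook retries.
        "MessageDeduplicationId": message["message_id"],
    }
```

Grouping by customer preserves ordering within a conversation while still letting different conversations be processed in parallel.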
Section 4: Data Ingestion & Knowledge Pipeline

Enterprise Data Sources

The retail AI chat ingests data from multiple enterprise sources to maintain a comprehensive, up-to-date knowledge base that OpenViking serves to NemoClaw agents.

Real-Time Data Pipeline (EventBridge → Lambda)

  • Inventory Updates: POS/ERP pushes inventory change events every 5 minutes via EventBridge. Lambda updates ElastiCache (instant availability) and triggers OpenSearch re-indexing for affected products.
  • Price Changes: Dynamic pricing events from ERP update both ElastiCache (real-time price display) and S3 catalog files (OpenViking resource refresh).
  • New Products: Product creation events trigger full pipeline: (1) Store metadata in Aurora, (2) Upload images/descriptions to S3, (3) Generate embeddings via Bedrock Titan, (4) Index in OpenSearch, (5) Update OpenViking /resources/catalog/.
  • Promotion Activation: Marketing team activates promotions via admin portal → EventBridge → Lambda updates OpenViking /resources/promotions/ with start/end dates and eligibility rules.
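
The routing logic inside the ingestion Lambda might look like the sketch below. EventBridge does deliver events with `detail-type` and `detail` keys, but the event names (`InventoryChanged`, `PriceChanged`), the detail fields, and the cache key format are assumptions. A plain dict stands in for ElastiCache and a list stands in for the OpenSearch re-indexing trigger.

```python
def handle_event(event: dict, cache: dict, reindex_queue: list) -> str:
    """Route one EventBridge event by its detail-type (hypothetical names)."""
    detail_type = event.get("detail-type")
    detail = event.get("detail", {})

    if detail_type == "InventoryChanged":
        # Update the fast availability cache, then schedule re-indexing.
        cache[f"stock:{detail['sku']}:{detail['store_id']}"] = detail["quantity"]
        reindex_queue.append(detail["sku"])
        return "inventory-updated"

    if detail_type == "PriceChanged":
        # Real-time price display reads from the cache.
        cache[f"price:{detail['sku']}"] = detail["price"]
        return "price-updated"

    # Unknown event types are ignored rather than failing the invocation.
    return "ignored"
```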

Batch Data Pipeline (S3 → Lambda → OpenSearch)

  • Document Upload: Admin team uploads PDFs (policies, manuals, supplier docs) to S3 knowledge-base bucket.
  • Processing: S3 Event → Lambda → Amazon Textract (PDF text extraction) → Bedrock Titan Embeddings → OpenSearch vector index.
  • Chunking Strategy: 512-token chunks with 50-token overlap. Metadata preserved: document type, category, effective date, version.
  • Quality Gate: Lambda validates extracted text quality (OCR confidence > 95%) before indexing. Failed documents queued for manual review.
Pipeline flow: POS/ERP → EventBridge → Lambda → S3 + OpenSearch → OpenViking L2
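
The chunking strategy and quality gate from the batch pipeline can be sketched in a few lines. The function names are hypothetical, the input is assumed to be an already-tokenized list, and averaging the confidence scores is an assumption: the source states the 95% threshold but not how per-block scores are aggregated.

```python
def chunk_tokens(tokens: list, size: int = 512, overlap: int = 50) -> list:
    """Split a token list into fixed-size chunks that overlap by `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last chunk reached the end of the document
    return chunks


def passes_quality_gate(confidences: list, threshold: float = 95.0) -> bool:
    """Accept a document only if its mean OCR confidence (0-100) exceeds the threshold."""
    return bool(confidences) and sum(confidences) / len(confidences) > threshold
```

With the default settings each chunk starts 462 tokens after the previous one, so the final 50 tokens of a chunk reappear at the start of the next.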
Section 5: Security, Compliance & Malaysia PDPA

Data Protection (Malaysia PDPA Compliance)

The Personal Data Protection Act 2010 (PDPA) governs how personal data is collected, processed, and stored in Malaysia. This architecture implements the following controls to support compliance.

  • Data Residency: All customer data remains in ap-southeast-5 (Malaysia). No cross-region replication. S3 bucket policies and Aurora cluster configuration enforce regional containment.
  • Encryption at Rest: All data stores encrypted with AWS KMS customer-managed keys: Aurora (AES-256), S3 (SSE-KMS), ElastiCache (AES-256), OpenSearch (AES-256), SQS (SSE-KMS).
  • Encryption in Transit: TLS 1.3 enforced on all endpoints: CloudFront, API Gateway, ALB, EKS service mesh (mTLS via App Mesh), Aurora SSL connections, ElastiCache TLS.
  • PII Handling: Bedrock Guardrails configured to redact PII (IC numbers, phone numbers, addresses) from LLM responses. NemoClaw safety filters add additional PII scanning layer.
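  • Data Retention: Chat transcripts: 2 years (retention period set by policy; the PDPA Retention Principle requires that personal data not be kept longer than necessary). Customer profiles: until account deletion. Knowledge base: permanent. Logs: 1 year.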
  • Right to Erasure: Lambda-based data deletion workflow: customer requests deletion → Lambda removes data from Aurora, ElastiCache, OpenViking /memory/, and triggers OpenSearch document deletion.
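
As an illustration of the additional PII scanning layer, a pattern-based redaction pass might look like the sketch below. It is not how Bedrock Guardrails or NemoClaw implement redaction. The two regular expressions are simplified assumptions covering the common 12-digit IC format (YYMMDD-PB-NNNN) and Malaysian mobile numbers, and a production filter would need broader coverage.

```python
import re

# Simplified patterns (assumptions): 12-digit IC numbers with optional hyphens,
# and Malaysian mobile numbers starting with 01 or +601.
IC_PATTERN = re.compile(r"(?<!\d)\d{6}-?\d{2}-?\d{4}(?!\d)")
PHONE_PATTERN = re.compile(r"(?<!\d)(?:\+?60|0)1\d-?\d{7,8}(?!\d)")


def redact_pii(text: str) -> str:
    """Replace IC and phone numbers in a response with fixed placeholders."""
    text = IC_PATTERN.sub("[IC REDACTED]", text)
    return PHONE_PATTERN.sub("[PHONE REDACTED]", text)
```

Running the IC pattern first matters because some long phone numbers also contain 12 consecutive digits; either way the value is removed from the response.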

Network Security

  • VPC Design: Private subnets for all compute and data. Public subnets only for ALB and NAT Gateway. No direct internet access for EKS pods.
  • Security Groups: Least-privilege rules. NemoClaw pods can only reach: OpenViking (9090), ElastiCache (6379), Aurora (5432), Bedrock VPC endpoint (443).
  • VPC Endpoints: PrivateLink for S3, Bedrock, SQS, SNS, CloudWatch, KMS, Secrets Manager — no internet traversal for AWS API calls.
  • WAF Rules: Rate limiting, geo-blocking, SQL injection protection, XSS protection, custom prompt injection detection rules.
  • GuardDuty: Continuous threat monitoring across VPC Flow Logs, CloudTrail, EKS audit logs, and DNS queries.

IAM & Access Control

  • EKS IRSA: Each EKS pod assumes a unique IAM role via IRSA (IAM Roles for Service Accounts). NemoClaw role can invoke Bedrock + read ElastiCache. OpenViking role can read/write S3 + query OpenSearch.
  • Cognito Authorization: JWT tokens with custom claims (loyalty_tier, language_preference) passed to NemoClaw for personalized responses.
  • Admin Access: SSO via IAM Identity Center. MFA enforced. CloudTrail logs all admin actions.
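
The custom claims passed to NemoClaw could be mapped to personalization settings as sketched below. The function assumes the JWT has already been verified upstream (for example by an API Gateway authorizer) and receives the decoded claims as a dict. The function name, the fallback values, and the exact claim keys are illustrative assumptions.

```python
SUPPORTED_LANGUAGES = {"en", "ms", "zh"}  # matches the EN/MS/ZH UI languages


def personalization_from_claims(claims: dict) -> dict:
    """Map already-verified JWT claims to agent personalization settings."""
    language = str(claims.get("language_preference", "en")).lower()
    if language not in SUPPORTED_LANGUAGES:
        language = "en"  # fall back rather than trusting an unexpected value
    return {
        "customer_id": claims.get("sub"),  # standard JWT subject claim
        "loyalty_tier": claims.get("loyalty_tier", "standard"),  # assumed default
        "language": language,
    }
```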
Section 6: Estimated Monthly Cost Breakdown

Cost Assumptions

Based on enterprise retail deployment: ~500K monthly active customers, ~2M chat sessions/month, ~10M messages/month, average 5 messages per session. Prices are estimated for ap-southeast-5 region.

Service | Configuration | Est. cost
Amazon Bedrock | Claude Sonnet 4.6 (50M input + 15M output tokens) | $4,500/mo
Amazon EKS + Fargate | NemoClaw + OpenViking pods (8-20 replicas) | $2,800/mo
Amazon Aurora PostgreSQL | Serverless v2 (2-32 ACU) | $1,200/mo
Amazon OpenSearch Serverless | Vector search (4 search + 2 indexing OCU) | $1,800/mo
Amazon ElastiCache Redis | 4x r7g.large nodes | $1,100/mo
Amazon CloudFront | CDN + WebSocket (500GB transfer) | $350/mo
Amazon API Gateway | WebSocket + REST (10M+ calls) | $450/mo
AWS Lambda | Data processing (10M invocations) | $300/mo
Amazon S3 | Knowledge base + assets (500GB) | $150/mo
Security & Monitoring | WAF, GuardDuty, CloudWatch, KMS, CloudTrail | $600/mo
Networking | NAT Gateway, VPC, Route 53, Data Transfer | $500/mo
TOTAL ESTIMATED MONTHLY COST | | ~$13,750/month
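
The total can be checked by summing the line items. The figures in this short sketch are copied from the breakdown above; the dictionary and variable names are illustrative.

```python
# Estimated monthly cost per line item in USD, copied from the breakdown above.
LINE_ITEMS = {
    "Amazon Bedrock": 4500,
    "Amazon EKS + Fargate": 2800,
    "Amazon Aurora PostgreSQL": 1200,
    "Amazon OpenSearch Serverless": 1800,
    "Amazon ElastiCache Redis": 1100,
    "Amazon CloudFront": 350,
    "Amazon API Gateway": 450,
    "AWS Lambda": 300,
    "Amazon S3": 150,
    "Security & Monitoring": 600,
    "Networking": 500,
}

total = sum(LINE_ITEMS.values())
# Share of the total attributable to LLM inference, in percent.
bedrock_share = round(100 * LINE_ITEMS["Amazon Bedrock"] / total, 1)
print(f"Total: ${total:,}/month, Bedrock share: {bedrock_share}%")
```

Bedrock inference is the largest single line item at roughly a third of the total, which is why the optimization strategies focus on token usage first.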

Cost Optimization Strategies

  • Bedrock Caching: ElastiCache response caching for common retail queries (product info, policies) can reduce Bedrock token usage by 30-40%, saving ~$1,500/mo.
  • OpenViking L0/L1 Tiering: The three-tier context loading reduces token consumption by loading only the relevant context, an estimated 50% fewer input tokens than traditional RAG.
  • Spot Capacity: Fargate Spot is not available for Amazon EKS, so non-critical workloads (analytics processing, batch indexing) would need to run on EC2 Spot Instances in an EKS node group to capture Spot savings of up to 70%.
  • Aurora Serverless v2 Scaling: Scales down to the 2 ACU minimum during off-peak hours (midnight-6AM), reducing database costs by ~40% during low-traffic periods.
  • Reserved Capacity: 1-year commitments for ElastiCache and OpenSearch can yield 30-40% savings (~$900/mo).
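
The quoted savings can be sanity-checked against the line-item estimates. The sketch below applies the stated 30-40% ranges to the Bedrock estimate and to the combined ElastiCache and OpenSearch estimates. The helper name is illustrative, and integer arithmetic is used to avoid floating-point rounding.

```python
BEDROCK = 4500      # USD/month, from the cost breakdown
ELASTICACHE = 1100  # USD/month
OPENSEARCH = 1800   # USD/month


def savings_range(monthly_cost: int, low_pct: int = 30, high_pct: int = 40) -> tuple:
    """Return (low, high) monthly savings in USD for a percentage range."""
    return (monthly_cost * low_pct // 100, monthly_cost * high_pct // 100)


caching = savings_range(BEDROCK)                    # response caching
reserved = savings_range(ELASTICACHE + OPENSEARCH)  # 1-year commitments
print(f"Caching: ${caching[0]}-${caching[1]}/mo, reserved: ${reserved[0]}-${reserved[1]}/mo")
```

The quoted figures of ~$1,500/mo and ~$900/mo both fall inside the computed ranges.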