Shadow Sentinel
A shadow-mode deployment system that runs a challenger model silently alongside production, measures how often the two agree on real traffic, and automatically promotes the challenger or rolls it back, all gated by a full CI/CD pipeline.
Outcome
Zero user impact while testing new models

Context
Shipping a new model is risky because you cannot tell how it behaves on real traffic until it is already serving users. Shadow Sentinel removes that risk: a challenger model runs on live requests in parallel with production, affecting no one, while the system measures agreement and decides automatically whether the new model is safe to promote.
Approach
- 01 On every push, GitHub Actions runs unit tests, code quality checks, and model validation on accuracy and latency, then builds and pushes a Docker image. A failing gate blocks the deploy and notifies the developer.
- 02 A FastAPI gateway routes every request to Model A, the production model, which serves the user response, while Model B, the challenger, runs silently in parallel on the same traffic.
- 03 A comparison layer logs the agreement rate between the two models across requests, along with the confidence gap and any high-confidence conflicts.
- 04 After 200+ requests, the system decides automatically: agreement of 80% or higher promotes Model B to production; anything lower triggers an automatic rollback, fires an alert, and keeps Model A in place. A Streamlit dashboard shows all of this live.
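The shadow routing in step 02 can be sketched roughly as follows. This is a minimal illustration, not the project's gateway: it uses plain asyncio instead of FastAPI so it runs on the standard library alone, and `model_a`, `model_b`, `handle_request`, `shadow_log`, and the returned labels and confidences are all hypothetical stand-ins. The point it shows is that only Model A's output reaches the caller, and that a challenger failure is logged without affecting the response.

```python
import asyncio

shadow_log = []      # hypothetical stand-in for the comparison layer's storage
_background = set()  # keeps references so pending shadow tasks are not garbage collected


async def model_a(text):
    """Hypothetical production model: its output is what the user receives."""
    return {"label": "positive", "confidence": 0.93}


async def model_b(text):
    """Hypothetical challenger: slower, and fails on empty input."""
    await asyncio.sleep(0.01)
    if not text:
        raise ValueError("empty input")
    return {"label": "negative", "confidence": 0.71}


async def _record(text, primary, shadow_task):
    """Wait for the challenger and log both outputs; never raises to the caller."""
    try:
        challenger = await shadow_task
        shadow_log.append({"input": text, "a": primary, "b": challenger})
    except Exception as exc:
        # A challenger failure is recorded but never reaches the user.
        shadow_log.append({"input": text, "a": primary, "error": repr(exc)})


async def handle_request(text):
    # Start the challenger first so both models work on the request concurrently.
    shadow_task = asyncio.create_task(model_b(text))
    primary = await model_a(text)
    # Logging happens in the background; the response does not wait for Model B.
    log_task = asyncio.create_task(_record(text, primary, shadow_task))
    _background.add(log_task)
    log_task.add_done_callback(_background.discard)
    return primary


async def demo(text):
    """Demo helper: serve one request, then wait for shadow work to finish."""
    response = await handle_request(text)
    await asyncio.gather(*_background)  # only so the demo can inspect the log
    return response


if __name__ == "__main__":
    print(asyncio.run(demo("great product")))
    print(shadow_log)
```

In a real FastAPI service the same idea would sit inside a route handler; the design choice worth keeping is that the response path awaits Model A only.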
How it works
- Developer: Push to GitHub. A new model or code change.
- CI: GitHub Actions. Tests, quality, model validation, Docker build.
- Gate: Tests pass? A failure blocks the deploy and alerts the developer.
- Shadow CD: FastAPI gateway. Model A serves users, Model B runs silently alongside.
- Compare: Comparison layer. Logs agreement rate over 200+ live requests.
- Decision: Promote or roll back. ≥80% agreement promotes, otherwise auto rollback.
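The Compare and Decision steps could look something like the sketch below. The 200-request minimum and the 80% agreement gate come from the description above; everything else is an assumption made for illustration, including the `Comparison` record shape, the function names, the 0.90 cutoff for "high confidence", and the reading of a high-confidence conflict as a disagreement where both models are confident.

```python
from dataclasses import dataclass

MIN_REQUESTS = 200        # from the text: decide only after 200+ requests
PROMOTE_AGREEMENT = 0.80  # from the text: 80% agreement or higher promotes
HIGH_CONFIDENCE = 0.90    # assumption: the cutoff for "high confidence" is not stated


@dataclass
class Comparison:
    """One request as seen by both models (hypothetical record shape)."""
    label_a: str
    conf_a: float
    label_b: str
    conf_b: float


def summarize(records):
    """Aggregate agreement rate, mean confidence gap, and high-confidence conflicts."""
    n = len(records)
    if n == 0:
        return {"requests": 0, "agreement_rate": 0.0,
                "mean_confidence_gap": 0.0, "high_confidence_conflicts": 0}
    agreements = sum(r.label_a == r.label_b for r in records)
    gap = sum(abs(r.conf_a - r.conf_b) for r in records) / n
    # Assumed definition: the models disagree while both are confident.
    conflicts = sum(
        r.label_a != r.label_b
        and r.conf_a >= HIGH_CONFIDENCE
        and r.conf_b >= HIGH_CONFIDENCE
        for r in records
    )
    return {"requests": n, "agreement_rate": agreements / n,
            "mean_confidence_gap": gap, "high_confidence_conflicts": conflicts}


def decide(summary):
    """Return 'wait', 'promote', or 'rollback' for the challenger."""
    if summary["requests"] < MIN_REQUESTS:
        return "wait"  # not enough live traffic to judge yet
    if summary["agreement_rate"] >= PROMOTE_AGREEMENT:
        return "promote"
    return "rollback"  # Model A stays in production


if __name__ == "__main__":
    agree = [Comparison("spam", 0.95, "spam", 0.90)] * 170
    differ = [Comparison("spam", 0.95, "ham", 0.92)] * 30
    summary = summarize(agree + differ)
    print(summary["agreement_rate"], decide(summary))
```

Here the sample has 170 agreements out of 200 requests, so the agreement rate clears the gate and the decision is to promote; with fewer than 200 records the same function returns "wait" regardless of the rate.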
Results
- ≥80%: agreement gate to auto promote
- 200+: live requests before any decision
- Auto: promotion and rollback, no manual step
Reflection
This is the project that thinks most like production. Shadow mode plus automated promotion and rollback is exactly the reliability work that separates a model that demos well from one a team can actually trust on live traffic.