Portfolio Project · Public Kaggle Data · No Production Claims

AI-powered Support
Operations Optimization

A gDATA-style support-operations case study that combines public support data, ML/NLP, selective LLM fallback, and human-safe routing policies to automate ticket routing and turn model confidence into actionable operating recommendations.

8,325
Structured Ticket Rows
Deduplicated Kaggle dataset
1,665
Held-out Benchmark Tickets
Stratified 80/20 split
+21.1 pts
Macro-F1 Lift
Improved ML vs keyword baseline
12.1% → 33.2% vs keyword baseline
4-Stage
Routing Cascade
Rules → ML → LLM → Human
Sweep
Cost–Coverage Threshold
Policy operating curve
Business Context

The Business Challenge

Support teams face competing goals: automation coverage, human review load, LLM API cost, and routing quality. A principled system makes those tradeoffs explicit and measurable.

Reduce Manual Review Load

Manual triage of every incoming ticket is costly and doesn't scale. Automation needs to be safe enough to handle high-confidence cases without burdening agents.

Control LLM Invocation Cost

LLM-only routing invokes a language model for every ticket — expensive and hard to cost-control at scale. Smart routing limits LLM calls to truly ambiguous cases.

Preserve Routing Quality & Safety

Misrouted tickets harm customer trust and operations. Uncertain cases and LLM failures must always land in human triage — the system should never fail silently.

Methodology

From Data to Decision

A structured pipeline that goes from raw public data to an actionable routing policy with explicit, measurable tradeoffs.

01
Public Support Data
Two Kaggle datasets: Twitter customer messages and a structured ticket dataset with metadata fields.
02
Label Provenance
Ticket Type metadata columns map to issue labels. Labels are never derived from model predictions or keyword rules.
03
Supervised Benchmark
Stratified 80/20 holdout split. Keyword, ML baseline, and improved ML each evaluated on 1,665 unseen tickets.
04
Routing Cascade
4-stage system: deterministic rules, calibrated ML, LLM fallback, human triage as the safety net.
05
Threshold Policy
Confidence sweep generates a cost–coverage operating curve to inform threshold selection.
06
Decision Artifacts
Queue distributions, threshold guides, benchmark metrics, and routing KPIs exported for review.
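The benchmark step above (stratified 80/20 holdout, macro-F1 comparison) can be sketched as follows. This is a minimal illustration, not the project code: the toy corpus and model settings are assumptions, and the real pipeline trains on the Kaggle ticket text with metadata-derived labels.

```python
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for the deduplicated ticket dataset;
# in the project, texts/labels come from the Ticket Type metadata column.
texts = ["refund not received", "cannot log in to account",
         "billing charged twice", "password reset fails",
         "invoice amount wrong", "login loop after update"] * 50
labels = ["billing", "technical", "billing",
          "technical", "billing", "technical"] * 50

# Stratified 80/20 holdout keeps class proportions identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Macro-F1 weights every class equally, so minority classes count fully.
macro_f1 = f1_score(y_test, model.predict(X_test), average="macro")
```

Evaluating all three systems (keyword rules, ML baseline, improved ML) on the same unseen split is what makes the +21.1 pt lift comparable.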
gDATA Alignment

Why this maps to Business Data Science

Designed to show the full loop from ambiguous support-ops problem to data-driven recommendation.

Business Problem Framing

Balances manual triage cost, automation coverage, LLM invocation cost, and routing quality.

Data Science Implementation

Uses public support data, metadata-derived labels, supervised benchmark, and threshold-sweep analysis.

ML/NLP + LLM Solutioning

Combines keyword rules, TF-IDF/Logistic Regression, selective LLM classification, and human fallback.

Actionable Recommendation

Recommends a hybrid routing policy and confidence-threshold selection based on risk, review capacity, and LLM budget.

System Design

4-Stage Routing Cascade

Each stage handles the cases it's best suited for. Ambiguous tickets pass downstream; uncertain cases always reach a human.

Incoming Ticket
Rule-based Routing
Calibrated ML
LLM Fallback
Human Triage
Stage 1

Rule-based Routing

Deterministic pattern matching for obvious cases — instant, zero model overhead, highest reliability.

Stage 2

Calibrated ML

TF-IDF + Logistic Regression with isotonic calibration. Auto-routes when confidence ≥ high threshold.

Stage 3

LLM Fallback

An LLM classifies ambiguous, low-confidence tickets by issue type. It is not invoked for every ticket.

Stage 4

Human Triage

Mid-confidence ambiguity and LLM failures always route here. This is the safety net; the system never fails silently.

Not an LLM-only design. LLM calls are intentionally limited to low-confidence ambiguous cases to control API cost and improve operational safety. The majority of tickets are handled by rules or ML without any LLM invocation.
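The dispatch logic described above can be sketched in a few lines. Thresholds, rule patterns, and the `llm_classify` stub are illustrative assumptions; in the project they come from the threshold sweep and the actual rule set.

```python
import re

# Illustrative thresholds; real values come from the cost-coverage sweep.
HIGH, LOW = 0.80, 0.40

# Deterministic patterns for obvious cases (Stage 1). Patterns are made up.
RULES = {
    r"\brefund\b": "billing",
    r"\bpassword reset\b": "technical",
}

def llm_classify(text):
    """Stand-in for the selective LLM call (Stage 3); returns None to abstain."""
    return None

def route(text, ml_confidence, ml_label):
    """Return (stage, label) for one ticket."""
    # Stage 1: rules are instant and carry zero model overhead.
    for pattern, label in RULES.items():
        if re.search(pattern, text, re.IGNORECASE):
            return ("rules", label)
    # Stage 2: calibrated ML auto-routes only above the high threshold.
    if ml_confidence >= HIGH:
        return ("ml", ml_label)
    # Stage 3: the LLM is invoked only for low-confidence ambiguity.
    if ml_confidence < LOW:
        try:
            label = llm_classify(text)
            if label is not None:
                return ("llm", label)
        except Exception:
            pass  # an LLM failure falls through to human triage, never silent
    # Stage 4: everything else lands with a human.
    return ("human", None)
```

Note that the only path producing no label is the human queue, which is exactly the "never fail silently" guarantee.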

Interactive Demo

Try the routing logic

Paste or select a sample support ticket to see how the hybrid routing policy would classify and route it.

Sample tickets

Select a sample or type a ticket, then click "Route ticket"

Static front-end demo based on the project routing design. The full training, evaluation, and routing pipeline is available in the GitHub implementation.

Results

Results & Insights

Three views of the system: operational health, policy tradeoffs, and measured model quality. Metric types are clearly distinguished.

Proxy + Estimated Metrics

Operations Overview

Shows routing-stage distribution, human triage rate, LLM invocation rate, average confidence score, estimated cost per ticket, and queue distribution across all processed tickets.

Human triage rate is a routing-system proxy, not a downstream escalation rate. Cost estimates are analytic, not measured from live LLM calls.

Estimated Analytic Metrics

Cost–Coverage Policy Tradeoff

Shows how confidence thresholds shift tickets between auto-routing, LLM fallback, and human review. Thresholds are operating policy choices — not just model parameters — and must be selected with risk tolerance and LLM budget in mind.

Computed analytically from ML confidence scores. No live LLM calls required for this sweep.
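Because the sweep needs only stored ML confidences, it runs offline. A minimal sketch, with a synthetic confidence distribution and made-up unit costs (these are assumptions, not the project's measured figures):

```python
import numpy as np

rng = np.random.default_rng(0)
confidences = rng.beta(2, 2, size=1000)  # stand-in for stored ML confidences

# Illustrative unit costs per ticket; human review dominates.
COST = {"auto": 0.001, "llm": 0.02, "human": 1.0}

def sweep(confidences, low=0.40):
    """One cost-coverage point per candidate high threshold."""
    curve = []
    for high in np.linspace(0.50, 0.95, 10):
        auto = (confidences >= high).mean()   # auto-routed by ML
        llm = (confidences < low).mean()      # sent to LLM fallback
        human = 1.0 - auto - llm              # everything in between
        cost = (auto * COST["auto"] + llm * COST["llm"]
                + human * COST["human"])
        curve.append({"high": round(float(high), 2), "coverage": auto,
                      "est_cost_per_ticket": cost})
    return curve

curve = sweep(confidences)
# Raising the high threshold shrinks auto-route coverage and, because
# human review is the costliest queue, raises estimated cost per ticket.
```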

Measured on Holdout Data

Model Evaluation

Compares ML against the keyword baseline on 1,665 held-out, metadata-derived tickets. The improved ML model raises Macro-F1 from 12.1% to 33.2% over keyword rules. The ML baseline's 59.4% accuracy reflects majority-class bias; the improved model trades raw accuracy for better class balance. Limitations are stated transparently.

Labels are metadata-derived (Ticket Type field), not human-reviewed. Absolute scores remain modest and should be interpreted cautiously.
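The accuracy-versus-macro-F1 gap noted above is easy to reproduce on toy data: a "model" that only ever predicts the majority class can look reasonably accurate while its macro-F1 collapses. The class mix below is invented for illustration.

```python
from sklearn.metrics import accuracy_score, f1_score

# 60% of tickets belong to one class; a predictor that always answers
# the majority class looks decently accurate but has poor macro-F1.
y_true = ["billing"] * 60 + ["technical"] * 25 + ["account"] * 15
y_pred = ["billing"] * 100  # majority-class-only "model"

acc = accuracy_score(y_true, y_pred)                            # 0.60
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
# billing F1 = 0.75, technical F1 = 0, account F1 = 0 -> macro = 0.25
```

This is why the benchmark reports macro-F1 alongside accuracy.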

Business Recommendation

Use a hybrid routing policy

rather than LLM-only or rules-only routing.

Keep deterministic rules for high-signal obvious cases — no model overhead, highest reliability.

Use calibrated ML for high-confidence routing — cost-efficient, fast, and measurable.

Trigger LLM classification only for ambiguous low-confidence tickets — not for every case.

Route uncertain cases and LLM failures to human triage — never fail silently.

Select confidence thresholds based on risk tolerance, review capacity, and LLM budget.

Decision Memo

Executive Decision Memo

What I would recommend to a support operations leader.

Recommendation

Use a hybrid routing policy rather than LLM-only routing.

Rationale

Rules handle obvious cases cheaply; calibrated ML handles high-confidence routing; LLM fallback is reserved for ambiguity; human triage protects quality.

Operating policy

Choose confidence thresholds based on routing risk, agent review capacity, and LLM budget.

Next validation

Build a human-reviewed eval set, add queue-level SLA requirements, and monitor calibration/drift before production rollout.

Stakeholder View

Stakeholder View

How the analysis translates for different partners in a support operations environment.

Support Ops

Use routing-stage and queue metrics to understand review load and staffing pressure.

Data Science

Evaluate ML against keyword baselines with measured accuracy, macro-F1, weighted-F1, and per-class metrics.

Engineering

Implement configurable thresholds, safe LLM fallback, and human-triage failure handling.

Leadership

Use cost–coverage tradeoffs to choose an operating policy aligned with risk tolerance and LLM budget.

Honest Assessment

Limitations & Next Steps

Transparent caveats are a mark of rigorous data science. Here is what this system does and does not claim.

Current Limitations

  • Metadata-derived labels are not human-reviewed production gold labels
  • Kaggle ticket descriptions are templated and may not fully reflect production support traffic
  • The supervised benchmark reliably covers only 3 of the 6 routing classes present in the raw Kaggle metadata
  • Absolute ML scores are modest; interpret cautiously against real-world benchmarks

Next Steps

  • Build a 500–2,000 row human-reviewed eval set for more reliable per-class F1 signal
  • Add queue-level SLA policies and risk-tier threshold settings per support category
  • Track true downstream escalation and resolution outcomes beyond routing decisions
  • Add calibration plots, reliability monitoring, and model drift detection
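The calibration-monitoring step above could start from a simple expected calibration error (ECE) check on logged predictions. This is a sketch of the standard binned-ECE computation, not code from the project:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average gap between confidence and accuracy, weighted per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # bin weight x calibration gap
    return ece

# A stream of 0.5-confidence predictions that are right half the time
# is perfectly calibrated, so its ECE is (near) zero.
ece = expected_calibration_error([0.5] * 100, [1, 0] * 50)
```

Tracking this metric over time, per queue, would surface the drift the next steps call out before it degrades routing quality.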
Implementation

Tech Stack

Built entirely with open-source tools on public Kaggle data. No proprietary data sources.

Data & ML
Python
pandas
scikit-learn
Logistic Regression
TF-IDF
NLP
Tokenization
Char n-grams
Text normalization
FeatureUnion
AI & LLM
Selective LLM fallback
Optional OpenAI API
JSON parsing
Data Sources
Kaggle
Twitter support
Ticket dataset
Metadata labels
App & Viz
Streamlit
Matplotlib
seaborn
Next.js
Tailwind