Rabindra Kharel
👨‍💻 Staff Engineer | Data, Analytics & AI Platforms
$ whoami
Staff Engineer | Backend/API Platforms • Data Engineering • AI/ML Infrastructure
Experience: 10+ years architecting fault-tolerant, production-scale data systems
📋 Professional Summary
Staff Engineer specializing in Backend/API Platforms, Data Engineering, and AI/ML Infrastructure. Designs fault-tolerant, cost-aware architectures that turn complex data, analytics, and ML requirements into scalable, observable production systems, with a proven record of leading cross-functional teams to deliver end-to-end analytics and AI solutions that unlock insights and drive business value.
💼 Professional Experience
- Fault-Tolerant Ingestion: Built exactly-once ingestion pipeline streaming PB-scale data from 50+ sources into S3/Iceberg lakehouse (AWS Lake Formation)
- Data Lake Architecture: Architected both batch and streaming pipelines using Kafka, Flink, SQS, Lambda, and S3
- Data Mesh Implementation: Led end-to-end architecture and provisioning for data mesh and data-product solutions for Alight Commercial Data Lake
- Security & Governance: Integrated AWS Identity Center; authored tag-driven policies enforcing RBAC, RLS, and dynamic PII masking
- Performance Optimization: Tuned Redshift WLM, Short Query Acceleration, and Query Monitoring Rules, dropping BI latency below 3s while trimming monthly cost 30%
- CI/CD Pipeline: Engineered GitHub-based CI/CD pipelines for database objects and ETL code deployment
- QuickSight Integration: Wired curated layers to QuickSight and downstream systems driving actionable insights
- Lifecycle Management: Developed lifecycle/compaction jobs and hot/warm/cold tiering; piped lineage & metrics to CloudWatch for single-pane SLO dashboards
- Advanced Analytics: Applied Data Vault 2.0 modeling, clustering keys for collocated joins, pruning-stat and disk-spill analysis, and Dynamic Tables/materialized views for precomputed reports
- Real-time & Batch Pipelines: Engineered pipelines ingesting >3TB/day from eCommerce, clickstream, POS, marketing channels, and social media
- ML Pipeline Development: Built ML pipelines including data preprocessing, feature engineering, data quality checks, batch training, and ModelOps for batch/real-time training
- Identity Resolution: Built customer identity-resolution, context resolution, de-duplication algorithms, and "anonymous-to-known" modules, boosting match rate
- Data Modeling & Optimization: Designed conceptual, logical, and physical data models in Redshift; optimized denormalization, dist/sort keys, compression encodings, and automated ANALYZE/VACUUM, cutting average query latency 40%
- Query Optimization: Assessed business reports, query patterns, and ELT loads to identify optimal Distribution Key, Sort Key, and encoding for each table and column
- Marketing Activation: Integrated analytics-ready data into activation systems like Google AdWords, Mailchimp, and Facebook Marketing for closed-loop marketing
- Cost Optimization: Drove workload right-sizing, reducing EC2 spend 28%
- ML CI/CD Pipelines: Implemented ML CI/CD pipelines covering automated training, evaluation, drift monitoring, canary releases, and auto-promotion
- AI/ML Workflow Automation: Designed and implemented CI/CD pipelines for AI/ML workflows including model training, prediction, and deployment automation using GitHub
- Data Pipelines: Engineered batch data training pipelines and real-time streaming prediction systems using Python
- Model Serving: Served latency-critical XGBoost and deep-learning models on GPU and EKS over gRPC/REST, with C++/Rust inference paths
- Feature Engineering: Developed and deployed feature pipelines for preprocessing and serving data for machine learning models
- Kubernetes Deployment: Automated model deployment processes leveraging Kubernetes and Amazon EKS for scalable, containerized applications
- Artifact Management: Managed artifacts & lineage in S3/MLflow; added SMOTE and focal-loss strategies plus concept-drift alerts
- Containerization: Deployed containerized ML workflows using Docker and Kubernetes for high availability and scalability
- Data Warehouse Design: Designed Data Warehouse & BI Analytics stack, ingesting omni-channel commerce data from disaggregated sources (eCommerce, Brick & Mortar sales, Oracle Pricing Systems, Oracle Customer CX)
- ETL Solutions: Designed data warehousing solutions leveraging ETL workflows for eCommerce and retail data
- Team Leadership: Led 8 onshore/offshore engineers, writing specs, test plans, and deployment playbooks
- Unified Analytics: Combined POS, RMS, RPM, and RESA feeds into Unified Retail Analytics and Customer analytics
- Requirements Analysis: Evaluated business requirements, devised adaptable data engineering frameworks, directed development teams, and ensured prompt completion of technical assignments
- Data Modeling: Worked as offshore BI developer to build data models (Star Schema and Dimensional Modeling), reports and dashboards
- ODI ETL Development: Developed ODI ETL jobs integrating OLTP sources (POS, RMS, RPM, RESA) into Oracle Retail Insight
- Business Collaboration: Collaborated with business leads to understand data and business requirements to build impactful and insightful analytic dashboards
- Automation: Automated nightly loads and validation scripts, cutting manual QA effort 50% and raising SLA adherence to 99.7%
- Integration Development: Developed integration code for OLTP systems into data warehouse
- Community Engagement: Organized hackathons and hosted community events to promote W3C web standards
- Developer Advocacy: Engaged with developer community to advocate for web accessibility and open web principles
- Workshop Facilitation: Facilitated workshops to encourage adoption of modern web technologies and standards
- Recognition: Excellence Award recipient at Opera Software
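The identity-resolution and de-duplication work described above can be illustrated with a minimal union-find sketch, a common way to merge records that share any identifier into one customer cluster. Everything here (the field names `email`, `device_id`, `loyalty_id` and the sample records) is hypothetical, not taken from the production system:

```python
# Hypothetical "anonymous-to-known" identity resolution: records that
# share any identifier (email, device_id, loyalty_id) are merged into
# one customer cluster via union-find.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def resolve_identities(records):
    """Group record IDs whose identifier values overlap (transitively)."""
    uf = UnionFind()
    seen = {}  # (identifier field, value) -> first record id carrying it
    for rec in records:
        rid = rec["id"]
        uf.find(rid)  # register the record even if it shares nothing
        for key in ("email", "device_id", "loyalty_id"):
            val = rec.get(key)
            if val is None:
                continue
            if (key, val) in seen:
                uf.union(seen[(key, val)], rid)
            else:
                seen[(key, val)] = rid
    clusters = {}
    for rec in records:
        clusters.setdefault(uf.find(rec["id"]), []).append(rec["id"])
    return sorted(sorted(v) for v in clusters.values())

records = [
    {"id": "r1", "device_id": "d-42"},                      # anonymous visit
    {"id": "r2", "device_id": "d-42", "email": "a@x.com"},  # device links to email
    {"id": "r3", "email": "a@x.com", "loyalty_id": "L9"},   # known customer
    {"id": "r4", "device_id": "d-77"},                      # unrelated visitor
]
print(resolve_identities(records))  # [['r1', 'r2', 'r3'], ['r4']]
```

Union-find handles the transitive case that boosts match rate: an anonymous device row links to an email row, which links onward to a loyalty row, so all three collapse into one known customer.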
🛠️ Technical Stack
💾 Storage & Lakehouse
S3 • Iceberg Tables • Snowflake • Redshift • MongoDB • DynamoDB
🌊 Streaming & Processing
Kafka • Flink • AWS Lambda • Glue • EventBridge • SQS FIFO • Kinesis
⚙️ Processing & ETL
EMR (Spark) • Flink • SageMaker Processing • Glue ETL
🔄 Orchestration
Airflow • dbt • Talend • IICS
🚀 DevOps & Infrastructure
Terraform • GitHub Workflows • Docker • Kubernetes • EKS • CloudFormation
💻 Programming
Python • Java • Advanced SQL • Unix shell scripting
📊 BI & Analytics
Looker • QuickSight • Athena
🤖 AI/ML Platforms
AWS SageMaker • MLflow • Model Registry • XGBoost
🔐 Data Governance
AWS Lake Formation • IAM • Identity Center • Data Masking • RBAC/RLS/CLS
🎯 Core Expertise

```
├── Data Platform & API Engineering
│   ├── High-throughput ingestion pipelines
│   ├── Massively parallel/distributed compute
│   └── Backend API stacks (serverless + microservices)
│
├── Data Modeling & Architecture
│   ├── Data Vault 2.0 & Dimensional Modeling
│   ├── SCD behaviors in columnar/distributed databases
│   ├── Data Lake design (S3/Iceberg Lakehouse)
│   ├── Clustering, collocated joins, query performance tuning
│   └── Medallion architecture patterns
│
├── Streaming & Batch Ingestion
│   ├── Lambda/Kappa architectures into AWS data lakes
│   ├── Real-time and batch processing pipelines
│   └── Distributed architectures (Kafka, Kinesis, Flink)
│
├── Performance & Cost Optimization
│   ├── Query-plan forensics & compute right-sizing
│   ├── Hot/warm/cold tiering & compaction
│   ├── Dist/sort keys & collocated joins
│   ├── Scaling policies & resource monitors
│   └── Result caching & multi-cluster setup
│
├── Data Governance & Security
│   ├── Policy as code for governance
│   ├── Dynamic data masking & PII controls
│   ├── Tag-based subscriptions & RBAC/RLS/CLS
│   ├── Schema evolution & registry management
│   ├── Data contracts & single sign-on
│   └── Data subscription frameworks
│
└── DevOps / BIOps / MLOps
    ├── Terraform-driven infrastructure
    ├── GitHub Actions CI/CD
    ├── Event-driven workflows (Kafka, Flink, Lambda)
    └── Reproducible ML pipelines & model deployment
```
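As a concrete illustration of the SCD behaviors listed under Data Modeling & Architecture, here is a minimal Slowly Changing Dimension Type 2 sketch in plain Python. The column names (`key`, `value`, `valid_from`, `valid_to`, `is_current`) are illustrative only; in Redshift or Snowflake this would typically be expressed as a MERGE over the dimension table:

```python
# Illustrative SCD Type 2 merge in plain Python: a changed attribute
# closes out the current row and appends a new current row, preserving
# full history. Column names are hypothetical.

from datetime import date

def scd2_merge(dim_rows, incoming, today):
    """Apply one batch of source rows to an SCD2 dimension table."""
    current = {r["key"]: r for r in dim_rows if r["is_current"]}
    for src in incoming:
        cur = current.get(src["key"])
        if cur is None:
            # brand-new key: open its first current version
            dim_rows.append({**src, "valid_from": today,
                             "valid_to": None, "is_current": True})
        elif cur["value"] != src["value"]:
            # changed attribute: expire the old version, open a new one
            cur["valid_to"] = today
            cur["is_current"] = False
            dim_rows.append({**src, "valid_from": today,
                             "valid_to": None, "is_current": True})
        # unchanged rows are left alone
    return dim_rows

dim = [{"key": "C1", "value": "Bronze", "valid_from": date(2023, 1, 1),
        "valid_to": None, "is_current": True}]
dim = scd2_merge(dim, [{"key": "C1", "value": "Gold"},
                       {"key": "C2", "value": "Silver"}], date(2024, 6, 1))
print([(r["key"], r["value"], r["is_current"]) for r in dim])
# [('C1', 'Bronze', False), ('C1', 'Gold', True), ('C2', 'Silver', True)]
```

The same close-out-and-append pattern is what dist/sort-key and clustering choices must accommodate in columnar stores, since history rows for one key should stay collocated for efficient point-in-time joins.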
🎓 Education

| Degree | Institution | Year |
|--------|-------------|------|
| Bachelor of Engineering – Computer Science & Engineering (Honours) | Tribhuvan University – Institute of Engineering, Pulchowk Campus, Nepal | 2008–2012 |
📜 Certifications

| Certification | Issuer | Validity | Credential ID |
|---------------|--------|----------|---------------|
| AWS Certified Data Analytics – Specialty | Amazon Web Services | Feb 2023 – Feb 2026 | NNV34CPC6J1Q1C39 |
| AWS Certified Cloud Practitioner | Amazon Web Services | Jan 2023 – Jan 2026 | NNV34CPC6J1QC39 |
🏆 Honors & Awards
- Winner – NASA Space Apps Hackathon (Software Category): national-level winner for spatio-temporal data anomaly detection (2013)
- Winner – World Bank Open Data Hackathon: created a map visualization of open aid-flow data (2013)
- Excellence Award – Opera Software: recognition for outstanding contribution to web standards evangelism