Data Engineer · Lakehouse Architect
I convert fragile legacy stacks into resilient, partition-optimised lakehouses. Specialising in SQL architecture, medallion design, and zero-loss data migrations across Retail, Insurance & Financial Services in South Africa.
// About
Johannesburg-based Data Engineer with deep expertise in redesigning brittle legacy reporting stacks into scalable, cloud-native data platforms. I don't just move data—I architect systems that survive business growth.
My core philosophy: SQL is the fundamental unit of data truth. Every pipeline I build is grounded in proper indexing strategies, partitioning schemes, and semantic models that analysts actually understand.
Python is my orchestration layer: CLI tooling, NLP preprocessing, CV image pipelines—but the heavy lifting lives in the engine.
3+
Years Engineering
Data Systems
10M+
Rows Migrated
Zero Data Loss
4
Production
Data Products
5
Industry
Certifications
// Credentials
Azure Data Engineer Associate
Microsoft certified expertise in Azure data storage, processing pipelines, and security for enterprise analytics workloads.
Data Engineer Associate
Certified expertise in Delta Lake, Spark medallion pipelines, and lakehouse architecture on the Databricks Unified Analytics Platform.
Analytics Engineer
Certified in OneLake, Lakehouse, Data Warehouse, Real-Time Analytics, and Power BI integration across the unified Fabric platform.
// Technical Stack
Primary Discipline
Azure · Databricks · Fabric
Automation & Intelligence Layer
// Featured Projects
Each project includes a Technical Deep Dive with real SQL architecture. No tutorials, no SELECT * examples.
Semantic Layer Rebuild
Migrated a mid-size retail business from flat-file CSV reporting to a fully normalised Star Schema on Azure Synapse. Rebuilt the semantic layer from scratch, achieving a 72% reduction in Power BI refresh time through incremental partition-based loads and materialised pre-aggregates.
[ Data Flow ]
CSV Flat Files → Azure Blob Ingest → Staging Schema
SCD Type 2 dimensions track historical changes to dimension records without losing prior state. A filtered index on is_current=1 makes live-record lookups microsecond-fast.
Monthly range partition + aligned columnstore index. A distributed materialized view pre-aggregates revenue so Power BI refreshes only the current partition slice.
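The SCD Type 2 pattern behind the dimension rebuild can be sketched in a few lines. This is an illustrative toy using SQLite (partial indexes need SQLite 3.8+), not the production Synapse DDL; the table, column names, and dates are invented for the example:

```python
import sqlite3

# Toy SCD Type 2 dimension: close out the current row, insert the new
# version. The filtered (partial) index mirrors the is_current=1 lookup
# optimisation described above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (
    customer_id INTEGER,
    city        TEXT,
    is_current  INTEGER,   -- 1 = live record
    valid_from  TEXT,
    valid_to    TEXT       -- NULL while the row is current
);
-- Partial index: live-record lookups only touch current rows
CREATE INDEX ix_dim_customer_current
    ON dim_customer (customer_id) WHERE is_current = 1;
INSERT INTO dim_customer VALUES (42, 'Durban', 1, '2023-01-01', NULL);
""")

def apply_scd2_change(con, customer_id, new_city, change_date):
    """Close the live row, then insert the new version as current."""
    con.execute("""
        UPDATE dim_customer
           SET is_current = 0, valid_to = ?
         WHERE customer_id = ? AND is_current = 1
    """, (change_date, customer_id))
    con.execute("INSERT INTO dim_customer VALUES (?, ?, 1, ?, NULL)",
                (customer_id, new_city, change_date))
    con.commit()

apply_scd2_change(con, 42, 'Johannesburg', '2024-06-01')
rows = con.execute(
    "SELECT city, is_current FROM dim_customer "
    "WHERE customer_id = 42 ORDER BY valid_from"
).fetchall()
# Prior state survives: the Durban row is closed out, not overwritten.
```

The close-out-then-insert sequence is what keeps history queryable: point-in-time joins filter on valid_from/valid_to, while live joins hit the partial index.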
Scalable Medallion Lakehouse
Designed a full Bronze / Silver / Gold Medallion Architecture on Databricks + Azure Data Lake Gen2 for a high-volume synthetic insurance dataset. The Gold layer exposes risk-scoring analytics via RANK and LAG window functions, enabling claims pattern detection 12× faster than the prior flat-table approach.
[ Medallion Architecture ]
Raw JSON/CSV → Bronze (raw, full history)
Horizontal partition by year eliminates full-table scans. Z-Order on (Claim_ID, Policy_Number) co-locates correlated rows so multi-table joins hit minimal files.
High-frequency claimant detection using LAG (days-between-claims), cumulative exposure totals, and per-year claim ranking to feed the actuarial risk model.
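The Gold-layer risk queries lean on standard window functions, so the shape is easy to show in miniature. A sketch using SQLite (window functions need SQLite 3.25+) rather than Databricks SQL; the table and sample claims are invented:

```python
import sqlite3

# Miniature of the Gold-layer risk query: LAG for days-between-claims,
# a running SUM for cumulative exposure, RANK for per-policy claim size.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE gold_claims (
    claim_id INTEGER, policy_number TEXT, claim_date TEXT, amount REAL
);
INSERT INTO gold_claims VALUES
 (1, 'P-100', '2024-01-05', 1200.0),
 (2, 'P-100', '2024-01-12',  900.0),
 (3, 'P-100', '2024-03-01', 4000.0),
 (4, 'P-200', '2024-02-10', 2500.0);
""")

rows = con.execute("""
SELECT policy_number,
       claim_date,
       -- days since this policy's previous claim (NULL for the first)
       CAST(julianday(claim_date) - julianday(
            LAG(claim_date) OVER (PARTITION BY policy_number
                                  ORDER BY claim_date)) AS INTEGER)
           AS days_since_prev,
       -- running exposure total per policy
       SUM(amount) OVER (PARTITION BY policy_number
                         ORDER BY claim_date) AS cum_exposure,
       -- rank claims by size within each policy
       RANK() OVER (PARTITION BY policy_number
                    ORDER BY amount DESC) AS size_rank
FROM gold_claims
ORDER BY policy_number, claim_date
""").fetchall()

for r in rows:
    print(r)
```

A short gap in days_since_prev combined with a high size_rank is exactly the "high-frequency claimant" signal the actuarial model consumes.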
Legacy SQL Server → Azure Data Lake
A production-grade Python CLI that orchestrates the full ETL lifecycle from legacy SQL Server 2012 to Azure Data Lake Gen2. Python handles pre-flight schema validation and checksum generation; SQL handles bulk loads and post-load clustered index rebuilds. Migrated 1.4 million rows with automated 0% data-loss verification.
Before a row moves, the migrator validates schemas match, computes source checksums, and aborts on mismatch — preventing silent data corruption.
Indexes are intentionally dropped before the bulk insert (which can then run minimally logged) and rebuilt afterwards — the correct sequence for maximum throughput on large row sets.
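The checksum gate can be sketched in plain Python. In the real migrator the rows come from SQL Server and the lake; here in-memory lists stand in for both sides, and the hashing scheme shown is an illustrative choice, not the tool's exact implementation:

```python
import hashlib

def table_checksum(rows):
    """Order-independent checksum: hash each row, XOR the digests.

    XOR makes load order irrelevant; pair it with the row-count check,
    since duplicate rows cancel each other out under XOR.
    """
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(tuple(row)).encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return acc

def verify_migration(source_rows, target_rows):
    """Abort loudly on any mismatch rather than ship corrupt data."""
    if len(source_rows) != len(target_rows):
        raise RuntimeError("row count mismatch: aborting")
    if table_checksum(source_rows) != table_checksum(target_rows):
        raise RuntimeError("checksum mismatch: possible silent corruption")
    return True

source = [(1, "policy-a", 100.0), (2, "policy-b", 250.5)]
target = [(2, "policy-b", 250.5), (1, "policy-a", 100.0)]  # load order differs
ok = verify_migration(source, target)
```

Failing fast on a checksum mismatch, before any index rebuild, is what turns "0% data loss" from a hope into a verified property of each run.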
Trading Signal Engine
Real-time financial event processing pipeline. Ingests tick-level market data and computes moving averages, volatility metrics, and cross-asset signals using sliding window SQL. A composite index on (Ticker, Timestamp) enables sub-second signal retrieval across millions of daily events.
[ Architecture — Chaos Arbitrageur ]
Market Feed API → Kafka Ingest → Bronze (raw ticks)
All query patterns are ticker-first with a time range. The composite clustered index on (ticker, trade_ts) guarantees index seeks, not scans, at any scale.
SMA-5, SMA-30, rolling standard deviation (volatility), VWAP, and Golden/Death Cross detection all in a single CTE chain — no application-layer computation needed.
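The sliding-window CTE shape is easy to demonstrate at toy scale. This sketch uses SQLite (window frames need SQLite 3.25+) with a 2-tick and 4-tick SMA standing in for the real SMA-5/SMA-30; the ticker and prices are invented:

```python
import sqlite3

# Toy version of the signal query: windowed averages plus a crossover
# flag, all computed in SQL with no application-layer maths.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ticks (ticker TEXT, trade_ts TEXT, price REAL);
CREATE INDEX ix_ticks ON ticks (ticker, trade_ts);  -- ticker-first, as in prod
INSERT INTO ticks VALUES
 ('ABC', '09:00', 100), ('ABC', '09:01', 102), ('ABC', '09:02', 101),
 ('ABC', '09:03', 105), ('ABC', '09:04', 110), ('ABC', '09:05', 108);
""")

rows = con.execute("""
WITH signals AS (
    SELECT ticker, trade_ts, price,
           AVG(price) OVER (PARTITION BY ticker ORDER BY trade_ts
                            ROWS 1 PRECEDING) AS sma_short,  -- 2-tick SMA
           AVG(price) OVER (PARTITION BY ticker ORDER BY trade_ts
                            ROWS 3 PRECEDING) AS sma_long    -- 4-tick SMA
    FROM ticks
)
SELECT trade_ts, price,
       ROUND(sma_short, 2), ROUND(sma_long, 2),
       CASE WHEN sma_short > sma_long THEN 'BULLISH' ELSE 'BEARISH' END
FROM signals
ORDER BY trade_ts
""").fetchall()
```

Because the query is ticker-first with an ordered time range, the (ticker, trade_ts) index lets the engine seek straight to the relevant slice instead of scanning the tick history.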
// Consulting Experience
Engagements are described in general industry terminology to respect client NDAs; sectors and outcomes are accurate.
Analytics Consulting Firm · Johannesburg
Rebuilt a legacy SSAS cube into a modern star schema on Azure Synapse for a large retail client. Introduced SCD Type 2 across all slowly-changing dimensions, eliminating aggregation inconsistencies in monthly management reporting packs.
Applied partition-aligned materialized views and incremental refresh policies, reducing 30-minute daily refresh cycles to under 4 minutes — a 7.5× improvement by restricting refreshes to the most recently modified partition.
Built row-level audit tables, CHECKSUM_AGG validation, and data lineage tagging across 14 ADF pipelines. Zero pipeline failures in the 3 months post-implementation.
Data & Analytics Specialists · SA
Led the design of a Medallion Architecture migration for an insurance client, moving from on-prem Oracle to Azure Data Lake Gen2 + Databricks. Defined Bronze/Silver/Gold zone schemas and all ingestion patterns.
Implemented window-function risk scoring (RANK, LEAD/LAG) in Databricks SQL to support an actuarial team. Reduced time-to-insight from weekly batch reports to near-real-time dashboards.
Built a Python profiling & validation suite (Great Expectations), enforcing schema contracts at the Silver layer boundary. Caught 23 upstream schema breaks before they propagated to Gold.
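A schema contract at a layer boundary amounts to a hard gate on columns and types. The production suite used Great Expectations; this hand-rolled stand-in just shows the idea, and the contract, column names, and sample records are all invented for illustration:

```python
# Hypothetical Silver-layer contract for a claims feed.
SILVER_CONTRACT = {
    "claim_id": int,
    "policy_number": str,
    "claim_amount": float,
}

def validate_batch(records, contract=SILVER_CONTRACT):
    """Collect every contract violation in a batch instead of
    failing on the first, so one run reports all upstream breaks."""
    errors = []
    for i, rec in enumerate(records):
        missing = contract.keys() - rec.keys()
        extra = rec.keys() - contract.keys()
        if missing:
            errors.append(f"record {i}: missing columns {sorted(missing)}")
        if extra:
            errors.append(f"record {i}: unexpected columns {sorted(extra)}")
        for col, typ in contract.items():
            if col in rec and not isinstance(rec[col], typ):
                errors.append(
                    f"record {i}: {col} is {type(rec[col]).__name__}, "
                    f"expected {typ.__name__}")
    return errors

good = {"claim_id": 1, "policy_number": "P-100", "claim_amount": 1200.0}
broken = {"claim_id": "1", "policy_number": "P-100"}  # upstream changed the feed
errors = validate_batch([good, broken])
```

Enforcing the contract at the Silver boundary means a break like the one above is reported there, never discovered downstream in a Gold table or a dashboard.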
// High-Demand Niche · South Africa
South African banking, retail, and insurance enterprises are sitting on decades of technical debt: Oracle environments, SSAS cubes, flat-file ETL, and fragile SSIS packages. There is a growing — and largely unmet — demand for engineers who can speak both the legacy language and the cloud future.
I bridge that gap: legacy stack fluency for safe migrations, cloud architecture knowledge for building something worth migrating to. That combination — legacy fluency + modern lakehouse architecture — is the niche.
// Let's Build
Whether you need a lakehouse built from scratch, a legacy migration executed cleanly, or a senior engineer to close a technical gap — let's talk.