Shiva Manhar

Senior / Staff Data Engineer • Databricks • Spark • Cloud Data Platforms

About Me

I am a data engineer with over 12 years of experience building scalable, reliable, and cost-efficient data platforms. My primary focus is on Databricks-based lakehouse architectures using Apache Spark, Delta Lake, and cloud-native services on AWS and Azure.

I enjoy solving complex data problems, optimizing large-scale pipelines, and designing systems that support analytics and business decision-making. I prefer hands-on individual contributor roles where I can own architecture and implementation end to end.

What I Work On

Typical areas I work on day to day:

Projects

Selected projects that reflect my work as a senior data engineer:

PySpark ETL Template

I built an end-to-end ETL pipeline on Databricks that follows the medallion architecture (Bronze, Silver, Gold). All code is written in an object-oriented style (classes and objects), which keeps the transformations scalable and reliable and delivers business-ready analytics for sales data. The architecture uses Databricks Auto Loader, Structured Streaming, Delta Lake, and Unity Catalog, and it applies both SCD Type 1 and SCD Type 2. The project is designed to ensure data quality, reliability, and governance.

Key Features:
  • Medallion architecture (Bronze, Silver, Gold)
  • Auto Loader ingestion
  • Duplicate key handling
  • Stream joins
  • Use of the foreach function
  • Multi-delimiter file handling
  • Class and object code structure
Case study
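Two of the features above, multi-delimiter file handling and duplicate key handling, can be illustrated with a minimal sketch. It uses plain Python rather than PySpark so it runs without a cluster, and it is not the code from the template itself. The function names, the delimiter set, and the `order_id` / `updated_at` columns are assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, List, Sequence


def parse_multi_delimiter(
    lines: Iterable[str],
    delimiters: Sequence[str] = ("|", ";", ","),  # assumed delimiter set
) -> List[Dict[str, str]]:
    """Split each line on any of the given delimiters; the first line is the header."""
    pattern = re.compile("|".join(re.escape(d) for d in delimiters))
    parsed = [pattern.split(line.strip()) for line in lines if line.strip()]
    header, *rows = parsed
    return [dict(zip(header, row)) for row in rows]


def dedupe_latest(
    records: Iterable[Dict[str, str]], key: str, order_by: str
) -> List[Dict[str, str]]:
    """Keep one record per key: the one with the greatest order_by value."""
    latest: Dict[str, Dict[str, str]] = {}
    for rec in records:
        current = latest.get(rec[key])
        # ISO-8601 date strings compare correctly as plain strings.
        if current is None or rec[order_by] >= current[order_by]:
            latest[rec[key]] = rec
    return list(latest.values())


if __name__ == "__main__":
    raw = [
        "order_id|amount;updated_at",
        "1|10.0;2024-01-01",
        "1,12.5|2024-01-02",  # same key, newer record
        "2;7.0,2024-01-01",
    ]
    for row in dedupe_latest(parse_multi_delimiter(raw), "order_id", "updated_at"):
        print(row)
```

In a Spark pipeline the same idea is usually expressed with a regex split on the raw line and a window function ranked by the ordering column, but the logic is the same: normalize the delimiters first, then keep the newest row per key.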

Enterprise Analytics Lakehouse

Built a scalable Databricks lakehouse that processes multi-terabyte batch data, with a focus on Spark performance tuning, Adaptive Query Execution (AQE), Z-ORDER, and cost optimization.

Case study (coming soon)

CDC & Incremental Data Platform

Designed CDC-based incremental ingestion pipelines into Delta tables with SCD Type 1 and Type 2 modeling to support analytics and BI workloads.

Case study (coming soon)
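The difference between SCD Type 1 and Type 2 can be sketched in a few lines. This is a simplified in-memory illustration in plain Python, not the Delta Lake implementation used in the project. The `DimRow` fields and both function names are assumptions chosen for the example.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical schema)."""
    key: str
    value: str
    valid_from: str
    valid_to: Optional[str] = None
    is_current: bool = True


def apply_scd1(dim: List[DimRow], key: str, value: str, change_date: str) -> List[DimRow]:
    """Type 1: overwrite the value in place, keeping no history."""
    found = False
    result = []
    for row in dim:
        if row.key == key:
            result.append(replace(row, value=value))
            found = True
        else:
            result.append(row)
    if not found:
        result.append(DimRow(key, value, change_date))
    return result


def apply_scd2(dim: List[DimRow], key: str, value: str, change_date: str) -> List[DimRow]:
    """Type 2: close the current version and append a new one, keeping history."""
    needs_new_version = True  # stays True for new keys and for changed values
    result = []
    for row in dim:
        if row.key == key and row.is_current:
            if row.value == value:
                needs_new_version = False  # nothing changed, keep the row as is
                result.append(row)
            else:
                result.append(replace(row, valid_to=change_date, is_current=False))
        else:
            result.append(row)
    if needs_new_version:
        result.append(DimRow(key, value, change_date))
    return result


if __name__ == "__main__":
    dim = [DimRow("c1", "Berlin", "2024-01-01")]
    for row in apply_scd2(dim, "c1", "Munich", "2024-06-01"):
        print(row)
```

On Delta tables this pattern is typically written as a `MERGE` statement, where the matched branch closes the current row and the not-matched branch inserts the new version.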

Cloud Migration & Optimization

Migrated legacy analytics workloads to Databricks on AWS and Azure, applying auto-scaling, cluster policies, and storage optimizations.

Case study (coming soon)

Core Technologies

Databricks
Apache Spark
PySpark
Delta Lake
Structured Streaming
AWS
Azure
Snowflake
Python
SQL

Certifications