Senior Software Engineer | Senior Data Engineer
Senior Software / Data Engineer specializing in real-time streaming systems, distributed architectures, and cloud-native platforms. Expert in Kafka, Apache Flink, and AWS, with experience building petabyte-scale data pipelines and high-performance APIs.
Implemented a distributed caching system supporting 100K+ concurrent users. Implemented Bloom filter deduplication logic for media articles in Redis/ElastiCache.
Built a streaming data pipeline processing 1TB+ of data daily.
Architected and managed Elasticsearch clusters on Kubernetes with tailored compute and storage configurations to meet diverse workload and performance requirements. Optimized cluster performance for large-scale data ingestion and search by fine-tuning resource allocation, indexing strategies, and shard distribution. Designed scalable and resilient Elasticsearch infrastructure leveraging Kubernetes for orchestration, auto-scaling, and fault tolerance. Maintained performance SLAs across multiple clusters handling high-volume data and query workloads.
Architected a data lineage framework that tracks end-to-end data flow across microservices by publishing metadata events to Kafka topics. Designed metadata schemas capturing processing stage, timestamps, and error states, enabling full visibility into the data lifecycle. Implemented event-driven lineage tracking in which each microservice publishes processing metadata for observability and traceability. Enabled real-time monitoring and debugging of data pipelines by capturing failure points and processing delays within the lineage system. Integrated lineage data with Amazon Athena, allowing teams to query, analyze, and audit data flow across distributed systems.
Architected and implemented an end-to-end ETL pipeline on AWS using ECS, Kinesis, and Lambda to ingest and process data from multiple third-party APIs. Designed an API polling system on ECS that continuously fetches data from hundreds of paid external APIs and streams it into Kinesis for real-time processing. Built event-driven data processing workflows using AWS Lambda to transform, enrich, and route streaming data to Elasticsearch. Handled diverse data sources with custom transformation logic, ensuring consistent data quality and schema alignment. Enabled scalable, fault-tolerant ingestion of high-volume data streams supporting near-real-time analytics and search use cases.