Pinned · Published in ITNEXT · Building a Schema Inference Data Pipeline for Large CSV files · A parallel implementation with python · Jul 9, 2022
Pinned · Published in ITNEXT · Building Real-time communication with Apache Spark through Apache Livy · Dockerizing and Consuming an Apache Livy environment · Jun 12, 2022
Pinned · Published in ITNEXT · How to build a DAG based Task Scheduling tool for Multiprocessor systems using python · Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag · Jun 7, 2022
Pinned · Published in Geek Culture · Design, Development and Deployment of a simple Data Pipeline · Data Engineering technical challenge (part 1) · Jun 5, 2022
Pinned · Published in AWS in Plain English · Building an ETL pipeline with Apache Airflow and Visualizing AWS Redshift data using Power BI · Tracking Uber Rides and Uber Eats expenses with Apache Airflow, AWS Redshift and Power BI. · Apr 30, 2021
Designing and Planning an Event Store System · CQRS and Event Sourcing design patterns · Dec 11, 2022
Published in Python in Plain English · Building, Preparing and Cleaning a Real Estate Dataset · Dockerizing a Python Script for Faster Web Scraping · Jun 14, 2022
Published in Python in Plain English · How to Build a Lossless Data Compression and Data Decompression Pipeline · A parallel implementation of the bzip2 high-quality data compressor tool in Python. · Apr 20, 2022