Pinned | Published in ITNEXT | Building a Schema Inference Data Pipeline for Large CSV files | A parallel implementation with python | Jul 9, 2022
Pinned | Published in ITNEXT | Building Real-time communication with Apache Spark through Apache Livy | Dockerizing and Consuming an Apache Livy environment | Jun 12, 2022
Pinned | Published in ITNEXT | How to build a DAG based Task Scheduling tool for Multiprocessor systems using python | Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag | Jun 7, 2022
Pinned | Published in Geek Culture | Design, Development and Deployment of a simple Data Pipeline | Data Engineering technical challenge (part 1) | Jun 5, 2022
Pinned | Published in AWS in Plain English | Building an ETL pipeline with Apache Airflow and Visualizing AWS Redshift data using Power BI | Tracking Uber Rides and Uber Eats expenses with Apache Airflow, AWS Redshift and Power BI. | Apr 30, 2021
Designing and Planning an Event Store System | CQRS and Event Sourcing design patterns | Dec 11, 2022
Published in Python in Plain English | Building, Preparing and Cleaning a Real Estate Dataset | Dockerizing a Python Script for Faster Web Scraping | Jun 14, 2022
Published in Python in Plain English | How to Build a Lossless Data Compression and Data Decompression Pipeline | A parallel implementation of the bzip2 high-quality data compressor tool in Python. | Apr 20, 2022