Data-engineer

Exploring the World of Python, and Let's Learn Together

A Roadmap for Getting Started with Apache Airflow: From Zero to Seriously Advanced
Published on
January 5, 2025
A learning path for Apache Airflow based on first-hand experience, divided into 3 levels from fundamentals to expert, with checkpoints for assessing your knowledge at each level.
airflow Python Data-Engineer
Introduction to Apache Airflow
Published on
May 20, 2024
Apache Airflow is a workflow management tool for data engineering and data pipeline work.
airflow Python Data-Engineer
Getting Started with Apache Cassandra
Published on
July 16, 2023
Apache Cassandra is a popular distributed NoSQL database known for its scalability, fault tolerance, and high performance. Whether you are a beginner or an experienced developer, this blog post will serve as a comprehensive guide to help you get started with Apache Cassandra. We will cover the fundamental concepts and the installation process, and provide practical code examples that demonstrate key operations.
cassandra Python Data-Engineer
15 Useful extra columns for ETL jobs
Published on
May 1, 2023
Extra columns in your ETL jobs can provide valuable context and information for downstream data consumers, allowing them to better understand the source and quality of the data. By considering these extra columns, you can improve the quality and reliability of your data pipelines and make it easier for downstream consumers to extract value from your data.
Data-Engineer ETL
Setting up a Spark cluster using Docker Compose
Published on
April 20, 2023
Extra columns in your ETL jobs can provide valuable context and information for downstream data consumers, allowing them to better understand the source and quality of the data. By considering these extra columns, you can improve the quality and reliability of your data pipelines and make it easier for downstream consumers to extract value from your data.
Data-Engineer Spark
Comparing Code in Pandas, PySpark, and Apache Hive
Published on
April 8, 2023
This comparison article provides an overview of data manipulation in three popular tools: Pandas, PySpark, and Apache Hive. By providing code examples and discussing the pros and cons of each approach, the article aims to help data engineers and data scientists choose the best tool for their specific use case.
Python Data-Engineer Pandas Pyspark Apache-Hive
What Is Apache Livy?
Published on
January 26, 2023
An overview of Apache Livy: submitting Spark jobs via its REST API and pylivy. This post includes source code.
Python Data-Engineer Pyspark
What Is Python WebHDFS?
Published on
January 21, 2023
An overview of using Python in a big data project to send commands to HDFS through WebHDFS.
Python Data-Engineer Hadoop