Description
As part of this course, you will learn the Data Engineering essentials needed to build data pipelines using SQL, Python, and Spark.
About Data Engineering
Data Engineering is the practice of processing data to meet downstream needs. It involves building different kinds of pipelines, such as batch pipelines and streaming pipelines. All roles related to data processing are consolidated under Data Engineering; conventionally, these roles were known as ETL Development, Data Warehouse Development, and so on.
Course Details
This course covers Data Engineering essentials such as SQL and programming with Python and Spark. Here is the detailed agenda for the course.
- Data Engineering Labs – Python and SQL
You will start by setting up self-supported Data Engineering labs on either GCP or Cloud9, so that you can learn the key Data Engineering skills with plenty of practice using the tasks and exercises we provide. Once you complete the sections on SQL and Python, you will also be guided through setting up a Hadoop and Spark lab.
- Provision GCP Server or AWS Cloud9 Instance
- Set up Docker to host Postgres Database
- Set up Postgres Database to practice SQL
- Set up Jupyter Lab
Once Jupyter Lab is set up, you can upload the Jupyter Notebooks and start practicing all the key skills related to Data Engineering.
- Database Essentials – SQL using Postgres
Proficiency in SQL is important for building data engineering pipelines. SQL is used to understand the data, to perform ad-hoc analysis, and to build the pipelines themselves.
- Getting Started with Postgres
- Basic Database Operations (CRUD: Insert, Select, Update, Delete)
- Writing Basic SQL Queries (Filtering, Joins, and Aggregations)
- Creating Tables and Indexes
- Partitioning Tables and Indexes
- Predefined Functions (String Manipulation, Date Manipulation, and other functions)
- Writing Advanced SQL Queries
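To give a feel for the topics listed above, here is a minimal sketch that touches table and index creation, the basic insert/update/delete operations, and a query with filtering, a join, and an aggregation. It is an illustration under stated assumptions, not course material: it uses Python's built-in sqlite3 module as a stand-in for Postgres so that it runs without any setup, and the `customers` and `orders` tables, their columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the Postgres database used in the course.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Creating tables and an index (hypothetical schema).
cur.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
cur.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        status TEXT NOT NULL,
        amount REAL NOT NULL
    )
    """
)
cur.execute("CREATE INDEX orders_customer_idx ON orders (customer_id)")

# Insert: load a few hypothetical rows.
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben")],
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, "COMPLETE", 100.0),
        (2, 1, "PENDING", 50.0),
        (3, 2, "COMPLETE", 75.0),
        (4, 2, "CANCELED", 20.0),
    ],
)

# Update and delete: complete the pending order, drop the canceled one.
cur.execute("UPDATE orders SET status = 'COMPLETE' WHERE order_id = 2")
cur.execute("DELETE FROM orders WHERE status = 'CANCELED'")
conn.commit()

# Select with a filter, a join, and an aggregation: revenue per customer.
cur.execute(
    """
    SELECT c.name, COUNT(*) AS order_count, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE o.status = 'COMPLETE'
    GROUP BY c.name
    ORDER BY c.name
    """
)
rows = cur.fetchall()
print(rows)
```

The same statements would need small adjustments for Postgres (for example, a Postgres driver and its parameter placeholder style instead of `?`), and topics such as partitioning are specific to Postgres and are not shown here.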