Description
This is a complete PySpark Developer course for Data Engineers, Data Scientists, and anyone else who wants to process Big Data effectively. We will cover the topics below and more:
- Complete Curriculum for a successful PySpark Developer
- Complete Flow of Installation of PySpark
- Introduction to Spark (Why Spark was Developed, Spark Features, Spark Components)
- Understand SparkSession (a short SparkSession and DataFrame sketch follows this list)
- Spark RDD Fundamentals
- How to Create RDDs
- RDD Operations (Transformations & Actions; illustrated in the RDD sketch after this list)
- Spark Cluster Architecture – Execution, YARN, JVM Processes, DAG Scheduler, Task Scheduler
- RDD Persistence
- Spark Shared Variables (Broadcast and Accumulators)
- Spark SQL Architecture, Catalyst Optimizer, Volcano Iterator Model, Tungsten Execution Engine, Different Benchmarks
- Commonly Used Spark Functions – version, range, createDataFrame, sql, table, sparkContext, conf, read, udf, newSession, stop, catalog, etc.
- DataFrame Built-in Functions – new column, encryption, string, regexp, date, null, collection, na, math and statistics, explode, flatten, formatting, and JSON
- What are Partitions, Repartition, and Coalesce
- Repartition vs. Coalesce (see the partitioning sketch after this list)
- Extraction – CSV, text, Parquet, ORC, JSON, and Avro files, Hive, JDBC
- DataFrame Fundamentals (What is a DataFrame, DataFrame Sources, DataFrame Features, DataFrame Organization)
- DataFrame Rows, Columns and DataTypes. Practical examples.
- ETL Using DataFrame (Extraction APIs, Transformation APIs, and Loading APIs). Practical examples, including the ETL sketch after this list.
- Optimization and Management – Join Strategies, Driver Configuration, Parallelism Configuration, Executor Configuration, etc.
- HDFS Commands (Will be added shortly)
- Python Fundamentals (Will be added shortly)
- More will be added
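To give a feel for the hands-on style of the course, here are a few short sketches. First, a minimal SparkSession and DataFrame example; the app name, column names, and sample rows are illustrative only, and the code assumes a local PySpark installation.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Entry point for DataFrame and SQL functionality (assumes PySpark is installed locally)
spark = SparkSession.builder.appName("intro-example").master("local[*]").getOrCreate()

# Create a small DataFrame from in-memory data; schema and values are illustrative
df = spark.createDataFrame(
    [(1, "alice", 3000), (2, "bob", 4500)],
    ["id", "name", "salary"],
)

# A couple of commonly used built-in functions: upper() and when()/otherwise()
df = (
    df.withColumn("name_upper", F.upper(F.col("name")))
      .withColumn("band", F.when(F.col("salary") > 4000, "high").otherwise("standard"))
)

df.show()
spark.stop()
```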
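Next, a sketch of RDD creation and basic operations; the numbers are arbitrary and only meant to show that transformations are lazy while actions trigger execution.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-example").master("local[*]").getOrCreate()
sc = spark.sparkContext

# Create an RDD from a Python collection
rdd = sc.parallelize(range(1, 11))

# Transformations (filter, map) are lazy; nothing runs until an action is called
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

# Actions (collect, reduce) trigger the actual computation
print(evens_squared.collect())                    # [4, 16, 36, 64, 100]
print(evens_squared.reduce(lambda a, b: a + b))   # 220

spark.stop()
```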
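A small sketch contrasting repartition and coalesce; the row count and partition numbers are arbitrary.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-example").master("local[*]").getOrCreate()

df = spark.range(0, 1_000_000)              # single-column DataFrame of ids
print(df.rdd.getNumPartitions())            # initial count depends on the local environment

# repartition() performs a full shuffle and can increase or decrease partitions
df_repart = df.repartition(8)
print(df_repart.rdd.getNumPartitions())     # 8

# coalesce() only merges existing partitions (no full shuffle), so it can only decrease them
df_coalesced = df_repart.coalesce(2)
print(df_coalesced.rdd.getNumPartitions())  # 2

spark.stop()
```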
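Finally, a minimal end-to-end ETL sketch with DataFrame APIs; the file paths and column names (order_ts, amount) are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-example").master("local[*]").getOrCreate()

# Extraction: read a CSV file (path is a placeholder)
orders = (
    spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv("/tmp/input/orders.csv")
)

# Transformation: aggregate order amounts per day (column names are assumed)
daily_totals = (
    orders.withColumn("order_date", F.to_date("order_ts"))
          .groupBy("order_date")
          .agg(F.sum("amount").alias("total_amount"))
)

# Loading: write the result as Parquet
daily_totals.write.mode("overwrite").parquet("/tmp/output/daily_totals")

spark.stop()
```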