Apache Spark Lecture Notes
--------------------------------------
- Provides APIs for batch, streaming, machine learning, and graph processing
--------------------------------------
Core Components:
- Spark Core: Core functionality (task scheduling, memory management, fault tolerance)
----------------------------
What is an RDD?
RDD Operations:
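RDD operations fall into two groups: transformations (such as map and filter), which are lazy and only describe a new dataset, and actions (such as collect, count and reduce), which trigger the actual computation. The sketch below is a toy model of that behaviour in plain Python, not the Spark API. The class name LazyCollection and its internals are hypothetical and exist only for illustration.

```python
# Toy model of RDD-style lazy evaluation in plain Python.
# NOT the Spark API: LazyCollection and its internals are hypothetical.
from functools import reduce


class LazyCollection:
    """Records transformations and runs them only when an action is called."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)  # pending transformations, in order

    # Transformations: return a new object and compute nothing yet.
    def map(self, fn):
        return LazyCollection(self._data, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyCollection(self._data, self._steps + (("filter", pred),))

    def _run(self):
        # Chain the recorded steps into one iterator pipeline.
        items = iter(self._data)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: trigger evaluation of the whole recorded pipeline.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

    def reduce(self, fn):
        return reduce(fn, self._run())


numbers = LazyCollection(range(1, 11))

# Two transformations: nothing is computed at this point.
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

# Actions force the computation.
result = even_squares.collect()
total = even_squares.reduce(lambda a, b: a + b)
```

In this toy model every action re-runs the recorded steps from the source data, which loosely mirrors how Spark recomputes an uncached RDD each time an action is called on it.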
-------------------------------
- Dataset: Type-safe structured API in Scala & Java (not available in PySpark)
-------------------
Example:
df.createOrReplaceTempView("table")
words.count().pprint()
ssc.start()
ssc.awaitTermination()
----------------------------------------
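from pyspark.ml.classification import LogisticRegression

# training_data and test_data are DataFrames with "features" and "label" columns
lr = LogisticRegression()
model = lr.fit(training_data)
predictions = model.transform(test_data)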