MET CS777 Summer2-2022 Big-Data-Analytics

This document provides information about a Big Data Analytics course offered at Boston University Metropolitan College. The course is an introduction to large-scale data analytics using tools like Hadoop MapReduce, Apache Spark, and Flink. It will cover cluster computing techniques, programming in PySpark, and machine learning algorithms for large datasets. The course will be taught in-person on Tuesday and Thursday evenings over 6 weeks in the summer of 2022. Students will complete assignments using cloud computing platforms like Google Cloud and Amazon AWS.

Boston University Metropolitan College

Big Data Analytics

MET CS 777 SUM2 On-Campus
Classes: Tuesdays and Thursdays, 6-9:30 pm, classroom CAS 208
(from 7/5/22 to 8/11/22)

Prof. Dimitar Trajanov, Ph.D.

Mail: [email protected]
Office hours: Thursday 3-5 pm on my Zoom link or by appointment

Course Description
This course is an introduction to large-scale data analytics. Big Data analytics is the study of how
to extract actionable, non-trivial knowledge from a massive number of data sets. This class will
focus both on the cluster computing software tools and programming techniques used by data
scientists and the important mathematical and statistical models used in learning from
large-scale data processing. On the tools side, we will cover the basic systems and techniques
to store large volumes of data and modern systems for cluster computing based on MapReduce
patterns such as Hadoop MapReduce, Apache Spark, and Flink.
Students will implement data mining algorithms and execute them on real cloud systems like
Amazon AWS, Google Cloud, or Microsoft Azure by using educational accounts. On the data
mining models side, this course will cover the main standard supervised and unsupervised
models and will introduce improvement techniques on the model side.
Course Prerequisites
● We expect you to have a solid background in Python programming and understand basic
statistics and machine learning. The following classes are required/recommended: MET
CS 521, MET CS 544 and MET CS 555, or MET CS 677.
● If you do not have the required/recommended courses, you need the instructor's
consent.
● This class includes topics from Cloud Computing, Parallel Processing, and Machine
Learning, making the course very compact for a six-week course.
● To implement the assignments, students need to have excellent knowledge of the
Python programming language and some basic Linux knowledge. Assignments are very
time-consuming, and you should take this course when you have at least 20 hours per
week.


Learning Objectives
By completing this course, you will be able to:
● Explain the main challenges of Big Data Processing.
● Run a Big Data Processing pipeline on Google Cloud (or Amazon AWS).
● Implement Big Data code in Apache Spark (in PySpark).
● Run Supervised and Unsupervised machine learning on Large-Scale Data.
Laptop Requirement
Students should have a personal laptop. We will use laptops to write Python programs and do
the quizzes in the classroom. Laptops are also required for the final exam.
Materials
Required Book
There is no required textbook for the class. There are detailed lecture notes, and all class
material will be conveyed during the lecture.
Recommended Books

Murphy, K. (2012). Machine learning: A probabilistic perspective.
The MIT Press.
ISBN-13: 978-0262018029

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The elements of statistical learning:
Data mining, inference, and prediction (2nd ed.).
Springer-Verlag.
ISBN-13: 978-0-387-84858-7
This book is available for PDF download.

Han, J., Kamber, M., Pei, J. (2009). Data mining: Concepts and
techniques (3rd ed.).
Morgan Kaufmann.
ISBN-13: 978-9380931913


Leskovec, J., Rajaraman, A., Ullman, J. (2014). Mining of massive datasets.
Cambridge University Press.
By agreement with the publisher, you can download the book for free
from this page

Other Materials and Resources

Perrin, J. (2020). Spark in action (2nd ed.). (Covers Apache Spark 3 with
examples in Java, Python, and Scala)
O'Reilly Media Inc.

Damji, J., Wenig, B., Das, T., Lee, D. (2020). Learning Spark (2nd ed.).
O'Reilly Media Inc.

Nudurupati, S. (2021). Essential PySpark for scalable data analytics: A
beginner's guide to harnessing the power and ease of PySpark 3.
Packt Publishing.

Ramcharan, K., Sundar, K., Alla, S. (2020). Applied data science using
PySpark: Learn the end-to-end predictive model-building cycle.
Apress.


Main Apache Spark documentation website

GitHub
This course has a GitHub repository (https://github.com/trajanov/BigDataAnalytics) with all of
the course code examples.
Course website
This course will use the Blackboard Learn site. Students are required to have a BU ID and
password to log in. If you do not have a BU ID yet, note that this takes some time, so be sure to
start this process well before class starts. The Blackboard site is https://learn.bu.edu

Usage of Cloud Machines

In this class, we use real-world cloud systems on Google Cloud (or Amazon AWS). You
will receive educational credit coupons or credited access to such cloud systems. You should
never use your private account or your credit card for the assignments in this class. You will
receive enough education credits to run the assignments successfully on Google Cloud.
The credit amount is 50 USD for Google Cloud. You should use only this amount to finish your
assignments. This would be more than enough to finish the assignments, learn how Google
Cloud (or AWS) works, and have your first enjoyable experience with it. You can choose different
numbers of Machines and different configurations of those machines. And each will cost you
differently!
Since this is real money, it makes sense to develop your code and run your jobs locally, on your
laptop, using the small data set (we will provide two types of the same data set, small and big).
Once things are working, you’ll then move to Amazon AWS or Google Cloud. We will ask you to
run your Spark jobs over the “real” data using a set of cluster machines.
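Because cost depends on the number of machines, their configuration, and how long they run, it can help to estimate a job's cost before starting a cluster. The sketch below is only an illustration: the 50 USD credit comes from this syllabus, but the hourly rate used in the example is a hypothetical placeholder, not a real Google Cloud or AWS price, and the function names are made up for this example.

```python
# Credit amount stated in this syllabus for Google Cloud.
CREDIT_USD = 50.0


def job_cost(num_machines, hourly_rate_usd, hours):
    """Estimated cost of one cluster run: machines x rate per machine-hour x hours."""
    return num_machines * hourly_rate_usd * hours


def remaining_credit(jobs, credit=CREDIT_USD):
    """Credit left after a list of (num_machines, hourly_rate_usd, hours) runs."""
    spent = sum(job_cost(machines, rate, hours) for machines, rate, hours in jobs)
    return credit - spent


# Hypothetical rate of 0.25 USD per machine-hour (an assumption, not a quoted price).
planned_jobs = [(4, 0.25, 2), (8, 0.25, 1)]
print(remaining_credit(planned_jobs))
```

Check the current price list of your cloud provider for the real rates of the machine types you choose.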
Class Policies
Assignment Completion & Late Work
All assignments should be submitted on time.
Work submitted late without any reason provided will result in a grade deduction:
● 10% penalty for 24 hours late
● 20% penalty for 48 hours late
● After 48 hours, the assignments are not accepted
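The late-work rule above can be sketched as a small function. This is one possible reading, not an official grading script: it assumes that "24 hours late" means up to 24 hours after the deadline, that the penalty is taken as a percentage of the earned score, and that work that is not accepted receives zero credit.

```python
def late_penalty(hours_late):
    """Return the deducted fraction, or None if the work is no longer accepted."""
    if hours_late <= 0:
        return 0.0   # submitted on time
    if hours_late <= 24:
        return 0.10  # up to 24 hours late (assumed reading of the rule)
    if hours_late <= 48:
        return 0.20  # up to 48 hours late
    return None      # more than 48 hours late: not accepted


def apply_late_policy(score, hours_late):
    """Apply the penalty to an earned score; unaccepted work counts as 0 (assumption)."""
    penalty = late_penalty(hours_late)
    if penalty is None:
        return 0.0
    return score * (1.0 - penalty)


print(apply_late_policy(90, 10))
```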


Attendance & Absences

We recognize that emergencies occur in professional and personal lives. If an emergency
prevents your completion of homework by a deadline, please notify your instructor. This must
be done before the deadline (unless the emergency makes this impossible, of course).
Additional documentation may be requested. Work submitted late without any reason provided
will result in a grade deduction: we want to be fair to everyone in this process, including the
vast majority of you who sacrifice so much to submit your homework on time in this demanding
schedule.
Academic Conduct Code
Cheating and plagiarism will not be tolerated in any Metropolitan College course. They will
result in no credit for the assignment or examination and may lead to disciplinary actions.
Please take the time to review the Student Academic Conduct Code:
http://www.bu.edu/met/metropolitan_college_people/student/resources/conduct/code.html.
This should not be understood as discouragement from discussing the material or your
particular approach to a problem with other students in the class. On the contrary – you should
share your thoughts, questions, and solutions. Naturally, if you choose to work in a group, you
will be expected to come up with more than one highly original solution rather than the same
mistakes.
Academic Misconduct Regarding Programming
In a programming class like ours, there is sometimes a very fine line between “cheating” and
acceptable and beneficial interaction between peers. Thus, it is essential that you fully
understand what is and what is not allowed in collaboration with your classmates. We want to
be 100% precise, so there can be no confusion.
The rule on collaboration and communication with your classmates is very simple: you cannot
transmit code to or receive code from anyone in the class in any way: visually (by showing
someone your code), electronically (by emailing, posting, or otherwise sending someone your
code), verbally (by reading code to someone), or in any other way we have not yet imagined.
Any other collaboration is acceptable.
The rule on collaboration and communication with people who are not your classmates (or your
TAs or instructor) is also very simple: it is not allowed in any way, period. This disallows (for
example) posting any questions of any nature to programming forums such as StackOverflow. As
far as going to the web and using Google, we will apply the “two-line rule.” Go to any web page
you like and do any search you like. But you cannot take more than two lines of code from an
external resource and include them in your assignment in any form. Note that changing variable
names or otherwise transforming or obfuscating code you found on the web does not render
the “two-line rule” inapplicable. It is still a violation to obtain more than two lines of code from
an external resource and turn them in, whatever you do to those lines after you first obtain
them.


Furthermore, you should cite your sources. Add a comment to your code that includes the
URL(s) you consulted when constructing your solution. This turns out to be very helpful when
you’re looking at something you wrote a while ago, and you need to remind yourself what you
were thinking.
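A citation comment of this kind might look like the sketch below. The URL is a hypothetical placeholder and the function is made up for illustration; only the practice of listing the consulted URL(s) in a comment comes from this syllabus.

```python
# Sources consulted while writing this solution
# (hypothetical placeholder URL, for illustration only):
#   https://example.com/notes-on-counting-words
from collections import Counter


def word_counts(lines):
    """Count how often each whitespace-separated word occurs in the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


print(word_counts(["big data", "big clusters"]))
```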
Grading Criteria
Please check the Study Guide in the syllabus for Live Classroom dates and specific due dates for
assignments and assessments.
Grading Structure and Distribution
The grade for the course is determined by the following:

Activity                            Percentage
5 x Homework Assignments            40%
5 x Weekly Quizzes                  20%
Term Project and Presentation       10%
Final Exam                          30%
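As a worked illustration of the distribution above, the following sketch combines component scores into a course percentage. It assumes each component is first averaged to a 0-100 score; the dictionary keys and function name are made up for this example.

```python
# Weights taken from the grading distribution in this syllabus.
WEIGHTS = {
    "homework": 0.40,
    "quizzes": 0.20,
    "project": 0.10,
    "final_exam": 0.30,
}


def course_percentage(scores):
    """Weighted sum of component scores, each given on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = {"homework": 90, "quizzes": 80, "project": 100, "final_exam": 70}
print(course_percentage(example))  # 0.4*90 + 0.2*80 + 0.1*100 + 0.3*70
```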

Assignments
Homework assignments are focused on applying theory learned in the week’s module to a set of
data and analyzing that data in PySpark. Weekly homework assignments will focus on
implementing data processing and machine learning algorithms in Apache Spark (PySpark). You
will use Google Cloud to run your Spark code on large data sets. Free of charge usage credits for
Google Cloud will be provided through Education accounts.
Due Time: At the end of each module (Please check the Study Guide or the Syllabus for the
specific due date).
Where to submit: The "Assignments" section in the left-hand course menu.
Weekly Quizzes
Quizzes will evaluate students' understanding of concepts presented in the previous week’s
module. Students should ensure adequate preparation. Doing well on the quiz will not be
possible without first reviewing the course material in-depth, attempting to understand all
examples, and testing yourself. There are five quizzes.


Term Project and Presentation

At the end of this course, you will work on your own Big Data project. You will work on a large
data set and analyze and train machine learning algorithms. You will present your project in the
form of a 10-minute presentation. Clear project development guidelines will be provided in the
course content in the "Assignment" section.
Final Exam
The Final Exam will be open book/open notes, and its duration is three hours. The exam
features a combination of multiple-choice, essay, and coding tasks.
Translation between letter grades and percentages:

Letter grade                                               Percentage
A (Excellent)                                              95-100
A- (Excellent; minor improvement needed)                   90-94.99
B+ (Very good)                                             87-89.99
B (Good)                                                   83-86.99
B- (Good; some improvements needed)                        80-82.99
C+ (Satisfactory; some significant improvements needed)    77-79.99
C (Satisfactory; significant improvements needed)          73-76.99
C- (Satisfactory; significant improvements required)       70-72.99
D (Many significant improvements required)                 65
Unacceptable                                               0
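The translation can be sketched as a lookup on the lower bound of each band. This is an illustration under stated assumptions: each band is taken to start at its listed lower bound (95, 90, 87, 83, 80, 77, 73, 70, 65), and anything below 65 is treated as "Unacceptable", which the table does not spell out.

```python
# (lower bound, letter) pairs, highest band first.
GRADE_BANDS = [
    (95, "A"),
    (90, "A-"),
    (87, "B+"),
    (83, "B"),
    (80, "B-"),
    (77, "C+"),
    (73, "C"),
    (70, "C-"),
    (65, "D"),
]


def letter_grade(percentage):
    """Return the letter for a course percentage; below 65 is assumed unacceptable."""
    for lower_bound, letter in GRADE_BANDS:
        if percentage >= lower_bound:
            return letter
    return "Unacceptable"


print(letter_grade(88.5))
```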

Important Dates: Add/drop

Standard six-week course in Summer Session 2 (SUM2)
● Course start date: Tuesday, July 5, 2022.
● Last day to add: Monday, July 11.
● Last day to drop without a “W” grade: Monday, July 11.
● Last day to drop with a “W” grade: Thursday, July 28.
● Course end date: Thursday, August 11.


Class Meetings, Lectures & Assignments

Module 1: Big Data Processing
Class dates: 5 Jul 2022, 7 Jul 2022. Quiz date: 12 Jul 2022. Assignment due date: 14 Jul 2022.
● Introduction to Big Data Analytics. What is Big Data? What are the challenges?
● Introduction to Apache Hadoop and MapReduce. Introduction to Apache Spark.
● Spark programming. (Python and PySpark)
● Spark - Resilient Distributed Datasets (RDDs).

Module 2: Large-Scale Data Processing With PySpark
Class dates: 12 Jul 2022, 14 Jul 2022. Quiz date: 19 Jul 2022. Assignment due date: 21 Jul 2022.
● Spark - RDDs, DataFrames, Spark SQL
● PySpark + NumPy + SciPy, Code Optimization, Cluster Configurations
● Linear Algebra Computation in Large Scale.
● Distributed File Storage Systems

Module 3: Data Modeling and Optimization Problems
Class dates: 19 Jul 2022, 21 Jul 2022. Quiz date: 26 Jul 2022. Assignment due date: 28 Jul 2022.
● Introduction to modeling: numerical vs. probabilistic vs. Bayesian
● Introduction to Optimization Problems
● Batch and stochastic Gradient Descent
● Newton’s Method
● Expectation-Maximization
● Markov Chain Monte Carlo (MCMC)

Module 4: Large-Scale Supervised Learning
Class dates: 26 Jul 2022, 28 Jul 2022. Quiz date: 2 Aug 2022. Assignment due date: 3 Aug 2022.
● Introduction to Supervised learning
● Generalized Linear Models and Logistic Regression
● Regularization
● Support Vector Machine (SVM) and the kernel trick
● Outlier Detection
● Spark ML library

Module 5: Large-Scale Unsupervised Learning
Class date: 2 Aug 2022. Quiz date: 4 Aug 2022. Assignment due date: 6 Aug 2022.
● Introduction to Unsupervised learning
● K-means / K-medoids
● Gaussian Mixture Models
● Dimensionality Reduction
● Spark MLlib for Unsupervised Learning

Module 6: Large Scale Text Mining
Class date: 4 Aug 2022. No quiz. No assignment.
● Latent Semantic Indexing
● Topic models
● Latent Dirichlet Allocation
● Spark ML library for NLP

9 Aug 2022: Team Project Presentations
11 Aug 2022: Final Exam
*Lectures, Readings, and Assignments are subject to change and will be announced in class as
applicable within a reasonable time frame.


Instructor Biography
Prof. Dimitar Trajanov, Ph.D., is a Visiting Research Professor at Boston
University and Head of the Department of Information Systems and
Network Technologies at the Faculty of Computer Science and
Engineering, Ss. Cyril and Methodius University in Skopje. From
March 2011 until September 2015, he was the founding Dean of the
Faculty of Computer Science and Engineering. During his tenure, the
Faculty became the largest technical faculty in Macedonia.
Dimitar Trajanov is the leader of the Regional Social Innovation Hub,
established in 2013 in cooperation with the United Nations
Development Programme. His professional experience includes working as a Data Science
Consultant for one of the largest pharmaceutical companies, a Data Science Consultant for
UNDP in North Macedonia, and a software architect in a couple of startups. Dimitar Trajanov is
the author of more than 170 journal and conference papers and seven books. He has been
involved in more than 70 research and industry projects.
