Apache Spark - Aram Mkrtchyan

Jul 8, 2015

Apache Spark is a cluster computing platform designed to be fast and general-purpose. It provides a unified analytics engine for large-scale data processing across SQL, streaming, machine learning, and graph processing. Spark programs can be written in Java, Scala, Python and R. It works by building resilient distributed datasets (RDDs) that can be operated on in parallel. RDDs support transformations like map, filter and join and actions like count, collect and save. Spark also provides caching of RDDs in memory for improved performance.

Lightning-fast cluster computing
Apache Spark

What is Apache Spark?
Cluster computing platform designed to be fast and general-purpose.
Fast
Universal
Highly Accessible

Comparison with MapReduce (MR)
// sc is an existing SparkContext
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")

Examples: Word Count
val sc = new SparkContext(...)
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")

Examples: Log Mining
val sc = new SparkContext(...)
val inputRDD = sc.textFile("log.txt")
val errorsRDD = inputRDD.filter(line => line.contains("error"))
val warningsRDD = inputRDD.filter(line => line.contains("warning"))
val badLinesRDD = errorsRDD.union(warningsRDD)
badLinesRDD.persist()
badLinesRDD.count()
badLinesRDD.collect()

How does it work?
RDD: Resilient Distributed Dataset

Advanced: RDD as interface
Example: Hadoop RDD
partitions = one per HDFS block
dependencies = none
compute = read corresponding block
preferredLocations = HDFS block locations
partitioner = none
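The five properties above can be read as a small interface. The following is only a sketch in plain Scala, not Spark's actual class: the names `SimpleRDD`, `BlockRDD`, `FilteredRDD`, `SimplePartitioner` and the host names are made up for illustration, and in-memory vectors stand in for HDFS blocks.

```scala
object RddInterfaceSketch {
  // Simplified stand-ins for Spark's internal types (hypothetical, not Spark's own).
  final case class Partition(index: Int)
  trait SimplePartitioner { def getPartition(key: Any): Int }

  // The five properties from the slide as a minimal interface.
  trait SimpleRDD[T] {
    def partitions: Seq[Partition]
    def dependencies: Seq[SimpleRDD[_]]
    def compute(split: Partition): Iterator[T]
    def preferredLocations(split: Partition): Seq[String]
    def partitioner: Option[SimplePartitioner]
  }

  // Mirrors the Hadoop RDD example: one partition per block, no parents.
  final class BlockRDD(blocks: Vector[Vector[String]], hosts: Vector[Seq[String]])
      extends SimpleRDD[String] {
    def partitions: Seq[Partition] = blocks.indices.map(i => Partition(i))
    def dependencies: Seq[SimpleRDD[_]] = Seq.empty
    def compute(split: Partition): Iterator[String] = blocks(split.index).iterator
    def preferredLocations(split: Partition): Seq[String] = hosts(split.index)
    def partitioner: Option[SimplePartitioner] = None
  }

  // A derived dataset: same partitions as its parent, computed by filtering them.
  final class FilteredRDD[T](parent: SimpleRDD[T], keep: T => Boolean)
      extends SimpleRDD[T] {
    def partitions: Seq[Partition] = parent.partitions
    def dependencies: Seq[SimpleRDD[_]] = Seq(parent)
    def compute(split: Partition): Iterator[T] = parent.compute(split).filter(keep)
    def preferredLocations(split: Partition): Seq[String] = parent.preferredLocations(split)
    def partitioner: Option[SimplePartitioner] = None
  }

  // Evaluates partitions one after another; a real scheduler would run them in parallel.
  def collect[T](rdd: SimpleRDD[T]): List[T] =
    rdd.partitions.toList.flatMap(p => rdd.compute(p).toList)

  def main(args: Array[String]): Unit = {
    val hadoop = new BlockRDD(
      Vector(Vector("error: disk", "info: ok"), Vector("warning: slow", "error: net")),
      Vector(Seq("host-a"), Seq("host-b")))
    val errors = new FilteredRDD[String](hadoop, _.contains("error"))
    println(collect(errors))
  }
}
```

The point of the design is that a derived dataset stores no data of its own: it only knows its parent and how to compute each partition from it, which is what makes recomputation after a failure possible.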

Directed Acyclic Graph (DAG)
hadoopRDD -> filter -> errorsRDD
hadoopRDD -> filter -> warningsRDD
errorsRDD, warningsRDD -> union -> badLinesRDD
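The graph for the log mining example can be modelled in a few lines of plain Scala. This is a toy sketch, not Spark code: the `Node`, `Source`, `Filter` and `Union` types and the sample log lines are assumptions made for illustration. It shows that building the graph does no work, and that data is only touched when the graph is evaluated, as an action would do.

```scala
object LineageSketch {
  // Each node records how it is derived from its parents (hypothetical types).
  sealed trait Node
  final case class Source(name: String, lines: List[String]) extends Node
  final case class Filter(parent: Node, label: String, keep: String => Boolean) extends Node
  final case class Union(left: Node, right: Node) extends Node

  // Walks the graph from the requested node back to its sources.
  def evaluate(node: Node): List[String] = node match {
    case Source(_, lines)        => lines
    case Filter(parent, _, keep) => evaluate(parent).filter(keep)
    case Union(left, right)      => evaluate(left) ++ evaluate(right)
  }

  // Renders the lineage as text without evaluating anything.
  def describe(node: Node): String = node match {
    case Source(name, _)          => name
    case Filter(parent, label, _) => s"filter($label) <- ${describe(parent)}"
    case Union(left, right)       => s"union(${describe(left)}, ${describe(right)})"
  }

  // The same shape as the log mining example, on made-up input lines.
  def badLines(lines: List[String]): Node = {
    val hadoop = Source("hadoopRDD", lines)
    val errors = Filter(hadoop, "error", _.contains("error"))
    val warnings = Filter(hadoop, "warning", _.contains("warning"))
    Union(errors, warnings)
  }

  def main(args: Array[String]): Unit = {
    val graph = badLines(List("error: disk full", "warning: slow", "info: ok"))
    println(describe(graph))
    println(evaluate(graph))
  }
}
```

As with `union()` on RDDs, the union here keeps duplicates, so a line matching both filters would appear twice in the result.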

RDD Transformations
Function Name | Purpose | Example
map() | Apply a function to each element in the RDD and return an RDD of the result. | rdd.map(x => x + 1)
flatMap() | Apply a function to each element in the RDD and return an RDD of the contents of the iterators returned. Often used to extract words. | rdd.flatMap(x => x.to(3))
filter() | Return an RDD consisting of only the elements that pass the condition passed to filter(). | rdd.filter(x => x != 1)
distinct() | Remove duplicates. | rdd.distinct()
union() | Produce an RDD containing elements from both RDDs. | rdd.union(other)
intersection() | Produce an RDD containing only elements found in both RDDs. | rdd.intersection(other)
join() | Perform an inner join between two RDDs. | rdd.join(other)
groupByKey() | Group values with the same key. | rdd.groupByKey()
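The transformations in the table can be tried on small in-memory data. The sketch below assumes Spark is on the classpath and runs it in local mode; the object name, app name and sample values are illustrative. Results are sorted before rendering because operations that shuffle data do not guarantee element order.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TransformationsSketch {
  // Applies each transformation and renders the collected result as text.
  def examples(sc: SparkContext): Map[String, String] = {
    val rdd = sc.parallelize(Seq(1, 2, 3, 3))
    val other = sc.parallelize(Seq(3, 4, 5))
    // join() and groupByKey() are only available on RDDs of key/value pairs.
    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
    val otherPairs = sc.parallelize(Seq(("a", "x"), ("c", "y")))

    Map(
      "map" -> rdd.map(x => x + 1).collect().sorted.mkString(", "),
      "flatMap" -> rdd.flatMap(x => x.to(3)).collect().sorted.mkString(", "),
      "filter" -> rdd.filter(x => x != 1).collect().sorted.mkString(", "),
      "distinct" -> rdd.distinct().collect().sorted.mkString(", "),
      "union" -> rdd.union(other).collect().sorted.mkString(", "),
      "intersection" -> rdd.intersection(other).collect().sorted.mkString(", "),
      "join" -> pairs.join(otherPairs).collect().sortBy(_._2._1).mkString(", "),
      "groupByKey" -> pairs.groupByKey().mapValues(_.toList.sorted)
        .collect().sortBy(_._1).mkString(", ")
    )
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM using all available cores.
    val conf = new SparkConf().setAppName("transformations-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try examples(sc).foreach { case (name, result) => println(s"$name: $result") }
    finally sc.stop()
  }
}
```

Every call in `examples` up to `collect()` only describes a new RDD; nothing is computed until the action runs.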

RDD Actions
Function Name | Purpose | Example
count() | Return the number of elements in the RDD. | rdd.count()
collect() | Return all elements from the RDD. | rdd.collect()
saveAsTextFile() | Save the RDD elements to an external storage system. | rdd.saveAsTextFile("hdfs://...")
take(num) | Return num elements from the RDD. | rdd.take(10)
reduce(func) | Combine the elements of the RDD together in parallel (e.g., sum). | rdd.reduce((x, y) => x + y)
takeOrdered(num)(ordering) | Return the first num elements according to the provided ordering. | rdd.takeOrdered(2)(myOrdering)
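A short sketch of the actions above, again assuming Spark on the classpath and local mode. The object name, the `Results` case class, the sample numbers and the temporary output directory are illustrative choices; `Ordering[Int].reverse` stands in for `myOrdering`.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object ActionsSketch {
  final case class Results(
      count: Long, all: List[Int], firstTwo: List[Int], sum: Int, largestTwo: List[Int])

  // Each field is produced by one action; every action triggers a computation.
  def run(sc: SparkContext): Results = {
    val rdd = sc.parallelize(Seq(5, 1, 4, 2, 3))
    Results(
      count = rdd.count(),
      all = rdd.collect().toList,
      firstTwo = rdd.take(2).toList,
      sum = rdd.reduce((x, y) => x + y),
      // A reversed ordering makes takeOrdered return the largest elements.
      largestTwo = rdd.takeOrdered(2)(Ordering[Int].reverse).toList
    )
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("actions-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(run(sc))
      // saveAsTextFile expects a target directory that does not exist yet.
      val out = Files.createTempDirectory("actions-sketch").resolve("out").toString
      sc.parallelize(Seq("a", "b")).saveAsTextFile(out)
      println(s"wrote part files under $out")
    } finally sc.stop()
  }
}
```

Note that `collect()` brings the whole dataset back to the driver, so it is only suitable when the result is known to be small; `take(num)` is the safer way to peek at a large RDD.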

RDD Caching
Level | Space Used | CPU Time | In Memory | On Disk | Comments
MEMORY_ONLY | High | Low | Y | N |
MEMORY_ONLY_SER | Low | High | Y | N |
MEMORY_AND_DISK | High | Medium | Some | Some | Spills to disk if there is too much data to fit in memory.
MEMORY_AND_DISK_SER | Low | High | Some | Some | Spills to disk if there is too much data to fit in memory. Stores serialized representation in memory.
DISK_ONLY | Low | High | N | Y |
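A level from the table is chosen by passing it to `persist()`. For RDDs, calling `persist()` with no argument uses `MEMORY_ONLY`. The sketch below assumes Spark on the classpath and local mode; the object name and sample log lines are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CachingSketch {
  // Returns the two action results and the storage level that was in effect.
  def run(sc: SparkContext): (Long, Long, StorageLevel) = {
    val lines = sc.parallelize(Seq("error: disk full", "info: ok", "warning: slow"))
    val badLines = lines.filter(l => l.contains("error") || l.contains("warning"))

    // Only marks the RDD for caching; nothing is stored until an action runs.
    badLines.persist(StorageLevel.MEMORY_AND_DISK_SER)

    val first = badLines.count()                   // computes and stores the partitions
    val second = badLines.collect().length.toLong  // can reuse the stored partitions
    val level = badLines.getStorageLevel

    badLines.unpersist() // release the cached blocks once they are no longer needed
    (first, second, level)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("caching-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

Caching pays off when, as in the log mining example, the same RDD feeds more than one action; without it each action would recompute the filters from the input.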

How does it work?
Driver: main program which controls the flow
Executors: processes on the cluster nodes that execute actions

How does it work?
DAG Scheduler: coordination between RDDs, driver and nodes

Spark Stack
SQL
Streaming
Machine Learning
GraphX

This document discusses control structures and break and continue statements in JavaScript. It begins by providing an example of a for loop that counts from 1 to 6000. It then discusses arrays in JavaScript, including how to declare and access single and multi-dimensional arrays. Some key array methods like reverse() and sort() are also mentioned. The document concludes by explaining how to write a web page that prompts the user for 10 words and displays them in sorted order.

Cache and DrupalKornel Lugosi

DomainService の Repository 排除と エラー表現のパターンhogesuzuki

This document appears to be notes from experimenting with different approaches to implementing a domain service and repository architecture based on domain-driven design principles. It mentions trying different options for handling ordering of entities and handling errors. The notes cover multiple iterations denoted as TRY-00 through TRY-13 where different techniques for the domain service and interaction with the database repository were attempted. It also references a GitHub repository for code related to these experiments.

MongoDB - Aggregation PipelineJason Terpko

This presentation will demonstrate how you can use the aggregation pipeline with MongoDB similar to how you would use GROUP BY in SQL and the new stage operators coming 3.4. MongoDB’s Aggregation Framework has many operators that give you the ability to get more value out of your data, discover usage patterns within your data, or use the Aggregation Framework to power your application. Considerations regarding version, indexing, operators, and saving the output will be reviewed.

Analytics with MongoDB Aggregation Framework and Hadoop ConnectorHenrik Ingo

This document provides an overview of analytics with MongoDB and Hadoop Connector. It discusses how to collect and explore data, use visualization and aggregation, and make predictions. It describes how MongoDB can be used for data collection, pre-aggregation, and real-time queries. The Aggregation Framework and MapReduce in MongoDB are explained. It also covers using the Hadoop Connector to process large amounts of MongoDB data in Hadoop and writing results back to MongoDB. Examples of analytics use cases like recommendations, A/B testing, and personalization are briefly outlined.

The Aggregation FrameworkMongoDB

The document discusses MongoDB's Aggregation Framework, which allows users to perform ad-hoc queries and reshape data in MongoDB. It describes the key components of the aggregation pipeline including $match, $project, $group, $sort operators. It provides examples of how to filter, reshape, and summarize document data using the aggregation framework. The document also covers usage and limitations of aggregation as well as how it can be used to enable more flexible data analysis and reporting compared to MapReduce.

MongoDB Chicago - MapReduce, Geospatial, & Other Cool Featuresajhannan

1. MapReduce allows aggregation queries across collections by mapping documents to key-value pairs, reducing by key, and finalizing results. It can calculate averages, count tag frequencies, and scale across servers. 2. Geospatial indexing supports location-based queries by distance and area. Documents have location fields indexed for $near and $within operators to find closest or contained documents. 3. findAndModify atomically reads, modifies, and writes a single document, such as dequeuing from a shared queue or incrementing a counter.

MongoDB World 2016 : Advanced AggregationJoe Drumgoole

This document discusses MongoDB's aggregation framework and provides an example of creating a summary of test results from a public MOT (Ministry of Transport) dataset containing over 25 million records. It shows how to use aggregation pipeline stages like $match, $project, $group to filter the data to only cars from 2013, calculate the age of each car, and then group the results to output statistics on counts, average mileages, and number of passes for each make and age combination. The aggregation framework allows processing large collections in parallel and creating new data from existing data.

MongoDBGanesh Kunwar

MongoDB is a document database that provides high performance, high availability, and easy scalability. It uses a document-oriented data model where data is stored in documents that contain field and value pairs similar to JSON objects. Documents can be embedded within other documents to create complex hierarchical relationships between data. MongoDB supports replication and automatic sharding for scalability and high availability.

Apache avro and overview hadoop toolsalireza alikhani

Serialization is the process of converting data structures into a binary or textual format for transmission or storage. Avro is an open-source data serialization framework that uses JSON schemas and remote procedure calls (RPCs) to serialize data. It allows for efficient encoding of complex data structures and schema evolution. Avro provides APIs for Java, C, C++, C#, Python and Ruby to serialize and deserialize data according to Avro schemas.

Spark: Taming Big DataLeonardo Gamas

This document provides an overview of Apache Spark, a unified analytics engine for large-scale data processing. It discusses what Spark is, its key features like speed, integration, and simplicity. It also covers Spark's language support in Scala, Java, and Python. The document then discusses Resilient Distributed Datasets (RDDs), which are Spark's fundamental data structure, as well as transformations and actions. It provides examples of RDD operations. The document also covers Spark projects like Spark SQL, Spark Streaming, MLlib and GraphX and provides brief descriptions of their functionality.

MongoDB Aggregation FrameworkCaserta

These are slides from our Big Data Warehouse Meetup in April. We talked about NoSQL databases: What they are, how they’re used and where they fit in existing enterprise data ecosystems. Mike O’Brian from 10gen, introduced the syntax and usage patterns for a new aggregation system in MongoDB and give some demonstrations of aggregation using the new system. The new MongoDB aggregation framework makes it simple to do tasks such as counting, averaging, and finding minima or maxima while grouping by keys in a collection, complementing MongoDB’s built-in map/reduce capabilities. For more information, visit our website at http://casertaconcepts.com/ or email us at [email protected].

working with filesSangeethaSasi1

This document discusses working with files in C++. It covers opening and closing files using constructors and the open() function. It describes using input and output streams to read from and write to files. It also discusses the different file stream classes like ifstream, ofstream, and fstream and their functions. Finally, it mentions the different file opening modes that can be used with the open() function.

JavaScript client API for Google Apps Script API primerBruce McPherson

Aggregation Framework in MongoDB Overview Part-1Anuj Jain

The document discusses MongoDB's aggregation framework. It defines aggregation as gathering data together to perform computations and return computed results. The aggregation framework in MongoDB uses pipelines similar to UNIX pipes to perform aggregation operations like $group, $match, $project, etc. on data. It also supports map-reduce operations and provides connectors to Hadoop. The document provides examples of translating common SQL queries to the aggregation framework and discusses concepts like optimization, restrictions and references for further reading.

MongoDB Aggregation Amit Ghosh

This document provides an overview of MongoDB aggregation which allows processing data records and returning computed results. It describes some common aggregation pipeline stages like $match, $lookup, $project, and $unwind. $match filters documents, $lookup performs a left outer join, $project selects which fields to pass to the next stage, and $unwind deconstructs an array field. The document also lists other pipeline stages and aggregation pipeline operators for arithmetic, boolean, and comparison expressions.

Java JVM Memory Cheat SheetMark Papis

Mongo indexesMehmet Çetin

Indexes can be created to support specific queries by including the fields used in the query predicates. Indexes can have prefixes that include a subset of fields to support sorting. If MongoDB cannot use an index for sorting, it will perform a blocking sort that is limited to 100MB of memory by default unless allowDiskUse() is specified. The explain() method can identify if a blocking sort is required. Field order in indexes is important for sorting. Ensuring indexes have high selectivity can reduce the number of documents needing to be scanned for a query. Explain results provide details on the query planner and execution statistics for stages like collection scans, index scans, and merging results.

MongoDB Aggregation MongoSF May 2011Chris Westin

This document discusses MongoDB's new aggregation framework, which provides a more performant and declarative way to perform data aggregation tasks compared to MapReduce. The framework includes pipeline operations like $match, $project, and $group that allow filtering, reshaping, and grouping documents. It also features an expression language for computed fields. The initial release will support aggregation pipelines and sharding, with future plans to add more operations and expressions.

MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB

This document discusses analyzing flight data using MongoDB aggregation. It provides examples of aggregation pipelines to group, match, project, sort, unwind and other stages. It explores questions about major carriers, airport cancellations, delays by distance and carrier. It also discusses visualizing route data and hub airports. Finally, it proposes a quiz on analyzing NYC flight data by importing data and performing queries on origins, cancellations, delays and weather impacts by month between the three major NYC airports.

Using spark data frame for sqlDaeMyung Kang

1) This document provides examples of how to use Spark DataFrames and SQL to load and analyze Iris flower data. It shows how to load data from files and Kafka, define schemas, select, filter, sort, group, and join dataframes. 2) Methods like spark.read, dataframe.select(), dataframe.filter(), and dataframe.groupBy() are used to load and query the data. StructType and case classes define the schema. SQL statements can also be used via the sqlContext. 3) User defined functions (UDFs) are demonstrated to handle custom data types like maps. The examples provide an overview of basic Spark DataFrame and SQL functionality.

Unsupervised Learning with Apache SparkDB Tsai

Unsupervised learning refers to a branch of algorithms that try to find structure in unlabeled data. Clustering algorithms, for example, try to partition elements of a dataset into related groups. Dimensionality reduction algorithms search for a simpler representation of a dataset. Spark's MLLib module contains implementations of several unsupervised learning algorithms that scale to huge datasets. In this talk, we'll dive into uses and implementations of Spark's K-means clustering and Singular Value Decomposition (SVD). Bio: Sandy Ryza is an engineer on the data science team at Cloudera. He is a committer on Apache Hadoop and recently led Cloudera's Apache Spark development.

The Aggregation FrameworkMongoDB

MongoDB offers two native data processing tools: MapReduce and the Aggregation Framework. MongoDB’s built-in aggregation framework is a powerful tool for performing analytics and statistical analysis in real-time and generating pre-aggregated reports for dashboarding. In this session, we will demonstrate how to use the aggregation framework for different types of data processing including ad-hoc queries, pre-aggregated reports, and more. At the end of this talk, you should walk aways with a greater understanding of the built-in data processing options in MongoDB and how to use the aggregation framework in your next project.

Data warehouse or conventional database: Which is right for you?Data Con LA

Data Con LA 2020 Description Developers have a plethora of choice for application data stores. In this talk we'll explore the differences between transaction processing systems like MySQL and analytic databases like ClickHouse to help you make the best choice for your application. Confused about when to use a data warehouse vs a traditional relational database? Open source has so many choices! Using MySQL and ClickHouse as examples, we'll work through use cases to see where each shines. Along the way we'll explore key technical differences like: * row vs. column storage * indexing and compression * query parallelization * concurrency support * transaction models. Finally we'll discuss how to handle use cases that require capabilities of both. Listeners will leave with clear criteria and and deeper understanding of database internals that enable them to make the right choice(s) for their own use cases. Speaker Robert Hodges, Altinity, Inc, CEO

Get docs from sp doc librarySudip Sengupta

MongoDb and NoSQLTO THE NEW | Technology

The agenda of the slides are to discuss some basic and in-depth details of MongoDB and NoSQL. A snapshot of the topics discussed: - Introduction to NoSQL and MongoDB - Installation - Queries - Indexing - Schema modeling - Aggregation This tutorial is an introduction to MongoDB and NoSQL. The tutorial includes an introduction to MongoDb and NoSQL, installation, queries related to MongoDB and NoSQL, aggregation framework, indexing of MongoDB and NoSQL and schema modelling. The tutorial begins with a section on introduction. This section includes an introduction to NoSQL, its data models like document model, graph model, key value etc. It also includes an introduction to MongoDB and its data model. The introduction section is then followed by the installation section. This section includes installing MongoDB, default directory, starting MongoDB server, starting Mongo shell and more steps. It also includes adding documents. The next section is about queries related to MongoDB and NoSQL. This section includes query collection which are selecting all documents, find by example, use OR condition, use AND condition, update query. It also includes removing documents. Then comes a section about aggregation framework. This section includes a brief about aggregation framework process and its samples. The next section is about indexing. This section involves indexing for speeding up of search and sorting, types of indexes like single field, compound field, multiple index etc. The last section of the tutorial is about schema modelling. This section includes schema design factors like rich documents, no mongo joins, no constraints, atomic operation etc.

Google apps script database abstraction exposed versionBruce McPherson

This document describes a database abstraction library for Google Apps Script that provides a consistent API for NoSQL databases. It allows code to be reused across different database backends by handling queries, authentication, caching, and more transparently. The library exposes the capabilities through a JSON REST API that can be accessed from other programming languages. It also includes a VBA client library that translates VBA calls into requests to the JSON API, allowing VBA code to access databases in the same way as Google Apps Script.

R statistics with mongo dbMongoDB

This document discusses using R for statistical analysis with MongoDB as the database. It introduces MongoDB as a NoSQL database for storing large, complex datasets. It describes the rmongodb package for connecting R to MongoDB, allowing users to query, aggregate, and analyze MongoDB data directly in R without importing entire datasets into memory. Examples show performing queries, aggregations, and accessing results as native R objects. The document promotes R and MongoDB as a solution for big data analytics.

Rss та wiki Alla239

UXPA 2016 - Using UX Skills to Shape Your CareerAmanda Stockwell

The document appears to be notes from a presentation on using UX skills to shape one's career. Some of the key points discussed include: - There are many potential paths for success in UX, such as consulting, in-house roles, product strategy/management, and leadership. - Effective communication of one's skills, experiences, and impact is important for career opportunities. User research skills can be applied to learn about potential employers/clients. - Content strategy techniques like creating a project inventory and PARR (Problem, Action, Role, Result) statements can help showcase work experience and value. - Visual representations like the "Broken Comb" can demonstrate UX skills like UI design, and personal projects

More Related Content

What's hot (20)

MongoDBGanesh Kunwar

Apache avro and overview hadoop toolsalireza alikhani

Spark: Taming Big DataLeonardo Gamas

MongoDB Aggregation FrameworkCaserta

working with filesSangeethaSasi1

JavaScript client API for Google Apps Script API primerBruce McPherson

Aggregation Framework in MongoDB Overview Part-1Anuj Jain

MongoDB Aggregation Amit Ghosh

Java JVM Memory Cheat SheetMark Papis

Mongo indexesMehmet Çetin

MongoDB Aggregation MongoSF May 2011Chris Westin

MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB

Using spark data frame for sqlDaeMyung Kang

Unsupervised Learning with Apache SparkDB Tsai

The Aggregation FrameworkMongoDB

Data warehouse or conventional database: Which is right for you?Data Con LA

Get docs from sp doc librarySudip Sengupta

MongoDb and NoSQLTO THE NEW | Technology

Google apps script database abstraction exposed versionBruce McPherson

R statistics with mongo dbMongoDB

MongoDBGanesh Kunwar

Apache avro and overview hadoop toolsalireza alikhani

Spark: Taming Big DataLeonardo Gamas

MongoDB Aggregation FrameworkCaserta

working with filesSangeethaSasi1

JavaScript client API for Google Apps Script API primerBruce McPherson

Aggregation Framework in MongoDB Overview Part-1Anuj Jain

MongoDB Aggregation Amit Ghosh

Java JVM Memory Cheat SheetMark Papis

Mongo indexesMehmet Çetin

MongoDB Aggregation MongoSF May 2011Chris Westin

MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB

Using spark data frame for sqlDaeMyung Kang

Unsupervised Learning with Apache SparkDB Tsai

The Aggregation FrameworkMongoDB

Data warehouse or conventional database: Which is right for you?Data Con LA

Get docs from sp doc librarySudip Sengupta

MongoDb and NoSQLTO THE NEW | Technology

Google apps script database abstraction exposed versionBruce McPherson

R statistics with mongo dbMongoDB

Viewers also liked (18)

Rss та wiki Alla239

UXPA 2016 - Using UX Skills to Shape Your CareerAmanda Stockwell

Research is not just for the UX team. Amanda Stockwell

Slides from Amanda Stockwell's talk at Agile2015, "Research is not just for the UX team: Strategies for everyone to understand end-users." Covers an an overview of the key goals of user research, the key methodologies that any team member can employ, concrete tips for how to select the best method given your goal, and advice to craft your research plans the best way to get the information you’re looking for.

A toolset for a modern dev companyHovhannes Kuloghlyan

Rss та wiki Alla239

RSS та Wiki-технології в упралінні персоналом. Ефективне управління підприємством і його включення у світовий ін-формаційний простір передбачає необхідність сформувати своє мережеве представлення у цифровому форматі в Internet, зважаючи на поступальний розвиток цифрової економіки. За допомогою RSS та Wiki в управлінні персоналом можна достить на багато спростити собі роботу.

Linked In PPLuis Grullon

Luis Grullon is currently a Merchandise Presentation Specialist at Walt Disney World Resort. His responsibilities include designing window and ledge displays for retail locations, creating PowerPoint presentations, budgeting for projects, and installing visual window displays. He has a Bachelor's Degree in Industrial Design from the Art Institute of Ft. Lauderdale and an Associate's Degree also from the Art Institute of Ft. Lauderdale. He has taken courses in product design, packaging, furniture, transportation design, and other areas.

resume2HARENDRA SINGH

Harendra Singh is a civil engineer with over 25 years of experience in construction project management. He is currently working as a project manager for JKumar Infraproject Ltd. on a project in Alwar, Rajasthan. His objective is to secure a managerial position that allows him to utilize his qualifications and experience while embracing new strategies. He has extensive experience managing commercial, high-rise building, and industrial projects. Harendra Singh is proficient in Microsoft Office applications and construction management software. He is seeking new opportunities in a progressive organization.

Using UX Skills to Craft Your CareerAmanda Stockwell

This is Amanda Stockwell's session from UX Australia 2015 in Brisbane. The session discussed the unique challenges that UX professionals face when crafting their career path and finding roles that are both appropriate fits for their existing skill sets and offer opportunities to grow. It helped the attendees understand UX career options and help them craft their work samples and personal interactions to maximise their chances for success, whatever that looks like to them. The session included a discussion of: The varying career paths within UX and definitions of success What employers are looking for in UX professionals Ways to utilise existing UX skills to illustrate strengths and articulate value within a work environment or to potential employers Tips to improve work samples to demonstrate expertise Methods to present and brands oneself

SISTEM INFORMASI KESEHATAN STIKES AL IRSYAD AL ISLAMIYAH CILACAP (108114020 &...Ibrahim Lubis

UX is not just for designers. UX IRLAmanda Stockwell

Respetar a los demáslorenieto

SISTEM INFPRMASI KESEHATAN STIKES AL IRSYAD AL ISLAMIYAH CILACAP (108114020 &...Ibrahim Lubis

Dokumen tersebut membahas tentang sistem informasi kesehatan (SIK), yang didefinisikan sebagai sistem pengelolaan data dan informasi kesehatan di seluruh tingkat pemerintahan secara terintegrasi untuk mendukung manajemen kesehatan. Tujuan SIK adalah membantu pengambilan keputusan untuk mendeteksi dan mengendalikan masalah kesehatan serta memantau dan meningkatkan layanan kesehatan. Ruang lingkup SIK mencakup registrasi

COMPANY PROFILE 2015VIJAY XAVIER

RFS Components Supply Sdn. Bhd. is an electrical and telecommunications engineering company that has been operating in Malaysia since 2001. It has 27 staff members, including directors and project managers. The company owns vehicles and equipment used for its projects. It has completed over 27 electrical installation and wiring projects, and is currently working on 3 more. RFS aims to provide quality electrical and telecom services to its clients.

Propiedades de la PotenciaAdriana Barrios

Este documento presenta las propiedades de las potencias en números enteros. Explica que una potencia es la multiplicación de un número por sí mismo el número de veces indicado por el exponente. Luego detalla cinco propiedades clave de las potencias: 1) cualquier número elevado a la potencia 0 es igual a 1, 2) cualquier número elevado a la potencia 1 es igual al número, 3) la multiplicación de potencias iguales es la suma de los exponentes, 4) la división de potencias iguales es la resta de los exponentes, y 5) una potencia elevada a otro ex

Serve your customers better with User Experience ResearchAmanda Stockwell

This document discusses user experience (UX) research and marketing research. It defines UX research as understanding users and the context in which they use products in order to uncover opportunities and understand why things happen from the user's perspective. Marketing research is defined as understanding purchasers and the context of purchase in order to uncover market opportunities and understand what is happening from the company's perspective. The document then outlines different types of research methods that can be used for UX and marketing research like interviews, usability testing, surveys, and analytics reviews. It provides guidance on choosing methods based on the product stage and type of questions being asked.

Apron feederelunaedgar

PraveenPraveen Kumar

This document provides a summary of Praveen Kumar's professional experience and qualifications. It states that he has over 4.9 years of experience in commercial banking, marketing, branch operations, and customer relationship management. Currently he works as a Manager of Marketing and Business Development at Bank of Baroda. He has a strong background in areas like credit analysis, branch management, business development, and customer service. Praveen holds a PGDM in Marketing and Operations as well as an MSc and BSc in Chemistry.

PPT encontro com Professores CoordenadoresGiani de Cássia Santana

Este documento propõe práticas inovadoras para o ensino médio e fundamental anos finais discutindo o papel do coordenador pedagógico e desafios atuais da educação. Apresenta dados sobre a matrícula por série em São Paulo e sugere focar no aluno, suas necessidades e desejos. Discute a importância da colaboração entre professores e a adoção de métodos que deem sentido à aprendizagem.

Rss та wiki Alla239

UXPA 2016 - Using UX Skills to Shape Your CareerAmanda Stockwell

Research is not just for the UX team. Amanda Stockwell

A toolset for a modern dev companyHovhannes Kuloghlyan

Rss та wiki Alla239

Linked In PPLuis Grullon

resume2HARENDRA SINGH

Using UX Skills to Craft Your CareerAmanda Stockwell

SISTEM INFORMASI KESEHATAN STIKES AL IRSYAD AL ISLAMIYAH CILACAP (108114020 &...Ibrahim Lubis

UX is not just for designers. UX IRLAmanda Stockwell

Respetar a los demáslorenieto

SISTEM INFPRMASI KESEHATAN STIKES AL IRSYAD AL ISLAMIYAH CILACAP (108114020 &...Ibrahim Lubis

COMPANY PROFILE 2015VIJAY XAVIER

Propiedades de la PotenciaAdriana Barrios

Serve your customers better with User Experience ResearchAmanda Stockwell

Apron feederelunaedgar

PraveenPraveen Kumar

PPT encontro com Professores CoordenadoresGiani de Cássia Santana

Apache Spark - Aram Mkrtchyan

1. Apache Spark: lightning-fast cluster computing

2. What is Apache Spark? A cluster computing platform designed to be fast and general-purpose: fast, universal, highly accessible.

3. Unified stack

4. Comparison with MR
    val textFile = spark.textFile("hdfs://...")
    val counts = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://...")

5. Examples: Word Count
    val sc = new SparkContext(...)
    val textFile = sc.textFile("hdfs://...")
    val counts = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://...")
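As a rough illustration of what each step of the word count produces, the same pipeline can be run on an ordinary Scala collection, with no cluster involved. The sample lines and the object name `LocalWordCount` are made up for this sketch, and `groupBy` followed by a per-key reduce stands in for Spark's `reduceByKey`; it is not how Spark executes the job.

```scala
object LocalWordCount {
  // Mirrors flatMap -> map -> reduceByKey from the slide on a local Seq.
  def wordCount(lines: Seq[String]): Map[String, Int] = {
    val words = lines.flatMap(line => line.split(" ")) // each line becomes many words
    val pairs = words.map(word => (word, 1))           // each word becomes (word, 1)
    // Local stand-in for reduceByKey(_ + _): group by word, then sum the ones.
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).reduce(_ + _)) }
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("to be or not to be", "to do") // hypothetical input lines
    wordCount(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

On a cluster the per-key sum is computed across partitions, which is why the Spark version needs a keyed reduction rather than a plain loop.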

6. Examples: Log Mining
    val sc = new SparkContext(...)
    val inputRDD = sc.textFile("log.txt")
    val errorsRDD = inputRDD.filter(line => line.contains("error"))
    val warningsRDD = inputRDD.filter(line => line.contains("warning"))
    val badLinesRDD = errorsRDD.union(warningsRDD)
    badLinesRDD.persist()
    badLinesRDD.count()
    badLinesRDD.collect()

7. How does it work? RDD: Resilient Distributed Dataset

8. Advanced: RDD as interface
Example: Hadoop RDD
    partitions = one per HDFS block
    dependencies = none
    compute = read corresponding block
    preferredLocations = HDFS block locations
    partitioner = none
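The five properties listed on this slide can be sketched as a small Scala trait. Everything below (`SimpleRDD`, `Partition`, `Block`, `BlockRDD`, `FilteredRDD`) is a simplified, hypothetical model written for illustration; Spark's real `RDD` class has different signatures and more machinery.

```scala
// Simplified, hypothetical model of the five-part RDD interface.
// These types are illustrative and are not Spark's actual classes.
final case class Partition(index: Int)

trait SimpleRDD[T] {
  def partitions: Seq[Partition]                         // how the data is split
  def dependencies: Seq[SimpleRDD[_]]                    // parent RDDs (lineage)
  def compute(split: Partition): Iterator[T]             // produce one partition
  def preferredLocations(split: Partition): Seq[String]  // locality hints
  def partitioner: Option[T => Int]                      // optional element-to-partition rule
}

// Stand-in for an HDFS block: some lines plus the hosts that store them.
final case class Block(lines: Seq[String], hosts: Seq[String])

// Analogue of the Hadoop RDD on the slide: one partition per block,
// no dependencies, compute reads the block, locations come from the block.
final class BlockRDD(blocks: IndexedSeq[Block]) extends SimpleRDD[String] {
  def partitions: Seq[Partition] = blocks.indices.map(i => Partition(i))
  def dependencies: Seq[SimpleRDD[_]] = Nil
  def compute(split: Partition): Iterator[String] = blocks(split.index).lines.iterator
  def preferredLocations(split: Partition): Seq[String] = blocks(split.index).hosts
  def partitioner: Option[String => Int] = None
}

// A filter expressed against the same interface: same partitions, one parent.
final class FilteredRDD[T](parent: SimpleRDD[T], p: T => Boolean) extends SimpleRDD[T] {
  def partitions: Seq[Partition] = parent.partitions
  def dependencies: Seq[SimpleRDD[_]] = Seq(parent)
  def compute(split: Partition): Iterator[T] = parent.compute(split).filter(p)
  def preferredLocations(split: Partition): Seq[String] = parent.preferredLocations(split)
  def partitioner: Option[T => Int] = None
}
```

Because a derived dataset only records its parent and a function, a lost partition can be rebuilt by calling `compute` again on the parent's partition, which is the idea behind the "resilient" in RDD.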

9. Directed Acyclic Graph (DAG)
    hadoopRDD -> filter -> errorsRDD
    hadoopRDD -> filter -> warningsRDD
    errorsRDD, warningsRDD -> union -> badLinesRDD

10. RDD Transformations
Function Name | Purpose | Example
map() | Apply a function to each element in the RDD and return an RDD of the result. | rdd.map(x => x + 1)
flatMap() | Apply a function to each element in the RDD and return an RDD of the contents of the iterators returned. Often used to extract words. | rdd.flatMap(x => x.to(3))
filter() | Return an RDD consisting of only the elements that pass the condition passed to filter(). | rdd.filter(x => x != 1)
distinct() | Remove duplicates. | rdd.distinct()
union() | Produce an RDD containing elements from both RDDs. | rdd.union(other)
intersection() | Produce an RDD containing only elements found in both RDDs. | rdd.intersection(other)
join() | Perform an inner join between two RDDs. | rdd.join(other)
groupByKey() | Group values with the same key. | rdd.groupByKey()

11. RDD Actions
Function Name | Purpose | Example
count() | Number of elements in the RDD. | rdd.count()
collect() | Return all elements from the RDD. | rdd.collect()
saveAsTextFile() | Save the RDD's elements to an external storage system. | rdd.saveAsTextFile("hdfs://...")
take(num) | Return num elements from the RDD. | rdd.take(10)
reduce(func) | Combine the elements of the RDD together in parallel (e.g., sum). | rdd.reduce((x, y) => x + y)
takeOrdered(num)(ordering) | Return num elements based on the provided ordering. | rdd.takeOrdered(2)(myOrdering)
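A few of the listed transformations and actions can be tried together in a small local program. This is a minimal sketch, assuming Spark is on the classpath and a version recent enough (1.3 or later) that pair-RDD operations such as `join` need no extra import; the object name `RddOpsSketch`, the application name, and all sample data are invented for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddOpsSketch {
  // Runs a few transformations and actions on tiny, made-up data
  // and returns the results so they can be inspected.
  def run(sc: SparkContext): (Long, List[Int], Int, List[(String, (Int, String))]) = {
    val rdd   = sc.parallelize(Seq(1, 2, 3, 3))
    val other = sc.parallelize(Seq(3, 4, 5))

    // Transformations only describe new RDDs; nothing is computed yet.
    val plusOne  = rdd.map(x => x + 1)          // 2, 3, 4, 4
    val noOnes   = rdd.filter(x => x != 1)      // 2, 3, 3
    val combined = rdd.union(other).distinct()  // 1, 2, 3, 4, 5 in some order

    // Actions trigger the computation and return values to the driver.
    val n      = noOnes.count()
    val sorted = combined.collect().sorted.toList
    val sum    = plusOne.reduce((x, y) => x + y)

    // join() works on RDDs of key-value pairs and keeps only matching keys.
    val ages   = sc.parallelize(Seq(("ann", 30), ("bob", 25)))
    val cities = sc.parallelize(Seq(("ann", "Oslo")))
    val joined = ages.join(cities).collect().toList

    (n, sorted, sum, joined)
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark inside this JVM; a cluster would use another master URL.
    val conf = new SparkConf().setAppName("rdd-ops-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

Results from `collect()` are sorted before use here because the order of elements after `distinct()` is not guaranteed.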

12. RDD Caching
Level | Space Used | CPU Time | In Memory | On Disk | Comments
MEMORY_ONLY | High | Low | Y | N |
MEMORY_ONLY_SER | Low | High | Y | N |
MEMORY_AND_DISK | High | Medium | Some | Some | Spills to disk if there is too much data to fit in memory.
MEMORY_AND_DISK_SER | Low | High | Some | Some | Spills to disk if there is too much data to fit in memory. Stores serialized representation in memory.
DISK_ONLY | Low | High | N | Y |
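A storage level from this table is chosen by passing it to `persist`. The following is a minimal sketch, assuming Spark is on the classpath; the object name `CachingSketch`, the helper `countTwice`, and the sample log lines are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CachingSketch {
  // Counts the "bad" lines twice; the second count can reuse persisted partitions.
  def countTwice(sc: SparkContext, lines: Seq[String]): (Long, Long) = {
    val badLines = sc.parallelize(lines)
      .filter(line => line.contains("error") || line.contains("warning"))

    // Pick a level explicitly; persist() with no argument uses MEMORY_ONLY.
    badLines.persist(StorageLevel.MEMORY_AND_DISK_SER)

    val first  = badLines.count() // computes the RDD and stores its partitions
    val second = badLines.count() // can be served from the persisted partitions
    badLines.unpersist()          // release the storage when it is no longer needed
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("caching-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sample = Seq("error: disk full", "info: started", "warning: slow response")
      println(countTwice(sc, sample))
    } finally sc.stop()
  }
}
```

Persisting only pays off when an RDD is reused by more than one action, as with `count()` followed by `collect()` in the log mining example.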

13. How does it work? Driver: the main program, which controls the flow. Executors: nodes that execute actions.

14. How does it work? DAG Scheduler: coordination between RDDs, driver and nodes.

15. What is a Spark application?

16. Advanced Topics: Stages

17. Advanced Topics: Shuffling

18. Spark Stack: SQL, Streaming, Machine Learning, GraphX

19. if not… DEMO everyone? ?
