SlideShare a Scribd company logo
Scalding Big ADta или обжигая горшки с рекламой 
Boris Trofimov 
@b0ris_1
Agenda 
•Two stories on how AD is served inside AD company 
•Awesome Scalding 
The stories mention one company that has built multimillion-dollar business over ordinary cookies
The story about shoes or Big Brother is watching on you
We will answer this question in a few slides 
or be careful while buying shoes
What is common between these things?
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
They are simple just for the first glance The same with loading web sites
Open any site 
with Ad
… 1 sec 
20 ms 
100 ms 
150 ms 
Publisher receives request 
Publisher 
sends 
response 
Content 
delivered 
to user 
170 
Site sends request to Ad Server 
200 
80 ms 
280 
SSP picks the winning bid and sends redirect url back to ad Server 
Every bidder/DSP receives info about user: 
•ssp_cookie_id 
•geo data 
•site url 
300 
SSP (Ad Exchange) receives ad request 
and opens 
RTB Auction 
210 
Ad Server receives ad request and redirects to Ad Exchange 
All bidders should send their decision (participate? & price) back 
350 
Ad Server shows page to user which redirects to the bidder’s server 
User’s web page asks Ad banner from CDN Showing ad & bidder’s 1x1 pixel (impression) 
400 
The first second… 
~70% users have this cookie aboard 
>>1 independent companies take part in this auction
Return info about new user’s interests with special markers (segments) that indicates the new fact about user, e.g user is man who has iphone and lives in NYC and has dog. 
Major format: <cookie_id – segment_id> 
Data Scientists 
Real time 
Offline 
Pixel Tracking 
Farm 
Warehouse 
Bidder Farm 
Auction requests 
SSP Ad 
Exchange 
Hourly 
Logs 
3rd part data 
House holders data 
… 
Hadoop’s HDFS 
Updating user profiles 
Hive 
Oozie 
MapReduce 
Partners 
HBASE 
Scalding 
hbase keeps user profiles 
Update user’s profiles with new segments 
Data export 
Brand new feed about user interests 2 3 4 5 6 7 8 9 0 1 
•Impressions 
•Clicks 
•Post-click Activities 5
Why do we need all this science? 
•Deep audience targeting 
•Case: customer would like to show ad for all men who live in NYC have iPhone and dog
Facts about Data Scientists 
•Data Scientists do: 
–Audience Modeling 
identifying new user interests [segments] and finding ways to track them 
–Audience Bridging 
–Insights and Analytics 
•They use IBM Netezza as local warehouse 
•They use R language
Facts about Realtime team 
•Scala, Java 
•Restful Services 
•Akka 
•In Memory Cache : Aerospike, Redis
Facts about Offline team 
•The tasks we solve over hadoop: 
–As a Storage to keep all logs we need 
–As Profile DB to keep all users and their interests [segments] 
–As MapReduce Engine to run jobs on transformations between data 
–As a Warehouse to export data via hive 
•We use Clouderra CDH 5.1.2 
•Major language: Scala 
•Pure MapReduce jobs & Scalding/Cascading 
•All map reduce applications are wrapped by Oozie’s workflow(s) 
•Developing nextgen paltform version based on Spark Streaming/Kafka
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
hdfs 
Scalding in a nutshell 
•Concise DSL 
•Configurable Source(s) and sink(s) 
•Data transform operations: 
–map/flatMap 
–pivot/unpivot 
–project 
–groupBy/reduce/foldLeft 
hdfs
Just one example (Java way) 
public class WordCount { 
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { 
private final static IntWritable one = new IntWritable(1); 
private Text word = new Text(); 
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
String line = value.toString(); 
StringTokenizer tokenizer = new StringTokenizer(line); 
while (tokenizer.hasMoreTokens()) { 
word.set(tokenizer.nextToken()); 
context.write(word, one); 
} 
} 
} 
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { 
public void reduce(Text key, Iterable<IntWritable> values, Context context) 
throws IOException, InterruptedException { 
int sum = 0; 
for (IntWritable val : values) { 
sum += val.get(); 
} 
context.write(key, new IntWritable(sum)); 
} 
} 
public static void main(String[] args) throws Exception { 
Configuration conf = new Configuration(); 
Job job = new Job(conf, "wordcount"); 
job.setOutputKeyClass(Text.class); 
job.setOutputValueClass(IntWritable.class); 
job.setMapperClass(Map.class); 
job.setReducerClass(Reduce.class); 
job.setInputFormatClass(TextInputFormat.class); 
job.setOutputFormatClass(TextOutputFormat.class); 
FileInputFormat.addInputPath(job, new Path(args[0])); 
FileOutputFormat.setOutputPath(job, new Path(args[1])); 
job.waitForCompletion(true); 
} 
}
Source 
Just one example (Scalding way) 
class WordCountJob(args : Args) extends Job(args) { TextLine( args("input") ) .flatMap('line -> 'word) { line : String => tokenize(line) } .groupBy('word) { _.size } .write( Tsv( args("output") ) ) // Split a piece of text into individual words. def tokenize(text : String) : Array[String] = { // Lowercase each word and remove punctuation. text.toLowerCase.split("s+") } } 
Sink 
Transform operations
Use Case 1 Split 
•Motivation: reuse calculated streams 
val common = Tsv("./file").map(...) 
val branch1 = common.map(..).write(Tsv("output")) 
val branch2 = common.groupby(..).write(Tsv("output"))
Use Case 2 Exotic Sources JDBC (out of the box) 
case object YourTableSource extends JDBCSource { 
override val tableName = "tableName" 
override val columns = List( 
varchar("col1", 64), 
date("col2"), 
tinyint("col3"), 
double("col4"), 
) 
override def currentConfig = ConnectionSpec("www.gt.com", "username", "password", "mysql") 
} 
YourTableSource.read.map(...) ...
Use Case 2 Exotic Sources HBASE 
HBaseSource (https://github.com/ParallelAI/SpyGlass) 
•SCAN_ALL, 
•GET_LIST, 
•SCAN_RANGE 
HBaseRawSource (https://github.com/andry1/SpyGlass) 
•Advanced filtering via base64Scan 
val hbs3 = new HBaseSource( tableName, quorum, 'key, List("data"), List('data), sourceMode = SourceMode.SCAN_ALL) .read 
val scan = new Scan() 
scan.setCaching(caching) 
val activity_filters = new FilterList(MUST_PASS_ONE, { 
val scvf = new SingleColumnValueFilter(toBytes("family"), toBytes("column"), GREATER_OR_EQUAL, toBytes(value)) 
scvf.setFilterIfMissing(true) 
scvf.setLatestVersionOnly(true) 
val scvf2 = ... 
List(scvf, scvf2) 
}) 
scan.setFilter(activity_filters) 
new HBaseRawSource(tableName, quorum, families, 
base64Scan = convertScanToBase64(scan)).read. ...
Use Case 3 Join 
•Motivation: joining two streams by key 
•Different join strategies: 
–joinWithLarger 
–joinWithSmaller 
–joinWithTiny 
•Inner, Left, Right, strategies 
val pipe1 = Tsv("file1").read 
val pipe2 = Tsv("file2").read // small file 
val pipe3 = Tsv("file3").read // huge file 
val joinedPipe = pipe1.joinWithTiny('id1 -> 'id2, pipe2) 
val joinedPipe2 = pipe1.joinWithLarge('id1 -> 'id2, pipe3)
Use Case 4 Distributed Caching and Counters 
//somewhere outside Job definition 
val fl = DistributedCacheFile("/user/boris/zooKeeper.json") 
// next value can be passed through any Scalding's jobs via Args object for instance 
val fileName = fl.path 
... 
class Job(val args:Args) { 
// once we receive fl.path we can read it like a ordinary file 
val fileName = args.get("fileName") 
lazy val data = readJSONFromFile(fileName) 
... 
TSV(args.get("input")).read.map('line -> 'word ) { 
line => ... /* using data json object*/ ... } 
} 
// counter example Stat("jdbc.call.counter","myapp").incBy(1)
Use Case 5 Bridging Profiles 
Motivation: bridge information from different sources and build complete person profile 
imp 
Own company’s private cookie thanks to 1x1 pixel impression 
Bridging two ssp_cookies via private cookie 
ssp_cookie_Id1 
ssp_cookie_Id2 
Bridging via ip address
Bridging Profiles 
General task definition: 
•Build graph 
•Identify connected components
Connected components Let’s scalding it 
class ConnectedComponentsJob(args : Args) extends Job(args) { var attempt = 0 while( attempt < 20 ) { val vertexes = Tsv( args("vertexes") ).read // ‘vertex t ‘gid, by default it is equal to vertex val edges = Tsv( args("edges") ) // 'gid_a t 'gid_b var groups = vertexes.joinWithSmaller('id -> 'id_b, vertexes.joinWithSmaller('id -> 'id_a, edges).discard('id ).rename('gid ->'gid_a)) .discard('id ) .rename('gid ->'gid_b) groups = groups.filter('gid_a, 'gid_b) {gid : (String, String) => gid._1 != gid._2 } .project ('gid_a, 'gid_b) .map(('gid_a, 'gid_b) -> ('gid_a, 'gid_b)) {gid : (Integer, Integer) =>(max(gid._1, gid._2), min(gid._1, gid._2))} val count = groups.groupBy( ('gid_a, 'gid_b) ){ _.size } if (count==0) attempt = 20 val new_groups = groups.groupBy('gid_a) {_.min('gid_b)}.rename(('source, 'target)) val new_vertexes = vertexes.joinWithSmaller('id -> 'source, groups, joiner = new LeftJoin ) .mapTo( ('id,'gid,'source,'target)->('id, 'gid)) { param:(Integer, Integer, Integer, Integer) => val (id, gid, source,target) = param if (target != null) ( id , min( gid, target ) ) else ( id, gid ) } new_vertexes.write( Tsv( args("vertexes") ) ) } }
Other nice things 
•Typed pipes 
•Elegant and fast Matrix operations 
•Simple migration on Spark/Kafka 
•Way to retrieve data from hive’s hcatalog
Useful Resources 
•http://www.adopsinsider.com/ad-serving/how-does-ad-serving- work/ 
•http://www.adopsinsider.com/ad-serving/diagramming-the-ssp- dsp-and-rtb-redirect-path/ 
•https://github.com/twitter/scalding 
•https://github.com/ParallelAI/SpyGlass 
•https://github.com/branky/cascading.hive
Thank you!

More Related Content

What's hot (20)

10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling
DATAVERSITY
 
Indexing Strategies to Help You Scale
Indexing Strategies to Help You ScaleIndexing Strategies to Help You Scale
Indexing Strategies to Help You Scale
MongoDB
 
Indexing
IndexingIndexing
Indexing
Mike Dirolf
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2
MongoDB
 
MongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: Tutorial
MongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: TutorialMongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: Tutorial
MongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: Tutorial
MongoDB
 
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB
 
Webinar: Exploring the Aggregation Framework
Webinar: Exploring the Aggregation FrameworkWebinar: Exploring the Aggregation Framework
Webinar: Exploring the Aggregation Framework
MongoDB
 
PistonHead's use of MongoDB for Analytics
PistonHead's use of MongoDB for AnalyticsPistonHead's use of MongoDB for Analytics
PistonHead's use of MongoDB for Analytics
Andrew Morgan
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)
MongoDB
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
MongoDB
 
Indexing and Query Optimization
Indexing and Query OptimizationIndexing and Query Optimization
Indexing and Query Optimization
MongoDB
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
Caserta
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDB
rogerbodamer
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB
MongoDB
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework
MongoDB
 
Building Apps with MongoDB
Building Apps with MongoDBBuilding Apps with MongoDB
Building Apps with MongoDB
Nate Abele
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
MongoDB
 
Dealing with Azure Cosmos DB
Dealing with Azure Cosmos DBDealing with Azure Cosmos DB
Dealing with Azure Cosmos DB
Mihail Mateev
 
Map/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDBMap/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDB
Uwe Printz
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
Amit Ghosh
 
10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling
DATAVERSITY
 
Indexing Strategies to Help You Scale
Indexing Strategies to Help You ScaleIndexing Strategies to Help You Scale
Indexing Strategies to Help You Scale
MongoDB
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2
MongoDB
 
MongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: Tutorial
MongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: TutorialMongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: Tutorial
MongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: Tutorial
MongoDB
 
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB
 
Webinar: Exploring the Aggregation Framework
Webinar: Exploring the Aggregation FrameworkWebinar: Exploring the Aggregation Framework
Webinar: Exploring the Aggregation Framework
MongoDB
 
PistonHead's use of MongoDB for Analytics
PistonHead's use of MongoDB for AnalyticsPistonHead's use of MongoDB for Analytics
PistonHead's use of MongoDB for Analytics
Andrew Morgan
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)
MongoDB
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
MongoDB
 
Indexing and Query Optimization
Indexing and Query OptimizationIndexing and Query Optimization
Indexing and Query Optimization
MongoDB
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
Caserta
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDB
rogerbodamer
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB
MongoDB
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework
MongoDB
 
Building Apps with MongoDB
Building Apps with MongoDBBuilding Apps with MongoDB
Building Apps with MongoDB
Nate Abele
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
MongoDB
 
Dealing with Azure Cosmos DB
Dealing with Azure Cosmos DBDealing with Azure Cosmos DB
Dealing with Azure Cosmos DB
Mihail Mateev
 
Map/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDBMap/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDB
Uwe Printz
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
Amit Ghosh
 

Viewers also liked (8)

Distilled mongo db by Boris Trofimov
Distilled mongo db by Boris TrofimovDistilled mongo db by Boris Trofimov
Distilled mongo db by Boris Trofimov
Alex Tumanoff
 
Spark-driven audience counting by Boris Trofimov
Spark-driven audience counting by Boris TrofimovSpark-driven audience counting by Boris Trofimov
Spark-driven audience counting by Boris Trofimov
JavaDayUA
 
Code contracts by Dmytro Mindra
Code contracts by Dmytro MindraCode contracts by Dmytro Mindra
Code contracts by Dmytro Mindra
Alex Tumanoff
 
Microsoft Office 2013 новая модель разработки приложений
Microsoft Office 2013 новая модель разработки приложенийMicrosoft Office 2013 новая модель разработки приложений
Microsoft Office 2013 новая модель разработки приложений
Alex Tumanoff
 
LightSwitch - different way to create business applications
LightSwitch - different way to create business applicationsLightSwitch - different way to create business applications
LightSwitch - different way to create business applications
Alex Tumanoff
 
Enterprise or not to enterprise
Enterprise or not to enterpriseEnterprise or not to enterprise
Enterprise or not to enterprise
Alex Tumanoff
 
Kostenko ux november-2014_1
Kostenko ux november-2014_1Kostenko ux november-2014_1
Kostenko ux november-2014_1
Alex Tumanoff
 
Purely Functional Data Structures in Scala
Purely Functional Data Structures in ScalaPurely Functional Data Structures in Scala
Purely Functional Data Structures in Scala
Vladimir Kostyukov
 
Distilled mongo db by Boris Trofimov
Distilled mongo db by Boris TrofimovDistilled mongo db by Boris Trofimov
Distilled mongo db by Boris Trofimov
Alex Tumanoff
 
Spark-driven audience counting by Boris Trofimov
Spark-driven audience counting by Boris TrofimovSpark-driven audience counting by Boris Trofimov
Spark-driven audience counting by Boris Trofimov
JavaDayUA
 
Code contracts by Dmytro Mindra
Code contracts by Dmytro MindraCode contracts by Dmytro Mindra
Code contracts by Dmytro Mindra
Alex Tumanoff
 
Microsoft Office 2013 новая модель разработки приложений
Microsoft Office 2013 новая модель разработки приложенийMicrosoft Office 2013 новая модель разработки приложений
Microsoft Office 2013 новая модель разработки приложений
Alex Tumanoff
 
LightSwitch - different way to create business applications
LightSwitch - different way to create business applicationsLightSwitch - different way to create business applications
LightSwitch - different way to create business applications
Alex Tumanoff
 
Enterprise or not to enterprise
Enterprise or not to enterpriseEnterprise or not to enterprise
Enterprise or not to enterprise
Alex Tumanoff
 
Kostenko ux november-2014_1
Kostenko ux november-2014_1Kostenko ux november-2014_1
Kostenko ux november-2014_1
Alex Tumanoff
 
Purely Functional Data Structures in Scala
Purely Functional Data Structures in ScalaPurely Functional Data Structures in Scala
Purely Functional Data Structures in Scala
Vladimir Kostyukov
 

Similar to Java/Scala Lab: Борис Трофимов - Обжигающая Big Data. (20)

Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
b0ris_1
 
2011 Mongo FR - MongoDB introduction
2011 Mongo FR - MongoDB introduction2011 Mongo FR - MongoDB introduction
2011 Mongo FR - MongoDB introduction
antoinegirbal
 
Scalding Big (Ad)ta
Scalding Big (Ad)taScalding Big (Ad)ta
Scalding Big (Ad)ta
b0ris_1
 
Siddhi - cloud-native stream processor
Siddhi - cloud-native stream processorSiddhi - cloud-native stream processor
Siddhi - cloud-native stream processor
Sriskandarajah Suhothayan
 
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
WSO2
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
MongoDB
 
OrientDB - The 2nd generation of (multi-model) NoSQL
OrientDB - The 2nd generation of  (multi-model) NoSQLOrientDB - The 2nd generation of  (multi-model) NoSQL
OrientDB - The 2nd generation of (multi-model) NoSQL
Roberto Franchini
 
OSCON 2011 CouchApps
OSCON 2011 CouchAppsOSCON 2011 CouchApps
OSCON 2011 CouchApps
Bradley Holt
 
Learning with F#
Learning with F#Learning with F#
Learning with F#
Phillip Trelford
 
MongoDB World 2018: Keynote
MongoDB World 2018: KeynoteMongoDB World 2018: Keynote
MongoDB World 2018: Keynote
MongoDB
 
Mongodb intro
Mongodb introMongodb intro
Mongodb intro
christkv
 
Streaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talkStreaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talk
Amrit Sarkar
 
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Lucidworks
 
How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...
Antonios Giannopoulos
 
Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revised
MongoDB
 
Eagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessEagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational Awareness
MongoDB
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
DataStax
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
Databricks
 
Cubes 1.0 Overview
Cubes 1.0 OverviewCubes 1.0 Overview
Cubes 1.0 Overview
Stefan Urbanek
 
MongoDB Meetup
MongoDB MeetupMongoDB Meetup
MongoDB Meetup
Maxime Beugnet
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
b0ris_1
 
2011 Mongo FR - MongoDB introduction
2011 Mongo FR - MongoDB introduction2011 Mongo FR - MongoDB introduction
2011 Mongo FR - MongoDB introduction
antoinegirbal
 
Scalding Big (Ad)ta
Scalding Big (Ad)taScalding Big (Ad)ta
Scalding Big (Ad)ta
b0ris_1
 
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise
WSO2
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
MongoDB
 
OrientDB - The 2nd generation of (multi-model) NoSQL
OrientDB - The 2nd generation of  (multi-model) NoSQLOrientDB - The 2nd generation of  (multi-model) NoSQL
OrientDB - The 2nd generation of (multi-model) NoSQL
Roberto Franchini
 
OSCON 2011 CouchApps
OSCON 2011 CouchAppsOSCON 2011 CouchApps
OSCON 2011 CouchApps
Bradley Holt
 
MongoDB World 2018: Keynote
MongoDB World 2018: KeynoteMongoDB World 2018: Keynote
MongoDB World 2018: Keynote
MongoDB
 
Mongodb intro
Mongodb introMongodb intro
Mongodb intro
christkv
 
Streaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talkStreaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talk
Amrit Sarkar
 
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Lucidworks
 
How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...
Antonios Giannopoulos
 
Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revised
MongoDB
 
Eagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessEagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational Awareness
MongoDB
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
DataStax
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
Databricks
 

More from GeeksLab Odessa (20)

DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
GeeksLab Odessa
 
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
GeeksLab Odessa
 
DataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский ВикторDataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский Виктор
GeeksLab Odessa
 
DataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображениеDataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображение
GeeksLab Odessa
 
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
GeeksLab Odessa
 
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
GeeksLab Odessa
 
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
GeeksLab Odessa
 
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
GeeksLab Odessa
 
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
GeeksLab Odessa
 
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
GeeksLab Odessa
 
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
GeeksLab Odessa
 
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
GeeksLab Odessa
 
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
GeeksLab Odessa
 
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
GeeksLab Odessa
 
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
GeeksLab Odessa
 
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
GeeksLab Odessa
 
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
GeeksLab Odessa
 
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
GeeksLab Odessa
 
DataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский ВикторDataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский Виктор
GeeksLab Odessa
 
DataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображениеDataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображение
GeeksLab Odessa
 
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
GeeksLab Odessa
 
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
GeeksLab Odessa
 
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
GeeksLab Odessa
 
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
GeeksLab Odessa
 
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
GeeksLab Odessa
 
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
GeeksLab Odessa
 
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
GeeksLab Odessa
 
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
GeeksLab Odessa
 
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
GeeksLab Odessa
 
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
GeeksLab Odessa
 
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
GeeksLab Odessa
 
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
GeeksLab Odessa
 

Recently uploaded (20)

Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 

Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.

  • 1. Scalding Big ADta или обжигая горшки с рекламой Boris Trofimov @b0ris_1
  • 2. Agenda •Two stories on how AD is served inside AD company •Awesome Scalding The stories mention one company that has built multimillion-dollar business over ordinary cookies
  • 3. The story about shoes or Big Brother is watching on you
  • 4. We will answer this question in a few slides or be careful while buying shoes
  • 5. What is common between these things?
  • 7. They are simple just for the first glance The same with loading web sites
  • 8. Open any site with Ad
  • 9. … 1 sec 20 ms 100 ms 150 ms Publisher receives request Publisher sends response Content delivered to user 170 Site sends request to Ad Server 200 80 ms 280 SSP picks the winning bid and sends redirect url back to ad Server Every bidder/DSP receives info about user: •ssp_cookie_id •geo data •site url 300 SSP (Ad Exchange) receives ad request and opens RTB Auction 210 Ad Server receives ad request and redirects to Ad Exchange All bidders should send their decision (participate? & price) back 350 Ad Server shows page to user which redirects to the bidder’s server User’s web page asks Ad banner from CDN Showing ad & bidder’s 1x1 pixel (impression) 400 The first second… ~70% users have this cookie aboard >>1 independent companies take part in this auction
  • 10. Return info about new user’s interests with special markers (segments) that indicates the new fact about user, e.g user is man who has iphone and lives in NYC and has dog. Major format: <cookie_id – segment_id> Data Scientists Real time Offline Pixel Tracking Farm Warehouse Bidder Farm Auction requests SSP Ad Exchange Hourly Logs 3rd part data House holders data … Hadoop’s HDFS Updating user profiles Hive Oozie MapReduce Partners HBASE Scalding hbase keeps user profiles Update user’s profiles with new segments Data export Brand new feed about user interests 2 3 4 5 6 7 8 9 0 1 •Impressions •Clicks •Post-click Activities 5
  • 11. Why do we need all this science? •Deep audience targeting •Case: customer would like to show ad for all men who live in NYC have iPhone and dog
  • 12. Facts about Data Scientists •Data Scientists do: –Audience Modeling identifying new user interests [segments] and finding ways to track them –Audience Bridging –Insights and Analytics •They use IBM Netezza as local warehouse •They use R language
  • 13. Facts about Realtime team •Scala, Java •Restful Services •Akka •In Memory Cache : Aerospike, Redis
  • 14. Facts about Offline team •The tasks we solve over hadoop: –As a Storage to keep all logs we need –As Profile DB to keep all users and their interests [segments] –As MapReduce Engine to run jobs on transformations between data –As a Warehouse to export data via hive •We use Clouderra CDH 5.1.2 •Major language: Scala •Pure MapReduce jobs & Scalding/Cascading •All map reduce applications are wrapped by Oozie’s workflow(s) •Developing nextgen paltform version based on Spark Streaming/Kafka
  • 16. hdfs Scalding in a nutshell •Concise DSL •Configurable Source(s) and sink(s) •Data transform operations: –map/flatMap –pivot/unpivot –project –groupBy/reduce/foldLeft hdfs
  • 17. Just one example (Java way) public class WordCount { public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, one); } } } public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } } public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); Job job = new Job(conf, "wordcount"); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(Map.class); job.setReducerClass(Reduce.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.waitForCompletion(true); } }
  • 18. Source Just one example (Scalding way) class WordCountJob(args : Args) extends Job(args) { TextLine( args("input") ) .flatMap('line -> 'word) { line : String => tokenize(line) } .groupBy('word) { _.size } .write( Tsv( args("output") ) ) // Split a piece of text into individual words. def tokenize(text : String) : Array[String] = { // Lowercase each word and remove punctuation. text.toLowerCase.split("s+") } } Sink Transform operations
  • 19. Use Case 1 Split •Motivation: reuse calculated streams val common = Tsv("./file").map(...) val branch1 = common.map(..).write(Tsv("output")) val branch2 = common.groupby(..).write(Tsv("output"))
  • 20. Use Case 2 Exotic Sources JDBC (out of the box) case object YourTableSource extends JDBCSource { override val tableName = "tableName" override val columns = List( varchar("col1", 64), date("col2"), tinyint("col3"), double("col4"), ) override def currentConfig = ConnectionSpec("www.gt.com", "username", "password", "mysql") } YourTableSource.read.map(...) ...
  • 21. Use Case 2 Exotic Sources HBASE HBaseSource (https://github.com/ParallelAI/SpyGlass) •SCAN_ALL, •GET_LIST, •SCAN_RANGE HBaseRawSource (https://github.com/andry1/SpyGlass) •Advanced filtering via base64Scan val hbs3 = new HBaseSource( tableName, quorum, 'key, List("data"), List('data), sourceMode = SourceMode.SCAN_ALL) .read val scan = new Scan() scan.setCaching(caching) val activity_filters = new FilterList(MUST_PASS_ONE, { val scvf = new SingleColumnValueFilter(toBytes("family"), toBytes("column"), GREATER_OR_EQUAL, toBytes(value)) scvf.setFilterIfMissing(true) scvf.setLatestVersionOnly(true) val scvf2 = ... List(scvf, scvf2) }) scan.setFilter(activity_filters) new HBaseRawSource(tableName, quorum, families, base64Scan = convertScanToBase64(scan)).read. ...
  • 22. Use Case 3 Join •Motivation: joining two streams by key •Different join strategies: –joinWithLarger –joinWithSmaller –joinWithTiny •Inner, Left, Right, strategies val pipe1 = Tsv("file1").read val pipe2 = Tsv("file2").read // small file val pipe3 = Tsv("file3").read // huge file val joinedPipe = pipe1.joinWithTiny('id1 -> 'id2, pipe2) val joinedPipe2 = pipe1.joinWithLarge('id1 -> 'id2, pipe3)
  • 23. Use Case 4 Distributed Caching and Counters //somewhere outside Job definition val fl = DistributedCacheFile("/user/boris/zooKeeper.json") // next value can be passed through any Scalding's jobs via Args object for instance val fileName = fl.path ... class Job(val args:Args) { // once we receive fl.path we can read it like a ordinary file val fileName = args.get("fileName") lazy val data = readJSONFromFile(fileName) ... TSV(args.get("input")).read.map('line -> 'word ) { line => ... /* using data json object*/ ... } } // counter example Stat("jdbc.call.counter","myapp").incBy(1)
  • 24. Use Case 5 Bridging Profiles Motivation: bridge information from different sources and build complete person profile imp Own company’s private cookie thanks to 1x1 pixel impression Bridging two ssp_cookies via private cookie ssp_cookie_Id1 ssp_cookie_Id2 Bridging via ip address
  • 25. Bridging Profiles General task definition: •Build graph •Identify connected components
  • 26. Connected components Let’s scalding it class ConnectedComponentsJob(args : Args) extends Job(args) { var attempt = 0 while( attempt < 20 ) { val vertexes = Tsv( args("vertexes") ).read // ‘vertex t ‘gid, by default it is equal to vertex val edges = Tsv( args("edges") ) // 'gid_a t 'gid_b var groups = vertexes.joinWithSmaller('id -> 'id_b, vertexes.joinWithSmaller('id -> 'id_a, edges).discard('id ).rename('gid ->'gid_a)) .discard('id ) .rename('gid ->'gid_b) groups = groups.filter('gid_a, 'gid_b) {gid : (String, String) => gid._1 != gid._2 } .project ('gid_a, 'gid_b) .map(('gid_a, 'gid_b) -> ('gid_a, 'gid_b)) {gid : (Integer, Integer) =>(max(gid._1, gid._2), min(gid._1, gid._2))} val count = groups.groupBy( ('gid_a, 'gid_b) ){ _.size } if (count==0) attempt = 20 val new_groups = groups.groupBy('gid_a) {_.min('gid_b)}.rename(('source, 'target)) val new_vertexes = vertexes.joinWithSmaller('id -> 'source, groups, joiner = new LeftJoin ) .mapTo( ('id,'gid,'source,'target)->('id, 'gid)) { param:(Integer, Integer, Integer, Integer) => val (id, gid, source,target) = param if (target != null) ( id , min( gid, target ) ) else ( id, gid ) } new_vertexes.write( Tsv( args("vertexes") ) ) } }
  • 27. Other nice things •Typed pipes •Elegant and fast Matrix operations •Simple migration on Spark/Kafka •Way to retrieve data from hive’s hcatalog
  • 28. Useful Resources •http://www.adopsinsider.com/ad-serving/how-does-ad-serving- work/ •http://www.adopsinsider.com/ad-serving/diagramming-the-ssp- dsp-and-rtb-redirect-path/ •https://github.com/twitter/scalding •https://github.com/ParallelAI/SpyGlass •https://github.com/branky/cascading.hive