From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets with Avi Aminov

The document discusses methods for detecting botnets with data analysis techniques, including behavioral pattern analysis and domain blacklisting. It highlights the challenges of training models and running them in production, and proposes solutions such as Scala/Spark pipelines and the Predictive Model Markup Language (PMML) for efficient model export. The key lessons learned are to adapt the workflow to the scale of the data and to rely on existing frameworks for model management.
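The abstract names domain blacklisting as one detection signal but does not say how matching is done. The Python sketch below shows one common way, under stated assumptions. It assumes the blacklist is a set of domain names and that a lookup should also match when only a parent domain is listed. It also assumes DNS activity arrives as `(host, queried_domain)` pairs, a hypothetical format. The names `matches_blacklist` and `flag_hosts` are illustrative and not taken from the talk.

```python
from collections import defaultdict


def matches_blacklist(domain: str, blacklist: set) -> bool:
    """Return True if the domain or any of its parent domains is blacklisted.

    Assumption (hypothetical): blacklist entries are lowercase domain names
    without a trailing dot.
    """
    labels = domain.lower().rstrip(".").split(".")
    # For "a.b.example.com", check "a.b.example.com", "b.example.com",
    # "example.com" and "com" in turn.
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


def flag_hosts(dns_log, blacklist: set) -> dict:
    """Group blacklisted lookups by the host that made them.

    dns_log: iterable of (host, queried_domain) pairs (hypothetical format).
    Returns a dict mapping each flagged host to the set of domains it queried.
    """
    hits = defaultdict(set)
    for host, domain in dns_log:
        if matches_blacklist(domain, blacklist):
            hits[host].add(domain)
    return dict(hits)


if __name__ == "__main__":
    blacklist = {"evil-c2.example"}
    log = [
        ("10.0.0.5", "update.evil-c2.example"),
        ("10.0.0.7", "www.python.org"),
        ("10.0.0.5", "evil-c2.example."),
    ]
    print(flag_hosts(log, blacklist))
```

Matching on parent domains catches subdomains generated under a listed domain. Because the check walks label boundaries rather than raw string suffixes, an unrelated name such as `notevil-c2.example` is not flagged. At the scale the talk describes, the same per-record check could be applied inside a distributed job rather than a Python loop.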
