In machine learning, training large models on a massive amount of data usually improves results. Our customers report, however, that training and deploying such models is either operationally prohibitive or outright impossible for them. We created a collection of machine learning algorithms that scale to any amount of data, including k-means clustering for data segmentation, factorization machines for recommendations, time-series forecasting, linear regression, topic modeling, and image classification. This talk will discuss those algorithms and explain where and how they can be used.
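To make the segmentation use case concrete, here is a minimal k-means sketch in Python on made-up customer data. It uses scikit-learn rather than the large-scale implementation discussed in the talk, and the features and cluster count are assumptions for illustration only.

```python
# Minimal k-means segmentation sketch (scikit-learn, synthetic data) -- not the
# large-scale implementation discussed in the talk.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical customer features: [monthly_spend, visits_per_month]
X = np.vstack([
    rng.normal([20, 2], [5, 1], size=(100, 2)),
    rng.normal([80, 10], [10, 2], size=(100, 2)),
    rng.normal([150, 4], [20, 1], size=(100, 2)),
])

X_scaled = StandardScaler().fit_transform(X)   # k-means is sensitive to feature scale
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
print("segment sizes:", np.bincount(model.labels_))
```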
Machine learning and linear regression programming - Soumya Mukherjee
Overview of AI and ML
Terminology awareness
Applications in real world
Use cases within Nokia
Types of Learning
Regression
Classification
Clustering
Linear Regression (Single Variable) with Python (a minimal sketch follows below)
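A minimal single-variable linear regression sketch in Python, fitting y = b0 + b1*x by ordinary least squares; the data is made up for illustration and is not the presenter's original example.

```python
# Single-variable linear regression sketch on hypothetical data:
# fit y = b0 + b1*x by ordinary least squares.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # e.g. years of experience
y = np.array([30.0, 35.0, 41.0, 44.0, 52.0])  # e.g. salary in thousands

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"y = {b0:.2f} + {b1:.2f} * x, prediction at x=6: {b0 + b1 * 6:.2f}")
```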
Leveraging Machine Learning and AI to detect credit card fraud and suspicious transactions. The aim of this presentation is to help you improve your knowledge of Machine Learning and to start developing multiple families of algorithms in Python.
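As an illustration only, a minimal fraud-classifier sketch in Python follows; the column names, the synthetic transactions, and the choice of logistic regression are assumptions, not the presentation's actual dataset or algorithm families.

```python
# Hypothetical sketch of a simple fraud classifier; the columns ("amount",
# "hour", "is_fraud") and the synthetic data are assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "amount": rng.exponential(50, n),
    "hour": rng.integers(0, 24, n),
})
# Make large, late-night transactions more likely to be labeled "fraud".
p = 1 / (1 + np.exp(-(0.02 * df["amount"] + 0.1 * (df["hour"] > 22) - 4)))
df["is_fraud"] = rng.random(n) < p

X_train, X_test, y_train, y_test = train_test_split(
    df[["amount", "hour"]], df["is_fraud"], test_size=0.25, random_state=0)
clf = LogisticRegression(class_weight="balanced").fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```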
APPROACHES IN USING EXPECTATION-MAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ... - cscpconf
The EM algorithm is popular for maximum likelihood estimation of the parameters of state-space models. However, existing realizations of the EM algorithm cannot identify systems that have external inputs and constrained parameters. In this paper, we propose new approaches for both the initial guess and the MLE of the parameters of a constrained state-space model with an external input. Using weighted least squares for the initial guess and partial differentiation of the joint log-likelihood function for the EM algorithm, we estimate the parameters and compare the estimated values with the “actual” values that were set to generate the simulation data. Moreover, asymptotic variances of the estimated parameters are calculated when the sample size is large, while statistics of the estimated parameters are obtained through bootstrapping when the sample size is small. The results demonstrate that the estimated values are close to the “actual” values. Consequently, our approaches are promising and can be applied in future research.
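The constrained state-space EM with external inputs from the paper is not reproduced here; the following minimal sketch runs EM on a much simpler model, a two-component 1-D Gaussian mixture, purely to illustrate the E-step/M-step alternation behind maximum likelihood estimation.

```python
# Minimal EM illustration for a two-component 1-D Gaussian mixture
# (a far simpler model than the constrained state-space model in the paper).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1.5, 200)])

pi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    # E-step: posterior responsibility of component 1 for each point
    p0 = (1 - pi) * norm.pdf(x, mu[0], sigma[0])
    p1 = pi * norm.pdf(x, mu[1], sigma[1])
    r = p1 / (p0 + p1)
    # M-step: re-estimate mixing weight, means and standard deviations
    pi = r.mean()
    mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
    sigma = np.array([
        np.sqrt(np.average((x - mu[0]) ** 2, weights=1 - r)),
        np.sqrt(np.average((x - mu[1]) ** 2, weights=r)),
    ])
print("weights:", 1 - pi, pi, "means:", mu, "stds:", sigma)
```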
A tour of the top 10 algorithms for machine learning newbies - Vimal Gupta
The document summarizes the top 10 machine learning algorithms for machine learning newbies. It discusses linear regression, logistic regression, linear discriminant analysis, classification and regression trees, naive bayes, k-nearest neighbors, and learning vector quantization. For each algorithm, it provides a brief overview of the model representation and how predictions are made. The document emphasizes that no single algorithm is best and recommends trying multiple algorithms to find the best one for the given problem and dataset.
This document provides an introduction to computer simulation. It discusses how simulation can be used to model real systems on a computer in order to understand system behavior and evaluate alternatives. It describes different types of models including iconic, symbolic, deterministic, stochastic, static, dynamic, continuous and discrete models. Monte Carlo simulation is introduced as a technique that uses random numbers. The document outlines the steps in a simulation study and provides examples of systems and their components that can be modeled using simulation.
Inference & Learning in Linear Chain Conditional Random Fields (CRFs) - Anmol Dwivedi
This mini-project considers inference and learning in linear chain CRFs, in particular an application to handwritten word recognition. Handwritten word recognition is a task many have explored with different machine learning methods. Written characters can be evaluated individually or as part of a whole word to account for the context between characters. In this mini-project, we use linear chain CRF models to exploit the context between the characters of a word and improve word recognition accuracy.
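The sketch below illustrates only the decoding step such a model relies on: Viterbi decoding over a linear chain with unary (per-character) and pairwise (transition) scores. The scores are random placeholders, not parameters learned by the mini-project's CRF.

```python
# Minimal Viterbi decoding sketch for a linear chain: unary scores per
# position/label plus pairwise transition scores. A trained CRF would supply
# real scores; random values are used here just to show the recursion.
import numpy as np

rng = np.random.default_rng(0)
T, L = 5, 26                        # word length, number of letter labels
unary = rng.normal(size=(T, L))     # score of label l at position t
pairwise = rng.normal(size=(L, L))  # score of transition l -> l'

delta = unary[0].copy()
backptr = np.zeros((T, L), dtype=int)
for t in range(1, T):
    scores = delta[:, None] + pairwise + unary[t][None, :]
    backptr[t] = scores.argmax(axis=0)   # best previous label for each label
    delta = scores.max(axis=0)

labels = [int(delta.argmax())]
for t in range(T - 1, 0, -1):
    labels.append(int(backptr[t, labels[-1]]))
word = "".join(chr(ord("a") + l) for l in reversed(labels))
print("best-scoring label sequence:", word)
```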
Tutorial on Markov Random Fields (MRFs) for Computer Vision Applications - Anmol Dwivedi
The goal of this mini-project is to implement a pairwise binary label-observation Markov Random Field model for bi-level image segmentation. Specifically, two inference algorithms, the Iterative Conditional Mode (ICM) and Gibbs sampling methods, will be implemented to perform image segmentation.
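A minimal ICM sketch for such a pairwise MRF is shown below; the toy image, noise level, and smoothness weight are assumptions, and the Gibbs sampling variant is omitted.

```python
# Minimal ICM sketch for bi-level segmentation with a pairwise MRF:
# labels s in {-1, +1}, data term = squared error to the noisy pixel,
# smoothness term = -beta * (agreement with the 4-neighbours).
import numpy as np

rng = np.random.default_rng(0)
clean = np.zeros((40, 40)); clean[10:30, 10:30] = 1.0   # toy binary image
noisy = clean + rng.normal(0, 0.4, clean.shape)         # observation

beta = 1.5
labels = (noisy > 0.5).astype(int) * 2 - 1              # init labels in {-1, +1}
values = {-1: 0.0, 1: 1.0}                              # mean intensity per label

for _ in range(10):                                      # ICM sweeps
    for i in range(1, 39):
        for j in range(1, 39):
            nb = labels[i-1, j] + labels[i+1, j] + labels[i, j-1] + labels[i, j+1]
            energies = {s: (noisy[i, j] - values[s]) ** 2 - beta * s * nb
                        for s in (-1, 1)}
            labels[i, j] = min(energies, key=energies.get)

accuracy = ((labels > 0) == (clean > 0.5)).mean()
print(f"pixel agreement with the clean image: {accuracy:.2%}")
```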
This document discusses various machine learning concepts related to data processing, feature selection, dimensionality reduction, feature encoding, feature engineering, dataset construction, and model tuning. It covers techniques like principal component analysis, singular value decomposition, correlation, covariance, label encoding, one-hot encoding, normalization, discretization, imputation, and more. It also discusses different machine learning algorithm types, categories, representations, libraries and frameworks for model tuning.
This document provides an overview of supervised and unsupervised learning, with a focus on clustering as an unsupervised learning technique. It describes the basic concepts of clustering, including how clustering groups similar data points together without labeled categories. It then covers two main clustering algorithms - k-means, a partitional clustering method, and hierarchical clustering. It discusses aspects like cluster representation, distance functions, strengths and weaknesses of different approaches. The document aims to introduce clustering and compare it with supervised learning.
1) The document discusses various methods for interpreting machine learning models, including global and local surrogate models, feature importance plots, Shapley values, partial dependence plots, and individual conditional expectation plots (a minimal feature-importance sketch follows after this list).
2) It explains that interpretability refers to how understandable the reasons for a model's predictions are to humans. Interpretability methods can provide global explanations of entire models or local explanations of individual predictions.
3) The document advocates that improving interpretability is important for addressing issues like bias in machine learning systems and increasing trust in applications used for high-stakes decisions like criminal justice.
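To make the feature-importance idea concrete, here is a minimal permutation-importance sketch; the model, dataset, and features are synthetic assumptions, not an implementation from the document.

```python
# Permutation feature importance: shuffle one feature at a time and measure
# how much test accuracy drops -- a global explanation of which inputs the
# model relies on (synthetic data and model for illustration).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```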
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE... - ijaia
A support vector machine (SVM) learns the decision surface from two different classes of input points; in several applications, some of the input points are misclassified. In this paper, a bi-objective quadratic programming model is utilized, and different feature quality measures are optimized simultaneously using the weighting method to solve our bi-objective quadratic programming problem. An important contribution of the proposed bi-objective quadratic programming model is obtaining different efficient support vectors by changing the weighting values. The numerical examples give evidence of the effectiveness of the weighting parameters in reducing the misclassification between the two classes of input points. An interactive procedure is added to identify the best compromise solution from the generated efficient solutions.
The document discusses sequence diagrams and their use in system analysis and design. Sequence diagrams show the interactions between objects in a system and the order that messages and method calls are made. They can incorporate elements like alternate paths using frames to represent conditional logic. While not required, sequence diagrams are useful for testing a system design by validating interactions and method accessibility between classes.
The document discusses various matrix and tensor tools for computer vision, including principal component analysis (PCA), singular value decomposition (SVD), robust PCA, low-rank representation, non-negative matrix factorization, tensor decompositions, and incremental methods for SVD and tensor learning. It provides definitions and explanations of the techniques along with references for further information.
The document discusses template matching techniques for image analysis. It describes intensity-based template matching using metrics like the sum of squared differences and normalized correlation to measure similarity between template and image intensities. Feature-based template matching is also covered, using distance transforms of binary edge images and metrics like the chamfer distance and Hausdorff distance. The document proposes using edge orientation information and spatial coherence of matches to make template matching more robust. It suggests hierarchical search techniques to efficiently prune the search space for the optimal template position.
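A brute-force sketch of the sum-of-squared-differences (SSD) score on a synthetic image is shown below; the hierarchical search and edge-based variants mentioned in the document are not included.

```python
# Minimal intensity-based template matching sketch using the SSD score;
# the image and template are synthetic.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((60, 60))
template = image[20:30, 35:45].copy()        # "hide" the template at (20, 35)

th, tw = template.shape
best, best_pos = np.inf, None
for y in range(image.shape[0] - th + 1):
    for x in range(image.shape[1] - tw + 1):
        ssd = np.sum((image[y:y+th, x:x+tw] - template) ** 2)
        if ssd < best:
            best, best_pos = ssd, (y, x)
print("best match at", best_pos, "with SSD", best)
```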
IRJET - Application of Linear Algebra in Machine Learning - IRJET Journal
This document discusses the application of linear algebra concepts in machine learning. It begins with an introduction to linear algebra and key concepts like vectors, matrices, and linear transformations. It then provides an introduction to machine learning, including the different types of machine learning algorithms like supervised, unsupervised, and reinforcement learning. It discusses how machine learning is closely related to statistics and introduces some common statistical concepts. Finally, it discusses how linear algebra is widely used in machine learning algorithms like linear regression and support vector machines. Linear algebra allows machine learning models to represent data and map it to specific feature spaces.
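As a small illustration of that connection, the sketch below solves linear regression through the normal equations with plain numpy; the synthetic data is an assumption for illustration, not the paper's worked example.

```python
# Linear-algebra view of linear regression: solve the normal equations
# (X^T X) w = X^T y on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.random((50, 2))])  # bias column + 2 features
true_w = np.array([1.0, 2.0, -3.0])
y = X @ true_w + rng.normal(0, 0.05, 50)

w = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares weights
print("recovered weights:", np.round(w, 2))
```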
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch - Jisang Yoon
1) AutoML-Zero is a framework that uses an evolutionary algorithm to evolve machine learning algorithms from basic mathematical operations, with minimal human constraints on the search space.
2) Experiments showed AutoML-Zero could find simple neural networks like linear and nonlinear regression models in difficult search spaces, outperforming random search.
3) When applied to image classification tasks on MNIST and CIFAR-10, the discovered algorithms achieved performance on par or better than standard models like logistic regression and multilayer perceptrons, trained with minimal human input.
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection - IJECEIAES
The Branch-and-Bound algorithm is the basis for the majority of solution methods in mixed integer linear programming, and it has proven its efficiency in different fields. It gradually builds a tree of nodes by adopting two strategies: a variable selection strategy and a node selection strategy. In our previous work, we tested a methodology for learning branch-and-bound strategies by using regression-based support vector machines twice. That methodology first exploits information from previous executions of the Branch-and-Bound algorithm on other instances; second, it creates an information channel between the node selection strategy and the variable branching strategy; and third, it gives good results in terms of running time compared to the standard Branch-and-Bound algorithm. In this work, we focus on increasing SVM performance by using cross validation coupled with model selection.
Surrogate modeling for industrial design - Shinwoo Jang
We describe GTApprox, a new tool for medium-scale surrogate modeling in industrial design. Compared to existing software, GTApprox brings several innovations: a few novel approximation algorithms, several advanced methods of automated model selection, and novel options in the form of hints. We demonstrate the efficiency of GTApprox on a large collection of test problems. In addition, we describe several applications of GTApprox to real engineering problems.
It covers all the basics of MATLAB required for beginners. After going through these slides, anyone can write a MATLAB program and apply it to their field of interest.
This document discusses various applications of interpolation in computer science and engineering. It describes interpolation as a method of constructing new data points within the range of a known discrete data set. Some examples of interpolation applications mentioned include estimating population values, image processing through transformations like resizing and rotation, zooming digital images using different interpolation functions, and ray tracing in computer graphics. Numerical integration techniques like the trapezoidal rule, Simpson's rule, and Romberg's method are also briefly covered.
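Two of the techniques mentioned, linear interpolation within a known data set and trapezoidal-rule integration, can be sketched in a few lines of Python; the data below is made up for illustration.

```python
# Linear interpolation of a new point inside a known data set, and numerical
# integration with the trapezoidal rule (illustrative data only).
import numpy as np

years = np.array([2000, 2010, 2020])
population = np.array([6.1, 6.9, 7.8])   # e.g. billions, purely illustrative
print("estimated 2015 value:", np.interp(2015, years, population))

x = np.linspace(0, np.pi, 101)
y = np.sin(x)
trapezoid = np.sum((y[:-1] + y[1:]) / 2 * np.diff(x))  # integral of sin on [0, pi] ~ 2
print("trapezoidal estimate:", trapezoid)
```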
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS... - ijcsit
Enterprise financial distress or failure includes bankruptcy prediction, financial distress, corporate performance prediction, and credit risk estimation. The aim of this paper is to use wavelet networks in non-linear combination prediction to address a problem of the ARMA (Auto-Regressive and Moving Average) model: the ARMA model needs to estimate the value of all parameters in the model, which requires a large amount of computation. To this end, the paper provides an extensive review of wavelet networks and logistic regression. It discusses the wavelet neural network structure, the wavelet network model training algorithm, and accuracy and error rates (accuracy of classification, Type I error, and Type II error). The main research contribution is a proposed business failure prediction model (a wavelet network model and a logistic regression model). The empirical research, a comparison of the wavelet network and logistic regression on training and forecasting samples, shows that the wavelet network model is highly accurate: in overall prediction accuracy, Type I error, and Type II error, the wavelet network model is better than the logistic regression model.
This document describes an experiment using a least squares method to identify the dynamic model of a level control system in a didactic plant. The plant uses a Foundation Fieldbus communication protocol. The experiment applies a PRBS signal to excite the system and records the input and output signals. It then uses a non-recursive least squares estimator to identify the system and approximate its behavior with a second order transfer function. The results showed that the identification technique was able to accurately model the dynamic response of the level control loop.
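The plant experiment itself cannot be reproduced here, but a minimal non-recursive least-squares identification sketch on a simulated first-order system excited by a random binary signal conveys the idea; the system, noise level, and excitation are assumptions.

```python
# Non-recursive least-squares identification sketch: a simulated first-order
# system y[k] = a*y[k-1] + b*u[k-1] + noise, excited by a PRBS-like signal.
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true, N = 0.8, 0.5, 500
u = rng.choice([-1.0, 1.0], size=N)              # pseudo-random binary excitation
y = np.zeros(N)
for k in range(1, N):
    y[k] = a_true * y[k-1] + b_true * u[k-1] + rng.normal(0, 0.02)

# Stack regressors [y[k-1], u[k-1]] and solve the least-squares problem.
Phi = np.column_stack([y[:-1], u[:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print("estimated a, b:", np.round(theta, 3), "true:", (a_true, b_true))
```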
The document discusses recommendation systems and machine learning models for recommendations. It covers the goals of recommendation systems, basic models including collaborative filtering, content-based, and knowledge-based systems. Neighborhood-based collaborative filtering is explained along with matrix factorization models. Deep learning methods for recommendations are also summarized, including neural collaborative filtering, graph-based models, and temporal models that handle dynamic graphs.
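A minimal matrix-factorization sketch trained by SGD on toy ratings is shown below; it illustrates the latent-factor idea only and is not one of the specific models (such as neural collaborative filtering) named above.

```python
# Matrix factorization for recommendations: learn user and item latent vectors
# by SGD on observed ratings (toy data for illustration).
import numpy as np

rng = np.random.default_rng(0)
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 1, 4.0), (1, 2, 1.0), (2, 0, 4.0)]
n_users, n_items, k = 3, 3, 2
P = 0.1 * rng.standard_normal((n_users, k))   # user factors
Q = 0.1 * rng.standard_normal((n_items, k))   # item factors

lr, reg = 0.05, 0.02
for _ in range(200):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

print("predicted rating for user 2, item 2:", round(float(P[2] @ Q[2]), 2))
```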
Incremental and Multi-feature Tensor Subspace Learning applied for Background... - Andrews Cordolino Sobral
ICIAR'14 - International Conference on Image Analysis and Recognition. Incremental and Multi-feature Tensor Subspace Learning applied for Background Modeling and Subtraction.
The Factorization Machines algorithm for building recommendation system - Paw... - Evention
One of the most successful examples of data science applications in the Big Data domain is recommendation systems. The goal of my talk is to present the Factorization Machines algorithm, available in the SAS Viya platform.
Factorization Machines are a good choice for making predictions and recommendations from large sparse data, which is typical of Big Data. In the practical part of the presentation, low-granularity data from the NBA league will be used to build an application that recommends optimal game strategies and predicts the results of league games.
This document introduces Factorization Machines, a general model that can mimic many successful factorization models. Factorization Machines accept arbitrary feature vectors as input and enjoy the benefits of factorizing the interactions between variables. The model has properties like expressiveness, multi-linearity, and scalable complexity. It relates to models like matrix factorization, tensor factorization, SVD++, and nearest neighbor models. Experiments show Factorization Machines outperform other models on rating prediction, context-aware recommendation, and tag recommendation tasks.
This document summarizes a presentation on Factorization Machines and Neural Factorization Machines. It begins with an overview of Factorization Machines, describing them as a generic approach that can mimic many factorization models through feature engineering. It then discusses how FM combines the generality of feature engineering with the power of factorization models to model interactions between categorical variables, working well on sparse data. The document then introduces Neural Factorization Machines as an extension of FM to address its limitations, using a multi-layer feedforward neural network as the core component. It concludes by comparing FM and NFM and listing references.
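For reference, the FM prediction rule is y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, and the pairwise term can be computed in O(kn). The sketch below uses random placeholder parameters, not a trained model or any particular library's implementation.

```python
# Sketch of the FM prediction equation for a single sparse feature vector x,
# with the pairwise interaction term computed via the O(k*n) reformulation.
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3                       # number of features, factor dimension
w0 = 0.1
w = rng.normal(0, 0.1, n)         # linear weights
V = rng.normal(0, 0.1, (n, k))    # factor matrix, one row v_i per feature

x = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 2.0])   # sparse input vector

linear = w0 + w @ x
# 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]
interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
print("FM prediction:", linear + interactions)
```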
What are algorithms? How can I build a machine learning model? In machine learning, training large models on a massive amount of data usually improves results. Our customers report, however, that training and deploying such models is either operationally prohibitive or outright impossible for them. At Amazon, we created a collection of machine learning algorithms that scale to any amount of data, including k-means clustering for data segmentation, factorisation machines for recommendations, and time-series forecasting. This talk will discuss those algorithms, explain where and how they can be used, and cover our design choices.
Introduction to the Factorization Machines model with an example. Motivations: why you should have it in your toolbox, the model and its expressiveness, a use case for context-aware recommendations, and Field-Aware Factorization Machines.
Simple representations for learning: factorizations and similarities - Gael Varoquaux
Real-life data seldom comes in the ideal form for statistical learning. This talk focuses on high-dimensional problems for signals and discrete entities: when dealing with many correlated signals or entities, it is useful to extract representations that capture these correlations.
Matrix factorization models provide simple but powerful representations. They are used for recommender systems across discrete entities such as users and products, or to learn good dictionaries to represent images. However, they entail large computing costs on very high-dimensional data, databases with many products, or high-resolution images. I will present an algorithm to factorize huge matrices based on stochastic subsampling that gives up to 10-fold speed-ups [1].
With discrete entities, the explosion of dimensionality may be due to variations in how a smaller number of categories are represented. Such a problem of "dirty categories" is typical of uncurated data sources. I will discuss how encoding this data based on similarities recovers a useful category structure with no preprocessing. I will show how it interpolates between one-hot encoding and techniques used in character-level natural language processing.
[1] Stochastic subsampling for factorizing huge matrices, A. Mensch, J. Mairal, B. Thirion, G. Varoquaux, IEEE Transactions on Signal Processing 66 (1), 113-128.
[2] Similarity encoding for learning with dirty categorical variables, P. Cerda, G. Varoquaux, B. Kégl, Machine Learning (2018): 1-18.
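The idea behind similarity encoding for "dirty" categories can be sketched as follows: each entry is represented by its character 3-gram similarity to a small set of reference categories, instead of a single one-hot column. The strings and the Jaccard similarity below are illustrative assumptions, not the implementation from reference [2].

```python
# Similarity encoding sketch: encode each dirty category value by its
# character 3-gram Jaccard similarity to a set of reference categories.
def ngrams(s, n=3):
    s = f" {s.lower()} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def similarity(a, b):
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb)

reference = ["machine learning", "data engineering", "statistics"]
dirty = ["machine-learning ", "Machne Learning", "data eng.", "stats"]

for value in dirty:
    encoding = [round(similarity(value, ref), 2) for ref in reference]
    print(f"{value!r:22} -> {encoding}")
```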
The document discusses machine learning concepts including supervised and unsupervised learning algorithms like clustering, dimensionality reduction, and classification. It also covers parallel computing strategies for machine learning like partitioning problems across distributed memory architectures.
This document summarizes support vector machines (SVMs), a machine learning technique for classification and regression. SVMs find the optimal separating hyperplane that maximizes the margin between positive and negative examples in the training data. This is achieved by solving a convex optimization problem that minimizes a quadratic function under linear constraints. SVMs can perform non-linear classification by implicitly mapping inputs into a higher-dimensional feature space using kernel functions. They have applications in areas like text categorization due to their ability to handle high-dimensional sparse data.
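A minimal kernel-SVM sketch with scikit-learn illustrates the non-linear case: an RBF kernel implicitly maps concentric-circle data into a space where a separating hyperplane exists. The data and parameters are illustrative, not tuned.

```python
# Kernel SVM sketch: non-linear classification via the RBF kernel
# (synthetic concentric-circle data).
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("test accuracy:", round(clf.score(X_test, y_test), 3))
print("support vectors per class:", clf.n_support_)
```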
Steffen Rendle, Research Scientist, Google at MLconf SF - MLconf
Title: Factorization Machines
Abstract:
Developing accurate recommender systems for a specific problem setting seems to be a complicated and time-consuming task: models have to be defined, learning algorithms derived, and implementations written. In this talk, I present the factorization machine (FM) model, a generic factorization approach that can be adapted to problems by feature engineering. Efficient FM learning algorithms are discussed, among them SGD, ALS/CD, and MCMC inference, including automatic hyperparameter selection. I will show on several tasks, including the Netflix prize and KDDCup 2012, that FMs are flexible and deliver highly competitive accuracy. With FMs, these results can be achieved by simple data preprocessing and without any tuning of regularization parameters or learning rates.
The document discusses several techniques for collaborative filtering and recommendation systems including matrix factorization, convolutional matrix factorization (ConvMF), factorization machines, Bayesian probabilistic matrix factorization (BPMF), and Bayesian personalized ranking (BPR). Matrix factorization decomposes user-item matrices into latent factor vectors to make predictions. ConvMF extends MF by applying a convolutional neural network to model document context. Factorization machines and BPR are techniques for implicit feedback modeling and ranking. BPMF applies Bayesian inference to MF with Markov chain Monte Carlo sampling.
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze... - Codemotion
Increased complexity makes it very hard and time-consuming to keep your software bug-free and secure. We introduce fuzz-testing as a method for automatically and continuously discovering vulnerabilities hidden in your code. The talk will explain how fuzzing works and how to integrate fuzz-testing into your Software Development Life Cycle to increase your code’s security.
Pompili - From hero to_zero: The FatalNoise neverending story - Codemotion
It was 1993 when we decided to venture into a beat 'em up game for the Amiga. Catalypse's success story pushed me and my comrade to create something astonishing for this incredible game machine... but things got harder, assumptions turned out slightly different, and Italian competitors appeared out of nowhere... the project died in 1996. Story ended? Probably not...
The Commodore 65 is a prototype personal computer that Commodore was supposed to bring to market as the successor to the Commodore 64. Unfortunately, its development stopped at the prototype stage. I will tell the fascinating story of its development and why the project was cancelled when it was just one step away from commercial release.
Reliving the thrill of designing an old computer or an arcade cabinet is possible today thanks to FPGAs, programmable logic devices that allow anyone to design their own hardware or recreate hardware from the past. This session tells how, by reverse engineering the hardware of old glories such as the Commodore 64 and the ZX Spectrum, it was possible to bring them back to life with technologies that are now within everyone's reach.
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst... - Codemotion
There's a lot of talk about blockchain, but how does the technology behind it actually work? For developers, getting some hands-on experience is the fastest way to get familiar with new technologies. So let's build a blockchain, then! In this session, we're going to build one in plain old Java, and have it working in 40 minutes. We'll cover key concepts of a blockchain: transactions, blocks, mining, proof-of-work, and reaching consensus in the blockchain network. After this session, you'll have a better understanding of core aspects of blockchain technology.
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019 - Codemotion
When was the last time you were truly lost? Thanks to the maps and location technology in our phones, a whole generation has now grown up in a world where getting lost is truly a thing of the past. Location technology goes far beyond maps in the palm of our hand, however. In this talk, we will explore how a ridesharing app works. How do we discover our destination? How do we find the closest driver? How do we display this information on a map? How do we find the best route? To answer these questions, we will be learning about a variety of location APIs, including Maps, Positioning, Geocoding etc.
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019 - Codemotion
Eward Driehuis, SecureLink's research chief, will guide you through the bumpy ride we call the cyber threat landscape. As the industry has over a decade of experience of dealing with increasingly sophisticated attacks, you might be surprised to hear more attacks slip through the cracks than ever. From analyzing 20.000 of them in 2018, backed by a quarter of a million security events and over ten trillion data points, Eward will outline why this happens, how attacks are changing, and why it doesn't matter how neatly or securely you code.
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 - Codemotion
The IoT revolution has ended. Thanks to hardware improvements, building an intelligent ecosystem is easier than ever before for both startups and large-scale enterprises. The real challenge is now to connect, process, store and analyze data: in the cloud, but also at the edge. We'll give a quick look at frameworks that aggregate dispersed device data into a single, globally optimized system, allowing to improve operational efficiency, predict maintenance, track assets in real time, secure cloud-connected devices and much more.
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A... - Codemotion
What if Virtual Reality glasses could transform your environment into a three-dimensional work of art in realtime, in the style of a painting by Van Gogh? One of the many interesting developments in the field of Deep Learning is the so-called "Style Transfer". It describes a way to create a patchwork (or pastiche) from two images. While one of these images defines the artistic style of the resulting picture, the other one is used for extracting the image content. A team from TNG Technology Consulting managed to build an AI showcase using OpenCV and Tensorflow to realize such goggles.
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul... - Codemotion
The document summarizes some of the security issues with blockchain technology. It discusses how blockchain is not a "silver bullet" and does not inherently solve problems like privacy and security of smart devices. It outlines various application security issues with complex code, protocols, and difficulty of updates on blockchains. Concerns over data immutability and security of smart contracts are also covered. The document questions whether blockchain truly provides the level of decentralization and anonymity claimed, and outlines some impossibility results and limitations of existing approaches to achieving security and privacy in blockchain systems.
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ... - Codemotion
The document provides an overview of the HTTP network protocol in its early stages of development. It summarizes the initial IMP (Interface Message Processor) software used to establish connections and transmit messages over the ARPANET. It outlines some early requirements for host-to-host software to enable simple and advanced use between computer systems. The document also describes the initial host software specifications, including establishing connections, transmitting data efficiently, and implementing error checking between connected systems. This was one of the first documents to define core aspects of the early HTTP network protocol to enable information exchange over the fledgling internet.
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd... - Codemotion
Performance tests are not only an important instrument for understanding a system and its runtime environment. They are also essential for checking stability and scalability – non-functional requirements that might be decisive for success. But won't my cloud hosting service scale for me as long as I can afford it? Yes, but… it only operates and scales resources. It won't automatically make your system fast, stable and scalable. This talk shows how such and comparable questions can be clarified with performance tests and how DevOps teams benefit from regular test practice.
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019 - Codemotion
Sascha will demonstrate the opportunities and challenges of Conversational AI learned in practice. Both technology and user experience will be covered, introducing a process of finding micro-moments, writing happy paths, gathering intents, designing the conversational flow, and finally publishing on almost all channels including Voice Services and Chatbots. Valuable for enterprises, developers, and designers. All live on stage in just minutes and with almost no code.
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019 - Codemotion
A key challenge we face at Pacmed is quickly calibrating and deploying our tools for clinical decision support in different hospitals, where data formats may vary greatly. Using Intensive Care Units as a case study, I’ll delve into our scalable Python pipeline, which leverages Pandas’ split-apply-combine approach to perform complex feature engineering and automatic quality checks on large time-varying data, e.g. vital signs. I’ll show how we use the resulting flexible and interpretable dataframes to quickly (re)train our models to predict mortality, discharge, and medical complications.
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019 - Codemotion
Coolblue is a proud Dutch company with a large internal development department, one that truly takes CI/CD to heart. Empowerment through automation is at the heart of these development teams, and with more than 1000 deployments a day, we think it's working out quite well. In this session, Pat Hermens (a Development Manager) will step you through what enables us to move so quickly, which tools we use, and most importantly, the mindset that is required to enable development teams to deliver at such a rapid pace.
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A... - Codemotion
Quantum computers can use all of the possible pathways generated by quantum decisions to solve problems that will forever remain intractable to classical compute power. As the mega players vie for quantum supremacy and Rigetti announces its $1M "quantum advantage" prize, we live in exciting times. IBM-Q and Microsoft Q# are two ways you can learn to program quantum computers so that you're ready when the quantum revolution comes. I'll demonstrate some quantum solutions to problems that will forever be out of reach of classical, including organic chemistry and large number factorisation.
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen... - Codemotion
Chinese food exploded across America in the early 20th century, rapidly adapting to local tastes while also spreading like wildfire. How was it able to spread so fast? The GY6 is a family of scooter engines that has achieved near total ubiquity in Europe. It is reliable and cheap to manufacture, and it's made in factories across China. How are these factories able to remain afloat? Chinese-American food and the GY6 are both riveting studies in product-market fit, and both are the product of a distributed open source-like development model. What lessons can we learn for open source software?
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019 - Codemotion
The design space has exploded in size within the last few years and Sketch is one of the most important milestones to represent the phenomenon. But behind the scenes of this growing reality there is a remote team that revolutionizes the design space all without leaving the home office. This talk will present how Sketch has grown to become a modern, product designer's tool.
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019 - Codemotion
Would you fly in a plane designed by a craftsman, or would you prefer your aircraft to be designed by engineers? We are learning that science and empiricism work in software development; maybe now is the time to redefine what “Software Engineering” really means. Software isn't bridge-building, and it is not car or aircraft development either, but then neither is Chemical Engineering. Engineering is different in different disciplines. Maybe it is time for us to begin thinking about reclaiming the term "Software Engineering"; maybe it is time to define what our "Engineering" discipline should be.
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019 - Codemotion
What is the job of a CTO and how does it change as a startup grows in size and scale? As a CTO, where should you spend your focus? As an engineer aspiring to be a CTO, what skills should you pursue? In this inspiring and personal talk, I describe my journey from early Red Hat engineer to CTO at Bloomon. I will share my view on what it means to be a CTO, and ultimately answer the question: Should the CTO be coding?
Adtran’s new Ensemble Cloudlet vRouter solution gives service providers a smarter way to replace aging edge routers. With virtual routing, cloud-hosted management and optional design services, the platform makes it easy to deliver high-performance Layer 3 services at lower cost. Discover how this turnkey, subscription-based solution accelerates deployment, supports hosted VNFs and helps boost enterprise ARPU.
AI in Java - MCP in Action, Langchain4J-CDI, SmallRye-LLM, Spring AI - Buhake Sindi
This is the presentation I gave about AI in Java and the work that I have been doing. I showcased the Model Context Protocol (MCP) in Java, creating a server-side MCP server in Java. I also introduced Langchain4J-CDI, previously known as SmallRye-LLM, a CDI-managed tool to inject AI services into enterprise Java applications. Also, an honourable mention: Spring AI.
Measuring Microsoft 365 Copilot and Gen AI Success - Nikki Chapple
Session | Measuring Microsoft 365 Copilot and Gen AI Success with Viva Insights and Purview
Presenter | Nikki Chapple 2 x MVP and Principal Cloud Architect at CloudWay
Event | European Collaboration Conference 2025
Format | In person Germany
Date | 28 May 2025
📊 Measuring Copilot and Gen AI Success with Viva Insights and Purview
Presented by Nikki Chapple – Microsoft 365 MVP & Principal Cloud Architect, CloudWay
How do you measure the success—and manage the risks—of Microsoft 365 Copilot and Generative AI (Gen AI)? In this ECS 2025 session, Microsoft MVP and Principal Cloud Architect Nikki Chapple explores how to go beyond basic usage metrics to gain full-spectrum visibility into AI adoption, business impact, user sentiment, and data security.
🎯 Key Topics Covered:
Microsoft 365 Copilot usage and adoption metrics
Viva Insights Copilot Analytics and Dashboard
Microsoft Purview Data Security Posture Management (DSPM) for AI
Measuring AI readiness, impact, and sentiment
Identifying and mitigating risks from third-party Gen AI tools
Shadow IT, oversharing, and compliance risks
Microsoft 365 Admin Center reports and Copilot Readiness
Power BI-based Copilot Business Impact Report (Preview)
📊 Why AI Measurement Matters: Without meaningful measurement, organizations risk operating in the dark—unable to prove ROI, identify friction points, or detect compliance violations. Nikki presents a unified framework combining quantitative metrics, qualitative insights, and risk monitoring to help organizations:
Prove ROI on AI investments
Drive responsible adoption
Protect sensitive data
Ensure compliance and governance
🔍 Tools and Reports Highlighted:
Microsoft 365 Admin Center: Copilot Overview, Usage, Readiness, Agents, Chat, and Adoption Score
Viva Insights Copilot Dashboard: Readiness, Adoption, Impact, Sentiment
Copilot Business Impact Report: Power BI integration for business outcome mapping
Microsoft Purview DSPM for AI: Discover and govern Copilot and third-party Gen AI usage
🔐 Security and Compliance Insights: Learn how to detect unsanctioned Gen AI tools like ChatGPT, Gemini, and Claude, track oversharing, and apply eDLP and Insider Risk Management (IRM) policies. Understand how to use Microsoft Purview—even without E5 Compliance—to monitor Copilot usage and protect sensitive data.
📈 Who Should Watch: This session is ideal for IT leaders, security professionals, compliance officers, and Microsoft 365 admins looking to:
Maximize the value of Microsoft Copilot
Build a secure, measurable AI strategy
Align AI usage with business goals and compliance requirements
🔗 Read the blog https://nikkichapple.com/measuring-copilot-gen-ai/
DePIN = Real-World Infra + Blockchain
DePIN stands for Decentralized Physical Infrastructure Networks.
It connects physical devices to Web3 using token incentives.
How Does It Work?
Individuals contribute to infrastructure like:
Wireless networks (e.g., Helium)
Storage (e.g., Filecoin)
Sensors, compute, and energy
They earn tokens for their participation.
Fully Open-Source Private Clouds: Freedom, Security, and Control - ShapeBlue
In this presentation, Swen Brüseke introduced proIO's strategy for 100% open-source-driven private clouds. proIO leverages the proven technologies of CloudStack and LINBIT, complemented by professional maintenance contracts, to provide a secure, flexible, and high-performance IT infrastructure. He highlighted the advantages of private clouds compared to public cloud offerings and explained why CloudStack is in many cases a superior solution to Proxmox.
--
The CloudStack European User Group 2025 took place on May 8th in Vienna, Austria. The event once again brought together open-source cloud professionals, contributors, developers, and users for a day of deep technical insights, knowledge sharing, and community connection.
European Accessibility Act & Integrated Accessibility Testing - Julia Undeutsch
Emma Dawson will guide you through two important topics in this session.
Firstly, she will prepare you for the European Accessibility Act (EAA), which comes into effect on 28 June 2025, and show you how development teams can prepare for it.
In the second part of the webinar, Emma Dawson will explore with you various integrated testing methods and tools that will help you improve accessibility during the development cycle, such as Linters, Storybook, Playwright, just to name a few.
Focus: European Accessibility Act, Integrated Testing tools and methods (e.g. Linters, Storybook, Playwright)
Target audience: Everyone, Developers, Testers
Annual (33 years) study of the Israeli enterprise / public IT market. Covering sections on the Israeli economy, IT trends 2026-28, several surveys (AI, CDOs, OCIO, CTO, staffing, cyber, operations and infra), plus rankings of 760 vendors on 160 markets (market sizes and trends) and a comparison of products according to support and market penetration.
UiPath Community Berlin: Studio Tips & Tricks and UiPath Insights - UiPathCommunity
Join the UiPath Community Berlin (Virtual) meetup on May 27 to discover handy Studio Tips & Tricks and get introduced to UiPath Insights. Learn how to boost your development workflow, improve efficiency, and gain visibility into your automation performance.
📕 Agenda:
- Welcome & Introductions
- UiPath Studio Tips & Tricks for Efficient Development
- Best Practices for Workflow Design
- Introduction to UiPath Insights
- Creating Dashboards & Tracking KPIs (Demo)
- Q&A and Open Discussion
Perfect for developers, analysts, and automation enthusiasts!
This session streamed live on May 27, 18:00 CET.
Check out all our upcoming UiPath Community sessions at:
👉 https://community.uipath.com/events/
Join our UiPath Community Berlin chapter:
👉 https://community.uipath.com/berlin/
Droidal: AI Agents Revolutionizing Healthcare - Droidal LLC
Droidal’s AI Agents are transforming healthcare by bringing intelligence, speed, and efficiency to key areas such as Revenue Cycle Management (RCM), clinical operations, and patient engagement. Built specifically for the needs of U.S. hospitals and clinics, Droidal's solutions are designed to improve outcomes and reduce administrative burden.
Through simple visuals and clear examples, the presentation explains how AI Agents can support medical coding, streamline claims processing, manage denials, ensure compliance, and enhance communication between providers and patients. By integrating seamlessly with existing systems, these agents act as digital coworkers that deliver faster reimbursements, reduce errors, and enable teams to focus more on patient care.
Droidal's AI technology is more than just automation — it's a shift toward intelligent healthcare operations that are scalable, secure, and cost-effective. The presentation also offers insights into future developments in AI-driven healthcare, including how continuous learning and agent autonomy will redefine daily workflows.
Whether you're a healthcare administrator, a tech leader, or a provider looking for smarter solutions, this presentation offers a compelling overview of how Droidal’s AI Agents can help your organization achieve operational excellence and better patient outcomes.
A free demo trial is available for those interested in experiencing Droidal’s AI Agents firsthand. Our team will walk you through a live demo tailored to your specific workflows, helping you understand the immediate value and long-term impact of adopting AI in your healthcare environment.
To request a free trial or learn more:
https://droidal.com/
Introducing FME Realize: A New Era of Spatial Computing and AR - Safe Software
A new era for the FME Platform has arrived – and it’s taking data into the real world.
Meet FME Realize: marking a new chapter in how organizations connect digital information with the physical environment around them. With the addition of FME Realize, FME has evolved into an All-data, Any-AI Spatial Computing Platform.
FME Realize brings spatial computing, augmented reality (AR), and the full power of FME to mobile teams: making it easy to visualize, interact with, and update data right in the field. From infrastructure management to asset inspections, you can put any data into real-world context, instantly.
Join us to discover how spatial computing, powered by FME, enables digital twins, AI-driven insights, and real-time field interactions: all through an intuitive no-code experience.
In this one-hour webinar, you’ll:
-Explore what FME Realize includes and how it fits into the FME Platform
-Learn how to deliver real-time AR experiences, fast
-See how FME enables live, contextual interactions with enterprise data across systems
-See demos, including ones you can try yourself
-Get tutorials and downloadable resources to help you start right away
Whether you’re exploring spatial computing for the first time or looking to scale AR across your organization, this session will give you the tools and insights to get started with confidence.
Agentic AI - The New Era of IntelligenceMuzammil Shah
This presentation is specifically designed to introduce final-year university students to the foundational principles of Agentic Artificial Intelligence (AI). It aims to provide a clear understanding of how Agentic AI systems function, their key components, and the underlying technologies that empower them. By exploring real-world applications and emerging trends, the session will equip students with essential knowledge to engage with this rapidly evolving area of AI, preparing them for further study or professional work in the field.
"AI in the browser: predicting user actions in real time with TensorflowJS", ...Fwdays
With AI becoming increasingly present in our everyday lives, the latest advancements in the field now make it easier than ever to integrate it into our software projects. In this session, we’ll explore how machine learning models can be embedded directly into front-end applications. We'll walk through practical examples, including running basic models such as linear regression and random forest classifiers, all within the browser environment.
Once we grasp the fundamentals of running ML models on the client side, we’ll dive into real-world use cases for web applications—ranging from real-time data classification and interpolation to object tracking in the browser. We'll also introduce a novel approach: dynamically optimizing web applications by predicting user behavior in real time using a machine learning model. This opens the door to smarter, more adaptive user experiences and can significantly improve both performance and engagement.
In addition to the technical insights, we’ll also touch on best practices, potential challenges, and the tools that make browser-based machine learning development more accessible. Whether you're a developer looking to experiment with ML or someone aiming to bring more intelligence into your web apps, this session will offer practical takeaways and inspiration for your next project.
SAP Sapphire 2025 ERP1612 Enhancing User Experience with SAP Fiori and AIPeter Spielvogel
Explore how AI in SAP Fiori apps enhances productivity and collaboration. Learn best practices for SAPUI5, Fiori elements, and tools to build enterprise-grade apps efficiently. Discover practical tips to deploy apps quickly, leveraging AI, and bring your questions for a deep dive into innovative solutions.
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...James Anderson
The Quantum Apocalypse: A Looming Threat & The Need for Post-Quantum Encryption
We explore the imminent risks posed by quantum computing to modern encryption standards and the urgent need for post-quantum cryptography (PQC).
Bio: With 30 years in cybersecurity, including as a CISO, Tommy is a strategic leader driving security transformation, risk management, and program maturity. He has led high-performing teams, shaped industry policies, and advised organizations on complex cyber, compliance, and data protection challenges.
Marko.js - Unsung Hero of Scalable Web Frameworks (DevDays 2025)Eugene Fidelin
Marko.js is an open-source JavaScript framework created by eBay back in 2014. It offers super-efficient server-side rendering, making it ideal for big e-commerce sites and other multi-page apps where speed and SEO really matter. After over 10 years of development, Marko has some standout features that make it an interesting choice. In this talk, I’ll dive into these unique features and showcase some of Marko's innovative solutions. You might not use Marko.js at your company, but there’s still a lot you can learn from it to bring to your next project.
Dev Dives: System-to-system integration with UiPath API WorkflowsUiPathCommunity
Join the next Dev Dives webinar on May 29 for a first contact with UiPath API Workflows, a powerful tool purpose-fit for API integration and data manipulation!
This session will guide you through the technical aspects of automating communication between applications, systems and data sources using API workflows.
📕 We'll delve into:
- How this feature delivers API integration as a first-party concept of the UiPath Platform.
- How to design, implement, and debug API workflows to integrate with your existing systems seamlessly and securely.
- How to optimize your API integrations with runtime built for speed and scalability.
This session is ideal for developers looking to solve API integration use cases with the power of the UiPath Platform.
👨🏫 Speakers:
Gunter De Souter, Sr. Director, Product Manager @UiPath
Ramsay Grove, Product Manager @UiPath
This session streamed live on May 29, 2025, 16:00 CET.
Check out all our upcoming UiPath Dev Dives sessions:
👉 https://community.uipath.com/dev-dives-automation-developer-2025/
2. Letter from Ada Lovelace to Charles Babbage 1843
In this letter, Lovelace suggests an example of a calculation
which “may be worked out by the engine without having been
worked out by human head and hands first”.
6. What is an Algorithm?
https://commons.wikimedia.org/wiki/File:Euclid_flowchart.svg
By Somepics (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
A    B
12   18
12    6
 6    6
 6    0
Euclid's algorithm for the GCD of two numbers
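Not from the original slides, but a minimal Python sketch of the same algorithm may help make the trace concrete (the remainder-based variant is equivalent to the repeated subtraction shown above):

```python
def gcd(a, b):
    """Euclid's algorithm: keep replacing the pair until one value reaches 0."""
    while b != 0:
        a, b = b, a % b   # a % b collapses repeated subtraction into one step
    return a

print(gcd(12, 18))  # -> 6, matching the A/B trace above
```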
7. “You use code to tell a computer what to do.
Before you write code you need an algorithm.
An algorithm is a list of rules to follow
in order to solve a problem.”
BBC Bitesize
What is an Algorithm?
https://commons.wikimedia.org/wiki/File:Euclid_flowchart.svg
By Somepics (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
8. The Master Algorithm
“The future belongs to those who
understand at a very deep level how
to combine their unique expertise
with what algorithms do best.”
Pedro Domingos
15. Minimizing the Error
You know the expected values (use separate datasets for training and validation).
The error term is always positive (convex function).
Supervised
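The slide's formula is not reproduced in this transcript; assuming the usual squared-error objective, here is a minimal sketch with made-up data: fit on a training split, then measure the (always non-negative) error on a separate validation split.

```python
import numpy as np

# Hypothetical data: split into separate training and validation sets
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, size=100)
X_train, X_val = X[:80], X[80:]
y_train, y_val = y[:80], y[80:]

# Fit a single-variable linear model on the training set (least squares)
A = np.c_[X_train, np.ones(len(X_train))]
w, b = np.linalg.lstsq(A, y_train, rcond=None)[0]

# Mean squared error on the held-out validation set: never negative
val_pred = w * X_val[:, 0] + b
mse = np.mean((y_val - val_pred) ** 2)
print(f"validation MSE: {mse:.3f}")
```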
17. Stochastic Gradient Descent (SGD)
https://en.wikipedia.org/wiki/Himmelblau's_function
Global vs. local minimum
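As an illustration (not from the slides), here is plain gradient descent on Himmelblau's function, the surface linked above. Different starting points settle into different minima, which is why initialization matters on surfaces with several basins; stochastic gradient descent replaces the exact gradient with a noisy estimate computed from mini-batches of training data.

```python
import numpy as np

def himmelblau(p):
    x, y = p
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def grad(p):
    # Analytic gradient of Himmelblau's function
    x, y = p
    return np.array([
        4 * x * (x**2 + y - 11) + 2 * (x + y**2 - 7),
        2 * (x**2 + y - 11) + 4 * y * (x + y**2 - 7),
    ])

def descend(start, lr=0.005, steps=5000):
    p = np.array(start, dtype=float)
    for _ in range(steps):
        p -= lr * grad(p)   # step against the gradient
    return p

# Different starting points end up in different minima of the surface
for start in [(0.0, 0.0), (-3.0, -3.0), (4.0, -2.0)]:
    p = descend(start)
    print(start, "->", np.round(p, 3), "f =", round(himmelblau(p), 6))
```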
18. Factorization Machines
• It is an extension of a linear model that is designed to parsimoniously capture interactions between features within high-dimensional sparse datasets
• Factorization machines are a good choice for tasks such as click prediction and item recommendation
• They are usually trained by stochastic gradient descent (SGD), alternating least squares (ALS), or Markov chain Monte Carlo (MCMC); a minimal sketch of the model equation follows below
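Not from the talk: a minimal NumPy sketch of the second-order FM prediction, using the linear-time identity from the paper below (the weights and data here are random placeholders):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Order-2 factorization machine prediction for one feature vector x.
    V has shape (n_features, k); pairwise weights are factorized as <v_i, v_j>.
    Uses the O(k*n) identity:
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
    """
    linear = w0 + w @ x
    s = V.T @ x                      # shape (k,)
    s_sq = (V ** 2).T @ (x ** 2)     # shape (k,)
    pairwise = 0.5 * np.sum(s ** 2 - s_sq)
    return linear + pairwise

# Hypothetical toy example: 5 sparse features, factorization rank k = 3
rng = np.random.default_rng(1)
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0])   # e.g. one-hot user + one-hot item
w0, w = 0.1, rng.normal(0, 0.1, 5)
V = rng.normal(0, 0.1, (5, 3))
print(fm_predict(x, w0, w, V))
```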
Factorization Machines
Steffen Rendle
Department of Reasoning for Intelligence
The Institute of Scientific and Industrial Research
Osaka University, Japan
[email protected]
Abstract—In this paper, we introduce Factorization Machines (FM) which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters. Thus they are able to estimate interactions even in problems with huge sparsity (like recommender systems) where SVMs fail. We show that the model equation of FMs can be calculated in linear time and thus FMs can be optimized directly. So unlike nonlinear SVMs, a transformation in the dual form is not necessary and the model parameters can be estimated directly without the need of any support vector in the solution. We show the relationship to SVMs and the advantages of FMs for parameter estimation in sparse settings.
On the other hand there are many different factorization models like matrix factorization, parallel factor analysis or specialized models like SVD++, PITF or FPMC. The drawback of these models is that they are not applicable for general prediction tasks but work only with special input data. Furthermore their model equations and optimization algorithms are derived individually for each task. We show that FMs can mimic these models just by specifying the input data (i.e. the feature vectors). This makes FMs easily applicable even for users without expert knowledge in factorization models.
Index Terms—factorization machine; sparse data; tensor factorization; support vector machine
I. INTRODUCTION
Support Vector Machines are one of the most popular predictors in machine learning and data mining. Nevertheless, in settings like collaborative filtering, SVMs play no important role and the best models are either direct applications of standard matrix/tensor factorization models like PARAFAC [1] or specialized models using factorized parameters [2], [3], [4]. In this paper, we show that the only reason why standard SVM predictors are not successful in these tasks is that they cannot learn reliable parameters ('hyperplanes') in complex (non-linear) kernel spaces under very sparse data. On the other hand, the drawback of tensor factorization models and even more for specialized factorization models is that (1) they are not applicable to standard prediction data (e.g. a real valued feature vector in R^n) and (2) that specialized models are usually derived individually for a specific task requiring effort in modelling and design of a learning algorithm.
In this paper, we introduce a new predictor, the Factorization Machine (FM), that is a general predictor like SVMs but is also able to estimate reliable parameters under very high sparsity. The factorization machine models all nested variable interactions (comparable to a polynomial kernel in SVM), but uses a factorized parametrization instead of a dense parametrization like in SVMs. We show that the model equation of FMs can be computed in linear time and that it depends only on a linear number of parameters. This allows direct optimization and storage of model parameters without the need of storing any training data (e.g. support vectors) for prediction. In contrast to this, non-linear SVMs are usually optimized in the dual form and computing a prediction (the model equation) depends on parts of the training data (the support vectors). We also show that FMs subsume many of the most successful approaches for the task of collaborative filtering including biased MF, SVD++ [2], PITF [3] and FPMC [4].
In total, the advantages of our proposed FM are:
1) FMs allow parameter estimation under very sparse data where SVMs fail.
2) FMs have linear complexity, can be optimized in the primal and do not rely on support vectors like SVMs. We show that FMs scale to large datasets like Netflix with 100 millions of training instances.
3) FMs are a general predictor that can work with any real valued feature vector. In contrast to this, other state-of-the-art factorization models work only on very restricted input data. We will show that just by defining the feature vectors of the input data, FMs can mimic state-of-the-art models like biased MF, SVD++, PITF or FPMC.
II. PREDICTION UNDER SPARSITY
The most common prediction task is to estimate a function y : R^n → T from a real valued feature vector x ∈ R^n to a target domain T (e.g. T = R for regression or T = {+, −} for classification). In supervised settings, it is assumed that there is a training dataset D = {(x^(1), y^(1)), (x^(2), y^(2)), ...} of examples for the target function y given. We also investigate the ranking task where the function y with target T = R can be used to score feature vectors x and sort them according to their score. Scoring functions can be learned with pairwise training data [5], where a feature tuple (x^(A), x^(B)) ∈ D means that x^(A) should be ranked higher than x^(B). As the pairwise ranking relation is antisymmetric, it is sufficient to use only positive training instances.
In this paper, we deal with problems where x is highly sparse, i.e. almost all of the elements x_i of a vector x are zero. Let m(x) be the number of non-zero elements in the
2010
Supervised
Classification, regression
23. XGBoost
• Ensemble methods use multiple learning algorithms to improve predictions
• Boosting: “Can a set of weak learners create a single strong learner?”
• Gradient Boosting: using gradient descent over a function space
• eXtreme Gradient Boosting
• https://github.com/dmlc/xgboost
• Supports regression, classification, ranking and user-defined objectives (a minimal usage sketch follows below)
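Not from the slides: a minimal sketch of the open-source xgboost package's scikit-learn-style API on synthetic data (the data and parameter values here are illustrative assumptions, not recommendations):

```python
import numpy as np
import xgboost as xgb

# Synthetic binary classification problem
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 500) > 0).astype(int)

# Boosted trees are built additively, each one fitting the gradient of the loss
model = xgb.XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X[:400], y[:400])

print("held-out accuracy:", (model.predict(X[400:]) == y[400:]).mean())
```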
XGBoost: A Scalable Tree Boosting System
Tianqi Chen
University of Washington
[email protected]
Carlos Guestrin
University of Washington
[email protected]
ABSTRACT
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Keywords
Large-scale Machine Learning
1. INTRODUCTION
Machine learning and data-driven approaches are becoming very important in many areas. Smart spam classifiers protect our email by learning from massive amounts of spam data and user feedback; advertising systems learn to match the right ads with the right context; fraud detection systems protect banks from malicious attackers; anomaly event detection systems help experimental physicists to find events that lead to new physics. There are two important factors that drive these successful applications: usage of effective (statistical) models that capture the complex data dependencies and scalable learning systems that learn the model of interest from large datasets.
Among the machine learning methods used in practice, gradient tree boosting [10]¹ is one technique that shines in many applications. Tree boosting has been shown to give state-of-the-art results on many standard classification benchmarks [16]. LambdaMART [5], a variant of tree boosting for ranking, achieves state-of-the-art result for ranking
¹ Gradient tree boosting is also known as gradient boosting machine (GBM) or gradient boosted regression tree (GBRT)
problems. Besides being used as a stand-alone predictor, it
is also incorporated into real-world production pipelines for
ad click through rate prediction [15]. Finally, it is the de-
facto choice of ensemble method and is used in challenges
such as the Netflix prize [3].
In this paper, we describe XGBoost, a scalable machine
learning system for tree boosting. The system is available as
an open source package2
. The impact of the system has been
widely recognized in a number of machine learning and data
mining challenges. Take the challenges hosted by the ma-
chine learning competition site Kaggle for example. Among
the 29 challenge winning solutions 3
published at Kaggle’s
blog during 2015, 17 solutions used XGBoost. Among these
solutions, eight solely used XGBoost to train the model,
while most others combined XGBoost with neural nets in en-
sembles. For comparison, the second most popular method,
deep neural nets, was used in 11 solutions. The success
of the system was also witnessed in KDDCup 2015, where
XGBoost was used by every winning team in the top-10.
Moreover, the winning teams reported that ensemble meth-
ods outperform a well-configured XGBoost by only a small
amount [1].
These results demonstrate that our system gives state-of-
the-art results on a wide range of problems. Examples of
the problems in these winning solutions include: store sales
prediction; high energy physics event classification; web text
classification; customer behavior prediction; motion detec-
tion; ad click through rate prediction; malware classification;
product categorization; hazard risk prediction; massive on-
line course dropout rate prediction. While domain depen-
dent data analysis and feature engineering play an important
role in these solutions, the fact that XGBoost is the consen-
sus choice of learner shows the impact and importance of
our system and tree boosting.
The most important factor behind the success of XGBoost
is its scalability in all scenarios. The system runs more than
ten times faster than existing popular solutions on a single
machine and scales to billions of examples in distributed or
memory-limited settings. The scalability of XGBoost is due
to several important systems and algorithmic optimizations.
These innovations include: a novel tree learning algorithm
is for handling sparse data; a theoretically justified weighted
quantile sketch procedure enables handling instance weights
in approximate tree learning. Parallel and distributed com-
puting makes learning faster which enables quicker model ex-
ploration. More importantly, XGBoost exploits out-of-core
² https://github.com/dmlc/xgboost
³ Solutions come from the top-3 teams of each competition.
arXiv:1603.02754v3 [cs.LG] 10 Jun 2016
2016
Supervised
Classification, regression
26. Image Classification
Deep Residual Learning for Image Recognition
Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun
Microsoft Research
{kahe, v-xiangz, v-shren, jiansun}@microsoft.com
Abstract
Deeper neural networks are more difficult to train. We
present a residual learning framework to ease the training
of networks that are substantially deeper than those used
previously. We explicitly reformulate the layers as learn-
ing residual functions with reference to the layer inputs, in-
stead of learning unreferenced functions. We provide com-
prehensive empirical evidence showing that these residual
networks are easier to optimize, and can gain accuracy from
considerably increased depth. On the ImageNet dataset we
evaluate residual nets with a depth of up to 152 layers—8×
deeper than VGG nets [41] but still having lower complex-
ity. An ensemble of these residual nets achieves 3.57% error
on the ImageNet test set. This result won the 1st place on the
ILSVRC 2015 classification task. We also present analysis
on CIFAR-10 with 100 and 1000 layers.
The depth of representations is of central importance
for many visual recognition tasks. Solely due to our ex-
tremely deep representations, we obtain a 28% relative im-
provement on the COCO object detection dataset. Deep
residual nets are foundations of our submissions to ILSVRC
& COCO 2015 competitions1
, where we also won the 1st
places on the tasks of ImageNet detection, ImageNet local-
ization, COCO detection, and COCO segmentation.
1. Introduction
Deep convolutional neural networks [22, 21] have led
to a series of breakthroughs for image classification [21,
50, 40]. Deep networks naturally integrate low/mid/high-
level features [50] and classifiers in an end-to-end multi-
layer fashion, and the “levels” of features can be enriched
by the number of stacked layers (depth). Recent evidence
[41, 44] reveals that network depth is of crucial importance,
and the leading results [41, 44, 13, 16] on the challenging
ImageNet dataset [36] all exploit “very deep” [41] models,
with a depth of sixteen [41] to thirty [16]. Many other non-
trivial visual recognition tasks [8, 12, 7, 32, 27] have also
1http://image-net.org/challenges/LSVRC/2015/ and
http://mscoco.org/dataset/#detections-challenge2015.
Figure 1. Training error (left) and test error (right) on CIFAR-10 with 20-layer and 56-layer “plain” networks. The deeper network has higher training error, and thus test error. Similar phenomena on ImageNet is presented in Fig. 4.
greatly benefited from very deep models.
Driven by the significance of depth, a question arises: Is
learning better networks as easy as stacking more layers?
An obstacle to answering this question was the notorious
problem of vanishing/exploding gradients [1, 9], which
hamper convergence from the beginning. This problem,
however, has been largely addressed by normalized initial-
ization [23, 9, 37, 13] and intermediate normalization layers
[16], which enable networks with tens of layers to start con-
verging for stochastic gradient descent (SGD) with back-
propagation [22].
When deeper networks are able to start converging, a
degradation problem has been exposed: with the network
depth increasing, accuracy gets saturated (which might be
unsurprising) and then degrades rapidly. Unexpectedly,
such degradation is not caused by overfitting, and adding
more layers to a suitably deep model leads to higher train-
ing error, as reported in [11, 42] and thoroughly verified by
our experiments. Fig. 1 shows a typical example.
The degradation (of training accuracy) indicates that not
all systems are similarly easy to optimize. Let us consider a
shallower architecture and its deeper counterpart that adds
more layers onto it. There exists a solution by construction
to the deeper model: the added layers are identity mapping,
and the other layers are copied from the learned shallower
model. The existence of this constructed solution indicates
that a deeper model should produce no higher training error
than its shallower counterpart. But experiments show that
our current solvers on hand are unable to find solutions that
1
arXiv:1512.03385v1 [cs.CV] 10 Dec 2015
Densely Connected Convolutional Networks
Gao Huang*
Cornell University
[email protected]
Zhuang Liu*
Tsinghua University
[email protected]
Laurens van der Maaten
Facebook AI Research
[email protected]
Kilian Q. Weinberger
Cornell University
[email protected]
Abstract
Recent work has shown that convolutional networks can
be substantially deeper, more accurate, and efficient to train
if they contain shorter connections between layers close to
the input and those close to the output. In this paper, we
embrace this observation and introduce the Dense Convo-
lutional Network (DenseNet), which connects each layer
to every other layer in a feed-forward fashion. Whereas
traditional convolutional networks with L layers have L
connections—one between each layer and its subsequent
layer—our network has L(L+1)/2 direct connections. For
each layer, the feature-maps of all preceding layers are
used as inputs, and its own feature-maps are used as inputs
into all subsequent layers. DenseNets have several com-
pelling advantages: they alleviate the vanishing-gradient
problem, strengthen feature propagation, encourage fea-
ture reuse, and substantially reduce the number of parame-
ters. We evaluate our proposed architecture on four highly
competitive object recognition benchmark tasks (CIFAR-10,
CIFAR-100, SVHN, and ImageNet). DenseNets obtain sig-
nificant improvements over the state-of-the-art on most of
them, whilst requiring less computation to achieve high per-
formance. Code and pre-trained models are available at
https://github.com/liuzhuang13/DenseNet.
1. Introduction
Convolutional neural networks (CNNs) have become
the dominant machine learning approach for visual object
recognition. Although they were originally introduced over
20 years ago [18], improvements in computer hardware and
network structure have enabled the training of truly deep
CNNs only recently. The original LeNet5 [19] consisted of
5 layers, VGG featured 19 [29], and only last year Highway
*Authors contributed equally
Figure 1: A 5-layer dense block with a growth rate of k = 4. Each layer takes all preceding feature-maps as input.
Networks [34] and Residual Networks (ResNets) [11] have
surpassed the 100-layer barrier.
As CNNs become increasingly deep, a new research
problem emerges: as information about the input or gra-
dient passes through many layers, it can vanish and “wash
out” by the time it reaches the end (or beginning) of the
network. Many recent publications address this or related
problems. ResNets [11] and Highway Networks [34] by-
pass signal from one layer to the next via identity connec-
tions. Stochastic depth [13] shortens ResNets by randomly
dropping layers during training to allow better information
and gradient flow. FractalNets [17] repeatedly combine sev-
eral parallel layer sequences with different number of con-
volutional blocks to obtain a large nominal depth, while
maintaining many short paths in the network. Although
these different approaches vary in network topology and
training procedure, they all share a key characteristic: they
create short paths from early layers to later layers.
1
arXiv:1608.06993v5 [cs.CV] 28 Jan 2018
Inception Recurrent Convolutional Neural Network for Object Recognition
Md Zahangir Alom [email protected]
University of Dayton, Dayton, OH, USA
Mahmudul Hasan [email protected]
Comcast Labs, Washington, DC, USA
Chris Yakopcic [email protected]
University of Dayton, Dayton, OH, USA
Tarek M. Taha [email protected]
University of Dayton, Dayton, OH, USA
Abstract
Deep convolutional neural networks (DCNNs)
are an influential tool for solving various prob-
lems in the machine learning and computer vi-
sion fields. In this paper, we introduce a
new deep learning model called an Inception-
Recurrent Convolutional Neural Network (IR-
CNN), which utilizes the power of an incep-
tion network combined with recurrent layers in
DCNN architecture. We have empirically eval-
uated the recognition performance of the pro-
posed IRCNN model using different benchmark
datasets such as MNIST, CIFAR-10, CIFAR-
100, and SVHN. Experimental results show sim-
ilar or higher recognition accuracy when com-
pared to most of the popular DCNNs including
the RCNN. Furthermore, we have investigated
IRCNN performance against equivalent Incep-
tion Networks and Inception-Residual Networks
using the CIFAR-100 dataset. We report about
3.5%, 3.47% and 2.54% improvement in classifi-
cation accuracy when compared to the RCNN,
equivalent Inception Networks, and Inception-
Residual Networks on the augmented CIFAR-
100 dataset respectively.
1. Introduction
In recent years, deep learning using Convolutional Neu-
ral Networks (CNNs) has shown enormous success in the
field of machine learning and computer vision. CNNs pro-
vide state-of-the-art accuracy in various image recognition
tasks including object recognition (Schmidhuber, 2015;
Krizhevsky et al., 2012; Simonyan & Zisserman, 2014;
Szegedy et al., 2015), object detection (Girshick et al.,
2014), tracking (Wang et al., 2015), and image caption-
ing (Xu et al., 2014). In addition, this technique has been
applied massively in computer vision tasks such as video
representation and classification of human activity (Bal-
las et al., 2015). Machine translation and natural language
processing are applied deep learning techniques that show
great success in this domain (Collobert & Weston, 2008;
Manning et al., 2014). Furthermore, this technique has
been used extensively in the field of speech recognition
(Hinton et al., 2012). Moreover, deep learning is not limited to signal, natural language, image, and video processing tasks; it has been applied successfully to game development (Mnih et al., 2013; Lillicrap et al., 2015). There is
a lot of ongoing research for developing even better perfor-
mance and improving the training process of DCNNs (Lin
et al., 2013; Springenberg et al., 2014; Goodfellow et al.,
2013; Ioffe & Szegedy, 2015; Zeiler & Fergus, 2013).
In some cases, machine intelligence shows better perfor-
mance compared to human intelligence including calcula-
tion, chess, memory, and pattern matching. On the other
hand, human intelligence still provides better performance
in other fields such as object recognition, scene under-
standing, and more. Deep learning techniques (DCNNs
in particular) perform very well in the domains of detec-
tion, classification, and scene understanding. There is a
still a gap that must be closed before human level intelli-
gence is reached when performing visual recognition tasks.
Machine intelligence may open an opportunity to build a
system that can process visual information the way that a
human brain does. According to the study on the visual
processing system within a human brain by James DiCarlo
et al. (Zoccolan & Rust, 2012) the brain consists of sev-
eral visual processing units starting with the visual cortex
arXiv:1704.07709v1 [cs.CV] 25 Apr 2017
2015-2017
Supervised
Image Classification
27. Convolutional Neural Networks (CNNs)
By Debarko De @debarko
https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
28. SOCKEYE:
A Toolkit for Neural Machine Translation
Felix Hieber, Tobias Domhan, Michael Denkowski,
David Vilar, Artem Sokolov, Ann Clifton, Matt Post
{fhieber,domhant,mdenkows,dvilar,artemsok,acclift,mattpost}@amazon.com
Abstract
We describe SOCKEYE,1
an open-source sequence-to-sequence toolkit for Neural
Machine Translation (NMT). SOCKEYE is a production-ready framework for
training and applying models as well as an experimental platform for researchers.
Written in Python and built on MXNET, the toolkit offers scalable training and
inference for the three most prominent encoder-decoder architectures: attentional
recurrent neural networks, self-attentional transformers, and fully convolutional
networks. SOCKEYE also supports a wide range of optimizers, normalization and
regularization techniques, and inference improvements from current NMT literature.
Users can easily run standard training recipes, explore different model settings, and
incorporate new ideas. In this paper, we highlight SOCKEYE’s features and bench-
mark it against other NMT toolkits on two language arcs from the 2017 Conference
on Machine Translation (WMT): English–German and Latvian–English. We report
competitive BLEU scores across all three architectures, including an overall best
score for SOCKEYE’s transformer implementation. To facilitate further comparison,
we release all system outputs and training scripts used in our experiments. The
SOCKEYE toolkit is free software released under the Apache 2.0 license.
1 Introduction
The past two years have seen a deep learning revolution bring rapid and dramatic change to the field
of machine translation. For users, new neural network-based models consistently deliver better quality
translations than the previous generation of phrase-based systems. For researchers, Neural Machine
Translation (NMT) provides an exciting new landscape where training pipelines are simplified and
unified models can be trained directly from data. The promise of moving beyond the limitations of
Statistical Machine Translation (SMT) has energized the community, leading recent work to focus
almost exclusively on NMT and seemingly advance the state of the art every few months.
For all its success, NMT also presents a range of new challenges. While popular encoder-decoder
models are attractively simple, recent literature and the results of shared evaluation tasks show that
a significant amount of engineering is required to achieve “production-ready” performance in both
translation quality and computational efficiency. In a trend that carries over from SMT, the strongest
NMT systems benefit from subtle architecture modifications, hyper-parameter tuning, and empirically
effective heuristics. Unlike SMT, there is no “de-facto” toolkit that attracts most of the community’s
attention and thus contains all the best ideas from recent literature.2
Instead, the presence of many
independent toolkits3
brings diversity to the field, but also makes it difficult to compare architectural
and algorithmic improvements that are each implemented in different toolkits.
1
https://github.com/awslabs/sockeye (version 1.12)
2
For SMT, this role was largely filled by MOSES [Koehn et al., 2007].
3
https://github.com/jonsafari/nmt-list
arXiv:1712.05690v1 [cs.CL] 15 Dec 2017
Sequence to Sequence (seq2seq)
• seq2seq is a supervised learning algorithm where the input is a sequence of tokens (for example, text, audio) and the output generated is another sequence of tokens.
• Example applications include:
  • machine translation (input a sentence from one language and predict what that sentence would be in another language)
  • text summarization (input a longer string of words and predict a shorter string of words that is a summary)
  • speech-to-text (audio clips converted into output sentences in tokens).
29. SOCKEYE:
A Toolkit for Neural Machine Translation
Sequence to Sequence (seq2seq)
• Recently, problems in this domain have been successfully modeled with deep neural networks that show a significant performance boost over previous methodologies.
• Amazon released the Sockeye package as open source; it uses Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) models with attention as encoder-decoder architectures.
• https://github.com/awslabs/sockeye
2014-2017
Supervised
Text, Audio
30. Sequence to Sequence (seq2seq)
https://aws.amazon.com/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/
2014-2017
Supervised
Text, Audio
31. Sequence to Sequence (seq2seq)
https://aws.amazon.com/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/
“Das grüne Haus”
“the Green House”
2014-2017
Supervised
Text, Audio
32. K-Means Clustering
SOME METHODS FOR
CLASSIFICATION AND ANALYSIS
OF MULTIVARIATE OBSERVATIONS
J. MACQUEEN
UNIVERSITY OF CALIFORNIA, Los ANGELES
1. Introduction
The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if p is the probability mass function for the population, S = {S_1, S_2, ..., S_k} is a partition of E_N, and u_i, i = 1, 2, ..., k, is the conditional mean of p over the set S_i, then W^2(S) = Σ_{i=1}^{k} ∫_{S_i} |z − u_i|^2 dp(z) tends to be low for the partitions S generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables.
In addition to suggesting practical classification methods, the study of k-means
has proved to be theoretically interesting. The k-means concept represents a
generalization of the ordinary sample mean, and one is naturally led to study the
pertinent asymptotic behavior, the object being to establish some sort of law of
large numbers for the k-means. This problem is sufficiently interesting, in fact,
for us to devote a good portion of this paper to it. The k-means are defined in
section 2.1, and the main results which have been obtained on the asymptotic
behavior are given there. The rest of section 2 is devoted to the proofs of these
results. Section 3 describes several specific possible applications, and reports
some preliminary results from computer experiments conducted to explore the
possibilities inherent in the k-means idea. The extension to general metric spaces
is indicated briefly in section 4.
The original point of departure for the work described here was a series of
problems in optimal classification (MacQueen [9]) which represented special
This work was supported by the Western Management Science Institute under a grant from
the Ford Foundation, and by the Office of Naval Research under Contract No. 233(75), Task
No. 047-041.
Bulletin de l'académie polonaise des sciences
Cl. III — Vol. IV, No. 12, 1956
MATHÉMATIQUE
Sur la division des corps matériels en parties ¹ [On the division of material bodies into parts]
by
H. STEINHAUS
Presented 19 October 1956
A body Q is, by definition, a distribution of matter in space given by a function f(P); this function is called the density of the body in question; it is defined for all points P of space; it is non-negative and measurable. We assume that the characteristic set of the body, E = {P : f(P) > 0}, is bounded and of positive measure; we also assume that the integral of f(P) over E is finite: this is the mass of the body Q. Two bodies whose densities are equal except on a set of measure zero are considered identical.
By decomposing the characteristic set of a body Q into n subsets E_i (i = 1, 2, ..., n) of positive measure, one obtains a division of the body in question into n partial bodies; their respective characteristic sets are the E_i, and their densities are defined by the values taken by the density of the body Q on these partial sets. Denoting the partial bodies by Q_i, we write Q = Q_1 + Q_2 + ... + Q_n. When, conversely, n bodies Q_i are given whose characteristic sets are pairwise disjoint up to sets of measure zero, there evidently exists a body Q having these Q_i as its parts; we write Q_1 + Q_2 + ... + Q_n = Q. These remarks suffice to explain the division and composition of bodies.
The problem of this Note is the division of a body into n parts K_i (i = 1, 2, ..., n) and the choice of n points A_i so as to make as small as possible the sum
(1) S(K, A) = Σ_{i=1}^{n} I(K_i, A_i)   (K ≡ {K_i}, A ≡ {A_i}),
where I(Q, P) denotes, in general, the moment of inertia of an arbitrary body Q with respect to an arbitrary point P. To treat this elementary problem we shall use the following lemmas:
¹ This article by Hugo Steinhaus is the first to formulate explicitly, in finite dimension, the k-means partitioning problem, also known as "nuées dynamiques". Its classical algorithm is the same as that of Lloyd-Max optimal quantization. Being difficult to access in digital form, it was transduced by Maciej Denkowski, transmitted by Jérôme Bolte, and transcribed by Laurent Duval in July/August 2015. An effort was made to stay close to the original pagination.
1956-1967
Unsupervised
Clustering
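A minimal NumPy sketch of the Lloyd-style k-means iteration described above, on synthetic data (the helper names and blob centers are made up for illustration):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Alternate between assigning points to the nearest center and
    recomputing each center as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break   # assignments stopped changing the centers
        centers = new_centers
    return centers, labels

# Hypothetical data: three Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in [(0, 0), (3, 3), (0, 4)]])
centers, labels = kmeans(X, k=3)
print(np.round(centers, 2))
```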
34. Principal Component Analysis (PCA)
• PCA is an unsupervised learning algorithm that attempts to reduce the dimensionality (number of features) within a dataset while still retaining as much information as possible
• This is done by finding a new set of features called components, which are composites of the original features that are uncorrelated with one another
• They are also constrained so that the first component accounts for the largest possible variability in the data, the second component the second most variability, and so on (a minimal sketch follows below)
Pearson, K. 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2:559-572.
http://pbil.univ-lyon1.fr/R/pearson1901.pdf
1901
Unsupervised
Dimensionality Reduction
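The sketch referenced in the bullets above: PCA via SVD of the centered data matrix, on synthetic data (the function and variable names are illustrative):

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the centered data: rows of Vt are the principal
    components, uncorrelated directions ordered by explained variance."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]            # (n_components, n_features)
    scores = Xc @ components.T                # projected, lower-dimensional data
    explained_var = (S ** 2) / (len(X) - 1)
    return scores, components, explained_var[:n_components]

# Hypothetical correlated 3-D data reduced to 2 components
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
X = np.c_[z[:, 0], 0.5 * z[:, 0] + 0.1 * z[:, 1], z[:, 1]]
scores, components, var = pca(X, 2)
print("explained variance of kept components:", np.round(var, 3))
```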
37. Latent Dirichlet Allocation (LDA)
Copyright 2000 by the Genetics Society of America
Inference of Population Structure Using Multilocus Genotype Data
Jonathan K. Pritchard, Matthew Stephens and Peter Donnelly
Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom
Manuscript received September 23, 1999
Accepted for publication February 18, 2000
ABSTRACT
We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci—e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http:// www.stats.ox.ac.uk/ zpritch/ home.html.
IN applications of population genetics, it is often useful to classify individuals in a sample into populations. In one scenario, the investigator begins with a sample of individuals and wants to say something about the properties of populations. For example, in studies of human evolution, the population is often considered to be the unit of interest, and a great deal of work has focused on learning about the evolutionary relationships of modern populations (e.g., Cavalli et al. 1994). In a second scenario, the investigator begins with a set of predefined populations and wishes to classify individuals of unknown origin. This type of problem arises in many contexts (reviewed by Davies et al. 1999). A standard approach involves sampling DNA from members of a number of potential source populations and using these samples to estimate allele frequencies in each population at a series of unlinked loci. Using the estimated allele frequencies, it is then possible to compute the likelihood that a given genotype originated in each population. Individuals of unknown origin can be assigned to populations according to these likelihoods (Paetkau et al. 1995; Rannala and Mountain 1997).
In both situations described above, a crucial first step is to define a set of populations. The definition of populations is typically subjective, based, for example, on linguistic, cultural, or physical characters, as well as the geographic location of sampled individuals. This subjective approach is usually a sensible way of incorporating diverse types of information. However, it may be difficult to know whether a given assignment of individuals to populations based on these subjective criteria represents a natural assignment in genetic terms, and it would be useful to be able to confirm that subjective classifications are consistent with genetic information and hence appropriate for studying the questions of interest. Further, there are situations where one is interested in "cryptic" population structure—i.e., population structure that is difficult to detect using visible characters, but may be significant in genetic terms. For example, when association mapping is used to find disease genes, the presence of undetected population structure can lead to spurious associations and thus invalidate standard tests (Ewens and Spielman 1995). The problem of cryptic population structure also arises in the context of DNA fingerprinting for forensics, where it is important to assess the degree of population structure to estimate the probability of false matches (Balding and Nichols 1994, 1995; Foreman et al. 1997; Roeder et al. 1998).
Pritchard and Rosenberg (1999) considered how genetic information might be used to detect the presence of cryptic population structure in the association mapping context. More generally, one would like to be able to identify the actual subpopulations and assign individuals (probabilistically) to these populations. In this article we use a Bayesian clustering approach to tackle this problem. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Our method attempts to assign individuals to populations on the basis of their genotypes, while simultaneously estimating population allele frequencies. The method can be applied to various types of markers [e.g., microsatellites, restriction fragment length polymorphisms (RFLPs), or single nucleotide polymorphisms (SNPs)], but it assumes that the marker
Corresponding author: Jonathan Pritchard, Department of Statistics, University of Oxford, 1 S. Parks Rd., Oxford OX1 3TG, United Kingdom. E-mail: [email protected]
Genetics 155: 945–959 (June 2000)
Journal of Machine Learning Research 3 (2003) 993-1022 Submitted 2/02; Published 1/03
Latent Dirichlet Allocation
David M. Blei [email protected]
Computer Science Division
University of California
Berkeley, CA 94720, USA
Andrew Y. Ng [email protected]
Computer Science Department
Stanford University
Stanford, CA 94305, USA
Michael I. Jordan [email protected]
Computer Science Division and Department of Statistics
University of California
Berkeley, CA 94720, USA
Editor: John Lafferty
Abstract
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of
discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each
item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in
turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of
text modeling, the topic probabilities provide an explicit representation of a document. We present
efficient approximate inference techniques based on variational methods and an EM algorithm for
empirical Bayes parameter estimation. We report results in document modeling, text classification,
and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI
model.
1. Introduction
In this paper we consider the problem of modeling text corpora and other collections of discrete
data. The goal is to find short descriptions of the members of a collection that enable efficient
processing of large collections while preserving the essential statistical relationships that are useful
for basic tasks such as classification, novelty detection, summarization, and similarity and relevance
judgments.
Significant progress has been made on this problem by researchers in the field of information retrieval (IR) (Baeza-Yates and Ribeiro-Neto, 1999). The basic methodology proposed by IR researchers for text corpora—a methodology successfully deployed in modern Internet search engines—reduces each document in the corpus to a vector of real numbers, each of which represents ratios of counts. In the popular tf-idf scheme (Salton and McGill, 1983), a basic vocabulary of “words” or “terms” is chosen, and, for each document in the corpus, a count is formed of the number of occurrences of each word. After suitable normalization, this term frequency count is compared to an inverse document frequency count, which measures the number of occurrences of a
© 2003 David M. Blei, Andrew Y. Ng and Michael I. Jordan.
2000-2003
Unsupervised
Topic Modeling
38. Latent Dirichlet Allocation (LDA)
• As an extremely simple example, given a set of documents where the only words that occur within them are eat, sleep, play, meow, and bark, LDA might produce topics like the following:

  Topic             eat   sleep  play  meow  bark
  Cats? (Topic 1)   0.1   0.3    0.2   0.4   0.0
  Dogs? (Topic 2)   0.2   0.1    0.4   0.0   0.3
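Not part of the slides, but a minimal sketch of fitting two topics on a toy cats/dogs corpus with scikit-learn's LDA implementation (the corpus and settings are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "meow eat sleep meow play",
    "bark play eat bark play",
    "sleep meow eat sleep",
    "bark eat play bark",
]
vectorizer = CountVectorizer().fit(docs)
X = vectorizer.transform(docs)              # term-count matrix, one row per document

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
vocab = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    weights = topic / topic.sum()           # normalized word weights per topic
    print(f"Topic {k + 1}:", dict(zip(vocab, weights.round(2))))
```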
2000-2003
Unsupervised
Topic Modeling
39. Neural Topic Model (NTM)
Architecture (from the slide diagram): an encoder (feedforward net) maps the input term-counts vector to a document posterior (µ); a sampled document representation z is fed to a softmax decoder that produces the output term-counts vector.
A Novel Neural Topic Model and Its Supervised Extension
Ziqiang Cao1
Sujian Li1
Yang Liu1
Wenjie Li2
Heng Ji3
1
Key Laboratory of Computational Linguistics, Peking University, MOE, China
2
Computing Department, Hong Kong Polytechnic University, Hong Kong
3
Computer Science Department, Rensselaer Polytechnic Institute, USA
{ziqiangyeah, lisujian, pku7yang}@pku.edu.cn [email protected][email protected]
Abstract
Topic modeling techniques have the benefits of model-
ing words and documents uniformly under a probabilis-
tic framework. However, they also suffer from the limi-
tations of sensitivity to initialization and unigram topic
distribution, which can be remedied by deep learning
techniques. To explore the combination of topic mod-
eling and deep learning techniques, we first explain the
standard topic model from the perspective of a neural
network. Based on this, we propose a novel neural topic
model (NTM) where the representation of words and
documents are efficiently and naturally combined into a
uniform framework. Extending from NTM, we can eas-
ily add a label layer and propose the supervised neu-
ral topic model (sNTM) to tackle supervised tasks. Ex-
periments show that our models are competitive in both
topic discovery and classification/regression tasks.
Introduction
The real-world tasks of text categorization and document
retrieval rely critically on a good representation of words
and documents. So far, state-of-the-art techniques including
topic models (Blei, Ng, and Jordan 2003; Mcauliffe and Blei
2007; Wang, Blei, and Li 2009; Ramage et al. 2009) and
neural networks (Bengio et al. 2003; Hinton and Salakhutdi-
nov 2009; Larochelle and Lauly 2012) have shown remark-
able success in exploring semantic representations of words
and documents. Such models are usually embedded with la-
tent variables or topics, which serve the role of capturing the
efficient low-dimensional representation of words and doc-
uments.
Topic modeling techniques, such as Latent Dirichlet Allo-
cation (LDA) (Blei, Ng, and Jordan 2003), have been widely
used for inferring a low dimensional representation that cap-
tures the latent semantics of words and documents. Each
topic is defined as a distribution over words and each docu-
ment as a mixture distribution over topics. Thus, the seman-
tic representations of both words and documents are com-
bined into a unified framework which has a strict proba-
bilistic explanation. However, topic models also suffer from
certain limitations as follows. First, LDA-based models re-
quire prior distributions which are always difficult to define.
Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Second, previous models rarely adopt n-grams beyond uni-
grams in document modeling due to the sparseness problem,
though n-grams are important to express text. Last, when
there is extra labeling information associated with docu-
ments, topic models have to do some task-specific transfor-
mation in order to make use of it (Mcauliffe and Blei 2007;
Wang, Blei, and Li 2009; Ramage et al. 2009), which may
be computationally costly.
Recently, deep learning techniques also make low di-
mensional representations (i.e., distributed representations)
of words (i.e., word embeddings) and documents (Bengio
et al. 2003; Mnih and Hinton 2007; Collobert and Weston
2008; Mikolov et al. 2013; Ranzato and Szummer 2008;
Hinton and Salakhutdinov 2009; Larochelle and Lauly 2012;
Srivastava, Salakhutdinov, and Hinton 2013) feasible. Word
embeddings provide a way of representing phrases (Mikolov
et al. 2013) and are easy to embed with supervised tasks
(Collobert et al. 2011). With layer-wise pre-training (Ben-
gio et al. 2007), neural networks are built to automatically
initialize their weight values. Yet, the main problem of deep
learning is that it is hard to give each dimension of the gener-
ated distributed representations a reasonable interpretation.
Based on the analysis above, we can see that current topic
modeling and deep learning techniques both exhibit their
strengths and defects in representing words and documents.
A question comes to our mind: Can these two kinds of tech-
niques be combined to represent words and documents si-
multaneously? This combination can on the one hand over-
come the computational complexity of topic models and on
the other hand provide a reasonable probabilistic explana-
tion of the hidden variables.
In our preliminary study we explain topic models from
the perspective of a neural network, starting from the fact
that the conditional probability of a word given a document
can be seen as the product, summed over topics, of the
probability of the word given a topic (word-topic representa-
tion) and the probability of the topic given the document
(topic-document representation). At the same time, to solve
the unigram topic distribution problem of a standard topic
model, we make use of the available word embeddings
(Mikolov et al. 2013) to represent n-grams. Based on the
neural network explanation and the n-gram representation,
we propose a novel neural topic model (NTM) in which two
hidden layers are constructed to efficiently acquire the
n-gram topic and topic-document representations.
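To make this factorization concrete, here is a minimal NumPy sketch (our own illustration, not the paper's implementation) of p(w|d) = Σ_t p(t|d) p(w|t), computed as a single matrix product, i.e., a linear "network" whose hidden layer is the topic layer. The vocabulary size, topic count, and Dirichlet parameters are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 1000, 20                                  # vocabulary size, number of topics (placeholders)

theta_d = rng.dirichlet(np.full(K, 0.1))         # p(t|d): topic-document representation, shape (K,)
phi = rng.dirichlet(np.full(V, 0.01), size=K)    # p(w|t): word-topic representation, shape (K, V)

# p(w|d) = sum_t p(t|d) * p(w|t): one "forward pass" through a topic hidden layer
p_w_given_d = theta_d @ phi                      # shape (V,), a proper distribution over the vocabulary
assert np.isclose(p_w_given_d.sum(), 1.0)
```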
2015 · Unsupervised · Topic Modeling
40. Time Series Forecasting (DeepAR)
DeepAR: Probabilistic Forecasting with
Autoregressive Recurrent Networks
Valentin Flunkert*, David Salinas*, Jan Gasthaus
Amazon Development Center, Germany
Abstract
Probabilistic forecasting, i.e. estimating the probability distribution of a time se-
ries’ future given its past, is a key enabler for optimizing business processes. In
retail businesses, for example, forecasting demand is crucial for having the right
inventory available at the right time at the right place. In this paper we propose
DeepAR, a methodology for producing accurate probabilistic forecasts, based on
training an auto-regressive recurrent network model on a large number of related
time series. We demonstrate how by applying deep learning techniques to fore-
casting, one can overcome many of the challenges faced by widely-used classical
approaches to the problem. We show through extensive empirical evaluation on
several real-world forecasting data sets that our methodology produces more accu-
rate forecasts than other state-of-the-art methods, while requiring minimal manual
work.
1 Introduction
Forecasting plays a key role in automating and optimizing operational processes in most businesses
and enables data driven decision making. In retail for example, probabilistic forecasts of product
supply and demand can be used for optimal inventory management, staff scheduling and topology
planning [17], and are more generally a crucial technology for most aspects of supply chain opti-
mization.
The prevalent forecasting methods in use today have been developed in the setting of forecasting
individual or small groups of time series. In this approach, model parameters for each given time
series are independently estimated from past observations. The model is typically manually selected
to account for different factors, such as autocorrelation structure, trend, seasonality, and other ex-
planatory variables. The fitted model is then used to forecast the time series into the future according
to the model dynamics, possibly admitting probabilistic forecasts through simulation or closed-form
expressions for the predictive distributions. Many methods in this class are based on the classical
Box-Jenkins methodology [3], exponential smoothing techniques, or state space models [11, 18].
In recent years, a new type of forecasting problem has become increasingly important in many appli-
cations. Instead of needing to predict individual or a small number of time series, one is faced with
forecasting thousands or millions of related time series. Examples include forecasting the energy
consumption of individual households, forecasting the load for servers in a data center, or forecast-
ing the demand for all products that a large retailer offers. In all these scenarios, a substantial amount
of data on past behavior of similar, related time series can be leveraged for making a forecast for an
individual time series. Using data from related time series not only allows fitting more complex (and
hence potentially more accurate) models without overfitting, it can also alleviate the time and labor
intensive manual feature engineering and model selection steps required by classical techniques.
* Equal contribution
arXiv:1704.04110v2 [cs.AI] 5 Jul 2017
2017 · Supervised · Time Series Forecasting
• DeepAR is a supervised learning algorithm for
forecasting scalar time series using recurrent neural
networks (RNN)
• Classical forecasting methods fit one model to each
individual time series, and then use that model to
extrapolate the time series into the future
• In many applications you might have many similar time
series across a set of cross-sectional units
• For example, demand for different products, load of servers,
requests for web pages, and so on
• In this case, it can be beneficial to train a single model
jointly over all of these time series
• DeepAR takes this approach, training a single model for predicting a
time series over a large set of (related) time series (see the sketch below)
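The following is a minimal PyTorch sketch of the core idea: an autoregressive RNN that outputs a Gaussian distribution at every step and is trained jointly on a batch of related series. It is only an illustration of the approach, not the actual DeepAR implementation or the SageMaker algorithm; the network size, data, and training settings are placeholders.

```python
import torch
import torch.nn as nn

class ARForecaster(nn.Module):
    """Autoregressive RNN that predicts a Gaussian over the next value."""
    def __init__(self, hidden_size=40):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.mu = nn.Linear(hidden_size, 1)
        self.sigma = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch, time, 1); each input step is the previously observed value
        h, _ = self.rnn(x)
        mu = self.mu(h)
        sigma = torch.nn.functional.softplus(self.sigma(h)) + 1e-6
        return mu, sigma

# Toy joint training over a batch of related series (placeholder data).
series = torch.randn(32, 100, 1)          # 32 related time series, 100 steps each
model = ARForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(10):
    inputs, targets = series[:, :-1, :], series[:, 1:, :]
    mu, sigma = model(inputs)
    # Minimize the negative Gaussian log-likelihood of the next value at every step
    nll = -torch.distributions.Normal(mu, sigma).log_prob(targets).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()
```

At prediction time the same network is unrolled autoregressively: sampled values are fed back in as the next input, which yields sample paths and hence a full predictive distribution rather than a single point forecast.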
43. Word2vec ⇾ Word Embedding
2013 · Supervised · Word Embedding
• Continuous Bag-of-Words (CBOW): predicts a word given its context
• Skip-Gram with Negative Sampling (SGNS): predicts the context given a word
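As a quick illustration, both variants can be trained with gensim's Word2Vec by toggling the sg flag. This is a sketch assuming gensim ≥ 4.0; the toy corpus and parameter values are placeholders.

```python
from gensim.models import Word2Vec

# Toy tokenized corpus (placeholder); in practice this would be millions of sentences.
sentences = [
    ["machine", "learning", "at", "massive", "scale"],
    ["forecasting", "demand", "with", "machine", "learning"],
    ["topic", "modeling", "of", "documents"],
]

# CBOW: predict a word from its surrounding context (sg=0)
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, epochs=50)

# Skip-gram with negative sampling: predict the context from a word (sg=1, negative>0)
sgns = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, negative=5, epochs=50)

vec = sgns.wv["learning"]                   # the learned word embedding
print(sgns.wv.most_similar("learning"))     # nearest neighbours in embedding space
```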
46. Our Customers use ML at a massive scale
“We collect 160M events
daily in the ML pipeline
and run training over the
last 15 days and need it to
complete in one hour.
Effectively there's 100M
features in the model.”
Valentino Volonghi, CTO
“We process 3 million ad
requests a second,
100,000 features per
request. That’s 250 trillion
per day. Not your run of
the mill Data science
problem!”
Bill Simmons, CTO
“Our data warehouse is
100TB and we are
processing 2TB daily.
We're running mostly
gradient boosting (trees),
LDA and K-Means
clustering and collaborative
filtering.”
Shahar Cizer Kobrinsky,
VP Architecture
62. Amazon SageMaker
• Hosted Jupyter notebooks that
require no setup, so that you can
start processing your training
dataset and developing your
algorithms immediately
• One-click, on-demand distributed
training that sets up and tears
down the cluster after training.
• Built-in, high-performance ML
algorithms, re-engineered for
greater speed, accuracy, and
data throughput
Exploration → Training → Hosting
63. Amazon SageMaker
• Built-in model tuning
(hyperparameter optimization)
that can automatically try
hundreds of different
combinations of algorithm
parameters
• An elastic, secure, and scalable
environment to host your models,
with one-click deployment
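For orientation, here is a rough sketch of that training / tuning / hosting workflow using the SageMaker Python SDK (v2-style parameter names). The IAM role, image URI, S3 paths, metric name, and hyperparameter range are placeholders, and details vary by built-in algorithm and SDK version.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"        # placeholder IAM role

# Distributed training on an ephemeral cluster using a built-in algorithm image.
estimator = Estimator(
    image_uri="<built-in-algorithm-image-uri>",               # placeholder
    role=role,
    instance_count=2,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output/",                     # placeholder
    sagemaker_session=session,
)
estimator.set_hyperparameters(epochs=10)

# Built-in model tuning: automatically try combinations of hyperparameters.
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:rmse",                  # placeholder metric
    hyperparameter_ranges={"learning_rate": ContinuousParameter(0.001, 0.1)},
    max_jobs=20,
    max_parallel_jobs=2,
)
tuner.fit({"train": "s3://my-bucket/train/"})                 # placeholder training channel

# Hosting: deploy the best model behind an HTTPS endpoint.
predictor = tuner.best_estimator().deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
```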
67. Vodafone Use Case (AWS Summit Milan 2018)
AWS Services Used: Amazon SageMaker, AWS Greengrass
Vodafone Capabilities Used: Mobile Edge Cloud, 4G Network
Solution and Benefits:
• Use of AWS Greengrass
  - Seamlessly extends AWS cloud capabilities to devices
  - Integrates edge computing with cloud natively
  - Speed: Proof of Concept realised in 7 weeks
  - Future-ready: Can enrich application features and apply the concept to other use cases
  - A showcase: For future applications
• Off-load of compute from camera to Telco Edge Cloud
  - Lower Bill of Material
  - Decoupling the life cycles: Car versus Cloud
  - Real-time performance
69. And Then There Are Algorithms
Algorithm | Scope | Infinitely Scalable
Linear Learner | classification, regression | Y
Factorization Machines | classification, regression, sparse datasets | Y
XGBoost | regression, classification (binary and multiclass), and ranking |
Image Classification | CNNs (ResNet, DenseNet, Inception) |
Sequence to Sequence (seq2seq) | translation, text summarization, speech-to-text (RNNs, CNN) |
K-Means Clustering | clustering, unsupervised | Y
Principal Component Analysis (PCA) | dimensionality reduction, unsupervised | Y
Latent Dirichlet Allocation (LDA) | topic modeling, unsupervised |
Neural Topic Model (NTM) | topic modeling, unsupervised | Y
Time Series Forecasting (DeepAR) | time series forecasting (RNN) | Y
BlazingText (Word2vec) | word embeddings |