Bayesian Gaussian mixture models (without the math) using Infer.NET


A quick practical guide to coding Gaussian mixture models in Infer.NET.


In this post, I give a brief introduction to Bayesian Gaussian mixture models and share my experience of building them in Infer.NET, Microsoft's probabilistic graphical model framework. I was comfortable with k-means clustering and Python, but I found it challenging to learn C#, Infer.NET and some of the Bayesian principles that underlie probabilistic inference. My hope is that this article will save you time, make the theory less intimidating and demonstrate some of the advantages of what is known as the model-based machine learning (MBML) approach. To get set up with the Infer.NET framework, please follow the instructions in the Infer.NET documentation.


Bayesian Gaussian mixture models constitute a form of unsupervised learning and are useful for fitting multi-modal data in tasks such as clustering, data compression, outlier detection and generative classification. Each Gaussian component is usually a multivariate Gaussian with a mean vector and a covariance matrix. For the sake of demonstration, we will consider a simple univariate case.


Let us sample data from a univariate Gaussian distribution and store the data in a .csv file using Python code:
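The original code listing did not survive on this page, so the following is only a minimal sketch of what such a script might look like. The mean of 5.0, the sample count of 100, the seed, the file name `data.csv` and the helper names `sample_gaussian` and `save_csv` are all assumptions made for illustration; the precision of 1 matches the value assumed later in the post.

```python
import csv
import math
import random


def sample_gaussian(mean, precision, n, seed=0):
    """Draw n samples from a univariate Gaussian with the given mean and precision."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(precision)  # precision is the inverse of the variance
    return [rng.gauss(mean, std) for _ in range(n)]


def save_csv(samples, path):
    """Write one sample per row to a .csv file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for x in samples:
            writer.writerow([x])


if __name__ == "__main__":
    # Assumed values for illustration only: mean 5.0, precision 1.0, 100 samples.
    data = sample_gaussian(mean=5.0, precision=1.0, n=100)
    save_csv(data, "data.csv")
```

Writing one value per row keeps the file easy to read back from C# later, for example by parsing each line as a double.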


This is what our data looks like:


Let us pretend for a moment that we do not know the distribution that generated our data set. We visualise the data and assume that it was generated by a Gaussian distribution. In other words, we hope that a Gaussian distribution can describe our data set sufficiently well. However, we do not know the location or the spread of this Gaussian distribution. A Gaussian distribution can be parameterised by a mean and a variance. Sometimes it is mathematically easier to use a mean and a precision, where precision is simply the inverse of the variance. We will stick with precision, for which the intuition is that the higher the precision, the narrower (or more "certain") the Gaussian distribution.
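To make the relationship between precision and spread concrete, here is a small sketch that evaluates the Gaussian density in its mean and precision form. The function names `gaussian_pdf` and `std_from_precision` and the precision values tried are assumptions for illustration and are not part of Infer.NET.

```python
import math


def std_from_precision(precision):
    """Standard deviation implied by a precision (precision = 1 / variance)."""
    return 1.0 / math.sqrt(precision)


def gaussian_pdf(x, mean, precision):
    """Density of a Gaussian with the given mean and precision, evaluated at x."""
    norm = math.sqrt(precision / (2.0 * math.pi))
    return norm * math.exp(-0.5 * precision * (x - mean) ** 2)


if __name__ == "__main__":
    # Higher precision gives a smaller standard deviation and a taller peak at the mean.
    for precision in (0.25, 1.0, 4.0):
        std = std_from_precision(precision)
        peak = gaussian_pdf(0.0, mean=0.0, precision=precision)
        print(f"precision={precision}: std={std:.3f}, density at mean={peak:.3f}")
```

As the precision grows, the standard deviation shrinks and the density concentrates around the mean, which is the "more certain" behaviour described above.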


Firstly, we are interested in finding the mean parameter of this Gaussian distribution, and will pretend that we know the value of its precision (we set the precision=1). In other words, we