A presentation for a statistics meetup in Japan.
https://connpass.com/event/204931/
Topics:
Methods and their properties to measure dependencies between variables
1) Canonical correlation analysis (CCA) is a statistical method that analyzes the correlation between two sets of multidimensional variables.
2) CCA finds linear transformations of the two sets of variables so that their correlation is maximized. This can be formulated as a generalized eigenvalue problem.
3) The number of dimensions of the transformed variables is determined using Bartlett's test, which tests the eigenvalues against a chi-squared distribution.
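The three points above can be written out as follows. This is only a sketch in assumed notation that does not appear in the slides: x has p dimensions, y has q dimensions, \Sigma_{xx}, \Sigma_{yy}, \Sigma_{xy} are the covariance blocks, n is the sample size, and \rho_1 \ge \dots \ge \rho_m are the canonical correlations with m = \min(p, q). The test statistic is the commonly used Bartlett approximation, which may differ in detail from the form used in the talk.

```latex
% Assumed notation (not from the slides): a, b are the weight vectors of the
% linear transformations a^T x and b^T y.
\begin{equation}
  \rho = \max_{a,\,b}
  \frac{a^{\top}\Sigma_{xy}\,b}
       {\sqrt{a^{\top}\Sigma_{xx}\,a}\;\sqrt{b^{\top}\Sigma_{yy}\,b}}
\end{equation}

% Stationarity conditions of the problem above, as a generalized eigenvalue problem.
\begin{equation}
  \begin{pmatrix} 0 & \Sigma_{xy} \\ \Sigma_{yx} & 0 \end{pmatrix}
  \begin{pmatrix} a \\ b \end{pmatrix}
  = \rho
  \begin{pmatrix} \Sigma_{xx} & 0 \\ 0 & \Sigma_{yy} \end{pmatrix}
  \begin{pmatrix} a \\ b \end{pmatrix}
\end{equation}

% Bartlett's approximation for H_0: \rho_{k+1} = \dots = \rho_m = 0.
% Under H_0 the statistic is approximately chi-squared with (p-k)(q-k)
% degrees of freedom.
\begin{equation}
  \chi^2_k = -\left(n - 1 - \frac{p + q + 1}{2}\right)
  \sum_{i=k+1}^{m} \ln\!\left(1 - \rho_i^{2}\right)
\end{equation}
```

Testing k = 0, 1, 2, ... in turn and stopping at the first k for which the null hypothesis is not rejected gives the number of canonical dimensions to keep.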
Imputation of Missing Values using Random Forest (Satoshi Kato)
Introduction to the missForest package
“MissForest - nonparametric missing value imputation for mixed-type data” (DJ Stekhoven, P Bühlmann (2011), Bioinformatics 28 (1), 112-118)
require(party)
Idea →
* “We want to measure the association between xp (explanatory variable) and y (response variable) given
the correlation structure between xp and the other predictor variables inherent in the data set.
To meet this aim, Strobl et al. (2008) suggest a conditional permutation scheme, where xp
is permuted only within groups of observations with Z = z (Z: the other explanatory variables) in order to
preserve the correlation structure between xp and the other predictor variables”
A toolbox for recursive partitioning
* Strobl et al., 2009, “Party on! A New, Conditional Variable Importance Measure for Random Forests Available in the party Package”
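Usage →
# formula and data are placeholders: pass a model formula and the data frame it refers to
model <- cforest(formula, data = list(), control = cforest_unbiased())
varimp(model, conditional = TRUE) # mean decrease in prediction accuracy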
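To make the usage template concrete, here is a minimal sketch that compares ordinary and conditional permutation importance. It assumes the party package is installed and uses the readingSkills data set that ships with it; the seed, mtry, and ntree values are illustrative choices and do not come from the slides.

```r
library(party)

set.seed(290875)  # forests are random, so fix the seed for reproducibility

# readingSkills: score depends on age and nativeSpeaker, while shoeSize is
# related to score only through its correlation with age.
forest <- cforest(score ~ ., data = readingSkills,
                  control = cforest_unbiased(mtry = 2, ntree = 50))

# Ordinary permutation importance: each predictor is permuted over all rows.
vi_marginal <- varimp(forest)

# Conditional permutation importance: each predictor is permuted only within
# groups defined by the other predictors it is correlated with.
vi_cond <- varimp(forest, conditional = TRUE)

print(round(rbind(marginal = vi_marginal, conditional = vi_cond), 2))
```

With correlated predictors such as age and shoeSize, the conditional scheme would typically shrink the importance of the predictor that has no effect of its own. Exact values depend on the seed and the forest settings.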
References
1) Feature importances that can be computed with Random Forest (original title: Random Forestで計算できる特徴量の重要度), URL (last accessed 2014/3/25):
http://alfredplpl.hatenablog.com/entry/2013/12/24/225420
2) Machine learning for package users (5): Random Forest (original title: パッケージユーザーのための機械学習(5):ランダムフォレスト), URL (last accessed 2014/3/25):
http://tjo.hatenablog.com/entry/2013/12/24/190000
3) Stop using bivariate correlations for variable selection, URL (last accessed 2014/3/25):
http://jacobsimmering.com/2014/03/20/BivariateCorrelations.html
4) Strobl et al., 2009, “Party on! A New, Conditional Variable Importance Measure for Random Forests Available in the party Package”
5) Becker et al., 2009, “penalizedSVM: a R-package for feature selection SVM classification”
6) “God of the Forest” photo (original title: 「森の神様」写真), URL (last accessed 2014/3/28): http://morisong04.exblog.jp/i22/
7) Twitter icon by Nikola Lazarevic, URL (last accessed 2014/3/25): http://webexpedition18.com/work/extreme-grunge-social-media-garments-icons-pack/