The document contains mathematical equations and notation related to machine learning and probability distributions. It involves defining terms like P(y|x), which represents the probability of outcome y given x, and exploring ways to calculate the expected value of an objective function Rn under different probability distributions p and q over the variables x and y. The goal appears to be to select parameters θ to optimize some objective while accounting for the distributions of the training data.
Introduction of "the alternate features search" using R (Satoshi Kato)
An introduction, using R, to the alternate features search proposed in the paper: S. Hara, T. Maehara, Finding Alternate Features in Lasso, arXiv:1611.05940, 2016.
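An overview of SVI (stochastic variational inference), which is convenient for inference on large datasets.
SVI is a variational Bayes method carried out within the framework of stochastic optimization.
Updated from time to time.
References
[1] Matthew D. Hoffman, David M. Blei, Chong Wang, and John Paisley. Stochastic variational inference. The Journal of Machine Learning Research, Vol. 14, No. 1, pp. 1303–1347, 2013.
[2] Issei Sato. トピックモデルによる統計的意味解析 (Statistical Semantic Analysis with Topic Models). Corona Publishing, 2015.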
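As a rough orientation for readers new to SVI, the global-parameter update described in [1] can be written as below. The symbols (global variational parameter λ, step size ρ_t, delay τ, forgetting rate κ) follow the notation of [1] and are not necessarily the notation used in the slides; this is a sketch of the general form only.

```latex
\begin{align*}
\lambda^{(t)} &= (1-\rho_t)\,\lambda^{(t-1)} + \rho_t\,\hat{\lambda}_t, \\
\rho_t &= (t+\tau)^{-\kappa}, \qquad \kappa \in (0.5, 1],\ \tau \ge 0, \\
\sum_{t} \rho_t &= \infty, \qquad \sum_{t} \rho_t^{2} < \infty .
\end{align*}
```

Here \hat{\lambda}_t denotes the intermediate estimate of the global parameter computed from a randomly sampled data point (or minibatch), treated as if it were replicated to the size of the full dataset. The last line states the Robbins-Monro conditions on the step sizes that the stochastic optimization relies on.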
How to generate PowerPoint slides non-manually using R (Satoshi Kato)
Introduction to:
- Basic idea and procedure of {officer} package
- Getting started: Embedding texts, tables and figures in slides
- PowerPoint Structure: Layouts and Placeholders
- Making a template for specific layouts
- Making a template for your own slide-layouts
Resources are available at: https://github.com/katokohaku/powerpoint_with_officer
Exploratory data analysis using xgboost package in R (Satoshi Kato)
Explains a how-to procedure for exploratory data analysis using xgboost (EDAXGB), covering feature importance, sensitivity analysis, feature contribution, and feature interaction. It relies only on the built-in predict() function of the R package.
All of the sample code is available at: https://github.com/katokohaku/EDAxgboost
A Japanese-language introduction to the R package {featureTweakR},
available from: https://github.com/katokohaku/featureTweakR
Reference: "Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking" (https://arxiv.org/abs/1706.06691). The code is on my GitHub (https://github.com/katokohaku/feature_tweaking)
An outline of genetic algorithms, applied in R to searching for the maximum value of a function and to the traveling salesman problem.
To view the source code and animations:
Searching for Maximum Value of Function
- https://github.com/katokohaku/evolutional_comptutation/blob/master/chap2.1.Rmd
Traveling Salesman Problem
- https://github.com/katokohaku/evolutional_comptutation/blob/master/chap2.2.Rmd
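For readers who want a feel for the first problem before opening the R notebooks, here is a minimal sketch of a real-coded genetic algorithm that searches for the maximum of a one-dimensional function. It is written in Python rather than R and is not the code from the repository above: the function `genetic_maximize`, its parameters, and the toy `objective` are all hypothetical names chosen for illustration, and the operators (tournament selection, convex-combination crossover, Gaussian mutation, elitism) are one common choice among many.

```python
import random


def genetic_maximize(f, lower, upper, pop_size=30, generations=60,
                     mutation_rate=0.2, mutation_scale=0.1, seed=0):
    """Maximize f on [lower, upper] with a simple real-coded genetic algorithm."""
    rng = random.Random(seed)
    # Initial population: uniform random candidates in the search interval.
    population = [rng.uniform(lower, upper) for _ in range(pop_size)]

    def tournament():
        # Selection: draw two individuals at random and keep the fitter one.
        a, b = rng.choice(population), rng.choice(population)
        return a if f(a) >= f(b) else b

    for _ in range(generations):
        # Elitism: carry the current best individual over unchanged.
        next_population = [max(population, key=f)]
        while len(next_population) < pop_size:
            p1, p2 = tournament(), tournament()
            # Crossover: the child is a random convex combination of the parents.
            w = rng.random()
            child = w * p1 + (1.0 - w) * p2
            # Mutation: occasionally add Gaussian noise scaled to the interval.
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, mutation_scale * (upper - lower))
            # Clip the child back into the search interval.
            next_population.append(min(max(child, lower), upper))
        population = next_population

    best = max(population, key=f)
    return best, f(best)


def objective(x):
    # Toy objective (an assumption for illustration): a single peak at x = 3.
    return 10.0 - (x - 3.0) ** 2


best_x, best_value = genetic_maximize(objective, 0.0, 6.0)
print(best_x, best_value)
```

Because the best individual is always copied into the next generation, the best fitness never decreases from one generation to the next. The traveling salesman problem needs permutation-preserving crossover and mutation operators instead of the arithmetic ones used here.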
Introduction & R implementation of "Interpretable predictions of tree-based ... (Satoshi Kato)
A Japanese-language introduction to, and an R implementation of, "Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking" (https://arxiv.org/abs/1706.06691). The code is on my GitHub (https://github.com/katokohaku/feature_tweaking)
Introduction to sensitivity analysis for random forest regression, binary classification, and multi-class classification using the {forestFloor} package
Imputation of Missing Values using Random Forest (Satoshi Kato)
Introduction to the missForest package
"MissForest - nonparametric missing value imputation for mixed-type data" (DJ Stekhoven, P Bühlmann (2011), Bioinformatics 28 (1), 112-118)
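5. Variable selection with the Lasso and the oracle property
We want to recover the true model as accurately as possible
• Selection consistency
  • Important variables are selected without omission, and all noise variables are excluded
  • As the sample size n grows, the probability that the explanatory variables with nonzero coefficients (βj ≠ 0) are correctly selected converges to 1
• Estimation consistency
  • The estimators of the nonzero coefficients converge to their true values
  • The estimators for the explanatory variables with nonzero coefficients are asymptotically unbiased and asymptotically normal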
J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties.
Journal of the American Statistical Association, 96, 1348–1360, 2001.
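The two properties on this slide can be stated compactly as follows. The notation is an assumption introduced here, not taken from the slide: β* is the true coefficient vector, β̂ is the estimator based on n samples, A is the true support, Â_n is the selected support, and Σ* is the asymptotic covariance that would be obtained if the true support were known in advance.

```latex
\begin{align*}
A &= \{\, j : \beta^{*}_{j} \neq 0 \,\}, \qquad
\hat{A}_n = \{\, j : \hat{\beta}_{j} \neq 0 \,\}, \\
\text{(selection consistency)} \quad
& \lim_{n \to \infty} P\bigl(\hat{A}_n = A\bigr) = 1, \\
\text{(asymptotic normality)} \quad
& \sqrt{n}\,\bigl(\hat{\beta}_{A} - \beta^{*}_{A}\bigr)
  \xrightarrow{\;d\;} N\bigl(0, \Sigma^{*}\bigr).
\end{align*}
```

An estimator satisfying both conditions behaves asymptotically as if the set of truly relevant variables had been given by an oracle, which is where the name comes from.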