This is a Japanese manual for IGOR Pro. IGOR Pro is scientific data analysis software, a numerical computing environment, and a programming language supplied by WaveMetrics.
1. The document discusses probabilistic modeling and variational inference. It introduces concepts like Bayes' rule, marginalization, and conditioning.
2. An equation for the evidence lower bound is derived, which decomposes the log likelihood of data into the Kullback-Leibler divergence between an approximate and true posterior plus an expected log likelihood term.
3. Variational autoencoders are discussed, where the approximate posterior is parameterized by a neural network and optimized to maximize the evidence lower bound. Latent variables are modeled as Gaussian distributions.
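As a hedged sketch of the decomposition described in point 2, in standard notation (x: observed data, z: latent variable, q: the approximate posterior; the notation is assumed here, not taken from the deck):

\log p(x) = \mathrm{KL}\big(q(z \mid x) \,\|\, p(z \mid x)\big) + \mathbb{E}_{q(z \mid x)}\big[\log p(x, z) - \log q(z \mid x)\big]

The second term is the evidence lower bound (ELBO); since the KL term is non-negative, maximizing the ELBO tightens the bound on \log p(x).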
How to generate PowerPoint slides non-manually using R (Satoshi Kato)
Introduction to:
- Basic idea and procedure of {officer} package
- Getting started: embedding text, tables, and figures in slides (see the sketch below)
- PowerPoint Structure: Layouts and Placeholders
- Making a template for specific layouts
- Making a template for your own slide-layouts
Resources are available at: https://github.com/katokohaku/powerpoint_with_officer
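A minimal, hedged sketch of the {officer} workflow described above, assuming the default template that ships with read_pptx() (the layout and master names below depend on your template):

library(officer)

# Create a deck from the default template, add one slide, and fill
# its title and body placeholders with text generated in R.
doc <- read_pptx()
doc <- add_slide(doc, layout = "Title and Content", master = "Office Theme")
doc <- ph_with(doc, value = "Hello from R",
               location = ph_location_type(type = "title"))
doc <- ph_with(doc, value = "Body text added non-manually",
               location = ph_location_type(type = "body"))
print(doc, target = "example.pptx")  # writes the .pptx to disk

Tables (e.g. data frames) and ggplot figures can be passed to ph_with() the same way as the text values here.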
Exploratory data analysis using the xgboost package in R (Satoshi Kato)
Explains how to perform exploratory data analysis using xgboost (EDAXGB): feature importance, sensitivity analysis, feature contribution, and feature interaction. It is based entirely on the built-in predict() function of the R package (a minimal sketch follows).
All of the sample codes are available at: https://github.com/katokohaku/EDAxgboost
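A small, hedged sketch of the predict()-based approach mentioned above (the data and model settings are illustrative, not from the deck): per-observation feature contributions via predcontrib; predinteraction = TRUE works analogously for pairwise interactions.

library(xgboost)

# Fit a small regression model; mtcars is used only for illustration.
X   <- as.matrix(mtcars[, -1])
bst <- xgboost(data = X, label = mtcars$mpg,
               nrounds = 50, objective = "reg:squarederror", verbose = 0)

# One row per observation; one column per feature plus a BIAS column.
# Each row sums to that observation's prediction.
contrib <- predict(bst, X, predcontrib = TRUE)
head(contrib)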
A Japanese introduction to the R package {featureTweakR}
available from: https://github.com/katokohaku/featureTweakR
reference: "Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking" (https://arxiv.org/abs/1706.06691). Codes are at my Github (https://github.com/katokohaku/feature_tweaking)
An outline of genetic algorithms, with two worked examples in R: searching for the maximum of a function, and the traveling salesman problem (see the sketch below).
To view the source code and animations:
Searching for Maximum Value of Function
- https://github.com/katokohaku/evolutional_comptutation/blob/master/chap2.1.Rmd
Traveling Salesman Problem
- https://github.com/katokohaku/evolutional_comptutation/blob/master/chap2.2.Rmd
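As a hedged illustration of the first task: the linked Rmd files implement the algorithm by hand, while this sketch instead uses the {GA} package (an assumption, not from the deck) with an illustrative test function.

library(GA)

# Maximize f(x) = x * sin(10 * pi * x) + 1 on [0, 2] with a real-valued GA.
f   <- function(x) x * sin(10 * pi * x) + 1
res <- ga(type = "real-valued", fitness = f,
          lower = 0, upper = 2, maxiter = 100)
summary(res)   # best solution and fitness found
plot(res)      # best/mean fitness per generation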
Intoroduction & R implementation of "Interpretable predictions of tree-based ...Satoshi Kato
A Japanese introduction and an R implementation of "Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking" (https://arxiv.org/abs/1706.06691). Code is at my GitHub (https://github.com/katokohaku/feature_tweaking)
Introduction of "the alternate features search" using RSatoshi Kato
Introduction to the alternate features search using R, as proposed in S. Hara and T. Maehara, "Finding Alternate Features in Lasso", arXiv:1611.05940, 2016.
Introduction to sensitivity analysis for random forest regression, binary classification, and multi-class classification using the {forestFloor} package.
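A hedged sketch of the {forestFloor} workflow for the regression case (data and settings are illustrative):

library(randomForest)
library(forestFloor)

# forestFloor needs the forest trained with keep.inbag = TRUE.
X  <- mtcars[, -1]
rf <- randomForest(X, mtcars$mpg, keep.inbag = TRUE)

# Decompose predictions into per-feature contributions and plot them.
ff <- forestFloor(rf.fit = rf, X = X)
plot(ff, plot_seq = 1:4, col = fcol(ff, 1))  # first four features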
Imputation of Missing Values using Random Forest (Satoshi Kato)
An introduction to the missForest package:
"MissForest - nonparametric missing value imputation for mixed-type data" (D. J. Stekhoven and P. Bühlmann (2011), Bioinformatics 28(1), 112-118)
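A minimal usage sketch (the iris example is illustrative, not from the deck):

library(missForest)

# Introduce 10% missing values, then impute them with random forests.
set.seed(1)
iris.mis <- prodNA(iris, noNA = 0.1)  # prodNA() ships with missForest
imp <- missForest(iris.mis)
head(imp$ximp)   # the imputed data set
imp$OOBerror     # out-of-bag estimate of the imputation error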
23. Overview: SNE ②
The distances between data points and between map points are each expressed as probability distribution models, and the map points are sought so that the two distributions become as close as possible (a minimal sketch of the cost follows this slide).
2. The similarity from map point y_i to map point y_j is treated as a conditional probability distribution q_{j|i}
• q_{j|i} is defined by a Gaussian with mean y_i and standard deviation σ = 1/√2 (variance 1/2)
• with the convention q_{i|i} = 0
3. The map points y_i are optimized so that p_{j|i} and q_{j|i} become close
• The distance between p_{j|i} (the data-space similarity) and q_{j|i} is expressed by the Kullback–Leibler (KL) divergence
• The sum of the KL divergences over all data points is used as the cost function and minimized by gradient descent
● Problems
• Optimizing the cost function is hard
• The crowding problem
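A minimal sketch of this cost in R, assuming X (data points) and Y (map points) are numeric matrices with one row per point; for brevity a single σ is used in the data space, whereas SNE actually tunes a per-point σ_i via the perplexity:

# Row-normalized Gaussian similarities: row i holds p_{.|i} (or q_{.|i}).
cond_probs <- function(Z, sigma2) {
  D <- as.matrix(dist(Z))^2      # squared Euclidean distances
  W <- exp(-D / (2 * sigma2))
  diag(W) <- 0                   # p_{i|i} = q_{i|i} = 0
  W / rowSums(W)
}

# Cost = sum over i of KL(P_i || Q_i); diagonal NaNs (0 * log 0/0) are dropped.
sne_cost <- function(P, Q) sum(P * log(P / Q), na.rm = TRUE)

P <- cond_probs(X, sigma2 = 1)    # data space (per-point sigma_i omitted)
Q <- cond_probs(Y, sigma2 = 0.5)  # map space: sigma = 1/sqrt(2)
sne_cost(P, Q)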
24. (Reference) Automatic tuning of perplexity ①
Automatic Selection of t-SNE Perplexity
• Cao and Wang 2017
• Searching for a good perplexity is computationally expensive and relies on visual inspection, so we want a selection criterion that enables an automatic search
• The paper proposes a criterion S(perp) that adds a perplexity-dependent penalty term to KL(P||Q)
• If this criterion is valid, the minimum of S(perp) can be found by binary search (i.e., the selection is automated)
• The search range is from 5 to (data size)/3
Once KL(P||Q) is available, S(perp) can be computed
25. (Reference) Automatic tuning of perplexity ②
Automatic Selection of t-SNE Perplexity
• Rtsne() returns KL(P||Q), so the criterion can be computed
itercosts:
• The total costs (KL divergence) for all objects at every 50th and the last iteration
pen.KLdist = last(mapping.tsne$itercosts) +     # last() as in dplyr; tail(x, 1) in base R
  log(mapping.tsne$N) * perplexity / mapping.tsne$N
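Putting the pieces together, a hedged sketch that evaluates the penalized criterion over a grid of perplexities and picks the minimum (a grid stands in for the binary search; the data and variable names are illustrative):

library(Rtsne)

X <- as.matrix(unique(iris[, 1:4]))   # Rtsne() rejects duplicate rows
n <- nrow(X)
perps <- seq(5, floor(n / 3), by = 5) # the search range from slide 24

S <- sapply(perps, function(perp) {
  fit <- Rtsne(X, perplexity = perp)
  kl  <- tail(fit$itercosts, 1)       # final KL(P||Q)
  kl + log(n) * perp / n              # penalized criterion, as above
})
best.perp <- perps[which.min(S)]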
30. Overview?: UMAP
• The details of the underlying mathematics can be found in:
• McInnes, L., Healy, J., UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018
• https://arxiv.org/abs/1802.03426
• The important thing is that:
• you don't need to worry about that
• you can use UMAP right now for dimension reduction and visualisation as easily as a drop-in replacement for scikit-learn's t-SNE
※ That said, the README also says something to the effect of "if you plan to extend it, make sure you understand the underlying math properly"
https://github.com/lmcinnes/umap
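The repository above is the Python implementation; as a hedged sketch, UMAP can also be run from R via the {uwot} package (an assumption, not mentioned on the slide):

library(uwot)

# Embed the iris measurements into 2D and plot, colored by species.
emb <- umap(iris[, 1:4], n_neighbors = 15, min_dist = 0.1)
plot(emb, col = iris$Species, pch = 19,
     xlab = "UMAP 1", ylab = "UMAP 2")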