Q. H. Tran and Y. Hasegawa, Topological time-series analysis with delay-variant embedding, Oral Presentation at Conference on Complex Systems, Singapore, Singapore, Oct. 2019.
SIAM-AG21 - Topological Persistence Machine of Phase Transitions, by Ha Phuong
Presentation at SIAM Conference on Applied Algebraic Geometry (AG21), Aug. 2021.
Abstract. The study of phase transitions using data-driven approaches is challenging, especially when little prior knowledge of the system is available. Topological data analysis is an emerging framework for characterizing the shape of data and has recently achieved success in detecting structural transitions in materials science, such as the glass–liquid transition. However, data obtained from physical states may not have explicit shapes as structural materials do. We thus propose a general framework, termed the “topological persistence machine,” to construct the shape of data from correlations in states, so that we can subsequently decipher phase transitions via qualitative changes in this shape. Our framework enables an effective and unified approach to phase-transition analysis without prior knowledge of the phases and without requiring the investigation of large system sizes. We demonstrate the efficacy of the approach in detecting the Berezinskii–Kosterlitz–Thouless phase transition in the classical XY model and quantum phase transitions in the transverse Ising and Bose–Hubbard models. Interestingly, while these phase transitions have proven notoriously difficult to analyze using traditional methods, they can be characterized through our framework without prior knowledge of the phases. Our approach is thus expected to be widely applicable and offers a practical prospect for exploring the phases of experimental physical systems.
Topological Data Analysis: visual presentation of multidimensional data sets, by DataRefiner
Topological data analysis (TDA) is an unsupervised approach that may revolutionise the way data can be mined and eventually drive the next generation of analytical tools. The idea behind TDA is to "measure" the shape of data and find a compressed combinatorial representation of that shape. As in ordinary topology, these combinatorial representations provide a compressed representation of high-dimensional data sets that retains information about the geometric relationships between data points. TDA can also be used as a very powerful clustering technique. Edward will present a comparison between TDA and other dimension-reduction algorithms such as PCA, LLE, Isomap, MDS, and Spectral Embedding.
The document provides an overview of topological data analysis methods and examples of applications. It describes topological data analysis as a method for partial clustering that allows overlaps between clusters. It also outlines techniques like persistent homology and the Mapper algorithm. Applications discussed include identifying subtypes of diabetes and breast cancer using high-dimensional gene expression and medical data.
Topological data analysis analyzes large, complicated datasets by representing data points as nodes in a network and their relationships as edges. It has three key properties: coordinate invariance, which allows it to analyze data regardless of its coordinate system; deformation invariance, which means the analysis is unaffected by distortions of the data; and compressed representations, which allow it to represent complex shape patterns in fewer dimensions. These properties enable topological data analysis to capture the underlying shape and structure of data to help analyze and understand even very large, complex datasets.
Tutorial of topological data analysis, part 3 (Mapper algorithm), by Ha Phuong
The document provides an overview of the Mapper algorithm, a technique from topological data analysis. It begins by introducing basic concepts from topology like Reeb graphs and Morse theory. It then describes the key steps of the Mapper algorithm: (1) defining a filter function on the data, (2) clustering inverse images of the filter, and (3) connecting clusters to form a graph. The document discusses practical considerations like choosing filter functions and parameters. It also provides examples of applying Mapper for tasks like clustering, feature selection, and data exploration.
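To make the three steps above concrete, here is a from-scratch toy sketch in Python (not the tutorial's own code and not the kepler-mapper library), assuming numpy and scikit-learn are available; the filter function, cover, and clustering choices are illustrative defaults:

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

def simple_mapper(X, n_intervals=8, overlap=0.3, eps=0.4):
    """Toy 1-D Mapper: filter = first principal component, overlapping interval
    cover, DBSCAN clustering in each preimage, edges between clusters that share points."""
    f = PCA(n_components=1).fit_transform(X).ravel()          # step 1: filter function
    lo, hi = f.min(), f.max()
    length = (hi - lo) / n_intervals
    nodes, edges = [], set()
    for i in range(n_intervals):                              # step 2: cluster each preimage
        a = lo + i * length - overlap * length
        b = lo + (i + 1) * length + overlap * length
        idx = np.where((f >= a) & (f <= b))[0]
        if idx.size == 0:
            continue
        labels = DBSCAN(eps=eps, min_samples=3).fit_predict(X[idx])
        for lab in set(labels) - {-1}:
            nodes.append(set(idx[labels == lab]))
    for i in range(len(nodes)):                               # step 3: connect overlapping clusters
        for j in range(i + 1, len(nodes)):
            if nodes[i] & nodes[j]:
                edges.add((i, j))
    return nodes, edges

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 400)
circle = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.standard_normal((400, 2))
nodes, edges = simple_mapper(circle)
print(len(nodes), "nodes,", len(edges), "edges")   # on a noisy circle the graph itself closes into a ring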
WiDS Alexandria, Egypt workshop in topological data analysis (Python and R code available on request), covering persistent homology, the Mapper algorithm, and discrete Ricci curvature. Examples include text data and social network data.
Topological Data Analysis and Persistent Homology, by Carla Melia
This document provides an overview of topological data analysis and persistent homology. It discusses how topological data analysis uses techniques from fields like statistics, computer science, and algebraic topology to infer robust features about complex datasets. Persistent homology in particular analyzes the homology of filtrations to study topological features across different scales. The document also describes implementations of topological data analysis techniques and applications to areas such as brain networks, periodic systems, and cosmological data analysis.
Introduction to Topological Data Analysis, by Mason Porter
Here are slides for my 3/14/21 talk on an introduction to topological data analysis.
This is the first talk in our Short Course on topological data analysis at the 2021 American Physical Society (APS) March Meeting: https://march.aps.org/program/dsoft/gsnp-short-course-introduction-to-topological-data-analysis/
This doctoral dissertation defense presentation summarizes Yang Yang's dissertation work on developing data-adaptive methods for analyzing genome-wide association studies (GWAS) using longitudinal data. The presentation includes background on GWAS and longitudinal data analysis, the overall study design involving simulation studies and application to a real dataset, and three proposed journal articles describing novel methods for SNP-set and pathway-based association tests. The goal of the work is to develop more powerful statistical approaches for detecting genetic associations using the additional information from longitudinal phenotypes in GWAS.
Spatial autocorrelation refers to the similarity of objects near each other in space. It is measured using global and local statistics. Global measures like Moran's I provide a single value for the whole data set, indicating if nearby things tend to be more similar. Local measures like LISA (Local Indicators of Spatial Association) calculate a statistic for each observation to identify local clusters, or "hot spots" and "cold spots". Moran's I is a correlation between a value and the average of its neighbors, while Geary's C uses the actual values. LISA extends these global concepts to detect significant spatial autocorrelation for individual locations.
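A minimal numpy sketch of the global Moran's I statistic described above; the weight matrix and values below are toy inputs, not data from the document:

import numpy as np

def morans_i(values, weights):
    """Global Moran's I for a 1-D array of values and a spatial weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    num = (w * np.outer(z, z)).sum()          # sum of w_ij * z_i * z_j
    return len(x) / w.sum() * num / (z ** 2).sum()

# Toy example: four locations on a line with rook-style (adjacent) neighbors.
vals = [1.0, 2.0, 2.5, 5.0]
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(round(morans_i(vals, W), 3))            # positive value: neighbors tend to be similar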
This document contains information about 3D display methods in computer graphics presented by a group of 5 students. It discusses parallel projection, perspective projection, depth cueing, visible line identification, and surface rendering techniques. The goal is to generate realistic 3D images and correctly display depth relationships between objects.
Tutorial of topological data analysis, part 1 (basic), by Ha Phuong
This document provides an overview of topological data analysis (TDA) concepts, including:
- Simplicial complexes which represent topological spaces and holes of different dimensions
- Persistent homology which tracks the appearance and disappearance of holes over different scales
- Applications of TDA concepts like using persistent homology to analyze protein compressibility.
This document summarizes research on transfer defect learning to improve cross-project defect prediction. It presents Transfer Component Analysis (TCA) as a state-of-the-art transfer learning technique that maps data from source and target projects into a shared feature space to make their distributions more similar. It then proposes TCA+ which augments TCA with data normalization and decision rules to select the optimal normalization method based on characteristics of the source and target datasets. Experimental results on two cross-project defect prediction datasets show that TCA+ significantly outperforms traditional cross-project prediction and basic TCA.
Tacheometry is a surveying method that uses angular measurements from a tacheometer to determine horizontal and vertical distances. It is well-suited for hilly areas where chaining distances is difficult. The document provides procedures to determine the multiplying and additive constants of a tacheometer through stadia tacheometry. This involves setting up the instrument and measuring staff intercepts at known distances to solve equations and calculate the constants. The constants are then used in tacheometric formulas to determine horizontal distances, vertical distances, and elevations for different sighting configurations of the staff.
This document discusses 2D transformations in computer graphics including translation, rotation, scaling, and combining transformations using homogeneous coordinates and transformation matrices. It provides examples of translating, rotating, and scaling polygons and explains that the order of transformations matters as matrix multiplication does not commute, so the final result depends on the order the transformations are applied.
This document discusses geospatial digital twins. It begins by introducing the vision of digital earth and digital twins. It then discusses how digital twin technology can disrupt and improve geospatial business processes like data acquisition, storage, processing, and presentation. Examples of digital twins for healthcare and aircraft simulations are provided. The document also discusses VirtualSingapore, a 3D digital twin of Singapore used for urban planning, disaster management and tourism. It explores how technologies like crowdsourced data, augmented reality, and 3D geospatial analytics can enhance geospatial digital twins. In the end, the document envisions how digital twins could allow users to interactively explore and zoom in on high resolution geospatial data from space down to individual objects.
High Dimensional Data Visualization using t-SNE, by Kai-Wen Zhao
A review of the t-SNE algorithm, which helps visualize high-dimensional data lying on a manifold by projecting it onto a 2D or 3D space while approximately preserving the metric.
This document discusses point pattern analysis, which involves finding and explaining patterns in maps of point locations. It introduces key concepts like point patterns, windows, kernel density estimation, and nearest neighbor analysis. Kernel density estimation creates a smooth surface showing the density of points across an area. Nearest neighbor analysis examines the cumulative distribution of distances to each point's nearest neighbor, and can identify clustered, uniform, or random patterns. Significance is tested using simulations.
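A small Python sketch of the two techniques mentioned above, kernel density estimation and nearest-neighbor analysis, using numpy and scipy on a toy point pattern (illustrative only; not data from the document):

import numpy as np
from scipy.stats import gaussian_kde

# Toy point pattern: two clusters of event locations in an 8 x 8 window.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal([2, 2], 0.3, size=(50, 2)),
                 rng.normal([6, 5], 0.5, size=(80, 2))])

# Kernel density estimation: a smooth intensity surface over the window.
kde = gaussian_kde(pts.T)
gx, gy = np.mgrid[0:8:80j, 0:8:80j]
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)  # high values = "hot spots"

# Nearest-neighbor analysis: compare the mean nearest-neighbor distance with the
# expectation under complete spatial randomness, 1 / (2 * sqrt(intensity)).
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)
nn = d.min(axis=1)
area = 8.0 * 8.0
print(nn.mean(), 1.0 / (2.0 * np.sqrt(len(pts) / area)))  # mean << expectation suggests clustering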
This document discusses exploratory spatial data analysis (ESDA) and the software GeoDa. It defines ESDA as the visualization and exploration of data that takes geographic location into account to identify patterns like clusters and outliers. The document explains that GeoDa is a free, open-source software for ESDA developed by Dr. Luc Anselin. It can be used to create choropleth maps, histograms, scatter plots, and identify clusters through tools like box maps and multivariate LISA. Examples using health and demographic data from Rwanda are provided to demonstrate GeoDa's univariate and multivariate analysis features.
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting, by Taegyun Jeon
PR-050: Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
Original slides from http://home.cse.ust.hk/~xshiab/data/valse-20160323.pptx
YouTube: https://youtu.be/3cFfCM4CXws
t-SNE is a modern visualization algorithm that presents high-dimensional data in 2 or 3 dimensions according to some desired distances. If you have some data and you can measure their pairwise differences, t-SNE visualization can help you identify various clusters.
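A minimal scikit-learn sketch of the idea, assuming scikit-learn is installed; the digits dataset and the parameter values are illustrative choices, not settings taken from the presentations above:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Embed the 64-dimensional digits data into 2-D for visualization.
X, y = load_digits(return_X_y=True)
emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)
print(emb.shape)  # (1797, 2); plotting emb coloured by y reveals the digit clusters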
Slides for a talk about Graph Neural Network architectures; the overview is taken from the very good survey paper by Zonghan Wu et al. (https://arxiv.org/pdf/1901.00596.pdf).
During the past decade, the size of 3D seismic data volumes and the number of seismic attributes have increased to the extent that it is difficult, if not impossible, for interpreters to examine every seismic line and time slice. To address this problem, several seismic facies classification algorithms including k-means, self-organizing maps, generative topographic mapping, support vector machines, Gaussian mixture models, and artificial neural networks have been successfully used to extract features of geologic interest from multiple volumes. Although well documented in the literature, the terminology and complexity of these algorithms may bewilder the average seismic interpreter, and few papers have applied these competing methods to the same data volume. We have reviewed six commonly used algorithms and applied them to a single 3D seismic data volume acquired over the Canterbury Basin, offshore New Zealand, where one of the main objectives was to differentiate the architectural elements of a turbidite system. Not surprisingly, the most important parameter in this analysis was the choice of the correct input attributes, which in turn depended on careful pattern recognition by the interpreter. We found that supervised learning methods provided accurate estimates of the desired seismic facies, whereas unsupervised learning methods also highlighted features that might otherwise be overlooked.
Machine Learning for Scientific Applications, by David Lary
1) The document discusses using machine learning techniques like neural networks to help calibrate and reduce biases in different satellite datasets measuring atmospheric variables like inorganic chlorine (Cly).
2) Measurements of Cly are limited and models show a wide range, so machine learning could help constrain estimates by determining relationships between datasets.
3) Inter-comparisons of satellite instruments show biases that machine learning may be able to correct, like using neural networks to recalibrate one dataset based on others. This could provide longer, more consistent time series inputs for models.
Space-Time in the Matrix and Uses of Allen Temporal Operators for Stratigraph..., by Keith May
This document discusses using Allen temporal operators to model stratigraphic relationships in archaeological analysis. It summarizes the key temporal relationships identified by Allen that are useful for modeling stratigraphy, including before, meets, overlaps, during, starts and finishes. The document also discusses issues with inconsistent standards for digitally archiving stratigraphic data and relationships, and the need for standards to make this fundamental archaeological data more reusable. Finally, it calls for international conventions on stratigraphic recording and analysis to facilitate understanding and communication across disciplines.
A new quantile-based fuzzy time series forecasting model, by Cemal Ardil
The document presents a new quantile based fuzzy time series forecasting model. It begins by reviewing existing fuzzy time series forecasting methods and their applications. It then proposes a new method that bases forecasts on predicting future trends in the data using third order fuzzy relationships. The method converts statistical quantiles into fuzzy quantiles using membership functions. It uses a fuzzy metric and trend forecast to calculate future values. The method is applied to TAIFEX index forecasting. Results show the proposed method performs comparably better than other fuzzy time series methods in terms of complexity and forecasting accuracy.
Graph Machine Learning - Past, Present, and Future, by kashipong
Graph machine learning, despite its many commonalities with graph signal processing, has developed as a relatively independent field.
This presentation will trace the historical progression from graph data mining in the 1990s, through graph kernel methods in the 2000s, to graph neural networks in the 2010s, highlighting the key ideas and advancements of each era. Additionally, recent significant developments, such as the integration with causal inference, will be discussed.
A box plot (also called a box and whisker plot) is a graph that presents information from a five-number summary. It does not show a distribution in as much detail as a stem and leaf plot or histogram does, but is especially useful for indicating whether a distribution is skewed and whether there are potential unusual observations (outliers) in the data set. Box and whisker plots are very useful when working with small sets of data.
A data plot is a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables. The plot can be drawn by hand or by a mechanical or electronic plotter.
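A minimal Python sketch of the five-number summary behind a box plot, with matplotlib drawing the plot; the data values are toy numbers, not from the document:

import numpy as np
import matplotlib.pyplot as plt

data = np.array([7, 15, 36, 39, 40, 41, 42, 43, 47, 49], dtype=float)

# Five-number summary that the box and whiskers represent.
summary = {
    "min": data.min(),
    "q1": np.percentile(data, 25),
    "median": np.median(data),
    "q3": np.percentile(data, 75),
    "max": data.max(),
}
print(summary)

plt.boxplot(data, vert=False)       # whiskers and flier points flag potential outliers
plt.savefig("boxplot.png")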
Application of panel data to the effect of five (5) world development indicat..., by Alexander Decker
This document discusses the application of panel data analysis to examine the effect of 5 world development indicators (WDI) on GDP per capita for 20 African Union countries from 1981 to 2011. It presents the panel data model, describes the methodology used as fixed effects regression, and provides sample output of the panel data format and regression results. The key world development indicators examined are official exchange rate, broad money, inflation rate, total natural resources rents, and foreign direct investment.
Application of panel data to the effect of five (5) world development indicat..., by Alexander Decker
This document discusses applying a panel data model to analyze the effect of 5 world development indicators (WDI) on GDP per capita for 20 African Union countries from 1981 to 2011. It introduces panel data modeling and the fixed effects model specifically. The fixed effects model is estimated using least squares dummy variable regression to account for country-specific effects. The results of analyzing the relationship between GDP per capita and the 5 WDI (exchange rate, money supply, inflation, natural resources, foreign investment) using this fixed effects panel data model are then presented.
The document discusses representing relational spatiotemporal data using information granules. It proposes:
1) Describing the relational data using a vocabulary of granular descriptors formed from Cartesian products of spatial, temporal, and signal information granules. This granular representation provides an interpretable perspective on the data.
2) Analyzing the capabilities of different vocabularies to capture the essence of the data through the processes of granulation and degranulation, where the original data is reconstructed from its granular representation. The quality of reconstruction is used to optimize the vocabulary.
3) Extending the approach to analyze evolvability of the granular description as the relational data changes across consecutive
Robust Block-Matching Motion Estimation of Flotation Froth Using Mutual Infor..., by CSCJournals
This document presents a new method for estimating motion in flotation froth images using mutual information as the similarity metric. It proposes using mutual information with a bin size of two (MI2) for block matching motion estimation of froth images. The paper finds that MI2 improves motion estimation accuracy in terms of peak signal-to-noise ratio compared to the commonly used mean absolute difference (MAD) metric, while having a similar computational cost. It tests the MI2 method on two froth video sequences using three-step search and new three-step search, and finds MI2 yields slightly better reconstructed image quality than MAD according to PSNR measurements.
This document presents a new multivariate fuzzy time series forecasting method to predict car road accidents. The method uses four secondary factors (number killed, mortally wounded, died 30 days after accident, severely wounded, and lightly casualties) along with the main factor of total annual car accidents in Belgium from 1974 to 2004. The new method establishes fuzzy logical relationships between the factors to generate forecasts. Experimental results show the proposed method performs better than existing fuzzy time series forecasting approaches at predicting car accidents. Actuaries can use this kind of multivariate fuzzy time series analysis to help define insurance premiums and underwriting.
The document provides an introduction to geostatistics and variogram analysis. It defines key concepts in geostatistics such as variograms, covariance, correlation, and semivariance. It discusses how these statistics are used to characterize the spatial correlation and continuity of natural phenomena. The document also presents an example analysis of porosity data from an oil field, including exploring the data distribution, computing sample variograms, and fitting theoretical variogram models for use in kriging and stochastic simulation.
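A minimal numpy sketch of an empirical semivariogram of the kind described above; the lag bins, tolerance, and synthetic data are illustrative assumptions, not the oil-field example from the document:

import numpy as np

def empirical_semivariogram(coords, values, lags, tol):
    """Average 0.5 * (z_i - z_j)^2 over point pairs whose separation falls in each lag bin."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (z[:, None] - z[None, :]) ** 2
    upper = np.triu(np.ones_like(d, dtype=bool), k=1)      # use each pair once
    gamma = []
    for h in lags:
        mask = (d > h - tol) & (d <= h + tol) & upper
        gamma.append(sq[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)

rng = np.random.default_rng(1)
pts = rng.uniform(0, 100, size=(200, 2))
vals = np.sin(pts[:, 0] / 20.0) + 0.1 * rng.standard_normal(200)
print(empirical_semivariogram(pts, vals, lags=np.arange(5, 50, 5), tol=2.5))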
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
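A toy Python sketch of the rejection scheme described in points 1-3: draw a parameter from the prior, simulate data, and keep the draw only if a summary statistic falls within the tolerance of the observed one. The model, prior, summary, and tolerance here are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, scale=1.0, size=100)   # stand-in for the observed data
obs_summary = observed.mean()                          # a single summary statistic

# ABC rejection sampling.
tolerance = 0.1
accepted = []
for _ in range(20000):
    theta = rng.uniform(-5.0, 5.0)                     # prior on the unknown mean
    sim = rng.normal(loc=theta, scale=1.0, size=100)   # simulate data given theta
    if abs(sim.mean() - obs_summary) < tolerance:      # keep only "close" simulations
        accepted.append(theta)
print(len(accepted), np.mean(accepted))                # approximate posterior sample and its mean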
This document discusses spatial analysis and analysis tools. It begins by defining spatial analysis as techniques for analyzing spatial data where the results depend on object locations. It then describes 7 types of spatial analysis: spatial data analysis, spatial autocorrelation, spatial interpolation, spatial regression, spatial interaction, simulation and modelling, and multiple-point geostatistics. The document also discusses various analysis toolsets including map algebra, math tools, multi-variate tools, neighborhood tools, raster tools, reclassification tools, and solar radiation tools. It emphasizes that spatial analysis is useless without spatial infographics and visualization.
- The document discusses various methods for collecting structural data sets including field data, remote sensing, digital elevation models, seismic data, experimental modeling, and numerical modeling.
- It emphasizes the importance of field observations but notes they are limited by human biases. Remote sensing helps overcome these limitations but should be grounded in field data.
- The document provides examples of specific techniques within each method and discusses advantages and disadvantages. It stresses the complexity of nature challenges even powerful modeling and the need for well-organized data analysis.
DIGITAL TOPOLOGY OPERATING IN MEDICAL IMAGING WITH MRI TECHNOLOGY.pptx, by mathematicssac
Digital topology and geometry refers to using topological and geometric properties of digital images. This document discusses digital topology and geometry and their roles in medical imaging applications. It provides background on topology, digital images, and neighborhoods. Medical imaging techniques like MRI use magnetic fields to image body tissues. Digital topology and geometry are useful for many medical imaging applications and clinical/research studies by examining abnormalities, tumors, injuries, and diseases. This project aims to disseminate mathematical methods in medical imaging.
Universal Approximation Property via Quantum Feature Maps
----
The quantum Hilbert space can be used as a quantum-enhanced feature space in machine learning (ML) via the quantum feature map to encode classical data into quantum states. We prove the ability to approximate any continuous function with optimal approximation rate via quantum ML models in typical quantum feature maps.
---
Contributed talk at Quantum Techniques in Machine Learning 2021, Tokyo, November 8-12 2021.
By Quoc Hoan Tran, Takahiro Goto and Kohei Nakajima
018 20160902 Machine Learning Framework for Analysis of Transport through Com..., by Ha Phuong
This document proposes a machine learning framework to analyze fluid flow through porous media. It involves:
1) Discrete element modeling of granular materials to generate pore structure data.
2) Finite element modeling of fluid flow simulations to calculate permeability.
3) Construction of pore and contact networks from structure data.
4) Calculation of network features like centrality measures related to permeability.
5) Feature selection and machine learning models to predict permeability from network features.
017_20160826 Thermodynamics Of Stochastic Turing Machines, by Ha Phuong
Shows how to construct stochastic models that mimic the behavior of a general-purpose computer (a Turing machine): discrete-state systems obeying a Markovian master equation, which are logically reversible and have a well-defined and consistent thermodynamic interpretation.
016_20160722 Molecular Circuits For Dynamic Noise Filtering, by Ha Phuong
The document discusses a molecular analog of the Kalman filter that was developed to dynamically filter noise in molecular systems. Specifically:
- Researchers developed a Poisson noise filter using biochemical reactions as a molecular version of the Kalman filter to improve the reliability of synthetic biological circuits.
- The filter has the potential to play an important role in developing new medical therapies by making biological circuits more robust to noise and changing conditions similarly to electrical circuits.
- The Poisson filter cancels out the effects of molecular environment noise through optimal signal estimation based on a kinetic model driven by external inputs.
015_20160422 Controlling Synchronous Patterns In Complex Networks, by Ha Phuong
This document summarizes a research paper that presents a framework for controlling synchronization patterns in complex networks of coupled chaotic oscillators. The framework groups network nodes into clusters and uses pinning coupling to control the large network by stabilizing unstable synchronous patterns associated with network symmetries. The approach is mathematically analyzed and demonstrated on a six-node network of coupled Lorenz oscillators, showing it can control the system's synchronous mode.
This document discusses using persistent homology to analyze the topological structure of proteins and relate it to protein compressibility. It summarizes that researchers modeled protein molecules as alpha filtrations to obtain multi-scale insight into their tunnel and cavity structures. The persistence diagrams of the alpha filtrations capture the sizes and robustness of these features in a compact way. The researchers found a clear linear correlation between their topological measure and experimentally determined protein compressibility values.
This document discusses using topological data analysis to analyze the spread of contagions on networks. It introduces contagion maps, which embed network nodes as point clouds based on contagion transmission times. The topology, geometry, and dimensionality of these point clouds are then analyzed and compared to the underlying network manifold. This reveals insights into whether contagion dynamics follow the network's geometric structure. Numerical experiments on simulated networks demonstrate that contagion maps can recover the underlying manifold when wavefront propagation dominates.
The variational Gaussian process (VGP) is a Bayesian nonparametric model which adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity.
009_20150201 Structural Inference for Uncertain Networks, by Ha Phuong
This paper develops methods for analyzing networks where the connections between nodes are only known probabilistically rather than exactly. It presents a maximum likelihood method for inferring community structure in such uncertain networks by fitting a generative model using EM and belief propagation algorithms. Evaluations on synthetic and real-world networks demonstrate the ability to accurately detect communities and recover the underlying network structure from uncertain edge probability data.
The document summarizes sampling methods from Chapter 11 of Bishop's PRML book. It introduces basic sampling algorithms like rejection sampling, importance sampling, and SIR. It then discusses Markov chain Monte Carlo (MCMC) methods which allow sampling from complex distributions using a Markov chain. Specific MCMC methods covered include the Metropolis algorithm, Gibbs sampling, and estimating the partition function using the IP algorithm.
The document summarizes key concepts from Chapter 10 of Bishop's PRML book on approximate inference using variational methods. It introduces variational inference as a deterministic alternative to importance sampling for approximating intractable distributions. Variational inference frames inference as an optimization problem of variationally approximating the true posterior using a simpler distribution from an assumed family. This is done by maximizing a lower bound on the marginal likelihood. Mean-field variational inference further assumes a factorized form for the variational distribution.
008 20151221 Return of Frustratingly Easy Domain Adaptation, by Ha Phuong
The document proposes a simple and effective method called CORrelation ALignment (CORAL) for unsupervised domain adaptation. CORAL minimizes domain shift by aligning the second-order statistics of the source and target distributions without requiring any target labels. The method whitens the source distribution and recolors it with the target covariance matrix. Experiments on object recognition and sentiment analysis tasks show CORAL outperforms other unsupervised domain adaptation methods.
007 20151214 Deep Unsupervised Learning using Nonequilibrium Thermodynamics, by Ha Phuong
The document discusses a new approach to unsupervised deep learning using concepts from nonequilibrium thermodynamics. Specifically, it proposes destroying structure in data through an iterative forward diffusion process, then learning the reverse diffusion process to restore structure and act as a generative model. This approach is shown to outperform other generative models on image datasets like CIFAR-10 and is able to perform tasks like inpainting. The diffusion process is modeled using Gaussian distributions and the reverse process is learned using a deep network as an approximator.
006 20151207 DRAW - Deep Recurrent Attentive Writer, by Ha Phuong
The document summarizes the DRAW paper which introduces an attention mechanism to a generative model called DRAW. It augments encoders and decoders with recurrent neural networks. The attention mechanism allows the model to focus on subsets of input data for reading and writing during generation. This allows DRAW to generate MNIST digits sequentially while focusing attention on different parts at each time step, producing higher quality images than without attention. It can also classify cluttered MNIST images by focusing attention on the digit.
The document discusses four papers on adversarial networks:
- The 2013 paper "Intriguing Properties of Neural Networks" introduced the concept of adversarial examples and showed neural networks are susceptible to small perturbations.
- The 2015 paper "Explaining and Harnessing Adversarial Examples" proposed that adversarial examples exist due to the linear behavior of neural networks in high-dimensional spaces.
- The 2015 paper "Deep Neural Networks are Easily Fooled" evolved images to fool neural networks into classifying them with high confidence despite being unrecognizable to humans.
- The 2015 paper "Generative Adversarial Networks" introduced a framework that uses two neural networks, a generator and discriminator, competing against each other to generate new
This document summarizes a research paper that proposes using diffusion processes inspired by non-equilibrium statistical physics for unsupervised deep learning. The key idea is to systematically destroy structure in data distributions through iterative forward diffusion, then learn the reverse process to restore structure and generate new data. The authors show this approach yields a flexible generative model. It is trained by maximizing the likelihood of data under the diffusion/reverse diffusion process using techniques from statistical physics like annealed importance sampling.
This document discusses recent research on making neural networks faster through techniques like tensor decomposition, binary/ternary connect, and separable filters. One 2015 paper proposes binary and ternary connect methods that use only 1 or 2 bits per connection instead of full precision values. This allows neural networks to perform computations much faster while maintaining good performance levels. A 2013 paper introduces learning separable filters to decompose computations in neural network layers.
This paper examines how interconnected networks can fail catastrophically or remain stable. The author finds that if interconnections are provided by network hubs and the connections between networks are moderately convergent, the system of networks is stable and robust to failure. Two experiments on functional brain networks show that the brain's network topology maximizes stability, as the theory predicts interconnected networks will be stable if the connections are through hubs and moderately convergent.
This document discusses how PageRank fails as a ranking algorithm in growing networks where nodes enter the network over time. Through numerical simulations of models of growing networks, the study finds that PageRank is biased based on how long nodes have been in the network, rather than their true importance, measured by fitness. The indegree of nodes provides a less biased ranking than PageRank in these growing networks. The study also analyzes empirical data and finds PageRank does not correlate as strongly with total relevance, a measure of importance, as indegree does.
Deep Learning And Business Models (VNITC 2015-09-13), by Ha Phuong
Deep Learning and Business Models
Tran Quoc Hoan discusses deep learning and its applications, as well as potential business models. Deep learning has led to significant improvements in areas like image and speech recognition compared to traditional machine learning. Some business models highlighted include developing deep learning frameworks, building hardware optimized for deep learning, using deep learning for IoT applications, and providing deep learning APIs and services. Deep learning shows promise across many sectors but also faces challenges in fully realizing its potential.
The Man Who Dared to Challenge Newton: The True Story of Thane Heins, the Canadian Genius Who Changed the World
By Johnny Poppi – for international press
In a small town in Ontario, among wheat fields and wind-filled silences, a man has worked for decades in anonymity, armed only with naive curiosity, motors, copper wires, and questions too big to ignore. His name is Thane C. Heins, and according to some scientists who have seen him in action, he may have made—and indeed has made—the most important scientific discovery in the history of humanity. A discovery which will eventually eliminate the need for oil, coal, and uranium, and at the very least their harmful effects, while eliminating the need to recharge electric vehicles, and even rewrite—as it has already begun—the very laws of physics as we’ve known them since Aristotle in 300 BC. Sound like science fiction? Then listen to this story.
Investigating the central role that theories of the visual arts and creativity played in the development of fascism in France, Mark Antliff examines the aesthetic dimension of fascist myth-making within the history of the avant-garde. Between 1909 and 1939, a surprising array of modernists were implicated in this project, including such well-known figures as the symbolist painter Maurice Denis, the architects Le Corbusier and Auguste Perret, the sculptors Charles Despiau and Aristide Maillol, the “New Vision” photographer Germaine Krull, and the fauve Maurice Vlaminck.
Applications of Radioisotopes in Cancer Research.pptx, by MahitaLaveti
This presentation explores the diverse and impactful applications of radioisotopes in cancer research, spanning from early detection to therapeutic interventions. It covers the principles of radiotracer development, radiolabeling techniques, and the use of isotopes such as technetium-99m, fluorine-18, iodine-131, and lutetium-177 in molecular imaging and radionuclide therapy. Key imaging modalities like SPECT and PET are discussed in the context of tumor detection, staging, treatment monitoring, and evaluation of tumor biology. The talk also highlights cutting-edge advancements in theranostics, the use of radiolabeled antibodies, and biodistribution studies in preclinical cancer models. Ethical and safety considerations in handling radioisotopes and their translational significance in personalized oncology are also addressed. This presentation aims to showcase how radioisotopes serve as indispensable tools in advancing cancer diagnosis, research, and targeted treatment.
Hemorrhagic Fever from Venezuala Medical Virology.pptx, by wamunsmith
Please find attached the PowerPoint for the medical virology; that will be enough for you to see it.
CCS2019 - Topological time-series analysis with delay-variant embedding
1. Topological Time-Series Data Analysis with Delay-Variant Embedding
Tran Quoc Hoan (Ph.D. Candidate)
Graduate School of Information Science and Technology, The University of Tokyo
[email protected]
CCS2019@NTU
2. Motivation
E.g., variant scales of biology.
◼ Reveal the black box (vs. deep-learning machines) to understand the nature of complex data.
◼ Reveal the variant scales in complex data.
(Figure: diagram of the hierarchical organization of biology at different scales and examples of data that can be collected at these different levels. Image source: https://researcher.watson.ibm.com/researcher/view_group.php?id=5372)
3. Variant Topological Features
◼ Our research focuses on the shape of data to provide insights into dynamics and variant scales via new, general features that are robust under perturbations applied to the data.
◼ The “shape” of data → the appearance of holes in high-dimensional space.
◼ Dynamics & variant scales → oscillations, variant timescales.
◼ Perturbation → noise added to the data.
We focus on time-series data in this talk.
4. Persistent Homology
◼ An algebraic method to encode the topological structures of data (i.e., holes) into quantitative features.
◼ Input: a finite set of points, networks, etc.
We need to:
➢ Mathematically define the “hole”
➢ Quantitatively calculate the “hole”
6. What is a hole?
◼ 0-dimensional holes: connected components.
◼ 1-dimensional holes (rings, loops, tunnels): a 1-dimensional graph (in 𝐾) without boundary that is not itself the boundary of any 2-dimensional graph in 𝐾.
◼ 2-dimensional holes (cavities, voids): a 2-dimensional graph (in 𝐾) without boundary that is not itself the boundary of any 3-dimensional graph in 𝐾; an empty shell is a hole, a solid ball is not.
7. Represent the “hole”
◼ Idea: connect nearby points, fill in complete geometrical shapes.
1. Choose a distance 𝜀.
2. Connect pairs of points that are no further apart than 𝜀.
3. Fill in complete geometrical shapes (triangles, tetrahedra, etc.).
4. Homology detects the holes.
Problem: how do we choose the distance 𝜀?
(Figure adapted from the slide “Introduction to Persistent Homology” by Dr. Matthew L. Wright.)
8. How to choose the distance 𝜀?
This 𝜀 looks good, but how do we distinguish this hole from the others?
Innovation idea: consider all distances 𝜀.
(Figure adapted from the slide “Introduction to Persistent Homology” by Dr. Matthew L. Wright.)
9. Barcodes: monitor the change of topological structures as 𝜀 grows (e.g., 𝜀 = 0, 1, 2, 3).
(Figure adapted from the slide “Introduction to Persistent Homology” by Dr. Matthew L. Wright.)
10. Topological features = persistence diagram
A persistence diagram is a two-dimensional representation of a barcode.
◼ A multiset of points with coordinates (b, d), where b is the birth scale and d is the death scale of a hole; each point is plotted as birth scale versus death scale.
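As an illustration of slides 7-10 (not part of the original deck), the following minimal Python sketch computes a Vietoris-Rips persistence diagram for a noisy circle with the ripser package, assuming ripser and numpy are installed; the longest-lived 1-dimensional point corresponds to the ring:

import numpy as np
from ripser import ripser

# Sample a noisy circle: one prominent 1-dimensional hole is expected.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
points = np.column_stack([np.cos(theta), np.sin(theta)])
points += 0.05 * rng.standard_normal(points.shape)

# Vietoris-Rips persistence up to dimension 1 (connected components and loops).
diagrams = ripser(points, maxdim=1)["dgms"]
h1 = diagrams[1]                        # (birth, death) pairs for 1-dimensional holes
lifetimes = h1[:, 1] - h1[:, 0]
print("Most persistent loop (birth, death):", h1[np.argmax(lifetimes)])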
12. Patterns from delay embedding
(Topological) patterns from delay embedding represent the behavior of attractors, which provide insights into the dynamical system.
Attractors of a dynamical system: fixed point, limit cycle, limit torus, strange attractor.
13. Problem in delay embedding
Determining the time delay is sensitive and problem-dependent.
◼ Well-known methods: mutual information, autocorrelation, etc.
◼ A real time series is noisy and has a finite length.
→ It is not well defined to evaluate the shape of the embedded points from a single embedding space.
14. (Proposed) Delay-variant embedding
◼ Consider the time delay 𝜏 as a variable parameter.
◼ Monitor the variation of topological structures in the embedded space of delay coordinates 𝑥(𝑡), 𝑥(𝑡 − 𝜏), 𝑥(𝑡 − 2𝜏), ….
◼ Construct the topological features for each 𝜏, then integrate these features with 𝜏 serving as an additional dimension → the three-dimensional persistence diagram (3PD).
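A minimal Python sketch of the delay-variant idea, assuming ripser and numpy are available; the embedding dimension, the grid of delays, and the subsampling are illustrative choices rather than the parameters used in the paper:

import numpy as np
from ripser import ripser

def delay_embed(x, dim, tau):
    """Delay embedding of a 1-D series x into R^dim with delay tau (in samples)."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

t = np.linspace(0.0, 16.0 * np.pi, 1200)
x = np.sin(t) + 0.05 * np.random.default_rng(0).standard_normal(t.size)

# For each delay tau, compute the 1-dimensional persistence diagram of the embedded
# point cloud and append tau as a third coordinate, giving a (birth, death, tau)
# point set in the spirit of the 3PD.
points_3pd = []
for tau in range(2, 60, 6):
    cloud = delay_embed(x, dim=3, tau=tau)[::3]          # subsample for speed
    h1 = ripser(cloud, maxdim=1)["dgms"][1]
    points_3pd.extend((b, d, tau) for b, d in h1 if np.isfinite(d))
points_3pd = np.array(points_3pd)
print(points_3pd.shape)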
15. Stability theorem
Theorem (stability): d_B,ξ^(3)( D_(l,m)^(3)(x), D_(l,m)^(3)(y) ) ≤ 2√m · max_{t∈𝕋} |x(t) − y(t)|
◼ If 𝑥(𝑡) is perturbed by noise to 𝑦(𝑡) = 𝑥(𝑡) + 𝜖(𝑡), then the upper bound on the distance between the diagrams is governed by the magnitude of 𝜖(𝑡).
◼ The 3PDs are robust with respect to time-series data being perturbed by noise.
◼ The 3PDs can be used as discriminating features for characterizing the time series.
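An illustrative numerical check in the spirit of the theorem, assuming ripser and persim are installed; it fixes a single delay and uses the ordinary bottleneck distance rather than the generalized diagram distance d_B,ξ^(3) defined in the paper, so it is only a sanity check, not a verification of the theorem:

import numpy as np
from ripser import ripser
from persim import bottleneck

def delay_embed(x, dim, tau):
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

t = np.linspace(0.0, 8.0 * np.pi, 600)
x = np.sin(t)
eps = 0.05
y = x + eps * np.random.default_rng(0).uniform(-1.0, 1.0, x.size)   # |x(t) - y(t)| <= eps

m, tau = 3, 25
dx = ripser(delay_embed(x, m, tau), maxdim=1)["dgms"][1]
dy = ripser(delay_embed(y, m, tau), maxdim=1)["dgms"][1]
print("bottleneck distance:", bottleneck(dx, dy))
print("noise bound 2*sqrt(m)*eps:", 2.0 * np.sqrt(m) * eps)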
16. Kernel method
The space of persistence diagrams Ω:
◼ is not a vector space;
◼ cannot be equipped with an inner product;
◼ is difficult to use in (linear) statistical-learning tasks (e.g., classification).
A feature mapping Φ sends diagrams 𝐸, 𝐹 ∈ Ω to elements Φ_𝐸, Φ_𝐹 of a Hilbert space 𝐻_𝑏, where:
◼ an inner product ⟨Φ_𝐸, Φ_𝐹⟩_{𝐻_𝑏} can be defined;
◼ diagrams can be used in (linear) statistical-learning tasks (e.g., SVM);
◼ diagrams can be used in unsupervised learning tasks (e.g., kernel PCA, kernel change-point detection).
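One standard way to realize such a feature map is the persistence scale-space kernel of Reininghaus et al. (2015); the numpy sketch below illustrates a kernel on diagrams, not necessarily the kernel used in this work:

import numpy as np

def pssk(dgm1, dgm2, sigma=0.5):
    """Persistence scale-space kernel between two persistence diagrams,
    each given as an (n, 2) array of (birth, death) points."""
    p, q = np.asarray(dgm1, float), np.asarray(dgm2, float)
    qbar = q[:, ::-1]                                    # mirror of q across the diagonal
    d_pq = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)
    d_pqbar = ((p[:, None, :] - qbar[None, :, :]) ** 2).sum(-1)
    return (np.exp(-d_pq / (8 * sigma)) - np.exp(-d_pqbar / (8 * sigma))).sum() / (8 * np.pi * sigma)

# The Gram matrix over a list of diagrams can be fed to sklearn's
# SVC(kernel="precomputed") for classification or to kernel PCA.
dgms = [np.array([[0.1, 0.9], [0.2, 0.4]]), np.array([[0.1, 0.8]]), np.array([[0.3, 0.5]])]
K = np.array([[pssk(a, b) for b in dgms] for a in dgms])
print(np.round(K, 4))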
17. Scenarios for Applications
◼ Identify the dynamics of a biological model from observed noisy biological time series.
➢ E.g., stochastic oscillations in single-cell live-imaging time series; the stochastic model of the Hes1 genetic oscillator (N. A. Monk, Curr. Biol. 13, 2003). (Image source: https://slideplayer.com/slide/7612898/)
◼ Classify real time-series data.
➢ E.g., ECG data, sensor data.
Q. H. Tran and Y. Hasegawa, Topological time-series analysis with delay-variant embedding, Physical Review E 100, 032308 (2019).