Feature engineering: the underdog of machine learning. This deck surveys feature generation methods for text, images, and audio, along with feature cleaning and transformation methods, how well they work, and why.
The Transformer is an established architecture in natural language processing that combines a self-attention framework with a deep learning approach.
This presentation was delivered under the mentorship of Mr. Mukunthan Tharmakulasingam (University of Surrey, UK), as part of the ScholarX program from the Sustainable Education Foundation.
The document discusses gradient descent methods for unconstrained convex optimization problems. It introduces gradient descent as an iterative method to find the minimum of a differentiable function by taking steps proportional to the negative gradient. It describes the basic gradient descent update rule and discusses convergence conditions such as Lipschitz continuity, strong convexity, and condition number. It also covers techniques like exact line search, backtracking line search, coordinate descent, and steepest descent methods.
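The update rule and backtracking line search described above can be sketched in NumPy; this is a minimal illustration, with the quadratic objective and the constants alpha and beta chosen for the example rather than taken from the document:

```python
import numpy as np

def backtracking_gd(f, grad, x0, alpha=0.3, beta=0.8, tol=1e-8, max_iter=1000):
    """Gradient descent with backtracking line search.

    Start each iteration with step size t = 1 and shrink t by `beta`
    until the sufficient-decrease condition
    f(x - t*g) <= f(x) - alpha * t * ||g||^2 holds.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # gradient is tiny: (near-)stationary point
            break
        t = 1.0
        while f(x - t * g) > f(x) - alpha * t * (g @ g):
            t *= beta
        x = x - t * g
    return x

# Illustrative ill-conditioned quadratic: f(x) = x1^2 + 10 * x2^2
f = lambda x: x[0] ** 2 + 10 * x[1] ** 2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
x_star = backtracking_gd(f, grad, [5.0, 5.0])   # converges near the minimum (0, 0)
```

The condition number of this quadratic (10) is what slows plain gradient descent down; the line search only picks the step size, not the direction.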
Exploratory data analysis and data visualization:
Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to
Maximize insight into a data set.
Uncover underlying structure.
Extract important variables.
Detect outliers and anomalies.
Test underlying assumptions.
Develop parsimonious models.
Determine optimal factor settings.
Machine and Deep Learning Application.
Applying big data learning techniques to a malware classification problem.
Code:
https://gist.github.com/indraneeld/7ffb182fd8eb87d6d463dedc001efad0
Acknowledgments:
Canadian Institute for Cybersecurity (CIC) project in collaboration with Canadian Centre for Cyber Security (CCCS).
1. Sentiment analysis involves using natural language processing, statistics, or machine learning to identify and extract subjective information like opinions, attitudes, and emotions from text.
2. It can analyze sentiment at different levels of granularity, such as document, sentence, or entity level.
3. Sentiment analysis has many applications including understanding customer opinions, predicting election results, and improving marketing strategies.
4. Performing accurate sentiment analysis requires understanding the concept of an opinion as a quintuple that identifies the target, aspect, sentiment polarity, opinion holder, and time.
The document discusses feature engineering for machine learning. It defines feature engineering as the process of transforming raw data into features that better represent the data and improve machine learning performance. Some key techniques discussed include feature selection, construction, transformation, and extraction. Feature construction involves generating new features from existing ones, such as calculating apartment area from length and breadth. Feature extraction techniques discussed are principal component analysis, which transforms correlated features into linearly uncorrelated components capturing maximum variance. The document provides examples and steps for principal component analysis.
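The principal component analysis steps the document walks through (center the data, compute the covariance matrix, eigendecompose, project) can be sketched in NumPy; the two correlated toy features below are a hypothetical example:

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                 # 1. center each feature
    cov = np.cov(Xc, rowvar=False)          # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # 3. eigendecomposition (ascending)
    order = np.argsort(eigvals)[::-1]       # 4. sort components by variance
    components = eigvecs[:, order[:k]]
    return Xc @ components                  # 5. project onto top-k components

rng = np.random.default_rng(0)
# Two correlated features: the second is mostly a noisy copy of the first
x = rng.normal(size=(200, 1))
X = np.hstack([x, x + 0.1 * rng.normal(size=(200, 1))])
Z = pca(X, 1)   # a single component captures almost all of the variance
```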
NLP is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language. It is also called computational linguistics, as the field also concerns how computational methods can aid the understanding of human language.
The document introduces fuzzy set theory as an extension of classical set theory that allows for elements to have varying degrees of membership rather than binary membership. It discusses key concepts such as fuzzy sets, membership functions, fuzzy logic, fuzzy rules, and fuzzy inference. Fuzzy set theory provides a framework for modeling imprecise and uncertain concepts that are common in human reasoning.
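Graded membership can be sketched with a triangular membership function, one common choice; the "warm temperature" fuzzy set below is a made-up example:

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside [a, c], rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Fuzzy set "warm" over temperatures (Celsius): fades in at 15, peaks at 22,
# fades out by 30; each temperature belongs to "warm" to some degree in [0, 1]
membership = {t: triangular(t, 15, 22, 30) for t in (10, 18, 22, 26, 35)}
```

Unlike a classical set, 26 °C is neither fully "warm" nor fully "not warm"; it has a partial degree of membership.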
This document summarizes a machine learning workshop on feature selection. It discusses typical feature selection methods like single feature evaluation using metrics like mutual information and Gini indexing. It also covers subset selection techniques like sequential forward selection and sequential backward selection. Examples are provided showing how feature selection improves performance for logistic regression on large datasets with more features than samples. The document outlines the workshop agenda and provides details on when and why feature selection is important for machine learning models.
This document provides an overview of data science including what is big data and data science, applications of data science, and system infrastructure. It then discusses recommendation systems in more detail, describing them as systems that predict user preferences for items. A case study on recommendation systems follows, outlining collaborative filtering and content-based recommendation algorithms, and diving deeper into collaborative filtering approaches of user-based and item-based filtering. Challenges with collaborative filtering are also noted.
This presentation covers Decision Tree as a supervised machine learning technique, talking about Information Gain method and Gini Index method with their related Algorithms.
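The Gini index method mentioned above can be sketched with the standard impurity formula 1 - sum(p_i^2); the labels below are toy data:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(left, right):
    """Weighted Gini impurity of a candidate split; trees pick the lowest."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(gini(["yes", "yes", "no", "no"]))          # maximally mixed node: 0.5
print(gini_split(["yes", "yes"], ["no", "no"]))  # perfectly pure split: 0.0
```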
This slide deck gives a very basic introduction to the matplotlib library. As matplotlib is a widely used and well-known library for machine learning, the deck is helpful for teaching students with no coding background; by the end of the slides they can start producing plots on their own.
This document provides an overview of Markov Decision Processes (MDPs) and related concepts in decision theory and reinforcement learning. It defines MDPs and their components, describes algorithms for solving MDPs like value iteration and policy iteration, and discusses extensions to partially observable MDPs. It also briefly mentions dynamic Bayesian networks, the dopaminergic system, and its role in reinforcement learning and decision making.
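Value iteration, one of the MDP-solving algorithms mentioned, can be sketched on a tiny hypothetical MDP; the transition table below is invented for illustration:

```python
# P[s][a] is a list of (probability, next_state, reward) transitions
# for a two-state, two-action MDP.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 5.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
}
gamma = 0.9  # discount factor

# Repeatedly apply the Bellman optimality backup until values converge
V = {s: 0.0 for s in P}
for _ in range(500):
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )
        for s in P
    }

# Greedy policy with respect to the converged values
policy = {
    s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
    for s in P
}
```

Here the optimal policy cycles between the states to keep collecting the large stochastic reward, rather than settling for the small certain one.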
Linear regression with gradient descent - Suraj Parmar
Intro to the very popular optimization technique (gradient descent) with linear regression. Linear regression with gradient descent on www.landofai.com
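A minimal sketch of linear regression fitted by gradient descent on mean squared error; the data and learning rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 2.0 + 0.01 * rng.normal(size=100)   # noisy line y = 3x + 2

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    err = w * x + b - y               # prediction error
    w -= lr * 2 * np.mean(err * x)    # gradient of MSE with respect to w
    b -= lr * 2 * np.mean(err)        # gradient of MSE with respect to b
# w and b converge close to the true slope 3 and intercept 2
```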
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique that projects data onto a lower dimensional space to maximize separation between classes. It works by computing eigenvectors from within-class and between-class scatter matrices to generate a linear transformation of the data. The transformation projects the high-dimensional data onto a new subspace while preserving the separation between classes.
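For the two-class case this reduces to Fisher's classic formulation, where the projection direction is w = Sw^-1 (m1 - m0) with Sw the within-class scatter matrix; the Gaussian toy data below is illustrative:

```python
import numpy as np

def lda_direction(X0, X1):
    """Two-class Fisher LDA: w = Sw^-1 (m1 - m0), where Sw is the
    within-class scatter matrix accumulated over both classes."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)      # unit-length projection direction

rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 0.5, size=(100, 2))   # class 0
X1 = rng.normal([3.0, 0.0], 0.5, size=(100, 2))   # class 1
w = lda_direction(X0, X1)
# Projecting onto w keeps the two classes well separated in one dimension
```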
Generative adversarial networks (GANs) use two neural networks, a generator and discriminator, that compete against each other. The generator aims to produce realistic samples to fool the discriminator, while the discriminator tries to distinguish real samples from generated ones. This adversarial training can produce high-quality, sharp samples but is challenging to train as the generator and discriminator must be carefully balanced.
Fuzzy relations, fuzzy graphs, and the extension principle are three important concepts in fuzzy logic. Fuzzy relations generalize classical relations to allow partial membership and describe relationships between objects to varying degrees. Fuzzy graphs describe functional mappings between input and output linguistic variables. The extension principle provides a procedure to extend functions defined on crisp domains to fuzzy domains by mapping fuzzy sets through functions. These concepts form the foundation of fuzzy rules and fuzzy arithmetic.
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Ishiguro, Preferred Networks
This presentation explains basic ideas of graph neural networks (GNNs) and their common applications. Primary target audiences are students, engineers and researchers who are new to GNNs but interested in using GNNs for their projects. This is a modified version of the course material for a special lecture on Data Science at Nara Institute of Science and Technology (NAIST), given by Preferred Networks researcher Katsuhiko Ishiguro, PhD.
I have implemented various optimizers (gradient descent, momentum, Adam, etc.) based on gradient descent using only NumPy, without a deep learning framework like TensorFlow.
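Two such NumPy-only optimizers can be sketched following the standard momentum and Adam update equations; the hyperparameter values are conventional defaults, not taken from the repository:

```python
import numpy as np

def momentum_step(grad, x, v, lr=0.01, beta=0.9):
    """One SGD-with-momentum step: v <- beta*v + grad; x <- x - lr*v."""
    v = beta * v + grad
    return x - lr * v, v

def adam_step(grad, x, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step with bias-corrected first/second moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    return x - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(x) = x^2 (gradient 2x) with Adam, starting from x = 2
x, m, v = np.array([2.0]), 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(2 * x, x, m, v, t)
```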
Explainable AI makes algorithms transparent: their behavior can be interpreted, visualized, and explained, supporting fair, secure, and trustworthy AI applications.
Data science is having a growing effect on our lives, from the content we see on social media feeds to the decisions businesses are making. Along with successes, data science has inspired much hype about what it is and what it can do. So I plan to try and demystify data science and have a discussion about what it really is. What does a day-in-the-life look like? What tools and skills are needed? How is data science successfully applied in the real world? In this talk, I’ll be providing insight into these questions and also speculate the future of data science and its place in business and technology.
Presented at OpenWest 2018
Feature Engineering - Getting the most out of data for predictive models - Gabriel Moreira
How should data be preprocessed for use in machine learning algorithms? How can the most predictive attributes of a dataset be identified? What features can be generated to improve the accuracy of a model?
Feature Engineering is the process of extracting and selecting, from raw data, features that can be used effectively in predictive models. As the quality of the features greatly influences the quality of the results, knowing the main techniques and pitfalls will help you to succeed in the use of machine learning in your projects.
In this talk, we will present methods and techniques that allow us to extract the maximum potential from the features of a dataset, increasing the flexibility, simplicity, and accuracy of models. Topics include the analysis of feature distributions and correlations, and the transformation of numeric attributes (scaling, normalization, log-based transformation, binning), categorical attributes (one-hot encoding, feature hashing), temporal attributes (date/time), and free-text attributes (text vectorization, topic modeling).
Python, scikit-learn, and Spark SQL examples will be presented, along with how to use domain knowledge and intuition to select and generate features relevant to predictive models.
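A few of the transformations listed above (one-hot encoding, binning, a log transform) can be sketched in plain Python; the apartment records and bin edges are hypothetical:

```python
import math

# Hypothetical raw records for an apartment-price model
records = [
    {"city": "A", "area": 55.0, "price": 120000},
    {"city": "B", "area": 80.0, "price": 95000},
    {"city": "A", "area": 120.0, "price": 310000},
]

cities = sorted({r["city"] for r in records})   # category vocabulary
bins = [0, 60, 100, float("inf")]               # bin edges for "area"

def featurize(r):
    one_hot = [1.0 if r["city"] == c else 0.0 for c in cities]   # one-hot encoding
    area_bin = sum(r["area"] >= b for b in bins) - 1             # binning
    log_price = math.log(r["price"])                             # log transform
    return one_hot + [float(area_bin), log_price]

X = [featurize(r) for r in records]
```

In practice a library transformer (e.g. from scikit-learn) would be used, but the underlying mapping is the same.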
Kaggle is a community of almost 400K data scientists who have built almost 2 million machine learning models while participating in our competitions. Data scientists come to Kaggle to learn, collaborate, and develop the state of the art in machine learning. This talk will cover some of the lessons we have learned from the Kaggle community.
Large-Scale Training with GPUs at Facebook - Faisal Siddiqi
This document discusses large-scale distributed training with GPUs at Facebook using their Caffe2 framework. It describes how Facebook was able to train the ResNet-50 model on the ImageNet dataset in just 1 hour using 32 machines with 8 GPUs each. It explains how synchronous SGD was implemented in Caffe2 using Gloo for efficient all-reduce operations. Linear scaling of the learning rate with increased batch size was found to work best when gradually warming up the learning rate over the first few epochs. Nearly linear speedup was achieved using this approach on commodity hardware.
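The linear-scaling rule with gradual warmup can be sketched as follows; the function and parameter names are illustrative, not taken from Caffe2:

```python
def scaled_lr(base_lr, base_batch, batch, epoch, warmup_epochs=5):
    """Linear-scaling rule with gradual warmup: the target learning rate is
    base_lr * (batch / base_batch), reached by ramping up linearly from
    base_lr over the first warmup_epochs epochs."""
    target = base_lr * batch / base_batch
    if epoch < warmup_epochs:
        return base_lr + (target - base_lr) * epoch / warmup_epochs
    return target

# An 8x larger batch gets an 8x learning rate, reached after 5 warmup epochs
lrs = [scaled_lr(0.1, 256, 2048, e) for e in range(7)]
```

The warmup avoids the instability of applying the full scaled learning rate while the network weights are still near their random initialization.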
Parameter Server Approach for Online Learning at Twitter - Zhiyong (Joe) Xie
Parameter Server approaches for online learning at Twitter allow models to be updated continuously based on new data and improve predictions in real-time. Version 1.0 decouples training and prediction to increase efficiency. Version 2.0 scales training by distributing it across servers. Version 3.0 will scale large complex models by sharding models and features across multiple servers. These approaches enable Twitter to perform online learning on massive datasets and complex models in real-time.
2017-10-10 (Netflix ML Platform Meetup): Learning Item and User Representations - Ed Chi
1) Learning user and item representations is challenging due to sparse data and shifting preferences in recommender systems.
2) The presentation outlines research at Google to address sparsity through two approaches: focused learning, which develops specialized models for subsets of data like genres or cold-start items, and factorized deep retrieval, which jointly embeds items and their features to predict preferences for fresh items.
3) The techniques have improved overall viewership and nomination of candidates, demonstrating their effectiveness in production recommender systems.
Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle - Domino Data Lab
Machine learning derives mathematical models from raw data. In the model building process, raw data is first processed into "features," then the features are given to algorithms to train a model. The process of turning raw data into features is sometimes called feature engineering, and it is a crucial step in model building. Good features lead to successful models with a lot of predictive power; bad features lead to a lot of headaches and models that go nowhere.
This talk aims to help the audience understand what a feature space is and why it is so important. We will go through some common feature space representations of English text and discuss what tasks they are suited for and why. Expect lots of pictures, whiteboard drawings, and handwaving. We will exercise our power of imagination to visualize high-dimensional feature spaces in our mind's eye. Presented by Alice Zheng, Director of Data Science at Dato.
Maths in the PYP - A Journey through the Arts - madahay
This document outlines an agenda for a mathematical journey through the arts workshop. It includes an icebreaker activity, sharing beliefs about mathematics, exploring the connections between math and art, action planning, and reflection. During the workshop, participants will read stories with mathematical concepts and use manipulatives like ladybugs and caterpillars to develop their understanding of addition and subtraction. The document emphasizes building conceptual understanding through concrete and pictorial representations before introducing symbolic notation.
[D2 COMMUNITY] Spark User Group - Machine Learning and Artificial Intelligence Techniques - NAVER D2
1) The document discusses various approaches and techniques in artificial intelligence including symbolic logic, planning, expert systems, fuzzy logic, genetic algorithms, Bayesian networks, and more.
2) It provides examples of each technique including using logic to represent arguments, planning routes for a traveling salesman, building financial expert systems, applying fuzzy logic to tipping recommendations, and using Bayesian networks for medical diagnosis.
3) The key challenges of AI discussed are computational complexity, problems with first-order logic like undecidability and uncertainty, and the difficulty of non-symbolic approaches like uncertainty in real-world problems.
This document provides information and instructions about quadratic inequalities. It begins with objectives to identify and describe quadratic inequalities using practical situations and mathematical expressions. It then defines quadratic inequalities as inequalities containing polynomials of degree 2. The standard form of quadratic inequalities is presented. Examples of quadratic inequalities in standard and non-standard form are given and worked through. Steps for solving quadratic inequalities are demonstrated. Activities include matching terms to definitions, describing examples, and completing a table with quadratic expressions and symbols. The document aims to build understanding of quadratic inequalities.
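The solution steps can be illustrated with a short worked example; the particular inequality is chosen for illustration:

```latex
% Solve x^2 - x - 6 > 0.
% Step 1: factor the quadratic.
%   x^2 - x - 6 = (x - 3)(x + 2)
% Step 2: the roots x = -2 and x = 3 split the number line into three intervals.
% Step 3: test one point from each interval:
%   x = -3:  9 + 3 - 6 =  6 > 0   (true)
%   x =  0:  0 - 0 - 6 = -6 > 0   (false)
%   x =  4: 16 - 4 - 6 =  6 > 0   (true)
\[
x^2 - x - 6 > 0
\;\Longleftrightarrow\; (x-3)(x+2) > 0
\;\Longleftrightarrow\; x \in (-\infty,-2) \cup (3,\infty)
\]
```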
1. LDA represents documents as mixtures of topics and topics as mixtures of words.
2. It assumes documents are generated by first choosing a topic distribution, then choosing words from that topic.
3. The algorithm estimates topic distributions for each document and word distributions for each topic that are most likely to have generated the observed document-word matrix.
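The generative story in steps 1-2 can be sketched directly; the topics, vocabularies, and probabilities below are invented for illustration, and step 3 (inference) is the hard part that LDA algorithms solve by reversing this process:

```python
import random

random.seed(0)

# Hypothetical topics: each topic is a distribution over words
topics = {
    "sports":  {"ball": 0.5, "team": 0.3, "win": 0.2},
    "finance": {"stock": 0.5, "bank": 0.3, "win": 0.2},
}

def generate_doc(topic_mix, n_words=10):
    """Generate a document: for each word, pick a topic from the document's
    topic mixture, then pick a word from that topic's distribution."""
    words = []
    for _ in range(n_words):
        topic = random.choices(list(topic_mix), weights=list(topic_mix.values()))[0]
        dist = topics[topic]
        words.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return words

doc = generate_doc({"sports": 0.7, "finance": 0.3})  # a 10-word toy document
```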
This document provides an overview of machine learning and feature engineering. It discusses how machine learning can be used for tasks like classification, regression, similarity matching, and clustering. It explains that feature engineering involves transforming raw data into numeric representations called features that machine learning models can use. Different techniques for feature engineering text and images are presented, such as bag-of-words and convolutional neural networks. Dimensionality reduction through principal component analysis is demonstrated. Finally, information is given about upcoming machine learning tutorials and Dato's machine learning platform.
Overview of Machine Learning and Feature Engineering - Turi, Inc.
Machine Learning 101 Tutorial at Strata NYC, Sep 2015
Overview of machine learning models and features. Visualization of feature space and feature engineering methods.
The growth of the Web and social networks, together with the mass digitization of documents, is contributing to a renewal of the humanities and social sciences, of the study of literary and cultural heritage, and of the way scientific literature in general is exploited.
The digital humanities, which cross various disciplines with computer science, place center stage the questions of data volume, diversity, origin, veracity, and representativeness. Information is carried within textual "documents" (books, web pages, tweets...), audio, video, or multimedia. These may include illustrations or graphics.
Handling such resources requires the development of robust computational approaches that can scale and that are adapted to the fundamentally ambiguous and varied nature of the information being manipulated (natural language or images to interpret, multiple points of view...).
While statistical learning approaches are commonplace for classification or information extraction tasks, they must cope with sparse, very high-dimensional vector spaces (several million dimensions), be able to exploit resources (such as lexicons or thesauri), and take into account or produce semantic annotations that can be reused.
To meet these challenges, infrastructures have been created, such as HumaNum at the national level and DARIAH or CLARIN at the European level, and recommendations have been established at the global level, such as the TEI (Text Encoding Initiative). Platforms serving scientific information, such as the OpenEdition.org "equipment of excellence," are another essential building block for the preservation of and access to "Big Digital Humanities," and also for fostering the reproducibility and understanding of experiments and the results obtained.
Introductory seminar on NLP for CS sophomores. Presented to Texas A&M's Fall 2022 CSCE181 class. Slides are a bit redundant due to compatibility issues :\
The document discusses learning programming through MOOCs and machine learning. It provides data on a MOOC with over 160,000 students from 209 countries. It analyzes student error messages, submissions, and interactions to improve programming instructions. However, programming languages can be ambiguous and students struggle with different concepts. The document advocates for mastery learning through one-on-one tutoring and continual course improvements using data and machine learning.
Here are some key terms that are similar to "champagne":
- Sparkling wines
- French champagne
- Cognac
- Rosé
- White wine
- Sparkling wine
- Wine
- Burgundy
- Bordeaux
- Cava
- Prosecco
Some specific champagne brands that are similar terms include Moët, Veuve Clicquot, Dom Pérignon, Taittinger, and Bollinger. Grape varieties used in champagne production like Chardonnay and Pinot Noir could also be considered similar terms.
"You Can Do It" by Louis Monier (Altavista Co-Founder & CTO) & Gregory Renard (CTO & Artificial Intelligence Lead Architect at Xbrain) for Deep Learning keynote #0 at Holberton School (http://www.meetup.com/Holberton-School/events/228364522/)
If you want to assist to similar keynote for free, checkout http://www.meetup.com/Holberton-School/
word2vec beginner.
vector space, distributional semantics, word embedding, vector representation for word, word vector representation, sparse and dense representation, vector representation, Google word2vec, tensorflow
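Dense word vectors are typically compared by cosine similarity, the standard measure in this setting; the three-dimensional vectors below are made-up toy values, not trained embeddings:

```python
import math

# Toy dense word vectors (hypothetical values, not trained embeddings)
vectors = {
    "king":  [0.8, 0.1, 0.7],
    "queen": [0.7, 0.2, 0.8],
    "apple": [0.1, 0.9, 0.1],
}

def cosine(u, v):
    """Cosine similarity: dot product of u and v divided by their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Semantically related words get similar vectors, so their cosine is high
print(cosine(vectors["king"], vectors["queen"]))  # close to 1
print(cosine(vectors["king"], vectors["apple"]))  # much lower
```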
1. The document discusses educational theory and concepts relevant to learning at hacker schools.
2. It promotes three main ideas: that learning is designable like coding, individual brains learn differently, and learning is not an isolated process but relies on community and collaboration.
3. Various learning theories are covered briefly, including cognitive apprenticeship and legitimate peripheral participation within a community of practice. Motivation, mindset, and overcoming challenges are also addressed.
This document provides an overview of strategies for effective college teaching, including facilitating discussions, delivering lectures, assessing student comprehension through testing, and incorporating educational technologies. A variety of specific techniques are presented for each teaching method, with examples and suggestions for implementation. The goal is to help educators engage students and promote learning.
The document provides an overview of machine learning and discusses various concepts related to applying machine learning to real-world problems. It covers topics such as feature extraction, encoding input data, classification vs regression, evaluating model performance, and challenges like overfitting and underfitting models to data. Examples are given for different types of learning problems, including text classification, sentiment analysis, and predicting stock prices.
This document introduces the basics of translating statements from natural language to the formal language of Quantified Logic (QL). It explains that QL uses constants to represent singular terms, predicates represented by capital letters, and variables represented by lowercase letters. Quantifiers like "for all" and "there exists" are used to represent statements about properties of individuals or groups. To translate a statement to QL, one must identify whether quantifiers are used, what the universe of discourse is, any singular terms, and the relevant predicates to determine the proper representation using constants, predicates, variables, quantifiers and logical connectives.
Infrastructure for Tracking Information Flow from Social Media to U.S. TV News - Himarsha Jayanetti
This study examines the intersection between social media and mainstream television (TV) news with an aim to understand how social media content amplifies its impact through TV broadcasts. While many studies emphasize social media as a primary platform for information dissemination, they often underestimate its total influence by focusing solely on interactions within the platform. This research examines instances where social media posts gain prominence on TV broadcasts, reaching new audiences and prompting public discourse. By using TV news closed captions, on-screen text recognition, and social media logo detection, we analyze how social media is referenced in TV news.
Environmental Sciences is the scientific study of the environmental system and
the status of its inherent or induced changes on organisms. It includes not only the study
of physical and biological characters of the environment but also the social and cultural
factors and the impact of man on environment.
VERMICOMPOSTING A STEP TOWARDS SUSTAINABILITY.pptxhipachi8
Vermicomposting: A sustainable practice converting organic waste into nutrient-rich fertilizer using worms, promoting eco-friendly agriculture, reducing waste, and supporting environmentally conscious gardening and farming practices naturally.
On the Lunar Origin of Near-Earth Asteroid 2024 PT5Sérgio Sacani
The near-Earth asteroid (NEA) 2024 PT5 is on an Earth-like orbit that remained in Earth's immediate vicinity for several months at the end of 2024. PT5's orbit is challenging to populate with asteroids originating from the main belt and is more commonly associated with rocket bodies mistakenly identified as natural objects or with debris ejected from impacts on the Moon. We obtained visible and near-infrared reflectance spectra of PT5 with the Lowell Discovery Telescope and NASA Infrared Telescope Facility on 2024 August 16. The combined reflectance spectrum matches lunar samples but does not match any known asteroid types—it is pyroxene-rich, while asteroids of comparable spectral redness are olivine-rich. Moreover, the amount of solar radiation pressure observed on the PT5 trajectory is orders of magnitude lower than what would be expected for an artificial object. We therefore conclude that 2024 PT5 is ejecta from an impact on the Moon, thus making PT5 the second NEA suggested to be sourced from the surface of the Moon. While one object might be an outlier, two suggest that there is an underlying population to be characterized. Long-term predictions of the position of 2024 PT5 are challenging due to the slow Earth encounters characteristic of objects in these orbits. A population of near-Earth objects that are sourced by the Moon would be important to characterize for understanding how impacts work on our nearest neighbor and for identifying the source regions of asteroids and meteorites from this understudied population of objects on very Earth-like orbits. Unified Astronomy Thesaurus concepts: Asteroids (72); Earth-moon system (436); The Moon (1692); Asteroid dynamics (2210)
The human eye is a complex organ responsible for vision, composed of various structures working together to capture and process light into images. The key components include the sclera, cornea, iris, pupil, lens, retina, optic nerve, and various fluids like aqueous and vitreous humor. The eye is divided into three main layers: the fibrous layer (sclera and cornea), the vascular layer (uvea, including the choroid, ciliary body, and iris), and the neural layer (retina).
Here's a more detailed look at the eye's anatomy:
1. Outer Layer (Fibrous Layer):
Sclera:
The tough, white outer layer that provides shape and protection to the eye.
Cornea:
The transparent, clear front part of the eye that helps focus light entering the eye.
2. Middle Layer (Vascular Layer/Uvea):
Choroid:
A layer of blood vessels located between the retina and the sclera, providing oxygen and nourishment to the outer retina.
Ciliary Body:
A ring of tissue behind the iris that produces aqueous humor and controls the shape of the lens for focusing.
Iris:
The colored part of the eye that controls the size of the pupil, regulating the amount of light entering the eye.
Pupil:
The black opening in the center of the iris that allows light to enter the eye.
3. Inner Layer (Neural Layer):
Retina:
The light-sensitive layer at the back of the eye that converts light into electrical signals that are sent to the brain via the optic nerve.
Optic Nerve:
A bundle of nerve fibers that carries visual signals from the retina to the brain.
4. Other Important Structures:
Lens:
A transparent, flexible structure behind the iris that focuses light onto the retina.
Aqueous Humor:
A clear, watery fluid that fills the space between the cornea and the lens, providing nourishment and maintaining eye shape.
Vitreous Humor:
A clear, gel-like substance that fills the space between the lens and the retina, helping maintain eye shape.
Macula:
A small area in the center of the retina responsible for sharp, central vision.
Fovea:
The central part of the macula with the highest concentration of cone cells, providing the sharpest vision.
These structures work together to allow us to see, with the light entering the eye being focused by the cornea and lens onto the retina, where it is converted into electrical signals that are transmitted to the brain for interpretation.
he eye sits in a protective bony socket called the orbit. Six extraocular muscles in the orbit are attached to the eye. These muscles move the eye up and down, side to side, and rotate the eye.
The extraocular muscles are attached to the white part of the eye called the sclera. This is a strong layer of tissue that covers nearly the entire surface of the eyeball.he layers of the tear film keep the front of the eye lubricated.
Tears lubricate the eye and are made up of three layers. These three layers together are called the tear film. The mucous layer is made by the conjunctiva. The watery part of the tears is made by the lacrimal gland
Protective function of skin, protection from mechanical blow, UV rays, regulation of water and electrolyte balance, absorptive activity, secretory activity, excretory activity, storage activity, synthetic activity, sensory activity, role of sweat glands regarding heat loss, cutaneous receptors and stratum corneum
4. The machine learning pipeline
"I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …"
Raw data → Features → Models → Predictions → Deploy in production
6. Representing natural text
Raw text: "It is a puppy and it is extremely cute."
Classify: puppy or not?
What’s important? Phrases? Specific words? Ordering? Subject, object, verb?
Bag of words: {"it": 2, "is": 2, "a": 1, "puppy": 1, "and": 1, "extremely": 1, "cute": 1}
7. Representing natural text
Raw text: "It is a puppy and it is extremely cute."
Classify: puppy or not?
Sparse vector representation (counts over a fixed vocabulary):
it 2, they 0, I 1, am 0, how 0, puppy 1, and 1, cat 0, aardvark 0, cute 1, extremely 1, …
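The bag-of-words construction sketched above takes only a few lines of Python. This is a minimal illustration using the standard library; the tokenizer (lowercase, strip periods, split on whitespace) is an assumption for the example, not the deck's:

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, strip periods, split on whitespace, and count tokens."""
    tokens = text.lower().replace(".", "").split()
    return Counter(tokens)

bow = bag_of_words("It is a puppy and it is extremely cute.")
print(bow["it"], bow["is"], bow["puppy"], bow["cat"])  # 2 2 1 0
```

A Counter returns 0 for absent words, which matches the sparse-vector view: every vocabulary word gets a coordinate, and most coordinates are zero.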
8. Representing images
Raw image: millions of RGB triplets, one for each pixel.
Classify: person or animal?
Raw image → bag of visual words.
Image source: "Recognizing and learning object categories," Li Fei-Fei, Rob Fergus, Antonio Torralba, ICCV 2005–2009.
9. Representing images
Classify: person or animal?
Raw image → deep learning features: a dense vector such as
(3.29, -15, -5.24, 48.3, 1.36, 47.1, -1.92, 36.5, 2.83, 95.4, -19, -89, 5.09, 37.8)
Dense vector representation
10. Feature space in machine learning
• Raw data → high-dimensional vectors
• Collection of data points → point cloud in feature space
• Model = geometric summary of the point cloud
• Feature engineering = creating features of the appropriate granularity for the task
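To make the bullets above concrete: fixing a vocabulary turns each document into one point, and a corpus into a point cloud. A minimal sketch, where the vocabulary and the second document are toy assumptions:

```python
from collections import Counter

VOCAB = ["it", "is", "a", "puppy", "and", "extremely", "cute", "cat"]

def to_point(text):
    """Map raw text to a fixed-length vector: one coordinate per vocabulary word."""
    counts = Counter(text.lower().replace(".", "").split())
    return [counts[w] for w in VOCAB]

# Two documents -> two points in an 8-dimensional feature space.
cloud = [to_point("It is a puppy and it is extremely cute."),
         to_point("It is a cat.")]
print(cloud[0])  # [2, 2, 1, 1, 1, 1, 1, 0]
print(cloud[1])  # [1, 1, 1, 0, 0, 0, 0, 1]
```

A model then only ever sees this point cloud, which is why the choice of coordinates (the features) matters so much.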
11. "Crudely speaking, mathematicians fall into two categories: the algebraists, who find it easiest to reduce all problems to sets of numbers and variables, and the geometers, who understand the world through shapes."
-- Masha Gessen, "Perfect Rigor"
16. Why are we looking at spheres?
[Figure: several everyday objects, each "equivalent" to a sphere]
Poincaré Conjecture (informally): every physical object without holes is "equivalent" to a sphere.
17. The power of higher dimensions
• A sphere in 4D can model the birth and death process of physical objects
• Point clouds = approximate geometric shapes
• High-dimensional features can model many things
19. The challenge of high-dimensional geometry
• Feature space can have hundreds to millions of dimensions
• In high dimensions, our geometric imagination is limited; algebra comes to our aid
20. Visualizing bag-of-words
[Figure: the document "I have a puppy and it is extremely cute" plotted as the point (1, 1) on the "puppy" and "cute" axes]
Sparse vector: it 1, they 0, I 1, am 0, how 0, puppy 1, and 1, cat 0, aardvark 0, zebra 0, cute 1, extremely 1, …
28. When does bag-of-words fail?
[Figure: the documents "I have a puppy", "I have a cat", "I have a kitten", and "I have a dog and I have a pen" plotted on the "puppy", "cat", and "have" axes]
Task: find a surface that separates documents about dogs vs. cats.
Problem: the word "have" adds fluff instead of information.
29. Improving on bag-of-words
• Idea: "normalize" word counts so that popular words are discounted
• Term frequency (tf) = number of times a term appears in a document
• Inverse document frequency of a word (idf) = log(N / number of documents containing the word)
• N = total number of documents
• Tf-idf count = tf × idf
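Plugging the definitions above into code: a short sketch, assuming the deck's four toy documents as the corpus and the natural log (any log base works; it only rescales the weights):

```python
import math

docs = ["i have a puppy".split(),
        "i have a cat".split(),
        "i have a kitten".split(),
        "i have a dog and i have a pen".split()]
N = len(docs)

def idf(word):
    """log(N / number of documents containing the word)."""
    df = sum(1 for doc in docs if word in doc)
    return math.log(N / df)

def tfidf(word, doc):
    """Raw count in one document, discounted by idf."""
    return doc.count(word) * idf(word)

print(round(idf("puppy"), 3))  # 1.386  (= log 4)
print(idf("have"))             # 0.0    (appears in every document)
print(tfidf("have", docs[3]))  # 0.0    (tf = 2, but idf = 0)
```

Because "have" occurs in all N documents, its idf is log(N/N) = 0, so its tf-idf weight vanishes no matter how often it appears.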
30. From BOW to tf-idf
[Figure: the same four documents on the "puppy", "cat", and "have" axes]
With N = 4 documents:
idf(puppy) = log 4
idf(cat) = log 4
idf(have) = log 1 = 0
31. From BOW to tf-idf
[Figure: after weighting, "I have a puppy" sits at tfidf(puppy) = log 4, "I have a cat" at tfidf(cat) = log 4, and tfidf(have) = 0 for every document; a decision surface now separates the dog and cat documents]
Tf-idf flattens uninformative dimensions in the BOW point cloud.
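The flattening can be checked numerically. In this self-contained sketch (the same four toy documents, with the vocabulary restricted to the three plotted axes as an assumption), the "have" coordinate of every tf-idf vector is exactly zero, so that dimension of the point cloud collapses:

```python
import math

docs = ["i have a puppy".split(),
        "i have a cat".split(),
        "i have a kitten".split(),
        "i have a dog and i have a pen".split()]
N = len(docs)
axes = ["puppy", "cat", "have"]

def idf(word):
    df = sum(1 for doc in docs if word in doc)
    return math.log(N / df)

# One tf-idf vector per document, over the three plotted axes.
vectors = [[doc.count(w) * idf(w) for w in axes] for doc in docs]
for v in vectors:
    print([round(x, 3) for x in v])
# Every row ends in 0.0: the "have" axis carries no information.
```

Only the informative axes ("puppy", "cat") keep nonzero coordinates, which is what lets a simple decision surface separate the two classes.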
32. Entry points of feature engineering
• Start from the data and task
- What's the best text representation for classification?
• Start from the modeling method
- What kind of features does k-means assume?
- What does linear regression assume about the data?
33. That's not all, folks!
• There's a lot more to feature engineering:
- Feature normalization
- Feature transformations
- "Regularizing" models
- Learning the right features
• Dato is hiring! [email protected] / [email protected] / @RainyData
Editor's Notes
#5: Features sit between raw data and model. They can make or break an application.