CS 584: Personal Information Assistants
Lecture 1: Course Overview and Introduction
Spring 2019, 1/24/19

Lecture Plan
• What is IR? (the big questions)

• Course overview and topics

• Logistics

Information Retrieval
• “Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Gerald Salton, 1968)
• Compare to Google’s corporate mission:
– “Google’s mission is to organize the world’s information and make it universally accessible and useful.”

Intelligent Information Access:
General Research Areas

• Search, Ranking, Recommendation, Personalization
• Text and Social Data Mining, Classification, Information Extraction
• Intent Inference, User Attention & Interactions, Evaluation

Intelligent Information Access:
Examples (1)
• Question Answering: text-based QA, crowds, and KB-based QA, spanning unstructured, semi-structured, and structured data
• Contextual (Social) Text Mining:
1. Novel temporal representation of text: represent words as concept vectors
2. Novel temporal text-similarity measurement: a method for computing the semantic relatedness of two words using the temporal representation
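The slide does not describe the temporal representation itself, so the sketch below shows only the simpler, non-temporal core of the idea: each word is assumed to be a sparse vector of weights over concepts, and relatedness is scored with cosine similarity. The concept names, the weights, and the `WORD_VECTORS` table are hypothetical; the temporal method on the slide would additionally compare how each concept's usage changes over time, which this sketch omits.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dict: concept -> weight)."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical concept vectors; real weights would be derived from a corpus.
WORD_VECTORS = {
    "car": {"vehicle": 0.9, "road": 0.6, "engine": 0.7},
    "truck": {"vehicle": 0.8, "road": 0.7, "cargo": 0.5},
    "banana": {"fruit": 0.9, "food": 0.6},
}

def relatedness(word1, word2):
    """Semantic relatedness of two words as the cosine of their concept vectors."""
    return cosine(WORD_VECTORS[word1], WORD_VECTORS[word2])
```

With these made-up vectors, `relatedness("car", "truck")` is positive while `relatedness("car", "banana")` is 0.0, because the second pair shares no concepts.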

Intelligent Information Access:
Examples (2)
• Personal Intelligent Assistants, Conversational Search:
– Intelligent assistants in the Alexa Prize 2018
– Dialogue-based search interfaces
– Explicit interaction
• User Behavior Models:
– Inferring user attention
– Memory loss (Alzheimer’s)
– Interactions (Parkinson’s)

Contextual Personalized Search
• Contextual Search on ubiquitous devices
– Grounding search in physical
and social context
– Incorporate sensor data
– Speech-based search: use acoustics,
other sensor input

• Personalized search:
– When, what, and how to personalize?
– Seamlessly access info when needed
– Decision support anytime, any topic
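One common way to approach “how to personalize?” is to re-rank a non-personalized result list using a user profile. The sketch below assumes each result carries a base relevance score and a topic vector, and that the user's interests are a topic vector over the same topics; the linear blend and the weight `alpha` are illustrative assumptions, not a method prescribed by the course.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse topic vectors (dict: topic -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def personalize(results, profile, alpha=0.7):
    """Re-rank results for one user.

    results: list of (doc_id, base_score, topic_vector) from a non-personalized ranker.
    profile: the user's interest vector over the same (assumed) topics.
    alpha:   weight of the base score; 1 - alpha weights the profile match.
    """
    rescored = [
        (doc_id, alpha * base + (1 - alpha) * cosine(topics, profile))
        for doc_id, base, topics in results
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Hypothetical results for the ambiguous query "jaguar".
results = [
    ("jaguar-car", 0.80, {"cars": 1.0}),
    ("jaguar-cat", 0.75, {"animals": 1.0}),
]
animal_lover = {"animals": 1.0, "travel": 0.5}
```

For this user, `personalize(results, animal_lover)` moves the animal page above the car page; with `alpha=1.0` the profile is ignored and the original order is kept.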
Lots of Data

Ex: Large Hadron Collider: 36 billion events per minute.
How much of this data is useful information?
Fundamental Mismatch
• Computers deal with data
• Humans (and systems) want useful information
• Examples:
– Structured data (easiest): bits → schema → query
– Text: bits → represented as words
– Audio: bits → speech recognition → words
– Images: bits → image processing → computer vision → object recognition → <tags>?
– Video: image processing + audio + activity recognition → ?
Search Challenges (2002)
UMass CIIR report, 2002

• Global information access: Satisfy human information needs through natural, efficient interaction with a system … [over the world’s data]… in any language.
• Contextual retrieval: Combine search technologies and knowledge about query and user context … to provide the most “appropriate” answer for a user’s information needs.
A “classic” IR Task
• Given:
– A corpus of textual natural-language documents.
– A user query in the form of a textual string.
• Find:
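– A ranked list of documents that are relevant to the query.

Contrast with database/SQL queries, which return the exact set of records matching a structured condition, with no ranking by relevance.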
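A minimal sketch of the classic task, assuming a simple TF-IDF weighting (raw term frequency times log(N/df)) rather than any particular system's scoring function. Unlike a SQL `WHERE` clause, which returns an unordered set of exactly matching rows, every document receives a graded score and the output is ordered by it. The toy documents are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query, docs):
    """Return (doc_index, score) pairs ordered by TF-IDF score, highest first.

    score(d) = sum over distinct query terms t in d of tf(t, d) * log(N / df(t)).
    Documents scoring 0 are dropped; a term found in every document gets idf 0.
    """
    n = len(docs)
    doc_tfs = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for tf in doc_tfs for term in tf)  # each doc counts a term once
    query_terms = set(tokenize(query))
    scored = []
    for i, tf in enumerate(doc_tfs):
        score = sum(tf[t] * math.log(n / df[t]) for t in query_terms if t in tf)
        if score > 0:
            scored.append((i, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = [
    "how to trap mice alive",
    "mouse trap reviews",
    "cheese recipes",
]
results = rank("mouse trap", docs)  # list of (doc_index, score)
```

Here the second document ranks first for `mouse trap`. The first document, arguably what a user trying to catch mice alive wants, matches only on `trap`, because `mice` and `mouse` are different tokens to this tokenizer.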

A search model
TASK: Get rid of mice in a politically correct way
↓ Misconception?
Info need: Info about removing mice without killing them
↓ Mistranslation?
Verbal form: How do I trap mice alive?
↓ Misformulation?
Query: mouse trap → SEARCH ENGINE (over the corpus) → Results
Query refinement: the results feed back into a revised query
Information Retrieval Challenges
• Understand user’s query (information need)

• Interpret and organize data (indexing, classification)

• Rank “documents” by expected utility for user

• Find answers to show to user (selection, presentation)

• Evaluate, improve search, repeat
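For the “evaluate” step, two standard ranking metrics are precision at k and average precision. The sketch assumes a single query with binary relevance judgments; the document ids and judgments below are made up.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

def average_precision(ranking, relevant):
    """Sum of precision@i over each rank i that holds a relevant document,
    divided by the total number of relevant documents (unretrieved ones count as 0)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)

# Hypothetical ranked output and relevance judgments for one query.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d5"}
```

For this example, P@2 is 0.5 (one of the top two is relevant) and AP is (1/2 + 2/4) / 3 = 1/3, since the third relevant document, `d5`, is never retrieved.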


Search Challenges (2012) SWIRL 2012 report
PROMISE 2012 report

• Beyond ranked list: enrich querying & results


• IR for all: empower the user to search & learn
• Capture context: current task, time, etc.
• Beyond document retrieval: complex data &
result integration
• Domain search: Verticals, apps, restricted data
• Evaluation: for new search types, tasks

Search Challenges: SWIRL 2018 (1)
• https://www.damianospina.com/wp-content/uploads/2018/04/swirl3-report.pdf
• Decision Support over Pathways: Understanding and designing systems to help people make decisions.
• Generating New Information Objects: Ad hoc generation, composition, and summarization of new text and layouts in response to an information need.
• Transparent/Explainable Information Retrieval: Explaining ranking decisions. Providing reliable and responsible information access.
• Cognitive-aware IR: Tracking and modeling user behavior and perception. Modeling political-correctness of decisions. Identifying fake news and provenance.
• Societal impact of information retrieval: Understanding the long-term impact of IR on society and the economy.
• Personal information access: Federated personal information search and management (e.g., knowledge graphs). Biometrics for affective state.
• Next Generation Efficiency-Effectiveness Issues: Efficient machine learning inference. Resource-constrained search.
Search Challenges: SWIRL 2018 (2)
• https://www.damianospina.com/wp-content/uploads/2018/04/swirl3-report.pdf
• Machine Learning and Search: Developing effective machine-learned retrieval models (e.g., neural networks, reinforcement learning, meta-optimization).
• Personalized interaction: Diversified and personalized interactions.
• Conversational information access: Information-seeking conversations. Learning representations for conversations.
• New approaches to evaluation: Moving beyond the Cranfield paradigm, topical relevance, and queries. Controlling for variability. Counterfactual evaluation and off-policy evaluation.
• New interaction modes with information: Multi-device search.
• Blending online and physical: Search in the context of mobile, smart environments, and augmented/virtual reality.
• Task-specific representation learning: Adapting machine-learned models for new search domains.
• Pertinent Context: Surfacing and using the relevant contextual information for search.
• Success prediction: Formal models and principles to inform retrieval system design (build the right bridge instead of building six bridges and seeing which survives).

History of the (IR) World: the Ancients

– Vannevar Bush: “As We May Think”: “Memex”, 1945 → proto-IR → desktop search
– Gerald Salton: founder of IR as a field


History: the mainframe era (50s-70s)
• High-level programming
languages
• Fortran, Ada, Cobol, …
• Data structures
• Sorting
• Searching
• Routing
• Graph theory
• Internet (ARPA-net)
• Salton develops IR

IR System (80’s, 90’s)

Query string + document corpus → IR system → ranked documents:
1. Doc1
2. Doc2
3. Doc3
…

History: Web 1.0 (90s)
• Distributed computing
• Mosaic, Netscape, IE
• Web search
– AltaVista, Excite, …

Google architecture (circa 1998)
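The architecture diagram for this slide did not survive extraction. One well-known component of the 1998 Google design is PageRank, described in Brin and Page's 1998 paper with a damping factor of 0.85. The sketch below is a small in-memory power iteration under that assumption; spreading the rank of pages without out-links evenly over all pages is one common convention and is not necessarily how the original system handled them. The link graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to. Pages with no
    out-links ("dangling" pages) spread their rank evenly over all pages,
    so the ranks always sum to 1.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
```

In this graph, page `C` ends up with the highest rank because three pages link to it, while `D`, which nothing links to, keeps only the random-jump share.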

Web 2.0: 2000’s

Now (2010→)
• Knowledge Graphs

• Deep Learning for IR

• Personal Assistants

And in this course…

Search Engine Architecture: Indexing
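The indexing diagram did not survive extraction. As a generic sketch of the core indexing data structure, an inverted index maps each term to a postings list of the documents that contain it; here each posting also stores the term frequency. The tokenizer (lowercase, split on non-alphanumeric characters) is a simplifying assumption; production engines such as Lucene can also apply stemming and stopword removal, store term positions, and compress postings.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency), sorted by doc_id."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}

# Hypothetical three-document collection.
index = build_index(["mouse trap", "trap mice alive", "mouse and trap and mouse"])
```

Keeping postings sorted by document id is what lets query-time operations walk several lists in step.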

Search Engine Architecture: Querying
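The querying diagram did not survive extraction either. One basic query-time operation over an inverted index is a conjunctive (Boolean AND) query, answered by merging sorted postings lists; processing the shortest list first keeps intermediate results small. In this sketch the postings are bare document ids, and the example index is hypothetical.

```python
def intersect(p1, p2):
    """Merge-intersect two sorted lists of document ids in O(len(p1) + len(p2))."""
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

def and_query(index, terms):
    """Documents containing every query term; the shortest postings list is processed first."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result

# Hypothetical index: term -> sorted document ids.
index = {"mouse": [0, 2, 5], "trap": [0, 1, 2, 7], "alive": [1, 2]}
```

A term missing from the index yields an empty postings list, which sorts first and makes the whole conjunction empty without scanning the other lists.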

Course Logistics
• Lectures:
– Tuesday & Thursday: 1pm-2:15pm
– Following dates will be rescheduled or canceled:
• Office hours (tentatively):
– Tuesday, Thursday, 3-4pm, Emerson E500
• Communication/website: https://canvas.emory.edu/courses/39249

Course structure
• The course covers roughly 4 topics, with some overlap:
– Information retrieval and recommender systems (e.g., Bing,
Netflix)
– Personalization and contextualization (e.g., personal ranking,
local suggestions)
– Explicit and implicit feedback (ratings, clicks, engagement)
– Personal Information Assistants (1+2+3+domain specifics)
• Grading:
– Two implementation homeworks: 40%
– Paper presentations/leading discussions: 15%
– Final project: 45% (proposal, implementation, presentation,
report).

Homework projects

• HW1: Lucene or Whoosh over a local collection (e.g., movies); semi-structured search, e.g., "movies about robots"
• HW2: (personalized) recommendation, e.g., of movies, songs, or books; can use a local and/or remote dataset. Platform TBA.
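Since the HW2 platform is still TBA, here is a platform-independent sketch of one classic technique, user-based collaborative filtering: predict a user's rating for an unseen item as a similarity-weighted average of other users' ratings, with cosine similarity over co-rated items. The ratings are made up. Raw cosine on all-positive ratings tends to make every pair of users look similar; mean-centering the ratings (a Pearson-style similarity) is a common refinement that this sketch does not apply.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity of two users' ratings over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings of item; None if nobody rated it."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_sim(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

def recommend(ratings, user, n=3):
    """Top-n items the user has not rated, ordered by predicted rating."""
    unseen = {i for r in ratings.values() for i in r} - set(ratings[user])
    predictions = ((i, predict(ratings, user, i)) for i in unseen)
    scored = [(i, p) for i, p in predictions if p is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Hypothetical 1-5 star ratings.
ratings = {
    "ann": {"Alien": 5, "Heat": 1, "Up": 4},
    "bob": {"Alien": 5, "Heat": 1, "Up": 5, "Moon": 5, "Big": 1},
    "cy": {"Alien": 1, "Heat": 5, "Moon": 1, "Big": 5},
}
```

Because `ann` rates like `bob` far more than like `cy`, `recommend(ratings, "ann")` puts `Moon` (which `bob` loved) ahead of `Big`.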

Texts
• CMS: Information Retrieval in Practice, Croft, Metzler, and Strohman. Free online: https://ciir.cs.umass.edu/irbook/
• MRS: Introduction to Information Retrieval, Manning, Raghavan, and Schütze, Cambridge University Press (free online): http://nlp.stanford.edu/IR-book/
• SUI: Search User Interfaces, Marti Hearst, CUP, free online: http://searchuserinterfaces.com/
• Additional readings will be posted online.

Ideas for final projects
• Personalized “Skill” for the Alexa platform, e.g., personalized news briefing
• Personalized reminder/assistant app
• TREC 2018/2019 challenges, especially conversational search: https://trec.nist.gov/pubs/call2019.html
