Data Mining Group Project
Presented by:
Chantelle Chifamba (301270348)
Khalid Dawd (301144241)
AbdulMujeeb Adesoye (301208797)
Jatinder Dosanjh
Data Mining
• Data mining is the process of automatically discovering useful information in large data repositories.
• Human analysts may take weeks to discover useful information.
• Much of the data is never analyzed at all.
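To make "automatically discovering useful information" concrete, here is a toy sketch (not part of the original project) that scans a handful of hypothetical shopping baskets and reports which item pairs occur together in at least `min_support` baskets, a simplified form of frequent itemset mining. The basket contents and the threshold of 3 are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def frequent_pairs(transactions, min_support=3):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) are counted once.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    # Keep only the pairs that meet the support threshold.
    return {pair: n for pair, n in counts.items() if n >= min_support}

result = frequent_pairs(transactions)
print(result)
```

With these five baskets the function reports four pairs, each seen in three baskets: (beer, diapers), (bread, diapers), (bread, milk) and (diapers, milk). A full algorithm such as Apriori extends the same idea to larger itemsets and prunes candidates level by level.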
The Data Gap
[Chart: 1995 to 1999]
Largest databases in 2007
• Largest database in the world: the World Data Centre for Climate (WDCC), operated by the Max Planck Institute and the German Climate Computing Centre
  • 220 terabytes of data on climate research and climatic trends
  • 110 terabytes of climate simulation data
  • 6 petabytes of additional information stored on tape
• AT&T
  • 323 terabytes of information
  • 1.9 trillion phone call records
• Google
  • 91 million searches per day
  • After a year's worth of searches, this amounts to more than 33 billion search records (91 million × 365)
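The yearly Google figure can be sanity-checked from the daily rate. A minimal sketch, assuming one stored record per search and a 365-day year:

```python
# Assumptions: one record per search, 365 days per year.
searches_per_day = 91_000_000
days_per_year = 365

searches_per_year = searches_per_day * days_per_year
print(f"{searches_per_year:,}")                   # 33,215,000,000
print(f"{searches_per_year / 1e9:.1f} billion")   # 33.2 billion
```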
What is (not) Data Mining?
• What is not Data Mining?
• What is Data Mining?
[Figure labels: Data Mining, Database systems]
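Data Mining Tasks
Data mining tasks are generally divided into two major categories:
• Predictive tasks: predict the value of a particular attribute based on the values of other attributes.
• Descriptive tasks: derive patterns (correlations, trends, clusters, anomalies) that summarize the underlying relationships in the data.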
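Assuming the usual textbook split into predictive and descriptive tasks, the sketch below contrasts the two on made-up data: a 1-nearest-neighbour classifier predicts a label for a new record from labelled examples (predictive), while plain k-means groups unlabelled records into clusters (descriptive). The data, the pass/fail labels and the starting centroids are all hypothetical.

```python
import math

# Predictive task: learn from labelled records, then predict the label of a new record.
# Hypothetical training data: (hours studied, hours slept) -> "pass" / "fail".
labelled = [((8.0, 7.0), "pass"), ((7.5, 8.0), "pass"),
            ((2.0, 5.0), "fail"), ((1.0, 6.0), "fail")]

def predict_1nn(point, examples):
    """Return the label of the closest labelled example (1-nearest-neighbour)."""
    _, nearest_label = min(examples, key=lambda ex: math.dist(point, ex[0]))
    return nearest_label

# Descriptive task: no labels; summarise structure by grouping similar records.
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Recompute each centroid as its cluster mean; keep it in place if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

unlabelled = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
print(predict_1nn((7.0, 7.5), labelled))  # pass
centroids, clusters = kmeans(unlabelled, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

The predictive half needs a target attribute (the label) in its training data; the descriptive half never sees one and only reports the groups it finds, here one cluster near (1.25, 1.5) and one near (8.5, 8.25).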
Example problem
(Adapted from Leslie Kaelbling's example in the MIT courseware)
deploy machine learning models in a fast and simple way, even if they