Unit - I
Data Analytics (KIT 601)
SWASTI SINGHAL Department of CSIT KIET, Ghaziabad
07/24/2024
SYLLABUS
• Introduction to Data Analytics: Sources and nature of data, classification of data (structured, semi-structured, unstructured), characteristics of data, introduction to Big Data platform, need of data analytics, evolution of analytic scalability, analytic process and tools, analysis vs reporting, modern data analytic tools, applications of data analytics.
• Data Analytics Lifecycle: Need, key roles for successful analytic projects, various phases of the data analytics lifecycle (discovery, data preparation, model planning, model building, communicating results, operationalization).
Learning Objectives
CO 1: Discuss various concepts of the data analytics pipeline
• Define data and its importance
• Define data analytics and its types
Source of Data
What is Big data
CLASSIFICATION OF DATA
• Data classification is broadly defined as the process of organizing data by relevant categories so that it may be used and protected more efficiently. On a basic level, the classification process makes data easier to locate and retrieve. Data classification is of particular importance when it comes to risk management, compliance, and data security.
• Big Data involves huge volume, high velocity, and an extensive variety of data. It falls into three types: structured data, semi-structured data, and unstructured data.
Structured data
• Structured data is data whose elements are addressable for effective analysis. It has been organized into a formatted repository, typically a database. It covers all data that can be stored in a SQL database in tables with rows and columns. Such data has relational keys and can easily be mapped into pre-designed fields. Today, structured data is the most commonly processed form in development and the simplest to manage. Example: Relational data.
Unstructured data
• Unstructured data is data that is not organized in a pre-defined manner or does not have a pre-defined data model, so it is not a good fit for a mainstream relational database.
• Alternative platforms exist for storing and managing unstructured data. It is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications. Example: Word, PDF, text, media logs.
Semi-Structured data
• Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyze. With some processing, it can be stored in a relational database (this can be very hard for some kinds of semi-structured data), but the semi-structured form exists to save space. Example: XML data.
CHARACTERISTICS OF DATA
• Volume
• Variety
• Veracity
• Value
• Velocity
• Visualization: Using charts and graphs to visualize large amounts of complex data is much more effective in conveying meaning than spreadsheets and reports.
• Variability: Variability is different from variety. A coffee shop may offer 6 different blends of coffee, but if you get the same blend every day and it tastes different every day, that is variability. The same is true of data: if the meaning is constantly changing, it can have a huge impact on your data.
How Much Data
CERN's Large Hadron Collider
TYPES OF DATA VARIETIES BIG DATA COLLECTED
What is Big data
• "Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population." (Merv Adrian)
• "Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze." (McKinsey Global Institute)
Summary
• Definition of Data Analytics
• Data Analytics vs Data Mining
• Definitions of Big Data
• Classifications of Big Data
• Characteristics of Big Data
• Applications of Big Data
TRADITIONAL ANALYTICS VS BIG DATA ANALYTICS
360-degree view
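Worked example: structured, semi-structured, and unstructured data
The three data classes described in the classification slides can be contrasted with a small Python sketch. It is only an illustration under stated assumptions: the orders table, the XML layout, the sample text, and all function names are hypothetical and do not come from the course material. It uses only the Python standard library (sqlite3, xml.etree.ElementTree).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
ROWS = [(1, "Asha", 250.0), (2, "Ravi", 100.5)]
XML_TEXT = (
    '<orders>'
    '<order id="1" customer="Asha" amount="250.0"/>'
    '<order id="2" amount="100.5"><note>gift</note></order>'
    '</orders>'
)
FREE_TEXT = "Asha called about her order. The order arrived late."


def structured_total(rows):
    """Structured: fixed schema of rows and columns, queried with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    (total,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()
    conn.close()
    return total


def semi_structured_total(xml_text):
    """Semi-structured: no table schema, but tags and attributes give
    enough organization to locate fields. Records may differ in shape
    (the second order has no customer and carries an extra note)."""
    root = ET.fromstring(xml_text)
    return sum(float(order.get("amount")) for order in root.iter("order"))


def unstructured_mentions(text, keyword):
    """Unstructured: free text with no data model, so the best we can do
    here is a simple case-insensitive keyword count."""
    return text.lower().count(keyword.lower())


print(structured_total(ROWS))
print(semi_structured_total(XML_TEXT))
print(unstructured_mentions(FREE_TEXT, "order"))
```

The same order information is easy to aggregate in the structured and semi-structured forms, while the unstructured text only supports rough keyword matching without further processing.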