INSTITUTE: CHANDIGARH UNIVERSITY
DEPARTMENT: UIC MCA Business Analytics 23CAH-701
DISCOVER . LEARN . EMPOWER
Datasets
• Datasets are curated tables of data that can be reused across multiple reports. They are created by writing a SQL query and transforming its results into a reusable asset. Datasets can then be shared across your organization, allowing multiple reports to be built from the initial query, which can be set to refresh on a schedule. Reports created from Datasets consume the fresh data when it becomes available, ensuring the accuracy of reporting over time.
• The data in a Dataset is cached in Helix, which enables more efficient data usage and improved performance for reports created from Datasets.
Key benefits of Datasets:
• Centralize logic and data quality: Datasets can power multiple reports, allowing analysts to write or update one query that cascades down across multiple reports.
• Manage data stack complexities: Datasets introduce a new way data moves through Mode, creating a middle governance layer that can centralize logic and make scaling easier.
• Improve efficiency and performance: With data cached in Helix, every report refresh that does not have to hit the data warehouse gains incremental performance.
• Cost savings: Datasets sit between reports and warehouses, enabling more efficient data usage and controlled warehouse hits.
• Confident self-service access: Datasets can serve as an approved source, so teams within an organization can confidently build reports without writing any code, knowing the Dataset has been published by the data team.
• Data accessibility: Datasets can be organized in collections and browsed when creating reports. Datasets are subject to permissions just like reports.
Manipulate Large Data Sets
When working with large data sets in SQL, it is important to employ efficient techniques to manipulate the data effectively. Here are some strategies for manipulating large data sets; short SQL sketches illustrating them follow the list:
1. Use Proper Indexing: Indexes improve query performance by allowing the database to quickly locate and retrieve the required data. Ensure that appropriate indexes are created on columns frequently used in search conditions, joins, and sorting operations. Analyze query execution plans and consider adding or adjusting indexes based on the query patterns and data access patterns.
2. Filter and Subset Data: When dealing with large data sets, it is often beneficial to filter and retrieve only the necessary subset of data instead of processing the entire dataset. Use the WHERE clause in SELECT statements to apply conditions and retrieve only the relevant rows. This reduces the amount of data being processed and improves query performance.
3. Use Pagination or Limiting Techniques: Instead of retrieving the entire result set at once, use pagination or limiting techniques to retrieve data in smaller chunks. This involves retrieving a subset of rows using keywords such as LIMIT and OFFSET, or the equivalent syntax supported by your database system. Retrieving data in smaller batches reduces memory consumption and improves query performance.
4. Optimize Joins: When joining tables, ensure that the join conditions are well defined and that appropriate indexes are in place. Choose the join type (INNER JOIN, LEFT JOIN, etc.) based on the relationship between the tables and the desired output. Avoid unnecessary or redundant joins that result in excessive data processing.
5. Aggregate and Summarize Data: Instead of processing every individual row, consider aggregating and summarizing the data using GROUP BY, SUM, COUNT, and other aggregate functions. This reduces the amount of data being processed and provides a more concise view of the information.
6. Partitioning and Parallel Processing: Some database systems support data partitioning, which splits large tables into smaller, more manageable pieces based on specific criteria (such as ranges or hash values). Partitioning allows for parallel processing of data, distributing the workload across multiple resources and improving query performance.
7. Consider Batch Processing: If applicable, perform operations on the data in batches rather than processing the entire dataset at once. This can be useful for tasks such as updates, deletions, or inserts. Breaking the data into smaller batches helps manage resources more effectively and allows for easier error handling and recovery.
8. Optimize Query Performance: Analyze and optimize your SQL queries to ensure they are written efficiently. Use appropriate join conditions, avoid unnecessary subqueries or redundant calculations, and ensure that your queries use the best execution plan available. Regularly review query performance using database-specific tools or EXPLAIN/EXPLAIN PLAN statements to identify areas for improvement.
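To make items 1 and 2 concrete, here is a minimal sketch in standard SQL. The orders table and its columns are hypothetical names used only for illustration, and the exact CREATE INDEX options available depend on your database system.

    -- Item 1: index the columns used in filters, joins, and sorting
    CREATE INDEX idx_orders_customer_date
        ON orders (customer_id, order_date);

    -- Item 2: filter early so only the relevant rows are read and returned
    SELECT order_id, customer_id, total_amount
    FROM orders
    WHERE customer_id = 1042
      AND order_date >= DATE '2023-01-01';

Because the WHERE clause matches the leading columns of the index, the database can locate the qualifying rows directly instead of scanning the whole table.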
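For item 3, the first query below uses LIMIT/OFFSET pagination; the second is a keyset (seek-based) variant that tends to scale better on deep pages because it does not skip over already-read rows. The table and column names are again hypothetical, and the row-value comparison is the PostgreSQL form; other systems may require an expanded condition.

    -- Offset pagination: page 3 with 100 rows per page
    SELECT order_id, order_date, total_amount
    FROM orders
    ORDER BY order_date, order_id
    LIMIT 100 OFFSET 200;

    -- Keyset pagination: continue after the last row of the previous page
    SELECT order_id, order_date, total_amount
    FROM orders
    WHERE (order_date, order_id) > (DATE '2023-03-01', 98765)
    ORDER BY order_date, order_id
    LIMIT 100;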
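Items 4 and 5 combine naturally: join on indexed key columns and return aggregated summaries rather than raw rows. The customers/orders schema below is assumed only for illustration.

    SELECT c.region,
           COUNT(*)            AS order_count,
           SUM(o.total_amount) AS total_revenue
    FROM orders o
    INNER JOIN customers c
            ON c.customer_id = o.customer_id   -- well-defined join on key columns (item 4)
    WHERE o.order_date >= DATE '2023-01-01'
    GROUP BY c.region;                          -- aggregate instead of shipping every row (item 5)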
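The last sketch touches items 6 to 8 using PostgreSQL syntax as one possible dialect; the events table, its columns, and the batch size are assumptions chosen for illustration, and partitioning and EXPLAIN syntax vary by database system.

    -- Item 6: declarative range partitioning so queries touch only the relevant partition
    CREATE TABLE events (
        event_id   bigint,
        event_time timestamp,
        payload    text
    ) PARTITION BY RANGE (event_time);

    CREATE TABLE events_2023 PARTITION OF events
        FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

    -- Item 7: delete old rows in batches; rerun until no rows are affected
    DELETE FROM events
    WHERE event_id IN (
        SELECT event_id
        FROM events
        WHERE event_time < DATE '2022-01-01'
        LIMIT 10000
    );

    -- Item 8: inspect the execution plan to confirm indexes and partition pruning are used
    EXPLAIN ANALYZE
    SELECT COUNT(*) FROM events
    WHERE event_time >= DATE '2023-06-01';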
THANK YOU