
INSTITUTE: CHANDIGARH UNIVERSITY

DEPARTMENT: UIC
MCA
Business Analytics
23CAH-701

DISCOVER . LEARN . EMPOWER


Datasets
• Datasets are curated tables of data that can be reused across multiple
reports. They are created by writing a SQL query and turning its
results into a reusable asset, which can then be shared across your
organization. Multiple reports can be built from the initial query,
which can be set to refresh on a schedule; reports created from
Datasets consume the fresh data as it becomes available, keeping the
reporting accurate over time.
• The data in a Dataset is cached in Helix, which enables more efficient
data usage and improved performance for reports created from
Datasets.
Key benefits of Datasets:
• Centralize logic and data quality: Datasets can power multiple reports, so analysts
write or update one query and the change cascades across every report built on it.
• Manage data stack complexities: Datasets introduce a new way data moves through
Mode, creating a middle governance layer that can centralize logic and make scaling
easier.
• Improve efficiency and performance: Because data is cached in Helix, each report
refresh that doesn't have to hit the data warehouse gains incremental performance.
• Cost savings: Datasets sit between reports and warehouses, enabling more efficient
data usage and controlled warehouse hits.
• Confident self-service access: Datasets can serve as an approved source that teams
within an organization use to confidently build reports without writing any code,
knowing the dataset has been published by the data team.
• Data accessibility: Datasets can be organized in collections and browsed when creating
reports. Datasets are subject to permissions just like reports.
Manipulate Large Data Sets
When working with large data sets in SQL, it is important to use
efficient techniques to manipulate the data. Here are some strategies:
1. Use Proper Indexing: Indexes improve query performance by
allowing the database to quickly locate and retrieve the required
data. Create indexes on columns frequently used in search
conditions, joins, and sorting operations. Analyze query execution
plans and add or adjust indexes based on the observed query and
data access patterns.
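As a minimal sketch of this idea using Python's built-in sqlite3 module (the `orders` table and column names are illustrative, not from the lecture), you can compare the query plan before and after adding an index on a filtered column:

```python
import sqlite3

# In-memory database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(1000)])

query = "SELECT * FROM orders WHERE customer_id = 42"

# Without an index, the filter needs a full table scan.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]

# Index the column used in the search condition, then recheck the plan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]

print(before)  # e.g. "SCAN orders"
print(after)   # e.g. "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)"
```

The exact plan text varies by SQLite version, but the shift from a full scan to an index search is the point.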
2. Filter and Subset Data: When dealing with large data sets, it is
often better to retrieve only the necessary subset of data instead
of processing the entire dataset. Use the WHERE clause in SELECT
statements to apply conditions and return only the relevant rows.
This reduces the amount of data being processed and improves
query performance.
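A small sqlite3 sketch of filtering with WHERE (the `events` table and its columns are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, category TEXT, value INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(i, "error" if i % 10 == 0 else "info", i) for i in range(1000)])

# Pull only the rows the analysis needs instead of the whole table.
rows = conn.execute(
    "SELECT id, value FROM events WHERE category = ? AND value >= ?",
    ("error", 500),
).fetchall()

print(len(rows))  # 50 matching rows out of 1000
```

Pushing the filter into the WHERE clause lets the database (and any index) do the work, rather than transferring all 1000 rows to the client.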
3. Use Pagination or Limiting Techniques: Instead of retrieving the
entire result set at once, use pagination or limiting techniques to
retrieve data in smaller chunks. This involves retrieving a subset of rows
using keywords like LIMIT, OFFSET, or the equivalent syntax supported
by your database system. By retrieving data in smaller batches, you can
reduce memory consumption and improve query performance.
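The LIMIT/OFFSET pattern above can be sketched as a paging loop in sqlite3 (the `items` table is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)", [(f"item-{i}",) for i in range(95)])

PAGE_SIZE = 20

def fetch_page(page):
    # OFFSET pagination: simple, though the database still skips the offset rows.
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()

pages = []
page = 0
while True:
    batch = fetch_page(page)
    if not batch:
        break
    pages.append(batch)
    page += 1

print(len(pages))      # 5 pages: 20 + 20 + 20 + 20 + 15 rows
print(len(pages[-1]))  # 15
```

For very deep pages, keyset pagination (WHERE id > last_seen_id ... LIMIT n) avoids the cost of skipping offset rows, since OFFSET still reads and discards them.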
4. Optimize Joins: When joining tables, ensure that the join conditions
are well-defined and appropriate indexes are in place. Consider using
appropriate join types (INNER JOIN, LEFT JOIN, etc.) based on the
relationship between the tables and the desired output. Avoid
unnecessary or redundant joins that can result in excessive data
processing.
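The choice between join types can be illustrated with two tiny tables in sqlite3 (table contents invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Bala'), (3, 'Chen');
INSERT INTO orders VALUES (10, 1, 99.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# INNER JOIN keeps only customers with at least one matching order.
inner = conn.execute("""
    SELECT c.name, o.total FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
""").fetchall()

# LEFT JOIN keeps every customer, with NULL order columns where none match.
left = conn.execute("""
    SELECT c.name, o.total FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
""").fetchall()

print(len(inner))  # 3 matched rows
print(len(left))   # 4 rows: Chen appears with total = None
```

Picking the join type that matches the desired output avoids both silently dropped rows (a LEFT JOIN written as INNER) and extra NULL-padded rows the query then has to filter back out.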
5. Aggregate and Summarize Data: Instead of processing every
individual row, consider aggregating and summarizing the data using
GROUP BY, SUM, COUNT, and other aggregate functions. This helps
reduce the amount of data being processed and provides a more
concise view of the information.
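Aggregation with GROUP BY can be sketched in sqlite3 (the `sales` table is illustrative): five detail rows collapse into one summary row per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10.0), ("north", 20.0),
                  ("south", 5.0), ("south", 5.0), ("south", 15.0)])

# One summary row per region instead of every individual sale.
summary = conn.execute("""
    SELECT region, COUNT(*) AS n, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

print(summary)  # [('north', 2, 30.0), ('south', 3, 25.0)]
```

Doing the aggregation in the database means only the summary rows travel to the client, which matters far more when the detail table has millions of rows rather than five.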
6. Partitioning and Parallel Processing: Some database systems
support data partitioning, which involves splitting large tables into smaller,
more manageable pieces based on specific criteria (such as ranges or
hash values). Partitioning allows for parallel processing of data,
distributing the workload across multiple resources and improving query
performance.
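Partitioning syntax is database-specific (e.g. declarative range or hash partitioning in some systems), but the underlying idea can be sketched in plain Python: hash-partition the rows by key, then let independent workers each scan only their own partition. All names here are invented for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row set; in a real system these would live in partitioned tables.
rows = [{"id": i, "value": i % 7} for i in range(1000)]
NUM_PARTITIONS = 4

# Hash partitioning: assign each row to a bucket by hashing its key.
partitions = [[] for _ in range(NUM_PARTITIONS)]
for row in rows:
    partitions[hash(row["id"]) % NUM_PARTITIONS].append(row)

def partition_total(part):
    # Each worker scans only its own partition.
    return sum(r["value"] for r in part)

# Process the partitions in parallel and combine the partial results.
with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
    totals = list(pool.map(partition_total, partitions))

print(sum(totals))  # equals the total computed over all rows at once
```

A real database does the same split-scan-combine internally, so a query over a partitioned table can also prune partitions entirely when the WHERE clause excludes their key range.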
7. Consider Batch Processing: If applicable, consider performing
operations on the data in batches rather than processing the entire
dataset at once. This can be useful for tasks such as updates, deletions, or
inserts. Breaking the data into smaller batches can help manage resources
more effectively and allow for easier error handling and recovery.
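Batch processing can be sketched with sqlite3: insert 10,000 rows in fixed-size batches, committing one transaction per batch (the `logs` table and batch size are illustrative).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")

records = [(f"line {i}",) for i in range(10_000)]
BATCH_SIZE = 1_000

# One transaction per batch keeps memory and lock time bounded,
# and a failure rolls back only the current batch, not all the work.
for start in range(0, len(records), BATCH_SIZE):
    batch = records[start:start + BATCH_SIZE]
    with conn:  # commits the batch, or rolls it back on error
        conn.executemany("INSERT INTO logs (message) VALUES (?)", batch)

count = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(count)  # 10000
```

The same chunked loop applies to UPDATE and DELETE work: operating on bounded slices (e.g. by id range) keeps each transaction small and makes a retry after failure straightforward.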
8. Optimize Query Performance: Analyze and optimize your SQL
queries to ensure they are written efficiently. Use appropriate join
conditions, avoid unnecessary subqueries or redundant calculations, and
ensure that your queries are using the best execution plan available.
Regularly review and analyze query performance using database-specific
tools or EXPLAIN/EXPLAIN PLAN statements to identify areas for
improvement.
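As a concrete example of reading an execution plan, sqlite3 supports the EXPLAIN QUERY PLAN statement mentioned above (the `users`/`logins` tables and index are invented for this sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE logins (user_id INTEGER, ts TEXT);
CREATE INDEX idx_logins_user ON logins (user_id);
""")

query = """
    SELECT u.id, COUNT(l.user_id)
    FROM users u LEFT JOIN logins l ON l.user_id = u.id
    WHERE u.country = 'IN'
    GROUP BY u.id
"""

# EXPLAIN QUERY PLAN reports how SQLite intends to execute the query:
# which tables it scans and which indexes it uses for lookups.
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for d in details:
    print(d)
# Typical output shows a scan of users and an index lookup on logins.
```

Spotting an unexpected full scan in this output is the usual cue to add an index or restructure the query; other systems expose the same information via EXPLAIN or EXPLAIN PLAN.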
THANK YOU
