Take Assessment: Exercise 6

Please answer the following question(s).

If the assessment includes multiple-choice questions, click the "Submit
Answers" button when you have completed those questions.

Index Choice and Query Optimization

Background

In this exercise, you will gain hands-on experience with query optimization using indexes.
The data you will be working with comes from a simple real estate information system that
stores information about customers, lots, and the lots owned by each customer. The SQL
schema for these tables is:

Customer( customer_id, customer_first_name, customer_last_name )

Lot( lot_id, lot_description, lot_size, lot_district, lot_value, lot_street_address )

Customer_lot( lot_id, customer_id )

You must design the access methods for the database so that the best possible
performance is achieved under a variety of operating conditions. These operating conditions
are the different types of queries that will run on your system. The query types are
listed below.

1. Selecting the lot_id for all lots in a given size range. An example of such a query
would be select lot_id from lot where lot_size between 300 and 15000;
2. Selecting the lot_id for all lots in a given value range. An example of such a query
would be select lot_id from lot where lot_value between 3000 and 15000;
3. Selecting all of the information for a specific customer. An example of such a query
would be select * from customer where customer_id=12;
4. Inserting new customer or lot data. An example of such a query would be insert into
customer values (250001, 'Vince', 'Smith' );
5. Deleting a row of customer or lot data. An example of such a query would be delete
from customer where customer_id='250001';
6. Updating a row of customer or lot data. An example of such a query would be
update customer set customer_first_name='Vinny' where customer_id='249001';
7. Selecting the average lot size of all lots. An example of such a query would be
select avg(lot_size) from lot;
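
When any of the queries above is run under explain, PostgreSQL prints plan lines of the form "Seq Scan on patient (cost=0.00..173.07 rows=6406 width=70)", as in the output quoted later in this exercise. The following is a minimal Python sketch for pulling the numbers out of such a line so they can be tabulated. The function name parse_plan_line and the regular expression are illustrative assumptions, not part of the exercise or of PostgreSQL, and the cost figures are planner cost units, which this exercise treats as estimated disk accesses.

```python
import re

# Hypothetical helper, not part of PostgreSQL: matches the leading part of a
# plan line such as
#   Seq Scan on patient (cost=0.00..173.07 rows=6406 width=70)
PLAN_LINE = re.compile(
    r"^\s*(?P<node>.+?)\s+\(cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+)"
    r"\s+rows=(?P<rows>\d+)\s+width=(?P<width>\d+)\)"
)


def parse_plan_line(line: str) -> dict:
    """Extract the planner's estimates from one line of explain output."""
    match = PLAN_LINE.match(line)
    if match is None:
        raise ValueError(f"not a recognised plan line: {line!r}")
    return {
        "node": match.group("node"),                    # e.g. "Seq Scan on patient"
        "startup_cost": float(match.group("startup")),  # cost before the first row
        "total_cost": float(match.group("total")),      # cost to return all rows
        "rows": int(match.group("rows")),               # estimated rows returned
        "width": int(match.group("width")),             # estimated row width in bytes
    }


if __name__ == "__main__":
    sample = "Seq Scan on patient (cost=0.00..173.07 rows=6406 width=70)"
    print(parse_plan_line(sample))
```

The pattern is anchored only at the start of the line, so anything that follows the estimate on the same line is ignored rather than rejected.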

Your Tasks

1. Optimize the real estate tables above. Perform the following operations.
   1. Build and populate these tables using the instructions for setting up the real
      estate database.
   2. Analyze the storage characteristics of the tables when stored in the
      database. You are to submit the most accurate estimate of the number of
      tuples in each table and the number of disk pages used to store each
      table. Submit the code necessary to gather this information as well as
      the output of these queries. What do you calculate as the blocking factor for
      each table?
   3. Analyze the run-time characteristics of each query given above. Report both
      PostgreSQL's estimated total cost and the actual total cost, expressed in the
      number of disk accesses, as well as the amount of time that each query takes
      to run. Submit the code necessary for gathering this data.
   4. For each query above, suggest an index, if applicable, to improve
      performance. To answer this question completely, you must state which
      columns you are indexing and which type of index you will use. You must fully
      defend your choice with a complete explanation. If an index is not
      appropriate for a query, clearly state why. Note that you may only use
      indexes supported in PostgreSQL, e.g. hash or b-tree. You may also use
      clustering.
   5. Implement the indexes that you proposed in the previous step and analyze the
      run-time performance of the same queries that you ran in step 3. Submit a table
      showing the estimated and actual total cost, expressed in the number of disk
      page accesses, and the query run times after the indexes were added. Also
      compute the percentage increase or decrease in performance (based on run time)
      that results from indexing the tables. If no performance benefit was gained
      from an index, you must explain why. You may use the following table as a basis:

               |          Without Index          |           With Index            |
      Query    | Estimated  Actual     Actual    | Estimated  Actual     Actual    | Performance
      Number   | Disk       Disk       Run       | Disk       Disk       Run       | Improvement
               | Accesses   Accesses   Time      | Accesses   Accesses   Time      |
      ---------+---------------------------------+---------------------------------+------------
      1        |                                 |                                 |
      2        |                                 |                                 |
      3        |                                 |                                 |
      4        |                                 |                                 |
      5        |                                 |                                 |
      6        |                                 |                                 |
      7        |                                 |                                 |

   6. Given the data that you have now obtained, which queries do index
      structures slow down and which do they speed up? If query types 1, 2, and
      3 are common (occurring 75% of the time) and type 4 is uncommon
      (occurring 25% of the time), does the increase in performance for some
      queries outweigh the decrease for others? How would your opinion change
      if the ratio were closer to 50% for queries 1, 2, and 3 and 50% for type 4?
2. You are designing a database to store sensor records for a research study. For the
first year, the database will mostly be populated via insert statements gathered
from these sensors. After the first year, the database will mainly be queried for data
via select statements. You must decide whether to index the database initially or
wait until after the first year. From what you have witnessed, would you initially
index the table or wait until after the first year? Explain.
3. You are employed by a local hospital as one of a team of database professionals. A
colleague has implemented a table in PostgreSQL that stores information about patients.
You are asked to obtain all records for female patients, so you write the query
select * from patient where gender='f'; You notice that query
performance is poor. Your colleague who implemented the patient table is currently
away, and it is your responsibility to determine what is wrong. You first describe the
patient table:

You notice that the gender column is indexed, but you are still unsure why query
performance is poor. You decide to run explain on the above query.
The output generated by explain is:

hospital=# explain SELECT * FROM patient WHERE gender='f';

NOTICE: QUERY PLAN:

Seq Scan on patient (cost=0.00..173.07 rows=6406 width=70)

EXPLAIN

You now understand why the query performance is suffering. Even though there is
an index on gender, it appears as though it is not being used. You issue a query to
count the number of males and females in the table. You find that the distribution is
almost exactly 50% males and 50% females. Explain why the index is not being
used. Will clustering the patient_gender index help performance? Please explain.

4. Recall that PostgreSQL stores statistics about tables in the system table called
pg_class. The query planner accesses this table for every query. These statistics
may only be updated using the analyze command. If the analyze command is not
run often, the statistics in this table may not be accurate and the query planner
may make poor decisions which can degrade system performance. Another strategy
is for the query planner to generate these statistics for each query (including
selects, inserts, updates, and deletes). This approach would allow the query planner
to have the most up-to-date statistics possible. Why doesn't PostgreSQL do this?
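
The two calculations requested in task 1 (the blocking factor in step 2 and the percentage change in run time in step 5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the blocking factor is taken to be the average number of tuples per disk page, and the sample figures are made up rather than measured on the real estate database.

```python
def blocking_factor(tuples: float, pages: int) -> float:
    """Average number of tuples stored per disk page (assumed definition)."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return tuples / pages


def runtime_change_percent(before_ms: float, after_ms: float) -> float:
    """Percentage change in run time after indexing.

    Positive values mean the query ran faster with the index,
    negative values mean it ran slower.
    """
    if before_ms <= 0:
        raise ValueError("before_ms must be positive")
    return (before_ms - after_ms) / before_ms * 100.0


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    print(blocking_factor(tuples=250000, pages=2500))
    print(runtime_change_percent(before_ms=80.0, after_ms=20.0))
```

Using the run time without the index as the baseline keeps the sign convention simple: a query that slows down after indexing, such as an insert that must also maintain the index, shows up as a negative percentage.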

Submission

Submit answers to all of these questions in a file named indexes.txt.

To help yourself do your best on this assessment, consult this general list of grading
guidelines.

Output of \d patient for the patient table described in question 3:

hospital=# \d patient
          Table "patient"
    Column     |  Type   |  Modifiers
---------------+---------+-------------
 id            | integer | not null
 firstname     | text    | not null
 lastname      | text    | not null
 title         | text    |
 admissiondate | date    |
 address       | text    |
 gender        | char    | default 'f'
Indexes: patient_gender,
         patient_id,
         patient_firstname,
         patient_lastname
Primary key: patient_pkey

© Copyright 2004 iCarnegie, Inc. All rights reserved.
