Partition Table in STARS Concept and Evaluations
LICENSE
CC BY 4.0
13-10-2022 / 17-10-2022
CITATION
Tanaem, Penidas Fiodinggo; Tanaamah, Andeka Rocky; Boymau, Infraim Oktofianus (2022): Partition Table in
STARS: Concept and Evaluations. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.21324264.v1
DOI
10.36227/techrxiv.21324264.v1
Page 1 of 26 IEEE Internet Computing
TABLE PARTITION
study is that the combined approach (SPT + VP) outperforms advanced systems based on in-memory processing engines in a distributed environment.⁴ Furthermore, Hassan and Bansal also proposed a relational partitioning scheme for RDF data called Property Table Partitioning (PTP), which further partitions the existing Property Table into several tables based on different properties to minimize data input and join operations, together with a system called S3QLRDF, built on top of Spark, that executes SPARQL queries via the PTP scheme. In that study, an experimental evaluation of preprocessing costs and query performance was carried out using the Lehigh University Benchmark (LUBM) and the Waterloo SPARQL Diversity Test Suite (WatDiv) datasets of up to 1.4 billion triples. The results show that S3QLRDF outperforms sophisticated distributed RDF management systems.⁵

Subsequent research by Salgova and Matiasko compares various partitioning techniques that can be implemented in a database with regard to access efficiency. The techniques used are Hash, Range, and List partitioning. The experiment was carried out using Oracle 11g. The results indicate that accessing partitioned data can significantly reduce the access time to the database. Range partitioning provides significantly improved access times when fewer than 30 of the 34 partitions are accessed. With access to 30 or more partitions, partitionless data storage proves to be more appropriate. With list partitioning, there is a more significant improvement in access time, and this improvement persists even when accessing some or all of the partitions. In conclusion, the results show that partitioning can significantly reduce data access time.⁶

On the other hand, STARS is a system used to carry out management processes in the area of student affairs at SWCU.⁷ Several subsystems continue to be developed in STARS, one of which is the Student Activity Credit (KKM). The KKM system in STARS was developed to recap each student activity. With the large number of students at SWCU, ranging from 15,000 to 16,000, and with each student having different activities, the performance of STARS is greatly affected. Thus, a concept is needed that can be implemented in STARS to produce the information needed by users efficiently.

Referring to the description above, this research will design and evaluate a partition table, which will then be applied to the SWCU STARS database. The result to be achieved is a partition concept for the STARS system, together with an evaluation of that concept.

TABLE PARTITION
Table partitioning is a technique that divides the data in a table into smaller tables that can be managed independently.¹,²,⁴,⁸⁻¹⁰ It can also be interpreted as a technique that maps or distributes data from a main table into smaller tables to increase the efficiency of the database. This technique also aims to reduce the number of physical reads on a database when a query is executed.¹ For example, the PostgreSQL database offers three types of partitions,¹¹ namely:

➔ Range: The table is partitioned by ranges of key values, for example by date, year, or age.

➔ List: The table is partitioned by explicitly listing which key values appear in each partition.

➔ Hash: The table is partitioned by specifying a modulus and a remainder for each partition. Each partition holds the rows for which the hash value of the partition key, divided by the specified modulus, produces the specified remainder.

Meanwhile, in terms of relations, there are two types of partitions, namely:

➔ Horizontal: placing different rows into different tables.²,⁹,¹²⁻¹⁴

➔ Vertical: creating a table with fewer columns and using an additional table to store the remaining columns.⁴,¹³

This study focuses on the use of vertical partitions and list partitions, using PostgreSQL version 14.

METHOD
This research proceeds in stages: data collection, partition design, technical partition, testing, and implementation. Data collection gathers the data that will be used for partitioning. Based on the data collected, the partition design is carried out. From the design, the partition is then implemented in the form agreed upon in the design results.
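To make the three PostgreSQL partition types from the Table Partition section concrete, the following Python sketch routes a key to a partition by list, by range, and by hash. It is an illustration only: the function names and example data are hypothetical, the range bounds are treated as inclusive at the lower end and exclusive at the upper end, and `zlib.crc32` merely stands in for PostgreSQL's internal hash functions.

```python
import zlib
from datetime import date


def list_partition(key, value_lists):
    """Return the partition whose explicit value list contains key."""
    for name, values in value_lists.items():
        if key in values:
            return name
    raise KeyError(f"no list partition accepts {key!r}")


def range_partition(key, bounds):
    """Return the partition whose range [low, high) contains key."""
    for name, (low, high) in bounds.items():
        if low <= key < high:
            return name
    raise KeyError(f"no range partition accepts {key!r}")


def hash_partition(key, modulus):
    """Return the remainder that selects the hash partition for key.

    zlib.crc32 is only a stand-in; PostgreSQL uses its own hash functions.
    """
    return zlib.crc32(str(key).encode("utf-8")) % modulus


# Hypothetical example data, loosely modelled on the paper's table names.
LISTS = {"participant_partition_s1": {"s1"}, "participant_partition_s2": {"s2"}}
RANGES = {
    "activity_2021": (date(2021, 1, 1), date(2022, 1, 1)),
    "activity_2022": (date(2022, 1, 1), date(2023, 1, 1)),
}

print(list_partition("s2", LISTS))                  # participant_partition_s2
print(range_partition(date(2022, 10, 13), RANGES))  # activity_2022
print(hash_partition("s1", 4) in range(4))          # True
```

List partitioning is the variant that matches a design with one partition per key value, since every key names its partition explicitly.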
The next stage is testing, to find out the efficiency of the developed partition. The last stage is implementation.

FIGURE 1. Research methods.

DISCUSSION

Design
To simulate the partition design, two data sets are used as samples, namely participant data (s) and participation data (a), each represented by a master table: the participant master table and the participation master table. The partitions are created as tables of the horizontal type and the list type. The initial assumption is that the data of each student or participant is partitioned (Table 1), because students may have different or identical activity data. So, if there are five students, five participant partition tables are created. The participant partitions are divided by the designated primary key, in this case the student identification number (nim). The same applies to the participation partitions, where the primary key used to create each participation partition table is the activity id. With this concept, the number of partitions grows as “s” and “a” increase. The notation used is as follows:

\[ s = \{s_1, s_2, s_3, s_4, s_5, \ldots, s_N\} \tag{1} \]
\[ s = \{s_n\}_{n=1}^{N} \tag{2} \]
\[ a = \{a_1, a_2, a_3, a_4, a_5, \ldots, a_N\} \tag{3} \]
\[ a = \{a_n\}_{n=1}^{N} \tag{4} \]

TABLE 1. Partition matrix table.
s\a                  a1   a2   a3   a4   a5   a6   Total participation
s1                   1    -    -    1    -    1    3
s2                   1    -    1    -    -    -    2
s3                   1    1    -    1    -    1    4
s4                   1    -    1    -    -    -    2
s5                   1    -    -    -    1    1    3
Total participants   5    1    2    2    1    3    14

Based on Table 1, the data to be partitioned from the participant master table can be denoted as follows: master_participant = {a1, a4, a6, a1, a3, a1, a2, a4, a6, a1, a3, a1, a5, a6}, that is, 14 student participations across all activities. The participant partitions that can be formed are then: p(s1) = {a1, a4, a6}, p(s2) = {a1, a3}, p(s3) = {a1, a2, a4, a6}, p(s4) = {a1, a3}, and p(s5) = {a1, a5, a6}.

Meanwhile, the data from the participation master table is denoted as follows: master_participation = {a1, a1, a1, a1, a1, a2, a3, a3, a4, a4, a5, a6, a6, a6}, again 14 student participations across all activities. The participation partitions that can be formed are then: p(a1) = {s1, s2, s3, s4, s5}, p(a2) = {s3}, p(a3) = {s2, s4}, p(a4) = {s1, s3}, p(a5) = {s5}, and p(a6) = {s1, s3, s5}.

With a partition concept like this, clients can more easily access the same data from different partitions. For example, when a student wants to retrieve data on his or her participation in activities, the student (s1) is directed to the p(s1) partition. On the other hand, when the admin accesses the participant data of an activity (a1), the admin is directed to the p(a1) partition. Another goal is to make it easier for the system to generate reports.

Technical Partition
Based on the design, the technical stage adapts the development accordingly. The technical partition covers how to generate partitions, delete partitions, and implement CRUD functions in STARS, where each process is carried out at the backend layer of STARS and not at the database layer. This step was taken to reduce the workload of the database.

Every function, such as generate partition, delete partition, create, update, and delete, is executed using TRANSACTION COMMIT or TRANSACTION ROLLBACK. The exception is the SELECT function, because SELECT only retrieves data without making any changes in the database. COMMIT is used to ensure that every SQL statement has run properly, while ROLLBACK
is used to detect whether an error occurred while executing a statement, in which case the database is returned to its previous state.

When executing a create operation, the system first executes the creation of the participant partition table (“CREATE TABLE IF NOT EXISTS participant_partition_s1 PARTITION OF participant FOR VALUES IN ('s1');”) and of the participation partition table (“CREATE TABLE IF NOT EXISTS participation_partition_a1 PARTITION OF participation FOR VALUES IN ('a1');”). If the table in question is not found, it is created in the database according to the specified primary key value; if it is found, the system proceeds directly to the create function. When the create function is executed, the system executes two insert statements, namely an insert into the partition table participant_partition_s1 and an insert into the partition table participation_partition_a1. If every statement is executed successfully, the next action is COMMIT. Otherwise, ROLLBACK is executed. The same process is applied to the update, delete, and drop partition operations.

Next is the read function, which retrieves data from the database using the SELECT statement. The SELECT is addressed specifically to the partition table in question (“SELECT * FROM participant_partition_s1 ORDER BY id ASC, nim ASC;”). This is done to reduce processing time in the database compared to fetching directly from the master table (“SELECT * FROM participant ORDER BY id ASC, nim ASC;”), because the master table holds more data than a single partition table.

Testing
The testing process is carried out using a computer with the following specifications:
OS: Ubuntu 20.04.5 LTS;
Memory: 15.5 GiB;
Processor: AMD® Ryzen 7 5800x 8-core processor × 16;
Database: PostgreSQL 14.

During testing, several configurations or adjustments are used to improve performance on both the partitioned and non-partitioned tables. The first is indexing using the PostgreSQL B-tree index, where each table (partitioned and non-partitioned) is indexed on each of its five columns, namely id, nim, kegiatan, catatan, and ts. The advantage of indexing is improved database I/O performance. The second is AUTO VACUUM. AUTO VACUUM is used to remove the fragmented space in tables and indexes that is generated when transactions are executed successfully. Data should be vacuumed regularly to ensure the health and better performance of the database. The last one is the shared_buffers = 512MB configuration. shared_buffers determines the amount of memory that PostgreSQL can use for caching. This caching mechanism stores the contents of tables and indexes in memory.¹¹

Several scenarios are used during testing. First, both the partitioned and non-partitioned tables hold 25 million rows as sample data. Testing is applied to two tables, namely the x_participant table (non-partitioned table) and the participant_partition_$1 table (partitioned table). Second, testing begins with insert, followed by update and delete, and ends with select. Inserts are performed 1k times within a time limit of 100 seconds, so that 10 inserts are executed in the database every second. The same configuration applies to update and delete. Select is slightly different: 10k selects are executed in 100 seconds, which means 100 selects every second. Last, the insert, update, delete, and select processes use only standard statements, combined with WHERE clauses for update, delete, and select, and nothing else (Table 2). These scenarios are executed using a tool, namely Apache JMeter.¹⁵

TABLE 2. Test statements.
Partition   Query
N           UPDATE x_participant SET catatan=$1 WHERE nim=$2 and kegiatan = $3
Y           UPDATE participant_partition_$1 SET catatan=$2 WHERE nim=$3 and kegiatan = $4
N           DELETE FROM x_participant WHERE nim=$1 and kegiatan = $2
Y           DELETE FROM participant_partition_$1 WHERE nim=$2 and kegiatan = $3
N           SELECT * FROM x_participant WHERE nim=$1 and kegiatan = $2
Y           SELECT * FROM participant_partition_$1 WHERE nim=$2 and kegiatan = $3
N           INSERT INTO x_participant(nim, kegiatan, catatan, ts) VALUES ($1, $2, $3, $4);
Y           INSERT INTO x_participant_partition_$1(nim, kegiatan, catatan, ts) VALUES ($2, $3, $4, $5);

FIGURE 2. Testing Results.

The whole experiment was carried out by ensuring that every statement was executed successfully without any failures (Figure 2c), with a hit percentage of 100%. The results obtained from this test are as follows. The first is the execution of the insert statement, seen from the average (Figure 2a), total (Figure 2b), standard deviation (Figure 2d), minimum (Figure 2e), and maximum (Figure 2f) execution time. The insert statement shows a significant difference between the two tables, around 39.71%: the average is 0.32 ms for the partitioned table and 0.19 ms for the non-partitioned table. Meanwhile, executing all insert statements takes 325.28 ms on the partitioned table and 196.10 ms on the non-partitioned table. The standard deviation of the execution time is 0.05 ms for the partitioned table and 0.03 ms for the non-partitioned table. The fastest insert on the partitioned table takes 0.15 ms and the slowest 0.67 ms, while on the non-partitioned table the fastest takes 0.08 ms and the slowest 0.37 ms. From these figures, insert on the partitioned table is slower than on the non-partitioned table. This condition arises because, as data is stored continuously, Postgres has to keep looking up the key among the existing partition tables. So if there are 16K partition tables, there is also a stack of 16K partition keys, and the time taken depends on how often the partitions have to be switched when the insert statement is executed on the partitioned table.

For update, delete, and select, the partitioned table is much faster than the non-partitioned table in terms of average execution time. Update takes 0.16 ms on the partitioned table and 0.39 ms on the non-partitioned table, so the partitioned table is 58.86% faster. Delete takes 0.03 ms on the partitioned table and 0.39 ms on the non-partitioned table, so the partitioned table is 90.18% faster. The last one is select. Select differs slightly from the other three statements in that it has more executions from which to determine the success rate. Nevertheless, the select process on both tables was executed without any failure, as evidenced by the hit percentage for select (Figure 2c). As for the average execution time, the partitioned table is far ahead of the non-partitioned table, by around 86.53%: select is executed in 0.02 ms on the partitioned table and 0.26 ms on the non-partitioned table.

Based on the discussion above, the hypothesis that can be drawn is that for the three statements update, delete, and select, the partitioned table works optimally. This contrasts with insert, where the partitioned table takes longer than the non-partitioned table because of the large number of key stacks, which depends on the number of partitions built.

Implementation
The results of this database development are implemented on a server with the following specifications:
OS: Ubuntu Server 20.04 LTS;
Processor: 4 cores;
RAM: 4 GB;
Database: PostgreSQL 14.

In the two years since the partition was implemented, no problems have been found that affect the sustainability of the STARS system from the database side. The impact of this implementation is an improved response time from the system to the client.
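As a cross-check of the Design section, the partition sets p(s) and p(a) can be derived mechanically from the participation pairs in Table 1. The following Python sketch is illustrative only; the names `PAIRS` and `build_partitions` are hypothetical and are not part of STARS.

```python
from collections import defaultdict

# (student, activity) pairs read off the cells marked 1 in Table 1.
PAIRS = [
    ("s1", "a1"), ("s1", "a4"), ("s1", "a6"),
    ("s2", "a1"), ("s2", "a3"),
    ("s3", "a1"), ("s3", "a2"), ("s3", "a4"), ("s3", "a6"),
    ("s4", "a1"), ("s4", "a3"),
    ("s5", "a1"), ("s5", "a5"), ("s5", "a6"),
]


def build_partitions(pairs):
    """Group the pairs into participant partitions p(s) and participation partitions p(a)."""
    by_student = defaultdict(list)
    by_activity = defaultdict(list)
    for student, activity in pairs:
        by_student[student].append(activity)
        by_activity[activity].append(student)
    return dict(by_student), dict(by_activity)


p_s, p_a = build_partitions(PAIRS)
print(p_s["s3"])            # ['a1', 'a2', 'a4', 'a6']
print(p_a["a6"])            # ['s1', 's3', 's5']
print(len(p_s) + len(p_a))  # 11 partition tables for 5 students and 6 activities
```

Each of the 14 participations is stored once per view, and the number of partition tables is the number of students plus the number of activities.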
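The backend-side flow described under Technical Partition (create both partitions if missing, insert into both, then COMMIT or ROLLBACK) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the STARS implementation: `create_participation` and `RecordingConnection` are hypothetical names, the connection is assumed to follow the Python DB-API, the `%s` placeholder style is an assumption about the driver, and the participation partition is assumed to share the columns nim, kegiatan, catatan, and ts listed in Table 2.

```python
import re

SAFE_KEY = re.compile(r"^[A-Za-z0-9_]+$")


def _check(key):
    """Reject keys that are unsafe to splice into table names or literals."""
    if not SAFE_KEY.match(key):
        raise ValueError(f"unsafe partition key: {key!r}")
    return key


def create_participation(conn, nim, kegiatan, catatan, ts):
    """Create both partitions if missing, insert into both, then COMMIT or ROLLBACK."""
    nim, kegiatan = _check(nim), _check(kegiatan)
    ddl = [
        f"CREATE TABLE IF NOT EXISTS participant_partition_{nim} "
        f"PARTITION OF participant FOR VALUES IN ('{nim}');",
        f"CREATE TABLE IF NOT EXISTS participation_partition_{kegiatan} "
        f"PARTITION OF participation FOR VALUES IN ('{kegiatan}');",
    ]
    # Assumed column list, taken from the INSERT statements in Table 2.
    insert = "INSERT INTO {table}(nim, kegiatan, catatan, ts) VALUES (%s, %s, %s, %s);"
    row = (nim, kegiatan, catatan, ts)
    cur = conn.cursor()
    try:
        for statement in ddl:
            cur.execute(statement)
        cur.execute(insert.format(table=f"participant_partition_{nim}"), row)
        cur.execute(insert.format(table=f"participation_partition_{kegiatan}"), row)
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # undo every statement of this unit of work
        return False
    finally:
        cur.close()


class RecordingConnection:
    """Hypothetical stand-in for a DB-API connection; it records calls only."""

    def __init__(self, fail_on=None):
        self.log = []
        self.fail_on = fail_on

    def cursor(self):
        return self

    def execute(self, sql, params=None):
        if self.fail_on and self.fail_on in sql:
            raise RuntimeError("simulated failure")
        self.log.append(sql)

    def close(self):
        pass

    def commit(self):
        self.log.append("COMMIT")

    def rollback(self):
        self.log.append("ROLLBACK")


demo = RecordingConnection()
print(create_participation(demo, "s1", "a1", "note", "2022-10-13"))  # True
print(demo.log[-1])  # COMMIT
```

The key is validated before it is spliced into the statement because table names and the `FOR VALUES IN` literal of a DDL statement cannot be passed as bind parameters.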
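The percentages quoted in the testing results can be read as the gap between the slower and the faster time, relative to the slower one. This reading is an interpretation rather than a formula stated in the paper, and `relative_difference` is a hypothetical helper name. Under this reading, the total insert times of 325.28 ms and 196.10 ms reproduce the reported 39.71%.

```python
def relative_difference(slower, faster):
    """Return the gap between two times as a fraction of the slower time."""
    if slower <= 0:
        raise ValueError("slower time must be positive")
    return (slower - faster) / slower


# Total insert time: partitioned table (slower) vs. non-partitioned table (faster).
insert_gap = relative_difference(325.28, 196.10)
print(f"{insert_gap:.2%}")  # 39.71%
```

Applying the same function to the rounded averages (for example 0.39 ms and 0.16 ms for update) gives values close to, but not identical with, the reported percentages, presumably because the reported figures were computed from unrounded measurements.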
CONCLUSION
The conclusion that can be drawn from this research is that applying table partitioning in the database can affect the performance of a system, in this case STARS. This can be seen from the average time it takes to execute a statement, the total time it takes to execute all statements, and so on. One important note is that a problem is encountered when executing the insert statement, namely that its execution is slower on the partitioned table than on the non-partitioned table. However, this is compensated by the execution time of the update, delete, and select statements, which are much faster on the partitioned table than on the non-partitioned table. Finally, this research produces a design, the concept of the partition table used in STARS, which is supported by evaluations of the execution times obtained.

Based on the results of this study, further research is needed on:

➔ Improving the execution time of inserts on the partitioned table.

➔ Partition tables for real-time data consumption in STARS.

ACKNOWLEDGMENTS
The authors would like to thank the SWCU student service agency bureau for giving permission and support to conduct this research.

REFERENCES
1. A. Viloria, G. C. Acuña, D. J. A. Franco, H. Hernández-Palma, J. P. Fuentes, and E. P. Rambal, “Integration of data mining techniques to postgresQL database manager system,” Procedia Comput. Sci., vol. 155, no. 2018, pp. 575–580, 2019, doi: 10.1016/j.procs.2019.08.080.
2. K. S. Maabreh, “Optimizing Database Query Performance Using Table Partitioning Techniques,” ACIT 2018 - 19th Int. Arab Conf. Inf. Technol., pp. 1–4, 2019, doi: 10.1109/ACIT.2018.8672584.
3. N. Tabassam and R. Obermaisser, “Minimizing the Make Span of Diagnostic Multi-Query Graphs Using Query Aware Partitioning in Embedded Real-Time Systems,” Proc. - 2018 Int. Conf. Promis. Electron. Technol. ICPET 2018, no. 2, pp. 1–7, 2018, doi: 10.1109/ICPET.2018.00007.
4. M. Hassan and S. K. Bansal, “Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark,” Proc. - 13th IEEE Int. Conf. Semant. Comput. ICSC 2019, pp. 24–31, 2019, doi: 10.1109/ICOSC.2019.8665614.
5. M. Hassan and S. K. Bansal, “S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data,” Proc. - 2020 IEEE Int. Conf. Smart Data Serv. SMDS 2020, pp. 133–140, 2020, doi: 10.1109/SMDS49396.2020.00023.
6. V. Salgova and K. Matiasko, “Reducing Data Access Time using Table Partitioning Techniques,” ICETA 2020 - 18th IEEE Int. Conf. Emerg. eLearning Technol. Appl. Proc., pp. 564–569, 2020, doi: 10.1109/ICETA51985.2020.9379231.
7. P. F. Tanaem, A. F. Wijaya, A. D. Manuputty, and G. N. Huwae, “Penerapan RESTFul Web Service Pada Disain Arsitektur Sistem Informasi Pada Perguruan Tinggi (Studi Kasus: STARS UKSW),” JASIEK (Jurnal Apl. Sains, Informasi, Elektron. dan Komputer), vol. 2(1), no. 1, pp. 11–20, 2020.
8. H. Yin, S. Yang, H. Zhao, and Z. Chen, “Partial query optimization techniques for partitioned tables,” Proc. - 2012 9th Int. Conf. Fuzzy Syst. Knowl. Discov. FSKD 2012, no. Fskd, pp. 792–797, 2012, doi: 10.1109/FSKD.2012.6234270.
9. L. Qu et al., “Are current benchmarks adequate to evaluate distributed transactional databases?,” BenchCouncil Trans. Benchmarks, Stand. Eval., vol. 2, no. 1, p. 100031, 2022, doi: 10.1016/j.tbench.2022.100031.
10. F. C. Daeng Bani, Suharjito, Diana, and A. S. Girsang, “Implementation of Database Massively Parallel Processing System to Build Scalability on Process Data Warehouse,” Procedia Comput. Sci., vol. 135, pp. 68–79, 2018, doi: 10.1016/j.procs.2018.08.151.
11. Postgresql, “Table Partitioning,” Postgresql, 2022. https://www.postgresql.org/docs/current/ddl-partitioning.html (accessed Sep. 16, 2022).
12. D. Vashi, H. B. Bhadka, K. Patel, and S. Garg, “An Efficient Hybrid Approach of Attribute Based Encryption for Privacy Preserving Through Horizontally Partitioned Data,” Procedia Comput. Sci., vol. 167, no. 2019, pp. 2437–2444, 2020, doi: 10.1016/j.procs.2020.03.296.
13. A. Dahal, “A Clustering Based Vertical Fragmentation and Allocation of a Distributed Database,” 2019 Artif. Intell. Transform. Bus. Soc., vol. 1, pp. 1–5.
14. M. Abhishek Nair, A. Dewangan, and A. Geetha Mary, “Efficient Retrieval of Data from Cloud Databases using Hash Partitioned Buckets,” 2019 Innov. Power Adv. Comput. Technol. i-PACT 2019, pp. 1–7, 2019, doi: 10.1109/i-PACT44901.2019.8960047.
15. A. JMeter, “Apache JMeter,” Apache JMeter, 2022. https://jmeter.apache.org/ (accessed Oct. 02, 2022).

PENIDAS FIODINGGO TANAEM is a STARS developer at Satya Wacana Christian University, Salatiga, Central Java, 50711, Indonesia. His research interests include RESTful web services, WebSockets, and database development. Tanaem received his graduate degree in restful web service for transaction recording system from Satya Wacana Christian University. He is a member of the Information Technology Governance and Management Study Center. Contact him at [email protected].

ANDEKA ROCKY TANAAMAH is project manager and vice chancellor for student affairs at Satya Wacana Christian University, Salatiga, Central Java, 50711, Indonesia. His research interests include IT governance, decision support systems, and knowledge management. Tanaamah received his graduate degree in development studies from Satya Wacana Christian University. He is a member of the Information Technology Governance and Management Study Center. Contact him at [email protected].
INFRAIM OKTOFIANUS BOYMAU is a graduate student at Satya Wacana Christian University, Salatiga, Central Java, 50711, Indonesia. Contact him at [email protected].