Scalable data storage for PV monitoring systems

1st Anastasios Kladas, 2nd Bert Herteleer, 3rd Jan Cappelle
Faculty of Technology Engineering, KU Leuven, Ghent, Belgium

arXiv:2209.08879v1 [cs.DB] 19 Sep 2022

Abstract—Efficient PV research, which includes prolonged data monitoring from multiple experiments with different characteristics, requires a scalable supporting system to handle all of the collected information. This paper presents the development of a relational database for hosting all the information necessary for data modeling, comparative analysis and O&M systems. The Ramer-Douglas-Peucker algorithm and Timescale DB compression are used to decrease the size of the time-series data and increase the performance of the queries. A decision-making algorithm is presented for selecting the optimal inputs to the Ramer-Douglas-Peucker algorithm, to ensure maximum disk space savings without losing any of the necessary information. Furthermore, alternative ways of implementing the same database are provided.

Index Terms—PV monitoring, database, Ramer-Douglas-Peucker algorithm, SQL

I. INTRODUCTION

The urgent need for clean energy production has increased PV research [1]. Research institutes, organizations, or companies often need to monitor several measurement sites with different characteristics (PV modules, climate measurements, etc.) and draw conclusions through comparative analysis. In addition, as data science has become involved in PV research [2], [3], there is a need to efficiently store high-resolution historical data for training data-driven models.

A common approach presented in the PV literature is to store data in log files locally on a dedicated PC [4]–[7]. Despite its simplicity, this method requires considerable manual effort from the user and a lot of processing power (loading all data before filtering, etc.), which increases the hardware requirements, especially for analyses over long periods. Implementations of relational databases can also be found in the literature [8]–[10]. The issue with those approaches is that their design is motivated by a single research objective. Therefore, they need significant modifications on further additions (different experimental characteristics, additional measurements, adjustable sampling frequency, etc.).

Data are usually stored in tables or datasheets containing one column for each of the measurements. Despite its query simplicity, this approach requires a fixed sampling resolution for all readings, even if the rate of change between the data points differs among them. Thus, extra disk space is needed to store information that could be avoided. Taking the above into consideration, an accessible centralized database approach would be beneficial.

The current work presents a scalable and powerful relational database structure for PV research. The relations between the tables follow the physical connections between modules and equipment. The objective is to store high-definition historical and current measurement data while using the minimum amount of disk space, to be fast, and to need no major structural changes on additions. This database is optimized for simulations, comparative analysis, modeling, and monitoring purposes, but it can be used in a broader range of applications.

II. DATABASE

The whole database is developed using PostgreSQL 13. It runs on an HP workstation and is accessible from the local network or remotely over a VPN connection.

A. Structure

The database starts with the operators of every site and ends with the measurements of every sensor (fig. 1, right to left). In other words, every measurement is connected with all information regarding its related hardware and the people in charge. A common approach to storing measurement data for IoT applications is the use of NoSQL time-series databases, as they tend to be faster and lighter [10], [11]. In the approach presented here, the measurements table, which hosts all measurements, is implemented as a time-series table using the PostgreSQL extension Timescale DB. In this way, the advantages of a relational database (robustness and reliability) are combined with the higher performance of NoSQL, while the data are still filtered and manipulated using the same SQL queries.

The graphical representation of the presented schema is displayed in fig. 1 and the purpose of every table is described in table I.

III. COMPRESSION

Dedicating one row to each high-resolution measurement (e.g. one per second) results in a table of tremendous size. To address this issue, a two-step compression is applied to the measurements table.
Fig. 1. Graph representation of the examined schema generated by DBeaver software.
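As a rough sketch of how the measurements table of fig. 1 might be declared as a Timescale DB time-series table, the statements below create a hypertable and enable the native compression discussed in Section III-B. The table and column names (measurements, sensors, time, sensor_id, value) are assumptions for illustration, since the paper does not list its columns, and the one-day intervals are only one possible reading of the "one day" compression policy described there.

```python
"""Sketch: a measurements hypertable with native compression (names assumed)."""

SETUP_STATEMENTS = [
    # Plain relational table: one row per retained data point.
    """
    CREATE TABLE IF NOT EXISTS measurements (
        time      TIMESTAMPTZ      NOT NULL,
        sensor_id INTEGER          NOT NULL REFERENCES sensors (id),
        value     DOUBLE PRECISION NOT NULL
    )
    """,
    # Turn it into a time-partitioned hypertable (assumed: one chunk per day).
    """
    SELECT create_hypertable(
        'measurements', 'time',
        chunk_time_interval => INTERVAL '1 day',
        if_not_exists => TRUE
    )
    """,
    # Enable native compression, grouping the values of each sensor together.
    """
    ALTER TABLE measurements SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'sensor_id'
    )
    """,
    # Compress chunks automatically once they are older than one day.
    "SELECT add_compression_policy('measurements', INTERVAL '1 day')",
]


def apply_setup(connection):
    """Run the statements on any DB-API 2.0 connection (for example psycopg2)."""
    cursor = connection.cursor()
    for statement in SETUP_STATEMENTS:
        cursor.execute(statement)
    connection.commit()
```

Because the result is still an ordinary PostgreSQL table from the point of view of a query, the joins to the sensor, site and operator tables keep working unchanged.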

A. Ramer-Douglas-Peucker algorithm

The purpose of this step is to compress the data in such a way that they can be retrieved via linear interpolation whenever they are needed. For that reason, the Ramer-Douglas-Peucker algorithm [12], [13] is used. This algorithm decreases the number of data points of a polyline while preserving its characteristics/shape. Based on the Euclidean distance between the points and a tolerance denoted by the Greek letter epsilon (ε), it discards the data points that lie on (or very close to) a line, keeping the first and last points of that line. The degree of compression increases with the epsilon value.

The data can be transmitted to the server in several ways (MQTT, HTTP, Cloud). If the data are delivered as a file, the algorithm is applied to the entire file as soon as it arrives.

Otherwise, for live-streamed data, all data are stored in temporary tables in a file-based local database (SQLite). In the current system, the sampling period of the live-streamed data is one second. An algorithm runs periodically (every five minutes on the current system) to move the data from the file-based database to the main one, after compressing them in a separate thread (so as not to interrupt the data collection procedure). The algorithm runs separately for every sensor. More specifically, when the algorithm is executed, it stores the first and last timestamps from the related table (in the file-based database) for the examined sensor and loads all the data between them. It resamples them to one-second periods and performs linear interpolation to fill any missing data points between the samples. Then the Ramer-Douglas-Peucker algorithm is applied to reduce the number of data points. Afterward, the new compressed dataset is appended to the database. Meanwhile, the timestamp of the execution is stored to calculate the time until the next one. Finally, the data between the first and last timestamps are deleted from the source temporary table.

Epsilon is determined by taking into consideration the value range of the measurement and the fluctuation of the respective sensors. A practical example follows.

In this example, the decision-making procedure is applied to PAR light measurements. The data were acquired from Apogee Model SP-214 PAR sensors on the rooftop of the tallest building of KU Leuven, Ghent. A high-fluctuation day (upper left, fig. 2) was selected from an old database with a one-second sampling period, based on the maximum daily average of the standard deviation over two-hour intervals. To detect the noise of the sensor, its readings are observed during steady-state conditions. As the examined sensor measures irradiance, the observation takes place at night, when zero values are expected. As can be seen in fig. 2 (upper right plot), the readings fluctuate between one and six. Therefore, an epsilon equal to five is expected to result in minimal or zero loss of exploitable information. As several epsilon values have
TABLE I
BRIEF EXPLANATION OF EVERY TABLE ROW AND ITS DEPENDENCIES.

Table name | Purpose | Connected with
Operators | Information about the people in charge of every experimental site. | -
Sites | Site information: latitude, longitude, elevation, etc. | Operators
Hardware | Serial numbers of every device/module that is used. | -
Trackers | The PV trackers and their characteristics. | Sites and hardware
Inverter datasheets | Inverter characteristics. | -
Inverters | The inverters used. | Sites, hardware and inverter datasheets
Batteries | Information about the batteries installed. | Inverters, sites and hardware
PV datasheets | PV specifications. | -
Modules | PV modules used and installation characteristics (orientation(s), tilt angle(s), etc.). | PV datasheets, inverters, sites, trackers and hardware
Electricity sensors | All electricity-related sensors (voltage, current, power). | Batteries, inverters, sites (for easier queries), modules and hardware
PV temperature sensors | Information about the PV temperature sensors. | Modules, sites and hardware
Irradiance sensors | Information about the irradiance sensors' installation characteristics (tilt, orientation). | Sites and hardware
Ambient temperature sensors | Information about the ambient temperature sensors. | Sites and hardware
Wind speed sensors | Information about the wind speed sensors. | Sites and hardware
Wind direction sensors | Information about the wind direction sensors. | Sites and hardware
Climate sensors | Information about the less frequent climate sensors (PAR, humidity, etc.). | Sites and hardware
Sensors | Table where all IDs of the sensor tables are collected and given a new ID for the measurements table (as "Sensor ID"). Trigger functions have been built on the above sensor tables to insert their new IDs into this table automatically on every new sensor entry. | All sensor tables
Measurements | Table where all measurements are stored. | Sensors
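The Sensors row of table I can be illustrated with a minimal sketch in which inserting into a type-specific sensor table automatically registers a new "Sensor ID". SQLite stands in for PostgreSQL here so that the example runs without a server (PostgreSQL trigger functions use a different syntax), and every table and column name below is an assumption, not the paper's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE irradiance_sensors (
    id INTEGER PRIMARY KEY,
    tilt_deg REAL,
    orientation_deg REAL
);
CREATE TABLE wind_speed_sensors (
    id INTEGER PRIMARY KEY,
    height_m REAL
);
-- Central table: one "Sensor ID" per row of any type-specific table.
CREATE TABLE sensors (
    id INTEGER PRIMARY KEY,
    sensor_table TEXT NOT NULL,
    table_row_id INTEGER NOT NULL,
    UNIQUE (sensor_table, table_row_id)
);
CREATE TABLE measurements (
    time TEXT NOT NULL,
    sensor_id INTEGER NOT NULL REFERENCES sensors (id),
    value REAL NOT NULL
);
-- One trigger per sensor table registers every new sensor centrally.
CREATE TRIGGER irradiance_sensor_added AFTER INSERT ON irradiance_sensors
BEGIN
    INSERT INTO sensors (sensor_table, table_row_id)
    VALUES ('irradiance_sensors', NEW.id);
END;
CREATE TRIGGER wind_speed_sensor_added AFTER INSERT ON wind_speed_sensors
BEGIN
    INSERT INTO sensors (sensor_table, table_row_id)
    VALUES ('wind_speed_sensors', NEW.id);
END;
"""


def build_demo_db():
    """Create an in-memory database and add one sensor of each type."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO irradiance_sensors (tilt_deg, orientation_deg) VALUES (30, 180)"
    )
    conn.execute("INSERT INTO wind_speed_sensors (height_m) VALUES (10)")
    return conn


def registered_sensors(conn):
    """Return (sensor_id, source_table, id_in_source_table) for every sensor."""
    query = "SELECT id, sensor_table, table_row_id FROM sensors ORDER BY id"
    return conn.execute(query).fetchall()
```

With this layout the measurements table needs only one foreign key, whatever the sensor type, which is what lets new sensor categories be added without structural changes to it.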

been tested (middle left, fig. 2), the results show a tremendous decrease in data points compared to the initial signal. More specifically, for epsilon equal to five, 98% less disk space is required, while for epsilon equal to 25 the reduction is 99.5%. On the time-series line plots comparing the signal with its compressed versions (middle right, fig. 2), it can be seen that the deviations from the initial signal start to be noticeable for epsilon higher than 10. Therefore, considering that for epsilon values higher than five the storage savings are similar while the error metrics increase almost linearly (bottom table, fig. 2), epsilon has been selected to be equal to five.

Following the same approach, epsilon values are selected for every sensor category.

The fewer remaining data points also lead to higher query performance on the database layer.

B. Timescale DB compression

Even more disk space can be saved by using the built-in compression function of Timescale DB. More specifically, when compression is enabled, it converts the rows of stored data into arrays containing the key-value pairs of date-time and measurement, for every partition of the table (which is based on the sensor identification key), ending up with fewer rows. The compression policy of the compression function determines the size of the data stored in each array. In this work the selected compression policy is one day, meaning that all data from each day are concentrated in one row. Apart from the saved disk space, compression can also speed up some queries.

IV. DISCUSSION

The proposed data structure is optimized for scalability and for the ability to support a wide range of applications while minimizing the manual work of the developer or analyst. The motivation behind the database is the implementation of O&M systems able to support optimized PV performance as well as to send notifications on unexpected circumstances. It also supports advanced dashboards.

The same schema could be implemented using a single table for the sensor characteristics, where a column with the JSON data type would store all characteristics specific to a sensor type. In that manner, queries of measurements would become much simpler, and the trigger functions (for inserting the IDs into the sensors table on every new sensor insert) could be avoided. On the other hand, accessing the type-specific information of every sensor would become more complicated. For the sake of simplicity, manufacturer characteristics for every sensor are included in the sensor table, unlike PV modules and inverters, which have dedicated tables for their datasheets. This results in fewer tables, but in repeated information in certain cases. Furthermore, analog-to-digital converters (ADCs) and controllers have been left out of this version, but they should be added if there is a need for a more holistic view of the system.

In future versions of this database, an extra time-series table will be added for the tilt and orientation of panels with tracking systems. An additional many-to-many table should also be added to reflect links between physical components, such as those between PV modules and battery storage when a direct (DC) link exists between PV and batteries.

V. CONCLUSIONS

The development of a scalable relational database for hosting data from PV research has been proposed. The relationships between the tables follow the physical connections of the modules while each system part has its own identity for
Fig. 2. Graphical representation of the procedure to determine the optimal epsilon for the PAR light measurements. Upper left: Selection of a day with high fluctuation. Upper right: Visualization of the sensor fluctuation (noise) at night. Middle left: Difference between the initial number of data points and the final number after compression with various epsilon values. Middle right: Visualization of the initial and compressed signals over a time interval with high fluctuations. Bottom: Table comparing the initial and the final (after compression and interpolation) data points.
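The procedure summarized in fig. 2 can be sketched in a few lines: compress a signal with the Ramer-Douglas-Peucker algorithm for several epsilon values, rebuild it by linear interpolation, and compare the storage saving with the reconstruction error. This is a minimal illustration and not the authors' implementation. It assumes time in seconds on the x axis, uses the classic perpendicular distance, takes RMSE as the error metric, and runs on a synthetic signal, so its numbers are not those of the paper.

```python
import math


def perpendicular_distance(point, start, end):
    """Euclidean distance from point to the line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def rdp(points, epsilon):
    """Return the points kept by Ramer-Douglas-Peucker (iterative form)."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        first, last = stack.pop()
        max_dist, index = 0.0, None
        for i in range(first + 1, last):
            d = perpendicular_distance(points[i], points[first], points[last])
            if d > max_dist:
                max_dist, index = d, i
        # Keep the farthest point only if it deviates by more than epsilon.
        if index is not None and max_dist > epsilon:
            keep[index] = True
            stack.append((first, index))
            stack.append((index, last))
    return [p for p, k in zip(points, keep) if k]


def interpolate(kept, times):
    """Linearly interpolate the kept points back onto the sorted times."""
    values, j = [], 0
    for t in times:
        while kept[j + 1][0] < t:
            j += 1
        (t0, y0), (t1, y1) = kept[j], kept[j + 1]
        values.append(y0 + (y1 - y0) * (t - t0) / (t1 - t0))
    return values


def sweep(points, epsilons):
    """Return (epsilon, points kept, saving in %, RMSE) for each epsilon."""
    times = [t for t, _ in points]
    rows = []
    for eps in epsilons:
        kept = rdp(points, eps)
        restored = interpolate(kept, times)
        squared = sum((r - y) ** 2 for r, (_, y) in zip(restored, points))
        rmse = math.sqrt(squared / len(points))
        saving = 100.0 * (1 - len(kept) / len(points))
        rows.append((eps, len(kept), saving, rmse))
    return rows


if __name__ == "__main__":
    # Synthetic one-second signal: a smooth hump plus small sensor-like noise.
    signal = [(t, 400.0 * math.sin(math.pi * t / 600) + (7 * t) % 6)
              for t in range(601)]
    for eps, kept, saving, rmse in sweep(signal, [1, 5, 10, 25]):
        print(f"epsilon={eps:>2}  kept={kept:>3}  saving={saving:5.1f}%  RMSE={rmse:.2f}")
```

Because the perpendicular distance mixes seconds and measurement units, epsilon here is not a strict bound on the vertical error of the reconstructed signal, which is one reason to check the error metrics for each candidate value rather than rely on epsilon alone.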

historical tracking reasons (equipment movement etc.). Every measurement should be connected to all physical equipment before it, as well as to the operator of the specific site. It has been shown that, on time-series data, the proper use of the Ramer-Douglas-Peucker algorithm along with Timescale DB compression can save more than 98% of disk space while increasing the performance of queries. This database can be used for modeling, comparative analysis, O&M systems as well as other applications.

REFERENCES

[1] A. Elamim, B. Hartiti, A. Haibaoui, A. Lfakir, and P. Thevenin, “Performance evaluation and economical analysis of three photovoltaic systems installed in an institutional building in Errachidia, Morocco,” Energy Procedia, vol. 147, pp. 121–129, 2018, International Scientific Conference “Environmental and Climate Technologies”, CONECT 2018, 16-18 May 2018, Riga, Latvia. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1876610218301978
[2] K. Arafet and R. Berlanga-Llavori, “Digital twins in solar farms: An approach through time series and deep learning,” Algorithms, vol. 14, p. 156, 05 2021.
[3] R. Naik, A. Tiihonen, J. Thapa, C. Batali, Z. Liu, S. Sun, and T. Buonassisi, “Discovering equations that govern experimental materials stability under environmental stress using scientific machine learning,” npj Computational Materials, vol. 8, p. 72, 04 2022.
[4] U. B. Mujumdar and D. R. Tutkane, “Development of integrated hardware set up for solar photovoltaic system monitoring,” in 2013 Annual IEEE India Conference (INDICON), 2013, pp. 1–6.
[5] F. Touati, M. Al-Hitmi, N. Chowdhury, J. Hamad, and A. J. Gonzales, “Investigation of solar PV performance under Doha weather using a customized measurement and monitoring system,” Renewable Energy, vol. 89, pp. 564–577, 04 2016.
[6] H. Gad and H. E. Gad, “Development of a new temperature data acquisition system for solar energy applications,” Renewable Energy, vol. 74, pp. 337–343, 2015. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0960148114004649
[7] B. Herteleer, B. Huyck, F. Catthoor, J. Driesen, and J. Cappelle, “Normalised efficiency of photovoltaic systems: Going beyond the performance ratio,” Solar Energy, vol. 157, pp. 408–418, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0038092X1730717X
[8] T. Zdanowicz, M. Prorok, W. Kolodenny, and H. Roguszczak, “Outdoor data acquisition system with advanced database for PV modules characterization,” 06 2003, pp. 2497–2500, vol. 3.
[9] A. Meliones and A. Nouvaki, “A web-based three-tier control and monitoring application for integrated facility management of photovoltaic systems,” Applied Computing and Informatics, vol. 10, 01 2014.
[10] A. Nihar, A. J. Curran, A. M. Karimi, J. L. Braid, L. S. Bruckman, M. Koyutürk, Y. Wu, and R. H. French, “Toward findable, accessible, interoperable and reusable (FAIR) photovoltaic system time series data,” in 2021 IEEE 48th Photovoltaic Specialists Conference (PVSC), 2021, pp. 1701–1706.
[11] A. Perçuku, D. Minkovska, L. Stoyanova, and A. Abdullahu, “IoT using Raspberry Pi and Apache Cassandra on PV solar system,” 09 2020, pp. 1–5.
[12] U. Ramer, “An iterative procedure for the polygonal approximation of plane curves,” Computer Graphics and Image Processing, vol. 1, no. 3, pp. 244–256, 1972. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0146664X72800170
[13] D. H. Douglas and T. K. Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature,” Cartographica: The International Journal for Geographic Information and Geovisualization, vol. 10, pp. 112–122, 1973.
