Scalable Data Storage For PV Monitoring Systems: 1 Anastasios Kladas 2 Bert Herteleer 3 Jan Cappelle
Abstract—Efficient PV research which includes a prolonged data monitoring from multiple experiments with different char-
avoided. Taking the above into consideration, an accessible centralized database approach would be beneficial.
arXiv:2209.08879v1 [cs.DB] 19 Sep 2022
A. Ramer-Douglas-Peucker algorithm

The purpose of this step is to compress the data in such a way that they can be retrieved via linear interpolation whenever they are needed. For that reason, the Ramer-Douglas-Peucker algorithm [12], [13] is used. This algorithm decreases the number of data points of a polyline while preserving its characteristics/shape. Based on the Euclidean distance between the points and a tolerance factor symbolized by the Greek letter epsilon (ε), it dismisses the data points that lie on (or very close to) a line, keeping the first and the last point of that line. The degree of compression grows with the epsilon value.

The data could be transmitted to the server in several ways (MQTT, HTTP, Cloud). If the data are delivered in a file format, the algorithm is applied to the entire file as soon as the file arrives.

Otherwise, for live-streamed data, all data are stored in temporary tables in a file-based local database (SQLite). On the current system, the sampling period of the live-streamed data is one second. An algorithm runs periodically (on the current running system, every five minutes) to move the data from the file-based database to the main one, after compressing them in a separate thread (so as not to interrupt the data collection procedure). The algorithm runs separately for every sensor. More specifically, when the algorithm is executed, it stores the first and the last timestamps from the related table (on the file-based database) for the examined sensor and loads all the data in between them. It resamples them to one-second periods and performs linear interpolation to fill missing data points between the samples (if any). Then the Ramer-Douglas-Peucker algorithm is performed to reduce the data points. Afterward, the new compressed dataset is appended to the database. Meanwhile, the timestamp of the execution is stored to calculate the time until the next one. Then the data between the first and the last timestamp are deleted from the source temporary table.

Epsilon is determined by taking into consideration the value range of the measurement and the fluctuation of the respective sensors. A practical example is the following.

In this example, the decision-making procedure takes place for PAR light measurements. The data were obtained from Apogee Model SP-214 PAR sensors on the rooftop of the tallest building of KU Leuven, Ghent. A day with high fluctuation (Fig. 2, upper left) was selected from an older database with a one-second sampling period, based on the maximum daily average of the standard deviation computed over two-hour intervals. To detect the noise of the sensor, its readings are observed during steady-state conditions. As the examined sensor measures irradiance, the observation takes place at night, when zero values are expected. As can be seen in Fig. 2 (upper right plot), the readings fluctuate between one and six. Therefore, an epsilon equal to five is expected to result in minimal/zero exploitable information loss. As several epsilon values have
TABLE I
BRIEF EXPLANATION OF EVERY TABLE ROW AND ITS DEPENDENCIES.
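The periodic migration step described in subsection A (read the first and last timestamps of a sensor's temporary table, resample to one-second periods with linear interpolation, compress, append to the main database, delete the migrated rows) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the table names temp_readings and measurements, the integer epoch-second timestamps, and the helper names resample_1s and migrate_sensor are hypothetical, a second in-memory SQLite database stands in for the main database, and the compression step is passed in as a callable (an identity function in the demo) so that any Ramer-Douglas-Peucker implementation could be plugged in.

```python
import sqlite3


def resample_1s(rows):
    """Linearly interpolate sorted (ts, value) rows onto a one-second grid."""
    out = []
    for (t0, v0), (t1, v1) in zip(rows, rows[1:]):
        for t in range(t0, t1):
            out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    out.append(rows[-1])
    return out


def migrate_sensor(staging, main, sensor_id, compress):
    """Move one sensor's staged rows to the main table; return rows stored."""
    # Remember the first and last staged timestamps for this sensor.
    first, last = staging.execute(
        "SELECT MIN(ts), MAX(ts) FROM temp_readings WHERE sensor_id = ?",
        (sensor_id,),
    ).fetchone()
    if first is None:  # nothing staged for this sensor
        return 0
    rows = staging.execute(
        "SELECT ts, value FROM temp_readings "
        "WHERE sensor_id = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (sensor_id, first, last),
    ).fetchall()
    # Fill gaps on a one-second grid, then reduce the number of points.
    kept = compress(resample_1s(rows))
    main.executemany(
        "INSERT INTO measurements (sensor_id, ts, value) VALUES (?, ?, ?)",
        [(sensor_id, t, v) for t, v in kept],
    )
    main.commit()
    # Only the migrated time window is removed from the temporary table.
    staging.execute(
        "DELETE FROM temp_readings WHERE sensor_id = ? AND ts BETWEEN ? AND ?",
        (sensor_id, first, last),
    )
    staging.commit()
    return len(kept)


# Demo with hypothetical data: the sample at ts = 3 is missing.
staging_db = sqlite3.connect(":memory:")
staging_db.execute(
    "CREATE TABLE temp_readings (sensor_id INTEGER, ts INTEGER, value REAL)"
)
staging_db.executemany(
    "INSERT INTO temp_readings VALUES (?, ?, ?)",
    [(1, 0, 0.0), (1, 1, 1.0), (1, 2, 2.0), (1, 4, 4.0)],
)
main_db = sqlite3.connect(":memory:")
main_db.execute(
    "CREATE TABLE measurements (sensor_id INTEGER, ts INTEGER, value REAL)"
)
# Identity function as a stand-in for the compression step.
n_stored = migrate_sensor(staging_db, main_db, sensor_id=1, compress=lambda p: p)
print(n_stored)
```

Passing the compressor as an argument keeps the database handling separate from the choice of epsilon, which differs per sensor category. Running the step in a separate thread and storing the execution timestamp, as described in the text, are left out of this sketch.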
been tested (Fig. 2, middle left), the results show that there is a tremendous decrease in data points compared to the initial signal. More specifically, for epsilon equal to five, 98% less disk space is required, while for epsilon equal to 25, 99.5% less. In the time-series line plots comparing the signal and its compression (Fig. 2, middle right), it can be seen that the deviations from the initial signal start to be noticeable for epsilon higher than 10. Therefore, considering that for epsilon values higher than five the storage savings are similar while the error metrics increase almost linearly (Fig. 2, bottom table), epsilon has been selected to be equal to five.

Following the same approach, epsilon values are selected for every sensor category.

The fewer remaining data points also lead to higher query performance on the database layer.

B. Timescale DB compression

Even more disk space can be saved if the built-in compression function of Timescale DB is used. More specifically, when compression is enabled, it converts the rows of stored data into arrays containing the key-value pairs between date-time and measurements, for every partition of the table (which is based on the sensor identification key), ending up with fewer rows. The compression policy of the compression function determines the size of the data stored in each array. In this work the selected compression policy is one day, meaning that all data from each day are concentrated in one row. Apart from the saved disk space, compression can also speed up some queries.

IV. DISCUSSION

The proposed data structure is optimized concerning scalability and the ability to support a wide range of applications, minimizing the manual work of the developer or analyst. The motivation behind the database is the implementation of O&M systems able to support optimized PV performance as well as to send notifications on unexpected circumstances. It also supports advanced dashboards.

The same schema could be implemented using one single table for the sensor characteristics, where a column with JSON data type would be used to store all special sensor type characteristics. In that manner, queries of measurements would become much simpler, and the need to build trigger functions (for inserting the IDs to the sensors table on every new sensor insert) could also be avoided. On the other hand, accessing exceptional information for every sensor would become more complicated.

For the sake of simplicity, the manufacturer characteristics of every sensor are included in the sensor table, unlike PV modules and inverters, which have dedicated tables for their datasheets. This results in fewer tables, but repeated information in certain cases. Furthermore, analog-to-digital converters (ADCs) and controllers have been omitted from this version, but they should be added if a more holistic view of the system is needed.

In future versions of this database, an extra time-series table will be added for the tilt and orientation of panels with tracking systems. An additional many-to-many table should also be added to reflect links between physical components, such as links between PV modules and battery storage when a direct (DC) link exists between PV and batteries.

V. CONCLUSIONS

The development of a scalable relational database for hosting data from PV research has been proposed. The relationships between the tables follow the physical connections of the modules while each system part has its own identity for
Fig. 2. Graphical representation of the procedure to determine the optimal epsilon for the PAR light measurements. Upper left: Selection of a day with high fluctuation. Upper right: Visualization of the sensor fluctuation (noise) at night. Middle left: Difference between the number of initial data points and the number remaining after compression with various epsilon values. Middle right: Visualization of the initial along with the compressed signals on a time interval with high fluctuations. Bottom: Table comparing the initial and the final (after compression and interpolation) data points.
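The procedure summarised in Fig. 2 (compress with several epsilon values, reconstruct by linear interpolation, compare the storage savings against the error) can be sketched as below. This is an illustrative sketch, not the code used in this work: the function names, the iterative formulation, the use of the perpendicular point-to-line distance on (time, value) pairs, the choice of mean absolute error as the error metric, and the five-point toy signal are all assumptions.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy)
    if norm == 0.0:  # a and b coincide
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / norm


def rdp(points, epsilon):
    """Iterative Ramer-Douglas-Peucker; returns the retained points in order."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        lo, hi = stack.pop()
        best_d, best_i = 0.0, None
        for i in range(lo + 1, hi):
            d = perpendicular_distance(points[i], points[lo], points[hi])
            if d > best_d:
                best_d, best_i = d, i
        # Keep the farthest point only if it deviates by more than epsilon,
        # then examine the two sub-segments on either side of it.
        if best_i is not None and best_d > epsilon:
            keep[best_i] = True
            stack.append((lo, best_i))
            stack.append((best_i, hi))
    return [p for p, k in zip(points, keep) if k]


def interpolate(kept, t):
    """Linear interpolation of the compressed series at time t."""
    for (t0, v0), (t1, v1) in zip(kept, kept[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the compressed time range")


def sweep(points, epsilons):
    """Map each epsilon to (fraction of points removed, mean absolute error)."""
    results = {}
    for eps in epsilons:
        kept = rdp(points, eps)
        mae = sum(abs(v - interpolate(kept, t)) for t, v in points) / len(points)
        results[eps] = (1 - len(kept) / len(points), mae)
    return results


# Toy signal (hypothetical): flat at zero with a single spike at t = 2.
signal = [(0, 0.0), (1, 0.0), (2, 10.0), (3, 0.0), (4, 0.0)]
results = sweep(signal, [0.5, 20])
for eps, (removed, mae) in results.items():
    print(f"epsilon={eps}: removed={removed:.0%}, MAE={mae:.2f}")
```

On this toy signal the smaller epsilon keeps every point, while the larger one keeps only the two endpoints and loses the spike. On real data, the same sweep over a high-fluctuation day gives the savings-versus-error trade-off from which an epsilon is chosen.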
historical tracking reasons (equipment movement etc.). Every measurement should be connected to all physical equipment before it, as well as to the operator of the specific site. It has been shown that, on time-series data, the proper usage of the Ramer-Douglas-Peucker algorithm along with Timescale DB compression can save more than 98% of disk space while increasing the performance of queries. This database can be used for modeling, comparative analysis, O&M systems, as well as other applications.

REFERENCES

[1] A. Elamim, B. Hartiti, A. Haibaoui, A. Lfakir, and P. Thevenin, “Performance evaluation and economical analysis of three photovoltaic systems installed in an institutional building in Errachidia, Morocco,” Energy Procedia, vol. 147, pp. 121–129, 2018, International Scientific Conference “Environmental and Climate Technologies”, CONECT 2018, 16-18 May 2018, Riga, Latvia. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1876610218301978
[2] K. Arafet and R. Berlanga-Llavori, “Digital twins in solar farms: An approach through time series and deep learning,” Algorithms, vol. 14, p. 156, 05 2021.
[3] R. Naik, A. Tiihonen, J. Thapa, C. Batali, Z. Liu, S. Sun, and T. Buonassisi, “Discovering equations that govern experimental materials stability under environmental stress using scientific machine learning,” npj Computational Materials, vol. 8, p. 72, 04 2022.
[4] U. B. Mujumdar and D. R. Tutkane, “Development of integrated hardware set up for solar photovoltaic system monitoring,” in 2013 Annual IEEE India Conference (INDICON), 2013, pp. 1–6.
[5] F. Touati, M. Al-Hitmi, N. Chowdhury, J. Hamad, and A. J. Gonzales, “Investigation of solar PV performance under Doha weather using a customized measurement and monitoring system,” Renewable Energy, vol. 89, pp. 564–577, 04 2016.
[6] H. Gad and H. E. Gad, “Development of a new temperature data acquisition system for solar energy applications,” Renewable Energy, vol. 74, pp. 337–343, 2015. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0960148114004649
[7] B. Herteleer, B. Huyck, F. Catthoor, J. Driesen, and J. Cappelle, “Normalised efficiency of photovoltaic systems: Going beyond the performance ratio,” Solar Energy, vol. 157, pp. 408–418, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0038092X1730717X
[8] T. Zdanowicz, M. Prorok, W. Kolodenny, and H. Roguszczak, “Outdoor data acquisition system with advanced database for PV modules characterization,” 06 2003, pp. 2497–2500, vol. 3.
[9] A. Meliones and A. Nouvaki, “A web-based three-tier control and monitoring application for integrated facility management of photovoltaic systems,” Applied Computing and Informatics, vol. 10, 01 2014.
[10] A. Nihar, A. J. Curran, A. M. Karimi, J. L. Braid, L. S. Bruckman, M. Koyutürk, Y. Wu, and R. H. French, “Toward findable, accessible, interoperable and reusable (FAIR) photovoltaic system time series data,” in 2021 IEEE 48th Photovoltaic Specialists Conference (PVSC), 2021, pp. 1701–1706.
[11] A. Perçuku, D. Minkovska, L. Stoyanova, and A. Abdullahu, “IoT using Raspberry Pi and Apache Cassandra on PV solar system,” 09 2020, pp. 1–5.
[12] U. Ramer, “An iterative procedure for the polygonal approximation of plane curves,” Computer Graphics and Image Processing, vol. 1, no. 3, pp. 244–256, 1972. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0146664X72800170
[13] D. H. Douglas and T. K. Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature,” Cartographica: The International Journal for Geographic Information and Geovisualization, vol. 10, pp. 112–122, 1973.