In today's changing world, a huge amount of data is generated and transferred frequently. Although the data is sometimes static, it is most commonly dynamic and transactional, with newly generated data constantly being added to the existing data. To discover knowledge from this incremental data, one approach is to rerun the algorithm on the modified dataset each time, which is time consuming. The paper proposes a dimension reduction algorithm that can be applied in a dynamic environment to generate a reduced attribute set as a dynamic reduct. The method analyzes the new dataset when it becomes available and modifies the reduct accordingly to fit the entire dataset. The concepts of discernibility relation, attribute dependency and attribute significance of Rough Set Theory are integrated for the generation of the dynamic reduct set, which not only reduces the complexity but also helps to achieve higher accuracy of the decision system. The proposed method has been applied on a few benchmark datasets collected from the UCI repository and a dynamic reduct is computed. Experimental results show the efficiency of the proposed method.
At present, a huge amount of data is generated every minute and transferred frequently. Although the data is sometimes static, it is most commonly dynamic and transactional, with newly generated data constantly being added to the existing data. To discover knowledge from this incremental data, one approach is to rerun the algorithm on the modified dataset each time, which is time consuming. In addition, proper analysis of the datasets requires the construction of an efficient classifier model, whose objective is to classify unlabeled data into appropriate classes. The paper proposes a dimension reduction algorithm that can be applied in a dynamic environment to generate a reduced attribute set as a dynamic reduct, together with an optimization algorithm that uses the reduct to build the corresponding classification system. The method analyzes the new dataset when it becomes available and modifies the reduct accordingly to fit the entire dataset, from which interesting optimal classification rule sets are generated. The concepts of discernibility relation, attribute dependency and attribute significance of Rough Set Theory are integrated for the generation of the dynamic reduct set, and optimal classification rules are selected using the PSO method, which not only reduces the complexity but also helps to achieve higher accuracy of the decision system. The proposed method has been applied on some benchmark datasets collected from the UCI repository; a dynamic reduct is computed and optimal classification rules are generated from it. Experimental results show the efficiency of the proposed method.
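The reduct-generation side of the abstract rests on standard rough-set quantities, so a minimal sketch may help. It assumes a toy decision table and a simple greedy selection by attribute significance, and does not reproduce the paper's dynamic-reduct update or the PSO rule-selection step.

```python
# Illustrative sketch only: rough-set attribute dependency and a greedy reduct on a
# toy decision table. The paper's dynamic-reduct and PSO steps are not reproduced;
# `decision_table` and the attribute names are assumptions for the example.
from itertools import groupby

def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation over `attrs`."""
    key = lambda i: tuple(rows[i][a] for a in attrs)
    idx = sorted(range(len(rows)), key=key)
    return [set(g) for _, g in groupby(idx, key=key)]

def dependency(rows, cond_attrs, dec_attr):
    """gamma(C, D): fraction of objects in the positive region of D w.r.t. C."""
    if not cond_attrs:
        return 0.0
    dec_classes = partition(rows, [dec_attr])
    pos = sum(len(block) for block in partition(rows, cond_attrs)
              if any(block <= d for d in dec_classes))
    return pos / len(rows)

def greedy_reduct(rows, cond_attrs, dec_attr):
    """Add the most significant attribute until the full dependency is reached."""
    full = dependency(rows, cond_attrs, dec_attr)
    reduct, remaining = [], list(cond_attrs)
    while dependency(rows, reduct, dec_attr) < full and remaining:
        best = max(remaining, key=lambda a: dependency(rows, reduct + [a], dec_attr))
        reduct.append(best)
        remaining.remove(best)
    return reduct

# Toy decision table (assumed data, not from the paper).
decision_table = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "yes"},
]
print(greedy_reduct(decision_table, ["a", "b", "c"], "d"))   # ['b'] for this toy table
```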
A Novel Algorithm for Design Tree Classification with PCA (Editor Jacotech)
This document summarizes a research paper titled "A Novel Algorithm for Design Tree Classification with PCA". It discusses dimensionality reduction techniques like principal component analysis (PCA) that can improve the efficiency of classification algorithms on high-dimensional data. PCA transforms data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate, called the first principal component. The paper proposes applying PCA and linear transformation on an original dataset before using a decision tree classification algorithm, in order to get better classification results.
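As a rough illustration of the PCA-then-decision-tree pipeline the summary describes, the following scikit-learn sketch uses the Iris data as a stand-in; the paper's dataset, number of components and tree settings are not known here.

```python
# Hedged sketch of a PCA-then-decision-tree pipeline on stand-in data.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Project onto the first two principal components, then fit the tree.
model = make_pipeline(PCA(n_components=2), DecisionTreeClassifier(random_state=0))
model.fit(X_tr, y_tr)
print("accuracy with PCA + tree:", model.score(X_te, y_te))
```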
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS (ijdkp)
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high dimensional data. Many significant subspace clustering algorithms exist, each with different characteristics arising from the techniques, assumptions and heuristics used. A comprehensive classification scheme is essential that considers all such characteristics and divides subspace clustering approaches into families, where the algorithms belonging to the same family satisfy common characteristics. Such a categorization will help future developers better understand the quality criteria to be used and identify similar algorithms against which to compare their proposed clustering algorithms. In this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms' Family). Characteristics of an SCAF are based on classes such as cluster orientation, overlap of dimensions, etc. As an illustration, we further provide a comprehensive, systematic description and comparison of a few significant algorithms belonging to the "Axis parallel, overlapping, density based" SCAF.
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R... (ijscmc)
Face recognition is one of the most unobtrusive biometric techniques that can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed with varying degrees of performance in different scenarios. The most common issue with effective facial biometric systems is the high susceptibility of the face to variation owing to factors like changes in pose, varying illumination, different expressions, the presence of outliers, noise, etc. This paper explores a novel technique for face recognition by classifying the face images with an unsupervised learning approach through K-Medoids clustering. The Partitioning Around Medoids (PAM) algorithm has been used for performing K-Medoids clustering of the data. The results suggest increased robustness to noise and outliers in comparison to other clustering methods. Therefore, the technique can also be used to increase the overall robustness of a face recognition system, thereby increasing its invariance and making it a reliably usable biometric modality.
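The following NumPy sketch shows a simplified alternating k-medoids update rather than the full PAM BUILD/SWAP search, and it clusters raw vectors; a real face-recognition pipeline would first extract feature vectors from the images.

```python
# Simplified k-medoids (alternating update, not full PAM swap search) on toy data.
import numpy as np

def k_medoids(X, k, n_iter=100, rng=np.random.default_rng(0)):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)                # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                # swap in the member minimising total intra-cluster distance
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

X = np.vstack([np.random.default_rng(1).normal(m, 0.3, size=(20, 2)) for m in (0, 3)])
medoids, labels = k_medoids(X, k=2)
print("medoid indices:", medoids)
```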
The document discusses employing a rough set based approach for clustering categorical time-evolving data. It proposes using node importance and a rough membership function to label unlabeled data points and detect concept drift between data clusters over time. Specifically, it defines key terms like nodes, introduces the problem of clustering categorical time-series data and concept drift detection. It then describes using a rough membership function to calculate similarity between unlabeled data and existing clusters in order to label the data and detect changes in cluster characteristics over time.
Finding Relationships between the Our-NIR Cluster Results (CSCJournals)
The problem of evaluating node importance in clustering has been an active research area in recent years, and many methods have been developed. Most clustering algorithms deal with general similarity measures; however, in real situations the data usually changes over time, and clustering this type of data not only decreases the quality of the clusters but also disregards the expectations of users, who usually require recent clustering results. In this regard we proposed the Our-NIR method, which improves on the method proposed by Ming-Syan Chen, as demonstrated by its node importance results; computing node importance is very useful in clustering categorical data, yet the method still has deficiencies regarding data labeling and outlier detection. In this paper we modify the Our-NIR method for evaluating node importance by introducing a probability distribution, which gives better results in comparison.
A frame work for clustering time evolving data (iaemedu)
The document proposes a framework for clustering time-evolving categorical data using a sliding window technique. It uses an existing clustering algorithm (Node Importance Representative) and a Drifting Concept Detection algorithm to detect changes in cluster distributions between the current and previous data windows. If a threshold difference in clusters is exceeded, reclustering is performed on the new window. Otherwise, the new clusters are added to the previous results. The framework aims to improve on prior work by handling drifting concepts in categorical time-series data.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A Novel Clustering Method for Similarity Measuring in Text Documents (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed, online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Cluster analysis is used to group similar objects together and separate dissimilar objects. It has applications in understanding data patterns and reducing large datasets. The main types are partitional which divides data into non-overlapping subsets, and hierarchical which arranges clusters in a tree structure. Popular clustering algorithms include k-means, hierarchical clustering, and graph-based clustering. K-means partitions data into k clusters by minimizing distances between points and cluster centroids, but requires specifying k and is sensitive to initial centroid positions. Hierarchical clustering creates nested clusters without needing to specify the number of clusters, but has higher computational costs.
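A quick illustration of the k-means caveats just mentioned: k must be fixed up front, and several random initialisations are typically run to reduce sensitivity to the starting centroids. The data below is synthetic.

```python
# k-means with multiple random initialisations (n_init) on synthetic data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in ((0, 0), (4, 4), (0, 4))])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("inertia (sum of squared distances to centroids):", round(km.inertia_, 2))
```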
This document proposes a new approximate approach for computing the optimal optimistic decision within possibilistic networks. The approach avoids transforming the initial graph into a junction tree, which is computationally expensive. Instead, it performs the computation by calculating the degree of normalization in the moral graph resulting from merging the possibilistic network representing the agent's beliefs and the one representing its preferences. This allows the approach to have polynomial complexity compared to the exact approach based on junction trees, which is NP-hard.
The document describes an image pattern matching method using Principal Component Analysis (PCA). It involves preprocessing training images by converting them to grayscale, resizing them, and storing them in a matrix. PCA is then performed on the training images to extract eigenfaces. Test images are projected onto the eigenfaces to obtain a projection matrix. The test image with the minimum Euclidean distance from the training projections in the matrix is considered the best match. The method provides fast and robust image pattern matching through PCA dimensionality reduction and efficient preprocessing.
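A compact NumPy sketch of that eigenface-style pipeline, assuming the images are already grayscale and resized; the exact preprocessing and distance thresholding of the described method are not reproduced.

```python
# Eigenface-style matching sketch: PCA on training images, nearest projection wins.
import numpy as np

def fit_eigenfaces(train_images, n_components=10):
    X = np.stack([im.ravel() for im in train_images]).astype(float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD; rows of Vt are the eigenfaces.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenfaces = Vt[:n_components]
    projections = Xc @ eigenfaces.T            # training projection matrix
    return mean, eigenfaces, projections

def best_match(test_image, mean, eigenfaces, projections):
    q = (test_image.ravel().astype(float) - mean) @ eigenfaces.T
    dists = np.linalg.norm(projections - q, axis=1)
    return int(np.argmin(dists))               # index of the closest training image

# Tiny synthetic example (real use would load and resize face images).
rng = np.random.default_rng(0)
train = [rng.random((32, 32)) for _ in range(5)]
mean, faces, proj = fit_eigenfaces(train, n_components=3)
print("best match index:", best_match(train[2] + 0.01 * rng.random((32, 32)), mean, faces, proj))
```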
Improved probabilistic distance based locality preserving projections method ... (IJECEIAES)
In this paper, a dimensionality reduction is achieved in large datasets using the proposed distance based Non-integer Matrix Factorization (NMF) technique, which is intended to solve the data dimensionality problem. Here, NMF and distance measurement aim to resolve the non-orthogonality problem due to increased dataset dimensionality. It initially partitions the datasets, organizes them into a defined geometric structure and it avoids capturing the dataset structure through a distance based similarity measurement. The proposed method is designed to fit the dynamic datasets and it includes the intrinsic structure using data geometry. Therefore, the complexity of data is further avoided using an Improved Distance based Locality Preserving Projection. The proposed method is evaluated against existing methods in terms of accuracy, average accuracy, mutual information and average mutual information.
This document describes an expandable Bayesian network (EBN) approach for 3D object description from multiple images and sensor data. The key points are:
- EBNs can dynamically instantiate network structures at runtime based on the number of input images, allowing the use of a varying number of evidence features.
- EBNs introduce the use of hidden variables to handle correlation of evidence features across images, whereas previous approaches did not properly model this.
- The document presents an application of an EBN for building detection and description from aerial images using multiple views and sensor data. Experimental results showed the EBN approach provided significant performance improvements over other methods.
Recommendation system using bloom filter in mapreduce (IJDKP)
Many clients like to use the Web to discover product details in the form of online reviews provided by other clients and specialists. Recommender systems provide an important response to the information overload problem, as they present users with more practical and personalized information services. Collaborative filtering methods are a vital component of recommender systems, since they generate high-quality recommendations by leveraging the preferences of a community of similar users; the underlying assumption is that people with similar tastes choose the same items. Conventional collaborative filtering systems suffer from the sparse data problem and a lack of scalability, so a new recommender system is required to deal with sparse data and produce high-quality recommendations in a large-scale mobile environment. MapReduce is a programming model widely used for large-scale data analysis. The recommendation mechanism described for mobile commerce is user-based collaborative filtering using MapReduce, which reduces the scalability problem of conventional CF systems. One of the essential operations for data analysis is the join, but MapReduce is not very efficient at executing joins because it always processes all records in the datasets even when only a small fraction of them are relevant to the join. This problem can be reduced by applying the bloomjoin algorithm: Bloom filters are constructed and used to filter out redundant intermediate records. The proposed algorithm using Bloom filters reduces the number of intermediate results and improves join performance.
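A toy sketch of the bloom-join idea outlined above: build a Bloom filter over the join keys of the smaller dataset and use it to drop non-matching records before the join. The hash scheme, filter size and the simulated map/join phases are illustrative assumptions, not the paper's MapReduce implementation.

```python
# Bloom-join sketch: filter the large side by membership in a Bloom filter built
# over the small side's keys, then join only the surviving records.
import hashlib

class BloomFilter:
    def __init__(self, size=1024, n_hashes=3):
        self.size, self.n_hashes = size, n_hashes
        self.bits = bytearray(size)

    def _positions(self, key):
        for i in range(self.n_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))

users = {"u1": "Alice", "u2": "Bob"}                       # small side of the join
ratings = [("u1", 5), ("u3", 2), ("u2", 4), ("u9", 1)]     # large side

bf = BloomFilter()
for uid in users:
    bf.add(uid)

# "Map" phase: emit only records whose key might be on the small side.
filtered = [(uid, r) for uid, r in ratings if bf.might_contain(uid)]
joined = [(users[uid], r) for uid, r in filtered if uid in users]
print(joined)   # [('Alice', 5), ('Bob', 4)]
```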
Cluster analysis involves grouping data objects into clusters so that objects within the same cluster are more similar to each other than objects in other clusters. There are several major clustering approaches including partitioning methods that iteratively construct partitions, hierarchical methods that create hierarchical decompositions, density-based methods based on connectivity and density, grid-based methods using a multi-level granularity structure, and model-based methods that find the best fit of a model to the clusters. Partitioning methods like k-means and k-medoids aim to optimize a partitioning criterion by iteratively updating cluster centroids or medoids.
JAVA BASED VISUALIZATION AND ANIMATION FOR TEACHING THE DIJKSTRA SHORTEST PAT... (ijseajournal)
This document describes a Java software tool developed to help transportation engineering students understand the Dijkstra shortest path algorithm. The software provides an intuitive interface for generating transportation networks and animating how the shortest path is updated at each iteration of the Dijkstra algorithm. It offers multiple visual representations like color mapping and tables. The software can step through each iteration or run continuously, and includes voice narratives in different languages to further aid comprehension. A demo video of the animation and results is available online.
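Not the Java tool itself, but a compact Dijkstra implementation that prints the per-iteration label updates such an animation would visualise, on an assumed toy network.

```python
# Dijkstra's shortest-path algorithm with a priority queue; prints label updates.
import heapq

def dijkstra(graph, source):
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                      # stale queue entry
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
                print(f"update: dist[{v}] = {dist[v]} via {u}")
    return dist

network = {                               # assumed toy transportation network
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
    "D": [],
}
print(dijkstra(network, "A"))
```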
Data Mining: Concepts and Techniques — Chapter 2 — (Salah Amean)
The presentation contains the following:
-Data Objects and Attribute Types.
-Basic Statistical Descriptions of Data.
-Data Visualization.
-Measuring Data Similarity and Dissimilarity.
-Summary.
This document summarizes a research paper that proposes a new method to accelerate the nearest neighbor search step of the k-means clustering algorithm. The k-means algorithm is computationally expensive due to calculating distances between data points and cluster centers. The proposed method uses geometric relationships between data points and centers to reject centers that are unlikely to be the nearest neighbor, without decreasing clustering accuracy. Experimental results showed the method significantly reduced the number of distance computations required.
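The summary does not give the exact rejection criterion, so the sketch below uses one common geometric pruning rule, the Elkan-style triangle-inequality test: if d(c_best, c_j) >= 2·d(x, c_best), centre c_j cannot be closer than c_best.

```python
# Triangle-inequality pruning of candidate centres during nearest-centre search.
import numpy as np

def nearest_center_pruned(x, centers):
    # In a real k-means loop the centre-to-centre distances would be precomputed
    # once per iteration; they are recomputed here for brevity.
    cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    best, best_d = 0, np.linalg.norm(x - centers[0])
    skipped = 0
    for j in range(1, len(centers)):
        if cc[best, j] >= 2 * best_d:        # rejection without a distance computation
            skipped += 1
            continue
        d = np.linalg.norm(x - centers[j])
        if d < best_d:
            best, best_d = j, d
    return best, skipped

rng = np.random.default_rng(0)
centers = rng.normal(size=(20, 2)) * 10
x = centers[3] + 0.1
print(nearest_center_pruned(x, centers))     # (nearest centre index, centres skipped)
```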
Mine Blood Donors Information through Improved K-Means Clustering (ijcsity)
The number of accidents and health diseases, which are increasing at an alarming rate, is resulting in a huge increase in the demand for blood, so there is a need for organized analysis of blood donor databases and blood bank repositories. Clustering analysis is one of the data mining applications, and the K-means clustering algorithm is fundamental to modern clustering techniques. K-means is a traditional, iterative approach: at every iteration it computes the distance from the centroid of each cluster to each and every data point. This paper improves the original k-means algorithm by choosing the initial centroids according to the distribution of the data. Results and discussion show that the improved K-means algorithm produces accurate clusters in less computation time when mining the donor information.
The document discusses various clustering algorithms and concepts:
1) K-means clustering groups data by minimizing distances between points and cluster centers, but it is sensitive to initialization and may find local optima.
2) K-medians clustering is similar but uses point medians instead of means as cluster representatives.
3) K-center clustering aims to minimize the maximum distance between points and cluster centers, and can be approximated with a farthest-first traversal algorithm, as sketched below.
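A minimal farthest-first traversal, the standard 2-approximation for k-center referred to in point 3, on synthetic Euclidean data.

```python
# Farthest-first traversal: repeatedly pick the point farthest from the chosen centers.
import numpy as np

def farthest_first(X, k, first=0):
    centers = [first]
    d = np.linalg.norm(X - X[first], axis=1)           # distance to nearest chosen center
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                        # farthest point becomes a center
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return centers

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
print("chosen center indices:", farthest_first(X, k=4))
```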
Cluster analysis is a technique used to group objects into clusters based on similarities. There are several major approaches to cluster analysis including partitioning methods, hierarchy methods, density-based methods, and grid-based methods. Partitioning methods construct partitions of the data objects into a set number of clusters by optimizing a chosen criterion, such as k-means and k-medoids clustering algorithms.
Chapter 9 of the book discusses advanced classification methods including Bayesian belief networks, classification using backpropagation neural networks, support vector machines, frequent pattern-based classification, lazy learning, and other techniques. It describes how these methods work, including how to construct Bayesian networks, train neural networks using backpropagation, find optimal separating hyperplanes with support vector machines, and more. The chapter also covers topics like network topologies, training scenarios, efficiency and interpretability of different methods.
A Novel Optimization of Cloud Instances with Inventory Theory Applied on Real... (aciijournal)
Horizontal scaling is a Cloud architectural strategy in which the number of nodes or computers is increased to meet the demand of a continuously increasing workload. The cost of compute instances increases with the workload, and this research aims to optimize the number of reserved Cloud instances using principles of inventory theory applied to IoT datasets with variable stochastic behaviour. With a structured solution architecture laid down for the business problem to identify the checkpoints of compute instances, the range of approximately required reserved compute instances has been optimized and pinpointed by analysing the probability distribution curves of the IoT datasets. Inventory theory applied to the distribution curves of the data provides the optimized number of compute instances within the range prescribed by the solution architecture. The solution would help Cloud solution architects and project sponsors plan the compute power required on the AWS® Cloud platform in any business situation where ingesting and processing data of a stochastic nature is a business need.
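The abstract does not spell out the exact formulation, so the following is only one plausible reading: a newsvendor-style critical-fractile rule applied to a simulated workload distribution, with assumed under- and over-provisioning costs.

```python
# Newsvendor-style sizing of reserved capacity from a demand distribution.
# All costs and the workload model below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
workload = rng.gamma(shape=9.0, scale=3.0, size=10_000)   # simulated IoT-driven demand

cost_under = 0.10   # cost per unit of unmet demand (e.g. on-demand premium)
cost_over = 0.03    # cost per unit of idle reserved capacity
critical_fractile = cost_under / (cost_under + cost_over)

reserved = np.quantile(workload, critical_fractile)
print(f"reserve capacity for ~{critical_fractile:.0%} of demand: {reserved:.1f} instances")
```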
This document presents an approach for clustering a mixed dataset containing both numeric and categorical attributes using an ART-2 neural network model. The dataset contains daily stock price data with 19 attributes describing comparisons between consecutive days. Clustering mixed datasets is challenging due to different attribute types. The ART-2 model is used to classify the dataset without requiring a distance function. Then an autoencoder model reduces the dimensionality to allow visual validation of the clusters. The results demonstrate the ART-2 model's ability to cluster complex, mixed datasets.
Local governance in Tanzania involves both political leadership and administration. The political leadership consists of councillors who are elected every five years by residents to represent wards and make decisions through the Full Council and Standing Committees. The administration is made up of civil servants and technical staff who implement the day-to-day activities, plans, and decisions of the council, as well as collect revenues and provide technical advice. While councillors are elected, chief executives who oversee the administration are appointed by the Minister and President.
This document discusses delegation in three parts. It begins by defining delegation as assigning responsibility for goals to staff at lower levels of authority. It describes the need for delegation to appropriately utilize levels of an organization and improve efficiency. The document outlines steps for effective delegation, including selecting the right person, specifying expectations, and providing feedback. It also identifies potential hindrances like ego and fear of losing control, and principles for delegation like specifying authority in writing and following lines of command.
This document discusses performance appraisal in three main sections:
[1] It defines performance appraisal as a process for assessing an employee's performance on their job duties and their potential for future development. Key aspects include both quantitative and qualitative evaluations.
[2] It describes the characteristics and purposes of performance appraisal, which include systematically assessing strengths and weaknesses, providing feedback, and aiding decision-making regarding promotion, training, and other personnel actions.
[3] It outlines the steps for an effective performance appraisal process, from establishing performance standards to communicating them, measuring and comparing actual performance, discussing evaluations, and initiating corrective actions.
This document provides an overview of controlling processes in supervision. It defines controlling, lists its objectives, and describes the controlling process and types of control. The controlling process involves establishing objectives and standards, measuring actual performance, comparing results to objectives, and taking corrective actions. Effective control is strategic, results-oriented, understandable, encourages self-control, is timely and exception-oriented, positive in nature, fair and objective, and flexible. The types of control discussed are preliminary, concurrent, post-action, internal, and external controls. Learners are expected to describe controlling concepts and demonstrate understanding of an effective controlling process.
This document provides an overview of public policy for students in public policy and economics programs. It defines key terms related to public policy, examines the nature and characteristics of public policy, and discusses the importance of public policy and some common policy areas. The lecturer defines a policy as a purposive course of action by a government or group to influence decisions. Public policy involves government actions and decisions in response to public problems. Characteristics of public policy include being goal-oriented, made by public authorities, and consisting of patterns of actions over time. The relationship between politics and policy is also examined.
The document discusses the processes of recruitment and selection. It defines recruitment as attracting candidates to apply for open positions, while selection is the process of evaluating applicants and hiring the most suitable candidate. The key steps of selection outlined include receiving applications, evaluating qualifications, testing candidates, conducting interviews, extending job offers, and performing medical examinations. Criteria used for selection are qualifications, experience, skills, attitude and physical characteristics as matched to the job requirements.
The document discusses the relationship between central and local governments in governance. It notes that the central government lays down general policy for local governments through the ministry of local government to ensure services meet national interests. The central government also entrusts powers to local authorities but maintains checks and control over their functions. Specifically, the central government exercises political, administrative, legislative, fiscal and judicial forms of control over local governments. It facilitates local powers while also coordinating, monitoring and developing policies to guide local authorities' work.
The document discusses employee induction, which is the process of welcoming new employees and providing them with basic information to help them settle into their new job and company quickly. It outlines the objectives of induction for both the employee and employer, which include clarifying roles and responsibilities, familiarizing employees with policies and procedures, and reducing employee turnover. The document also describes the different levels of induction programs from compliance to connection and lists the key topics that should be covered in an induction, such as company history, benefits, and health and safety measures. It concludes by posing questions about designing an induction program and discussing induction practices in Tanzanian public sector institutions.
The document discusses different theories of motivation. It describes content theories like Maslow's hierarchy of needs and Herzberg's two-factor theory, which propose that certain internal needs drive motivation. Process theories like equity theory and expectancy theory examine how people perceive and respond to rewards. McClelland's acquired needs theory suggests people develop needs for achievement, power, and affiliation through life experiences. Effective motivation requires understanding individual needs and allocating rewards to satisfy both personal and organizational interests.
Local governments in Tanzania include village councils, ward councils, district councils, town councils, municipal councils, and city councils. They are classified as either rural authorities, which include villages and districts, or urban authorities like towns, municipalities, and cities. Local governments are responsible for administrative functions and development programs within their jurisdiction, and their revenues come from sources like rents, grants, development levies, licenses, and fees.
Here are the key points on how records can be categorized according to their use or value:
- Administrative value: Records containing information on procedures, operations, decisions needed to support current business functions.
- Fiscal value: Records providing evidence of financial transactions and accounting needed for auditing like invoices, receipts, payment records.
- Legal value: Records containing information needed to protect the legal and financial interests of an organization in case of litigation or investigation.
- Evidential value: Records providing proof of decisions made, actions taken etc. important for accountability and good governance.
- Historical/informational value: Records important for historical research that give an overview of the development of an organization or society over time.
This document discusses agenda setting in public policymaking. It defines agenda setting as the process of adopting social issues or problems as policy problems to be addressed by the government. The document outlines the five stages of policymaking according to Kingdon: agenda setting, formulation, adoption, implementation, and evaluation. It describes different levels of agendas, from the agenda universe to the decision agenda. Finally, it discusses actors involved in shaping policy agendas, including political officials, civil society, international organizations, and the public. It also summarizes Kingdon's model of three streams that influence when an issue gets on the political agenda.
This document discusses the importance and principles of office organization. It provides definitions of organization from various sources and outlines the key steps and factors to consider in planning an effective office organization structure. The document emphasizes that organizing individual efforts and dividing labor is necessary to efficiently achieve common goals. It also lists several important principles for office organization, including having clear lines of authority, an optimal span of supervision, and flexibility to adapt to changes. Overall, the document promotes the importance of properly organizing an office by defining roles and responsibilities in order to maximize coordination and productivity.
This document discusses the nature and functions of supervision. It defines supervision as guiding the work of subordinates through planning, organizing, directing and controlling their activities. The key points are:
1. A supervisor's main functions are to plan work, organize resources, staff departments, and maintain discipline.
2. Supervision involves overseeing subordinates' work to ensure plans and policies are followed, and includes setting objectives, assigning work, training, and communicating policies.
3. Supervisors represent management and are responsible for accomplishing departmental goals through subordinates. They are vital links between management and workers.
This document discusses levels and typologies of public policy. It begins by outlining the learning objectives, which are to identify the levels of policies, classify policies, identify policy types, and describe examples of public policy types. It then defines four levels of policy: individual, family, organizational, and government (public policy). The document goes on to classify policies as either substantive or procedural. It identifies and provides examples of four major types of public policy: distributive, redistributive, regulatory, and constituent. Group assignments related to public policy analysis are also listed.
The document discusses the significance and process of research. It notes that research promotes logical thinking, provides the basis for government policies, and helps solve business problems. The document then outlines the main steps in the research process, including formulating the problem, conducting a literature review, developing hypotheses, collecting and analyzing data, testing hypotheses, and presenting findings. It provides details on various types of research and emphasizes the importance of scientific methodology.
Descriptive and correlational research aim to observe and describe characteristics or relationships between variables. Descriptive research provides an accurate portrayal of characteristics or behaviors, while correlational research examines relationships between two or more variables without manipulation. Both approaches are non-experimental and can be used to explore phenomena, identify problems or form hypotheses for future research. The document outlines the nature, aims, types, steps and examples of descriptive and correlational research methods.
This document provides an overview of public policy for students in public policy and economics programs. It defines key terms related to public policy, examines the nature and importance of public policy, and discusses some specific policy areas. The learning objectives are to define public policy terms, analyze the importance of policy, describe the nature and characteristics of policy, examine what policies do, and examine the rationale for public policy. It provides definitions of policy, discusses elements and goals of policy, and distinguishes between policy outputs and outcomes. It also covers the relationship between politics and public policy.
This document discusses different types of descriptive research studies including normative surveys, educational surveys, and psychological research studies. It provides examples of each type of descriptive study including the purpose, procedures, and key findings. A normative survey examines typical conditions and practices to establish norms. An educational survey looks at factors related to the teaching and learning process. A psychological research study compares behaviors and reactions in different situations. Descriptive research aims to describe current conditions and phenomena without manipulating variables.
Estimating project development effort using clustered regression approach (csandit)
Due to the intangible nature of software, accurate and reliable software effort estimation is a challenge in the software industry. Very accurate estimates of software development effort are unlikely because of the inherent uncertainty in software development projects and the complex, dynamic interaction of factors that impact software development. Heterogeneity exists in software engineering datasets because the data comes from diverse sources; it can be reduced by defining relationships between the data values and classifying them into different clusters. This study focuses on how the combination of clustering and regression techniques can reduce the problems that data heterogeneity causes for predictive effectiveness. Using a clustered approach creates subsets of data with a degree of homogeneity that enhances prediction accuracy. It was also observed in this study that ridge regression performs better than the other regression techniques used in the analysis.
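A hedged sketch of the cluster-then-regress idea: partition the projects with k-means, then fit a ridge regression inside each cluster. Synthetic features stand in for the software-engineering datasets used in the study.

```python
# Cluster the data, fit one ridge model per cluster, predict with the nearest cluster's model.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                        # project features (assumed)
effort = X @ np.array([3.0, -1.0, 2.0, 0.5]) + rng.normal(scale=0.3, size=300)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
clusters = km.labels_
models = {c: Ridge(alpha=1.0).fit(X[clusters == c], effort[clusters == c])
          for c in np.unique(clusters)}

# Predict a new project's effort with the model of its nearest cluster.
x_new = rng.normal(size=(1, 4))
c_new = km.predict(x_new)[0]
print("predicted effort:", models[c_new].predict(x_new)[0])
```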
Single Reduct Generation Based on Relative Indiscernibility of Rough Set Theo... (ijsc)
In the real world everything is an object, and objects represent particular classes; every object can be fully described by its attributes. Real-world datasets contain large numbers of attributes and objects, and classifiers perform poorly when such huge datasets are given to them as input, so the most useful attributes, those that contribute the most to the decision, need to be extracted. In this paper, the attribute set is reduced by generating reducts using the indiscernibility relation of Rough Set Theory (RST). The method measures similarity among the attributes using the relative indiscernibility relation and computes an attribute similarity set. The set is then minimized and an attribute similarity table is constructed, from which the attribute similar to the maximum number of attributes is selected, so that the resulting minimum set of selected attributes (called a reduct) covers all attributes of the attribute similarity table. The method has been applied on the glass dataset collected from the UCI repository and the classification accuracy is calculated with various classifiers. The results show the efficiency of the proposed method.
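A very rough illustration of one ingredient of that method, comparing the partitions individual attributes induce on the objects as a proxy for attribute similarity; the paper's relative indiscernibility measure and the covering/selection step are more involved than this sketch.

```python
# Compare the indiscernibility partitions induced by two attributes on a toy table.
from collections import defaultdict

def partition_by(rows, attr):
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[row[attr]].add(i)
    return [frozenset(b) for b in blocks.values()]

def partition_agreement(rows, a1, a2):
    """Fraction of object pairs on which the two attributes agree about indiscernibility."""
    p1, p2 = partition_by(rows, a1), partition_by(rows, a2)
    same = lambda p, i, j: any(i in b and j in b for b in p)
    n, agree, total = len(rows), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            agree += same(p1, i, j) == same(p2, i, j)
    return agree / total

rows = [{"a": 1, "b": 1, "c": 0}, {"a": 1, "b": 1, "c": 1},
        {"a": 0, "b": 0, "c": 1}, {"a": 0, "b": 1, "c": 0}]
print(round(partition_agreement(rows, "a", "b"), 2))
```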
1) The document discusses mining data streams using an improved version of McDiarmid's bound. It aims to enhance the bounds obtained by McDiarmid's tree algorithm and improve processing efficiency.
2) Traditional data mining techniques cannot be directly applied to data streams due to their continuous, rapid arrival. The document proposes using Gaussian approximations to McDiarmid's bounds to reduce the size of training samples needed for split criteria selection.
3) It describes Hoeffding's inequality, which is commonly used but not sufficient for data streams. The document argues that McDiarmid's inequality, used appropriately, provides a more efficient technique for high-speed, time-changing data streams.
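For reference, the Hoeffding bound in question, as used in Hoeffding-tree style split decisions: with probability 1 - δ the observed mean of a range-R quantity is within ε of its true mean after n samples; McDiarmid-based bounds replace this ε with a tighter, function-specific quantity.

```python
# Hoeffding epsilon used to decide when enough stream examples have been seen to split.
import math

def hoeffding_epsilon(R, delta, n):
    return math.sqrt((R ** 2) * math.log(1.0 / delta) / (2.0 * n))

# Example: information gain lies in [0, log2(num_classes)]; split when the gain gap
# between the two best attributes exceeds epsilon.
R = math.log2(2)            # binary classification
print(hoeffding_epsilon(R, delta=1e-6, n=5000))
```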
Dimensionality reduction by matrix factorization using concept lattice in dat... (eSAT Journals)
Abstract: Concept lattices are an important technique that has become standard in data analytics and knowledge representation in many fields such as statistics, artificial intelligence, pattern recognition, machine learning, information theory, social networks, information retrieval and software engineering. Formal concepts are adopted as the primitive notion: a concept is jointly defined as a pair consisting of an intension and an extension. FCA can handle huge amounts of data, generating concepts and rules and supporting data visualization. Matrix factorization methods have recently received greater exposure, mainly as an unsupervised learning method for latent variable decomposition. In this paper a novel method is proposed to decompose such concepts using Boolean Matrix Factorization for dimensionality reduction; the focus is on finding all the concepts and the object intersections. Keywords: data mining, formal concepts, lattice, matrix factorization, dimensionality reduction.
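A tiny illustration of the formal-concept machinery mentioned above: the two derivation operators on a Boolean object-attribute context and a check that an (extent, intent) pair is a formal concept. The Boolean matrix factorisation step itself is not reproduced.

```python
# Derivation operators of Formal Concept Analysis on a toy Boolean context.
objects = ["o1", "o2", "o3"]
attributes = ["a", "b", "c"]
context = {("o1", "a"), ("o1", "b"), ("o2", "b"), ("o2", "c"), ("o3", "b")}

def intent(objs):
    """Attributes shared by all objects in `objs`."""
    return {m for m in attributes if all((g, m) in context for g in objs)}

def extent(attrs):
    """Objects possessing all attributes in `attrs`."""
    return {g for g in objects if all((g, m) in context for m in attrs)}

A = {"o1", "o2", "o3"}
B = intent(A)                      # common attributes of A
print(B, extent(B) == A)           # (A, B) is a formal concept iff extent(B) == A
```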
Machine Learning Algorithms for Image Classification of Hand Digits and Face ... (IRJET Journal)
This document discusses machine learning algorithms for image classification using five different classification schemes. It summarizes the mathematical models behind each classification algorithm, including Nearest Class Centroid classifier, Nearest Sub-Class Centroid classifier, k-Nearest Neighbor classifier, Perceptron trained using Backpropagation, and Perceptron trained using Mean Squared Error. It also describes two datasets used in the experiments - the MNIST dataset of handwritten digits and the ORL face recognition dataset. The performance of the five classification schemes are compared on these datasets.
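A minimal nearest-class-centroid classifier of the kind compared in the paper, shown on synthetic 2-D data rather than the MNIST or ORL sets.

```python
# Nearest Class Centroid classifier: assign each sample to the class with the closest mean.
import numpy as np

def fit_centroids(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    classes = list(centroids)
    C = np.stack([centroids[c] for c in classes])
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1)
    return np.array([classes[i] for i in d.argmin(axis=1)])

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
centroids = fit_centroids(X, y)
print("train accuracy:", (predict(centroids, X) == y).mean())
```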
The document summarizes a student project analyzing restaurant data using Python. It includes an introduction to the project goals, dataset, data mining techniques, and machine learning algorithms used. Specifically, the project aims to collect a restaurant dataset from Kaggle, apply knowledge in data mining and machine learning using Python, and perform classification, regression, and prediction tasks. Key algorithms discussed include linear regression, covariance, standard deviation, and prediction using support vector machines.
Ensemble based Distributed K-Modes ClusteringIJERD Editor
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches in distributed clustering. The distributed clustering algorithm is used to cluster the distributed datasets without gathering all the data in a single site. The K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets. But it fails to handle directly the datasets with categorical attributes which are generally occurred in real life datasets. Huang proposed the K-Modes clustering algorithm by introducing a new dissimilarity measure to cluster categorical data. This algorithm replaces means of clusters with a frequency based method which updates modes in the clustering process to minimize the cost function. Most of the distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to handle categorical data sets as well as to perform distributed clustering process in an asynchronous manner. The performance of the proposed algorithm is compared with the existing distributed K-Means clustering algorithms, and K-Modes based Centralized Clustering algorithm. The experiments are carried out for various datasets of UCI machine learning data repository.
Analysis of Classification Algorithm in Data Miningijdmtaiir
Data Mining is the extraction of hidden predictive
information from large database. Classification is the process
of finding a model that describes and distinguishes data classes
or concept. This paper performs the study of prediction of class
label using C4.5 and Naïve Bayesian algorithm.C4.5 generates
classifiers expressed as decision trees from a fixed set of
examples. The resulting tree is used to classify future samples
.The leaf nodes of the decision tree contain the class name
whereas a non-leaf node is a decision node. The decision node
is an attribute test with each branch (to another decision tree)
being a possible value of the attribute. C4.5 uses information
gain to help it decide which attribute goes into a decision node.
A Naïve Bayesian classifier is a simple probabilistic classifier
based on applying Baye’s theorem with strong (naive)
independence assumptions. Naive Bayesian classifier assumes
that the effect of an attribute value on a given class is
independent of the values of the other attribute. This
assumption is called class conditional independence. The
results indicate that Predicting of class label using Naïve
Bayesian classifier is very effective and simple compared to
C4.5 classifier
Clustering for Stream and Parallelism (DATA ANALYTICS)DheerajPachauri
The document summarizes information about a group project involving data stream clustering. It lists the group members and then discusses key concepts related to data stream clustering like requirements for algorithms, common algorithm types and steps, prototypes and windows. It also touches on outliers and applications of clustering.
An overlapping conscious relief-based feature subset selection methodIJECEIAES
Feature selection is considered as a fundamental prepossessing step in various data mining and machine learning based works. The quality of features is essential to achieve good classification performance and to have better data analysis experience. Among several feature selection methods, distance-based methods are gaining popularity because of their eligibility in capturing feature interdependency and relevancy with the endpoints. However, most of the distance-based methods only rank the features and ignore the class overlapping issues. Features with class overlapping data work as an obstacle during classification. Therefore, the objective of this research work is to propose a method named overlapping conscious MultiSURF (OMsurf) to handle data overlapping and select a subset of informative features discarding the noisy ones. Experimental results over 20 benchmark dataset demonstrates the superiority of OMsurf over six existing state-of-the-art methods.
Application for Logical Expression Processing csandit
Processing of logical expressions – especially a conversion from conjunctive normal form (CNF) to disjunctive normal form (DNF) – is very common problem in many aspects of information retrieval and processing. There are some existing solutions for the logical symbolic calculations, but none of them offers a functionality of CNF to DNF conversion. A new application for this purpose is presented in this paper.
Density Based Clustering Approach for Solving the Software Component Restruct...IRJET Journal
This document presents research on using the DBSCAN clustering algorithm to solve the problem of software component restructuring. It begins with an abstract that introduces DBSCAN and describes how it can group related software components. It then provides background on software component clustering and describes DBSCAN in more detail. The methodology section outlines the 4 phases of the proposed approach: data collection and processing, clustering with DBSCAN, visualization and analysis, and final restructuring. Experimental results show that DBSCAN produces more evenly distributed clusters compared to fuzzy clustering. The conclusion is that DBSCAN is a better technique for software restructuring as it can identify clusters of varying shapes and sizes without specifying the number of clusters in advance.
MAXIMUM CORRENTROPY BASED DICTIONARY LEARNING FOR PHYSICAL ACTIVITY RECOGNITI...sherinmm
This document proposes a correntropy induced dictionary pair learning framework for physical activity recognition using wearable sensors. It begins with an introduction to physical activity recognition and related work. It then presents the proposed methodology, which consists of two stages: data processing and recognition. The recognition stage involves jointly learning a synthesis dictionary and analysis dictionary based on the maximum correntropy criterion. This is done using an alternating direction method of multipliers combined with an iteratively reweighted method to solve the non-convex objective function. The framework is validated on physical activity recognition and intensity estimation tasks using a publicly available dataset. Experimental results show the correntropy induced dictionary learning approach achieves high accuracy using simple features and is competitive with other methods requiring prior knowledge
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity...sherinmm
Due to its symbolic role in ubiquitous health monitoring,
physical activity recognition with wearable body sensors has been in the
limelight in both research and industrial communities. Physical activity
recognition is difficult due to the inherent complexity involved with different
walking styles and human body movements. Thus we present a
correntropy induced dictionary pair learning framework to achieve this
recognition. Our algorithm for this framework jointly learns a synthesis
dictionary and an analysis dictionary in order to simultaneously perform
signal representation and classification once the time-domain features
have been extracted. In particular, the dictionary pair learning algorithm
is developed based on the maximum correntropy criterion, which
is much more insensitive to outliers. In order to develop a more tractable
and practical approach, we employ a combination of alternating direction
method of multipliers and an iteratively reweighted method to approximately
minimize the objective function. We validate the effectiveness of
our proposed model by employing it on an activity recognition problem
and an intensity estimation problem, both of which include a large number
of physical activities from the recently released PAMAP2 dataset.
Experimental results indicate that classifiers built using this correntropy
induced dictionary learning based framework achieve high accuracy by
using simple features, and that this approach gives results competitive
with classical systems built upon features with prior knowledge.
Abdul Ahad Abro presented on data science, predictive analytics, machine learning algorithms, regression, classification, Microsoft Azure Machine Learning Studio, and academic publications. The presentation introduced key concepts in data science including machine learning, predictive analytics, regression, classification, and algorithms. It demonstrated regression analysis using Microsoft Azure Machine Learning Studio and Microsoft Excel. The methodology section described using a dataset from Azure for classification and linear regression in both Azure and Excel to compare results.
This document provides an overview of different techniques for clustering categorical data. It discusses various clustering algorithms that have been used for categorical data, including K-modes, ROCK, COBWEB, and EM algorithms. It also reviews more recently developed algorithms for categorical data clustering, such as algorithms based on particle swarm optimization, rough set theory, and feature weighting schemes. The document concludes that clustering categorical data remains an important area of research, with opportunities to develop techniques that initialize cluster centers better.
A Combined Approach for Feature Subset Selection and Size Reduction for High ...IJERA Editor
selection of relevant feature from a given set of feature is one of the important issues in the field of
data mining as well as classification. In general the dataset may contain a number of features however it is not
necessary that the whole set features are important for particular analysis of decision making because the
features may share the common information‟s and can also be completely irrelevant to the undergoing
processing. This generally happen because of improper selection of features during the dataset formation or
because of improper information availability about the observed system. However in both cases the data will
contain the features that will just increase the processing burden which may ultimately cause the improper
outcome when used for analysis. Because of these reasons some kind of methods are required to detect and
remove these features hence in this paper we are presenting an efficient approach for not just removing the
unimportant features but also the size of complete dataset size. The proposed algorithm utilizes the information
theory to detect the information gain from each feature and minimum span tree to group the similar features
with that the fuzzy c-means clustering is used to remove the similar entries from the dataset. Finally the
algorithm is tested with SVM classifier using 35 publicly available real-world high-dimensional dataset and the
results shows that the presented algorithm not only reduces the feature set and data lengths but also improves the
performances of the classifier.
OPTIMIZATION IN ENGINE DESIGN VIA FORMAL CONCEPT ANALYSIS USING NEGATIVE ATTR...csandit
There is an exhaustive study around the area of engine design that covers different methods that try to reduce costs of production and to optimize the performance of these engines.
Mathematical methods based in statistics, self-organized maps and neural networks reach the best results in these designs but there exists the problem that configuration of these methods is
not an easy work due the high number of parameters that have to be measured.
A simple Introduction to Algorithmic FairnessPaolo Missier
Algorithmic bias and its effect on Machine Learning models.
Simple fairness metrics and how to achieve them by fixing either the data, the model, or both
AI in Java - MCP in Action, Langchain4J-CDI, SmallRye-LLM, Spring AIBuhake Sindi
This is the presentation I gave with regards to AI in Java, and the work that I have been working on. I've showcased Model Context Protocol (MCP) in Java, creating server-side MCP server in Java. I've also introduced Langchain4J-CDI, previously known as SmallRye-LLM, a CDI managed too to inject AI services in enterprise Java applications. Also, honourable mention: Spring AI.
Managing Geospatial Open Data Serverlessly [AWS Community Day CH 2025]Chris Bingham
At the AWS Community Day 2025 in Dietlikon I presented a journey through the technical successes, service issues, and open-source perils that have made up the paddelbuch.ch story. With the goal of a zero-ops, (nearly) zero-cost system, serverless was the apparent technology approach. However, this was not without its ups and downs!
AI stands for Artificial Intelligence.
It refers to the ability of a computer system or machine to perform tasks that usually require human intelligence, such as:
thinking,
learning from experience,
solving problems, and
making decisions.
Building Connected Agents: An Overview of Google's ADK and A2A ProtocolSuresh Peiris
Google's Agent Development Kit (ADK) provides a framework for building AI agents, including complex multi-agent systems. It offers tools for development, deployment, and orchestration.
Complementing this, the Agent2Agent (A2A) protocol is an open standard by Google that enables these AI agents, even if from different developers or frameworks, to communicate and collaborate effectively. A2A allows agents to discover each other's capabilities and work together on tasks.
In essence, ADK helps create the agents, and A2A provides the common language for these connected agents to interact and form more powerful, interoperable AI solutions.
Breaking it Down: Microservices Architecture for PHP Developerspmeth1
Transitioning from monolithic PHP applications to a microservices architecture can be a game-changer, unlocking greater scalability, flexibility, and resilience. This session will explore not only the technical steps but also the transformative impact on team dynamics. By decentralizing services, teams can work more autonomously, fostering faster development cycles and greater ownership. Drawing on over 20 years of PHP experience, I’ll cover essential elements of microservices—from decomposition and data management to deployment strategies. We’ll examine real-world examples, common pitfalls, and effective solutions to equip PHP developers with the tools and strategies needed to confidently transition to microservices.
Key Takeaways:
1. Understanding the core technical and team dynamics benefits of microservices architecture in PHP.
2. Techniques for decomposing a monolithic application into manageable services, leading to more focused team ownership and accountability.
3. Best practices for inter-service communication, data consistency, and monitoring to enable smoother team collaboration.
4. Insights on avoiding common microservices pitfalls, such as over-engineering and excessive interdependencies, to keep teams aligned and efficient.
MuleSoft RTF & Flex Gateway on AKS – Setup, Insights & Real-World TipsPatryk Bandurski
This presentation was delivered during the Warsaw MuleSoft Meetup in April 2025.
Paulina Uhman (PwC Polska) shared her hands-on experience running MuleSoft Runtime Fabric (RTF) and Flex Gateway on Azure Kubernetes Service (AKS).
The deck covers:
What happens after installation (pods, services, and artifacts demystified)
Shared responsibility model: MuleSoft vs Kubernetes
Real-world tips for configuring connectivity
Key Kubernetes commands for troubleshooting
Lessons learned from practical use cases
🎙️ Hosted by: Patryk Bandurski, MuleSoft Ambassador & Meetup Leader
💡 Presented by: Paulina Uhman, Integration Specialist @ PwC Polska
AI Unboxed - How to Approach AI for Maximum ReturnMerelda
Keynote for a client.
In this session, Merelda addressed common misconceptions about AI technologies, particularly the confusion between Predictive AI and Generative AI, and provided clarity on when to use each. Predictive AI analyzes historical data to forecast future outcomes, while Generative AI creates new content, from text to images, rapidly. Understanding the differences between these technologies is crucial for making informed, strategic decisions.
She introduced the three archetypes of AI adoption: Takers, Shapers, and Makers, inviting the audience to identify which role their organisation plays. Based on these archetypes, she presented industry-specific examples relevant to Schauenburg’s portfolio, showcasing how Predictive AI can drive operational efficiency (e.g., predicting equipment maintenance), while Generative AI enhances customer interactions (e.g., generating technical documents).
The session received a 10/10 rating from attendees for its practical insights and immediate applicability.
NVIDIA’s Enterprise AI Factory and Blueprints_ Paving the Way for Smart, Scal...derrickjswork
In a landmark step toward making autonomous AI agents practical and production-ready for enterprises, NVIDIA has launched the Enterprise AI Factory validated design and a set of AI Blueprints. This initiative is a critical leap in transitioning generative AI from experimental projects to business-critical infrastructure.
Designed for CIOs, developers, and AI strategists alike, these new offerings provide the architectural backbone and application templates necessary to build AI agents that are scalable, secure, and capable of complex reasoning — all while being deeply integrated with enterprise systems.
Automating Call Centers with AI Agents_ Achieving Sub-700ms Latency.docxIhor Hamal
Automating customer support with AI-driven agents fundamentally involves integrating Speech-to-Text (STT), Large Language Models (LLM), and Text-to-Speech (TTS). However, simply plugging these models together using their standard APIs typically results in high latency, often 2-3 seconds, which is inadequate for smooth, human-like interactions. After three years of deep-diving into automation in SapientPro, I've identified several crucial strategies that reduce latency to below 700 milliseconds, delivering near-human conversational speed.
Fully Open-Source Private Clouds: Freedom, Security, and ControlShapeBlue
In this presentation, Swen Brüseke introduced proIO's strategy for 100% open-source driven private clouds. proIO leverage the proven technologies of CloudStack and LINBIT, complemented by professional maintenance contracts, to provide you with a secure, flexible, and high-performance IT infrastructure. He highlighted the advantages of private clouds compared to public cloud offerings and explain why CloudStack is in many cases a superior solution to Proxmox.
--
The CloudStack European User Group 2025 took place on May 8th in Vienna, Austria. The event once again brought together open-source cloud professionals, contributors, developers, and users for a day of deep technical insights, knowledge sharing, and community connection.
Apache CloudStack 101 - Introduction, What’s New and What’s ComingShapeBlue
This session provided an introductory overview of CloudStack, covering its core features, architecture, and practical use cases. Attendees gained insights into how CloudStack simplifies cloud orchestration, supports multiple hypervisors, and integrates seamlessly with existing IT infrastructures.
--
The CloudStack European User Group 2025 took place on May 8th in Vienna, Austria. The event once again brought together open-source cloud professionals, contributors, developers, and users for a day of deep technical insights, knowledge sharing, and community connection.
repeatedly a knowledge acquisition algorithm. Rough Set Theory (RST) [8, 9, 10], a mathematical approach to imperfect knowledge, helps to find static as well as dynamic reducts. Dynamic reducts perform better on very large datasets and effectively improve the ability to accommodate noisy data. The problem of attribute reduction for incremental data falls under the class of online algorithms and hence demands a dynamic solution that reduces re-computation. Liu [11] developed an algorithm for finding the smallest attribute set of dynamic reducts under incremental data. Wang and Wang [12] proposed a distributed algorithm of attribute reduction based on the discernibility matrix and function. Zheng et al. [13] presented an incremental algorithm based on the positive region for the generation of dynamic reducts. Deng [14] presented a method of attribute reduction by voting in a series of decision subsystems for the generation of dynamic reducts. Bazan et al. [15] presented the concept of dynamic reducts to solve the problem of large amounts of data or incremental data.
In the proposed method, a novel heuristic approach is used to find a dynamic reduct of the incremental dataset using the concepts of Rough Set Theory. To model dynamic data, a sample dataset is divided into two subsets, one treated as the old dataset and the other as the new dataset. Using the concepts of discernibility matrix and attribute dependency of Rough Set Theory, a reduct is computed from the old dataset. Then, to handle the new or incremental data, the previously computed reduct is modified wherever changes are necessary, generating a dynamic reduct for the entire system. The details of the algorithm are provided in the subsequent sections.
The rest of the paper is organized as follows: the basic concepts of Rough Set Theory are described in Section 2. Section 3 demonstrates the process of generation of the dynamic reduct, and Section 4 shows the experimental results of the proposed method. Finally, the conclusion of the paper is stated in Section 5.
2. BASIC CONCEPTS OF ROUGH SET THEORY
Rough set theory is based on indiscernibility relations and approximations. The indiscernibility relation is usually assumed to be an equivalence relation, interpreted so that two objects are equivalent if they are not distinguishable by their properties. Given a decision system DS = (U, A, C, D), where U is the universe of discourse and A is the set of all attributes, the system consists of two types of attributes, namely conditional attributes (C) and decision attributes (D), so that A = C ∪ D. Let the universe U = {x1, x2, ..., xn}; then with any P ⊆ A there is an associated P-indiscernibility relation IND(P), defined by equation (1).
If (x, y) ∈ IND(P), then x and y are indiscernible with respect to the attribute set P. These indistinguishable sets of objects therefore define an indiscernibility relation, referred to as the P-indiscernibility relation, and the equivalence class of an object x is denoted by [x]P.
The lower approximation of a target set X with respect to P is the set of all objects which certainly belong to X, as defined by equation (2). The upper approximation of the target set X with respect to P is the set of all objects which can possibly belong to X, as defined by equation (3).
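Equations (1)-(3) appear only as images in the source; under the usual Rough Set Theory definitions (an assumption, since the originals are not recoverable from the text) they take the following form:

\[ \mathrm{IND}(P) = \{\, (x, y) \in U \times U : a(x) = a(y)\ \text{for all}\ a \in P \,\} \tag{1} \]
\[ \underline{P}X = \{\, x \in U : [x]_P \subseteq X \,\} \tag{2} \]
\[ \overline{P}X = \{\, x \in U : [x]_P \cap X \neq \emptyset \,\} \tag{3} \]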
As rough set theory models dissimilarities of objects based on the notions of discernibility, a
discernibility matrix is constructed to represent the family of discernibility relations. Each cell in
a discernibility matrix consists of all the attributes on which the two objects have different values. Two objects are discernible with respect to a set of attributes if that set has a non-empty intersection with the corresponding cell of the discernibility matrix.
(a) Discernibility Matrix and Core
Given a decision system DS = (U, A, C, D), where U is the universe of discourse and A is the set of all attributes, the system consists of two types of attributes, namely conditional attributes (C) and decision attributes (D), so that A = C ∪ D. Let the universe U = {x1, x2, ..., xn}; then the discernibility matrix M = (mij) is a |U| × |U| matrix in which the element mij for an object pair (xi, xj) is defined by (4), where i, j = 1, 2, 3, ..., n.
Thus, each entry (i, j) of the matrix M contains the attributes which distinguish objects i and j. So, if an entry contains a single attribute, say As, it implies that this attribute is by itself sufficient to distinguish the two objects, and it is therefore considered a most important, or core, attribute. In reality, several entries may contain a single attribute; their union is known as the core CR of the dataset, as defined in (5).
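Equations (4) and (5) are likewise images in the source; a standard formulation consistent with the surrounding text (decision-relative variants additionally require the decisions of xi and xj to differ) is:

\[ m_{ij} = \{\, a \in C : a(x_i) \neq a(x_j) \,\}, \qquad i, j = 1, 2, \ldots, n \tag{4} \]
\[ CR = \{\, a \in C : \exists\, i, j \ \text{such that}\ m_{ij} = \{a\} \,\} \tag{5} \]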
(b) Attribute Dependency and Reduct
One of the most important aspects of database analysis or data acquisition is the discovery of attribute dependencies, that is, establishing which variables are strongly related to which other variables. In rough set theory, the notion of dependency is defined very simply. Assume two (disjoint) sets of attributes, P and Q, and inquire what degree of dependency holds between them. Each attribute set induces an (indiscernibility) equivalence class structure: the equivalence classes induced by P are denoted [x]P, and the equivalence classes induced by Q are denoted [x]Q. Then the dependency of attribute set Q on attribute set P is denoted by γP(Q) and is given by equation (6), where Qi is a class of objects in [x]Q, ∀ i = 1, 2, …, N.
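Equation (6) is an image in the source; the dependency degree is conventionally written as the fraction of objects whose P-class lies entirely inside some Q-class (the positive region), which is the form assumed here:

\[ \gamma_P(Q) = \frac{\sum_{i=1}^{N} \left| \underline{P}\,Q_i \right|}{|U|} \tag{6} \]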
A reduct can be thought of as a sufficient set of attributes to represent the category structure and the decision system. Projected on just these attributes, the decision system possesses the same equivalence class structure as that expressed by the full attribute set. Taking the partition induced by the decision attribute D as the target class and R as the minimal attribute set, R is called a reduct if it satisfies (7). In other words, R is a reduct if the dependency of the decision attribute D on R is exactly equal to the dependency of D on the whole conditional attribute set C.
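Equation (7), the reduct condition referred to above, is simply the requirement that the dependency of D is preserved by the reduced attribute set:

\[ \gamma_R(D) = \gamma_C(D), \qquad R \subseteq C \tag{7} \]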
The reduct of an information system is not unique. There may be many subsets of attributes
which preserve the equivalence-class structure (i.e., the knowledge) expressed in the decision
system.
(c) Attribute Significance: The significance of an attribute a in a decision table A = (U, C ∪ D) (with the decision set D) can be evaluated by measuring the effect of removing the attribute a ∈ C from the attribute set C on the positive region. The number γ(C, D) expresses the degree of dependency between attributes C and D. If attribute a is removed from the attribute set C, the value of γ(C, D) changes. The significance of an attribute a is therefore defined by equation (8).
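Equation (8) is an image in the source; a commonly used normalized form of attribute significance, assumed here (the unnormalized difference is also found in the literature), is:

\[ \sigma_{(C,D)}(a) = \frac{\gamma(C, D) - \gamma(C \setminus \{a\}, D)}{\gamma(C, D)} \tag{8} \]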
(d) Dynamic Reduct: The purpose of dynamic reducts is to obtain stable reducts from decision subsystems. Dynamic reducts can be defined as follows.
Definition 1: If DS = (U, A, d) is a decision system, then any system DT = (Uʹ, A, d) such that Uʹ ⊆ U is called a subsystem of DS. By P(DS) we denote the set of all subsystems of DS. Let DS = (U, A, d) be a decision system and F ⊆ P(DS). By DR(DS, F) we denote the set RED(DS) ∩ ⋂DT∈F RED(DT). Any element of DR(DS, F) is called an F-dynamic reduct of DS.
So, from the definition of dynamic reducts, it follows that a relative reduct of DS is dynamic if it is also a reduct of all sub-tables from a given family F.
Definition 2: Let DS = (U, A, d) be a decision system and F ⊆ P(DS). By GDR(DS, F) we denote the set ⋂DT∈F RED(DT). Any element of GDR(DS, F) is called an F-generalized dynamic reduct of DS. From the above definition of generalized dynamic reducts it follows that any subset of A is a generalized dynamic reduct if it is also a reduct of all sub-tables from a given family F.
Computing all reducts is computationally hard (finding a minimal reduct is NP-hard). Moreover, the intersection of all reducts of the subsystems may be empty. This notion can therefore be too restrictive, so a more general notion of dynamic reducts is used: the (F, ε)-dynamic reducts, where ε > 0. The set DRε(DS, F) of all (F, ε)-dynamic reducts is defined by the expression given below.
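The defining expression is missing from the extracted text; following the formulation of Bazan et al. [15], which the surrounding definitions appear to track, the set of (F, ε)-dynamic reducts can be written as:

\[ DR_{\varepsilon}(DS, F) = \left\{\, R \in RED(DS) : \frac{\left| \{\, DT \in F : R \in RED(DT) \,\} \right|}{|F|} \geq 1 - \varepsilon \,\right\} \]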
3. DYNAMIC REDUCT GENERATION USING ROUGH SET THEORY
Various concepts of rough set theory, such as the discernibility matrix, attribute significance and attribute dependency, are applied together to compute dynamic reducts of a decision system. The term dynamic reduct is used in the sense that the method computes a set of reducts for the incremental data very quickly, without unnecessarily increasing the complexity, and these reducts are sufficient to represent the system and its subsystems. Based on the discernibility matrix M and the frequency values of the attributes, the attributes are divided [16] into the core set CR and the non-core set NC for the old subsystem DSold. Next, the highest ranked element of NC is added to the core CR in each iteration, provided the dependency of the decision attribute D on the resultant set increases for the old subsystem; otherwise it is ignored and the next iteration proceeds with the remaining elements of NC. The process terminates when the resultant set satisfies the condition of equation (7) for the old subsystem, and this set is taken as the initial reduct RED_OLD. Then a backward attribute removal process is applied: for each non-core attribute x in the generated reduct RED_OLD, it is checked whether (7) is satisfied using RED_OLD − {x} instead of R. If it is, then x is redundant and is removed. Thus all redundant attributes are removed and the final reduct RED_OLD is obtained.
To generate the dynamic reduct, the discernibility matrix is constructed for the new subsystem DSnew and the frequency values of all conditional attributes are calculated. The previously computed reduct RED_OLD of the old dataset is then applied to the new dataset to check whether it preserves the positive region there, i.e., whether the dependency of the decision attribute on that reduct set equals the dependency of the decision attribute on the whole conditional attribute set. If the condition is satisfied, that reduct set is taken as the dynamic reduct DRED. Otherwise, according to the frequency values of the conditional attributes obtained using [16], the highest ranked attribute is added to the most important attribute set in each iteration, provided the attribute dependency of the resultant set increases; a reduct is thus formed after a certain number of iterations, when the dependency of the decision attribute on the resultant set equals that of the decision attribute on the whole conditional attribute set for the new subsystem. Then the backward attribute removal process is applied to generate the final dynamic reduct of the system. In this process, the significance value of each individual attribute in the reduct, except the most important attribute set, is calculated using equation (8). If the significance value of a particular attribute is zero, that attribute is deleted from the reduct. In this way all redundant attributes are removed, and the dynamic reduct for the entire data is finally generated by modifying the old reduct.
The proposed method thus describes an attribute selection procedure that computes a reduct from the old data and the dynamic reduct set DRED for the entire data, taking incremental data into account. Algorithm1 generates the initial reduct for the old decision system DSold = (U, A, C, D), and Algorithm2 generates the dynamic reduct for the entire data, considering the old as well as the incremental data.
Algorithm1: Initial_Reduct_Formation (DSold, CR, NC)
Input: DSold, the old decision system with conditional attributes C and decision attribute D over objects x; CR, the core; and NC, the non-core attributes
Output: RED_OLD, initial reduct
Begin
RED_OLD = CR /* core is considered as initial reduct*/
NC_OLD = NC /* take a copy of initial elements of NC*/
/*Repeat-until below forward selection to give one reduct*/
Repeat
    x = highest ranked element of NC_OLD
    If (x = φ) break /* no element left in NC_OLD */
    NC_OLD = NC_OLD - {x} /* x is examined once, whether added or ignored */
    If (γRED_OLD ∪ {x}(D) > γRED_OLD(D))
        RED_OLD = RED_OLD ∪ {x}
Until (γRED_OLD(D) = γC(D))
// apply backward removal
For each x in (RED_OLD – CR)
If (γRED_OLD - {x}(D) = γC(D))
RED_OLD = RED_OLD - {x}
Return (RED_OLD);
End
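To make the procedure concrete, the following is a minimal Python sketch of Algorithm1; it is not the authors' code. It assumes the decision table is given as a list of dictionaries mapping attribute names to values, computes the dependency degree γ of equation (6) through equivalence classes, and performs the forward selection and backward removal described above. The ranking of non-core attributes by discernibility-matrix frequency [16] is abstracted as an ordering supplied by the caller.

from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]  # one object: attribute name -> value

def partition(rows: List[Row], attrs: Sequence[str]) -> List[Set[int]]:
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    classes: Dict[Tuple, Set[int]] = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def gamma(rows: List[Row], cond: Sequence[str], dec: str) -> float:
    """Dependency degree gamma_cond({dec}): size of the positive region / |U|."""
    if not cond:
        return 0.0
    dec_classes = partition(rows, [dec])
    pos = 0
    for block in partition(rows, cond):
        # a condition class contributes only if it lies entirely inside one decision class
        if any(block <= d for d in dec_classes):
            pos += len(block)
    return pos / len(rows)

def initial_reduct(rows: List[Row], cond: List[str], dec: str,
                   core: List[str], ranked_noncore: List[str]) -> List[str]:
    """Forward selection from the core followed by backward removal (Algorithm1 sketch)."""
    full = gamma(rows, cond, dec)
    red = list(core)
    candidates = list(ranked_noncore)              # highest ranked first
    while gamma(rows, red, dec) < full and candidates:
        x = candidates.pop(0)                      # each candidate is examined once
        if gamma(rows, red + [x], dec) > gamma(rows, red, dec):
            red.append(x)
    for x in [a for a in red if a not in core]:    # backward removal of redundant attributes
        if gamma(rows, [a for a in red if a != x], dec) == full:
            red.remove(x)
    return red

A call such as initial_reduct(rows, cond_attrs, 'class', core, ranked_noncore), with hypothetical names for the inputs, returns the initial reduct RED_OLD; core and ranked_noncore would come from the discernibility-matrix analysis [16] of the old subsystem.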
Algorithm2: Dynamic_Reduct_Formation (DS, C, D)
//An algorithm for computation of dynamic reducts for incremental data
Input: DS = {DSnew}, the new decision system with conditional attributes C and decision attribute D, and the reduct RED_OLD obtained from Algorithm1 for the old dataset (DSold)
Output: Dynamic reduct DRED, a reduct of DSold ∪ DSnew
Begin
If (γRED_OLD(D) = γC(D))
{
DRED = RED_OLD
Return DRED
}
Else {
    NC = C - RED_OLD
    CR = RED_OLD /* old reduct is treated as the core of the new system */
    DRED = RED_OLD
    Repeat
        x = highest frequency attribute of NC
        If (x = φ) break /* no candidate left in NC */
        NC = NC - {x} /* x is examined once, whether added or ignored */
        If (γDRED ∪ {x}(D) > γDRED(D))
            DRED = DRED ∪ {x}
    Until (γDRED(D) = γC(D))
// apply backward removal
For each attribute x in (DRED – CR), taken in decreasing order of significance computed using (8)
If (γDRED - {x}(D) = γC(D))
DRED = DRED - {x}
Return (DRED);
} /* end of else*/
End
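A corresponding, purely illustrative sketch of Algorithm2 is given below. It reuses the Row type and the gamma helper defined in the previous sketch. As in the pseudocode, the dependency checks are carried out on the new subsystem, and the old reduct plays the role of the core that is first extended and then pruned.

def dynamic_reduct(new_rows: List[Row], cond: List[str], dec: str,
                   red_old: List[str], ranked_new: List[str]) -> List[str]:
    """Algorithm2 sketch: adapt RED_OLD when new data arrives.

    ranked_new orders the remaining attributes by their discernibility-matrix
    frequency in the new subsystem [16]; gamma is the helper defined above.
    """
    full = gamma(new_rows, cond, dec)
    if gamma(new_rows, red_old, dec) == full:      # old reduct still preserves the positive region
        return list(red_old)
    dred = list(red_old)                           # old reduct acts as the core
    candidates = [a for a in ranked_new if a not in dred]
    while gamma(new_rows, dred, dec) < full and candidates:
        x = candidates.pop(0)                      # each candidate is examined once
        if gamma(new_rows, dred + [x], dec) > gamma(new_rows, dred, dec):
            dred.append(x)
    for x in [a for a in dred if a not in red_old]:    # backward removal of redundant attributes
        if gamma(new_rows, [a for a in dred if a != x], dec) == full:
            dred.remove(x)
    return dred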
4. EXPERIMENTAL RESULTS
The method is applied on some benchmark datasets obtained from the UCI repository (http://www.ics.uci.edu/mlearn/MLRepository). The wine dataset contains 178 instances and 13 conditional attributes. The attributes are abbreviated by the letters A, B, and so on, according to their column position in the dataset. For the computation of the dynamic reduct, the wine dataset is divided into two sub-tables, randomly taking 80% of the data as old data and the remaining 20% as new data. A reduct is calculated for the old data using Algorithm1. Then, based on the previous reduct, the proposed algorithm works on the new data and generates two dynamic reducts {{ABCGJLM}, {ABIJKLM}} for the whole dataset. Similarly, dynamic reducts are calculated for the Heart and Zoo datasets. Reducts are also calculated for the modified datasets using the static data approach. All results are given in Table 1. Classification accuracies of the reducts produced by the proposed algorithm (PRP) are calculated and compared with existing attribute reduction techniques, namely Correlation-based Feature Selection (CFS) and Consistency-based Subset Evaluation (CSE), from the WEKA tool [17], as shown in Table 2. The proposed method, on average, selects fewer attributes than CFS and CSE and at the same time achieves higher accuracy, which shows the effectiveness of the method.
Table 1. Dynamic reducts of datasets
Table 2. Classification accuracy of reducts obtained by proposed and existing method
5. CONCLUSION
The paper describes a new method of attribute reduction for incremental data using the concepts of Rough Set Theory. Even if the data is not completely available at one time, i.e., it keeps arriving or increasing, the algorithm can find the reduct of such data without recomputing over the data that has already arrived. The proposed dimension reduction method uses only the concepts of rough set theory and requires no additional information beyond the decision system itself. Since reduct generation is an NP-hard problem, different researchers use different heuristics to compute the reducts used for developing classifiers. Dynamic reducts are very important for the construction of a strong classifier. A future enhancement of this work is the formation of classifiers from dynamic reduct sets, and finally their ensemble, to generate an efficient classifier.
REFERENCES
[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 2001.
[2] V. E. Ferraggine, J. H. Doorn and L. C. Rivero, Handbook of Research on Innovations in Database Technologies and Applications: Current and Future Trends, ISBN-10: 1605662429, ISBN-13: 978-1605662428.
[3] P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach, Englewood Cliffs, NJ: Prentice Hall, 1982.
[4] S. Della Pietra, V. Della Pietra and J. Lafferty, "Inducing features of random fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4), pp. 380-393, 1997.
[5] R. Jensen and Q. Shen, "Fuzzy-rough attribute reduction with application to web categorization," Fuzzy Sets and Systems, Vol. 141, No. 3, pp. 469-485, 2004.
[6] N. Zhong and A. Skowron, "A rough set-based knowledge discovery process," International Journal of Applied Mathematics and Computer Science, 11(3), pp. 603-619, 2001.
[7] E. Alpaydin, Introduction to Machine Learning, PHI, 2010.
[8] Z. Pawlak, "Rough sets," International Journal of Information and Computer Sciences, Vol. 11, pp. 341-356, 1982.
[9] Z. Pawlak, "Rough set theory and its applications to data analysis," Cybernetics and Systems, 29, pp. 661-688, 1998.
[10] K. Thangavel and A. Pethalakshmi, "Dimensionality reduction based on rough set theory: a review," Journal of Applied Soft Computing, Vol. 9, Issue 1, pp. 1-12, 2009.
[11] Z. T. Liu, "An incremental arithmetic for the smallest reduction of attributes," Acta Electronica Sinica, Vol. 27, No. 11, pp. 96-98, 1999.
[12] J. Wang and J. Wang, "Reduction algorithms based on discernibility matrix: the ordered attributes method," Journal of Computer Science and Technology, Vol. 16, No. 6, pp. 489-504, 2001.
[13] G. Y. Wang, Z. Zheng and Y. Zhang, "RIDAS - a rough set based intelligent data analysis system," Proceedings of the 1st International Conference on Machine Learning and Cybernetics, Beijing, Vol. 2, Feb. 2002, pp. 646-649.
[14] D. Deng, D. Yan and J. Wang, "Parallel reducts based on attribute significance," LNAI 6401, pp. 336-343, 2010.
[15] J. G. Bazan, "Dynamic reducts and statistical inference," Proceedings of the 6th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), Granada, Spain, Vol. 2, 1996, pp. 1147-1152.
[16] A. K. Das, S. Chakrabarty and S. Sengupta, "Formation of a compact reduct set based on discernibility relation and attribute dependency of rough set theory," Proceedings of the Sixth International Conference on Information Processing, August 10-12, 2012, Bangalore, Wireless Networks and Computational Intelligence, Springer, pp. 253-261.
[17] WEKA: Machine Learning Software, http://www.cs.waikato.ac.nz/~ml/