This document discusses and compares the performance of four rule-based classification algorithms (Decision Table, One R, PART, and Zero R) on different datasets using the WEKA data mining tool. It first provides background on classification and rule-based classification in data mining. It then describes the four algorithms and the experimental process used to implement them in WEKA, evaluate their performance based on accuracy, number of correct/incorrect predictions, and execution time, and analyze the results.
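The two simplest of the four algorithms can be sketched in a few lines. The following is a minimal illustration, not WEKA's implementation: ZeroR always predicts the majority class, and OneR keeps the single attribute whose one-level rules make the fewest training errors. The toy weather data and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def zero_r(labels):
    """ZeroR: ignore every attribute and always predict the majority class."""
    return Counter(labels).most_common(1)[0][0]


def one_r(rows, labels):
    """OneR: build one rule set per attribute and keep the attribute
    whose rules make the fewest errors on the training data."""
    best = None
    for col in range(len(rows[0])):
        # Count class labels for every value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[col]][label] += 1
        # Each attribute value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[col]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (col, rule, errors)
    return best


# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"),
        ("rainy", "yes"), ("overcast", "no"), ("overcast", "yes")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

majority = zero_r(labels)
col, rule, errors = one_r(rows, labels)
print(majority, col, rule, errors)
```

ZeroR is useful mainly as a baseline: any classifier that cannot beat the majority-class accuracy has learned nothing from the attributes.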
For the agriculture sector, detecting and identifying plant diseases at an early stage is extremely important and
still very challenging. Machine learning is an application of AI that helps achieve this purpose effectively. It
uses a group of algorithms to analyze and interpret data and learn from it, so that smart decisions can be
made. For this project, a dataset containing healthy and diseased plant leaf images is
used, and image processing is applied to extract features from the images. We then model this dataset with
different machine learning algorithms such as Random Forest, Support Vector Machine, and Naïve Bayes. The aim is
to carry out a comparative study to identify which of these algorithms predicts diseases with the highest
accuracy. We compare factors such as precision, accuracy, error rate, and prediction time across the
machine learning algorithms. From these comparisons, valuable conclusions can be drawn for this project.
IRJET-Comparison between Supervised Learning and Unsupervised Learning (IRJET Journal)
This document compares supervised and unsupervised learning models in artificial neural networks. It describes how supervised learning uses labeled training data to learn relationships between inputs and outputs, while unsupervised learning identifies hidden patterns in unlabeled data. The document outlines techniques for each like classification and regression for supervised learning, and clustering and density estimation for unsupervised learning. It then presents experiments applying multilayer perceptron (supervised) and k-means clustering (unsupervised) to datasets, finding unsupervised learning had higher accuracy. The document concludes unsupervised learning is favored for this task but supervised learning remains useful for problems with labeled data.
MACHINE LEARNING ALGORITHMS FOR HETEROGENEOUS DATA: A COMPARATIVE STUDY (IAEME Publication)
In the present digital era, massive amounts of data are being continuously generated
at exceptional and increasing scales. This data has become an important and
indispensable part of every economy, industry, organization, business and individual.
Handling these large datasets is one of the major challenges because of the
heterogeneity in their formats. There is a need for efficient data processing techniques to
handle the heterogeneous data and also to meet the computational requirements of
processing this huge volume of data. The objective of this paper is to review, describe
and reflect on heterogeneous data and the complexity of processing it, and also on the use
of machine learning algorithms, which play a major role in data analytics.
Comparative Analysis: Effective Information Retrieval Using Different Learnin... (RSIS International)
Information retrieval is the activity of searching for meaningful information in a collection of information resources such as documents, relational databases and the World Wide Web. An information retrieval system mainly consists of two phases: storing indexed documents and retrieving relevant results. Retrieving information effectively from huge data stores requires machine learning. The objective of machine learning is to instruct computers to use data or past experience to solve a given problem. Machine learning has a number of applications, including classifiers trained on email messages to distinguish between spam and non-spam messages, systems that analyze past sales data to predict customer buying behavior, and fraud detection. Machine learning can be applied as association analysis through supervised learning, unsupervised learning and reinforcement learning. The goal of these three kinds of learning is to provide an effective way of retrieving information from a data warehouse while avoiding problems such as ambiguity. This study compares the strengths and weaknesses of these learning approaches.
IRJET - Comparative Analysis of GUI based Prediction of Parkinson Disease usi... (IRJET Journal)
This document describes a comparative analysis of GUI-based machine learning approaches for predicting Parkinson's disease. It analyzes various machine learning algorithms including logistic regression, decision trees, support vector machines, random forests, k-nearest neighbors, and naive Bayes. The document discusses data preprocessing techniques like variable identification, data validation, cleaning and preparing. It also covers data visualization and evaluating model performance using accuracy calculations. The goal is to compare the performance of these machine learning algorithms and identify the approach that predicts Parkinson's disease with the highest accuracy based on a given hospital dataset.
This document presents a comparison of four reference papers on automated question paper generation systems. It summarizes each paper's domain, algorithms, techniques, implementation platforms, advantages, and disadvantages. The key papers implemented question paper generation on Android, used algorithms like randomization and verification, and implemented systems using techniques like fuzzy logic, cloze question generation, and taxonomy levels. Platforms included Android, Java, and cloud computing. Advantages included reduced human effort; limitations included the need to define difficulty levels. The document concludes that different approaches have been used to automatically generate question papers while addressing issues like workload and maintaining quality.
Because lung cancer is difficult to diagnose, it is the most dangerous cancer seen in humans. Early diagnosis increases the survival rate. Predicting lung cancer is the most challenging cancer problem because of the structure of the cells involved, in which most tissues or cells overlap one another. Image processing techniques are now increasingly used for disease diagnosis in the medical field, where time plays an important role. Detecting cancer in time increases patients' survival rate. Many radiologists still use MRI only for assessment of superior sulcus tumors and in cases where invasion of the spinal cord canal is suspected. MRI can detect and stage lung cancer, and the method would be excellent for lung malignancies and other diseases.
IRJET-A Hybrid Intrusion Detection Technique based on IRF & AODE for KDD-CUP ... (IRJET Journal)
This document presents a hybrid intrusion detection technique based on Improved Random Forest (IRF) and Average One-Dependence Estimator (AODE) to address challenges with existing methods for the KDD Cup 99 dataset. The proposed hybrid intrusion detection system aims to improve accuracy, precision, recall and other performance metrics. It uses an ensemble of IRF with bagging and AODE classifiers along with a new sampling method. The performance of the proposed technique is evaluated and compared to existing random forest and SVM methods using metrics like accuracy, detection rate, and false alarm rate. Experimental results demonstrate the proposed hybrid method performs better than existing approaches.
IRJET- Machine Learning Classification Algorithms for Predictive Analysis in ... (IRJET Journal)
This document discusses machine learning classification algorithms and their applications for predictive analysis in healthcare. It provides an overview of data mining techniques like association, classification, clustering, prediction, and sequential patterns. Specific classification algorithms discussed include Naive Bayes, Support Vector Machine, Decision Trees, K-Nearest Neighbors, Neural Networks, and Bayesian Methods. The document examines examples of these algorithms being used for disease diagnosis, prognosis, and healthcare management. It analyzes their predictive performance on datasets for conditions like breast cancer, heart disease, and ICU readmissions. Overall, the document reviews how machine learning techniques can enhance predictive accuracy for various healthcare problems.
IRJET - House Price Prediction using Machine Learning and RPA (IRJET Journal)
This document discusses using machine learning and robotic process automation (RPA) to predict house prices. Specifically, it proposes using the CatBoost algorithm and RPA to extract real-time data for house price prediction. RPA involves using software robots to automate data extraction, while CatBoost will be used to predict prices based on the extracted dataset. The system aims to reduce problems faced by customers by providing more accurate price predictions compared to relying solely on real estate agents. It will extract data using RPA, clean the data, then apply machine learning algorithms like CatBoost to predict house prices based on various attributes.
This document discusses online feature selection (OFS) for data mining applications. It addresses two tasks of OFS: 1) learning with full input, where the learner can access all features to select a subset, and 2) learning with partial input, where only a limited number of features can be accessed for each instance. Novel algorithms are presented for each task, and their performance is analyzed theoretically. Experiments on real-world datasets demonstrate the efficacy of the proposed OFS techniques for applications in computer vision, bioinformatics, and other domains involving high-dimensional sequential data.
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio... (IRJET Journal)
This document presents a novel approach for software defect prediction using dimensionality reduction techniques. The proposed approach uses an artificial neural network to extract features from initial change measures, and then trains a classifier on the extracted features. This is compared to other dimensionality reduction techniques like principal component analysis, linear discriminant analysis, and kernel principal component analysis. Five open source datasets from NASA are used to evaluate the different techniques based on accuracy, F1 score, and area under the receiver operating characteristic curve. The results show that the artificial neural network approach outperforms the other dimensionality reduction techniques, and kernel principal component analysis performs best among those techniques. The document also discusses related work on using machine learning for software defect prediction.
IRJET-Clustering Techniques for Mushroom Dataset (IRJET Journal)
The document evaluates the performance of different clustering algorithms (Expectation Maximization, Farthest First, and K-means) on a mushroom dataset from the UCI machine learning repository. The algorithms are compared based on the number of correctly clustered instances and the time taken to build the model. The mushroom dataset contains 8124 instances with 22 attributes, each classified as an edible or poisonous mushroom. The goal is to group similar mushrooms together using the different clustering techniques in the data mining tool WEKA.
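The farthest-point seeding idea behind one of the three compared clusterers can be illustrated with a short sketch. This is a simplified version under stated assumptions, not WEKA's code: the first centre is simply the first point (chosen for determinism), distances are Euclidean, and the numeric points are hypothetical, whereas the mushroom attributes are categorical and would need a different distance measure.

```python
import math


def farthest_first(points, k):
    """Pick k centres: start from the first point, then repeatedly add the
    point that lies farthest from its nearest already-chosen centre."""
    centres = [points[0]]
    while len(centres) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(nxt)
    return centres


def assign(points, centres):
    """Assign every point to the index of its nearest centre."""
    return [min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            for p in points]


# Hypothetical 2-D points forming two obvious groups.
points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.2, 4.9), (0.1, 0.4)]
centres = farthest_first(points, 2)
clusters = assign(points, centres)
print(centres, clusters)
```

Because the centres are chosen in a single pass and never moved, this kind of seeding is fast to build, which is relevant when algorithms are compared on model-building time.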
EDGE DETECTION IN DIGITAL IMAGE USING MORPHOLOGY OPERATION (IJEEE)
This paper presents a digital image processing method for finding defects in tablets. We use mathematical manipulation to detect defective tablet packets.
IRJET- Breast Cancer Prediction using Deep Learning (IRJET Journal)
This document discusses using deep learning to predict breast cancer based on tumor data. It proposes using a neural network model to classify tumors as malignant or benign. The key steps are:
1. Collecting and preprocessing tumor cell data to remove noise and inconsistencies.
2. Developing a neural network model and training it on labeled training data to learn patterns.
3. Testing the trained model on unlabeled testing data to evaluate its accuracy in classifying tumors.
The goal is to develop an accurate model to help doctors determine the critical condition of patients and classify difficult tumors.
Automatic Query Expansion Using Word Embedding Based on Fuzzy Graph Connectiv... (YogeshIJTSRD)
The aim of information retrieval systems is to retrieve relevant information according to the query provided. The queries are often vague and uncertain. Thus, to improve the system, we propose an Automatic Query Expansion technique that expands the query by adding new terms to the user's initial query so as to minimize query mismatch and thereby improve retrieval performance. Most of the existing techniques for expanding queries do not take into account the degree of semantic relationship among words. In this paper, the query is expanded by exploring terms which are semantically similar to the initial query terms as well as considering the degree of relationship, that is, the “fuzzy membership”, between them. The terms which seem most relevant are used in the expanded query and improve the information retrieval process. The experiments conducted on the query set show that the proposed Automatic Query Expansion approach gave higher precision, recall, and F-measure than non-fuzzy edge weights. Tarun Goyal | Ms. Shalini Bhadola | Ms. Kirti Bhatia "Automatic Query Expansion Using Word Embedding Based on Fuzzy Graph Connectivity Measures" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-5 , August 2021, URL: https://www.ijtsrd.com/papers/ijtsrd45074.pdf Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/45074/automatic-query-expansion-using-word-embedding-based-on-fuzzy-graph-connectivity-measures/tarun-goyal
IRJET - Breast Cancer Risk and Diagnostics using Artificial Neural Network(ANN) (IRJET Journal)
This document describes research using an artificial neural network (ANN) to classify breast cancer as benign or malignant based on the Wisconsin Breast Cancer dataset. The ANN model was trained and tested on 683 instances from the dataset. The model achieved 97.8% accuracy on the training set and 97.5% accuracy on the test set. Various performance metrics including mean absolute error, root mean square error, and kappa statistics were used to evaluate the model, demonstrating low error rates. The ANN model outperformed other classification algorithms in related work and efficiently classified breast cancer with high accuracy and precision.
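The error and agreement measures named above can be computed by hand as follows. This is a minimal sketch using the standard textbook definitions of mean absolute error, root mean square error, and Cohen's kappa; the labels and predicted probabilities are hypothetical and are not taken from the Wisconsin dataset or from the paper's results.

```python
import math


def mean_absolute_error(y_true, y_prob):
    """Average absolute gap between the true label and the predicted score."""
    return sum(abs(t - p) for t, p in zip(y_true, y_prob)) / len(y_true)


def root_mean_square_error(y_true, y_prob):
    """Square root of the average squared gap."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))


def cohen_kappa(y_true, y_pred):
    """Agreement between predictions and truth, corrected for chance.
    Undefined (division by zero) when chance agreement equals 1."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    classes = set(y_true) | set(y_pred)
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    return (observed - expected) / (1 - expected)


# Hypothetical data: 1 = malignant, 0 = benign.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_prob = [0.9, 0.4, 0.2, 0.1, 0.8, 0.3, 0.6, 0.0]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # 0.5 threshold is an assumption

mae = mean_absolute_error(y_true, y_prob)
rmse = root_mean_square_error(y_true, y_prob)
kappa = cohen_kappa(y_true, y_pred)
print(mae, rmse, kappa)
```

Kappa is worth reporting alongside accuracy because it discounts the agreement a classifier would reach by guessing according to the class frequencies alone.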
Efficient decentralized iterative learning tracker for unknown sampled data i... (ISA Interchange)
In this paper, an efficient decentralized iterative learning tracker is proposed to improve the dynamic performance of the unknown controllable and observable sampled-data interconnected large-scale state-delay system, which consists of N multi-input multi-output (MIMO) subsystems, with the closed-loop decoupling property. The off-line observer/Kalman filter identification (OKID) method is used to obtain the decentralized linear models for subsystems in the interconnected large-scale system. In order to overcome the effect of modeling error on the identified linear model of each subsystem, an improved observer with the high-gain property based on the digital redesign approach is developed to replace the observer identified by OKID. Then, the iterative learning control (ILC) scheme is integrated with the high-gain tracker design for the decentralized models. To significantly reduce the iterative learning epochs, a digital-redesign linear quadratic digital tracker with the high-gain property is proposed as the initial control input of ILC. The high-gain property controllers can suppress uncertain errors such as modeling errors, nonlinear perturbations, and external disturbances (Guo et al., 2000) [18]. Thus, the system output can quickly and accurately track the desired reference in one short time interval after all drastically-changing points of the specified reference input with the closed-loop decoupling property.
IRJET - A Survey on Machine Learning Intelligence Techniques for Medical ... (IRJET Journal)
This document discusses machine learning techniques for classifying medical datasets. It provides an overview of various artificial intelligence and machine learning algorithms that have been applied for medical dataset classification, including artificial neural networks, support vector machines, k-nearest neighbors, and decision trees. The document surveys works that have used these techniques for diseases like breast cancer, heart disease, and diabetes. It also describes common pre-processing steps for medical datasets like data normalization and feature selection methods like F-score and PCA that are used to select the most important features for classification. The classification algorithms are then evaluated based on accuracy metrics like sensitivity, specificity, and accuracy.
IRJET - Grape Leaf Diseases Classification using Transfer Learning (IRJET Journal)
This document summarizes a research paper that used transfer learning with the Inception v3 model to classify grape leaf diseases with high accuracy. Specifically:
1. The researchers used the PlantVillage dataset containing over 55,000 images of healthy and diseased grape leaves to train and test their model.
2. They used Inception v3 to extract features from the grape leaf images due to its state-of-the-art performance in image classification tasks.
3. After extracting features with Inception v3, they classified the images using various classifiers like logistic regression, SVM, and neural networks. Logistic regression achieved the highest test accuracy of 99.4%.
Neighborhood search methods with moth optimization algorithm as a wrapper met... (IJECEIAES)
Feature selection methods are used to select a subset of features from data, so that only the useful information is mined from the samples, which yields better accuracy and improves the computational efficiency of the learning model. The Moth-flame Optimization (MFO) algorithm is a population-based approach that simulates the behavior of real moths in nature. One drawback of the MFO algorithm is that the solutions move toward the best solution, so it can easily get stuck in local optima, as we investigate in this paper. We therefore propose an MFO algorithm combined with a neighborhood search method for feature selection problems, in order to keep the MFO algorithm from getting trapped in local optima and to help avoid premature convergence. The neighborhood search method is applied after a predefined number of unimproved iterations (the number of tries that fail to improve the current solution). As a result, the proposed algorithm shows good performance when compared with the original MFO algorithm and with state-of-the-art approaches.
Comparison of Data Mining Techniques used in Anomaly Based IDS (IRJET Journal)
This document discusses anomaly-based intrusion detection systems and compares various data mining techniques used in these systems. It begins by defining intrusion detection systems and the two main categories of misuse detection and anomaly detection. Anomaly detection involves learning normal patterns from data and detecting deviations from these patterns as potential anomalies or intrusions.
The document then examines several data mining techniques used for anomaly detection, including statistical-based approaches like chi-square statistics, and clustering algorithms like k-means, k-medoids, and EM clustering. It notes that these techniques can be applied to intrusion detection to analyze data and detect anomalies representing potential malicious activity. The methodology of anomaly detection is also summarized as involving parameterization of data,
IRJET - Disease Detection in Plant using Machine Learning (IRJET Journal)
This document discusses using machine learning and image processing techniques to detect diseases in plants. The proposed system utilizes convolutional neural networks (CNNs) to classify plant images as either healthy or diseased based on features extracted from the images. The system architecture includes preprocessing the images, extracting color and texture features, running the features through a CNN model for classification training and testing, and outputting whether plants are normal or abnormal. The goal is to help farmers automatically detect plant diseases early on by analyzing images of plant leaves.
IRJET - House Price Predictor using ML through Artificial Neural Network (IRJET Journal)
This document discusses predicting house prices in Bangalore, India using machine learning algorithms like artificial neural networks. The researchers collected data on house features like area, bedrooms, square footage etc. and applied regression techniques like linear regression, decision tree regression and random forest regression. Decision tree regression had the highest accuracy (R-squared value of 0.998) in predicting prices. A web application was developed using the decision tree model to enable real-time house price predictions based on property features. The study aims to more accurately predict prices based on location and neighborhood amenities compared to existing methods.
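For readers unfamiliar with the R-squared value quoted above, the following sketch shows how the coefficient of determination is computed from true and predicted values. It uses the standard definition, 1 minus the ratio of residual to total sum of squares; the prices are hypothetical and unrelated to the Bangalore dataset.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total variation
    return 1 - ss_res / ss_tot


# Hypothetical house prices (arbitrary units) and model predictions.
actual = [50, 60, 70, 80]
predicted = [52, 58, 71, 79]

score = r_squared(actual, predicted)
print(score)
```

A value close to 1 means the predictions explain almost all of the variation in the prices used for the evaluation.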
Identification of important features and data mining classification technique... (IJECEIAES)
Employee absenteeism at work costs organizations billions a year. Predicting employees' absenteeism and the reasons behind their absence helps organizations reduce expenses and increase productivity. Data mining turns the vast volume of human resources data into information that can help in decision-making and prediction. Although the selection of features is a critical step in data mining to enhance the efficiency of the final prediction, it is not yet known which method of feature selection is better. Therefore, this paper aims to compare the performance of three well-known feature selection methods in absenteeism prediction: relief-based feature selection, correlation-based feature selection and information-gain feature selection. In addition, this paper aims to find the best combination of feature selection method and data mining technique for enhancing absenteeism prediction accuracy. Seven classification techniques were used as prediction models. Additionally, a cross-validation approach was used to assess the applied prediction models and obtain more realistic and reliable results. The dataset was built at a courier company in Brazil from records of absenteeism at work. In the experimental results, correlation-based feature selection surpasses the other methods across the performance measurements. Furthermore, the bagging classifier was the best-performing data mining technique when features were selected using correlation-based feature selection, with an accuracy rate of 92%.
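Of the three feature selection methods compared, information gain is the easiest to show in a few lines. The sketch below ranks categorical features by how much they reduce the entropy of the class label. It is an illustration under stated assumptions only: the feature names, values, and labels are hypothetical and do not come from the Brazilian courier dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after
    splitting the data on one categorical feature."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical features and absenteeism labels.
features = {
    "season": ["summer", "summer", "winter", "winter"],
    "weekday": ["mon", "tue", "mon", "tue"],
}
labels = ["high", "high", "low", "low"]

gains = {name: information_gain(vals, labels) for name, vals in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(gains, ranking)
```

A filter method like this scores each feature independently of any classifier, so the top-ranked features can then be passed to whichever prediction model is being evaluated.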
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS (IJCI JOURNAL)
This paper addresses bankruptcy prediction using different machine learning algorithms. Whether a company will go bankrupt is one of the most challenging questions to answer in the 21st century. Bankruptcy is defined as the final stage of failure for a firm. A company declares bankruptcy when it does not have enough funds to pay its creditors. It is a global problem. This paper provides a unique methodology to classify companies as bankrupt or healthy by applying predictive analytics. The prediction model stated in this paper yields better accuracy with standard parameters used for bankruptcy prediction than previously applied prediction methodologies.
Optimization of network traffic anomaly detection using machine learning (IJECEIAES)
In this paper, to optimize the process of detecting cyber-attacks, we propose two main optimization solutions: optimizing the detection method and optimizing the features. Both solutions aim to increase accuracy and reduce the time needed for analysis and detection. For the detection method, we recommend the Random Forest supervised classification algorithm. The experimental results in section 4.1 show that using the Random Forest algorithm for abnormal behavior detection is well founded, because its results are much better than those of some other detection algorithms on all measures. For the feature optimization solution, we propose using data dimensionality reduction techniques such as information gain, principal component analysis, and the correlation coefficient method. The results of our research show that optimizing the cyber-attack detection process does not require advanced algorithms with complex and cumbersome computational requirements; rather, the choice of feature extraction and optimization algorithm, as well as of attack classification and detection algorithms, must depend on the monitoring data.
This document describes a hybrid expert system called GAMBLE that uses a genetic algorithm to help students get admitted to the best engineering program based on their skills and the branch requirements. GAMBLE calculates each student's aptitude for different branches, suggests the most suitable option, and learns over time to improve its recommendations by modifying the branch thresholds and classifier strings based on past student admissions. The system aims to model human learning aspects and help more students get placed in the appropriate engineering programs.
This document presents a study that compares the performance of 10 classification algorithms (Naive Bayes, SMO, KStar, AdaBoostM1, JRip, OneR, PART, J48, LMT, Random Tree) using 3 datasets from the UCI Machine Learning Repository (German credit data, ionosphere data, vote data). The algorithms are tested using the WEKA machine learning tool. The results show that Random Tree and LMT generally have the best predictive performance across the different testing modes and datasets, with Random Tree achieving the highest accuracy on the German credit and vote datasets, and LMT performing best on the ionosphere data.
Predicting performance of classification algorithms (IAEME Publication)
This paper presents a performance comparison study of common classification algorithms using three datasets from the UCI machine learning repository. The algorithms evaluated include Naive Bayes, SMO, KStar, AdaBoostM1, JRip, OneR, PART, J48, LMT, and Random Tree. Each algorithm is evaluated based on accuracy and training time using the WEKA machine learning tool. The goal is to determine which algorithms perform best for a given dataset and identify the optimal size of training data needed.
IRJET- Machine Learning Classification Algorithms for Predictive Analysis in ...IRJET Journal
This document discusses machine learning classification algorithms and their applications for predictive analysis in healthcare. It provides an overview of data mining techniques like association, classification, clustering, prediction, and sequential patterns. Specific classification algorithms discussed include Naive Bayes, Support Vector Machine, Decision Trees, K-Nearest Neighbors, Neural Networks, and Bayesian Methods. The document examines examples of these algorithms being used for disease diagnosis, prognosis, and healthcare management. It analyzes their predictive performance on datasets for conditions like breast cancer, heart disease, and ICU readmissions. Overall, the document reviews how machine learning techniques can enhance predictive accuracy for various healthcare problems.
IRJET - House Price Prediction using Machine Learning and RPAIRJET Journal
This document discusses using machine learning and robotic process automation (RPA) to predict house prices. Specifically, it proposes using the CatBoost algorithm and RPA to extract real-time data for house price prediction. RPA involves using software robots to automate data extraction, while CatBoost will be used to predict prices based on the extracted dataset. The system aims to reduce problems faced by customers by providing more accurate price predictions compared to relying solely on real estate agents. It will extract data using RPA, clean the data, then apply machine learning algorithms like CatBoost to predict house prices based on various attributes.
This document discusses online feature selection (OFS) for data mining applications. It addresses two tasks of OFS: 1) learning with full input, where the learner can access all features to select a subset, and 2) learning with partial input, where only a limited number of features can be accessed for each instance. Novel algorithms are presented for each task, and their performance is analyzed theoretically. Experiments on real-world datasets demonstrate the efficacy of the proposed OFS techniques for applications in computer vision, bioinformatics, and other domains involving high-dimensional sequential data.
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...IRJET Journal
This document presents a novel approach for software defect prediction using dimensionality reduction techniques. The proposed approach uses an artificial neural network to extract features from initial change measures, and then trains a classifier on the extracted features. This is compared to other dimensionality reduction techniques like principal component analysis, linear discriminant analysis, and kernel principal component analysis. Five open source datasets from NASA are used to evaluate the different techniques based on accuracy, F1 score, and area under the receiver operating characteristic curve. The results show that the artificial neural network approach outperforms the other dimensionality reduction techniques, and kernel principal component analysis performs best among those techniques. The document also discusses related work on using machine learning for software defect prediction.
IRJET-Clustering Techniques for Mushroom DatasetIRJET Journal
The document evaluates the performance of different clustering algorithms (Expectation Maximization, Farthest Fast, and K-means) on a mushroom dataset from the UCI machine learning repository. The algorithms are compared based on the number of correctly clustered instances and time taken to build the model. The mushroom dataset contains 8124 instances with 22 attributes classified as edible or poisonous mushrooms. The goal is to group similar mushrooms together using the different clustering techniques in the data mining tool WEKA.
EDGE DETECTION IN DIGITAL IMAGE USING MORPHOLOGY OPERATIONIJEEE
This paper shows a method in digital image processing technique to find the defects in tablets. In this paper we use mathematical manipulation, to detect the defected tablet packet.
IRJET- Breast Cancer Prediction using Deep LearningIRJET Journal
This document discusses using deep learning to predict breast cancer based on tumor data. It proposes using a neural network model to classify tumors as malignant or benign. The key steps are:
1. Collecting and preprocessing tumor cell data to remove noise and inconsistencies.
2. Developing a neural network model and training it on labeled training data to learn patterns.
3. Testing the trained model on unlabeled testing data to evaluate its accuracy in classifying tumors.
The goal is to develop an accurate model to help doctors determine the critical condition of patients and classify difficult tumors.
Automatic Query Expansion Using Word Embedding Based on Fuzzy Graph Connectiv...YogeshIJTSRD
The aim of information retrieval systems is to retrieve relevant information according to the query provided. Queries are often vague and uncertain. To improve the system, we propose an Automatic Query Expansion technique that adds new terms to the user's initial query so as to minimize query mismatch and thereby improve retrieval performance. Most existing query expansion techniques do not take into account the degree of semantic relationship among words. In this paper, the query is expanded by exploring terms that are semantically similar to the initial query terms while also considering the degree of relationship, that is, the "fuzzy membership", between them. The terms that seem most relevant are used in the expanded query to improve the information retrieval process. Experiments conducted on the query set show that the proposed Automatic Query Expansion approach gives higher precision, recall, and F-measure than non-fuzzy edge weights. Tarun Goyal | Ms. Shalini Bhadola | Ms. Kirti Bhatia "Automatic Query Expansion Using Word Embedding Based on Fuzzy Graph Connectivity Measures" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-5 , August 2021, URL: https://www.ijtsrd.com/papers/ijtsrd45074.pdf Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/45074/automatic-query-expansion-using-word-embedding-based-on-fuzzy-graph-connectivity-measures/tarun-goyal
IRJET - Breast Cancer Risk and Diagnostics using Artificial Neural Network(ANN)IRJET Journal
This document describes research using an artificial neural network (ANN) to classify breast cancer as benign or malignant based on the Wisconsin Breast Cancer dataset. The ANN model was trained and tested on 683 instances from the dataset. The model achieved 97.8% accuracy on the training set and 97.5% accuracy on the test set. Various performance metrics including mean absolute error, root mean square error, and kappa statistics were used to evaluate the model, demonstrating low error rates. The ANN model outperformed other classification algorithms in related work and efficiently classified breast cancer with high accuracy and precision.
Efficient decentralized iterative learning tracker for unknown sampled data i...ISA Interchange
In this paper, an efficient decentralized iterative learning tracker is proposed to improve the dynamic performance of the unknown controllable and observable sampled-data interconnected large-scale state-delay system, which consists of N multi-input multi-output (MIMO) subsystems, with the closed-loop decoupling property. The off-line observer/Kalman filter identification (OKID) method is used to obtain the decentralized linear models for subsystems in the interconnected large-scale system. In order to get over the effect of modeling error on the identified linear model of each subsystem, an improved observer with the high-gain property based on the digital redesign approach is developed to replace the observer identified by OKID. Then, the iterative learning control (ILC) scheme is integrated with the high-gain tracker design for the decentralized models. To significantly reduce the iterative learning epochs, a digital-redesign linear quadratic digital tracker with the high-gain property is proposed as the initial control input of ILC. The high-gain property controllers can suppress uncertain errors such as modeling errors, nonlinear perturbations, and external disturbances (Guo et al., 2000) [18]. Thus, the system output can quickly and accurately track the desired reference in one short time interval after all drastically-changing points of the specified reference input with the closed-loop decoupling property.
IRJET - A Survey on Machine Learning Intelligence Techniques for Medical ...IRJET Journal
This document discusses machine learning techniques for classifying medical datasets. It provides an overview of various artificial intelligence and machine learning algorithms that have been applied for medical dataset classification, including artificial neural networks, support vector machines, k-nearest neighbors, and decision trees. The document surveys works that have used these techniques for diseases like breast cancer, heart disease, and diabetes. It also describes common pre-processing steps for medical datasets like data normalization and feature selection methods like F-score and PCA that are used to select the most important features for classification. The classification algorithms are then evaluated based on accuracy metrics like sensitivity, specificity, and accuracy.
IRJET - Grape Leaf Diseases Classification using Transfer LearningIRJET Journal
This document summarizes a research paper that used transfer learning with the Inception v3 model to classify grape leaf diseases with high accuracy. Specifically:
1. The researchers used the PlantVillage dataset containing over 55,000 images of healthy and diseased grape leaves to train and test their model.
2. They used Inception v3 to extract features from the grape leaf images due to its state-of-the-art performance in image classification tasks.
3. After extracting features with Inception v3, they classified the images using various classifiers like logistic regression, SVM, and neural networks. Logistic regression achieved the highest test accuracy of 99.4%.
Neighborhood search methods with moth optimization algorithm as a wrapper met...IJECEIAES
Feature selection methods select a subset of features from the data so that only useful information is mined from the samples, which improves both the accuracy and the computational efficiency of the learning model. The Moth-flame Optimization (MFO) algorithm is a population-based approach that simulates the behavior of real moths in nature. One drawback of MFO, as investigated in this paper, is that its solutions move toward the best solution, so it can easily get stuck in local optima. We therefore propose an MFO algorithm combined with a neighborhood search method for feature selection problems, which helps the algorithm avoid local optima and premature convergence. The neighborhood search is applied after a predefined number of unimproved iterations (the number of tries that fail to improve the current solution). As a result, the proposed algorithm shows good performance when compared with the original MFO algorithm and with state-of-the-art approaches.
Comparison of Data Mining Techniques used in Anomaly Based IDS IRJET Journal
This document discusses anomaly-based intrusion detection systems and compares various data mining techniques used in these systems. It begins by defining intrusion detection systems and the two main categories of misuse detection and anomaly detection. Anomaly detection involves learning normal patterns from data and detecting deviations from these patterns as potential anomalies or intrusions.
The document then examines several data mining techniques used for anomaly detection, including statistical-based approaches like chi-square statistics, and clustering algorithms like k-means, k-medoids, and EM clustering. It notes that these techniques can be applied to intrusion detection to analyze data and detect anomalies representing potential malicious activity. The methodology of anomaly detection is also summarized as involving parameterization of data,
IRJET - Disease Detection in Plant using Machine LearningIRJET Journal
This document discusses using machine learning and image processing techniques to detect diseases in plants. The proposed system utilizes convolutional neural networks (CNNs) to classify plant images as either healthy or diseased based on features extracted from the images. The system architecture includes preprocessing the images, extracting color and texture features, running the features through a CNN model for classification training and testing, and outputting whether plants are normal or abnormal. The goal is to help farmers automatically detect plant diseases early on by analyzing images of plant leaves.
IRJET - House Price Predictor using ML through Artificial Neural NetworkIRJET Journal
This document discusses predicting house prices in Bangalore, India using machine learning algorithms like artificial neural networks. The researchers collected data on house features like area, bedrooms, square footage etc. and applied regression techniques like linear regression, decision tree regression and random forest regression. Decision tree regression had the highest accuracy (R-squared value of 0.998) in predicting prices. A web application was developed using the decision tree model to enable real-time house price predictions based on property features. The study aims to more accurately predict prices based on location and neighborhood amenities compared to existing methods.
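The R-squared value quoted above measures how much of the variance in the observed prices a regression model explains. The following is a minimal sketch of that calculation in plain Python; the price figures and the function name `r_squared` are made up for illustration and are not taken from the study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical prices (arbitrary units), not data from the paper.
actual_prices = [100, 200, 300, 400]
predicted_prices = [110, 190, 310, 390]
score = r_squared(actual_prices, predicted_prices)
print(round(score, 3))
```

A value close to 1 means the predictions track the observed prices closely. A value measured on the training data alone can overstate how well a decision tree generalizes.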
Identification of important features and data mining classification technique...IJECEIAES
Employee absenteeism at work costs organizations billions a year. Predicting employees’ absenteeism and the reasons behind it helps organizations reduce expenses and increase productivity. Data mining turns the vast volume of human resources data into information that can help in decision-making and prediction. Although feature selection is a critical step in data mining for enhancing the efficiency of the final prediction, it is not yet known which feature selection method is better. Therefore, this paper compares the performance of three well-known feature selection methods in absenteeism prediction: relief-based feature selection, correlation-based feature selection, and information-gain feature selection. In addition, this paper aims to find the best combination of feature selection method and data mining technique for enhancing absenteeism prediction accuracy. Seven classification techniques were used as the prediction models, and a cross-validation approach was used to assess them in order to obtain more realistic and reliable results. The dataset was built at a courier company in Brazil from records of absenteeism at work. In the experimental results, correlation-based feature selection surpasses the other methods across the performance measurements. Furthermore, the bagging classifier was the best-performing data mining technique when features were selected using correlation-based feature selection, with an accuracy rate of 92%.
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSIJCI JOURNAL
This paper addresses the prediction of bankruptcy using different machine learning algorithms. Whether a company will go bankrupt is one of the most challenging questions to answer in the 21st century. Bankruptcy is defined as the final stage of failure for a firm. A company declares bankruptcy when it does not have enough funds to pay its creditors. It is a global problem. This paper provides a unique methodology to classify companies as bankrupt or healthy by applying predictive analytics. The prediction model presented in this paper yields better accuracy, with the standard parameters used for bankruptcy prediction, than previously applied prediction methodologies.
Optimization of network traffic anomaly detection using machine learning IJECEIAES
In this paper, to optimize the process of detecting cyber-attacks, we propose two main optimization solutions: optimizing the detection method and optimizing the features. Both aim to increase accuracy and reduce the time needed for analysis and detection. For the detection method, we recommend the Random Forest supervised classification algorithm. The experimental results in section 4.1 show that Random Forest performs much better than some other detection algorithms on all measures for abnormal behavior detection. For feature optimization, we propose using data dimensionality reduction techniques such as information gain, principal component analysis, and the correlation coefficient method. The results show that optimizing the cyber-attack detection process does not require advanced algorithms with complex and cumbersome computational requirements; rather, the feature extraction and optimization algorithm, as well as the attack classification and detection algorithms, should be chosen according to the monitoring data.
This document describes a hybrid expert system called GAMBLE that uses a genetic algorithm to help students get admitted to the best engineering program based on their skills and the branch requirements. GAMBLE calculates each student's aptitude for different branches, suggests the most suitable option, and learns over time to improve its recommendations by modifying the branch thresholds and classifier strings based on past student admissions. The system aims to model human learning aspects and help more students get placed in the appropriate engineering programs.
This document presents a study that compares the performance of 10 classification algorithms (Naive Bayes, SMO, KStar, AdaBoostM1, JRip, OneR, PART, J48, LMT, Random Tree) using 3 datasets from the UCI Machine Learning Repository (German credit data, ionosphere data, vote data). The algorithms are tested using the WEKA machine learning tool. The results show that Random Tree and LMT generally have the best predictive performance across the different testing modes and datasets, with Random Tree achieving the highest accuracy on the German credit and vote datasets, and LMT performing best on the ionosphere data.
Predicting performance of classification algorithmsIAEME Publication
This paper presents a performance comparison study of common classification algorithms using three datasets from the UCI machine learning repository. The algorithms evaluated include Naive Bayes, SMO, KStar, AdaBoostM1, JRip, OneR, PART, J48, LMT, and Random Tree. Each algorithm is evaluated based on accuracy and training time using the WEKA machine learning tool. The goal is to determine which algorithms perform best for a given dataset and identify the optimal size of training data needed.
J48 and JRIP Rules for E-Governance DataCSCJournals
Data are any facts, numbers, or text that can be processed by a computer. Data mining is an analytic process designed to explore data, usually in large amounts, and is often considered to be a blend of statistics. In this paper we use two data mining techniques, J48 and JRIP, for discovering classification rules and generating a decision tree. The data mining tool WEKA is used in this paper.
This document analyzes and compares the performance of various classification algorithms (J48, Random Forest, Multilayer Perceptron, IB1, Decision Table) in predicting student performance using data from 260 students. Random Forest performed the best with 89.23% accuracy, taking the least time to build the model and having the lowest error rates compared to the other algorithms. Attributes like attendance, economic status, and parental education were found to be most important factors influencing student results. The analysis provides insight into how different factors impact student performance.
This document summarizes several major data classification techniques, including decision tree induction, Bayesian classification, rule-based classification, classification by back propagation, support vector machines, lazy learners, genetic algorithms, rough set approach, and fuzzy set approach. It provides an overview of each technique, describing their basic concepts and key algorithms. The goal is to help readers understand different data classification methodologies and which may be suitable for various domain-specific problems.
This document discusses medical data mining and classification techniques. It begins with an introduction to data mining and its applications in healthcare to improve treatment. Medical data mining can help discover patterns in medical data to aid diagnosis. Classification algorithms like decision trees can categorize medical records and help predict outcomes. Specifically, the document discusses the J48 decision tree algorithm available in the WEKA data mining tool, which implements the C4.5 algorithm for classification. Decision trees work by recursively splitting the data into subsets based on attribute values, forming a tree structure. The document concludes that while data mining can help with medical analysis, results from small medical datasets should be interpreted cautiously.
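The recursive splitting described above needs a rule for choosing which attribute to split on at each node. The following is a minimal sketch of one such rule, plain information gain in the style of ID3. C4.5 (and therefore J48) uses the related gain ratio criterion instead, so this illustrates the idea and does not reproduce WEKA's implementation. The toy records and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the records on one attribute."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Attribute with the highest information gain for this node."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records, not real medical data.
rows = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
labels = ["sick", "sick", "healthy", "healthy"]
print(best_split(rows, labels))
```

In this toy set, `fever` separates the two classes perfectly while `cough` carries no information, so the split falls on `fever`. A full tree builder would apply `best_split` again to each resulting subset until the subsets are pure or a stopping rule is met.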
Predicting students' performance using id3 and c4.5 classification algorithmsIJDKP
An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
IRJET- Study and Evaluation of Classification Algorithms in Data MiningIRJET Journal
The document discusses classification algorithms in data mining. It describes classification as a supervised learning technique that predicts categorical class labels. Six classification algorithms are evaluated: Naive Bayes, neural networks, decision trees, random forests, support vector machines, and K-nearest neighbors. The algorithms are evaluated using metrics like accuracy, precision, recall, F1-score and time using the WEKA tool on various datasets. Building accurate and efficient classifiers is an important task in data mining.
UNIT 3: Data Warehousing and Data MiningNandakumar P
UNIT-III Classification and Prediction: Issues Regarding Classification and Prediction – Classification by Decision Tree Introduction – Bayesian Classification – Rule Based Classification – Classification by Back propagation – Support Vector Machines – Associative Classification – Lazy Learners – Other Classification Methods – Prediction – Accuracy and Error Measures – Evaluating the Accuracy of a Classifier or Predictor – Ensemble Methods – Model Selection.
Classification on multi label dataset using rule mining techniqueeSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET Journal
This document discusses classification techniques for data mining. It provides an overview of common classification algorithms including decision trees, k-nearest neighbors (kNN), and Naive Bayes. Decision trees use a top-down approach to classify data based on attribute tests at each node. kNN identifies the k nearest training examples to classify new data points. Naive Bayes assumes independence between attributes and uses Bayes' theorem for classification. The document also discusses how these techniques are used for data cleaning, integration, transformation and knowledge representation in the data mining process.
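The kNN step summarized above (find the k nearest training examples, then let them vote) can be sketched in a few lines. This is an illustrative version using Euclidean distance and a simple majority vote; the two-dimensional points, the labels, and the function name `knn_predict` are assumptions made for the example.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query, keep the k closest.
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (7, 9)))
```

Because kNN defers all work to prediction time, every query scans the whole training set in this sketch. Practical implementations usually add an index structure and scale the features before measuring distance.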
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
A data mining environment produces a large amount of data that needs to be analyzed, and patterns have to be extracted from it to gain knowledge. In this new period, with a surge of both ordered and unordered data, it has become difficult to process, manage, and analyze patterns using traditional databases and architectures. To gain knowledge about Big Data, a proper architecture should be understood. Classification is an important data mining technique with broad applications, used to classify the various kinds of data found in nearly every field of our life. Classification assigns an item to one of a predefined set of classes according to the item's features. This paper provides an inclusive survey of different classification algorithms and sheds light on various classification algorithms, including J48, C4.5, the k-nearest neighbor classifier, Naive Bayes, SVM, etc., using the random concept.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
A data mining environment produces a large amount of data that needs to be analyzed. Using traditional databases and architectures, it has become difficult to process, manage, and analyze patterns. To gain knowledge about Big Data, a proper architecture should be understood. Classification is an important data mining technique with broad applications, used to classify the various kinds of data found in nearly every field of our life. Classification assigns an item to one of a predefined set of classes according to the item's features. This paper sheds light on various classification algorithms, including J48, C4.5, and Naive Bayes, using a large dataset.
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE...ijcsa
The study places particular emphasis on data mining algorithms, with the bulk of attention on the C4.5 algorithm. Each educational institution, in general, aims to provide a high quality of education. This depends upon identifying students with poor results before they enter the final examination. Data mining techniques offer many tasks that can be used to investigate students' performance. The main objective of this paper is to build a classification model that can be used to improve the students' academic records in the Faculty of Mathematical Science and Statistics. This model has been built using the C4.5 algorithm, as it is a well-known, commonly used data mining technique. The importance of this study is that predicting student performance is useful in many different settings. Data from previous students' academic records in the faculty have been used to illustrate the considered algorithm in order to build our classification model.
Assessment of Decision Tree Algorithms on Student’s RecitalIRJET Journal
This document presents a study that compares the performance of various decision tree algorithms (J48, Hoeffding Tree, Random Forest, Random Tree, REPTree, Decision Stump) on student academic performance data. The study uses educational datasets containing student marks and percentages to classify students into performance grades (A,B,C) and predict their marks in future semesters. The decision tree algorithms are implemented on the datasets using the WEKA data mining tool. The algorithms are evaluated and compared based on accuracy in classifying students and predicting future marks. The results show that J48, Random Forest and Random Tree algorithms achieved 100% accuracy on the training and some test datasets, performing the best among the algorithms evaluated.
MULTI-PARAMETER BASED PERFORMANCE EVALUATION OF CLASSIFICATION ALGORITHMSijcsit
Diabetes is amongst the most common diseases in India. It affects patients' health and also leads to other chronic diseases. Prediction of diabetes plays a significant role in saving lives and cost. Predicting diabetes in the human body is a challenging task because it depends on several factors. Few studies have reported the performance of classification algorithms in terms of accuracy. Results in these studies are difficult and complex for medical practitioners to understand and also lack visual aids, as they are presented in pure text format. This survey uses ROC and PRC graphical measures to improve understanding of the results. A detailed parameter-wise comparison, which is lacking in other reported surveys, is also presented. Execution time, accuracy, TP rate, FP rate, precision, recall, and F-measure are used for comparative analysis, and a confusion matrix is prepared for quick review of each algorithm. The ten-fold cross-validation method is used to evaluate the prediction models. Different sets of classification algorithms are analyzed on a diabetes dataset acquired from the UCI repository.
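Most of the comparison parameters listed above can be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard definitions; the counts are invented for illustration and do not come from the diabetes study, and the function name `binary_metrics` is an assumption.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics derived from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the same quantity as the TP rate
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "tp_rate": recall,
        "fp_rate": fp / (fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts: 50 actual positives, 50 actual negatives.
metrics = binary_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With ten-fold cross-validation, the confusion matrix is typically accumulated over the ten held-out folds before these ratios are computed. The sketch does not guard against division by zero, which occurs when a classifier never predicts the positive class.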
Supervised Machine Learning: A Review of Classification ...butest
This document provides an overview of supervised machine learning classification techniques. It discusses 1) general issues in supervised learning such as data preprocessing, feature selection, and algorithm selection, 2) logical/symbolic techniques, 3) perceptron-based techniques, 4) statistical techniques, 5) instance-based learners, 6) support vector machines, and 7) directions for classifier selection. The goal is to describe various supervised machine learning algorithms and provide references for further research rather than provide a comprehensive review of all techniques.
A Survey On Decision Tree Learning Algorithms for Knowledge DiscoveryIJERA Editor
Immense volumes of data are populated into repositories from various applications. Data mining techniques are very helpful for finding desired information and knowledge in large datasets. Classification is one such knowledge discovery technique. Within classification, decision trees are very popular in the research community because of their simplicity and ease of comprehension. This paper presents an updated review of recent developments in the field of decision trees.
Digital Crime – Substantive Criminal Law – General Conditions – Offenses – In...ManiMaran230751
Digital Crime – Substantive Criminal Law – General Conditions – Offenses – Investigation Methods for
Collecting Digital Evidence – International Cooperation to Collect Digital Evidence.
UNIT-4-PPT UNIT COMMITMENT AND ECONOMIC DISPATCHSridhar191373
Statement of unit commitment problem-constraints: spinning reserve, thermal unit constraints, hydro constraints, fuel constraints and other constraints. Solution methods: priority list methods, forward dynamic programming approach. Numerical problems only in priority list method using full load average production cost. Statement of economic dispatch problem-cost of generation-incremental cost curve –co-ordination equations without loss and with loss- solution by direct method and lamda iteration method (No derivation of loss coefficients)
Optimize Indoor Air Quality with Our Latest HVAC Air Filter Equipment Catalogue
Discover our complete range of high-performance HVAC air filtration solutions in this comprehensive catalogue. Designed for industrial, commercial, and residential applications, our equipment ensures superior air quality, energy efficiency, and compliance with international standards.
📘 What You'll Find Inside:
Detailed product specifications
High-efficiency particulate and gas phase filters
Custom filtration solutions
Application-specific recommendations
Maintenance and installation guidelines
Whether you're an HVAC engineer, facilities manager, or procurement specialist, this catalogue provides everything you need to select the right air filtration system for your needs.
🛠️ Cleaner Air Starts Here — Explore Our Finalized Catalogue Now!
Module4: Ventilation
Definition, necessity of ventilation, functional requirements, various system & selection criteria.
Air conditioning: Purpose, classification, principles, various systems
Thermal Insulation: General concept, Principles, Materials, Methods, Computation of Heat loss & heat gain in Buildings
ISO 4020-6.1 – Filter Cleanliness Test Rig: Precision Testing for Fuel Filter Integrity
Explore the design, functionality, and standards compliance of our advanced Filter Cleanliness Test Rig developed according to ISO 4020-6.1. This rig is engineered to evaluate fuel filter cleanliness levels with high accuracy and repeatability—critical for ensuring the performance and durability of fuel systems.
🔬 Inside This Presentation:
Overview of ISO 4020-6.1 testing protocols
Rig components and schematic layout
Test methodology and data acquisition
Applications in automotive and industrial filtration
Key benefits: accuracy, reliability, compliance
Perfect for R&D engineers, quality assurance teams, and lab technicians focused on filtration performance and standard compliance.
🛠️ Ensure Filter Cleanliness — Validate with Confidence.
This presentation provides a comprehensive overview of a specialized test rig designed in accordance with ISO 4548-7, the international standard for evaluating the vibration fatigue resistance of full-flow lubricating oil filters used in internal combustion engines.
Key features include:
This research presents a machine learning (ML) based model to estimate the axial strength of corroded RC columns reinforced with fiber-reinforced polymer (FRP) composites. Estimating the axial strength of corroded columns is complex due to the intricate interplay between corrosion and FRP reinforcement. To address this, a dataset of 102 samples from various literature sources was compiled. Subsequently, this dataset was employed to create and train the ML models. The parameters influencing axial strength included the geometry of the column, properties of the FRP material, degree of corrosion, and properties of the concrete. Considering the scarcity of reliable design guidelines for estimating the axial strength of RC columns considering corrosion effects, artificial neural network (ANN), Gaussian process regression (GPR), and support vector machine (SVM) techniques were employed. These techniques were used to predict the axial strength of corroded RC columns reinforced with FRP. When comparing the results of the proposed ML models with existing design guidelines, the ANN model demonstrated higher predictive accuracy. The ANN model achieved an R-value of 98.08% and an RMSE value of 132.69 kN which is the lowest among all other models. This model fills the existing gap in knowledge and provides a precise means of assessment. This model can be used in the scientific community by researchers and practitioners to predict the axial strength of FRP-strengthened corroded columns. In addition, the GPR and SVM models obtained an accuracy of 98.26% and 97.99%, respectively.
Bituminous binders are sticky, black substances derived from the refining of crude oil. They are used to bind and coat aggregate materials in asphalt mixes, providing cohesion and strength to the pavement.