
DMDW Lab10[1]

The document outlines the implementation of the FP-Growth and Hierarchical clustering algorithms using the WEKA tool and Python programming. FP-Growth efficiently finds frequent patterns in large datasets using an FP-Tree, while Hierarchical clustering builds a hierarchy of clusters without predefining the number of clusters. Both algorithms are demonstrated with code examples and visualizations for better understanding.

Uploaded by

jagnoorsm.cs.22

LAB - 10

1. Implement and demonstrate the FP-Growth algorithm using (i) the WEKA tool and (ii)
Python programming.

The FP-Growth algorithm finds frequent itemsets in large datasets without generating candidate
itemsets, which typically makes it faster than the Apriori algorithm. It compresses the input
dataset into a compact prefix-tree structure known as an FP-Tree (Frequent Pattern Tree). The
algorithm scans the database once to count item frequencies and discard infrequent items, then
scans it a second time to insert each transaction into the tree with its items sorted by descending
frequency, so that transactions with a common prefix share nodes. It then recursively mines the
FP-Tree, building a conditional pattern base and a conditional FP-Tree for each frequent item, to
extract the frequent itemsets. Because it avoids generating and testing a large number of candidate
sets, FP-Growth is efficient, especially for large and dense datasets. It is widely used in
applications such as market basket analysis, customer behavior analysis, and recommender systems.
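
The two database scans described above can be sketched in plain Python. This is only a minimal illustration of FP-Tree construction, not the mining step and not how WEKA or mlxtend implement it. The `Node` class, the `order` and `show` helpers, and the `min_count` threshold are hypothetical names chosen for this sketch; the transactions are the same four used in the Python example of part (ii).

```python
from collections import Counter

transactions = [
    ["milk", "bread"],
    ["milk", "butter"],
    ["bread", "eggs"],
    ["milk", "bread", "butter"],
]
min_count = 2  # assumed absolute support threshold (50% of 4 transactions)

# Scan 1: count each item once per transaction and keep only frequent items.
counts = Counter(item for t in transactions for item in set(t))
frequent = {item: c for item, c in counts.items() if c >= min_count}


def order(transaction):
    """Drop infrequent items; sort by descending count, ties broken by name."""
    kept = [item for item in set(transaction) if item in frequent]
    return sorted(kept, key=lambda item: (-frequent[item], item))


class Node:
    """One FP-Tree node: an item, its count, and its child nodes."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}


# Scan 2: insert each ordered transaction; shared prefixes reuse nodes.
root = Node(None)
for t in transactions:
    node = root
    for item in order(t):
        node = node.children.setdefault(item, Node(item))
        node.count += 1


def show(node, depth=0):
    """Print the tree with two spaces of indentation per level."""
    for child in sorted(node.children.values(), key=lambda n: n.item):
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


show(root)
```

In this sketch 'eggs' is dropped after the first scan because it appears in only one transaction, and the three transactions containing 'bread' share a single 'bread' node with count 3.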

(i) the WEKA tool

(ii) Python programming

import pandas as pd
from mlxtend.frequent_patterns import association_rules, fpgrowth
from mlxtend.preprocessing import TransactionEncoder

# (A) Prepare data: each inner list is one transaction
dataset = [
    ['milk', 'bread'],
    ['milk', 'butter'],
    ['bread', 'eggs'],
    ['milk', 'bread', 'butter']
]

# (B) Convert to a one-hot encoded boolean DataFrame (one column per item)
te = TransactionEncoder()
te_data = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_data, columns=te.columns_)

# (C) Apply FP-Growth (same call interface as apriori in mlxtend)
frequent_itemsets = fpgrowth(df, min_support=0.5, use_colnames=True)
print(frequent_itemsets)

# (D) Generate association rules with confidence of at least 0.7
rules = association_rules(frequent_itemsets, metric="confidence",
                          min_threshold=0.7)
print(rules)

OUTPUT:
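
As a cross-check on the rule-generation step, support, confidence, and lift can be computed by hand for the same four transactions. The sketch below uses plain Python sets; the helper names `support` and `confidence` are illustrative only and are not mlxtend functions.

```python
transactions = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "eggs"},
    {"milk", "bread", "butter"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)


# Candidate rules from the two frequent pairs {milk, bread} and {milk, butter}.
candidates = [("milk", "bread"), ("bread", "milk"),
              ("milk", "butter"), ("butter", "milk")]
for a, c in candidates:
    conf = confidence({a}, {c})
    lift = conf / support({c})
    print(f"{a} -> {c}: confidence={conf:.3f}, lift={lift:.3f}")
```

For this dataset only butter -> milk reaches the 0.7 confidence threshold (confidence 1.0, lift about 1.33); the other three candidate rules have confidence of about 0.67.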

2. Implement and demonstrate the Hierarchical clustering algorithm using (i) the WEKA
tool and (ii) Python programming.

Hierarchical clustering is a method of cluster analysis that builds a hierarchy of clusters.
Unlike methods such as K-Means, it does not require the number of clusters to be specified in
advance. The agglomerative (bottom-up) form described here begins by treating each data point as
its own cluster. Then, step by step, it merges the closest pair of clusters according to a chosen
distance metric (such as Euclidean distance) and linkage criterion (such as single, complete, or
average linkage). This continues until all points are combined into a single cluster, and the
sequence of merges forms a tree-like structure known as a dendrogram. The dendrogram can be cut at
any level to obtain the desired number of clusters. Hierarchical clustering is useful for
visualizing data structure and is often applied in fields such as bioinformatics and the social
sciences.
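
The merge loop described above can be sketched from scratch in a few lines. This is a minimal illustration that assumes Euclidean distance and single linkage; the five toy points and the `single_linkage` helper are hypothetical and are not taken from the lab data. Library implementations such as SciPy's `linkage` are far more efficient than this brute-force version.

```python
import math

# Hypothetical toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]


def single_linkage(a, b):
    """Cluster distance = smallest Euclidean distance between any two members."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


# Start with every point in its own cluster (clusters hold point indices).
clusters = [[i] for i in range(len(points))]
merges = []  # records (left cluster, right cluster, merge distance)

while len(clusters) > 1:
    # Find the closest pair of clusters under the chosen linkage.
    pairs = [
        (single_linkage(clusters[i], clusters[j]), i, j)
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    ]
    d, i, j = min(pairs)
    merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
    # Replace the two clusters with their union.
    merged = clusters[i] + clusters[j]
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 3))
```

The recorded merge distances are the heights at which branches join in a dendrogram; stopping the loop early, when a chosen number of clusters remains, corresponds to cutting the dendrogram at that level.
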
(i) the WEKA tool

Hierarchical Clustering Visualization of the Iris Dataset


Dendrogram of Iris Dataset using Hierarchical Clustering

(ii) Python programming

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering

# 1. Generate some sample data
np.random.seed(42)
X = np.random.rand(10, 2)  # 10 points in 2D

# 2. Plot the points
plt.scatter(X[:, 0], X[:, 1], color='black')
plt.title('Data Points')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

# 3. Create the linkage matrix
# Ward linkage merges the pair of clusters that least increases within-cluster variance
linked = linkage(X, method='ward')

# 4. Plot the dendrogram
plt.figure(figsize=(10, 5))
dendrogram(linked,
           orientation='top',
           distance_sort='descending',
           show_leaf_counts=True)
plt.title('Dendrogram')
plt.xlabel('Data Points')
plt.ylabel('Distance')
plt.show()

# 5. Apply agglomerative clustering with 3 clusters
cluster = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = cluster.fit_predict(X)

# 6. Plot clustered data, colored by cluster label
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='rainbow')
plt.title('Hierarchical Clustering Results')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

OUTPUT:

Figure: Data Points
Figure: Dendrogram
Figure: Hierarchical Clustering Results
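
The dendrogram cut mentioned in the theory can also be done directly on the SciPy linkage matrix with `fcluster`, without fitting a separate scikit-learn model. The sketch below regenerates the same seeded sample data; choosing `criterion="maxclust"` with `t=3` is an assumption made here to mirror the three clusters requested above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Same seeded sample data as in the program above.
np.random.seed(42)
X = np.random.rand(10, 2)

# Build the Ward linkage matrix, then cut the tree into at most 3 flat clusters.
linked = linkage(X, method="ward")
flat_labels = fcluster(linked, t=3, criterion="maxclust")

print(flat_labels)  # one cluster id per point, numbered from 1
```

Note that `fcluster` numbers clusters from 1 while `AgglomerativeClustering` numbers them from 0, so the label values need not match even when the grouping is the same.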
