GraphAnalyticsPeerReviewReportTemplate PDF

The document describes modeling chat data using a graph database. It includes the steps taken to create the graph database from 6 CSV files describing chat interactions and relationships. It then provides examples of analyzing the graph database to find the longest conversation chain, most active users and teams, and whether any of the most active users were part of the most active teams. The analysis of group activity connects mentioned users, response connections, eliminates self-interactions, and calculates cluster coefficients to identify the top 3 most active users.

Uploaded by

Nguyen Trong duy

Graph Analytics

Coursera Big Data Specialization Capstone Project, Week 4

Modeling Chat Data using a Graph Data Model


The graph model is a network based on chat interactions between users. A user can initiate a chat
session, and other users on the same team can join and leave that session. Interactions between
users begin when a user creates a post. A user can also mention another user in a post. All
relationships between entities are logged with a timestamp.

Creation of the Graph Database for Chats


Describe the steps you took for creating the graph database.

Write the schema of the 6 CSV files

chat_create_team_chat.csv: userID, teamID, teamChatSessionID, timestamp
chat_join_team_chat.csv: userID, teamChatSessionID, timestamp
chat_leave_team_chat.csv: userID, teamChatSessionID, timestamp
chat_item_team_chat.csv: userID, teamChatSessionID, chatItemID, timestamp
chat_mention_team_chat.csv: chatItemID, userID, timestamp
chat_respons_team_chat.csv: chatItemID_1, chatItemID_2, timestamp

Explain the loading process and include a sample LOAD command

The first line loads the CSV file from the specified location one row at a time. The second through
fourth lines create the User, Team, and TeamChatSession nodes; for each node, one column is converted
to an integer and used as the id attribute. The fifth and sixth lines create the CreatesSession and
OwnedBy edges, which link the nodes just created. Each edge has a timestamp property filled from the
fourth column of the schema.
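
The steps described above can be illustrated with a rough in-memory analogue in Python; it is not the graph database's LOAD command itself. The function name load_create_team_chat, the dictionary keys, and the sample rows are hypothetical, and using sets so that a repeated id yields a single node is an assumption about the intended behavior.

```python
import csv
import io


def load_create_team_chat(csv_text):
    """Build nodes and edges from rows of userID, teamID, teamChatSessionID, timestamp."""
    users, teams, sessions = set(), set(), set()
    creates_session, owned_by = set(), set()
    # Step 1: read the CSV one row at a time.
    for row in csv.reader(io.StringIO(csv_text)):
        # Steps 2-4: create the three node types, using an integer id.
        user_id, team_id, session_id = int(row[0]), int(row[1]), int(row[2])
        timestamp = row[3].strip()
        users.add(user_id)
        teams.add(team_id)
        sessions.add(session_id)
        # Steps 5-6: create the two edges, each carrying the timestamp.
        creates_session.add((user_id, session_id, timestamp))
        owned_by.add((session_id, team_id, timestamp))
    return {
        "users": users,
        "teams": teams,
        "sessions": sessions,
        "creates_session": creates_session,
        "owned_by": owned_by,
    }


# Hypothetical sample rows, not taken from the actual data set.
sample = "1,10,100,1464233999\n2,10,101,1464234000\n"
graph = load_create_team_chat(sample)
print(sorted(graph["users"]), sorted(graph["teams"]))
```

Both sample rows refer to team 10, so the sketch ends up with two users, two sessions, and a single team node.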

Present a screenshot of some part of the graph you have generated. The graphs must include clearly
visible examples of most node and edge types.

Finding the longest conversation chain and its participants


Report the results including the length of the conversation (path length) and how many unique users
were part of the conversation chain. Describe your steps. Write the query that produces the correct
answer.

How many chats are involved in it?


The longest conversation chain in the chat data has a path length of 9, so 10 chats are involved in it.

How many users participated in this chain?

With 9 as the longest path length, count the number of distinct users who created a ChatItem on this
longest path. The query returns 5.
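
The idea behind both answers can be sketched in Python on a tiny hypothetical graph. This is not the query used for the report: the function longest_chain, the response_to and creator mappings, and the sample ids are all illustrative assumptions, and the sketch assumes the response edges contain no cycles. A path with n edges visits n + 1 chat items, which is why a path length of 9 corresponds to 10 chats.

```python
def longest_chain(response_to):
    """Return the longest chain of chat items as a list of ids.

    response_to maps a chat item to the list of items it responds to.
    Assumes the response edges form no cycles.
    """
    best = {}

    def chain_from(item):
        # Memoized depth-first search: longest chain starting at this item.
        if item not in best:
            tails = [chain_from(nxt) for nxt in response_to.get(item, [])]
            best[item] = [item] + (max(tails, key=len) if tails else [])
        return best[item]

    items = set(response_to) | {t for ts in response_to.values() for t in ts}
    return max((chain_from(i) for i in items), key=len)


# Hypothetical data: item 4 responds to 3, 3 to 2, 2 to 1, and 5 to 1.
response_to = {2: [1], 3: [2], 4: [3], 5: [1]}
creator = {1: "a", 2: "b", 3: "a", 4: "c", 5: "d"}

chain = longest_chain(response_to)
path_length = len(chain) - 1                  # number of edges on the chain
participants = {creator[i] for i in chain}    # distinct users on the chain
print(path_length, len(chain), len(participants))
```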

Analyzing the relationship between top 10 chattiest users and top 10 chattiest teams
Describe your steps from Question 2. In the process, create the following two tables. You only need to
include the top 3 for each table. Identify and report whether any of the chattiest users were part of any
of the chattiest teams.

Chattiest Users

Determine the number of chats created by each user by counting that user's CreateChat edges.
Users Number of Chats
394 115
2067 111
209 109
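
As a minimal sketch of this count, assuming each CreateChat edge is available as a (user, chat item) pair, the tally can be written in Python as below. The function name chattiest_users and the sample edges are hypothetical and unrelated to the real data.

```python
from collections import Counter


def chattiest_users(create_chat_edges, top_n=3):
    """Count CreateChat edges per user and return the top_n (user, count) pairs."""
    counts = Counter(user for user, _chat_item in create_chat_edges)
    return counts.most_common(top_n)


# Hypothetical CreateChat edges as (user, chat item) pairs.
edges = [("u1", 1), ("u1", 2), ("u1", 3), ("u2", 4), ("u2", 5), ("u3", 6)]
top_users = chattiest_users(edges, top_n=2)
print(top_users)
```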

Chattiest Teams

Match every ChatItem node that has a PartOf edge to a TeamChatSession node, where that
TeamChatSession node has an OwnedBy edge connecting it to another node (the team), and count the
chat items per team.

Teams Number of Chats
82 1324
185 1036
112 957
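
The two-hop count described above (ChatItem to TeamChatSession via PartOf, then TeamChatSession to team via OwnedBy) can be sketched in Python as follows. The mappings part_of and owned_by, the function name chattiest_teams, and the sample ids are illustrative assumptions rather than the query used for the report.

```python
from collections import Counter


def chattiest_teams(part_of, owned_by, top_n=3):
    """Count chat items per team by following PartOf and then OwnedBy.

    part_of maps chat item -> session; owned_by maps session -> team.
    """
    counts = Counter(
        owned_by[session]
        for session in part_of.values()
        if session in owned_by
    )
    return counts.most_common(top_n)


# Hypothetical edges: four chat items in three sessions owned by two teams.
part_of = {1: "s1", 2: "s1", 3: "s2", 4: "s3"}
owned_by = {"s1": "t1", "s2": "t1", "s3": "t2"}
top_teams = chattiest_teams(part_of, owned_by)
print(top_teams)
```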

Finally, present your answer, i.e. whether or not any of the chattiest users are part of any of the
chattiest teams.

The query checks whether any of the chattiest users are part of any of the chattiest teams. It returns
one result: userID 999 is part of teamID 52. Neither id appears in the tables above because those
tables show only the top 3 of each top 10.
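
The report does not spell out how team membership is decided. One possible reading, stated here as an assumption, is that a user is part of a team if that user created a chat item in a session owned by the team. Under that assumption, the check can be sketched in Python as below; the function name users_in_teams and all sample ids are hypothetical.

```python
def users_in_teams(create_chat, part_of, owned_by, top_users, top_teams):
    """Return sorted (user, team) pairs where a top user posted in a top team.

    Assumption: posting a chat item in a team's session counts as membership.
    """
    pairs = set()
    for user, item in create_chat:
        # Follow CreateChat -> PartOf -> OwnedBy; missing links give None.
        team = owned_by.get(part_of.get(item))
        if user in top_users and team in top_teams:
            pairs.add((user, team))
    return sorted(pairs)


# Hypothetical data: only u2 posts in a session owned by a top team.
create_chat = [("u1", 1), ("u2", 2), ("u2", 3), ("u3", 4)]
part_of = {1: "s1", 2: "s2", 3: "s2", 4: "s2"}
owned_by = {"s1": "t9", "s2": "t1"}
matches = users_in_teams(create_chat, part_of, owned_by, {"u1", "u2"}, {"t1"})
print(matches)
```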

How Active Are Groups of Users?


Describe your steps for performing this analysis. Be as clear, concise, and as brief as possible. Finally,
report the top 3 most active users in the table below.

Connect mentioned users


Connect each user who responds to a chat item with the creator of that chat item

Eliminate all self-interactions

Calculate the cluster coefficient.
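
The four steps above can be sketched in Python on a small hypothetical graph. The report does not state the exact formula, so this sketch assumes the standard undirected local clustering coefficient: the number of links among a user's neighbours divided by k(k-1)/2, where k is the number of neighbours. The function names, the edge representation, and the sample data are all illustrative.

```python
from itertools import combinations


def interaction_edges(create_chat, mentions, response_to):
    """Build user-to-user interaction edges without self-interactions."""
    creator = {item: user for user, item in create_chat}
    edges = set()
    for item, mentioned_user in mentions:
        edges.add((creator[item], mentioned_user))        # step 1: mentions
    for reply, original in response_to:
        edges.add((creator[reply], creator[original]))    # step 2: responses
    return {(a, b) for a, b in edges if a != b}           # step 3: no self-loops


def cluster_coefficient(edges, user):
    """Step 4: links among the user's neighbours over all possible links."""
    linked = {frozenset(edge) for edge in edges}          # ignore direction
    neighbours = {x for e in linked if user in e for x in e if x != user}
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1 for a, b in combinations(sorted(neighbours), 2)
        if frozenset((a, b)) in linked
    )
    return links / (k * (k - 1) / 2)


# Hypothetical data; chat item 5 is a self-mention and gets dropped.
create_chat = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("a", 5)]
mentions = [(1, "b"), (1, "c"), (5, "a")]
response_to = [(2, 1), (3, 2), (4, 1)]

edges = interaction_edges(create_chat, mentions, response_to)
print(round(cluster_coefficient(edges, "a"), 4))
```

In this sample, user a has three neighbours (b, c, d) and only b and c are linked to each other, so one of the three possible neighbour pairs is connected.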

Most Active Users (based on Cluster Coefficients)

User ID Coefficient
394 0.9167
2067 0.7679
209 0.9524
