DA_Unit_1
4CS 1220
Module-I
Content
• Introduction to Data Analytics
Data
would sell and the profit we can expect in the future.
Structured data, Unstructured data, Semistructured data
Data Classification: Structured Data
• Structured Data: Structured data is created using a fixed
schema and is maintained in tabular format. The elements in
structured data are addressable for effective analysis. It
contains all the data that can be stored in an SQL
database in a tabular format. Today, most data is created
and processed in this form because it is the simplest way
to manage information.
• Examples of structured data include dates, names,
addresses, and credit card numbers.
• As an example of relational data, consider a university that
must maintain a record of each student: the student's name,
ID, address, and email. The following relational schema and
table store these records.
Example:
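A minimal sketch of such a student table, using Python's built-in sqlite3 module. The table name, column names, and rows are illustrative assumptions, not the schema from the original slide.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        address    TEXT,
        email      TEXT UNIQUE
    )
    """
)

# Made-up rows, for illustration only.
rows = [
    (1, "Asha Rao", "12 Lake Road", "asha@example.edu"),
    (2, "Ben Ortiz", "48 Hill Street", "ben@example.edu"),
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)

# Every row follows the same fixed schema, so each element is addressable by column.
emails = [r[0] for r in conn.execute("SELECT email FROM student ORDER BY student_id")]
print(emails)
```

Because the schema is fixed, a single column (here email) can be queried directly, which is what makes structured data easy to analyze.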
Phase 3: Model Building
will be used.
• Create an initial hypothesis about how the model will behave and
how accurate it will be.
• The team creates datasets for training, testing, and production
use.
• The team also evaluates whether its current tools are sufficient to
run the models or whether a more robust environment is required.
• Free or open-source tools: R and PL/R, Octave, WEKA.
• Commercial tools: MATLAB
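One way the training, testing, and production datasets might be created is a simple random split. This is a sketch only: the function name split_dataset and the 70/20/10 proportions are assumptions, not values given in the slides.

```python
import random

def split_dataset(records, train_frac=0.7, test_frac=0.2, seed=0):
    """Shuffle records and split them into training, testing, and production sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    production = shuffled[n_train + n_test:]  # whatever remains is held back
    return train, test, production

train, test, production = split_dataset(range(100))
print(len(train), len(test), len(production))
```

The three sets do not overlap, so the model is tested on records it never saw during training.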
Phase 5: Communication Results
• Visualize results through charts, graphs, dashboards, and other
visuals to make insights understandable.
• Present key findings to stakeholders in a concise, actionable format.
• Summarize the insights derived from the data and explain how they
align with business goals.
• After executing the model, team members evaluate its outcomes
against the criteria established for the success or failure of the
model.
• The team considers how best to present findings and outcomes to
the various team members and other stakeholders, taking caveats
and assumptions into account.
• The team should determine the most important findings, quantify their
value to the business, and create a narrative that summarizes the
findings for all stakeholders.
Phase 6: Operationalize
• Deploy the model into production environments, integrating it into
business processes.
• Automate tasks like decision-making or predictions based on the
model’s insights.
• Continuously monitor model performance to ensure it remains
accurate and relevant as new data becomes available.
• The team communicates the benefits of the project to a wider audience.
It sets up a pilot project that deploys the work in a controlled manner
before expanding it to the entire enterprise of users.
• This approach lets the team learn about the performance and
constraints of the model in a production setting at a small scale and
then make the necessary adjustments before full deployment.
• The team produces the final reports, presentations, and code.
• Open source or free tools such as WEKA, SQL, MADlib, and Octave.
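Continuous monitoring of a deployed model could look roughly like the sketch below, which tracks accuracy over the most recent predictions and flags a drop. The class name AccuracyMonitor, the window size, and the threshold are all hypothetical choices made for illustration.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # True when a prediction was correct
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None  # nothing recorded yet
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        acc = self.accuracy()
        return acc is not None and acc < self.threshold

monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_attention())
```

In a pilot deployment, a flag like needs_attention() would be a cue to investigate or retrain before rolling the model out more widely.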
Methods of data analytics
Cluster
• The action of grouping a set of data elements in such a
way that the elements in a group are more similar (in a
particular sense) to each other than to those in other
groups, hence the term ‘cluster.’
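The idea of grouping similar elements can be illustrated with k-means on one-dimensional data. The slides do not name a specific algorithm, so k-means is used here only as one common example; the function name kmeans_1d and the sample values are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Assign each number to its nearest center, then move each center to its group's mean."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```

Here "similar" means close in value: the small numbers end up in one cluster and the large numbers in the other.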