Data Processing Functions: Collection

Data processing methods have evolved from manual methods using pen and paper, to mechanical methods leveraging devices like typewriters and printers, to modern electronic data processing using computers and software to manipulate large volumes of data quickly and produce meaningful insights and information. Common data processing methods include batch processing, online processing, real-time processing, and distributed processing, each suited to different types and volumes of data with varying needs for speed and interaction. The goal of all data processing methods is to take raw data as input and output useful information through systematic collection, organization, analysis, and presentation of data.


Work-related stress and depression are becoming a major concern in the workplace and a major factor in decreased
productivity amongst office workers. A Virtual Reality (VR) application was developed as a new form of stress
management that could be deployed at the workplace and allow workers to relax. It begins with a real-time face
recognition system installed at the desk of each employee, which continuously performs emotional state recognition
and predicts emotion labels. This facial recognition system runs on a field-programmable gate array (FPGA). When it
recognises the employee's mood as stressed or anxious, it prompts the employee with a VR application that provides a
series of relaxation videos or customized videos that boost the employee's mood.

Sakshi: The interaction between human beings and computers will be natural if computers are able to perceive and respond to
human non-verbal communication such as emotions. Facial expressions convey non-verbal cues which play an important role in
interpersonal relations. Sentiment analysis, on the other hand, is the contextual mining of text to identify subjective
information; it is widely used as part of social media analysis in any domain, be it business analysis or a product launch, to
understand how it is received by the public. The paper analyses affective computing based on facial expressions and opinion
mining (sentiment analysis) using text, and explains various applications.
Keywords - affective computing, sentiment analysis, emotion

Nikita: Stress at the workplace has become an increasingly common phenomenon due to external factors such as technological
advancement, changes in the economy of a country which might lead to redundancy, and so on. Stress can be
considered an inevitable condition at one point in time or another; however, it can also be minimized to the
extent that the productivity and health of the employee are maintained, which could lead to a productive organization.
Stress is also bound to occur in multinational companies where operations are global and employees come from different
cultural backgrounds. In this paper, we present an exploratory study investigating the need for virtual reality for relaxation
and for reducing the stress of people at the workplace.
Keywords - inevitable, productivity, virtual reality, employee

Data processing is, generally, "the collection and manipulation of items of data to
produce meaningful information."[1] In this sense it can be considered a subset
of information processing, "the change (processing) of information in any manner
detectable by an observer."
The term Data Processing (DP) has also been used to refer to a department within an
organization responsible for the operation of data processing applications.[2]

Data processing functions


Data processing may involve various processes, including (a minimal sketch of several of these follows the list):

• Validation – ensuring that supplied data is correct and relevant.
• Sorting – "arranging items in some sequence and/or in different sets."
• Summarization – reducing detail data to its main points.
• Aggregation – combining multiple pieces of data.
• Analysis – the "collection, organization, analysis, interpretation and presentation of data."
• Reporting – listing detail or summary data or computed information.
• Classification – separation of data into various categories.
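
As a minimal sketch of several of these functions in code (the record layout and field names are invented for illustration, not taken from any particular system):

# Minimal sketch of common data processing functions on made-up records.
# The field names ("region", "amount") are illustrative only.

records = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 75.5},
    {"region": "north", "amount": -10.0},   # invalid: negative amount
    {"region": "south", "amount": 42.0},
]

# Validation: keep only records whose amount is non-negative.
valid = [r for r in records if r["amount"] >= 0]

# Sorting: arrange records by amount, largest first.
ordered = sorted(valid, key=lambda r: r["amount"], reverse=True)

# Classification / aggregation: group amounts by region and total them.
totals = {}
for r in ordered:
    totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]

# Summarization / reporting: reduce the detail data to a few headline figures.
print("records kept:", len(valid))
print("totals by region:", totals)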

History
The United States Census Bureau history illustrates the evolution of data processing
from manual through electronic procedures.
Manual data processing
Although widespread use of the term data processing dates only from the nineteen-
fifties,[3] data processing functions have been performed manually for millennia. For
example, bookkeeping involves functions such as posting transactions and producing
reports like the balance sheet and the cash flow statement. Completely manual
methods were augmented by the application of mechanical or electronic calculators. A
person whose job was to perform calculations manually or using a calculator was called
a "computer."
The 1890 United States Census schedule was the first to gather data by individual
rather than household. A number of questions could be answered by making a check in
the appropriate box on the form. From 1850 through 1880 the Census Bureau employed
"a system of tallying, which, by reason of the increasing number of combinations of
classifications required, became increasingly complex. Only a limited number of
combinations could be recorded in one tally, so it was necessary to handle the
schedules 5 or 6 times, for as many independent tallies."[4] "It took over 7 years to
publish the results of the 1880 census"[5] using manual processing methods.
Automatic data processing
The term automatic data processing was applied to operations performed by means
of unit record equipment, such as Herman Hollerith's application of punched
card equipment for the 1890 United States Census. "Using Hollerith's punchcard
equipment, the Census Office was able to complete tabulating most of the 1890 census
data in 2 to 3 years, compared with 7 to 8 years for the 1880 census.... It is estimated
that using Hollerith's system saved some $5 million in processing costs" [5] in 1890
dollars even though there were twice as many questions as in 1880.
Electronic data processing
Computerized data processing, or Electronic data processing represents a later
development, with a computer used instead of several independent pieces of
equipment. The Census Bureau first made limited use of electronic computers for
the 1950 United States Census, using a UNIVAC I system,[4] delivered in 1952.
Other developments
The term data processing has mostly been subsumed by the more general
term information technology (IT).[6] The older term "data processing" is suggestive of
older technologies. For example, in 1996 the Data Processing Management
Association (DPMA) changed its name to the Association of Information Technology
Professionals. Nevertheless, the terms are approximately synonymous.

Applications
Commercial data processing
Commercial data processing involves a large volume of input data, relatively few
computational operations, and a large volume of output. For example, an insurance
company needs to keep records on tens or hundreds of thousands of policies, print and
mail bills, and receive and post payments.
Data analysis

In science and engineering, the terms data processing and information systems are
considered too broad, and the term data processing is typically used for the initial stage,
followed by data analysis in the second stage of the overall data handling.
Data analysis uses specialized algorithms and statistical calculations that are less often
observed in a typical general business environment. For data analysis, software suites
like SPSS or SAS, or their free counterparts such as DAP, gretl or PSPP are often
used.

Data Processing & Data Processing Methods

What is Data Processing


Data processing is simply the conversion of raw data to meaningful information through a
process. Data is manipulated to produce results that lead to a resolution of a problem or
improvement of an existing situation. Similar to a production process, it follows a cycle
where inputs (raw data) are fed to a process (computer systems, software, etc.) to produce
output (information and insights).

Generally, organizations employ computer systems to carry out a series of operations on
the data to present, interpret, or obtain information. The process includes activities like data
entry, summary, calculation, storage, etc. A useful and informative output is presented in
various appropriate forms such as diagrams, reports, graphics, etc.
Need for data processing
Data processing is important in business and scientific operations. Business data is
processed repeatedly and usually involves large volumes of output. Scientific data requires
numerous computations and usually needs outputs generated quickly.


Data processing methods and data processing techniques

1. Manual Data Processing

In manual data processing, data is processed manually without using any machine or tool
to get the required results. All the calculations and logical operations are performed
manually on the data. Similarly, data is transferred manually from one place to another.
This method of data processing is very slow, and errors may occur in the output. Data is
still mostly processed manually in many small business firms as well as government offices
and institutions. In an educational institute, for example, mark sheets, fee receipts, and other
financial calculations (or transactions) are performed by hand. This method is avoided as far
as possible because of the very high probability of error and because it is labor-intensive and
very time-consuming. This type of data processing represents the very primitive stage, when
technology was either not available or not affordable. With the advancement of technology,
dependency on manual methods has drastically decreased.

2. Mechanical Data Processing

In the mechanical data processing method, data is processed using different devices
like typewriters, mechanical printers or other mechanical devices. This method of data
processing is faster and more accurate than manual data processing, but it still forms the
early stage of data processing. With the invention and evolution of more complex machines
with better computing power, this type of processing also started fading away. Examination
boards and printing presses used mechanical data processing devices frequently.

3. Electronic Data Processing

Electronic data processing, or EDP, is the modern technique for processing data. The data is
processed through a computer: data and a set of instructions are given to the computer as
input, and the computer automatically processes the data according to the given set of
instructions. The computer is also known as an electronic data processing machine.

This method of processing data is very fast and accurate. For example, in a computerized
education environment, students' results are prepared by computer; in banks, customers'
accounts are maintained (or processed) through computers, and so on.

Methods of Data Processing by electronic means


1. Batch Processing

Batch Processing is a method where the information to be organized is sorted into groups to
allow for efficient and sequential processing. Data is collected over a period and then processed
together in a single run rather than item by item; payroll and billing runs are typical examples.
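
A minimal batch-processing sketch (the transaction values and batch size are invented): records are accumulated first and then handled in fixed-size groups rather than one at a time.

# Minimal batch-processing sketch: records are collected first,
# then processed together in fixed-size batches rather than one by one.
# The transaction values and batch size are made up for illustration.

transactions = [12.5, 3.0, 99.9, 45.2, 7.7, 18.3, 60.0]
BATCH_SIZE = 3

def process_batch(batch):
    # Stand-in for the real work (posting to a ledger, printing bills, etc.).
    return sum(batch)

for start in range(0, len(transactions), BATCH_SIZE):
    batch = transactions[start:start + BATCH_SIZE]
    print(f"processed batch {start // BATCH_SIZE + 1}: total = {process_batch(batch)}")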

2. Online Processing

This is a method that utilizes Internet connections and equipment directly attached to a
computer. It allows data to be stored in one place and used at an altogether different place.
Cloud computing can be considered an example of this type of processing. It is used mainly
for information recording and research.

3. Real-Time Processing

This technique can respond almost immediately to various signals to acquire and process
information. It involves high maintenance and upfront costs, attributable to the very advanced
technology and computing power required. The time saved is greatest in this case, as the output
is seen in real time, for example in banking transactions.
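
A minimal sketch of the idea (the event stream and the flagging rule are invented): each incoming signal is handled the moment it arrives instead of being queued for a later batch run.

# Minimal real-time-style sketch: each event is handled the moment it arrives.
# The event stream and the rule applied to it are made up for illustration.
import queue

events = queue.Queue()
for amount in (250, 12000, 80):      # pretend these arrive one by one
    events.put({"type": "withdrawal", "amount": amount})

while not events.empty():
    event = events.get()             # in a real system this call would block on live input
    if event["amount"] > 10000:
        print("ALERT: large withdrawal flagged immediately:", event)
    else:
        print("processed:", event)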

4. Distributed Processing

This method is commonly utilized by remote workstations connected to one big central
workstation or server. ATMs are good examples of this data processing method. All the end
machines run fixed software located at a particular place and make use of exactly the same
information and sets of instructions.
Data Processing Cycle

The Data Processing Cycle is a series of steps carried out to extract information from raw
data. Although each step must be taken in order, the order is cyclic. The output and storage
stage can lead to a repeat of the data collection stage, resulting in another cycle of data
processing. The cycle provides a view of how the data travels and transforms from
collection to interpretation and is ultimately used in effective business decisions.

Stages of the Data Processing Cycle

1) Collection
It is the first stage of the cycle and is very crucial since the quality of data collected will
impact heavily on the output. The collection process needs to ensure that the data gathered
are both defined and accurate so that subsequent decisions based on the findings are
valid. This stage provides both the baseline from which to measure and a target on what to
improve.

Some types of data collection include census (data collection about everything in a group
or statistical population), sample survey (collection method that contains only part of the
total population), and administrative by-product (data collection is a byproduct of an
organization’s day-to-day operations).

2) Preparation

It is the manipulation of data into a form suitable for further analysis and processing. Raw
data cannot be processed directly and must first be checked for accuracy. Preparation is about
constructing a dataset from one or more data sources to be used for further exploration and
processing. Analyzing data that has not been carefully screened for problems can produce
highly misleading results; the quality of the analysis depends heavily on the quality of the data prepared.
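
A minimal sketch of this screening step (the raw rows and validity rules are invented, and only the standard library is used): duplicates and malformed or missing values are dropped before the data moves on.

# Minimal data-preparation sketch: screen raw rows before analysis.
# The raw rows and validity rules are invented for illustration.

raw_rows = [
    {"id": "1", "age": "34"},
    {"id": "2", "age": ""},        # missing value
    {"id": "1", "age": "34"},      # duplicate of the first row
    {"id": "3", "age": "abc"},     # malformed value
    {"id": "4", "age": "51"},
]

seen = set()
prepared = []
for row in raw_rows:
    key = row["id"]
    if key in seen:
        continue                   # drop duplicates
    if not row["age"].isdigit():
        continue                   # drop missing or malformed ages
    seen.add(key)
    prepared.append({"id": int(row["id"]), "age": int(row["age"])})

print(prepared)   # only clean, typed rows reach the next stage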

3) Input

It is the task where verified data is coded or converted into machine-readable form so that it
can be processed through a computer. Data entry is done through the use of a keyboard,
digitizer, scanner, or data entry from an existing source. This time-consuming process
requires speed and accuracy. Most data need to follow a formal and strict syntax since a
great deal of processing power is needed to break down the complex data at this stage.
Due to the costs, many businesses are resorting to outsourcing this stage.
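
A minimal sketch of this coding step (the CSV content and column names are invented): the standard library's csv module turns verified text data into machine-readable records.

# Minimal input-stage sketch: convert verified text data into machine-readable records.
# The CSV content and column names are invented for illustration.
import csv
import io

csv_text = "policy_id,premium\nP-100,250.00\nP-101,310.50\n"

reader = csv.DictReader(io.StringIO(csv_text))   # io.StringIO stands in for a real file
records = [{"policy_id": row["policy_id"], "premium": float(row["premium"])}
           for row in reader]

print(records)   # data is now in a form a program can process directly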

4) Processing

It is the stage when the data is subjected to various means and methods of manipulation; this
is the point where a computer program is executed, and a process contains the program code
and its current activity. The process may be made up of multiple threads of execution that
simultaneously execute instructions, depending on the operating system. While a computer
program is a passive collection of instructions, a process is the actual execution of those
instructions. Many software programs are available for processing large volumes of data
within very short periods.
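
A small sketch of the multiple-threads idea (the work function and input chunks are invented): Python's standard concurrent.futures runs several pieces of the workload at once.

# Minimal processing-stage sketch: several chunks of work executed concurrently.
# The work function and the input chunks are invented for illustration.
from concurrent.futures import ThreadPoolExecutor

def summarize(chunk):
    # Stand-in for the real computation applied to each chunk of data.
    return sum(chunk) / len(chunk)

chunks = [[1, 2, 3], [10, 20, 30], [5, 5, 5, 5]]

with ThreadPoolExecutor(max_workers=3) as pool:
    averages = list(pool.map(summarize, chunks))

print(averages)   # one result per chunk, computed by the worker threads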

5) Output and interpretation

It is the stage where processed information is now transmitted to the user. The output is
presented to users in various report formats like a printed report, audio, video, or on the
monitor. Output needs to be interpreted so that it can provide meaningful information that
will guide future decisions of the company.

6) Storage

It is the last stage in the data processing cycle, where data, instructions, and information are
held for future use. The importance of this stage is that it allows quick access and retrieval
of the processed information, which can be passed on directly to the next stage when
needed. Every computer uses storage to hold system and application software.


Data Processing System

A data processing system is a combination of machines and people that for a set of inputs
produces a defined set of outputs. The inputs and outputs are interpreted as data, facts,
information, depending on the interpreter’s relation to the system.

A data processing system may involve some combination of:

• Conversion – converting data to another format.
• Validation – ensuring that supplied data is "clean, correct and useful."
• Sorting – "arranging items in some sequence and/or in different sets."
• Summarization – reducing detail data to its main points.
• Aggregation – combining multiple pieces of data.
• Analysis – the "collection, organization, analysis, interpretation and presentation of data."
• Reporting – listing detail or summary data or computed information.

Applications

Commercial Data Processing

Commercial data processing involves a large volume of input data, relatively few
computational operations, and a large volume of output. For example, an insurance
company needs to keep records on tens or hundreds of thousands of policies, print and mail
bills, and receive and post payments.

Data Analysis

In a science or engineering field, the terms data processing and information systems are
considered too broad, and the more specialized term data analysis is typically used. Data
analysis makes use of specialized and highly accurate algorithms and statistical
calculations that are less often observed in the typical general business environment.

12 Data Mining Tools and Techniques

What is Data Mining?

Data mining is a popular technological innovation that converts piles of data into
useful knowledge that can help the data owners/users make informed choices
and take smart actions for their own benefit. In specific terms, data mining
looks for hidden patterns amongst enormous sets of data that can help to
understand, predict, and guide future behavior. A more technical explanation:
Data Mining is the set of methodologies used in analyzing data from various
dimensions and perspectives, finding previously unknown hidden patterns,
classifying and grouping the data and summarizing the identified relationships.

The elements of data mining include extraction, transformation, and loading of data onto the
data warehouse system, managing data in a multidimensional database system, providing
access to business analysts and IT experts, analyzing the data with tools, and presenting the
data in a useful format, such as a graph or table. This is achieved by identifying relationships
using classes, clusters, associations, and sequential patterns through statistical analysis,
machine learning and neural networks.
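
As a hedged sketch of the clustering element mentioned above (it assumes the scikit-learn library is installed; the sample points and cluster count are invented):

# Minimal clustering sketch: group a handful of 2-D points into clusters.
# Assumes scikit-learn is installed; points and cluster count are invented.
from sklearn.cluster import KMeans

points = [[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [7.9, 8.1], [0.8, 0.9], [8.2, 8.0]]

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)           # which cluster each point was assigned to
print(model.cluster_centers_)  # the discovered group centres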
The Importance of Data Mining

Data can generate revenue; it is a valuable financial asset of an enterprise. Businesses can use
data mining for knowledge discovery and exploration of available data. This can help them
predict future trends, understand customers' preferences and purchase habits, and conduct a
constructive market analysis. They can then build models based on historical data patterns and
garner more from targeted market campaigns as well as strategize more profitable selling
approaches. Data mining helps enterprises make informed business decisions and enhances
business intelligence, thereby improving the company's revenue and reducing cost overheads.
Data mining is also useful in finding data anomaly patterns that are essential in fraud detection
and in areas of weak or incorrect data collation/modification. Getting the help of experienced
data entry service providers in the early stages of data management can make subsequent
data mining easier.

Data Mining Techniques

The art of data mining has been constantly evolving. A number of innovative and intuitive
techniques have emerged that fine-tune data mining concepts in a bid to give companies more
comprehensive insight into their own data, along with useful future trends. Many techniques
are employed by data mining experts, some of which are listed below:

1. Seeking Out Incomplete Data:

Data mining relies on the actual data present, so if data is incomplete, the results would be
completely off the mark. Hence, it is imperative to have the intelligence to sniff out incomplete
data where possible. Techniques such as Self-Organizing Maps (SOMs) help to map missing
data by visualizing the model of the multi-dimensional complex data. Multi-task learning for
missing inputs, in which one existing and valid data set along with its procedures is compared
with another compatible but incomplete data set, is one way to seek out such data.
Multi-dimensional perceptrons using intelligent algorithms to build imputation techniques can
address incomplete attributes of data.
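
A minimal sketch of one simple imputation technique, mean substitution, using only the standard library (the column values are invented; real systems would use richer models such as the SOMs or multi-task learners mentioned above):

# Minimal imputation sketch: fill missing values in a column with the column mean.
# The values are invented; this is mean substitution, the simplest imputation technique.

ages = [34, None, 51, 29, None, 46]

known = [a for a in ages if a is not None]
mean_age = sum(known) / len(known)

imputed = [a if a is not None else round(mean_age, 1) for a in ages]
print(imputed)   # missing entries replaced by the column mean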
2. Dynamic Data Dashboards:

This is a scoreboard, on a manager's or supervisor's computer, fed in real time with data as it
flows in and out of the various databases within the company's environment. Data mining
techniques are applied to give stakeholders live insight into, and monitoring of, the data.

3. Database Analysis:

Databases hold key data in a structured format, so algorithms built using their own language
(such as SQL macros) to find hidden patterns within the organized data are most useful. These
algorithms are sometimes built into the data flows, e.g. tightly coupled with user-defined
functions, and the findings are presented in a ready-to-refer-to report with meaningful analysis.

A good technique is to take a snapshot dump of data from a large database into a cache file at
any time and then analyze it further. Similarly, data mining algorithms must be able to pull out
data from multiple, heterogeneous databases and predict changing trends.
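
An illustrative sketch using Python's built-in sqlite3 (the table, columns, and rows are invented): a single aggregate query surfaces a simple hidden pattern, namely which categories are bought by the same customers most often.

# Minimal database-analysis sketch using the standard library's sqlite3.
# The table, columns, and rows are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("a", "books"), ("a", "music"), ("b", "books"),
     ("b", "music"), ("c", "books"), ("c", "garden")],
)

# Aggregate query: which pairs of categories are bought by the same customer?
rows = conn.execute("""
    SELECT p1.category, p2.category, COUNT(*) AS n
    FROM purchases p1
    JOIN purchases p2
      ON p1.customer = p2.customer AND p1.category < p2.category
    GROUP BY p1.category, p2.category
    ORDER BY n DESC
""").fetchall()

print(rows)   # ('books', 'music', 2) comes first: the strongest co-occurrence
conn.close()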

4. Text Analysis:

This concept is very helpful for automatically finding patterns within the text embedded in
large numbers of text files, word-processed files, PDFs, and presentation files. The
text-processing algorithms can, for instance, find repeated extracts of data, which is quite
useful in the publishing business or at universities for tracing plagiarism.
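
A minimal sketch of the repeated-extract idea (the two sample documents are invented): comparing word trigrams between texts gives a rough overlap score of the kind a plagiarism check might start from.

# Minimal text-analysis sketch: measure overlap of word trigrams between two texts.
# The two sample documents are invented; real plagiarism checkers are far more elaborate.

def trigrams(text):
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

doc_a = "data mining looks for hidden patterns amongst enormous sets of data"
doc_b = "the method looks for hidden patterns amongst enormous sets of records"

shared = trigrams(doc_a) & trigrams(doc_b)
overlap = len(shared) / max(len(trigrams(doc_a)), 1)

print(f"shared trigrams: {len(shared)}, overlap of doc_a: {overlap:.0%}")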

5. Efficient Handling of Complex and Relational Data:

A data warehouse or large data store must be supported with interactive and query-based
data mining for all sorts of data mining functions such as classification, clustering,
association, and prediction. OLAP (Online Analytical Processing) is one such useful
methodology. Other concepts that facilitate interactive data mining are graph analysis,
aggregate querying, image classification, meta-rule guided mining, swap randomization,
and multidimensional statistical analysis.
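
A small sketch of the aggregate-querying idea (the sales rows are invented; only the standard library is used): a pivot-style roll-up along two dimensions.

# Minimal OLAP-style sketch: roll up invented sales rows by region and quarter.
from collections import defaultdict

sales = [
    ("north", "Q1", 100), ("north", "Q2", 150),
    ("south", "Q1", 80),  ("south", "Q2", 120),
    ("north", "Q1", 40),
]

cube = defaultdict(float)
for region, quarter, amount in sales:
    cube[(region, quarter)] += amount      # aggregate along both dimensions

for (region, quarter), total in sorted(cube.items()):
    print(region, quarter, total)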
6. Relevance and Scalability of Chosen Data Mining Algorithms:

While selecting or choosing data mining algorithms, it is imperative that enterprises keep in
mind the business relevance of the predictions and the scalability needed to reduce costs in
future. Multiple algorithms should be able to execute in parallel for time efficiency,
independently and without interfering with transactional business applications, especially
time-critical ones. There should also be support for including SVMs (support vector
machines) at larger scale.

7. Popular Tools for Data Mining:

There are many ready-made tools available for data mining in the market today. Some of
these have common functionalities packaged within, with provisions to add functionality by
supporting the building of business-specific analysis and intelligence.

Listed below are some of the popular multi-purpose data mining tools that are leading the trends:

8. Rapid Miner (erstwhile YALE):

This is very popular since it is ready-made, open-source, no-coding-required software which
gives advanced analytics. Written in Java, it incorporates multifaceted data mining functions
such as data preprocessing, visualization and predictive analysis, and it can be easily
integrated with WEKA and the R tool to give models directly from scripts written in those two.

9. WEKA:

This is a Java-based customization tool which is free to use. It includes visualization and
predictive analysis and modeling techniques, clustering, association, regression and
classification.

10. R-Programming Tool:


This is written in C and Fortran, and allows data miners to write scripts in a full
programming language/platform. Hence, it is used to build statistical and analytical
software for data mining. It supports graphical analysis, both linear and nonlinear modeling,
classification, clustering and time-based data analysis.

11. Python-based Orange and NLTK:

Python is very popular due to its ease of use and its powerful features. Orange is an
open-source tool written in Python with useful data analytics, text analysis, and
machine-learning features embedded in a visual programming interface. NLTK, also written
in Python, is a powerful language-processing data mining tool which consists of data mining,
machine learning, and data scraping features that can easily be built up for customized needs.
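
As a hedged sketch of NLTK's flavour (it assumes the nltk package is installed and downloads the tokenizer data on first run; the sentence is invented):

# Minimal NLTK sketch: tokenize an invented sentence and count word frequencies.
# Assumes nltk is installed; tokenizer data is fetched on first run.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)   # needed by newer NLTK releases

text = "Data mining finds patterns; patterns guide decisions."
tokens = nltk.word_tokenize(text.lower())
freq = nltk.FreqDist(tokens)

print(freq.most_common(3))   # the most frequent tokens, e.g. 'patterns' appears twice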

12. Knime:

Primarily used for data preprocessing, i.e. data extraction, transformation and loading,
KNIME is a powerful tool with a GUI that shows the network of data nodes. Popular
amongst financial data analysts, it has modular data pipelining and liberally leverages
machine learning and data mining concepts for building business intelligence reports.

Data mining tools and techniques are now more important than ever for all businesses, big
or small, that would like to leverage their existing data stores to make business decisions
that will give them a competitive edge. Such actions based on data evidence and advanced
analytics have a better chance of increasing sales and facilitating growth. Adopting
well-established techniques and tools, and availing themselves of the help of data mining
experts, will assist companies in utilizing relevant and powerful data mining concepts to
their fullest potential.
