Integrity in the Data Lifecycle
If you are working with data in a Life Sciences organisation, it is imperative that you can guarantee its
integrity at every stage of the data lifecycle. Below we identify the five stages of data lifecycle
management and what you need to ensure is in place at each stage.
https://www.dataworks.ie/5-stages-in-the-data-management-lifecycle-process/
1. Data Creation
The first phase of the data lifecycle is the creation/capture of data. This data can be in many
forms e.g. PDF, image, Word document, SQL database data. Data is typically created by an
organisation in one of 3 ways:
Data Acquisition: acquiring already existing data which has been produced outside the
organisation
Data Entry: manual entry of new data by personnel within the organisation
Data Capture: capture of data generated by devices used in various processes in the
organisation
2. Storage
Once data has been created within the organisation, it needs to be stored and protected, with
the appropriate level of security applied. A robust backup and recovery process should also
be implemented to ensure retention of data during the lifecycle.
3. Usage
During the usage phase of the data lifecycle, data is used to support activities in the
organisation. Data can be viewed, processed, modified and saved. An audit trail should be
maintained for all critical data to ensure that all modifications to data are fully traceable. Data
may also be made available to share with others outside the organisation.
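As a concrete illustration of such an audit trail, here is a minimal Python sketch that appends one tamper-evident entry per modification. The record fields and file name are illustrative assumptions, not a prescribed schema.

```python
# A minimal sketch of an append-only audit trail for data modifications.
import json
import hashlib
from datetime import datetime, timezone

AUDIT_LOG = "audit_trail.jsonl"  # hypothetical log location

def record_change(user: str, record_id: str, old_value: str, new_value: str) -> None:
    """Append one audit entry per modification so changes stay traceable."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "record_id": record_id,
        "old_value": old_value,
        "new_value": new_value,
    }
    # Hash the entry so later tampering with the line is detectable.
    entry["digest"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    with open(AUDIT_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

record_change("jsmith", "batch-042", "pH 7.2", "pH 7.4")
```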
4. Archival
Data Archival is the copying of data to an environment where it is stored in case it is needed
again in an active production environment, and the removal of this data from all active
production environments.
A data archive is simply a place where data is stored, but where no maintenance or general
usage occurs. If necessary, the data can be restored to an environment where it can be used.
5. Destruction
The volume of archived data inevitably grows, and while you may want to save all your data
forever, that’s not feasible. Storage cost and compliance issues exert pressure to destroy data
you no longer need. Data destruction or purging is the removal of every copy of a data item
from an organisation. It is typically done from an archive storage location. The challenge of
this phase of the lifecycle is to ensure that the data has been properly destroyed. It is
important to ensure before destroying data that the data items have exceeded their required
regulatory retention period.
Having a clearly defined and documented data lifecycle management process is key to
ensuring Data Governance can be carried out effectively within your organisation.
At Dataworks, our highly skilled CSV & Software Engineers provide a full range of Data
Integrity services as part of our offering, including Data Integrity assessments,
remediation software, and validation services.
SDLC Meaning:
The software development lifecycle (SDLC) is the series of steps an organization follows to
develop and deploy its software. There isn't a single, unified software development lifecycle.
Rather, there are several frameworks and models that development teams follow to create,
test, deploy, and maintain software.
Software Development Methodologies
The most frequently used software development models include:
Waterfall
In the waterfall methodology, the development process only progresses to the next phase
when all work is completed. This means a slower, but more complete single release.
Agile
The agile framework is built around rapid change and continuous improvement. Agile
developers collaborate constantly, developing a framework with a clear set of principles and
objectives to guide their flexible development process.
Iterative Development
In iterative development, software is built through repeated cycles, with each iteration producing
an incremental, improved version of the product.
Spiral Development
The spiral methodology often relies on some of the other frameworks, such as Agile or
DevOps, depending on the components or projects. The spiral framework is a risk-based
approach that helps determine the right choices for the situation at hand.
V-Model Development
An extension of the waterfall methodology, the V-model pairs each development phase with a
corresponding testing phase. As its name suggests, the process is drawn as a V, with verification
activities on one side matched to validation activities on the other.
We reference this content a lot, so I decided to compile it all into a single post. This is the
original content, including internal links, and has not been re-edited.
Introduction
Four years ago I wrote the initial Data Security Lifecycle and a series of posts covering the
constituent technologies. In 2009 I updated it to better fit cloud computing, and it was
incorporated into the Cloud Security Alliance Guidance, but I have never been happy with
that work. It was rushed and didn’t address cloud specifics nearly sufficiently.
Adrian and I just spent a bunch of time updating the cycle and it is now a much better
representation of the real world. Keep in mind that this is a high-level model to help guide
your decisions, but we think this time around we were able to identify places where it can
more specifically guide your data security endeavors.
(As a side note, you might notice I use “data security” and “information-centric security”
interchangeably. I think infocentric is more accurate, but data security is more recognized, so
that’s what I tend to use.)
If you are familiar with the previous model you will immediately notice that this one is much
more complex. We hope it’s also much more useful. The old model really only listed controls
for data in different phases of the lifecycle – and didn’t account for location, ownership,
access methods, and other factors. This update should better reflect the more complex
environments and use cases we tend to see these days.
Due to its complexity, we need to break the new Lifecycle into a series of posts. In this first
post we will revisit the basic lifecycle, and in the next post we will add locations and access.
The lifecycle includes six phases from creation to destruction. Although we show it as a
linear progression, once created, data can bounce between phases without restriction, and
may not pass through all stages (for example, not all data is eventually destroyed).
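As a rough illustration, the six phases can be modelled as a simple type; the phase names follow the headings used later in this material (Create, Store, Use, Share, Archive, Destroy).

```python
# A minimal sketch of the six lifecycle phases. The key point of the model is
# that, once created, data may move between these phases in any order and may
# never reach some of them (e.g. not all data is destroyed).
from enum import Enum

class Phase(Enum):
    CREATE = "create"
    STORE = "store"
    USE = "use"
    SHARE = "share"
    ARCHIVE = "archive"
    DESTROY = "destroy"

print([p.value for p in Phase])
```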
These high-level activities describe the major phases of a datum’s life, and in a future post we
will cover security controls for each phase. But before we discuss controls we need to
incorporate two additional aspects: locations and access devices.
But as we mentioned, quite a bit has changed since then, exemplified by the emergence and
adoption of cloud computing and increased mobility. Although the Lifecycle itself still
applies to basic, traditional infrastructure, we will focus on these more complex use cases,
which better reflect what most of you are dealing with on a day to day basis.
Locations
One gap in the original Lifecycle was that it failed to adequately address movement of data
between repositories, environments, and organizations. A large amount of enterprise data
now transitions between a variety of storage locations, applications, and operating
environments. Even data created in a locked-down application may find itself backed up
someplace else, replicated to alternative standby environments, or exported for processing by
other applications. And all of this can happen at any phase of the Lifecycle.
We can illustrate this by thinking of the Lifecycle not as a single, linear operation, but as a
series of smaller lifecycles running in different operating environments. At nearly any phase
data can move into, out of, and between these environments – the key for data security is
identifying these movements and applying the right controls at the right security boundaries.
As with cloud deployment models, these locations may be internal, external, public, private,
hybrid, and so on. Some may be cloud providers, others traditional outsourcers, or perhaps
multiple locations within a single data center.
For data security, at this point there are four things to understand:
Access
Now that we know where our data lives and how it moves, we need to know who is accessing
it and how. There are two factors here:
Data today is accessed from all sorts of different devices. The days of employees only
accessing data through restrictive applications on locked-down desktops are quickly coming
to an end (with a few exceptions). These devices have different security characteristics and
may use different applications, especially with applications we’ve moved to SaaS providers –
who often build custom applications for mobile devices, which offer different functionality
than PCs.
Later in the model we will deal with who, but the diagram below shows how complex this
can be – with a variety of data locations (and application environments), each with its own
data lifecycle, all accessed by a variety of devices in different locations. Some data lives
entirely within a single location, while other data moves in and out of various locations… and
sometimes directly between external providers.
This completes our “topographic map” of the Lifecycle. In our next post we will dig into
mapping data flow and controls. In the next few posts we will finish covering background
material, and then show you how to use this to pragmatically evaluate and design security
controls.
Functions
There are three things we can do with a given datum:
Access: View/access the data, including copying, file transfers, and other exchanges of
information.
Process: Perform a transaction on the data: update it, use it in a business processing
transaction, etc.
Store: Store the data (in a file, database, etc.).
The table below shows which functions map to which phases of the lifecycle:
Controls
Essentially, a control is what we use to restrict a list of possible actions down to allowed
actions. For example, encryption can be used to restrict access to data, application controls to
restrict processing via authorization, and DRM storage to prevent unauthorized
copies/accesses.
To determine the necessary controls, we first list out all possible functions, locations, and
actors, and then decide which ones to allow. We then determine what controls we need to make that
happen (technical or process). Controls can be either preventative or detective (monitoring),
but keep in mind that monitoring controls that don’t tie back into some sort of alerting or
analysis merely provide an audit log, not a functional control.
Here you would list a function, the actor, and the location, and then check whether it is
allowed or not. Any time you have a ‘no’ in the allowed box, you would implement and
document a control.
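A minimal Python sketch of such an allowed-actions table follows; the specific functions, actors, and locations are hypothetical examples, not part of the original model.

```python
# A minimal sketch of the function/actor/location "allowed" table described
# above. Any combination not explicitly allowed requires a documented control.
ALLOWED = {
    # (function, actor, location): allowed?
    ("access",  "analyst",    "internal-dc"): True,
    ("process", "analyst",    "internal-dc"): True,
    ("store",   "analyst",    "internal-dc"): True,
    ("access",  "analyst",    "saas-app"):    True,
    ("process", "contractor", "saas-app"):    False,  # a 'no' in the allowed box
}

def needs_control(function: str, actor: str, location: str) -> bool:
    """True whenever the combination is not explicitly allowed."""
    return not ALLOWED.get((function, actor, location), False)

print(needs_control("process", "contractor", "saas-app"))  # True -> implement a control
```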
Tying It Together
In essence what we’ve produced is a high-level version of a data flow diagram (albeit not
using standard programming taxonomy). We start by mapping the possible data flows,
including devices and different physical and virtual locations, and at which phases in its
lifecycle data can move between those locations. Then, for each phase of the lifecycle in a
location, we determine which functions, people/systems, and more-granular locations for
working with the data are possible. We then figure out which we want to restrict, and what
controls we need to enforce those restrictions.
This looks complex, but keep in mind that you aren’t likely to do it for all data within an
entire organization. For given data in a given application/implementation you’ll be working
with a much more restrictive subset of possibilities. This clearly becomes more involved with
bigger applications, but practically speaking you need to know where data flows, what’s
possible, and what should be allowed, to design your security.
In a future post we’ll show you an example, and down the road we also plan to produce a
controls matrix which will show you where the different data security controls fit in.
https://www.securosis.com/blog/data-security-lifecycle-2.0
Data Lifecycle
Terminology
Acronyms
DLP: Data Loss Prevention
DRM: Digital Rights Management
IRM: Information Rights Management
Overview
Being able to destroy data, or render it inaccessible, in the cloud is critical to ensuring
confidentiality and managing a secure lifecycle for data.
Map the different lifecycle phases.
Integrate the different data locations and access types.
The create phase is an ideal time to implement technologies such as SSL/TLS for data that is
entered or imported. Doing so in the create phase ensures the data is protected from the outset,
before it moves into any further phases (see the sketch after the list below).
Data Created Remotely
Data should be encrypted.
Connections should be secured (VPN).
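A minimal sketch of TLS-protected transport for remotely created data, using Python's standard ssl module; the host name is a placeholder assumption.

```python
# A minimal sketch of protecting data in transit at the create phase with TLS.
import socket
import ssl

context = ssl.create_default_context()  # verifies server certificates by default

with socket.create_connection(("example.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="example.com") as tls:
        # Anything sent over 'tls' is encrypted in transit.
        print(tls.version())  # e.g. 'TLSv1.3'
```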
Store
Usually meant to refer to near-term storage (as opposed to long-term storage). Occurs almost
concurrently with the Create phase.
As soon as data enters the store phase, it's important to immediately employ:
The use of backup methods on top of security controls to prevent data loss.
Additional encryption for data at rest.
DLP and IRM technologies are used to ensure that data security is enforced during the
Use and Share phases of the cloud data lifecycle. They may be implemented during
the Store phase, but do not enforce data security because data is not accessed during
this phase.
While security controls are implemented in the create phase in the form of SSL/TLS, this
only protects data in transit and not data at rest. The store phase is the first phase in
which security controls are implemented to protect data at rest.
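As a rough sketch of at-rest protection in the store phase, the following uses the third-party Python 'cryptography' package (pip install cryptography); the file name and key handling are simplified assumptions.

```python
# A minimal sketch of encryption at rest in the store phase.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in practice, keep this in a key manager
cipher = Fernet(key)

plaintext = b"assay results, lot 17"
stored = cipher.encrypt(plaintext)   # what actually lands on disk

with open("record.enc", "wb") as f:
    f.write(stored)

# Without the key, record.enc is unreadable; with it, the data round-trips.
assert cipher.decrypt(stored) == plaintext
```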
Use
Data is vulnerable in this state since it must be unencrypted.
User Side
Connections should be secured (VPN).
The platforms with which users connect to the cloud should be secured.
Permissions for modifying and processing should be implemented.
Provider Side
Strong protections in the implementation of virtualization.
Due to the nature of data being actively used, viewed, and processed in the use phase, it is
more likely to be leaked in this phase than in others.
Share
IRM/DRM. Can control who can share and what they can share.
DLP. Can identify and prevent unauthorized sharing.
VPNs/encryption. For confidentiality.
Restrictions based on jurisdiction. Export or import controls, such as ITAR, EAR, or
Wassenaar.
Archive
Data should be encrypted.
Key management is of utmost importance.
Physical security.
Location (environmental, jurisdictional, geographical)
Format (medium, portability, weaknesses, age)
Retention policies
Retention period
Applicable regulations
Retention formats
Data classification
Archiving and retrieval procedures
Monitoring, maintenance, and enforcement
Many cloud providers will offer archiving services as a feature of the basic cloud service;
realistically, most providers are already performing this function to avoid inadvertent loss of
customer data. Because the customer is ultimately responsible for the data, the customer may
elect to use another, or an additional, archive method. The contract will stipulate specific
terms, such as archive size, duration, and so on and will determine who is responsible for
performing archiving activities in a managed cloud environment.
Destroy
Cryptoshredding (cryptographic erasure)
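A minimal sketch of how cryptoshredding works in principle: data is stored only in encrypted form, so destroying every copy of the key renders every copy of the ciphertext unrecoverable. The key handling here is simplified; in practice the key would live in (and be deleted from) a KMS or HSM.

```python
# A minimal sketch of cryptoshredding (cryptographic erasure), using the
# third-party 'cryptography' package.
from cryptography.fernet import Fernet

key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(b"retired customer records")

# ... ciphertext may now be replicated to backups, archives, etc. ...

key = None  # destroy every copy of the key (in practice: delete it from the KMS/HSM)

# With the key gone, no copy of the ciphertext can be decrypted,
# even copies on media you cannot physically reach.
```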
https://ccsp.alukos.com/concepts/data/data-lifecycle
So, how can knowing your data lifecycle help improve your data security in 2022?
Now what happens if you add permission management (defining who can access specific data
to prevent malicious insider attacks) into the mix? Is your data lifecycle still robust across all
stages? How about Bring Your Own Device (BYOD)? Does it have an impact? How do you
protect company data outside of corporate-owned machines?
Let’s break down each lifecycle step a little more in an attempt to aid future brainstorming on
your process:
Data Creation
Data is created in many ways, whether by manual entry, acquired from third parties or
captured from devices such as sensors or other connected devices. It goes far beyond
traditional file creation. In a production environment, data is created in a database during
functional testing, for example. Website forms collect data. And VoIP solutions also create
data.
Consider where all your data comes from, whether from audio, video, or documents. Is it
structured or unstructured? Is it on multiple devices? In an e-discovery situation, for example,
even social media or vehicle data are possible targets under disclosure. All data, including
any generated by a connected device or cloud service, requires protection (with permission
management/access control where possible) as soon as it’s created, just to be safe.
Data Storage
It seems obvious, but no matter what storage method you use (tape drives, SSD or NAS),
securing that storage is a must. Backups prevent data loss, and you’ll want to ensure your
data restoration process works before relying on it. It's also helpful to regularly verify
backup integrity.
Most jurisdictions hold companies responsible for protecting their data from accidental loss.
Blaming hardware failures, or even natural disasters like flooding, is not an excuse – an
offsite solution is a requirement. Most security pros recommend at least three backups, with
one or more offsite.
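A minimal sketch of the kind of restore-test verification suggested above, comparing checksums of the original file and a trial-restored copy; the file paths are placeholder assumptions.

```python
# A minimal sketch of verifying backup integrity with checksums.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

original = sha256_of("data/records.db")
restored = sha256_of("restore_test/records.db")  # from a trial restore
print("backup verified" if original == restored else "backup corrupt")
```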
Data Usage
Data usage includes viewing, processing, modifying and saving. This extends to big
data (making sure to anonymize data where necessary for data privacy compliance). Now,
creating anonymous data does not stop at removing a person's name, address and phone
number. It includes any combination of data entries that can specifically identify a person.
The fact that Citizen X is a music teacher from Nashville, drives a Camaro and is fond of pan
pipe renditions of "A Boy Named Sue" can be enough to pinpoint a real identity.
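A minimal sketch of that idea: anonymisation has to generalise or drop quasi-identifiers, not just remove the name. The field names and generalisation rules below are illustrative assumptions.

```python
# A minimal sketch of stripping quasi-identifiers, not just direct identifiers.
record = {
    "name": "Citizen X",            # direct identifier: remove
    "city": "Nashville",            # quasi-identifier: generalise
    "occupation": "music teacher",  # quasi-identifier: generalise
    "car": "Camaro",                # quasi-identifier: drop
    "purchase_total": 42.50,        # analytic value: keep
}

def anonymise(rec: dict) -> dict:
    return {
        "region": "US-South",        # city generalised to region
        "occupation": "education",   # role generalised to sector
        "purchase_total": rec["purchase_total"],
    }

print(anonymise(record))
```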
Another consideration is data collaboration, or data sharing, for all methods used. Given the
myriad of ways we share data (email, VoIP, cloud storage and many more), this is a pain
point for many companies, especially when trying to prevent insider threats.
Data Archiving
Most organizations use archives to store older and seldom-used data. They are secure but
available for use on demand. Again, regardless of storage method, backups are a must and access
control procedures apply.
Data Destruction
A key element of the data lifecycle. When data must be destroyed depends on jurisdiction and
governing legislation. For example, some jurisdictions require companies to keep accounting
data for five years. Due to software licensing restrictions (software licenses do not transfer to
new owners in most cases) and the wide variety of available data recovery tools, companies no
longer donate their computers. They can repurpose older hardware, for example as a print server
or NAS, but more typically they arrange secure disposal of hard drives via degaussing or
incineration. Professional services can recover even fire- or water-damaged drives, so secure
destruction is the safer approach and protects company data when decommissioning hardware.
No two companies have identical processes, since your data lifecycle will complement
operational processes for your unique situation. But understanding your data lifecycle,
and all of its complexities, is key to maximizing your cybersecurity efforts. By
identifying all potential risks, and reducing them, you can increase your data security.
Is the effort involved worth it? Most would say yes.
Database security is a complex and challenging endeavor that involves all aspects of
information security technologies and practices. It’s also naturally at odds with database
usability. The more accessible and usable the database, the more vulnerable it is to security
threats; the more invulnerable the database is to threats, the more difficult it is to access and
use. (This paradox is sometimes referred to as Anderson's Rule.)
Why is it important?
By definition, a data breach is a failure to maintain the confidentiality of data in a database.
How much harm a data breach inflicts on your enterprise depends on a number of
consequences or factors.
Insider threats
An insider threat is a security threat from any one of three sources with privileged access to
the database: a malicious insider who intends to do harm, a negligent insider who makes errors
that leave the database vulnerable, or an infiltrator who has obtained credentials.
Human error
Accidents, weak passwords, password sharing, and other unwise or uninformed user
behaviors continue to be the cause of nearly half (49%) of all reported data breaches.
Exploitation of database software vulnerabilities
Hackers make their living by finding and targeting vulnerabilities in all kinds of software,
including database management software. All major commercial database software vendors
and open source database management platforms issue regular security patches to address
these vulnerabilities, but failure to apply these patches in a timely fashion can increase your
exposure.
SQL/NoSQL injection attacks
A database-specific threat, these involve the insertion of arbitrary SQL or non-SQL attack
strings into database queries served by web applications or HTTP headers. Organizations that
don’t follow secure web application coding practices and perform regular vulnerability
testing are open to these attacks.
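A minimal sketch of the secure coding practice in question, using parameterised queries so user input is passed as data rather than spliced into the SQL; the schema is hypothetical.

```python
# A minimal sketch of parameterised queries versus string-built SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

user_input = "alice' OR '1'='1"  # a classic injection string

# Vulnerable pattern: attacker-controlled text becomes part of the SQL itself.
# query = f"SELECT * FROM users WHERE name = '{user_input}'"

# Safe pattern: the driver passes user_input as data, never as SQL.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)  # [] - the injection string matches nothing
```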
Buffer overflow exploitations
Buffer overflow occurs when a process attempts to write more data to a fixed-length block of
memory than it is allowed to hold. Attackers may use the excess data, stored in adjacent
memory addresses, as a foundation from which to launch attacks.
Denial of service (DoS/DDoS) attacks
In a denial of service (DoS) attack, the attacker deluges the target server—in this case the
database server—with so many requests that the server can no longer fulfill legitimate
requests from actual users, and, in many cases, the server becomes unstable or crashes.
In a distributed denial of service attack (DDoS), the deluge comes from multiple servers,
making it more difficult to stop the attack.
Malware
Malware is software written specifically to exploit vulnerabilities or otherwise cause damage to
the database; it may arrive via any endpoint device connecting to the database's network.
Attacks on backups
Organizations that fail to protect backup data with the same stringent controls used to protect
the database itself can be vulnerable to attacks on backups.
These threats are exacerbated by the following:
Growing data volumes: Data capture, storage, and processing continues to grow
exponentially across nearly all organizations. Any data security tools or practices need to be
highly scalable to meet near and distant future needs.
Infrastructure sprawl: Network environments are becoming increasingly complex,
particularly as businesses move workloads to multicloud or hybrid cloud architectures,
making the choice, deployment, and management of security solutions ever more
challenging.
Increasingly stringent regulatory requirements: The worldwide regulatory compliance
landscape continues to grow in complexity, making adhering to all mandates more difficult.
Cybersecurity skills shortage: Experts predict there may be as many as 8 million unfilled
cybersecurity positions by 2022.
Best practices
Because databases are nearly always network-accessible, any security threat to any
component within or portion of the network infrastructure is also a threat to the database, and
any attack impacting a user’s device or workstation can threaten the database. Thus, database
security must extend far beyond the confines of the database alone.
When evaluating database security in your environment to decide on your team’s top
priorities, consider each of the following areas:
Physical security: Whether your database server is on-premise or in a cloud data center, it
must be located within a secure, climate-controlled environment. (If your database server is
in a cloud data center, your cloud provider will take care of this for you.)
Administrative and network access controls: The practical minimum number of users
should have access to the database, and their permissions should be restricted to the
minimum levels necessary for them to do their jobs. Likewise, network access should be
limited to the minimum level of permissions necessary.
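As a rough sketch of granting only the minimum necessary permissions, assuming a PostgreSQL database and the third-party psycopg2 driver; the connection string, role, and table names are hypothetical.

```python
# A minimal sketch of least-privilege access grants.
import psycopg2

conn = psycopg2.connect("dbname=app user=admin")  # placeholder credentials
with conn, conn.cursor() as cur:
    # Report users get exactly what their job needs: read-only on one table.
    cur.execute("CREATE ROLE report_reader NOLOGIN")
    cur.execute("GRANT SELECT ON orders TO report_reader")
    # No INSERT/UPDATE/DELETE, and no access to any other table.
```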
End user account/device security: Always be aware of who is accessing the database and
when and how the data is being used. Data monitoring solutions can alert you if data
activities are unusual or appear risky. All user devices connecting to the network housing the
database should be physically secure (in the hands of the right user only) and subject to
security controls at all times.
Encryption: ALL data—including data in the database, and credential data—should be
protected with best-in-class encryption while at rest and in transit. All encryption keys
should be handled in accordance with best-practice guidelines.
Database software security: Always use the latest version of your database management
software, and apply all patches as soon as they are issued.
Application/web server security: Any application or web server that interacts with the
database can be a channel for attack and should be subject to ongoing security testing and
best practice management.
Backup security: All backups, copies, or images of the database must be subject to the same
(or equally stringent) security controls as the database itself.
Auditing: Record all logins to the database server and operating system, and log all
operations performed on sensitive data as well. Database security standard audits should be
performed regularly.
Controls and policies
In addition to implementing layered security controls across your entire network
environment, database security requires you to establish the correct controls and policies for
access to the database itself. These include:
Database security policies should be integrated with and support your overall business goals,
such as protection of critical intellectual property, and should align with your cybersecurity and cloud
security policies. Ensure you have designated responsibility for maintaining and auditing
security controls within your organization and that your policies complement those of your
cloud provider in shared responsibility agreements. Security controls, security awareness
training and education programs, and penetration testing and vulnerability assessment
strategies should all be established in support of your formal security policies.
Discovery: Look for a tool that can scan for and classify vulnerabilities across all your
databases—whether they’re hosted in the cloud or on-premise—and offer
recommendations for remediating any vulnerabilities identified. Discovery capabilities are
often required to conform to regulatory compliance mandates.
Data activity monitoring: The solution should be able to monitor and audit all data activities
across all databases, regardless of whether your deployment is on-premise, in the cloud, or
in a container. It should alert you to suspicious activities in real-time so that you can respond
to threats more quickly. You’ll also want a solution that can enforce rules, policies, and
separation of duties and that offers visibility into the status of your data through a
comprehensive and unified user interface. Make sure that any solution you choose can
generate the reports you’ll need to meet compliance requirements.
Encryption and tokenization capabilities: In case of a breach, encryption offers a final line of
defense against compromise. Any tool you choose should include flexible encryption
capabilities that can safeguard data in on-premise, cloud, hybrid, or multicloud
environments. Look for a tool with file, volume, and application encryption capabilities that
conform to your industry’s compliance requirements, which may demand tokenization (data
masking) or advanced security key management capabilities.
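A minimal sketch of the tokenization idea: sensitive values are swapped for random tokens, and the token-to-value mapping is held in a separately secured vault. The in-memory dict here stands in for a real vault service and is for illustration only.

```python
# A minimal sketch of tokenization (data masking).
import secrets

vault: dict[str, str] = {}  # stand-in for a separately secured vault/HSM service

def tokenize(value: str) -> str:
    token = "tok_" + secrets.token_hex(8)
    vault[token] = value  # real deployments store this mapping in a vault service
    return token

def detokenize(token: str) -> str:
    return vault[token]

card = "4111 1111 1111 1111"
t = tokenize(card)
print(t)              # safe to store or share downstream
print(detokenize(t))  # only systems with vault access can recover the value
```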
Data security optimization and risk analysis: A tool that can generate contextual insights by
combining data security information with advanced analytics will enable you to accomplish
optimization, risk analysis, and reporting with ease. Choose a solution that can retain and
synthesize large quantities of historical and recent data about the status and security of your
databases, and look for one that offers data exploration, auditing, and reporting capabilities
through a comprehensive but user-friendly self-service dashboard.
Database security and IBM Cloud
IBM-managed cloud databases feature native security capabilities powered by IBM Cloud
Security, including built-in identity and access management, visibility, intelligence, and data
protection capabilities. With an IBM-managed cloud database, you can rest easy knowing
that your database is hosted in an inherently secure environment, and your administrative
burden will be much smaller.
IBM also offers the IBM Security Guardium smarter data protection platform, which
incorporates data discovery, monitoring, encryption and tokenization, and security
optimization and risk analysis capabilities for all your databases, data warehouses, file shares,
and big data platforms, whether they’re hosted on-premise, in the cloud, or in hybrid
environments.
In addition, IBM offers managed Data Security Services for Cloud, which includes data
discovery and classification, data activity monitoring, and encryption and key management
capabilities to protect your data against internal and external threats through a streamlined
risk mitigation approach.
https://www.ibm.com/cloud/learn/database-security
Privileged accounts are those with special permissions on a system, application, database, or
any other asset, which can be used to perform administration activities (e.g. changing the
configuration) or to gain full access to the data. Failing to manage and monitor the usage of
privileged accounts in a corporate environment or an organization could have serious
consequences.
Once hackers or malicious actors find a way into a system or a network, they will look to
compromise a privileged account to gain access to systems and information they are not
authorized to use. Privileged Account Management is an important topic in cyber security
and a requirement of many regulatory and compliance frameworks.
https://backendless.com/database-security-best-practices/
A mission-critical database is any database that performs a critical function for a company's
operations, such as identity management, online banking systems, railway/aircraft operating
and control systems, or electric power systems. In the event of a database breach, a business
could suffer financial losses, bankruptcy, legal troubles, and even closure.