CCV Notes Unit 3
Cloud storage is a cloud computing model that enables storing data and files on the internet
through a cloud computing provider that you access either through the public internet or a
dedicated private network connection. The provider securely stores, manages, and maintains
the storage servers, infrastructure, and network to ensure you have access to the data when
you need it at virtually unlimited scale, and with elastic capacity. Cloud storage removes the
need to buy and manage your own data storage infrastructure, giving you agility, scalability,
and durability, with anytime, anywhere data access.
Cloud storage delivers cost-effective, scalable storage. You no longer need to worry about
running out of capacity, maintaining storage area networks (SANs), replacing failed devices,
adding infrastructure to scale up with demand, or operating underutilized hardware when
demand decreases. Cloud storage is elastic, meaning you scale up and down with demand
and pay only for what you use. It is a way for organizations to save data securely online so
that it can be accessed anytime from any location by those with permission.
Whether you are a small business or a large enterprise, cloud storage can deliver the agility,
cost savings, security, and simplicity to focus on your core business growth. For small
businesses, you no longer have to worry about devoting valuable resources to managing storage
yourself, and cloud storage gives you the ability to scale as the business grows.
Cost effectiveness
With cloud storage, there is no hardware to purchase, no storage to provision, and no extra
capital being used for business spikes. You can add or remove storage capacity on demand,
quickly change performance and retention characteristics, and only pay for storage that you
use. As data becomes infrequently accessed, you can even automatically move it
to lower-cost storage, thus creating even more cost savings. By moving storage workloads
from on premises to the cloud, you can reduce total cost of ownership by removing
overprovisioning and the cost of maintaining storage infrastructure.
Increased agility
With cloud storage, resources are only a click away. You reduce the time to make those
resources available to your organization from weeks to just minutes. This results in a dramatic
increase in agility for your organization. Your staff is largely freed from the tasks of
procurement, installation, administration, and maintenance. And because cloud storage
integrates with a wide range of analytics tools, your staff can now extract more insights from
your data to fuel innovation.
Faster deployment
When development teams are ready to begin, infrastructure should never slow them down.
Cloud storage services allow IT to quickly deliver the exact amount of storage needed,
whenever and wherever it's needed. Your developers can focus on solving complex
application problems instead of having to manage storage systems.
By using cloud storage lifecycle management policies, you can perform powerful information
management tasks including automated tiering or locking down data in support of compliance
requirements. You can also use cloud storage to create multi-region or global storage for your
distributed teams by using tools such as replication. You can organize and manage your data
in ways that support specific use cases, create cost efficiencies, enforce security, and meet
compliance requirements.
Cloud storage delivers virtually unlimited storage capacity, allowing you to scale up as much
and as quickly as you need. This removes the constraints of on-premises storage capacity. You
can efficiently scale cloud storage up and down as required for analytics, data lakes, backups,
or cloud native applications. Users can access storage from anywhere, at any time, without
worrying about complex storage allocation processes, or waiting for new hardware.
Business continuity
Cloud storage providers store your data in highly secure data centers, protecting your data
and ensuring business continuity. Cloud storage services are designed to handle concurrent
device failure by quickly detecting and repairing any lost redundancy. You can further protect
your data by using versioning and replication tools to recover more easily from both unintended user actions and application failures.
Cloud storage is delivered by a cloud services provider that owns and operates data storage
capacity by maintaining large data centers in multiple locations around the world. Cloud
storage providers manage capacity, security, and durability to make data accessible to your
applications over the internet in a pay-as-you-go model. Typically, you connect to the storage
cloud either through the internet or through a dedicated private connection, using a web
portal, website, or a mobile app. When customers purchase cloud storage from a service
provider, they turn over most aspects of the data storage to the vendor, including capacity,
security, data availability, storage servers and computing resources, and network data
delivery. Your applications access cloud storage through traditional storage protocols or
directly using an application programming interface (API). The cloud storage provider might
also offer services designed to help collect, manage, secure, and analyze data at a massive
scale.
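For example, a minimal sketch of what API access can look like, assuming Amazon S3 and the boto3 Python SDK (the bucket and key names below are placeholders):

    import boto3  # AWS SDK for Python; other providers expose similar APIs

    s3 = boto3.client("s3")

    # Upload a local file as an object (bucket and key names are placeholders)
    s3.upload_file("report.csv", "example-bucket", "reports/2024/report.csv")

    # Read the object back through the same API
    response = s3.get_object(Bucket="example-bucket", Key="reports/2024/report.csv")
    data = response["Body"].read()
    print(len(data), "bytes retrieved")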
There are three main cloud storage types: object storage, file storage, and block storage.
Each offers its own advantages and has its own use cases.
Object storage
Organizations must store a massive and growing amount of unstructured data, such as
photos, videos, machine learning (ML), sensor data, audio files, and other types of web
content, and finding scalable, efficient, and affordable ways to store them can be a challenge.
Object storage is a data storage architecture for large stores of unstructured data. Object storage stores data in the format it arrives in and makes it possible to customize metadata in ways that
make the data easier to access and analyze. Instead of being organized in files or folder
hierarchies, objects are kept in secure buckets that deliver virtually unlimited scalability. It is
also less costly to store large data volumes.
File storage
File-based storage or file storage is widely used among applications and stores data in a
hierarchical folder and file format. This type of storage is often known as a network-attached
storage (NAS) server with common file level protocols of Server Message Block (SMB) used in
Windows instances and Network File System (NFS) found in Linux.
Block storage
Enterprise applications like databases or enterprise resource planning (ERP) systems often
require dedicated, low-latency storage for each host. This is analogous to direct-attached
storage (DAS) or a storage area network (SAN). In this case, you can use a cloud storage service
that stores data in the form of blocks. Each block has its own unique identifier for quick
storage and retrieval.
Cloud storage simplifies and enhances traditional data center practices around data durability
and availability. With cloud storage, data is redundantly stored on multiple devices across one
or more data centers.
Security
With cloud storage, you control where your data is stored, who can access it, and what
resources your organization is consuming at any given moment. Ideally, all data is encrypted,
both at rest and in transit. Permissions and access controls should work just as well in the
cloud as they do for on-premises storage.
What are cloud storage use cases?
Cloud storage has several use cases in application management, data management, and
business continuity. Let’s consider some examples below.
Traditional on-premises storage solutions can be inconsistent in their cost, performance, and
scalability — especially over time. Analytics demand large-scale, affordable, highly available,
and secure storage pools that are commonly referred to as data lakes.
Data lakes built on object storage keep information in its native form and include rich
metadata that allows selective extraction and use for analysis. Cloud-based data lakes can sit
at the center of multiple kinds of data warehousing and processing, as well as big data and
analytical engines, to help you accomplish your next project in less time and with more
targeted relevance.
Backup and disaster recovery are critical for data protection and accessibility, but keeping up
with increasing capacity requirements can be a constant challenge. Cloud storage brings low
cost, high durability, and extreme scale to data backup and recovery solutions. Embedded
data management policies can automatically migrate data to lower-cost storage based on
frequency or timing settings, and archival vaults can be created to help comply with legal or
regulatory requirements. These benefits allow for tremendous scale possibilities within
industries such as financial services, healthcare and life sciences, and media and
entertainment that produce high volumes of unstructured data with long-term retention
needs.
Software test and development environments often require separate, independent, and
duplicate storage environments to be built out, managed, and decommissioned. In addition
to the time required, the up-front capital costs required can be extensive.
Many of the largest and most valuable companies in the world create applications in record
time by using the flexibility, performance, and low cost of cloud storage. Even the simplest
static websites can be improved at low cost. IT professionals and developers are turning to
pay-as-you-go storage options that remove management and scale headaches.
The availability, durability, and low cloud storage costs can be very compelling. On the other
hand, IT personnel working with storage, backup, networking, security, and compliance
administrators might have concerns about the realities of transferring large amounts of data
to the cloud. For some, getting data into the cloud can be a challenge. Hybrid, edge, and data
movement services meet you where you are in the physical world to help ease your data
transfer to the cloud.
Compliance
Storing sensitive data in the cloud can raise concerns about regulation and compliance,
especially if this data is currently stored in compliant storage systems. Cloud data compliance
controls are designed to ensure that you can deploy and enforce comprehensive compliance
controls on your data, helping you satisfy compliance requirements for virtually every
regulatory agency around the globe. Often through a shared responsibility model, cloud
vendors allow customers to manage risk effectively and efficiently in the IT environment, and
provide assurance of effective risk management through compliance with established, widely
recognized frameworks and programs.
Archive
Enterprises today face significant challenges with exponential data growth. Machine learning
(ML) and analytics give data more uses than ever before. Regulatory compliance requires long
retention periods. Customers need to replace on-premises tape and disk archive
infrastructure with solutions that provide enhanced data durability, immediate retrieval
times, better security and compliance, and greater data accessibility for advanced analytics
and business intelligence.
Many organizations want to take advantage of the benefits of cloud storage, but have
applications running on premises that require low-latency access to their data, or need rapid
data transfer to the cloud. Hybrid cloud storage architectures connect your on-premises
applications and systems to cloud storage to help you reduce costs, minimize management
burden, and innovate with your data.
Database storage
Because block storage has high performance and is readily updatable, many organizations use
it for transactional databases. With its limited metadata, block storage is able to deliver the
ultra-low latency required for high-performance workloads and latency sensitive applications
like databases.
Block storage allows developers to set up a robust, scalable, and highly efficient transactional
database. As each block is a self-contained unit, the database performs optimally, even when
the stored data grows.
ML and IoT
With cloud storage, you can process, store, and analyze data close to your applications and
then copy data to the cloud for further analysis. With cloud storage, you can store data
efficiently and cost-effectively while supporting ML, artificial intelligence (AI), and advanced
analytics to gain insights and innovate for your business.
Security is our number one priority at AWS. AWS pioneered cloud computing in 2006, creating
cloud infrastructure that allows you to securely build and innovate faster. With AWS, you
control where your data is stored, who can access it, and what resources your organization is
consuming at any given moment. Fine-grained identity and access controls combined with continual monitoring for near real-time security information ensure that the right resources have the right access, wherever your information is stored. On AWS, you will gain the control
and confidence you need to securely run your business with the most flexible and secure
cloud computing environment available. As a result, the most highly regulated organizations
in the world trust AWS, every day.
Applications:
NFS
NFS stands for Network File System. It is a client-server architecture that allows a computer user to view, store, and update files remotely. The NFS protocol is one of several distributed file system standards for Network-Attached Storage (NAS).
CIFS
CIFS stands for Common Internet File System. CIFS is a dialect of SMB. That is, CIFS is an implementation of the SMB protocol, designed by Microsoft.
SMB
SMB stands for Server Message Block. It is a file-sharing protocol that was invented by IBM. The SMB protocol was created to allow computers to perform read and write operations on files on a remote host over a Local Area Network (LAN). The directories present on the remote host can be accessed via SMB and are called "shares".
Hadoop
Hadoop is a group of open-source software services. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. The core of Hadoop contains a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part, which is the MapReduce programming model.
NetWare
NetWare is a discontinued computer network operating system developed by Novell, Inc. It primarily used cooperative multitasking to run different services on a personal computer, using the IPX network protocol.
Working of DFS:
There are two ways in which DFS can be implemented:
Standalone DFS namespace
It allows only for DFS roots that exist on the local computer and do not use Active Directory. A standalone DFS namespace can only be accessed on the computer on which it is created. It does not provide any fault tolerance and cannot be linked to any other DFS. Standalone DFS roots are rarely encountered because of their limited advantage.
Domain-based DFS namespace
It stores the configuration of DFS in Active Directory, creating the DFS namespace root
accessible at \\<domainname>\<dfsroot> or \\<FQDN>\<dfsroot>
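As a small illustration, a client on a domain-joined Windows machine can browse a domain-based namespace like any other UNC path. The sketch below assumes Python on such a client; the domain and root names are placeholders:

    from pathlib import Path

    # Placeholder domain-based DFS namespace root (\\<domainname>\<dfsroot>)
    dfs_root = Path(r"\\corp.example.com\Public")

    # List the folders and links exposed under the namespace root
    for entry in dfs_root.iterdir():
        print(entry.name)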
Advantages:
DFS allows multiple users to access or store the data.
It allows the data to be shared remotely.
It improves file availability, access time, and network efficiency.
It improves the capacity to change the size of the data and the ability to exchange the data.
Distributed File System provides transparency of data even if a server or disk fails.
Disadvantages:
In a Distributed File System, nodes and connections need to be secured, therefore we can say that security is at stake.
There is a possibility of loss of messages and data in the network while moving from one node to another.
Database connection in the case of a Distributed File System is complicated.
Also, handling of the database is not easy in a Distributed File System as compared to a single-user system.
There are chances that overloading will take place if all nodes try to send data at once.
Introduction to Hadoop Distributed File System (HDFS)
With growing data velocity, the data size easily outgrows the storage limit of a machine. A solution would be to store the data across a network of machines. Such filesystems are called distributed filesystems. Since data is stored across a network, all the complications of a network come in.
This is where Hadoop comes in. It provides one of the most reliable filesystems. HDFS (Hadoop
Distributed File System) is a unique design that provides storage for extremely large files with
streaming data access patterns, and it runs on commodity hardware. Let's elaborate on these terms:
Extremely large files: Here we are talking about data in the range of petabytes (1 PB = 1000 TB).
Streaming data access pattern: HDFS is designed on the principle of write-once, read-many-times. Once data is written, large portions of the dataset can be processed any number of times.
Commodity hardware: Hardware that is inexpensive and easily available in the market. This is one of the features that especially distinguishes HDFS from other file systems.
Nodes: Master and slave nodes typically form the HDFS cluster.
NameNode(MasterNode):
Manages all the slave nodes and assigns work to them.
It executes filesystem namespace operations like opening, closing, and renaming files and directories.
It should be deployed on reliable hardware with a high-end configuration, not on commodity hardware.
DataNode(SlaveNode):
Actual worker nodes, which do the actual work like reading, writing, and processing.
They also perform creation, deletion, and replication upon instruction from the master.
They can be deployed on commodity hardware.
HDFS daemons: Daemons are the processes running in the background.
Namenodes:
Run on the master node.
Store metadata (data about data) like file paths, the number of blocks, block IDs, etc.
Require a high amount of RAM.
Store metadata in RAM for fast retrieval, i.e., to reduce seek time, though a persistent copy of it is kept on disk.
DataNodes:
Run on slave nodes.
Require a large amount of disk space, as the data is actually stored here.
Data storage in HDFS: Now let’s see how the data is stored in a distributed manner.
Let's assume that a 100 TB file is inserted. The master node (namenode) will first divide the file into blocks (10 TB each in this simplified illustration; the default block size is 128 MB in Hadoop 2.x and above). These blocks are then stored across different datanodes (slave nodes). The datanodes replicate the blocks among themselves, and the information about which blocks they contain is sent to the master. The default replication factor is 3, meaning that for each block 3 replicas exist (including the original). In hdfs-site.xml we can increase or decrease the replication factor, i.e., we can edit its configuration there.
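For instance, the replication factor is controlled by the dfs.replication property; a typical hdfs-site.xml entry looks like the sketch below (3 is already the default and is shown only to illustrate where the value is set):

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>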
Note: The master node has a record of everything; it knows the location and information of each and every data node and the blocks they contain, i.e., nothing is done without the permission of the master node.
Why divide the file into blocks?
Let's assume that we don't divide the file; now it's very difficult to store a 100 TB file on a single machine. Even if we store it, each read and write operation on that whole file is going to incur a very high seek time. But if we have multiple blocks of size 128 MB, then it becomes easy to perform various read and write operations on them compared to doing so on the whole file at once. So we divide the file to have faster data access, i.e., to reduce seek time.
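As a rough worked example of the numbers involved (a Python sketch, using decimal units where 1 TB = 1,000,000 MB):

    # How many 128 MB blocks does a 100 TB file occupy?
    file_size_mb = 100 * 1000 * 1000                  # 100 TB expressed in MB
    block_size_mb = 128                               # default HDFS block size (Hadoop 2.x+)
    num_blocks = -(-file_size_mb // block_size_mb)    # ceiling division
    print(num_blocks)                                 # 781250 blocks spread across datanodes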
Why replicate the blocks in data nodes while storing?
Let's assume we don't replicate and only one copy of a block is present on datanode D1. Now if datanode D1 crashes, we will lose the block, which will make the overall data inconsistent and faulty. So we replicate the blocks to achieve fault tolerance.
Features of HDFS:
Distributed data storage.
Blocks reduce seek time.
The data is highly available as the same block is present at multiple data nodes.
Even if multiple data nodes are down, we can still do our work, thus making it highly reliable.
High fault tolerance.
Limitations of HDFS:
Low-latency data access: Applications that require low-latency access to data, i.e., in the range of milliseconds, will not work well with HDFS, because HDFS is designed keeping in mind that we need high throughput of data even at the cost of latency.
Small file problem: Having lots of small files will result in lots of seeks and lots of movement from one datanode to another to retrieve each small file; this whole process is a very inefficient data access pattern.
Cephfs
What is Ceph Storage?
Red Hat Ceph is essentially open-source software that aims to facilitate highly scalable object, block, and file-based storage under one comprehensive system. As a powerful storage solution, Ceph uses its own Ceph file system (CephFS) and is designed to be self-managed and self-healing. It is equipped to deal with outages on its own and constantly works towards reducing administration costs.
Another highlight of Ceph storage is that it is quite fault-tolerant and becomes so by easily replicating data. What this means is that there are no bottlenecks as such in the process while Ceph is operating.
Since its launch, there have been more than 15 Ceph releases, with Red Hat recently announcing a major update as Red Hat Ceph Storage 4, which brings an array of improvements in monitoring, scalability, management, and security, thus making it easier for enterprises to get started with Ceph.
Features of Ceph are:
– High scalability
– Open-source
– High reliability through distributed data storage
– Robust data security through redundant storage
– Advantage of continuous memory allocation
– Convenient software-based increase in availability via an integrated algorithm for locating
data
– Despite its limited development history, Ceph is free and is an established storage method.
– The application has been extensively and well documented by the manufacturer.
– A great deal of helpful information is available online for Ceph regarding its setup and maintenance.
– The scalability and integrated redundancy of Ceph storage ensure data security and flexibility within the network.
– The CRUSH algorithm of Ceph ensures high availability.
Disadvantages
Cloud databases
There are many vendors and options available to organizations looking for a cloud database
solution for their enterprise. You will want to select a model that works best for your specific
business needs. The following are some key features to look for from any cloud database:
Performance
Online and independent scaling of compute and storage, patching, and upgrade—
with uninterrupted data availability to applications—will ensure that your
database’s capacity meets your enterprise’s needs as they fluctuate, without
interrupting operations. Automated and online performance optimization, such as
auto-indexing, is a must. You’ll also want scale-out clustering for both read and
write to ensure that your mission-critical, real-time workloads run seamlessly.
Security
Robust security features are paramount. Any database model you select should be
able to perform data encryption at rest and in flight and provide automated
security updates. It’s also essential to ensure a strict separation of duties so
operations cannot access customer data. External attack detection and prevention
driven by machine learning provides an additional layer of real-time security.
Lastly, for your most business-critical applications, you will want a dedicated cloud
infrastructure that includes hardware isolation from other tenants.
Other qualities to look for include lower high-availability costs, and industry-
leading flashback technologies to help provide protection from user errors. Finally,
your database should have broad compatibility with third-party applications.
Migrating a database to the cloud might sound like a daunting task, but it does not have to
be. Advance planning is the key. It is also important to remember that not all migration
methods apply to every scenario.
There are several factors to consider when choosing a migration method—including data
types, host operating systems, and database versioning. Here are a few things to think about
and prepare for as you approach the migration of your databases to the cloud.
Is the target cloud database software compatible with what you are running on-
premises?
Is the version compatible?
Some cloud providers do not offer database services that are compatible with on-
premises versions. Also, if your target cloud database only supports a higher
version of the software you are using, you must plan for an upgrade.
What is the size and scale of your database, and does the target cloud support this
configuration?
Some cloud providers only offer smaller database configurations in terms of storage size
and number of cores. You will want to make sure in advance that your provider has the
capacity to meet your needs.
Do you run adjacent scripts on the database servers themselves? If so, you would
need to contract for infrastructure as a service (IaaS) or automated services—and
these might not be available through your cloud provider.
Do you need to migrate with little or no downtime to your existing
application? Leading cloud database providers, like Amazon, Microsoft, and
Oracle, are making database selection and migration easier than ever. Depending
on the circumstances, migrating to the cloud can take place in a matter of minutes.
Make migrating to a cloud database seamless
Oracle’s automated tools allow you to seamlessly move your on-premises database to Oracle
Cloud with virtually no downtime at all, because Oracle Cloud uses the same standards,
products, and skills you currently use on-premises.
Object Storage
What is object storage?
Object storage is a technology that stores and manages data in an unstructured format called
objects. Modern organizations create and analyze large volumes of unstructured data such as
photos, videos, email, web pages, sensor data, and audio files. Cloud object storage systems
distribute this data across multiple physical devices but allow users to access the content
efficiently from a single, virtual storage repository. Object storage solutions are ideal for
building cloud native applications that require scale and flexibility, and can also be used to
import existing data stores for analytics, backup, or archive.
Metadata is critical to object storage technology. With object storage, objects are kept in a
single bucket and are not files inside of folders. Instead, object storage combines the pieces
of data that make up a file, adds all the user-created metadata to that file, and attaches a
custom identifier. This creates a flat structure, called a bucket, as opposed to hierarchical or
tiered storage. This lets you retrieve and analyze any object in the bucket, no matter the file
type, based on its function and characteristics.
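A minimal sketch of attaching custom metadata to an object and reading it back, assuming Amazon S3 and boto3 (the bucket, key, and metadata fields are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # Store an object together with user-defined metadata in a flat bucket
    with open("cat-001.jpg", "rb") as f:
        s3.put_object(
            Bucket="example-bucket",
            Key="images/cat-001.jpg",
            Body=f,
            Metadata={"camera": "field-unit-7", "project": "wildlife-survey"},
        )

    # Retrieve only the metadata; there is no folder hierarchy, just the key
    head = s3.head_object(Bucket="example-bucket", Key="images/cat-001.jpg")
    print(head["Metadata"])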
Object storage is the ideal storage for data lakes because it delivers an architecture for large
amounts of data, with each piece of data stored as an object, and the object metadata
provides a unique identifier for easier access. This architecture removes the scaling limitations
of traditional storage, and is why object storage is the storage of the cloud.
The major benefits of object storage are the virtually unlimited scalability and the lower cost
of storing large volumes of data for use cases such as data lakes, cloud native applications,
analytics, log files, and machine learning (ML). Object storage also delivers greater data
durability and resiliency because it stores objects on multiple devices, across multiple
systems, and even across multiple data centers and regions. This allows for virtually unlimited
scale and also improves resilience and availability of the data.
While objects can be stored on premises, object storage is built for the cloud and delivers
virtually unlimited scalability, high durability, and cost-effectiveness. With cloud object
storage, data is readily accessible from anywhere.
Analytics
You can collect and store virtually unlimited data of any type in cloud object storage and
perform big data analytics to gain valuable insights about your operations, customers, and
the market you serve.
Data lake
A data lake uses cloud object storage as its foundation because it has virtually unlimited
scalability and high durability. You can seamlessly and nondisruptively increase storage from
gigabytes to petabytes of content, paying only for what you use. It has scalable performance,
ease-of-use features, native encryption, and access control capabilities.
Data archiving
Cloud object storage is excellent for long-term data retention. You can use it to replace on-
premises tape and disk archive infrastructure with solutions that provide enhanced data
durability, immediate retrieval times, better security and compliance, and greater data
accessibility for advanced analytics and business intelligence. You can also cost-effectively
archive large amounts of rich media content and retain mandated, regulatory data for
extended periods of time.
Rich media
Accelerate applications and reduce the cost of storing rich media files such as videos, digital
images, and music. With object storage you can create cost-effective, globally replicated
architecture to deliver media to distributed users by using storage classes and replication
features.
Backup and recovery
You can configure object storage systems to replicate content so that if a physical device fails,
duplicate object storage devices become available. This ensures that your systems and
applications continue to run without interruption. You can also replicate data across multiple
datacenters and geographical regions.
ML
In machine learning (ML), you “teach” a computer to make predictions or inferences. You use
algorithms to train models and then integrate the model into your application to generate
inferences in real time and at scale. Machine learning requires object storage because of the
scale and cost efficiency, as a production model typically learns from millions to billions of
example data items and produces inferences in as little as 20 milliseconds.
File storage
Many applications need shared file access. This has been traditionally served by network-
attached storage (NAS) services. Common file level protocols consist of Server Message Block
(SMB) used with Windows servers and Network File Systems (NFS) found in Linux instances.
File storage is suited for unstructured data, large content repositories, media stores, home
directories and other file-based data.
The primary differences between object and file storage are data structure and scalability.
File storage is organized into a hierarchy with directories and folders. File storage also follows
strict file protocols, such as SMB, NFS, or Lustre. Object storage uses a flat structure with
metadata and a unique identifier for each object that makes it easier to find among potentially
billions of other objects.
With these differences in structure, file storage and object storage have different capacities to
scale. Object storage offers near-infinite scaling, to petabytes and billions of objects. Because
of the inherent hierarchy and pathing, file storage hits scaling constraints.
Block storage
Enterprise applications like databases or ERP systems often require dedicated, low-latency
storage for each host. This is analogous to direct-attached storage (DAS) or a storage area
network (SAN). Block-based cloud storage solutions are provisioned with each virtual server
and offer the ultra-low latency required for high-performance workloads.
Comparing object storage and block storage
Object storage is best used for large amounts of unstructured data, especially when durability,
unlimited storage, scalability, and complex metadata management are relevant factors for
overall performance.
Block storage provides low latency and high-performance values in various use cases. Its
features are primarily useful for structured database storage, VM file system volumes, and
high volumes of read and write loads.
How can AWS help with your cloud object storage needs?
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-
leading scalability, data availability, security, and performance. Customers of all sizes and
industries can use Amazon S3 to store and protect any amount of data for a range of use
cases, such as data lakes, websites, mobile applications, backup and restore, archive,
enterprise applications, IoT devices, and big data analytics. Amazon S3 provides management
features so that you can optimize, organize, and configure access to your data to meet your
specific business, organizational, and compliance requirements. The following are some
examples of Amazon S3 benefits.
Amazon S3 was built from the ground up to deliver 99.999999999% (11 9s) of data durability.
With Amazon S3, your objects are redundantly stored on multiple devices across a minimum
of three Availability Zones (AZs) in an Amazon S3 Region. Amazon S3 is designed to sustain
concurrent device failures by quickly detecting and repairing any lost redundancy, and it also
regularly verifies the integrity of your data using checksums.
Amazon S3 protects your data with security, compliance, and audit capabilities. Amazon S3 is
secure by default. Upon creation, only you have access to Amazon S3 buckets that you create,
and you have complete control over who has access to your data. Amazon S3 supports user
authentication to control access to data. You can use access control mechanisms such as
bucket policies to selectively grant permissions to users and groups of users. Additionally, S3
maintains compliance programs, such as PCI DSS, HIPAA/HITECH, FedRAMP, SEC Rule 17a-4,
EU Data Protection Directive, and FISMA, to help you meet regulatory requirements. AWS
also supports numerous auditing capabilities to monitor access requests to your Amazon S3
resources.
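As an illustration of bucket policies, the sketch below grants read access to a single IAM user, assuming boto3; the account ID, user name, and bucket name are placeholders:

    import json
    import boto3

    s3 = boto3.client("s3")

    # Allow one IAM user (placeholder ARN) to read objects from the bucket
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadForAnalyst",
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::123456789012:user/analyst"},
                "Action": ["s3:GetObject"],
                "Resource": "arn:aws:s3:::example-bucket/*",
            }
        ],
    }

    s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))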
Flexible management
AWS offers the most flexible set of storage management and administration capabilities.
Storage administrators can classify, report, and visualize data usage trends to reduce costs
and improve service levels. Objects can be tagged with unique, customizable metadata so you
can see and control storage consumption, cost, and security separately for each workload.
The S3 Inventory tool delivers scheduled reports about objects and their metadata for
maintenance, compliance, or analytics operations. Amazon S3 can also analyze object access
patterns to build lifecycle policies that automate tiering, deletion, and retention. Finally, since
Amazon S3 works with AWS Lambda, customers can log activities, define alerts, and invoke
workflows, all without managing any additional infrastructure.
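For example, a lifecycle rule that tiers objects down and eventually expires them might be applied as in the sketch below (boto3; the bucket name, prefix, storage classes, and day counts are illustrative assumptions):

    import boto3

    s3 = boto3.client("s3")

    # Transition log objects to cheaper tiers over time, then delete them
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-then-expire-logs",
                    "Filter": {"Prefix": "logs/"},
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER"},
                    ],
                    "Expiration": {"Days": 365},
                }
            ]
        },
    )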
Amazon S3 offers a range of storage classes that you can choose from based on data access,
resiliency, and cost requirements of your workloads. Amazon S3 storage classes are purpose-
built to provide the lowest cost storage for different access patterns. You pay only for what
you use. The rate you’re charged depends on the size of your objects, how long you stored
the objects during the month, and your chosen storage class. Find the best Amazon S3 storage
class for your workload.
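Choosing a storage class is usually just a parameter at write time; a minimal sketch with boto3 (placeholder names, and STANDARD_IA is only one of several classes):

    import boto3

    s3 = boto3.client("s3")

    # Write an infrequently accessed object directly into a cheaper storage class
    s3.put_object(
        Bucket="example-bucket",
        Key="archive/2023-invoices.zip",
        Body=b"example data",
        StorageClass="STANDARD_IA",
    )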
Efficient analytics
Amazon S3 is the only cloud storage platform that lets customers run sophisticated analytics
on their data without requiring them to extract and move the data to a separate analytics
database. Customers with knowledge of SQL can use Amazon Athena to analyze vast amounts
of unstructured data in Amazon S3 on-demand. With Amazon Redshift Spectrum, customers
can run sophisticated analytics against exabytes of data in Amazon S3 and run queries that
span both the data you have in Amazon S3 and in your Amazon Redshift data warehouses.
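A minimal sketch of running a SQL query against data in S3 with Athena through boto3 (the database, table, and result location are placeholder names):

    import boto3

    athena = boto3.client("athena")

    # Start a SQL query over objects in S3; results are written back to S3
    response = athena.start_query_execution(
        QueryString="SELECT status, COUNT(*) FROM web_logs GROUP BY status",
        QueryExecutionContext={"Database": "analytics_db"},
        ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
    )
    print(response["QueryExecutionId"])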