Multimedia Security Handbook
Editors-in-Chief and Authors
Borko Furht
Darko Kirovski
CRC PRESS
Boca Raton London New York Washington, D.C.
2004055105
This book contains information obtained from authentic and highly regarded sources. Reprinted
material is quoted with permission, and sources are indicated. A wide variety of references are
listed. Reasonable efforts have been made to publish reliable data and information, but the author
and the publisher cannot assume responsibility for the validity of all materials or for the
consequences of their use.
Neither this book nor any part may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, microfilming, and recording, or by any
information storage or retrieval system, without prior permission in writing from the publisher.
All rights reserved. Authorization to photocopy items for internal or personal use, or the personal
or internal use of specific clients, may be granted by CRC Press, provided that $1.50 per page
photocopied is paid directly to Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA
01923 USA. The fee code for users of the Transactional Reporting Services is ISBN 0-8493-2773-3/05/$0.00+$1.50.
The fee is subject to change without notice. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.
The consent of CRC Press does not extend to copying for general distribution, for promotion, for
creating new works, or for resale. Specific permission must be obtained in writing from CRC Press
for such copying.
Direct all inquiries to CRC Press, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks,
and are used only for identification and explanation, without intent to infringe.
This new book series presents the latest research and technological
developments in the field of Internet and multimedia systems and applications.
We remain committed to publishing high-quality reference and technical
books written by experts in the field.
If you are interested in writing, editing, or contributing to a volume in
this series, or if you have suggestions for needed books, please contact
Dr. Borko Furht at the following address:
Preface
Recent advances in digital communications and storage technologies
have brought major changes for consumers. High-capacity hard disks
and DVDs can store a large amount of audiovisual data. In addition, faster
Internet connection speeds and the emerging high-bit-rate DSL connections provide sufficient bandwidth for entertainment networks. These
improvements in computers and communication networks are radically
changing the economics of intellectual property reproduction and distribution. Intellectual property owners must exploit new ways of reproducing, distributing, and marketing their intellectual property. However,
a major problem with current digital distribution and storage technologies is the great threat of piracy.
The purpose of the Multimedia Security Handbook is to provide a
comprehensive reference on advanced topics in this field. The handbook
is intended both for researchers and practitioners in the field, and for
scientists and engineers involved in designing and developing systems
for the protection of digital multimedia content. The handbook can also
be used as the textbook for graduate courses in the area of multimedia
security.
The handbook addresses a variety of issues related to the protection of digital multimedia content, including audio, image, and video
protection. State-of-the-art multimedia security technologies are
presented, including protection architectures, multimedia encryption,
watermarking, fingerprinting and authentication techniques, and various
applications.
This handbook comprises 26 chapters divided into 6 parts. Part I,
General Issues, introduces fundamental concepts applied in the protection of multimedia content and discusses the vulnerability of various
protection schemes. Part II, Multimedia Encryption, includes chapters on
audio, image, and video encryption techniques. These chapters cover
selective video encryption that meets real-time requirements,
chaos-based encryption, and techniques for the protection of streaming
media. Part III, Multimedia Watermarking, consists of chapters dealing
Editors-in-Chief and Authors
Darko Kirovski received his Ph.D. degree in computer science from the
University of California, Los Angeles, in 2001. Since April 2000, he has
been a researcher at Microsoft Research. His research interests include
certificates of authenticity, system security, multimedia processing,
biometric identity authentication, and embedded system design and
debugging. He received the 1999 Microsoft Graduate Research
Fellowship, the 2000 ACM/IEEE Design Automation Conference Graduate
Scholarship, the 2001 ACM Outstanding Ph.D. Dissertation Award in
Electronic Design Automation, and the Best Paper Award at ACM
Multimedia 2002.
List of Contributors
Contents
Preface
Editors-in-Chief and Authors
List of Contributors
PART I GENERAL ISSUES
PART V APPLICATIONS
Part I
General Issues
Protection of Multimedia Content in Distribution Networks
Ahmet M. Eskicioglu and Edward J. Delp
INTRODUCTION
In recent years, advances in digital technologies have created significant
changes in the way we reproduce, distribute, and market intellectual
property (IP). Digital media can now be exploited by IP owners to develop
new and innovative business models for their products and services. The
lowered cost of reproduction, storage, and distribution, however, also
creates a strong incentive for large-scale commercial infringement. In a
world where piracy is a growing potential threat, the rights of the IP
owners can be protected using three complementary weapons: technology, legislation, and business models. Because of the diversity of IP
(ranging from e-books to songs and movies) created by copyright
0-8493-2773-3/05/$0.00+1.50
2005 by CRC Press
WHAT IS COPYRIGHT?
To guide the discussion into the proper context, we will begin with the
definition of copyright and summarize the important aspects of the
copyright law. Copyright is a form of protection provided by the laws of
the United States (title 17, U.S. Code) to the authors of original works of
authorship, including literary, dramatic, musical, artistic, and certain other
intellectual works [2]. Although "copyright" literally means "right to copy,"
the term is now used to cover a number of exclusive rights granted to
the authors for the protection of their work. According to Section 106
of the 1976 Copyright Act [3], the owner of copyright is given the
exclusive right to do, and to authorize others to do, any of the following:
The purpose and character of the use, including whether such use
is of a commercial nature or is for nonprofit educational purposes
The nature of the copyrighted work
The amount and substantiality of the portion used in relation to
the copyrighted work as a whole
The effect of the use upon the potential market for, or value of, the
copyrighted work
Copyrightable
Literary works
Musical works (including any accompanying words)
Dramatic works (including any accompanying music)
Pantomimes and choreographic works
Pictorial, graphic, and sculptural works
Motion pictures and other audiovisual works
Sound recordings
Architectural works

Not Copyrightable
Works that have not been fixed in a tangible form of expression
Titles, names, short phrases, and slogans; familiar symbols or designs; mere variations of typographic ornamentation, lettering, or coloring; mere listings of ingredients or contents
Ideas, procedures, methods, systems, processes, concepts, principles, discoveries, or devices, as distinguished from a description, explanation, or illustration
Works consisting entirely of information that is common property and containing no original authorship
was more than three times as fast as the remainder of the U.S.
economy (5% vs. 1.5%).
In 2001, the U.S. core copyright industries' estimated foreign sales
and exports were $88.97 billion, leading all major industry sectors
(chemical and allied products; motor vehicles, equipment and
parts; aircraft and aircraft parts; electronic components and
accessories; computers and peripherals).
Estimated losses
1322.3
2142.3
3539.0
1690.0
514.5
9208.1
There are three industries with vested interest in the digital content
protection arena: motion picture, consumer electronics, and information
technology. Table 1.4 lists some of the key players that represent
companies ranging from content owners to device manufacturers and
service providers.
In the last two decades, several protection systems have been
proposed and implemented in commonly used digital distribution
networks. These include:
Brief information
ATSC [12]
MPAA [17]
RIAA [18]
IETF [19]
SCTE (Society of Cable Telecommunications Engineers) is a
nonprofit professional organization committed to advancing
the careers of cable telecommunications professionals and
serving the industry through excellence in professional
development, information, and standards. Currently, SCTE has
almost 15,000 members from the United States and 70 countries
worldwide and offers a variety of programs and services for
the industry's educational benefit.
MPAA (Motion Picture Association of America) and its
international counterpart, the Motion Picture Association (MPA),
represent the American motion picture, home video, and
television industries, domestically through the MPAA and
internationally through the MPA. Founded in 1922 as the
trade association of the American film industry, the MPAA has
broadened its mandate over the years to reflect the diversity of
an expanding industry. The MPA was formed in 1945 in the
aftermath of World War II to reestablish American films in the
world market and to respond to the rising tide of protectionism
resulting in barriers aimed at restricting the importation of
American films.
RIAA (Recording Industry Association of America) is the trade
group that represents the U.S. recording industry. Its mission is
to foster a business and legal climate that supports and promotes
the members' creative and financial vitality. The trade group's
more than 350 member companies create, manufacture, and
distribute approximately 90% of all legitimate sound recordings
produced and sold in the United States.
IETF (Internet Engineering Task Force) is a large, open
international community of network designers, operators,
vendors, and researchers concerned with the evolution of the
Internet architecture and the smooth operation of the Internet.
The IETF working groups are grouped into areas and are
managed by Area Directors (ADs). The ADs are members of
the Internet Engineering Steering Group (IESG). Architectural
oversight is provided by the Internet Architecture Board (IAB).
The IAB and IESG are chartered by the Internet Society (ISOC)
for these purposes. The General Area Director also serves as the
chair of the IESG and of the IETF and is an ex-officio member
of the IAB.
DVB [21]
MPEG (Moving Picture Experts Group) was originally the name given
to the group of experts that developed a family of international
standards used for coding audiovisual information in a
digital compressed format. Established in 1988, the MPEG Working
Group (formally known as ISO/IEC JTC1/SC29/WG11) is part
of JTC1, the Joint ISO/IEC Technical Committee on Information
Technology. The MPEG family of standards includes MPEG-1,
MPEG-2, and MPEG-4, formally known as ISO/IEC-11172,
ISO/IEC-13818, and ISO/IEC-14496, respectively.
DVB (Digital Video Broadcasting) Project is an industry-led
consortium of over 300 broadcasters, manufacturers,
network operators, software developers, regulatory bodies,
and others in over 35 countries committed to designing global
standards for the global delivery of digital television and
data services. The General Assembly is the highest body
in the DVB Project. The Steering Board sets the overall
policy direction for the DVB Project and handles its
coordination, priority setting, and management, aided by
three Ad Hoc Groups on Rules & Procedures, Budget, and
Regulatory issues. The DVB Project is divided into four main
Modules, each covering a specific element of the work
undertaken. The Commercial Module and Technical Module
are the driving force behind the development of the DVB
specifications, with the Intellectual Property Rights Module
addressing IPR issues and the Promotion and Communications
Module dealing with the promotion of DVB around the globe.
Many problems, some of which are controversial, are still open and
challenge the motion picture, consumer electronics, and information
technology industries.
In an end-to-end protection system, a fundamental problem is to
determine whether the consumer is authorized to access the requested
content. The traditional concept of controlling physical access to places
(e.g., cities, buildings, rooms, highways) has been extended to the digital
world in order to deal with information in binary form. A familiar example
is the access control mechanism used in computer operating systems to
manage data, programs, and other system resources. Such systems can
be effective in bounded communities [9] (e.g., a corporation or a
college campus), where the emphasis is placed on the original access to
information rather than how the information is used once it is in the
In the context of CA systems, scrambling is the process of content encryption. This term is
inherited from the analog protection systems where the analog video was manipulated using
methods such as line shuffling. It is now being used to distinguish the process from the
protection of descrambling keys.
The program may come directly from the head-end or a local storage device. Protection of
local storage (such as a hard disk) is a current research area.
Packaging of content
Secure delivery and storage of content
Prevention of unauthorized access
Enforcement of usage rules
Monitoring the use of content
The publisher packages the media file (i.e., the content) and
encrypts it with a symmetric cipher. The package may include
information about the content provider, retailer, or the Web
address to contact for the rights.
The protected media file is placed on a server for downloading or
streaming. It can be located with a search engine using the proper
content index.
The customer requests the media file from the server.
The file is sent after the client device is authenticated. The
customer may also be required to complete a purchase transaction. Authentication based on public-key certificates is commonly
used for this purpose. Depending on the DRM system, the usage
rules and the key to unlock the file may either be attached to the
file or need to be separately obtained (e.g., in the form of a license)
from the clearinghouse or any other registration server. The
attachment or the license are protected in such a way that only
TABLE 1.5.
DRM-related activities
Organization
MPEG-4: Latest compression standard designed specifically for low-bandwidth (less than
1.5 Mbit/sec bitrate) video/audio encoding purposes. As a universal language for a range of
multimedia applications, it will provide additional functionality such as bitrate scalability,
object-based representation, and intellectual property management and protection.
MPEG-4 IPMP (version 1) is a simple hook DRM architecture standardized in 1999. As each
application may have different requirements for the protection of multimedia data, MPEG-4
allows the application developers to design domain-specific IPMP systems (IPMP-S). MPEG-4
standardizes only the MPEG-4 IPMP interface with IPMP Descriptors (IPMP-Ds) and
IPMP Elementary Streams (IPMP-ES), providing a communication mechanism between IPMP
systems and the MPEG-4 terminal.
MPEG-7: Formally named Multimedia Content Description Interface, the MPEG-7 standard
provides a set of standardized tools to describe multimedia content. The main elements of the
standard are description tools (Descriptors [D] and Description Schemes [DS]), a Description
Definition Language (DDL) based on the XML Schema Language, and system tools. The DDL
defines the syntax of the Description Tools and allows the creation of new Description
Schemes and Descriptors as well as the extension and modification of existing Description
Schemes. System tools enable the deployment of descriptions, supporting binary-coded
representation for efficient storage and transmission, transmission mechanisms, multiplexing
of descriptions, synchronization of descriptions with content, and management and
protection of intellectual property in MPEG-7 descriptions.
Recent efforts
MPEG-21: MPEG-21 defines a normative open framework for multimedia delivery and
consumption that can be used by content creators, producers, distributors, and service
providers in the delivery and consumption chain. The framework is based on two essential
concepts: the definition of a fundamental unit of distribution and transaction (the Digital Item)
and the concept of Users interacting with Digital Items. Development of an interoperable
framework for Intellectual Property Management and Protection (IPMP) is an ongoing effort
that will become a part of the MPEG-21 standard.
IPMP-X (Intellectual Property Management and Protection Extension) [40] is a DRM
architecture that provides a normative framework to support many of the requirements of
a DRM solution (renewability, secure communications, verification of trust, granular and
flexible governance at well-defined points in the processing chain, etc.). IPMP-X comes in
two flavors: MPEG-2 IPMP-X (applicable to MPEG-2 based systems) and MPEG-4 IPMP-X
(applicable to MPEG-4 based systems). The MPEG-4 IPMP extensions were standardized in
2002 as an extension to MPEG-4 IPMP hooks. IPMP Tools are modules that perform IPMP
functions such as authentication, decryption, and watermarking. In addition to specifying
syntax to signal and trigger various IPMP Tools, IPMP-X specifies the architecture to plug the
IPMP Tools seamlessly into the IPMP-X terminal.
MPEG LA, LLC [41] provides one-stop technology platform patent licensing with a portfolio
of essential patents for the international digital video compression standard known as
MPEG-2. In addition to MPEG-2, MPEG LA licenses portfolios of essential patents for the
IEEE 1394 Standard, the DVB-T Standard, the MPEG-4 Visual Standard, and the MPEG-4 Systems
Standard. In October 2003, MPEG LA, LLC, issued a call for patents that are essential to
digital rights management technology (DRM) as described in DRM Reference Model v 1.0. The
DRM Reference Model does not define a standard for interoperability among DRM devices,
systems, or methods, or provide a specification of commercial products. It is an effort to
provide users with convenient, fair, reasonable, nondiscriminatory access to a portfolio of
essential worldwide patent rights under a single license. If the initial evaluation of the
submitted patents is completed by the end of 2003, a joint patent license may become
available in late 2004.
Its members consist of hardware and software companies, print and digital publishers,
retailers, libraries, accessibility advocates, authors, and related organizations. The OeBF
engages in standards and trade activities through the operation of Working Groups and
Special Interest Groups. The Working Groups are authorized to produce official OeBF
documents such as specifications and process documents (such as policies and procedures,
position papers, etc.). In the current organization, there are five Working Groups:
Metadata & Identifiers WG, Publication Structure WG, Requirements WG, Rights & Rules
WG, and Systems WG. The mission of the Rights and Rules Working Group is to
create an open and commercially viable standard for interoperability of digital rights
management (DRM) systems.
Internet Digital Rights Management (IDRM) was an IRTF Research Group formed to
research issues and technologies relating to Digital Rights Management (DRM) on the
Internet. The IRTF is a sister organization of the Internet Engineering Task Force
(IETF). There were three IRTF drafts, formally submitted through IDRM, that carried
the IRTF title. The IDRM group is now closed.
ment schemes have been proposed in the last 10-15 years. Four
classifications from the literature are:
Table 1.6.
Centralized group
control
Subgroup control
Member control
Scalable
Non-scalable
Ballardie, 96 [76]
Briscoe, 99 [77]
Boyd, 97 [82]
Steiner et al., 97 [83]
Becker and Wille, 98 [84]
Periodic batch rekeying: The key server processes both join and
leave requests periodically in a batch.
Periodic batch leave rekeying: The key server processes each join
request immediately to reduce the delay for a new member to
access group communications but processes leave requests in a
batch.
Periodic batch join rekeying: The key server processes each leave
request immediately to reduce the exposure to members who have
left but processes join requests in a batch.
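The three batch policies above differ only in which request type is applied immediately and which is queued until the next rekey interval. The following is a minimal sketch of that difference, not a protocol implementation. The KeyServer class, the Policy names, and the 16-byte random group key are assumptions made for illustration; key distribution to members is omitted.

```python
import secrets
from enum import Enum


class Policy(Enum):
    BATCH_BOTH = "periodic batch rekeying"
    BATCH_LEAVE = "periodic batch leave rekeying"
    BATCH_JOIN = "periodic batch join rekeying"


class KeyServer:
    """Hypothetical key server; names and key size are illustrative only."""

    def __init__(self, policy):
        self.policy = policy
        self.members = set()
        self.pending = []                      # queued (kind, member) requests
        self.group_key = secrets.token_bytes(16)
        self.rekey_count = 0

    def _rekey(self):
        # A real server would also distribute the new key to current members.
        self.group_key = secrets.token_bytes(16)
        self.rekey_count += 1

    def _apply(self, kind, member):
        if kind == "join":
            self.members.add(member)
        else:
            self.members.discard(member)

    def request(self, kind, member):
        """Handle a 'join' or 'leave' request according to the policy."""
        immediate = (
            (kind == "join" and self.policy is Policy.BATCH_LEAVE)
            or (kind == "leave" and self.policy is Policy.BATCH_JOIN)
        )
        if immediate:
            self._apply(kind, member)
            self._rekey()
        else:
            self.pending.append((kind, member))

    def end_of_period(self):
        """Called once per rekey interval: apply the queue, then rekey once."""
        if not self.pending:
            return
        for kind, member in self.pending:
            self._apply(kind, member)
        self.pending.clear()
        self._rekey()
```

With Policy.BATCH_LEAVE, a join triggers a rekey at once, while a leave changes nothing until end_of_period() is called. This is the trade-off the text describes: lower join delay at the price of longer exposure to departed members.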
Access control lists: The sender maintains a list of hosts who are
either authorized to join the multicast group or excluded from it.
When a host sends a join request, the sender checks its identity
against the access control list to determine if membership is
permitted. The maintenance of the list is an important issue, as the
list may be changing dynamically based on new authorizations or
exclusions.
Capability certificates: Issued by a designated Certificate Authority,
a capability certificate contains information about the identity of
the host and the set of rights associated with the host. It is used to
authenticate the user and to allow group membership.
Mutual authentication: The sender and the host authenticate each
other via cryptographic means. Symmetric or public-key schemes
can be used for this purpose.
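As one illustration of mutual authentication with a symmetric scheme, the sketch below uses an HMAC-based challenge-response exchange. The chapter does not specify this construction; it is an assumption, as are the function names, the 16-byte nonces, and the role labels that keep a reply from being reflected back to its author. Both parties are simulated in one process.

```python
import hashlib
import hmac
import secrets


def respond(shared_key: bytes, challenge: bytes, role: bytes) -> bytes:
    """Prove knowledge of shared_key by MACing the peer's challenge."""
    return hmac.new(shared_key, role + challenge, hashlib.sha256).digest()


def verify(shared_key: bytes, challenge: bytes, role: bytes, response: bytes) -> bool:
    """Check a response in constant time against the expected MAC."""
    expected = respond(shared_key, challenge, role)
    return hmac.compare_digest(expected, response)


def mutual_authenticate(sender_key: bytes, host_key: bytes) -> bool:
    """Return True only if sender and host each accept the other's reply."""
    sender_nonce = secrets.token_bytes(16)   # challenge the sender issues to the host
    host_nonce = secrets.token_bytes(16)     # challenge the host issues to the sender

    host_reply = respond(host_key, sender_nonce, b"host")
    sender_reply = respond(sender_key, host_nonce, b"sender")

    sender_accepts_host = verify(sender_key, sender_nonce, b"host", host_reply)
    host_accepts_sender = verify(host_key, host_nonce, b"sender", sender_reply)
    return sender_accepts_host and host_accepts_sender
```

Authentication succeeds only when both sides hold the same key. A public-key variant would replace the MAC with a signature over the challenge.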
A different version of video for each group member [69]: For a given
multicast video, the sender applies two different watermark
functions to generate two different watermarked frames, di,w0 and
di,w1, for every frame i in the stream. The designated group leader
assigns a randomly generated bit string to each group member.
The length of the bit string is equal to the number of video frames in
the stream. For the ith watermarked frame in stream j, j = 0, 1, a
different key Ki,j is used to encrypt it. The random bit string
determines whether the member will be given Ki,0 or Ki,1 for
decryption. If there is only one leaking member, its identification is
made possible with the collaboration of the sender who can read
the watermarks to produce the bit string and the group leader
who has the bit strings of all members. The minimum length of the
retrieved stream to guarantee a c-collusion detection, where c is the
number of collaborators, is not known. An important drawback of
the proposal is that it is not scalable and two copies of the video
stream need to be watermarked, encrypted, and transmitted.
Distributed watermarking (watercasting) [111]: For a multicast
distribution tree with maximum depth d, the source generates a
total of n differently watermarked copies of each packet such that
n ≥ d. Each group of n alternate packets is called a transmission
group. On receiving a transmission group, a router forwards all but
one of those packets to each downstream interface on which there
are receivers. Each last hop router in the distribution tree will
receive n - dr packets from each transmission group, where dr is the
depth of the route to this router. Exactly one of these packets will
be forwarded onto the subnet with receivers. The goal of this
filtering process is to provide a stream for each receiver with a
unique sequence of watermarked packets. The information about
the entire tree topology needs to be stored by the server to trace
an illegal copy. A major potential problem with watercasting is the
support required from the network routers. The network providers
may not be willing to provide a security-related functionality unless
video delivery is a promising business for them.
Watermarking with a hierarchy of intermediaries [112]: WHIM Backbone (WHIM-BB) introduces a hierarchy of intermediaries into the
network and forms an overlay network between them. Each
intermediary has a unique ID used to define the path from the
source to the intermediary on the overlay network. The Path ID is
embedded into the content to identify the path it has traveled. Each
intermediary embeds its portion of the Path ID into the content
before it forwards the content through the network. A watermark
embedded by a WHIM-BB identifies the domain of a receiver.
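The first scheme above [69], in which each member decrypts one of two watermarked versions of every frame, can be illustrated with a small sketch of the bookkeeping involved. Watermark embedding, detection, and encryption are not modeled. The function names and the representation of keys as opaque values are assumptions, and tracing is shown only for the single-leaker case that the text describes.

```python
import secrets


def assign_bit_strings(members, num_frames):
    """Group leader: draw one random bit per frame for each member."""
    return {m: [secrets.randbelow(2) for _ in range(num_frames)] for m in members}


def keys_for_member(bits, frame_keys):
    """Select key Ki,0 or Ki,1 for frame i according to the member's ith bit.

    frame_keys[i] is the pair (Ki,0, Ki,1) used to encrypt the two
    watermarked versions of frame i.
    """
    return [frame_keys[i][b] for i, b in enumerate(bits)]


def trace_leaker(recovered_bits, assignments):
    """Return the members whose assigned bit string equals the bits that the
    sender read from the watermarks of a leaked copy."""
    return [m for m, bits in assignments.items() if bits == recovered_bits]
```

Because each member can decrypt exactly one version of each frame, the sequence of watermarks in a leaked copy reproduces that member's bit string. With colluding members the recovered sequence may match no single assignment, which is why the text notes that the required stream length for c-collusion detection is unknown.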
Baseline rekeying (BR): The member first leaves the group via area i
and then rejoins the group via area j. The data transmission is
halted during the distribution of the KEKs and the DEK. In BR, when
a member leaves the group, a notification is sent to its current AKD.
Immediate rekeying (IR): The member initiates a transfer by sending
one notification to AKDi and one notification to AKDj. Area i
performs a KEKi rekey and area j performs a KEKj rekey. The only
KEK held by a group member is for the area in which it currently
resides. Unlike the baseline algorithm, no DEK is generated and
data transmission continues uninterrupted. In IR, when a member
leaves the group, a notification is sent to its current AKD.
Delayed rekeying (DR): The member sends one notification to AKDi
and one notification to AKDj. Area j performs a KEKj rekey, but area
i does not perform a KEKi rekey. AKDi adds the member to the Extra
Key Owner List (EKOL). The EKOL is reset whenever a local rekey
occurs. A member accumulates KEKs as it visits different areas. If
the entering member has previously visited area j, no KEKj rekey
occurs for j. If the member is entering area j for the first time, a KEKj
rekey occurs for j. To limit the maximum amount of time that KEKi
can be held by a member outside area i, each AKDi maintains a
timer. At t = Ti (a threshold value), the KEKi is updated and the
timer is set to zero. At this point, no group member outside of area i
has a valid KEKi. In DR, when a member leaves the group, a
notification is sent to all the AKDs.
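The delayed rekeying rules can be sketched as a small state model. This is an interpretation under stated assumptions rather than the published algorithm: a KEK is represented only by a version counter, "previously visited" is taken to mean that the member is still on the destination's EKOL, and any local rekey is assumed to reset the timer. The class and function names are hypothetical.

```python
class AreaKeyDistributor:
    """Hypothetical AKD model; a KEK is represented only by a version number."""

    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold   # Ti, in arbitrary time units
        self.timer = 0
        self.kek_version = 0
        self.members = set()
        self.ekol = set()            # Extra Key Owner List

    def rekey(self):
        self.kek_version += 1
        self.ekol.clear()            # the EKOL is reset on every local rekey
        self.timer = 0               # assumption: any rekey restarts the timer

    def tick(self, elapsed=1):
        """Advance the timer; update the KEK when the threshold is reached."""
        self.timer += elapsed
        if self.timer >= self.threshold:
            self.rekey()


def transfer(member, src, dst):
    """Move a member from area src to area dst under delayed rekeying."""
    src.members.discard(member)
    src.ekol.add(member)             # src does not rekey; member keeps its KEK
    if member in dst.ekol:
        dst.ekol.discard(member)     # returning member already holds a valid KEK
    else:
        dst.rekey()                  # first visit since the last rekey of dst
    dst.members.add(member)
```

A member that moves back and forth between two areas within the threshold time causes only one rekey, which is the saving that delayed rekeying aims for over immediate rekeying.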
DVD players can receive updates from newer releases of prerecorded DVDs or other compliant devices.
Set-top boxes (digital cable transmission receivers or digital
satellite broadcast receivers) can receive updates from content
streams or other compliant devices.
Digital TVs can receive updates from content streams or other
compliant devices.
Recording devices can receive updates from content streams (if
they are equipped with a tuner) or from other compliant devices.
CSS [119]
Medium: Optical media
What is protected? Video on DVD-ROM
Brief description: CSS-protected video is decrypted during playback on the compliant DVD player or drive.

CPPM [120]
Medium: Optical media
What is protected? Audio on DVD-ROM
Brief description: CPPM-protected audio is decrypted during playback on the compliant DVD player or drive.

CPRM [121]
Medium: Optical media
What is protected? Video or audio on DVD-R/RW/RAM
Brief description: A/V content is re-encrypted before recording on a DVD recordable disc. During playback, the compliant player derives the decryption key.

4C/Verance Watermark [122]
Medium: Optical media
What is protected? Audio on DVD-ROM
Brief description: Inaudible watermarks are embedded into the audio content. The compliant playback or recording device detects the CCI represented by the watermark and responds accordingly.

To be determined
Medium: Optical media
What is protected? Video on DVD-ROM/R/RW/RAM
Brief description: Invisible watermarks are embedded into the video content. The compliant playback or recording device detects the CCI represented by the watermark and responds accordingly. If a copy is authorized, the compliant recorder creates and embeds a new watermark to represent no-more-copies.

HDCP [123]
Medium: Magnetic media
Brief description: Similar in function to the Content Scrambling System (CSS).

DTCP [124]
Brief description: The source device and the sink device authenticate each other, and establish shared secrets. A/V content is encrypted across the interface. The encryption key is renewed periodically.

HDCP [125]
What is protected? Digital Visual Interface (DVI) and High Definition Multimedia Interface (HDMI)
Brief description: Video transmitter authenticates the receiver and establishes shared secrets with it. A/V content is encrypted across the interface. The encryption key is renewed frequently.
Each solution in Table 1.7 defines a means of associating the CCI with
the digital content it protects. The CCI communicates the conditions
under which a consumer is authorized to make a copy. An important
subset of CCI is the two Copy Generation Management System (CGMS)
bits for digital copy control: 11 (copy-never), 10 (copy-once), 01
(no-more-copies), and 00 (copy-free). The integrity of the CCI should be
ensured to prevent unauthorized modification. The CCI can be associated
with the content in two ways: (1) The CCI is included in a designated field
in the A/V stream and (2) the CCI is embedded as a watermark into the
A/V stream.
A CPRM-compliant recording device refuses to make a copy of content
labeled as copy-never or no-more-copies. It is authorized to create a
copy of copy-once content and label the new copy as no-more-copies. The DTCP carries the CGMS bits in the isochronous packet
header defined by the interface specification. A sink device that receives
content from a DTCP-protected interface is obliged to check the CGMS
bits and respond accordingly. As the DVI is an interface between a
content source and a display device, no CCI transmission is involved.
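The CGMS bit assignments and the recorder behavior described above can be written as a short sketch. The enum and function names are hypothetical. The rule that a copy of copy-free content stays copy-free is an assumption, since the text states only the copy-never, no-more-copies, and copy-once cases.

```python
from enum import Enum


class CGMS(Enum):
    COPY_FREE = 0b00
    NO_MORE_COPIES = 0b01
    COPY_ONCE = 0b10
    COPY_NEVER = 0b11


def record(cci_bits: int):
    """Return the CGMS state to write on the new copy, or None if the
    compliant recorder must refuse to copy."""
    state = CGMS(cci_bits)
    if state in (CGMS.COPY_NEVER, CGMS.NO_MORE_COPIES):
        return None
    if state is CGMS.COPY_ONCE:
        return CGMS.NO_MORE_COPIES   # the copy itself may not be copied again
    return CGMS.COPY_FREE            # assumption: copy-free content stays copy-free
```

For example, record(0b10) yields CGMS.NO_MORE_COPIES, and passing that state's bits back in, record(0b01), yields None. This is how a single generation of copying is enforced.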
In addition to those listed in Table 1.7, private DRM systems may also
be considered to be content protection solutions in home networks.
However, interoperability of devices supporting different DRM systems is
an unresolved issue today.
Media protected
Prerecorded
media
Protection
type
Video on DVD-ROM
Encryption
Audio on DVD-ROM
Encryption
Watermarking
Video or audio on
DVD-R/RW/RAM
Encryption
Watermarking
Encryption
Device
authentication
Association
of digital rights
Licensed
technology
System
renewability
Mutual
between
DVD drive
and PC
Mutual
between
DVD drive
and PC
na
Metadata
CSS [119]
Device
revocation
Metadata
CPPM [120]
Device
revocation
Watermark
na
Mutual
between
DVD drive
and PC
na
na
Metadata
4C/Verance
Watermark
[122]
CPRM [121]
tbd
High Definition
Copy Protection
(HDCP) [123]
na
Device
revocation
Watermark
Metadata
Device
revocation
Table 1.8.
Digital
interface
Cable
transmission
Encryption
Metadata
DTCP [124]
Device
revocation
Metadata
HDCP [125]
Device
revocation
Metadata
Open standards
[129,130,131]
Service
revocation
Encryption
Mutual
between
source
and sink
Mutual
between
source
and sink
Mutual
between
host device
and removable
security
device
None
Encryption
Satellite transmission
(Privately defined by
service providers
and CA vendors.)
Metadata
Smartcard
revocation
Encryption
None
Metadata
Encryption
None
Metadata
Conditional access
system [132,133]
privately defined
by service
providers
Conditional access
system [131]
framework defined
by ATSC
Conditional
access system
[134] privately
defined by
OpenCable
Terrestrial transmission
Encryption
Smartcard
revocation
Smartcard
revocation
Broadcasting
IEEE 1394
Media protected
Protection
type
Internet
Encryption
Receiver
Metadata
DRM [135,136]
Software
update
Encryption
Sender and
receiver
(depends
on the
authentication
type)
Metadata
Group key
management
[137]
tbd
Unicast-based DRM
systems are privately
defined, and hence
are not interoperable.
Multicast-based DRM
systems are yet to
appear in the
market.
An Internet Draft
defines a common
architecture for
MSEC group key
management
protocols that
support a variety
of application,
transport and
internetwork
security protocols.
A few watermarking
schemes have been
proposed for
multicast data.
Device
authentication
Association
of digital rights
Licensed
technology
Watermarking
proposals [38]
System
renewability
Table 1.8.
Two WIPO treaties, the WIPO Copyright Treaty and the WIPO
Performances and Phonograms Treaty, obligate the member states to
prohibit circumvention of technological measures used by copyright
owners to protect their works and to prevent the removal or alteration of
copyright management information.
The international conventions that have been signed for the worldwide
protection of copyrighted works include [1,138]:
Since the end of the 1990s, we have seen important efforts to provide
legal solutions regarding copyright protection and management of digital
rights in the United States.
The most important legislative development in recent years was
the Digital Millennium Copyright Act (DMCA). Signed into law on
October 28, 1998, this Act implements the WIPO Copyright Treaty and
the WIPO Performances and Phonograms Treaty. Section 103 of the
DMCA amends Title 17 of the U.S. Code by adding a new Chapter 12.
Section 1201 makes it illegal to circumvent technological measures that
prevent unauthorized access and copying, and Section 1202 introduces
prohibitions to ensure the integrity of copyright management information. The DMCA has received earnest criticism with regard to the
ambiguity and inconsistency in expressing the anticircumvention provisions [9,139].
BUSINESS MODELS
In addition to technical and legal means, owners of digital copyrighted
content can also make use of new, creative ways to bring their works to
the market. A good understanding of the complexity and cost of
Type of content
Duration of the economic value of content
Fixation method
Distribution channel
Purchase mechanism
Technology available for protection
Extent of related legislation
Single transaction
purchase
Subscription
purchase
Single transaction
license
Serial transaction
license
Site license
Examples
Relevance to
copyright protection
Payment per
electronic use
Combined subscription
and advertising
Advertising only
Free distribution
Free samples
Free goods with
purchases
Information in the
public domain
Electronic subscription
to a single title
Software for a
whole company
Information resource
paid per article
High sensitivity to
unauthorized use
Recent
Free distribution of music
because it enhances the
market for concerts,
t-shirts, posters, etc.
Antivirus software
Services or products
not subject to
replication difficulties
of the digital content.
Products have short
shelf life.
(Continued)
Continued
Traditional
Type
Extreme customization
of the product
Provide a large
product in small
pieces
Give away digital
content to increase
the demand for the
actual product
Give away one piece
of digital content to
create a market for
another
Allow free distribution
of the product but
request payment
Position the product
for low-priced, mass
market distribution
Examples
Personalized CDs
Online databases
Relevance to
copyright protection
No demand from
other people.
Difficulty in copying.
Shareware
Microsoft XP
Cost of buying
converges with
cost of stealing.
SUMMARY
We presented an overview of the complex problem of copyrighted
multimedia content protection in digital distribution networks. After an
introduction to copyright and copyright industries, we examined the
The DVD CCA is a not-for-profit corporation with responsibility for licensing CSS
(Content Scramble System) to manufacturers of DVD hardware, disks, and related products.
ACKNOWLEDGMENTS
The authors would like to acknowledge the permission granted by
Springer-Verlag, Elsevier, the Institute of Electrical and Electronics
Engineers (IEEE), the International Association of Science and Technology for Development (IASTED), and the International Society for Optical
Engineering (SPIE) for partial use of the authors' following research
material published by them:
Eskicioglu, A.M., Protecting intellectual property in digital multimedia networks, IEEE Computer, 36(7), 39–45, 2003.
Lin, E.T., Eskicioglu, A.M., Lagendijk, R.L., and Delp, E.J., Advances
in digital video content protection, Proc. IEEE, 2004.
REFERENCES
1. http://www.wipo.org.
2. http://www.loc.gov/copyright/circs/circ1.html.
3. http://www.loc.gov/copyright/title17/92chap1.html#106.
4. Strong, W.S., The Copyright Book, MIT Press, Cambridge, MA, 1999.
5. http://www.loc.gov/copyright/docs/circ1a.html.
6. http://arl.cni.org/info/frn/copy/timeline.html.
7. Goldstein, P., Copyright's Highway, Hill and Wang, 1994.
8. http://www.iipa.com.
9. National Research Council, The Digital Dilemma: Intellectual Property in the Information Age, National Academy Press, Washington, DC, 2000.
10. Menezes, A.J., van Oorschot, P.C., and Vanstone, S.A., Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, 1997.
11. Schneier, B., Applied Cryptography, John Wiley & Sons, 1996.
12. Advanced Television Systems Committee, available at http://www.atsc.org.
13. Consumer Electronics Association, available at http://www.ce.org.
14. Copy Protection Technical Working Group, available at http://www.cptwg.org.
15. DVD Forum, available at http://www.dvdforum.org.
16. Society of Cable Telecommunications Engineers, available at http://www.scte.org.
17. Motion Picture Association of America, available at http://www.mpaa.org.
18. Recording Industry Association of America, available at http://www.riaa.org.
19. Internet Engineering Task Force, available at http://www.ietf.org/overview.html.
20. Moving Picture Experts Group, available at http://mpeg.telecomitalialab.com.
21. Digital Video Broadcasting Project, available at http://www.dvb.org.
22. de Bruin, R. and Smits, J., Digital Video Broadcasting: Technology, Standards and Regulations, Artech House, 1999.
23. Benoit, H., Digital Television: MPEG-1, MPEG-2 and Principles of the DVB System, Arnold, London, 1997.
24. Guillou, L.C. and Giachetti, J.L., Encipherment and conditional access, SMPTE J., 103(6), 398–406, 1994.
25. Mooij, W., Conditional Access Systems for Digital Television, International Broadcasting Convention, IEE Conference Publication, 397, 1994, pp. 489–491.
26. Macq, B.M. and Quisquater, J.J., Cryptology for digital TV broadcasting, Proc. IEEE, 83(6), 1995.
27. Rossi, G., Conditional access to television broadcast programs: Technical solutions, ABU Tech. Rev., 166, 3–12, September–October 1996.
28. Cutts, D., DVB conditional access, Electron. Commun. Eng. J., 9(1), 21–27, 1997.
29. Mooij, W., Advances in Conditional Access Technology, International Broadcasting Convention, IEE Conference Publication, 447, 1997, pp. 461–464.
30. Eskicioglu, A.M., A Key Transport Protocol for Conditional Access Systems, in Proceedings of the SPIE Conference on Security and Watermarking of Multimedia Contents III, San Jose, CA, USA, 2001, pp. 139–148.
31. International Standard ISO-IEC 13818-1, Information technology - Generic coding of moving pictures and associated audio information: Systems, First Edition, 1996.
32. EIA-679B National Renewable Security Standard, September 1998.
Vulnerabilities of Multimedia Protection Schemes
Mohamed F. Mansour and Ahmed H. Tewfik
INTRODUCTION
The deployment of multimedia protection algorithms in practical systems has moved to the standardization and implementation phase. In the
near future, it will be common to have audio and video players that
employ a watermarking mechanism to check the integrity of the played
media. The public availability of the detectors introduces new challenges for
current multimedia security.
The detection of copyright watermarks is a binary hypothesis test
with the decision boundary determined by the underlying test statistic.
The amount of signal modification that can be tolerated defines a
distortion hyperellipsoid around the representation of the signal in the
appropriate multidimensional space. The decision boundary implemented by the watermark detector necessarily passes through that
hyperellipsoid because the distortion between the original and watermarked signals is either undetectable or acceptable. Once the attacker
0-8493-2773-3/05/$0.00+1.50
2005 by CRC Press
wn2
VarflRg 1=L
XX
n
Ex*nxmw*n wm,
under H0 and H1
Efjxnj2 g wn2 ,
under H0 and H1
2:3
2:4
n,m
k 1, 2, . . . , L
2:5
For the above scheme, the embedding process can be undone if
$\{\Delta_k, d_1(k), d_2(k)\}_{k=1:L}$ are estimated correctly. In the section "Attack on
Quantization-Based Schemes," we provide an attack that unveils these
parameters and removes the watermark with minimum distortion.
Note that, in Reference 6, $d_2(k)$ is expressed in terms of $d_1(k)$ and $\Delta_k$
to maximize the robustness. In this case,
$$d_2(k) = \begin{cases} d_1(k) + \Delta_k/2, & \text{if } d_1(k) < 0 \\ d_1(k) - \Delta_k/2, & \text{if } d_1(k) \ge 0 \end{cases}$$
This further reduces the number of unknowns and, hence, reduces the
estimation complexity.
Quantized Projection Watermarking. This scheme is based on projecting
the signal (or some features derived from it) on an orthogonal set of
vectors. The projection value is then quantized using scalar quantization
and the quantization index is forced to be odd or even to embed 1 or 0,
Figure 2.1.
Figure 2.3.
Figure 2.5. LMS convergence when only the detector decisions are observed.
Note that we do not need to find points exactly on the decision boundary because the confidence measure can be computed for any image, not
only the instances on the decision boundary. This significantly accelerates the algorithm.
The above attack works well when the confidence measure is linear
in the correlation. A nonlinear confidence measure does not fit the
linear model of the LMS and substantially reduces the algorithm speed. In Figure 2.6, we show the algorithm convergence when 10
watermarked images are used. The nonlinear function in the figure is quadratic in the difference l(R) . By comparing Figures 2.5 and 2.6, we note
that the algorithm is accelerated only with the linear confidence measure.
For the nonlinear confidence measure, it is better to ignore the confidence
measure and run the attack with points only on the decision boundary,
as discussed earlier.
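The role of a linear confidence measure can be illustrated with a small sketch. It assumes, purely for illustration, that the detector reports the normalized correlation between the test signal and a binary watermark, and it uses a normalized LMS update; the function names, the step size, and the synthetic Gaussian "images" are hypothetical and are not taken from the attack as originally implemented.

```python
import random

def lms_estimate(images, confidences, step=0.5, epochs=30):
    """Estimate the watermark from (image, confidence) pairs with normalized LMS.

    Assumes the confidence is linear in the correlation: c = (w . x) / n.
    """
    n = len(images[0])
    w_hat = [0.0] * n
    for _ in range(epochs):
        for x, c in zip(images, confidences):
            predicted = sum(wi * xi for wi, xi in zip(w_hat, x)) / n
            err = c - predicted                      # detector output minus our model
            energy = sum(xi * xi for xi in x)
            gain = step * err * n / energy           # normalized LMS step
            w_hat = [wi + gain * xi for wi, xi in zip(w_hat, x)]
    return w_hat

def demo(n=32, count=200, seed=1):
    """Synthetic experiment: random binary watermark, Gaussian test signals."""
    rng = random.Random(seed)
    w = [rng.choice((-1, 1)) for _ in range(n)]
    images = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(count)]
    conf = [sum(wi * xi for wi, xi in zip(w, x)) / n for x in images]
    w_hat = lms_estimate(images, conf)
    return w, [1 if v >= 0 else -1 for v in w_hat]

true_w, estimated_signs = demo()
print(true_w == estimated_signs)
```

Because the confidence values here are exactly linear in the unknown watermark, every observation is usable and none needs to lie on the decision boundary, which is the source of the speed-up described above.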
If the watermark is embedded in the transform coefficients, the
watermark will not be, in general, binary in the signal domain. In this
case, the actual estimated values of the watermark components are used
rather than their signs. However, if the attacker knows which transform
Figure 2.6.
available.
2. Define a search range [0, M(k)] for the quantization step $\Delta_k$. This
   range is determined by the maximum allowable distortion M(k)
   of each component.
   Within this range, search for the best quantization step by
   performing the following steps for each candidate value of $\Delta_k$:
   a.
3. After estimating $\Delta_k$, $d_1(k)$ and $d_2(k)$ are simply the means of the
   positive and negative quantization errors, respectively. Note that
   the signs of $d_1(k)$ and $d_2(k)$ are chosen such that the different
   components of $d_1$ or $d_2$ within each segment are consistent.
The estimation of L follows directly from the analysis of the quantization error. Note that for any quantization step (even if it is incorrect),
the quantization error pattern of similar components at different segments is quite similar. This is basically because the reconstruction points
of these segments are discrete. Recall that the kth component of each
segment is modified according to Equation 2.5. Assume that the quantization function with step $\Delta$ is $G(\cdot)$. The quantization error of the kth
component after using $\Delta$ is
$$e_i(k) = G\big(q(x_i(k) + d_i(k)) - d_i(k)\big) - \big(q(x_i(k) + d_i(k)) - d_i(k)\big) \qquad (2.7)$$
Let $z_k = G\big(q(x_i(k) + d_i(k)) - d_i(k)\big) - q(x_i(k) + d_i(k))$. If $\Delta, d_i(k) \ll x_i(k)$,
then $z_k$ is approximately distributed as a uniform random variable
(i.e., $z_k \sim U[-\Delta/2, \Delta/2]$). Hence, the autocorrelation function (ACF) of the
quantization error at lags equal to a multiple of L will be approximately
X
ACFek nL 1=2
d 1 k d 2 k
2:8
k
k 1, 2, . . . , L
2:9
where the perturbation applied to the kth component is a positive quantity slightly larger than $\Delta_k/4$. Note that adding
or subtracting this perturbation is equally good, as it makes the quantization error of the
incorrect dither vector smaller.
Rather than modifying the whole segment, it is sufficient to modify
some components such that the quantization error of the incorrect dither
vector is less than the quantization error of the correct dither vector.
This can be achieved if a little more than half of the components of each
segment is modified according to Equation 2.9. Moreover, not all of the
segments are modified. The segments are modified progressively until
the detector responds negatively.
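The reason a perturbation slightly larger than a quarter of the quantization step suffices can be checked numerically. The sketch below assumes a scalar dither-modulation embedder of the form "quantize the sample plus the dither, then subtract the dither," with the second dither offset from the first by half a step; the function names, the step of 8, and the dither value 1.5 are illustrative assumptions, not parameters from the original experiments.

```python
import random

def quantize(value, step):
    """Uniform scalar quantizer with the given step."""
    return step * round(value / step)

def paired_dither(d1, step):
    """Second dither, offset from the first by half a quantization step."""
    return d1 + step / 2 if d1 < 0 else d1 - step / 2

def embed(x, bit, step, d1):
    """Embed one bit: quantize x + d, then subtract d, with d chosen by the bit."""
    d = d1 if bit == 0 else paired_dither(d1, step)
    return quantize(x + d, step) - d

def decode(y, step, d1):
    """Return the bit whose dithered lattice lies closest to y."""
    errors = []
    for d in (d1, paired_dither(d1, step)):
        errors.append(abs(quantize(y + d, step) - d - y))
    return 0 if errors[0] <= errors[1] else 1

def attack(y, step):
    """Shift the sample by slightly more than a quarter of the step."""
    return y + 1.05 * step / 4

rng = random.Random(3)
step, d1 = 8.0, 1.5
flipped = 0
for _ in range(100):
    x, bit = rng.uniform(-100, 100), rng.randrange(2)
    y = embed(x, bit, step, d1)
    flipped += decode(attack(y, step), step, d1) != bit
print(flipped)
```

The two lattices are half a step apart, so after a shift of just over a quarter step the sample is closer to the lattice of the wrong bit while the added distortion stays near the minimum needed.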
In Figure 2.10, we give an example of the attack, in which dither
modulation is used to modify the 8 × 8 DCT coefficients of the image. The
quantization step of each coefficient is proportional to the corresponding
JPEG quantization step. The PSNR of the watermarked image is 40.7 dB.
The attack as described above is applied with each component perturbed by an amount whose amplitude is slightly larger than $\Delta_k/4$.
If we assume that the detector fails when half of the extracted bits
are incorrect, then only half of the blocks need to be modified. The
2:10
2:11
Now, the objective of the attack is to find $q'$ and remove the watermark
with the minimum distortion. The problem is that $q'$ is data dependent, as
noted from Equation 2.11. It has to be estimated within each block,
and this is difficult in general. However, if $q'$ is averaged over all blocks,
the average quantization vector $q'_{av}$ will be a good approximation, as will
be discussed. Note that all the entries of $q$ are positive. However, the
entries of $q'$, in general, may have negative signs. Estimating the signs
of the quantization steps in the signal domain is necessary to be able to
change the quantization values correctly in the transform coefficients.
The estimation of the block length is the same as described in
the previous subsection because of the similarity in the distribution of
the quantization noise at the corresponding samples of each block. This
is illustrated in Figure 2.11, where we plot the ACF of the quantization
error in the signal domain while the data are embedded by quantizing
1. Select one coefficient in the signal domain and assume that its sign
   is positive.
2. Change the value of this coefficient by a sufficient amount such
   that the detector changes its decision.
3. Return the sample to its original value and then perturb it by $\Delta/2$.
4. Perturb each other entry by $\Delta/2$ (one at a time). If the perturbation
   results in changing the detector decision to its original value, then
   the sign of the corresponding entry is negative; otherwise, it is
   positive.
Figure 2.13.
the original slot. This results in decreasing the error rate, as noted in
Figure 2.14.
Note that if multiple codebooks are used for data embedding (as described in Reference 6), with the entries of the codebooks replacing the
original data segments, then the estimation is much simpler. In this case,
each codebook can be constructed by just observing the segments of
watermarked data and the detector decisions. Once all codebooks are
estimated, the embedded data can be simply modified by moving each
segment to another codebook.
Comments on Attacks on Quantization-Based Schemes. The proposed attack
on quantization-based schemes is simpler than the LMS attack for linear
detectors because it does not iterate. We assumed that the data are
embedded regularly in blocks of fixed size L. This is the typical scheme
proposed in the original works that employ quantization for embedding
(e.g., References 6 and 9). However, if the block boundaries are varying,
then the attacker may use multiple watermarked items and perform the
same statistical analysis on the different items. In this case, a unique
quantization step is assumed for each sample and it is evaluated from the
different watermarked items rather than blocks.
Discussion
The problem discussed in this subsection and the previous one
motivated the search for decision boundaries that cannot be parameterized. For such boundaries, even if the attacker can change individual
watermarked signals to remove the watermark, the modification will be
random and the minimum distortion modification cannot be attained as
earlier. The only choice for the attacker is to try to approximate the
2:12
where f() is a fractal function. The basic steps of the proposed algorithms are:
1.
2.
3.
4.
Figure 2.16.
$$T = [\,T_1 \;\; T_2\,] \qquad (2.13)$$
Under H0, $E(T) = (0, 0)$, and under H1, $E(T) = (1, 1)$ if the watermark
strength is assumed to be unity. In both cases, $\mathrm{cov}(T) = (2\sigma^2/L)I$, where $\sigma^2$
is the average of the variances of the signal coefficients. The Gaussian
assumption of both $T_1$ and $T_2$ is reasonable if L is large by invoking
the central limit theorem. Moreover, if the original samples are mutually
independent, then $T_1$ and $T_2$ are also independent. The decision boundary in this case is a line (with slope $-1$ for the given means). If this line
is fractalized, as discussed in the previous subsection, then the
corresponding decision boundary in the multidimensional space will be
also nonparametric.
The detection process is straightforward in principle but nontrivial.
The vector T is classified to either hypothesis if it falls in its partition.
However, due to the fractal nature of the boundary, this classification
is not trivial. First, an unambiguous region is defined as shown in
Figure 2.17 (i.e., outside the maximum oscillation of the fractal curve,
which are the regions outside of the dotted lines). For points in the
ambiguous area, we extend two lines between the point and means of
both hypotheses. If one of the lines does not intersect with the boundary
curve, then it is classified to the corresponding hypothesis. It should be
emphasized that the boundary curve is stored at the detector and it
should be kept secret.
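The classification rule can be sketched in a few lines. In the sketch below a seeded random zigzag around the line through (0.5, 0.5) with slope -1 stands in for the secret fractal curve, and a point is assigned to a hypothesis when the segment joining it to that hypothesis mean does not cross the stored curve. The crossing-parity fallback for points whose segments both cross the curve, as well as all names and parameter values, are assumptions of this sketch rather than part of the described detector.

```python
import math
import random

MEAN_H0, MEAN_H1 = (0.0, 0.0), (1.0, 1.0)

def make_boundary(amplitude=0.1, spacing=0.05, half_length=10.0, seed=7):
    """Secret zigzag around the line T1 + T2 = 1 (stand-in for a fractal curve)."""
    rng = random.Random(seed)
    s = 1 / math.sqrt(2)
    points = []
    for i in range(int(2 * half_length / spacing) + 1):
        u = -half_length + i * spacing            # position along the line
        v = rng.uniform(-amplitude, amplitude)    # oscillation across the line
        points.append((0.5 + s * (u + v), 0.5 + s * (-u + v)))
    return points

def _orient(a, b, c):
    """Signed area test: positive if c lies to the left of the ray a -> b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _segments_cross(p, q, a, b):
    """True if segments pq and ab properly intersect."""
    return (_orient(a, b, p) * _orient(a, b, q) < 0
            and _orient(p, q, a) * _orient(p, q, b) < 0)

def crossings(point, target, boundary):
    """Number of boundary segments crossed on the way from point to target."""
    return sum(_segments_cross(point, target, a, b)
               for a, b in zip(boundary, boundary[1:]))

def classify(point, boundary):
    to_h0 = crossings(point, MEAN_H0, boundary)
    to_h1 = crossings(point, MEAN_H1, boundary)
    if to_h0 == 0:
        return "H0"
    if to_h1 == 0:
        return "H1"
    # Assumed fallback: an even crossing count means the same side as the mean.
    return "H0" if to_h0 % 2 == 0 else "H1"

boundary = make_boundary()
print(classify((0.1, 0.02), boundary), classify((0.9, 1.0), boundary))
```

Without the stored curve, an attacker cannot reproduce the crossing test for points in the ambiguous band, which is why the curve has to remain secret.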
Algorithm Performance
The technique proposed in the previous subsection is quite general. It can
be applied to any watermarking scheme without changing the embedding algorithm. The algorithm performance is, in general, similar to
Figure 2.18.
the performance of the underlying watermarking algorithm with its optimal detector. However, for some watermarked items, we may need to
increase the watermark strength, as illustrated in Figure 2.16.
Note that, in practice, the variance of the two statistics is small, so that
the performances under the old and the new decision boundaries are
essentially the same. For example, for an image of size 512 × 512 and 8
bits/pixel, if we assume a uniform distribution of the pixel values, then
the variance of both statistics will be approximately 0.042 according to Equation
2.3. Hence, the joint pdf of the two statistics is very concentrated around
the means of both hypotheses. Consequently, the degradation in the
performance after introducing the new boundary is slight.
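The quoted figure can be reproduced with a short calculation. It assumes that each of the two statistics has variance $2\sigma^2/L$, with $\sigma^2$ the pixel variance and L the number of pixels, and that 8-bit pixels are uniformly distributed over the 256 gray levels; the variable names are illustrative.

```python
# Worked check of the 0.042 figure under the stated assumptions.
levels = 256                      # 8 bits/pixel
side = 512                        # image is side x side
L = side * side                   # total number of samples

# Variance of a discrete uniform variable on 0..levels-1.
sigma2 = (levels ** 2 - 1) / 12

# Assumed form of the variance of each statistic: 2 * sigma^2 / L.
var_stat = 2 * sigma2 / L
print(round(var_stat, 3))         # 0.042
```

The corresponding standard deviation is about 0.2, small compared with the unit distance between the means under the two hypotheses along each axis.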
The receiver operating characteristic (ROC) of the proposed algorithm
is close to the ROC of the optimal detector, and it depends on the
maximum oscillation of the fractal boundary around the original one.
In Figure 2.18, we illustrate the performance of the detector discussed in
the previous subsection, when the mean is (0, 0) under H0 and is (1, 1)
under H1 , and the variance for both hypotheses is 0.1. As noticed from
Figure 2.18, the performance of the system is close to the optimal
performance, especially for a small curve oscillation. The degradation
for a large perturbation is minor. This minor degradation can be
reduced even further by slightly increasing the strength of the embedded watermark.
Attacker Choices
In the discussion in the section "The Generic Attack," the estimation of
the watermark is equivalent to the estimation of the decision boundary
for a correlator detector. After introducing the new algorithm, the two
Figure 2.19.
images.
Figure 2.20.
the MSE of the fractal detector is almost constant as expected from the
learning curve in Figure 2.19.
DISCUSSION
In this work, we analyzed the security of watermarking detection when
the detector is publicly available. We described a generalized attack
for estimating the decision boundary and gave different implementations
of it. Next, we proposed a new class of watermark detectors based on a
nonparametric decision boundary. This detector is more secure than
traditional detectors, especially when it is publicly available. The performance of the new algorithm is similar to that of the traditional ones because of
the small variance of the test statistics.
The proposed detector can work with any watermarking scheme
without changing the embedder structure. However, with some watermarking procedures, the strength of the watermark may need to be
increased slightly to compensate for the new boundary.
REFERENCES
1. Furon, T. and Duhamel, P., Robustness of Asymmetric Watermarking Technique,
in Proceedings of IEEE International Conference on Image Processing, 2000, Vol. 3,
pp. 21–24.
2. Furon, T., Venturini, I., and Duhamel, P., An Unified Approach of Asymmetric
Watermarking Schemes, in Proceedings of SPIE Conference on Security and
Watermarking Multimedia Contents, 2001, pp. 269–279.
Part II
Multimedia Encryption
Fundamentals of Multimedia Encryption Techniques
Borko Furht, Daniel Socek, and Ahmet M. Eskicioglu
INTRODUCTION
Recent advances in technology, especially in the computer
industry and communications, have created a potentially enormous market
for distributing digital multimedia content through the Internet. However,
the proliferation of digital documents, multimedia processing tools,
and the worldwide availability of Internet access have created an ideal
medium for copyright fraud and uncontrolled distribution of multimedia content [1]. A major challenge now is the protection of the intellectual property of multimedia content in multimedia networks.
To deal with the technical challenges, two major multimedia security
technologies are being developed:
1.
Multimedia watermarking technology as a tool to achieve copyright protection, ownership trace, and authentication
            Key length    Block size    Number of
            (Nk words)    (Nb words)    rounds (Nr)
AES-128         4             4            10
AES-192         6             4            12
AES-256         8             4            14
Figure 3.1.
TABLE 3.2. AES encryption S-box: substitution values for the byte xy (in
hexadecimal representation)
                              y
 x    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
 0   63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
 1   ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
 2   b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
 3   04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
 4   09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
 5   53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
 6   d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
 7   51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
 8   cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
 9   60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
 a   e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
 b   e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
 c   ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
 d   70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
 e   e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
 f   8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16
The SubBytes() transformation is essentially an S-box type of transform, which is a nonlinear byte substitution that operates independently
on each byte of the State. The simplest representation of this S-box
function is by the lookup table. The lookup table associated with
SubBytes() is shown in Table 3.2.
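Although the lookup table is the simplest representation, its entries can also be regenerated, which gives a convenient check on any transcription of the table. The sketch below follows the construction in the AES specification (multiplicative inverse in GF(2^8) followed by an affine transformation); that construction is not described in this chapter, and the helper names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B              # reduce by the AES field polynomial
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse computed as a^254; 0 maps to 0 by convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox(byte):
    """S-box value: field inverse followed by the affine transformation."""
    x = gf_inv(byte)
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63

def sub_bytes(state):
    """Apply the substitution independently to every byte of the State."""
    return [[sbox(b) for b in row] for row in state]

# For the byte xy, x selects the table row and y the column.
print(hex(sbox(0x00)), hex(sbox(0x10)), hex(sbox(0x53)))
```

In practice the 256 values are precomputed once and stored, which is exactly the lookup table shown in Table 3.2.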
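$s'[0, c] = (\{02\} \bullet s[0, c]) \oplus (\{03\} \bullet s[1, c]) \oplus s[2, c] \oplus s[3, c]$
$s'[1, c] = s[0, c] \oplus (\{02\} \bullet s[1, c]) \oplus (\{03\} \bullet s[2, c]) \oplus s[3, c]$
$s'[2, c] = s[0, c] \oplus s[1, c] \oplus (\{02\} \bullet s[2, c]) \oplus (\{03\} \bullet s[3, c])$
$s'[3, c] = (\{03\} \bullet s[0, c]) \oplus s[1, c] \oplus s[2, c] \oplus (\{02\} \bullet s[3, c])$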
Here, the $\bullet$ operation denotes multiplication in GF($2^8$), with the column polynomials multiplied modulo the polynomial $x^4 + 1$, whereas $\oplus$ denotes the usual bitwise XOR operation.
Finally, in the AddRoundKey() transformation, a Round Key is added
to the State by a simple bitwise XOR operation. Each Round Key consists
of Nb words from the key schedule (to be described in the AES Key Schedule section), each of which is added into the columns of the State.
AES Key Schedule. In the AES algorithm, the initial input symmetric
key is expanded to create a key schedule for each round. This procedure,
given in Figure 3.2, generates a total of Nb(Nr + 1) words that are used in
Figure 3.2. Pseudocode of the key schedule stage of the AES algorithm.
The InvShiftRows() is the inverse of the ShiftRows() transformation, which is defined as follows: $s[r, c] = s'[r, (c + \mathrm{shift}(r, Nb)) \bmod Nb]$
for $0 \le r < 4$ and $0 \le c < Nb$. Similarly, InvSubBytes() is the inverse of
the byte substitution transformation, in which the inverse S-box is
applied to each byte of the State. The inverse S-box used in the
InvSubBytes() transformation is presented in Table 3.3.
The InvMixColumns() is the inverse of the MixColumns() transformation. The transformation InvMixColumns() operates on the State
column by column, treating each column as a polynomial over the field
GF($2^8$) and multiplying it modulo $x^4 + 1$ with a fixed polynomial $a^{-1}(x)$, given by $a^{-1}(x) =
\{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$. This can be accomplished by applying the
following four equations to the State columns:
1. $s'[0, c] = (\{0e\} \bullet s[0, c]) \oplus (\{0b\} \bullet s[1, c]) \oplus (\{0d\} \bullet s[2, c]) \oplus (\{09\} \bullet s[3, c])$
Figure 3.3.
                              y
 x    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
 0   52 09 6a d5 30 36 a5 38 bf 40 a3 9e 81 f3 d7 fb
 1   7c e3 39 82 9b 2f ff 87 34 8e 43 44 c4 de e9 cb
 2   54 7b 94 32 a6 c2 23 3d ee 4c 95 0b 42 fa c3 4e
 3   08 2e a1 66 28 d9 24 b2 76 5b a2 49 6d 8b d1 25
 4   72 f8 f6 64 86 68 98 16 d4 a4 5c cc 5d 65 b6 92
 5   6c 70 48 50 fd ed b9 da 5e 15 46 57 a7 8d 9d 84
 6   90 d8 ab 00 8c bc d3 0a f7 e4 58 05 b8 b3 45 06
 7   d0 2c 1e 8f ca 3f 0f 02 c1 af bd 03 01 13 8a 6b
 8   3a 91 11 41 4f 67 dc ea 97 f2 cf ce f0 b4 e6 73
 9   96 ac 74 22 e7 ad 35 85 e2 f9 37 e8 1c 75 df 6e
 a   47 f1 1a 71 1d 29 c5 89 6f b7 62 0e aa 18 be 1b
 b   fc 56 3e 4b c6 d2 79 20 9a db c0 fe 78 cd 5a f4
 c   1f dd a8 33 88 07 c7 31 b1 12 10 59 27 80 ec 5f
 d   60 51 7f a9 19 b5 4a 0d 2d e5 7a 9f 93 c9 9c ef
 e   a0 e0 3b 4d ae 2a f5 b0 c8 eb bb 3c 83 53 99 61
 f   17 2b 04 7e ba 77 d6 26 e1 69 14 63 55 21 0c 7d
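2. $s'[1, c] = (\{09\} \bullet s[0, c]) \oplus (\{0e\} \bullet s[1, c]) \oplus (\{0b\} \bullet s[2, c]) \oplus (\{0d\} \bullet s[3, c])$
3. $s'[2, c] = (\{0d\} \bullet s[0, c]) \oplus (\{09\} \bullet s[1, c]) \oplus (\{0e\} \bullet s[2, c]) \oplus (\{0b\} \bullet s[3, c])$
4. $s'[3, c] = (\{0b\} \bullet s[0, c]) \oplus (\{0d\} \bullet s[1, c]) \oplus (\{09\} \bullet s[2, c]) \oplus (\{0e\} \bullet s[3, c])$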
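The four InvMixColumns() equations amount to multiplying each State column by a fixed 4 × 4 matrix over GF(2^8). The following sketch applies that matrix to a single column and checks that it undoes the forward MixColumns() matrix with coefficients {02}, {03}, {01}, {01} from the AES specification; the helper names and the sample column are illustrative choices.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result

# Row i of each matrix holds the coefficients multiplying s[0..3, c] in s'[i, c].
MIX = [[0x02, 0x03, 0x01, 0x01],
       [0x01, 0x02, 0x03, 0x01],
       [0x01, 0x01, 0x02, 0x03],
       [0x03, 0x01, 0x01, 0x02]]
INV_MIX = [[0x0e, 0x0b, 0x0d, 0x09],
           [0x09, 0x0e, 0x0b, 0x0d],
           [0x0d, 0x09, 0x0e, 0x0b],
           [0x0b, 0x0d, 0x09, 0x0e]]

def transform_column(matrix, column):
    """Combine the four bytes of one State column using GF(2^8) products and XOR."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out

column = [0xdb, 0x13, 0x53, 0x45]
mixed = transform_column(MIX, column)
restored = transform_column(INV_MIX, mixed)
print([hex(b) for b in mixed], restored == column)
```

Applying the same function to every column c of the State gives the full transformation, since the columns are processed independently.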
Type of
data
Image
Frequency
domain
Spatial
domain
Video
Frequency
domain
Proposal
Encryption algorithm
What is encrypted?
No algorithm is
specified
DES, triple DES, and
IDEA
AES
No algorithm
specified
Xor
Quadtree structure
AES
DES, RSA
DES
Permutation, DES
xor, permutation,
IDEA
xor
TABLE 3.4.
Speech
Compressed
domain
IDEA
DES
Permutation, RC4
DES, AES
Wu and Mao,
2002 [27]
No algorithm
specified
Multiple Huffman tables,
multiple state indices in
the QM coder
DES, AES
No algorithm
specified
Permutation, xor
(Continued)
Spatial
domain
Entropy
codec
Type of
data
Audio
Continued
Domain
Compressed
domain
Proposal
Encryption algorithm
Servetti and
De Martin, 2002 [30]
Not specified
Not specified
Thorwirth, Horvatic,
Weis, and Zhao,
2000 [32]
Not specified
What is encrypted?
Source: Adapted from Liu, X. and Eskicioglu, A.M., IASTED International Conference on Communications, Internet, and Information Technology
(CIIT 2003), 2003. With permission.
TABLE 3.4.
The authors' choice was to encrypt the even sequence, creating the ciphertext
$c_1 c_2 \ldots c_n \, E_{KeyE}(a_2 a_4 \ldots a_{2n})$.
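The structure of this ciphertext can be sketched as follows, under the assumption (as the scheme is commonly described) that the bytes $c_1 c_2 \ldots c_n$ are the XOR of the odd-indexed and even-indexed byte sequences. Only the even sequence passes through the cipher E. The cipher used here is a hash-based keystream standing in for the real block cipher purely so that the example is self-contained; it, and all function names, are hypothetical.

```python
import hashlib

def toy_encrypt(key, data):
    """Stand-in for the cipher E: XOR with a SHA-256 derived keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))

toy_decrypt = toy_encrypt  # an XOR keystream is its own inverse

def vea_encrypt(key, chunk):
    """Return c1..cn followed by E(a2 a4 .. a2n) for an even-length chunk."""
    if len(chunk) % 2:
        raise ValueError("chunk length must be even")
    odd, even = chunk[0::2], chunk[1::2]          # a1 a3 ..., a2 a4 ...
    mixed = bytes(o ^ e for o, e in zip(odd, even))
    return mixed + toy_encrypt(key, even)

def vea_decrypt(key, cipher):
    """Recover the even bytes first, then undo the XOR to get the odd bytes."""
    n = len(cipher) // 2
    mixed, enc_even = cipher[:n], cipher[n:]
    even = toy_decrypt(key, enc_even)
    odd = bytes(m ^ e for m, e in zip(mixed, even))
    out = bytearray(2 * n)
    out[0::2] = odd
    out[1::2] = even
    return bytes(out)

plain = b"MPEG slice payload.."
cipher = vea_encrypt(b"KeyE", plain)
print(len(cipher) == len(plain), vea_decrypt(b"KeyE", cipher) == plain)
```

Only half of the bytes are processed by the expensive cipher, which is where the speed advantage of the selective approach comes from.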
Figure 3.5. Selective video encryption algorithm by Qiao and Nahrstedt. (From
Qiao, L. and Nahrstedt, K., Proceedings of the 1st International Conference on
Imaging Science, Systems and Technology (CISST 97), 1997. With permission.)
Figure 3.7. The bit selection order in RVEA. (From Bhargava, B., Shi, C., and
Wang, Y., http://raidlab.cs.purdue.edu/papers/mm.ps.)
Generate a random key $K = \{(s_0, s_1, s_2, s_3), (p_0, p_1, \ldots, p_{n-1}), (o_0, o_1,
\ldots, o_{n-1})\}$, where each $s_i$ is a 4-bit integer, whereas each $p_i$ and $o_i$ is
a 2-bit integer.
Figure 3.8. The Huffman tree mutation process. (From Wu, C.-P. and Kuo,
C.-C.J., in SPIE International Symposia on Information Technologies 2000,
pp. 285–295; Wu, C.-P. and Kuo, C.-C.J., in Proceedings of SPIE Security and
Watermarking of Media Content III, 2001, vol. 4314.)
Initialize four state indices $(I_0, I_1, I_2, I_3)$ to $(s_0, s_1, s_2, s_3)$.
To encode the ith bit from the input, we use the index $I_{p_{i \bmod n}}$
to determine the probability estimation value Qe.
If the state update is required after encoding the ith bit from the
input, all state indices are updated except for $I_{o_{i \bmod n}}$.
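The key structure and the way it steers the state indices can be sketched without implementing the QM coder itself. The code below only generates a key of the stated shape and lists, for each input bit, which state index supplies Qe and which indices would be updated; the probability-estimation table and the actual state transitions are omitted, and the function names are hypothetical.

```python
import random

def generate_key(n, rng):
    """Random key K = {(s0..s3), (p0..p_{n-1}), (o0..o_{n-1})}."""
    s = [rng.randrange(16) for _ in range(4)]   # four 4-bit initial state indices
    p = [rng.randrange(4) for _ in range(n)]    # 2-bit selectors: index used for Qe
    o = [rng.randrange(4) for _ in range(n)]    # 2-bit selectors: index left untouched
    return s, p, o

def schedule(key, num_bits):
    """For each input bit i, yield (index used for Qe, indices to update)."""
    _s, p, o = key
    n = len(p)
    for i in range(num_bits):
        used = p[i % n]
        updated = [j for j in range(4) if j != o[i % n]]
        yield used, updated

key = generate_key(n=8, rng=random.Random(2024))
for i, (used, updated) in enumerate(schedule(key, 4)):
    print(i, "Qe from I%d" % used, "update", updated)
```

A decoder that does not know the p and o sequences selects the wrong probability estimates and quickly loses synchronization with the encoder.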
mendation G.723.1 compression standard. It has a very low bit rate and
it is extremely suitable for voice communications over packet-switching-based networks. It is also a part of the ITU H.324 standard for
videoconferencing/telephony over the regular public telephone lines.
The compression is based on the analysis-by-synthesis method. The
encoder incorporates the entire decoder unit, which synthesizes a segment of speech according to given input coefficients. The encoder then
changes these coefficients until the difference between the original
speech and the synthesized speech is within the acceptable range. The
decoding is performed with three decoders: the LSP decoder, the pitch
decoder, and the excitation decoder; the G.723.1 coefficients can be
categorized depending on the decoder to which they are fed. In addition,
this codec can work in either 6.3-Kbps or 5.3-Kbps modes.
In Reference 28, Wu and Kuo suggest applying selective encryption to
the most significant bits of all important G.723.1 coefficients. They identified the following coefficients as the important ones: the LSP codebook
indices, the lag of pitch predictor, the pitch gain vectors, the fixed codebook gains, and the VAD mode flag. The total number of selected bits for
encryption is 37 in each frame, which is less than one-fifth of the entire
speech stream at the 6.3-Kbps rate and less than one-fourth of the entire
speech stream at the 5.3-Kbps rate.
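The quoted fractions can be verified with a short calculation. It assumes the 30-ms frame length of G.723.1, which is not stated above, and ignores any padding of frames to whole bytes; the names are illustrative.

```python
FRAME_MS = 30          # assumed G.723.1 frame duration in milliseconds
SELECTED_BITS = 37     # bits selected for encryption in each frame

def encrypted_fraction(rate_bps):
    """Fraction of each frame that is encrypted at the given bit rate."""
    bits_per_frame = rate_bps * FRAME_MS // 1000
    return SELECTED_BITS / bits_per_frame

for rate in (6300, 5300):
    print(rate, rate * FRAME_MS // 1000, round(encrypted_fraction(rate), 3))
```

At 6.3 Kbps a frame carries 189 bits, so 37 bits is just under one-fifth; at 5.3 Kbps a frame carries 159 bits, so 37 bits is just under one-fourth.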
Perception-Based Partial Encryption Algorithm by Servetti and De Martin,
2002. In 2002, Servetti and De Martin published a perception-based
Figure 3.10. Partial encryption for G.729 speech codec (grayed bits are
selected for encryption): (1) low-security algorithm and (2) high-security
algorithm. (From Servetti, A. and De Martin, J.C., IEEE Trans. Speech Audio
Process., 10(8), 637–643, 2002. With permission.)
REFERENCES
1. IEEE Transactions on Circuits and Systems for Video Technology (Special issue on
authentication, copyright protection, and information hiding), 13(8), 2003.
2. Eskicioglu, A.M., Protecting intellectual property in digital multimedia networks,
IEEE Computer, 36(7), 39–45, 2003.
Chaos-Based Encryption for Digital Images and Videos
Shujun Li, Guanrong Chen, and Xuan Zheng
INTRODUCTION
Many digital services, such as pay TV, confidential videoconferencing,
and medical and military imaging systems, require reliable security in the storage
and transmission of digital images and videos. Because of the rapid
progress of the Internet in the digital world today, the security of digital
images and videos has become more and more important. In recent
years, more and more consumer electronic services and devices, such as
mobile phones and personal digital assistants (PDAs), have also started to
provide additional functions for saving and exchanging multimedia
messages. The prevalence of multimedia technology in our society has
promoted digital images and videos to play a more significant role than
traditional text, which demands serious protection of users'
privacy. To fulfill such security and privacy needs in various applications,
encryption of images and videos is very important to frustrate malicious
attacks from unauthorized parties.
0-8493-2773-3/05/$0.00+1.50
2005 by CRC Press
cipher. For public key ciphers, the encryption key Ke is published and the
decryption key Kd is kept private, for which no additional secret channel
is needed for key transfer.
According to the encryption structure, ciphers can be divided into
two classes: block ciphers and stream ciphers. Block ciphers encrypt
the plaintext block by block and each block is mapped into another
block with the same size. Stream ciphers encrypt the plaintext with
a pseudorandom sequence (called keystream) controlled by the encryption key.
A cryptographically secure cipher should be strong enough against
all kinds of attack. For most ciphers, the following four attacks should
be tested: (1) ciphertext-only attack: attackers can get the ciphertexts
only; (2) known-plaintext attack: attackers can get some plaintexts
and the corresponding ciphertexts; (3) chosen-plaintext attack: attackers can choose some plaintexts and get the corresponding ciphertexts; (4) chosen-ciphertext attack: attackers can choose some
ciphertexts and get the corresponding plaintexts. It is known that
many image and video encryption schemes are not secure enough
against known- or chosen-plaintext attacks, as shown in the following
subsections.
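To make the known-plaintext attack concrete, the following minimal sketch uses a deliberately weak, hypothetical cipher (not one of the schemes surveyed in this chapter) that masks every image with the same key stream. Under that assumption, a single known plaintext and ciphertext pair reveals the key stream, which then decrypts any other image of the same size. The function names and the SHA-256-based key stream are illustrative choices only.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy key stream: SHA-256 in counter mode (illustrative, not a vetted cipher)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def weak_encrypt(key: bytes, pixels: bytes) -> bytes:
    """Weak design: the same key stream is reused for every image."""
    return xor_bytes(pixels, keystream(key, len(pixels)))

secret_key = b"unknown to the attacker"

# The attacker knows one plain image and its cipher image ...
known_plain = bytes(range(64))
known_cipher = weak_encrypt(secret_key, known_plain)

# ... and intercepts the cipher image of a second, unknown image.
target_plain = bytes((7 * i) % 256 for i in range(64))
target_cipher = weak_encrypt(secret_key, target_plain)

# Known-plaintext attack: recover the key stream, then decrypt the target.
recovered_stream = xor_bytes(known_plain, known_cipher)
recovered_plain = xor_bytes(target_cipher, recovered_stream)
```

The attacker never learns `secret_key`; recovering the reused key stream is enough, which is why a cipher has to be tested against more than ciphertext-only attacks.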
Figure 4.2. An uncompressed plain image (a) containing many areas with fixed
gray levels and its corresponding cipher image (b) encrypted by the 128-bit
Advanced Encryption Standard (AES) running in the ECB mode.
position (i, j), all cipher-pixels except for the ones at (i, j) will be
identical in the ECB mode, and all cipher-pixels before (i, j) will be
identical in the CBC mode. To maintain the avalanche property,
special algorithms should be developed.
7. New concepts of security and usability: The diverse multimedia
services need different security levels and usability requirements,
some of which should be defined and evaluated with human vision
capabilities. A typical example is the so-called perceptual encryption, with which only part of the visible information is encrypted and
the cipher-image or cipher-video gives a rough view of the high-quality services.
Generally speaking, there exist tight relationships among the above-discussed features: (1) transparency and the idea of selective encryption
(see the section "Selective Encryption" for more details) are requirements of
scalability and perceptibility and (2) multilevel security is achieved by
providing perceptibility in some scalable encryption schemes.
IMAGE AND VIDEO ENCRYPTION: A COMPREHENSIVE SURVEY
Generally speaking, there are two basic ways to encrypt digital images: in
the spatial domain or in the transform domain. Because digital videos are
generally compressed in the DCT (discrete cosine transform) domain,
almost all video encryption algorithms work in the DCT domain. Due to
the recent prevalence of the wavelet compression technique and the
adoption of the wavelet transform in the JPEG2000 standard [26], image
and video encryption algorithms working in the wavelet
domain have also attracted some attention in recent years [35,39–43]. In addition, some novel
image and video compression algorithms have been proposed to
realize joint compression–encryption schemes.
Although many efforts have been devoted to better solutions for image
and video encryption, the current security analyses of many schemes
are not sufficient, especially for the security against known-plaintext
and chosen-plaintext attacks. What is worse, many selective encryption
schemes are, indeed, insecure against ciphertext-only attack, due to the
visible information leaking from unencrypted data.
Note that almost all selective MPEG encryption schemes have the
defects mentioned in the section "Selective Encryption," and the encryption
of the headers will cause the loss of format compliance.
The above two operations are similar to those used in the original
Scharinger–Fridrich schemes. This generalized encryption scheme has
two special features: (1) the decryption key is different from the encryption key and (2) the decryption key depends on both the encryption key
and the plain-image. In fact, the decryption key is a permutation of the
addresses contained in the encryption key. A defect of this scheme is
that its key is too long.
Image Encryption Schemes Based on Fractal-like Curves
The permutations defined by discretized 2-D chaotic maps can also be
generated from a large group of fractal-like curves, such as the Peano–Hilbert
curves [108]. Due to the tight relationship between chaos and
Note that the generated sequence {b_i} is generally not balanced; that
is, the number of 0s is different from that of 1s, because the invariant
density function of the Logistic map is not uniform [148]. The
Logistic map is therefore not a good choice for encryption, and it is better to use
other 1-D chaotic maps with uniform invariant density. However, in the
following, one can see that Yen et al.'s encryption schemes are not secure
even when {b_i} satisfies the balance property.
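The balance property can be checked empirically. The sketch below iterates the Logistic map and counts 0s and 1s in a derived bit sequence. How the bits b_i are extracted is not recoverable from this excerpt, so taking one bit of the binary expansion of each chaotic state is an assumption made here for illustration, as are the parameter values and function names.

```python
def logistic_bits(x0: float, mu: float, n: int, bit_index: int = 1) -> list:
    """Iterate x -> mu * x * (1 - x) and keep one fractional bit of each state.

    bit_index=1 selects the most significant fractional bit of x in (0, 1).
    This extraction rule is an illustrative assumption, not Yen et al.'s rule.
    """
    bits = []
    x = x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        bits.append(int(x * (1 << bit_index)) & 1)
    return bits

def balance(bits: list) -> tuple:
    """Return (number of 0s, number of 1s)."""
    ones = sum(bits)
    return len(bits) - ones, ones

bits = logistic_bits(x0=0.3141, mu=3.9, n=10000)
zeros, ones = balance(bits)
print("zeros:", zeros, "ones:", ones)
```

Comparing the two counts for different values of `mu` and `bit_index` gives a quick feel for how far a given bit source is from the balance property discussed above.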
pixels. Apparently, it cannot resist known-plaintext and chosen-plaintext attacks, but it is generally difficult to derive the secret key
by breaking the permutation matrix.
In References 103 and 104, the Logistic map is used as a chaotic
stream cipher (more than a chaotic PRNG) to mask the SPIHT-encoded bit stream of a wavelet-compressed image. To resist
known-plaintext and chosen-plaintext attacks, three different
masking operations are used and the selected operation at each
position is dependent on previous cipher-bits.
video: most sign bits of DC coefficients, the AC coefficients of I-macroblocks, sign bits of AC coefficients of P-macroblocks, and sign
bits of motion vectors. The proposed scheme is a stream cipher
based on three chaotic maps: the skew tent map, the skew
sawtooth map, and the discretized Logistic map. The outputs of
the first two chaotic maps are added and the sum is then scaled
to an integer between 0 and 255. Each scaled integer is used as
the initial condition of the third map to generate a 64-size key
stream to mask the plaintext with the XOR operation. To further
enhance the security against known-plaintext and chosen-plaintext
attacks, it was suggested to change the key every 30 frames.
CVES (Chaotic Video Encryption Scheme): In Reference 159, a chaos-based encryption scheme called CVES using multiple chaotic
systems was proposed. CVES is actually a fast chaotic cipher that
encrypts bulky plaintext frame by frame. The basic idea is to
combine a simple chaotic stream cipher and a simple chaotic block
cipher to construct a fast and secure product cipher. In Chapter 9
of Reference 91, it was pointed out that the original CVES is not
sufficiently secure against the chosen-plaintext attack, and an
enhanced version of CVES was proposed by adding ciphertext
feedback. Figure 4.4 gives a diagrammatic view of the enhanced
CVES, where CCS and ECS(1) to ECS(2^n) are all piecewise linear
chaotic maps, and m-LFSR1 and m-LFSR2 are the perturbing PRNGs of
CCS and ECS, respectively. The encryption procedure of CVES can
Key Management and Protection for IP Multimedia
Rolf Blom, Elisabetta Carrara, Fredrik Lindholm, Karl Norrman, and Mats Näslund
INTRODUCTION
IP multimedia services have, as the name indicates, the Internet Protocol
(IP) Suite as a common communication platform. However, the open
Internet has achieved a poor reputation with respect to security and,
today, security problems end up in media headlines and users have
become much more security-aware. Thus, strong security of multimedia
applications will be an absolute requirement for their general and
widespread adoption. This chapter describes a general framework for
how to secure multimedia services.
Multimedia applications are not only a growing market
for fixed-access Internet users but also an increasing and promising market in the emerging third generation (3G) of mobile communications. This calls for more adaptive multimedia applications, which
can work both in more constrained environments (such as cellular or
modem access) and in broadband networks. Environments with a mixture
of different types of network are usually referred to as heterogeneous
networks.
SCENARIOS
There are many different types of multimedia application, all with different characteristics and requirements. The targeted environments, with
all the assumptions that can be made of the underlying network and end
devices, also affect these applications. When considering security, it
may not always be possible to use one single solution for all types of
multimedia application; instead, the scenario and environment may force
different types of solution. As an example, an application may use TLS to
secure downloadable content. However, the application would not be
able to reuse the TLS session to also secure the streaming session that
uses RTP over UDP (instead it could use SRTP to secure the RTP session).
Figure 5.2. [Diagram labels: Alice, Bob, Mallory, Bob's SIP proxy, Internet, home network, access network, signaling traffic, media traffic.]
Figure 5.3.
Figure 5.4. [Diagram labels: Alice, Bob, Carol, Mallory, Alice's SIP proxy, Carol's SIP proxy, MCU, Internet, home network, access network, signaling traffic, media traffic.]
Figure 5.5 Example of a streaming session with one server and two clients.
Figure 5.7.
direction) and two video streams (also one in each direction) between
the involved parties. To protect these media sessions, a security protocol
needs to be applied for each separate session (e.g., SRTP in the case of
RTP traffic; see also section The Secure Real-Time Transport Protocol).
It would be inconvenient to separately set up a security association for
each protected media session, as that would increase both computational cost and the required number of message exchanges between the
parties. MIKEY allows setting up security associations for multiple
protected media sessions in one run of the protocol. In MIKEY, a
protected media session is generally referred to as a Crypto Session (CS).
A collection of CSs is called a Crypto Session Bundle (CSB) (e.g.,
corresponding to a multimedia session).
The main target of MIKEY (see Figure 5.7) is to establish an SA and a
corresponding set of Traffic Encryption Keys (TEKs) for each CS. This is
accomplished by first running the key transport and exchange mechanism, which establishes the SA and so-called TEK Generation Keys (TGKs)
at all entities involved in the communication. Once the TGKs are in place,
individual TEKs for each CS can be derived from them using a
predetermined key derivation function.
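The idea of deriving one TEK per Crypto Session from a shared TGK can be sketched as follows. This is not the key derivation function defined by MIKEY; it is a generic stand-in that uses HMAC-SHA-256 in counter mode, and the label layout, the `rand` value, and the function name `derive_tek` are assumptions made for illustration.

```python
import hashlib
import hmac

def derive_tek(tgk: bytes, rand: bytes, csb_id: int, cs_id: int,
               length: int = 16) -> bytes:
    """Derive a per-Crypto-Session key from the TGK.

    Illustrative stand-in only: HMAC-SHA-256 in counter mode over a label
    that binds the bundle ID, the session ID, and a random value.
    """
    label = b"TEK" + csb_id.to_bytes(4, "big") + cs_id.to_bytes(1, "big") + rand
    out = b""
    counter = 1
    while len(out) < length:
        block = hmac.new(tgk, label + counter.to_bytes(1, "big"),
                         hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]

tgk = bytes(range(32))   # TGK agreed by the key exchange (dummy value)
rand = b"\x01" * 16      # random value exchanged in the same run (dummy value)

# One run of the key exchange, two protected media sessions in the bundle.
audio_tek = derive_tek(tgk, rand, csb_id=1, cs_id=1)
video_tek = derive_tek(tgk, rand, csb_id=1, cs_id=2)
```

Because the session identifier is part of the derivation input, both endpoints obtain the same key for a given CS, while keys for different sessions in the same CSB differ.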
The SA includes parameters required to protect the traffic for the security protocol such as algorithms used and key lifetimes. What parameters
actually go into the SA is dependent on the security protocol used. Note
that different media streams within the same multimedia session could
for example use completely different protection algorithms or different
key sizes.
Figure 5.8.
Figure 5.9.
Figure 5.10.
Figure 5.11.
another masking layer of HMACs, which produces the final bit string
of the P-function.
THE SECURE REAL-TIME TRANSPORT PROTOCOL
The Secure Real-Time Transport Protocol (SRTP) [2] is an application
layer security protocol designed by the IETF community for RTP. SRTP
secures RTP as well as the control protocol for RTP (RTCP). This section
goes through the building blocks of SRTP and its applications.
SRTP is not the only mechanism that can be used to secure RTP.
For example, IPsec may be used as a network-layer security protocol. However,
as mentioned, applying security at the application level allows tuning of
the security protocol to the needs of the application within a particular
scenario (e.g., to reduce bandwidth consumption).
Protocol Overview
The SRTP is a profile of the Audio-Video Profile (AVP) of RTP (called
Secure AVP [SAVP]), as it defines extensions to RTP and RTCP specifically to secure the applications. SRTP is built as a framework, following
general practice to allow for future extensions of the protocol. This
makes it possible to add new, possibly even more efficient algorithms
in the future and it ensures that it is possible to replace the current
cryptographic algorithms if they reveal weaknesses in the future.
Figure 5.13.
ROC always starts at zero and is, therefore, only communicated explicitly
when a receiver joins an ongoing session. The ROC is updated locally by
observing wrap-arounds in the RTP sequence number space.
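The local ROC update can be sketched as a small index-estimation routine. The code below is modeled on the index-estimation procedure of the SRTP specification, written from memory of that procedure; the function and variable names are illustrative, and the specification itself remains the authority.

```python
def estimate_index(seq: int, s_l: int, roc: int) -> int:
    """Estimate the extended packet index from a 16-bit RTP sequence number.

    seq -- sequence number of the received packet (0..65535)
    s_l -- highest sequence number seen so far by the receiver
    roc -- receiver's current rollover counter
    The extended index is seq + 65536 * v, where v is roc, roc - 1, or
    roc + 1, whichever places the packet closest to the current position.
    """
    if s_l < 32768:
        # A much larger seq is most likely a late packet from before the wrap.
        v = (roc - 1) % (1 << 32) if seq - s_l > 32768 else roc
    else:
        # A much smaller seq most likely means the sequence number wrapped.
        v = (roc + 1) % (1 << 32) if s_l - 32768 > seq else roc
    return seq + v * 65536

# Sequence number wraps from 65535 to 0: the index moves into the next cycle.
wrapped = estimate_index(seq=0, s_l=65535, roc=0)

# A late packet from just before the wrap arrives after the wrap.
late = estimate_index(seq=65534, s_l=2, roc=1)
```

In this sketch the receiver would raise its stored ROC only when an authenticated packet yields an index in the next cycle, which is why no explicit ROC field is needed in ordinary packets.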
Protecting RTCP. The SRTP also provides confidentiality, integrity, and
replay protection of the RTCP traffic (see Figure 5.13). Although optional
for RTP, integrity protection is mandatory for RTCP. This is because
RTCP is a control protocol, and, as such, it can inflict actions like
termination of the session. The integrity of certain RTCP messages is
therefore critical.
Figure 5.14.
There are other practical reasons why rekeying may be needed. For
the currently defined transforms in SRTP, rekeying has to be triggered
Figure 5.15.
Figure 5.16. RTSP signaling, using MIKEY to set up the media security.
Streaming Media Encryption
Heather Yu
Imagine being able to use your home PC as a control center from which you
can direct audio or video content (music, movies, and so on) from the
Internet or your hard drive to play on your stereo or TV. Further imagine
sitting on your couch with friends and family viewing your latest vacation
pictures on your TV, a slide show streamed directly from your PC. Digital
content, broadband access, and wired and wireless home networks are
ushering in a new digital media age that will make such things possible.
M. Jeronimo and J. Weast, UPnP Design by Example, Intel Press,
U.S. 2003
Figure 6.1.
Figure 6.3.
which key is the starting key. Packet IDs by themselves cannot determine
the home base because they contain no information that points to the
starting key.
Figure 6.3 illustrates both the initialization and the media transmission
processes of Encryptionite. Because the packets do not contain any keys,
they can be streamed at higher rates (lower bit-rate increase) compared
to those systems that place the decryption key inside the packets. In
addition, in Encryptionite, packets do not need to be received in any
particular order (partially loss resilient) because packet encryption is not
based on a packet's relationship to other packets.
When the client receives an encrypted packet, the packet ID is used to
determine which key should be used to decrypt the packet. All keys
located within the encryption index are unique, random, and not
mathematically related. Each packet in a stream has its own unique
encryption key. In the rare case that one packet is decrypted by a
hacker, that person would only have access to a single packet. Hacking
additional packets would be a time-consuming and difficult process. A
successful hack is made even less likely by extra-strong
encryption keys from 612 bits to over 2200 bits (application-adequate
security level).
Client and server key indexes are synchronized (see Figure 6.3).
Each packet of data is encrypted by keys that are determined
based on a relationship between the starting key and the packet ID.
Each encrypted packet is sent to the client without key information.
The client receives the encrypted packet and decrypts it using the
appropriate key as determined by the relationship between the
packet ID and the starting key.
The packet of media is ready to be displayed.
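The steps above can be sketched as follows. The actual relationship between the starting key and the packet ID is not given in the text, so a simple offset into a shared key index is assumed here; the toy XOR cipher, the table size, and all function names are likewise illustrative and do not describe the Encryptionite product.

```python
import hashlib
import secrets

def make_key_index(n_keys: int, key_len: int = 32) -> list:
    """Shared table of independent random keys (the 'encryption index')."""
    return [secrets.token_bytes(key_len) for _ in range(n_keys)]

def select_key(index: list, start: int, packet_id: int) -> bytes:
    """Assumed relationship: offset from the starting key by the packet ID."""
    return index[(start + packet_id) % len(index)]

def xor_with_key(key: bytes, data: bytes) -> bytes:
    """Toy packet cipher: XOR with a SHA-256-derived stream (illustrative)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

# Initialization: both sides hold the same index and agree on a starting key.
index = make_key_index(1024)
start = 417  # never placed inside a packet

# Server: each packet is encrypted under the key selected by its packet ID.
packets = {pid: f"media payload {pid}".encode() for pid in range(5)}
encrypted = {pid: xor_with_key(select_key(index, start, pid), data)
             for pid, data in packets.items()}

# Client: packets may arrive in any order; the ID alone selects the key.
decrypted = {pid: xor_with_key(select_key(index, start, pid), encrypted[pid])
             for pid in (3, 0, 4, 1, 2)}
```

In this sketch a packet ID is useless without the starting position, and losing one packet does not affect decryption of the others, which mirrors the two properties claimed in the text.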
Figure 6.4.
Figure 6.5. Stream cipher.
any error, even if it is just 1 bit, in either the ciphertext bit stream or the
key stream, a corresponding error will occur in the deciphered plaintext
bit stream without error propagation. However, when there is a
ciphertext bit loss, the subsequent ciphertext bit stream will decrypt
incorrectly.
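The difference between a transmission error and a loss can be illustrated with a toy synchronous stream cipher. The sketch below works at byte rather than bit granularity, and the SHA-256-based key stream is an illustrative stand-in for a real key stream generator; both are assumptions of this example.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy key stream that depends only on the key and the position."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"demo key"
plain = b"synchronous stream ciphers need aligned key streams"
cipher = xor_bytes(plain, keystream(key, len(plain)))

# Case 1: a single bit error in the ciphertext (byte 5, lowest bit).
corrupted = bytearray(cipher)
corrupted[5] ^= 0x01
after_error = xor_bytes(bytes(corrupted), keystream(key, len(corrupted)))
wrong_positions = [i for i, (a, b) in enumerate(zip(after_error, plain))
                   if a != b]

# Case 2: byte 5 of the ciphertext is lost, so the receiver's key stream
# is misaligned from that position onward.
shortened = cipher[:5] + cipher[6:]
after_loss = xor_bytes(shortened, keystream(key, len(shortened)))
```

In the first case only the affected position decrypts incorrectly; in the second, everything after the loss is garbled until the two key streams are resynchronized.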
In a self-synchronizing SC, the key stream is a function of a fixed number
of previous ciphertext bits; that is, the output ciphertext at encryption
is fed back to the encryption engine to encrypt a subsequent bit.
Note that it is important to choose an SC (e.g., its
synchronization mode, feedback or forward mode, and key length)
based on the security requirement and time constraint of each application.
Next, let us look at how CBC and SC can be used to facilitate secure
transcoding for encrypted streaming media through a sample scheme.
A Scalable Streaming Media Encryption Scheme That Enables Transcoding
Without Decryption. Wee and Apostolopoulos [15] proposed a scalable
Algorithm I:
Figure 6.6.
0
At the decoder, assume descriptions Dj0 , Dj1
, . . ., Dj0 (J j) < M, j > 0,
0
0
J > 0, Qj Qbm are received, where Qj denotes the number of errorless
packets, which include all base layer packets of Dj0 and Qj0 Qbm
enhancement layer packets, received for description Dj0 :
Assume a base-layer packet Dq0 bm, j < m < J, is lost during transmission,
0
0
reconstruction is done using packets Dj0 , . . . , Dm1
, Dm1
, . . . , DJ0 . The
reconstruction quality is proportional to the total number of descriptions
J j 1 received.
Alternatively, a media stream can be partitioned into base layer and
enhancement layer first. The base layer can be further partitioned into
multiple descriptions. In this way, base-layer error resilience is
achieved through multiple description encoding and encryption, whereas
enhancement-layer error resilience may be achieved in a manner similar
to that proposed in Reference 16.
Assume a base-layer packet $D_m$, $1 \le m \le M$, is lost during transmission and a base layer is reconstructed using packets $D'_1, D'_2, \ldots, D'_{m-1}$,
Part III
Multimedia Watermarking
Survey of Watermarking Techniques and Applications
Edin Muharemagic and Borko Furht
INTRODUCTION
The recent proliferation and success of the Internet, together with the
availability of relatively inexpensive digital recording and storage devices,
have created an environment in which it has become very easy to obtain,
replicate, and distribute digital content without any loss in quality. This
has become a great concern to the multimedia content (music, video, and
image) publishing industries, because technologies or techniques that
could be used to protect intellectual property rights for digital media and
prevent unauthorized copying did not exist.
Although encryption technologies can be used to prevent unauthorized access to digital content, it is clear that encryption has its limitations
in protecting intellectual property rights: once content is decrypted,
there is nothing to prevent an authorized user from illegally replicating
it. Some other technology was obviously needed to help
establish and prove ownership rights, track content usage, ensure
authorized access, facilitate content authentication, and prevent illegal
replication.
Figure 7.1.
added to the luminance values of the cover image pixels. The embedding
strength factor is used to impose a power constraint in order to ensure
that once embedded, the watermark will not be perceptible. Note that
once the embedding strength factor s is selected, it is applied globally to
all cover images that need to be watermarked. Also note that this
embedding procedure creates the watermark W independently of the
cover image CO.
According to the model of the digital watermarking system depicted in
Figure 7.2, the watermark detector works on a received image C, which
can be represented either as $C = \hat{C}_W = C_O + W_m + N$, if the image was
watermarked, or as $C = C_O + N$ otherwise, where N is noise caused by
normal signal processing and attacks.
To detect the watermark, a detector has to detect the presence of the
signal W in the received, possibly watermarked, image C. In other words,
the detector has to detect the signal W in the presence of noise caused by
CO and N. Assuming that both CO and N are AWGN, the optimal method
of detecting watermark W in the received image C is based on computing
the linear correlation between W and C:
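$$LC(W, C) = \frac{1}{I J}\, W \cdot C = \frac{1}{I J} \sum_{i,j} w_{ij} c_{ij} \tag{7.1}$$
where $w_{ij}$ and $c_{ij}$ represent pixel values at location $(i, j)$ in W and C, and
I and J represent the image dimensions.
If the received image C was watermarked (i.e., if $C = C_O + W_m + N$), then
$$LC(W, C) = \frac{1}{I J} \left( W \cdot C_O + W \cdot W_m + W \cdot N \right) \tag{7.2}$$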
Because we assumed that $C_O$ and N were AWGN and we have created the
watermark W as AWGN, the additive components of linear correlation,
$W \cdot C_O$ and $W \cdot N$, are expected to have small magnitudes and the
component $W \cdot W_m = sW \cdot W$ is expected to have a much larger magnitude.
This is illustrated in Figure 7.4, where it is shown that AWGNs generated
as pseudorandom patterns using different keys (i.e., seeds) have a very
low correlation with each other, but each has a high correlation with itself.
Therefore, if the calculated linear correlation LC(W, C) between the received
image C and watermark W is small, then it can be concluded that
the image C was not watermarked. Otherwise, the image C was
watermarked. This decision is usually made based on a threshold T,
so that if LC(W, C) < T, the watermark W is not detected in C, and if
LC(W, C) > T, the watermark W is detected in C.
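The threshold test can be tried out numerically. The following minimal sketch models the cover image and the channel noise as zero-mean Gaussian sequences, as assumed in the text, and treats an image as a flat list of I x J pixel values. The image size, standard deviations, seeds, strength s, and threshold T are illustrative values chosen for this example.

```python
import random

def gaussian_pattern(seed: int, n: int, sigma: float = 1.0) -> list:
    """Pseudorandom Gaussian pattern; the seed plays the role of the key."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def linear_correlation(w: list, c: list) -> float:
    """LC(W, C): average of the pixel-wise products, as in Equation 7.1."""
    return sum(wi * ci for wi, ci in zip(w, c)) / len(w)

N_PIXELS = 100 * 100   # I * J
S = 1.0                # embedding strength factor s
T = 0.5                # detection threshold

w = gaussian_pattern(seed=1234, n=N_PIXELS)               # watermark W
cover = gaussian_pattern(seed=99, n=N_PIXELS, sigma=5.0)  # cover image C_O
noise = gaussian_pattern(seed=7, n=N_PIXELS, sigma=1.0)   # noise N

marked = [co + S * wi + ni for co, wi, ni in zip(cover, w, noise)]
unmarked = [co + ni for co, ni in zip(cover, noise)]

lc_marked = linear_correlation(w, marked)      # close to s * mean(w^2)
lc_unmarked = linear_correlation(w, unmarked)  # close to 0

detected_in_marked = lc_marked > T
detected_in_unmarked = lc_unmarked > T
```

With these values the cross terms average out to roughly zero, while the watermark's correlation with itself stays near s, so a threshold halfway between the two separates the cases cleanly.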
A watermark detection procedure based on threshold is illustrated in
Figure 7.5. Two curves represent distribution of linear correlation (LC)
values calculated for the set of unmarked images (the curve that peaks
for the detection value 0), and for the set of watermarked images (the
curve that peaks for the detection value 1). For a selected threshold
value T, the portion of the curve for the unmarked images to the right of
the threshold line T, represents all tested unmarked images, which will
be erroneously detected as marked images, and the portion of the curve
for the marked images to the left of the threshold line T represents
watermarked images which will erroneously be declared as unmarked.
The former error is called a false-positive error and the latter is called
a false-negative error. The false-negative error rate can also be seen
as a measure of efficiency of the watermarking system because it
7:3
7:4
where the $\tilde{c}_i$ components are extracted from the received, possibly
watermarked, image and the $c_i$ components are extracted from the
original cover image. The watermark is said to be present in the received
image if sim(W, W') is greater than the given threshold.
Because the original image is needed for calculation of the extracted
watermark W', which is used as part of the watermark presence test, this
watermarking system falls into the category of systems with informed
detectors.
The authors used an empirically determined value of 0.1 for the
embedding strength factor s and chose to spread the watermark across
the 1000 lowest-frequency non-DC DCT coefficients (n = 1000). Robustness
tests showed that this scheme is robust to JPEG compression down to a
quality factor of 5%, dithering, fax transmission, printing–photocopying–scanning,
multiple watermarking, and collusion attacks.
Watermarking in the Wavelet Domain. With the standardization of JPEG-2000 and a decision to use wavelet-based image compression instead of
DCT-based compression, watermarking techniques operating in the
wavelet transform domain have become more attractive to the watermarking research community. The advantages of using the wavelet
transform domain are an inherent robustness of the scheme to JPEG-2000 lossy compression and the possibility of minimizing computation
time by embedding watermarks inside a JPEG-2000 encoder. Additionally, the wavelet transform has some properties that could be exploited
by watermarking solutions. For example, the wavelet transform provides a
multiresolution representation of images, and this could be exploited to
build more efficient watermark detection schemes, where watermark
detection starts from the low-resolution subbands and explores the
higher-resolution subbands and the additional coefficients they provide
only if detection fails in the low-resolution subbands.
7:5
It provides a one-to-one mapping between the $(u, v) \in \mathbb{R}^2$ and $(\mu, \theta)$, $\mu \in \mathbb{R}$,
$\theta \in (0, 2\pi)$, spaces, and scaling and rotation in the $(u, v)$ space convert
into a translation in the $(\mu, \theta)$ space. The $(\mu, \theta)$ space is converted into
the DFT magnitude domain to achieve translation invariance, and the
Figure 7.7.
detection.
Figure 7.8. Dirty paper channel studied by Costa. There are two noise sources,
both AWGN. The encoder knows the characteristics of the first noise source
(dirty paper or the original cover) before it selects (watermark) W.
Figure 7.9.
detection.
Table 7.1.
Criteria / Categories / Characteristics

Watermark embedding
  Blind
  Informed embedding
  Informed coding

Watermark and cover merging
  Addition
  Quantization
  Masking: Informed embedding that takes advantage of the properties
    of the HVS to optimize the watermark embedding operation.
    Optimization could maximize watermark energy while
    keeping the visual distortion of the watermarked cover
    constant, or alternatively, it could minimize visual distortion
    while keeping the watermark energy constant.

Main technologies
  Spread spectrum: Addition-based watermarking method that uses spread
    spectrum technology to maximize security and robustness
    of the embedded watermark and minimize distortion of the
    watermarked cover. The watermark energy is spread across
    visually important frequency bands, so that the energy in
    any one band is small and undetectable, making the embedded
    watermark imperceptible. However, knowing the location and
    the content of the watermark makes it possible to concentrate
    those many weak watermark signals into a single signal with
    a high watermark-to-noise ratio.
  Quantization-based watermarking method that uses a set
    of N-dimensional quantizers, one quantizer for each possible
    message m (i.e., watermark W) that needs to be transmitted.

Watermark detection
  Blind or oblivious: Watermarking system that does not require the original
    cover work to be able to detect the embedded watermark.
  Informed: Watermarking system that uses the original cover work in
    the watermark detection process.

Workspace domain
  Spatial
  Transform
    DCT
    DFT
    Wavelet

Multibit watermarking
  Direct message coding: Each multibit message is mapped to an individual,
    uniquely detectable watermark.
  Bit coding: Individual message bits are mapped into watermarks.
    Space division: The cover work is divided in space into equal-sized
      blocks, and watermarks representing individual message
      bits are embedded into different blocks, one
      watermark per block.
    Frequency division: Individual message bits are mapped into watermarks,
      and watermarks representing those individual message
      bits are placed into disjoint frequency bands.
    Code division: Individual message bits are mapped into watermarks,
      and watermarks representing those individual message
      bits are spread across the whole cover work. The
      embedded watermarks will not interfere with each
      other because they have been selected to be
      mutually orthogonal.
7:7
$$S_{LC}(i, j, k) = \max\left\{ S_L(i, j, k),\ |C_O(i, j, k)|^{v(i, j)}\, S_L(i, j, k)^{1 - v(i, j)} \right\} \tag{7.8}$$
7:9
$$\frac{e(i, j, k)}{S_{LC}(i, j, k)} \tag{7.10}$$
$$\frac{1}{N} \sum_{N} \bigl( c_w(i) - c_o(i) \bigr)^2 \tag{7.12}$$
7:14
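$$\frac{4 \sigma_{xy}\, \bar{x}\, \bar{y}}{\left( \sigma_x^2 + \sigma_y^2 \right) \left( \bar{x}^2 + \bar{y}^2 \right)} \tag{7.15}$$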
Figure 7.10. The Lena image used as a test image on the left and the cropped
part of the original image, which identifies the copyright owner, Playboy
Enterprises, Inc. on the right.
Protection of intellectual property rights
  Watermark purpose: Conveys information about content ownership
    and intellectual property rights
  Application scenarios: Copyright protection; copy protection; fingerprinting
Content verification
  Watermark purpose: Ensures that the original multimedia content has not
    been altered and helps determine the type and location of alteration
  Application scenarios: Authentication; integrity checking
Side-channel information
  Watermark purpose: Represents the side channel used to carry
    additional information
  Application scenarios: Broadcast monitoring; system enhancement
authentic image. The middle one is the modified version of the original
image, and the right one shows the image region that has been tampered
with. Because it is so easy to tamper with digital content, there is a
need to be able to verify the integrity and authenticity of the content.
A solution to this problem could be borrowed from cryptography,
where a digital signature has been studied as a message authentication
method. A digital signature essentially represents some kind of summary
of the content. If any part of the content is modified, its summary, the
signature, will change, making it possible to detect that some kind of
tampering has taken place. One example of digital signature technology
being used for image authentication is the trustworthy digital camera
described in Reference 29.
Digital signature information needs to be somehow associated and
transmitted with a digital content from which it was created. Watermarks
can obviously be used to achieve that association by embedding a
signature directly into the content. Because watermarks used in the
content authentication applications have to be designed to become
invalid if even slight modifications of digital content take place, they are
called fragile watermarks. Fragile watermarks, therefore, can be used to
confirm authenticity of a digital content. They can also be used in
applications where it is important to figure out how the digital content was
modified or which portion of it has been tampered with. For digital images,
this can be done by dividing an image into a number of blocks and creating
and embedding a fragile watermark into each and every block.
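Block-wise tamper localization can be sketched as follows. This is one simple, hypothetical construction and not a scheme described in this chapter: for every 8 x 8 block of an 8-bit grayscale image (stored as a list of rows), a keyed hash of the seven most significant bits of the pixels is written into the least significant bits of the same block. All names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib

BLOCK = 8  # block side length in pixels

def expected_bits(image, key, br, bc):
    """64 keyed hash bits computed from the 7 MSBs of block (br, bc)."""
    rows = range(br * BLOCK, (br + 1) * BLOCK)
    cols = range(bc * BLOCK, (bc + 1) * BLOCK)
    msbs = bytes(image[r][c] >> 1 for r in rows for c in cols)
    digest = hashlib.sha256(key + bytes([br, bc]) + msbs).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(BLOCK * BLOCK)]

def block_pixels(br, bc):
    """Pixel coordinates of block (br, bc) in row-major order."""
    return [(br * BLOCK + i // BLOCK, bc * BLOCK + i % BLOCK)
            for i in range(BLOCK * BLOCK)]

def embed_fragile(image, key):
    """Return a copy of the image with the watermark in the LSB plane."""
    out = [row[:] for row in image]
    for br in range(len(image) // BLOCK):
        for bc in range(len(image[0]) // BLOCK):
            bits = expected_bits(image, key, br, bc)
            for (r, c), bit in zip(block_pixels(br, bc), bits):
                out[r][c] = (image[r][c] & 0xFE) | bit
    return out

def find_tampered_blocks(image, key):
    """List the (block row, block column) pairs whose watermark is invalid."""
    tampered = []
    for br in range(len(image) // BLOCK):
        for bc in range(len(image[0]) // BLOCK):
            lsbs = [image[r][c] & 1 for r, c in block_pixels(br, bc)]
            if lsbs != expected_bits(image, key, br, bc):
                tampered.append((br, bc))
    return tampered

key = b"authentication key"
image = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
marked = embed_fragile(image, key)

forged = [row[:] for row in marked]
forged[10][3] ^= 0x10                 # modify one pixel in block (1, 0)
report = find_tampered_blocks(forged, key)
```

Because each block carries its own watermark, a modification invalidates only the block that contains it, which is what allows the tampered region to be located.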
Digital content may undergo lossy compression, such
as JPEG image conversion. Although the resulting JPEG-compressed image
still has authentic content, the image authenticity test based on the
fragile watermark described earlier will fail. Semifragile watermarks can be
used instead. They are designed to survive standard transformations,
such as lossy compression, but they become invalid if a major
change, such as the one in Figure 7.12, takes place.
Robust Identification of Audio Using Watermarking and Fingerprinting
Ton Kalker and Jaap Haitsma
INTRODUCTION
There are a large number of (audio) applications where audio identification plays a large role in the feasibility and profitability of the overall
system. One of the better known applications in this context is
broadcast monitoring. It refers to the automatic playlist generation of
radio, television, or Web broadcasts for, among others, purposes of
royalty collection, program verification, advertisement verification, and
people metering. Currently, broadcast monitoring is still a manual
process; that is, organizations interested in playlists, such as performance
rights organizations, have real people listening to broadcasts
and filling out scorecards.
Connected audio is another interesting (consumer) application,
where music is somehow connected to additional and supporting
information. Using a mobile phone to identify a song is one of these
AUDIO FINGERPRINTING
Audio Fingerprint Definition
An audio fingerprint is defined as (a representation of ) a perceptual
summary of an audio object. More formally, a fingerprint function F
maps an audio object X, consisting of a large number of bits, to a
fingerprint F(X) of only a limited number of bits, such that F(X)
captures most of the perceptually relevant aspects. Here, we can draw
an analogy with well-known cryptographic hash functions. A cryptographic hash function H maps an (usually large) object X to a (usually
small) hash value (a.k.a. message digest). A cryptographic hash function
allows comparison of two large objects X and Y by just comparing their
respective hash values H(X ) and H(Y ). Strict mathematical equality of
263
Figure 8.1.
shown in Figure 8.1. The audio signal is first segmented into overlapping
frames. The overlapping frames have a length of 0.37 sec and are
weighted by a Hanning window with an overlap factor of 31/32. This
strategy results in the extraction of one subfingerprint for every
11.6 msec. In the worst-case scenario, the frame boundaries used during
identification are 5.8 msec off with respect to the boundaries used in the
database of precomputed fingerprints. The large overlap assures that
even in this worst-case scenario, the subfingerprints of the audio clip to
be identified are still very similar to the subfingerprints of the same clip
in the database. Due to the large overlap, subsequent subfingerprints
have a large similarity and are slowly varying in time. Figure 8.2a is an
example of an extracted fingerprint block and the slowly varying
character along the time axis.
The most important perceptual audio features live in the frequency
domain. Therefore, a spectral representation is computed by performing
a Fourier transform on every frame. Due to the sensitivity of the phase of
the Fourier transform to different frame boundaries and the fact that the
Human Auditory System (HAS) is relatively insensitive to phase, only the
absolute value of the spectrum (i.e., the power spectral density) is
retained.
In order to extract a 32-bit subfingerprint value for every frame, 33
nonoverlapping frequency bands are selected. These bands lie in the
range from 300 to 2000 Hz (the most relevant spectral range for the HAS)
and have a logarithmic spacing. The logarithmic spacing is chosen
because it is known that the HAS operates on approximately logarithmic
bands. Experimentally, it was verified that the sign of energy differences
(simultaneously along the time and frequency axes) is a property that is
very robust to many kinds of audio processing steps. If we denote the
Figure 8.2. (a) Fingerprint block of original music clip, (b) fingerprint block of
a compressed version, and (c) the difference between (a) and (b) showing the
bit errors in black (bit error rate [BER] 0.078).
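The framing and bit derivation described above can be sketched in a few lines. This is only an illustration: the exact bit derivation formula is cut off in this excerpt, so the rule used here (the sign of the energy difference between neighboring bands, differenced again between consecutive frames) is an assumption consistent with the description "sign of energy differences (simultaneously along the time and frequency axes)". The function names and the convention of taking precomputed band energies as input are likewise hypothetical.

```python
def frame_hop_seconds(frame_len_s=0.37, overlap=31 / 32):
    """Time between the starts of two consecutive overlapping frames."""
    return frame_len_s * (1 - overlap)


def subfingerprint(prev_energies, cur_energies):
    """Derive one subfingerprint from the band energies of two frames.

    Each argument is a list of per-band energies for one frame
    (33 bands give 32 bits). ASSUMPTION: bit m is 1 when the energy
    difference between bands m and m+1 in the current frame exceeds
    the same difference in the previous frame.
    """
    if len(prev_energies) != len(cur_energies):
        raise ValueError("both frames need the same number of bands")
    bits = 0
    for m in range(len(cur_energies) - 1):
        diff = (cur_energies[m] - cur_energies[m + 1]) - (
            prev_energies[m] - prev_energies[m + 1]
        )
        bits = (bits << 1) | (1 if diff > 0 else 0)
    return bits


def bit_error_rate(block_a, block_b, bits_per_word=32):
    """Fraction of differing bits between two equally long fingerprint blocks."""
    errors = sum(bin(a ^ b).count("1") for a, b in zip(block_a, block_b))
    return errors / (bits_per_word * len(block_a))
```

With the default arguments, `frame_hop_seconds()` gives roughly 11.6 msec, the subfingerprint interval quoted in the text, and half of that is the 5.8 msec worst-case misalignment.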
False-Positive Analysis
Two 3-sec audio signals are declared similar if the Hamming distance (i.e.,
the number of bit errors) between the two derived fingerprint blocks is
below a certain threshold T. This threshold value T directly determines
the false-positive rate Pf (i.e., the rate at which audio signals are
incorrectly declared equal): The smaller T, the smaller the probability Pf
will be. On the other hand, a small value of T will negatively affect
the false-negative probability Pn (i.e., the probability that two signals are
equal, but not identified as such).
In order to analyze the choice of this threshold T, we assume that the
fingerprint extraction process yields random i.i.d. (independent and
identically distributed) bits. The number of bit errors will then have a
binomial distribution (n, p), where n equals the number of bits extracted
and p (= 0.5) is the probability that a 0 or 1 bit is extracted. Because
n (= 8192 = 32 × 256) is large in our application, the binomial distribution
can be approximated by a Normal distribution with mean μ = np and
standard deviation σ = √(np(1 − p)). Given a fingerprint block F1, the
probability that a randomly selected fingerprint block F2 has fewer than
T = αn errors with respect to F1 is given by

$$P_f(\alpha) = \frac{1}{\sqrt{2\pi}} \int_{(1-2\alpha)\sqrt{n}}^{\infty} e^{-x^2/2}\,dx = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1-2\alpha}{\sqrt{2}}\sqrt{n}\right) \qquad (8.2)$$

where α denotes the bit error rate (BER).
However, in practice the subfingerprints have a high correlation along
the time axis. This correlation is due not only to the inherent time
correlation in audio but also to the large overlap of the frames used in
fingerprint extraction. A higher correlation implies a larger standard
deviation, as shown by the following argument.
Assume a symmetric binary source with alphabet {−1, 1} such that the
probability that symbol x_i and symbol x_{i+1} are the same is equal to q.
Then, one may easily show that

$$E[x_i x_{i+k}] = a^{|k|} \qquad (8.3)$$

where a = 2q − 1. If the source Z is the exclusive-or of two such independent sequences X and Y, then Z is symmetric and

$$E[z_i z_{i+k}] = a^{2|k|} \qquad (8.4)$$

For N large, the probability density function of the average Z_N over N
consecutive samples of Z can be approximately described by a Normal
distribution with mean 0 and standard deviation equal to

$$\sqrt{\frac{1 + a^2}{N(1 - a^2)}} \qquad (8.5)$$

Translating the above back to the case of fingerprint bits, a correlation
factor a between subsequent fingerprint bits implies an increase in
standard deviation for the BER by a factor

$$\sqrt{\frac{1 + a^2}{1 - a^2}} \qquad (8.6)$$
To determine the distribution of the BER with real fingerprint blocks, a
fingerprint database of 10,000 songs was generated. Thereafter, the BER
of 100,000 randomly selected pairs of fingerprint blocks were determined.
The standard deviation of the resulting BER distribution was measured to
be 0.0148, approximately three times higher than the 0.0055 one would
expect from random independent and identically distributed (i.i.d.)
sources.
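The two standard deviations quoted above can be related to the correlation argument with a short calculation. The sketch below assumes only the binomial model (p = 0.5, n = 8192) and the inflation factor of Equation 8.6; the function names are illustrative and not part of the original system.

```python
import math


def iid_ber_std(n_bits, p=0.5):
    """Standard deviation of the BER when all bits are i.i.d."""
    return math.sqrt(p * (1 - p) / n_bits)


def std_inflation(a):
    """Factor by which a bit correlation a inflates the BER standard deviation."""
    return math.sqrt((1 + a * a) / (1 - a * a))


def correlation_for_inflation(factor):
    """Invert the inflation factor: which correlation a produces it?"""
    f2 = factor * factor
    return math.sqrt((f2 - 1) / (f2 + 1))


sigma_iid = iid_ber_std(8192)            # about 0.0055
measured_ratio = 0.0148 / sigma_iid      # a little under 3
implied_a = correlation_for_inflation(3.0)
```

Under these assumptions, an inflation factor of 3 corresponds to a correlation of roughly 0.89 between subsequent fingerprint bits, which is plausible given the 31/32 frame overlap.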
Figure 8.3. Comparison of the probability density function of the measured BER
and the Normal distribution.
Figure 8.3 shows the log probability density function (pdf ) of the
measured BER distribution and a Normal distribution with mean of 0.5
and a standard deviation of 0.0148. The pdf of the BER is a close
approximation to the Normal distribution. For BERs below 0.45, we
observe some outliers, due to insufficient statistics. To incorporate the
larger standard deviation of the BER distribution, Equation 8.2 is modified
by inclusion of a factor 3:
$$P_f(\alpha) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1 - 2\alpha}{3\sqrt{2}}\sqrt{n}\right) \qquad (8.7)$$
The threshold for the BER used during experiments was α = 0.35. This
means that out of 8192 bits, there must be fewer than 2867 bits in error in
order to decide that the fingerprint blocks originate from the same song.
Using Equation 8.7, we arrive at a very low false-positive rate of
3.6 × 10^−20.
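Equation 8.7 is easy to evaluate numerically. The following sketch assumes the form of the equation with the empirical factor 3 in the denominator; the function name and default arguments are illustrative only.

```python
import math


def false_positive_rate(alpha, n_bits=8192, std_factor=3.0):
    """Evaluate P_f for BER threshold alpha with an inflated standard deviation.

    std_factor = 1 corresponds to the i.i.d. model of Equation 8.2,
    std_factor = 3 to the corrected model of Equation 8.7.
    """
    x = (1 - 2 * alpha) * math.sqrt(n_bits) / (std_factor * math.sqrt(2))
    return 0.5 * math.erfc(x)


max_bit_errors = int(0.35 * 8192)        # 2867 of 8192 bits
pf_corrected = false_positive_rate(0.35)
pf_iid = false_positive_rate(0.35, std_factor=1.0)
```

Evaluated this way, the corrected model gives a false-positive rate on the order of 10^−20, the same order of magnitude as the figure quoted above, whereas the i.i.d. model would be optimistic by many more orders of magnitude.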
Table 8.1 shows the experimental results for a set of tests based on the SDMI
specifications for audio watermark robustness. Almost all the resulting
bit error rates are well below the threshold of 0.35, even for global system
for mobile communication (GSM) encoding.1 The only degradations that
lead to a BER above threshold are large linear speed changes. Linear
speed changes larger than +2.5% or −2.5% generally result in bit error
1
Recall that a GSM codec is optimized for speech, not for general audio.
Table 8.1.

Processing           Orff    Sinead  Texas   AC/DC
MP3@128Kbps          0.078   0.085   0.081   0.084
MP3@32Kbps           0.174   0.106   0.096   0.133
Real@20Kbps          0.161   0.138   0.159   0.210
GSM                  0.160   0.144   0.168   0.181
GSM C/I = 4dB        0.286   0.247   0.316   0.324
All-pass filtering   0.019   0.015   0.018   0.027
Amp. Compr.          0.052   0.070   0.113   0.073
Equalization         0.048   0.045   0.066   0.062
Echo Addition        0.157   0.148   0.139   0.145
Band Pass Filter     0.028   0.025   0.024   0.038
Time Scale +4%       0.202   0.183   0.200   0.206
Time Scale -4%       0.207   0.174   0.190   0.203
Linear Speed +1%     0.172   0.102   0.132   0.238
Linear Speed -1%     0.243   0.142   0.260   0.196
Linear Speed +4%     0.438   0.467   0.355   0.472
Linear Speed -4%     0.464   0.438   0.470   0.431
Noise Addition       0.009   0.011   0.011   0.036
Resampling           0.000   0.000   0.000   0.000
D/A A/D              0.088   0.061   0.111   0.076
Figure 8.4. Bit errors per subfingerprint for the MP3 @ 128 Kbps version of
excerpt of O Fortuna by Carl Orff.
in the real fingerprint lists where the respective 32-bit subfingerprints are
located. In practical systems with limited memory,2 a LUT containing 2^32
entries is often not feasible, or not practical, or both. Furthermore, the
LUT will be sparsely filled, because only a limited number of songs can
reside in the memory. Therefore, in practice, a hash table [16] is used
instead of a lookup table.
Let us again do the calculation of the average number of fingerprint
block comparisons per identification for a 10,000-song database. Because
the database contains approximately 250 million subfingerprints, the
average number of positions in a list will be 0.058 (= 250 × 10^6 / 2^32). If we
assume that all possible subfingerprints are equally likely, the average
number of fingerprint comparisons per identification is only 15
(= 0.058 × 256). However, we observe in practice that, due to the
nonuniform distribution of subfingerprints, the number of fingerprint
comparisons increases roughly by a factor of 20. On average, 300
comparisons are needed, yielding an average search time of 1.5 msec on a
modern PC. The LUT can be implemented in such a way that it has no
impact on the search time. At the cost of a LUT, the proposed search
algorithm is approximately a factor of 800,000 times faster than the brute
force approach.
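The lookup strategy described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it uses a Python dictionary in place of the hash table, assumes that at least one subfingerprint in the query block is error-free, and all names (`build_index`, `search`, the toy database) are hypothetical.

```python
import random
from collections import defaultdict


def build_index(database):
    """Map every 32-bit subfingerprint to the (song, position) pairs where it occurs."""
    index = defaultdict(list)
    for song_id, subfingerprints in database.items():
        for position, value in enumerate(subfingerprints):
            index[value].append((song_id, position))
    return index


def bit_errors(block_a, block_b):
    """Hamming distance between two fingerprint blocks of equal length."""
    return sum(bin(a ^ b).count("1") for a, b in zip(block_a, block_b))


def search(query_block, database, index, max_ber=0.35):
    """Return (song_id, start, ber) of the best candidate below max_ber, or None."""
    n_bits = 32 * len(query_block)
    best = None
    for offset, value in enumerate(query_block):
        # Only positions where this exact subfingerprint occurs are candidates.
        for song_id, position in index.get(value, ()):
            start = position - offset
            if start < 0:
                continue
            candidate = database[song_id][start:start + len(query_block)]
            if len(candidate) < len(query_block):
                continue
            ber = bit_errors(query_block, candidate) / n_bits
            if ber < max_ber and (best is None or ber < best[2]):
                best = (song_id, start, ber)
    return best


# Toy database of random subfingerprints (illustrative data only).
rng = random.Random(0)
database = {name: [rng.getrandbits(32) for _ in range(400)] for name in ("a", "b")}
index = build_index(database)

# Query: 256 subfingerprints of song "b", each with one bit error except one.
query = [value ^ 1 for value in database["b"][10:266]]
query[5] = database["b"][15]
result = search(query, database, index)

# Back-of-the-envelope figures from the text.
average_list_length = 250e6 / 2 ** 32               # about 0.058
comparisons_uniform = average_list_length * 256      # about 15
```

In this toy run only one subfingerprint of the query matches the index, so a single full block comparison suffices to identify the song; this is the effect that gives the approach its large speed advantage over brute force.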
The observant reader might ask: But, what if your assumption that
one of the subfingerprints is error-free does not hold? The answer is that
the assumption almost always holds for audio signals with mild audio
signal degradations. However, for heavily degraded signals, the assumption is, indeed, not always valid. An example of a plot of the bit errors per
2
For example a PC with a 32-bit Intel processor has a memory limit of 4 GB.
Figure 8.6. Bit errors per subfingerprint (dotted line) and the reliability of the
most reliable erroneous bit (solid line) for the MP3 @ 32 Kbps version of
O Fortuna by Carl Orff.
subfingerprint for a fingerprint block that does not contain any error-free
subfingerprints is shown in Figure 8.6. There are, however, subfingerprints that contain only one error. Therefore, instead of only checking
positions in the database where 1 of the 256 subfingerprints occurs, we
can also check all the positions where subfingerprints occur that have a
Hamming distance of 1 (i.e., 1 toggled bit) with respect to all the 256
subfingerprints. This will result in 33 times more fingerprint comparisons,
which is still acceptable. However, if we want to cope with situations in which,
for example, the minimum number of bit errors per subfingerprint is
three (this can occur in the mobile phone application), the number of
fingerprint comparisons will increase by a factor of 5489, which leads to
unacceptable search times. Note that the observed nonuniformity factor
of 20 is decreasing with increasing number of bits being toggled. If, for
instance, all 32 bits of the subfingerprints are used for toggling, we end up
with the brute force approach again, yielding a multiplication factor of 1.
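The multiplication factors quoted above follow from counting all 32-bit words within a given Hamming distance of a subfingerprint. A short sketch (function names are illustrative) reproduces them and enumerates the candidates for the one-error case:

```python
from math import comb


def candidate_multiplier(max_errors, bits=32):
    """Number of bit patterns within Hamming distance max_errors of a word."""
    return sum(comb(bits, k) for k in range(max_errors + 1))


def neighbours_within_one(value, bits=32):
    """Yield the word itself and every word obtained by toggling one bit."""
    yield value
    for i in range(bits):
        yield value ^ (1 << i)
```

Here `candidate_multiplier(1)` is 33 and `candidate_multiplier(3)` is 5489, matching the figures in the text, while `candidate_multiplier(32)` equals 2^32, that is, every possible subfingerprint.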
Because randomly toggling bits to generate more candidate positions
results very quickly in unacceptable search times, we propose using a
different approach that uses soft decoding information; that is, we
propose to estimate and use the probability that a fingerprint bit is
received correctly.
The subfingerprints are obtained by comparing and thresholding
energy differences (see bit derivation block in Figure 8.1). If the energy
difference is very close to the threshold, it is reasonably likely that the bit
was received incorrectly (an unreliable bit). On the other hand, if the
Audio watermarking and audio fingerprinting are both signal-processing identification technologies. From their definitions, the
most important difference is easily deduced: Watermarking
involves (host) signal modifications, whereas audio fingerprinting
does not. Although watermarks are designed to be imperceptible,
there are, nonetheless, differences between original and watermarked versions of a signal. The debate whether or not these
differences are perceptible very often remains a point of contention. Practice has shown that for any watermarking technology,
audio clips and human ears can be found that will perceive the
difference between original and watermarked versions. In some
applications, such as archiving, the slightest degradation of the
original content is sometimes unacceptable, ruling out audio
watermarking as the identification technology of choice.
Obviously, this observation does not apply to audio fingerprinting.
High-Capacity Real-Time Audio Watermarking with Perfect Correlation Sequence and Repeated Insertion
Soo-Chang Pei, Yu-Feng Hsu, and Ya-Wen Lu
INTRODUCTION
Because of the maturation of efficient, high-quality audio compression techniques (MPEG I-Layer III, or MP3 for short) and the rapid growth
of Internet connectivity, copyrighted audio productions are spread
widely and easily. This enables piracy and illegal use of unauthorized data, and intellectual property problems have become serious. To deal with
this problem, a sequence of data can be embedded into the audio work, and in the case of an ownership dispute, only those with the correct
key can extract the embedded watermark to declare their ownership.
Digital audio watermarking remains a relatively new area of research
compared with digital image and text watermarking. Because of the special
features of audio, watermark embedding is implemented in ways that differ
considerably from image watermarking.
N k k,
9:1
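$$\varphi(m) = \sum_{n=0}^{N-1} s_p(n)\, s_p(n+m) \qquad (9.2)$$
Then,
$$\varphi(m) = E\,\delta(m) = \begin{cases} E, & m = 0 \\ 0, & m \neq 0 \end{cases} \qquad (9.3)$$
$$E = \sum_{n=0}^{N-1} s_p^2(n) \qquad (9.4)$$
$$S_p(k) = \sum_{n=0}^{N-1} s_p(n)\, e^{-j2\pi nk/N}, \quad 0 \le k < N \qquad (9.5)$$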
By combining Equations 9.1, 9.2, and 9.4, and taking the absolute value of
Sp(k), the DFT of the PACF is given by
$$\varphi(m) = E\,\delta(m) \;\longleftrightarrow\; |S_p(k)|^2 = E \qquad (9.6)$$
Therefore, the magnitude of the spectrum of a perfect sequence is always
the constant $\sqrt{E}$.
For most applications, perfect sequences should possess a good energy
efficiency η, as given by
$$\eta = \frac{\sum_{n=0}^{N-1} s_p^2(n)}{N \max s_p^2(n)} \qquad (9.7)$$
1 2
9:8
Figure 9.1. (a) Perfect sequence of length 10,000, (b) its autocorrelation,
(c) amplitude frequency spectrum, and (d) phase frequency spectrum.
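The defining properties of a perfect sequence (Equations 9.3 through 9.7) can be checked numerically on a toy example. The length-4 sequence below is an assumption chosen purely for illustration; it is not the length-10,000 sequence of Figure 9.1, and the helper names are hypothetical.

```python
import cmath


def periodic_autocorrelation(s):
    """Periodic autocorrelation of a real sequence for every circular shift m."""
    n = len(s)
    return [sum(s[i] * s[(i + m) % n] for i in range(n)) for m in range(n)]


def dft_magnitudes(s):
    """Magnitude of each DFT coefficient of the sequence."""
    n = len(s)
    return [
        abs(sum(s[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n)))
        for k in range(n)
    ]


def energy_efficiency(s):
    """Ratio of the sequence energy to N times its peak squared amplitude."""
    return sum(x * x for x in s) / (len(s) * max(x * x for x in s))


toy_sequence = [1, 1, 1, -1]   # small binary example with a delta-like autocorrelation
pacf = periodic_autocorrelation(toy_sequence)
magnitudes = dft_magnitudes(toy_sequence)
```

For this toy sequence the autocorrelation is E = 4 at shift 0 and zero elsewhere, every spectral magnitude equals the square root of E, and the energy efficiency is 1.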
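$$A(i,j) = \begin{cases} 0 & \text{if } i = 0 \\ 1 & \text{if } j = 0,\ i \neq 0 \\ 1 & \text{if } C_r(i)\,C_s(j) = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (9.9)$$
$$G(i,j) = \begin{cases} 1 & \text{if } A(i,j) = 1 \\ -1 & \text{if } A(i,j) = 0 \end{cases} \qquad (9.10)$$
$$\sum_{i}\sum_{j} A(i,j)\,G(i+k,\, j+p) = \begin{cases} \dfrac{rs+1}{2} & \text{if } k = 0,\ p = 0 \\ 0 & \text{otherwise} \end{cases} \qquad (9.11)$$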
Uniformly redundant arrays with size (7,5) and (13,11) are shown in
Figure 9.2.
Correlation Property. The circular correlation function of A and G is
a two-dimensional (2-D) delta function: the element at the intersection of the first column and the first row equals the number
of 1s in A, which is (rs + 1)/2 as indicated in Equation (9.11)
and Figure 9.3, and all the other elements are zero. For example, the URA of size (7,5)
in Figure 9.2a contains (7 × 5 + 1)/2 = 18 ones; therefore,
the circular correlation function in Figure 9.3a is a matrix of all 0s
except that the upper-left-most element is 18. The same holds for the URA
of size (13,11) in Figure 9.2b and Figure 9.3b.
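The construction and its correlation property can be verified with a short sketch. It assumes that C_r(i) is the usual quadratic-residue indicator (1 if i is a nonzero square modulo r, otherwise −1), which is the standard choice for uniformly redundant arrays but is not spelled out in this excerpt; the function names are illustrative.

```python
def quadratic_residue_sign(i, modulus):
    """1 if i is a nonzero square modulo `modulus`, otherwise -1 (assumed C_r)."""
    residues = {(x * x) % modulus for x in range(1, modulus)}
    return 1 if i % modulus in residues else -1


def ura(r, s):
    """Build the r-by-s binary array A following the case rules of Equation 9.9."""
    a = [[0] * s for _ in range(r)]
    for i in range(1, r):            # row i = 0 stays all zeros
        for j in range(s):
            if j == 0:
                a[i][j] = 1
            elif quadratic_residue_sign(i, r) * quadratic_residue_sign(j, s) == 1:
                a[i][j] = 1
    return a


def decoding_array(a):
    """G is +1 where A is 1 and -1 where A is 0."""
    return [[1 if v == 1 else -1 for v in row] for row in a]


def circular_correlation(a, g):
    """2-D circular correlation of A with G for every shift (k, p)."""
    r, s = len(a), len(a[0])
    return [
        [
            sum(a[i][j] * g[(i + k) % r][(j + p) % s]
                for i in range(r) for j in range(s))
            for p in range(s)
        ]
        for k in range(r)
    ]


a75 = ura(7, 5)
corr75 = circular_correlation(a75, decoding_array(a75))
```

For the (7,5) array the upper-left correlation value is 18 and every other entry is zero, in agreement with the example in the text.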
Figure 9.2.
Figure 9.3.
(9.12)
where ⊗ stands for circular correlation. The scaling factor k is typically 0.1.
The block diagrams of watermark embedding and extraction are
shown in Figure 9.4 and Figure 9.5.
When I is added into the host audio A, repeated insertion is adopted;
that is, each sample in I is repeatedly added into L consecutive samples of
each block with size L in A. This concept is illustrated in Figure 9.6 [22].
Watermark Extraction
The process of watermark extraction is simply the inverse of watermark
embedding. Because the scrambled watermark is inserted repeatedly, the
average of each repeating block must be computed to determine the
original added signal.
It is necessary to refer to the original host audio A when the watermark
is to be extracted. The host audio A is subtracted from the received stego audio S,
obtaining the scaled noiselike signal kI, which is then correlated with the perfect
sequence P to restore the watermark W:

$$W' = (S - A) \otimes P = kI \otimes P = k(W \otimes P) \otimes P = kW \otimes (P \otimes P) = kW \otimes \delta = kW \qquad (9.13)$$

Figure 9.5.
Figure 9.6.
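The embedding and extraction chain can be illustrated on a toy scale. The sketch below makes several assumptions that are not taken from the text: a length-4 perfect sequence, a watermark of the same length, a block size of 4, and an explicit division by k and by the sequence energy E at the end so that the recovered values equal the original watermark. All function names are hypothetical.

```python
def circular_correlate(x, p):
    """Circular correlation: result[m] = sum over i of x[i] * p[(i + m) mod N]."""
    n = len(p)
    return [sum(x[i] * p[(i + m) % n] for i in range(n)) for m in range(n)]


def embed(host, watermark, perfect_seq, k=0.1, block=4):
    """Scramble the watermark and add each sample to `block` consecutive host samples."""
    noise = circular_correlate(watermark, perfect_seq)
    stego = list(host)
    for m, value in enumerate(noise):
        for t in range(block):
            stego[m * block + t] += k * value
    return stego


def extract(stego, host, perfect_seq, k=0.1, block=4):
    """Subtract the host, average each block, and correlate with the perfect sequence."""
    n = len(perfect_seq)
    diff = [s - h for s, h in zip(stego, host)]
    averaged = [sum(diff[m * block:(m + 1) * block]) / block for m in range(n)]
    energy = sum(v * v for v in perfect_seq)
    recovered = circular_correlate(averaged, perfect_seq)
    # Normalizing by k and E is an assumption made here for readability.
    return [v / (k * energy) for v in recovered]


perfect_seq = [1, 1, 1, -1]
watermark = [1.0, 0.0, 1.0, 1.0]
host = [0.5 * i for i in range(16)]
stego = embed(host, watermark, perfect_seq)
recovered = extract(stego, host, perfect_seq)
```

Averaging over each block before correlating is what makes repeated insertion useful: independent distortions of the individual samples tend to cancel in the average.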
Figure 9.7.
$$L(AD) = \frac{1}{1 + e^{AD - 4.6877}} \qquad (9.14)$$
The logistic functions L(AD) for AD from 0 to 10 and for AD from 0 to 100 are shown in
Figure 9.7. The closer AD is to 0 and L(AD) is to 1, the higher the perceptual similarity of the two audio signals. For two identical signals, AD is 0 and
L(AD) is 0.9909 [25,26].
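Equation 9.14 is simple enough to evaluate directly, which is a convenient way to cross-check the AD and L(AD) pairs reported in the tables of this chapter. The function name below is illustrative.

```python
import math


def logistic_similarity(ad):
    """Map an auditory distance AD to the similarity score L(AD) of Equation 9.14."""
    return 1.0 / (1.0 + math.exp(ad - 4.6877))


identical = logistic_similarity(0.0)       # about 0.9909
example = logistic_similarity(2.2553)      # about 0.9193
```

The first value reproduces the 0.9909 quoted for identical signals; the second matches an AD and L(AD) pair from the quality tables.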
The correlation of this measurement with subjective test results is
compared with that of other objective measures [26]. The two structures of MNB yield high correlation values of 0.986 and 0.959, whereas
the correlation for L(BSD) (logistic function of Bark spectral distortion) is only 0.368 and that for
L(ND) (logistic function of noise disturbance) is 0.793, as shown in
Figure 9.8.
EXPERIMENTAL RESULTS
In this experiment, watermark robustness is tested against the MPEG
I-Layer III (MP3) compression attack. The stego audio is compressed into
MP3 format and then decompressed into WAV format again. To increase
robustness, a precompression process is applied, as shown in Figure 9.9.
Before the watermark embedding, both the host audio and the watermark
clips are compressed and then decompressed to remove any residual
data. This precompression process erases the information that is beyond
consideration and preserves only the meaningful parts in MP3 encoding,
minimizing the impact that MP3 attack may have on the stego audio.
Figure 9.10.
delayed, the first sample of the stego audio is no longer the first in the
host audio, and the difference obtained by subtracting the host audio
from the stego audio will be completely wrong. The temporal delay is
shown in Figure 9.10. In our experiment, the compressed audio is
delayed by 1,058 samples when the audio clip is sampled at a sample
rate of 44,100 Hz, which equals 0.024 sec. This implies that if synchronization
is not recovered before watermark extraction, the 1059th sample of the
host audio will be subtracted from the 1059th sample of the stego audio,
whereas the desired operation is to subtract the first sample of the host
audio from the 1059th sample of the stego audio. This misalignment
damages the extraction process severely, because the subtraction
results will be completely wrong.
There are several methods to overcome this problem. What we use in
this research is the simplest one: adding a number, say, 10, of 1s as the
starting tag of host audio, as shown in Figure 9.11. In the watermark
extraction process, the ten 1s are searched; after we successfully find the
ten consecutive 1s in the 1059th to 1068th samples of the stego audio,
the 1st to 1058th samples are removed, retaining the data beginning at
the 1059th sample. Further computations are done after the synchronization is recovered.
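The tag search described above can be sketched as follows. The tolerance parameter is an assumption added here, because after lossy compression the tag samples will generally not be exactly 1; the function names and default values are illustrative.

```python
def find_sync_tag(samples, tag_length=10, tag_value=1.0, tolerance=1e-3):
    """Return the index of the first run of tag_length samples equal to tag_value.

    Returns -1 when no such run exists. The tolerance is a hypothetical
    allowance for small amplitude changes.
    """
    run = 0
    for index, sample in enumerate(samples):
        if abs(sample - tag_value) <= tolerance:
            run += 1
            if run == tag_length:
                return index - tag_length + 1
        else:
            run = 0
    return -1


def realign(stego, tag_length=10):
    """Drop everything before the tag so that stego and host line up again."""
    start = find_sync_tag(stego, tag_length)
    if start < 0:
        raise ValueError("synchronization tag not found")
    return stego[start:]


delay_seconds = 1058 / 44100          # about 0.024 sec
delayed = [0.0] * 1058 + [1.0] * 10 + [0.2, 0.3]
tag_start = find_sync_tag(delayed)    # zero-based index 1058, i.e. the 1059th sample
aligned = realign(delayed)
```

A production system would likely use a more distinctive tag or a correlation-based search, since ten consecutive full-scale samples could in principle also occur in ordinary audio.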
Clip Name    Clip Length
PIANO        13.531 sec
SPMLE        14.315 sec
STREN        17.914 sec
SPFLE        17.580 sec
Table 9.2. Audio clips and PN binary data used as watermark (sample
rate 7350 Hz)

Clip No.   Clip Name     Clip Length
1          GUITR         2.900 sec
2          SPFSE         5.538 sec
3          VOILN         4.232 sec
4          PN sequence   30,000 bits
Table 9.3. Stego audio and extracted watermark quality: music clips in piano
music. Audio similarity is measured with the acoustic distance AD and its
logistic function L(AD); higher perceptual similarity means AD → 0, L(AD) → 1
(host audio: PIANO, watermark: GUITR)

Watermark: Music; Host: Music; Perfect Sequence

Channel 1            Channel 2
AD       L(AD)       AD       L(AD)
0.3216   0.9875      0.3515   0.9871
2.2553   0.9193      2.3489   0.9120
0.2448   0.9884      2.0587   0.9327
0.1789   0.9891      2.0005   0.9363
0.3846   0.9867      2.1495   0.9268
2.7722   0.8716      2.8355   0.8644
1.6150   0.9558      2.0612   0.9325
Host: Speech; Perfect Sequence

Channel 1            Channel 2
AD       L(AD)       AD       L(AD)
0.5159   0.9848      0.7301   0.9812
2.6243   0.8873      2.9198   0.8542
0.2121   0.9887      2.0706   0.9320
0.1550   0.9894      2.0188   0.9352
0.3730   0.9868      2.1544   0.9264
2.7817   0.8706      2.8463   0.8631
2.0568   0.9328      2.3456   0.9123
Table 9.5. Stego audio and extracted watermark quality: English speech in
piano music. Audio similarity is measured with the acoustic distance AD and
its logistic function L(AD); higher perceptual similarity means AD → 0,
L(AD) → 1 (host audio: PIANO, watermark: SPFSE)

Watermark: Speech; Host: Music; Perfect Sequence

Channel 1            Channel 2
AD       L(AD)       AD       L(AD)
0.1909   0.9890      0.1682   0.9892
4.1408   0.6334      4.1542   0.6303
0.3574   0.9870      0.9173   0.9775
0.2910   0.9878      0.8842   0.9782
0.4964   0.9851      1.0296   0.9749
4.9556   0.4334      4.9680   0.4304
3.5377   0.7595      3.5444   0.7583
Host: Speech; Perfect Sequence

Channel 1            Channel 2
AD       L(AD)       AD       L(AD)
0.2926   0.9878      0.3651   0.9869
4.2222   0.6143      4.2521   0.6072
0.2350   0.9885      0.8666   0.9786
0.2147   0.9887      0.8648   0.9786
0.4028   0.9864      0.9854   0.9759
4.9703   0.4298      4.9757   0.4285
3.8327   0.7016      3.8481   0.6984
Table 9.7. Stego audio quality (watermark: PN data). Audio similarity is
measured with the acoustic distance AD and its logistic function L(AD);
higher perceptual similarity means AD → 0, L(AD) → 1

                  Channel 1            Channel 2
Host              AD       L(AD)       AD       L(AD)
Music (PIANO)     0.9612   0.9765      0.8343   0.9792
Speech (SPMLE)    1.3566   0.9655      1.3195   0.9667
                          Repeat    Music (PIANO)    Detection   Speech (SPMLE)   Detection
Attack                    (times)   (bits)           Rate        (bits)           Rate
MP3                       5         60/6,000         99%         20/6,000         99.7%
MP3                       10        0/3,000          100%        0/3,000          100%
MP3                       15        0/2,000          100%        0/2,000          100%
Cropping (20%)                      3,078/30,000     89.8%       3,079/30,000     89.8%
Downsampling (50%)                  0/30,000         100%        0/30,000         100%
Echo (delay 40)                     34/30,000        99.9%       34/30,000        99.9%
Time stretch (2%)                   14,951/30,000    50.2%       14,955/30,000    50.2%
Quantization (16→8 bits)            14,858/30,000    50.2%       13,404/30,000    55.4%
Quantization (16→8 bits)  5         2,036/6,000      66.1%       2,255/6,000      62.5%
Quantization (16→8 bits)  10        2/3,000          99.9%       16/3,000         99.5%
Host: Music; URA

Channel 1            Channel 2
AD       L(AD)       AD       L(AD)
0.4142   0.9863      0.4030   0.9864
2.1438   0.9272      2.0706   0.9320
0.9284   0.9772      0.8398   0.9791
0.2347   0.9885      0.1492   0.9894
2.3018   0.9157      2.2319   0.9210
2.3392   0.9128      2.3851   0.9091
1.8426   0.9451      1.7457   0.9499
Host: Speech; URA

Channel 1            Channel 2
AD       L(AD)       AD       L(AD)
0.6994   0.9818      0.7428   0.9810
2.2325   0.9209      2.1569   0.9263
0.9191   0.9774      0.8470   0.9790
0.1990   0.9889      0.1580   0.9893
2.2978   0.9161      2.2329   0.9209
2.3395   0.9128      2.3842   0.9092
1.9464   0.9394      1.9909   0.9368
Table 9.11. Stego audio and extracted watermark quality: English speech in
piano music. Audio similarity is measured with the acoustic distance AD and
its logistic function L(AD); higher perceptual similarity means AD → 0,
L(AD) → 1 (host audio: PIANO, watermark: SPFSE)

Watermark: Speech; Host: Music; URA

Channel 1            Channel 2
AD       L(AD)       AD       L(AD)
0.2700   0.9881      0.2470   0.9883
4.1208   0.6380      4.1994   0.6197
3.5394   0.7592      0.9192   0.9774
3.2272   0.8116      0.2676   0.9881
3.7214   0.7244      2.7445   0.8747
4.3002   0.5957      4.2771   0.6012
3.8005   0.7083      3.6250   0.7432
Discussions
The above-measured stego audio qualities imply that this technique can
guarantee watermark transparency to an extent, but not yet sufficiently. The
ultimate goal should be to achieve zero acoustic distance between
the stego and host audio clips.
If the extracted watermark qualities under all kinds of combination
are carefully investigated, for MP3 attack it is clear that URA outperforms
the perfect sequences no matter what types of audio clip are under
Host: Speech; URA

Channel 1            Channel 2
AD       L(AD)       AD       L(AD)
0.4740   0.9854      0.4643   0.9856
4.2248   0.6137      4.1630   0.6283
0.9584   0.9766      0.9583   0.9766
0.3172   0.9875      0.3101   0.9876
2.7280   0.8765      2.7444   0.8747
4.2417   0.6097      4.2799   0.6006
4.0618   0.6516      4.0600   0.6520
                  Channel 1            Channel 2
Host              AD       L(AD)       AD       L(AD)
Music (PIANO)     0.7357   0.9811      0.6403   0.9828
Speech (SPMLE)    1.1333   0.9722      1.1001   0.9731
                          Repeat    Music (PIANO)    Detection   Speech (SPMLE)   Detection
Attack                    (times)   (bits)           Rate        (bits)           Rate
MP3                       5         1,013/6,000      84.1%       462/6,000        92.3%
MP3                       10        268/3,000        92.1%       99/3,000         96.7%
MP3                       15        76/2,000         96.2%       30/2,000         98.5%
Cropping (20%)                      4,790/30,000     84.0%       4,787/30,000     84.0%
Downsampling (50%)                  0/30,000         100%        0/30,000         100%
Echo (delay 40)                     861/30,000       97.1%       860/30,000       97.1%
Time stretch (2%)                   14,950/30,000    50.2%       14,953/30,000    50.2%
Quantization (16→8 bits)            136/30,000       99.5%       1,451/30,000     95.2%
Quantization (16→8 bits)  5         37/6,000         99.4%       355/6,000        94.1%
Quantization (16→8 bits)  10        9/3,000          99.7%       76/3,000         97.5%
Figure 9.15. Similarity values of music clips extracted from a piano solo.
Table 9.15. Extracted watermark quality using different block sizes. Audio
similarity is measured with the acoustic distance AD and its logistic function
L(AD); higher perceptual similarity means AD → 0, L(AD) → 1 (host audio:
PIANO, watermark: GUITR)

Watermark: Music; Host: Music; URA

              Channel 1            Channel 2
Block size    AD       L(AD)       AD       L(AD)
1             0.3080   0.7989      3.1766   0.8192
5             2.2381   0.9205      2.2356   0.9207
10            2.3130   0.9149      2.3002   0.9159
14            2.1618   0.9259      2.1793   0.9247
16            2.1438   0.9272      2.0706   0.9320
                          Channel 1            Channel 2
                          AD       L(AD)       AD       L(AD)
Music_music P      1-D    2.2553   0.9193      2.3489   0.9120
                   2-D    2.0981   0.9302      2.2149   0.9222
Music_speech P     1-D    2.6243   0.8873      2.9198   0.8542
                   2-D    2.3117   0.9150      2.4622   0.9025
Speech_music P     1-D    4.1408   0.6334      4.1542   0.6303
                   2-D    4.0692   0.6499      4.0832   0.6467
Speech_speech P    1-D    4.2222   0.6143      4.2521   0.6072
                   2-D    4.1005   0.6427      4.1018   0.6424
                  Channel 1            Channel 2
                  AD       L(AD)       AD       L(AD)
Music_music       2.1438   0.9272      2.0706   0.9320
Music_speech      2.2325   0.9209      2.1569   0.9263
Speech_music      4.1208   0.6380      4.1994   0.6197
Speech_speech     4.2248   0.6137      4.1630   0.6283

Table 9.18.
Perfect sequence; PN data (repeat times): Repeat 5, Repeat 10, Repeat 15

Table 9.19.
URA
PN data (repeat times)   Music (bits)           Speech (bits)
Repeat 5                 1013/6,000 (84.1%)     462/6,000 (92.3%)
Repeat 10                268/3,000 (92.1%)      99/3,000 (96.7%)
Repeat 15                76/2,000 (96.2%)       30/2,000 (98.5%)
10
Multidimensional Watermark for Still Image: Parallel Embedding and Detection
Bo Hu
INTRODUCTION
Access to and manipulation of digital data have become easier because of the rapid
evolution of digital technology and the Internet. Security of multimedia
data has become a very important issue.
One approach to data security is to use cryptography. However, it
should be noted that a cryptosystem only restricts access to the data. Every
person who wants to access the data must know the key. Once the data
are decrypted, their protection is lost. Unauthorized
copying and transmission of the data cannot be prevented.
The digital watermark has been proposed as an effective solution to
the copyright protection of multimedia data. Digital watermarking is the
process of embedding information or a signature directly into the media
data by making small modifications to them. By detecting the
signature in the watermarked media data, the copyright of the media
data can be resolved.
Figure 10.1.
c(k) with length L_c ≫ 1. This is performed by the block named Mod-2
adder, which is defined as
$$b(nT_b + kT_c) = a(nT_b) \oplus c(kT_c), \quad k = 0, 1, \ldots, L_c - 1 \qquad (10.1)$$
where 1/T_b is the bit rate R of the information sequence, 1/T_c is the bit rate
R' = R L_c of the spread spectrum sequence, and ⊕ is the Exclusive OR.
Because L_c ≫ 1, to transmit b(n, k) in the same period, the bit rate R' is
much larger than R and needs more bandwidth. However, the spread
spectrum technique is widely used in digital communication because of
its many desirable attributes and inherent characteristics, especially in
an environment with strong interference. The three main attributes are:
The use of the pseudorandom code to spread the original signal allows
easier resolution at the receiver of different multipaths created in
the channel.
The spreading follows a technique where the signal is spread into a
much larger bandwidth, decreasing the ability for any outside party
to detect or jam the channel or the signal.
The last and the most important feature is the forward error correction
coding; when done with the spreading and the despreading
operations, it provides robustness to noise in the channel.
At the receiver, the received signal y(n, k) is despread with the same
pseudorandom sequence c(k) and the transmitted information sequence
a'(n) is decided by
$$a'(nT_b) = \begin{cases} 1, & \sum_{k=0}^{L_c-1} y(nT_b + kT_c) \oplus c(kT_c) \ge L_c/2 \\ 0, & \sum_{k=0}^{L_c-1} y(nT_b + kT_c) \oplus c(kT_c) < L_c/2 \end{cases} \qquad (10.2)$$
If there is no error in transmission, $\sum_{k=0}^{L_c-1} y(nT_b + kT_c) \oplus c(kT_c)$ will be
L_c for a(n) = 1 or 0 for a(n) = 0. The distance between the 0 and 1 information
bits is much larger than that without spreading. From the theory of
communication, this means a greater ability to combat interference. A
detailed discussion of spread spectrum signals can be found in the book
by Proakis [7].
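The spreading and despreading rules of Equations 10.1 and 10.2 can be illustrated with a few lines of code. The chip sequence length of 31 and the error pattern are assumptions chosen for the example; the function names are hypothetical.

```python
import random


def spread(bits, code):
    """XOR every information bit with each chip of the pseudorandom code."""
    return [a ^ c for a in bits for c in code]


def despread(received, code):
    """Recover each bit by XOR-ing with the code and taking a majority decision."""
    lc = len(code)
    decoded = []
    for n in range(0, len(received), lc):
        chips = received[n:n + lc]
        total = sum(y ^ c for y, c in zip(chips, code))
        decoded.append(1 if total >= lc / 2 else 0)
    return decoded


rng = random.Random(1)
code = [rng.getrandbits(1) for _ in range(31)]      # illustrative chip sequence
message = [1, 0, 1, 1, 0]
transmitted = spread(message, code)

# Corrupt the first 10 chips of every bit period; fewer than half are wrong.
corrupted = list(transmitted)
for n in range(len(message)):
    for k in range(10):
        corrupted[n * len(code) + k] ^= 1
decoded = despread(corrupted, code)
```

Because the decision threshold sits at L_c/2, any pattern that corrupts fewer than half of the chips of a bit period leaves the decoded bit unchanged, which is the robustness argument made in the text.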
In 1996, Cox et al. introduced the spread spectrum signal into
digital watermarking to improve its robustness [2]. As shown in Figure 10.2,
digital watermarking can be thought of as a spread spectrum communication
problem. The watermark information to be sent from the sender to the
receiver is spread by a pseudorandom sequence and transmitted through
a special channel. The channel is composed of the host image and noise
introduced by signal processing or attack on the watermarked image.
Similar to spread spectrum communication, the watermark
information is represented by a random sequence with a much larger
length. The power of the spread spectrum sequence can then be very
small when embedded into the image, while the robustness of the
watermark is kept.
In our method, the watermark information is represented by four
spread spectrum sequences and embedded into the image independently. By the joint detection discussed in the section Joint Detection of a
Multidimensional Watermark, the robustness of the watermark is much
improved.
Watermark in the DCT Domain Considering HVS
In order to increase the compression ratio, we must take advantage of the
redundancy in most image and video signals, including spatial redundancy and temporal redundancy. The transform coding methods, such as
DFT, DCT, and wavelet transform, belong to the spatial domain methods.
In the transform domain, the energy of signals is mostly located in the
low-frequency components. With carefully selected quantization thresholds on different frequency bands and entropy coding, the image or video
can be efficiently compressed.
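$$T(u,v) = a(u)\,a(v) \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16}, \quad u, v, x, y = 0, 1, \ldots, 7 \qquad (10.3)$$
where
$$a(u) = \begin{cases} \sqrt{1/8} & \text{for } u = 0 \\ \sqrt{1/4} & \text{for } u = 1, 2, \ldots, 7 \end{cases} \qquad (10.4)$$
and similarly for a(v). f(x, y) is the value of image pixel (x, y) and T(u, v) is
that of the component (u, v) in the DCT domain.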
After transform, the frequency components are usually Zig-Zag
ordered as shown in Figure 10.3. The DC coefficient is indexed as 0 and
the highest frequency component is indexed as 63.
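Both the 8 × 8 DCT and the zigzag ordering are easy to reproduce. The sketch below assumes the standard orthonormal 8 × 8 DCT and the zigzag scan familiar from JPEG, which agrees with the index values that survive in Figure 10.3; the function names are illustrative.

```python
import math


def zigzag_index_table(size=8):
    """Return a size-by-size table holding the zigzag index of every coefficient."""
    coords = sorted(
        ((i, j) for i in range(size) for j in range(size)),
        # Walk the anti-diagonals; alternate the direction on odd and even sums.
        key=lambda ij: (ij[0] + ij[1], ij[0] if (ij[0] + ij[1]) % 2 else -ij[0]),
    )
    table = [[0] * size for _ in range(size)]
    for index, (i, j) in enumerate(coords):
        table[i][j] = index
    return table


def scale(u):
    """Normalization factor a(u) of the orthonormal 8-point DCT."""
    return math.sqrt(1 / 8) if u == 0 else math.sqrt(2 / 8)


def dct_8x8(block):
    """Forward 2-D DCT of an 8x8 block given as a list of rows."""
    return [
        [
            scale(u) * scale(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(8) for y in range(8)
            )
            for v in range(8)
        ]
        for u in range(8)
    ]


zigzag = zigzag_index_table()
flat_block = [[10.0] * 8 for _ in range(8)]
coefficients = dct_8x8(flat_block)
```

For a constant block all the energy ends up in the DC coefficient (index 0), which illustrates why low-frequency coefficients carry most of the signal power.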
As specified earlier, the watermark should be embedded in the low-frequency band to improve its robustness. However, the low-frequency
coefficients generally have much higher power than the others. A small
change in them will result in a severe degradation of the image.
Figure 10.3.

 0   1   5   6  14  15  27  28
 2   4   7  13  16  26  29  42
 3   8  12  17  25  30  41  43
 9  11  18  24  31  40  44  53
10  19  23  32  39  45  52  54
20  22  33  38  46  51  55  60
21  34  37  47  50  56  59  61
35  36  48  49  57  58  62  63
[Figure: 8 × 8 grid of zigzag-ordered DCT coefficients in which low-frequency positions are assigned to the vectors I1, I2, I3, and I4]
Calculating the corresponding masking thresholds: Based on measurements of the human eye's sensitivity to different frequencies, an
image-independent 8 × 8 matrix of thresholds can be obtained,
denoted as T_f(u, v), u, v = 1, 2, ..., 8.
$$\bar{X}_{0,0} = \frac{1}{L_h L_v} \sum_{b=1}^{L_h L_v} X_{0,0,b} \qquad (10.6)$$
10:6
10:7
10:9
10:10
It can be proved that the mean value of Wj(i ) is zero. Then, the
watermark signal is embedded into the host image in the DCT domain in
parallel:
$$I'_j(i) = I_j(i) + W_j(i), \quad i = 1, \ldots, L_v L_h, \quad j = 1, \ldots, 4 \qquad (10.11)$$
$$f(x,y) = \sum_{u=0}^{7}\sum_{v=0}^{7} a(u)\,a(v)\,T(u,v) \cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16} \qquad (10.12)$$
10:13
Under hypothesis H0, the image does not contain the claimed watermark,
whereas it does under hypothesis H1. N(i) is the interference possibly
resulting from signal processing. The correlation detector outputs the
test statistic q_j:
$$q_j = \frac{\sum_{i=1}^{n} Y_j(i)}{V_y\sqrt{n}} = \frac{M_y\sqrt{n}}{V_y} \qquad (10.14)$$
$$Y_j(i) = X_j(i)\,W'_j(i) \qquad (10.15)$$
where n is L_h × L_v, the size of test vector X_j. M_y and V_y are the mean value
and variance of Y_j(i), respectively.
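The behaviour of the correlation detector can be illustrated with synthetic data. In the sketch below the host coefficients are Gaussian noise and the watermark takes the values ±2; both are assumptions made only for the example. The normalization uses the sample standard deviation of Y, which is also an assumption about how V_y is meant to be applied.

```python
import math
import random


def detector_output(test_vector, watermark):
    """Normalized correlation statistic q = mean(Y) * sqrt(n) / std(Y), Y = X * W'."""
    y = [x * w for x, w in zip(test_vector, watermark)]
    n = len(y)
    mean = sum(y) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    return mean * math.sqrt(n) / std


rng = random.Random(42)
n = 4096
host = [rng.gauss(0.0, 10.0) for _ in range(n)]          # stand-in for DCT coefficients
watermark = [rng.choice((-2.0, 2.0)) for _ in range(n)]  # illustrative spread sequence

marked = [h + w for h, w in zip(host, watermark)]
q_h1 = detector_output(marked, watermark)   # watermark present
q_h0 = detector_output(host, watermark)     # watermark absent
```

With the watermark present the statistic lies far from zero, whereas without it the statistic behaves like a sample from N(0,1); this separation is what the threshold T exploits.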
$$q_j = \frac{\sum_{i=1}^{n} \left[ W_j(i)\,W'_j(i) + N(i)\,W'_j(i) \right]}{V_y\sqrt{n}} \qquad (10.16)$$
Figure 10.8.
Joint Detection
For n-dimension random variables, if we have
$$F_{X_1 X_2 \cdots X_n}(x_1, x_2, \ldots, x_n) = P\{X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n\} \qquad (10.17)$$
10:19
10:20
If we want P_err1 to be less than 0.0001, the P_err1 for each q_j needs to be
less than 0.0001^{1/4} = 0.1, which means T_j = 1.28 for the Normal distribution
N(0,1). To ensure that the total error probability of Type 2 is not
larger than α (here 0.0001), the correct detection probability of each q_j must exceed
(1 − 0.0001)^{1/4} ≈ 0.999975; that is, m > T + 4.05 = 5.33. Therefore,
the requirement on m is decreased; in other words, the detection error
probability is lower than that of the single-dimension watermark when equal
watermark energy is embedded.
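The numbers in this paragraph can be reproduced with the inverse Normal distribution function. The sketch assumes four independent detector outputs that must all exceed the same threshold; the function name is illustrative.

```python
from statistics import NormalDist


def per_dimension_targets(total_error=1e-4, dimensions=4):
    """Return (threshold T, required per-dimension detection probability, required mean m)."""
    std_normal = NormalDist()
    # All outputs must exceed T for a false alarm, so each may err with
    # probability total_error ** (1 / dimensions).
    per_dim_false_alarm = total_error ** (1 / dimensions)
    threshold = std_normal.inv_cdf(1 - per_dim_false_alarm)
    # All outputs must be detected, so each needs this detection probability.
    per_dim_detection = (1 - total_error) ** (1 / dimensions)
    margin = std_normal.inv_cdf(per_dim_detection)
    return threshold, per_dim_detection, threshold + margin


threshold, detection_probability, required_mean = per_dimension_targets()
```

The call returns a threshold of about 1.28, a per-dimension detection probability of about 0.999975, and a required mean of about 5.33, in line with the values in the text.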
For the 512 × 512 Lena image into which we embedded the 4-D
watermark, the pdf of the detection output q1 under hypotheses H0 and
H1 is shown in Figure 10.9; the outputs obey the Normal distributions N(0,1)
and N(m,1). The mean values of q1 are 0.0015 and 11.9382, respectively.
The four detection outputs are similar; their corresponding detection
thresholds can be chosen to be the same T. The choice of T will affect the
hypothesis testing error probability. Table 10.1 shows the two types of
Type 2 Error
6.0996e-4
4.0549e-5
1.5471e-6
...
...
3.6204e-5
Figure 10.12.
Mean of q   11.9382   10.4648   9.6853    9.3468
SNR (dB)    42.0210   41.4656   40.9581   40.2361
            1 watermark   2 watermarks   3 watermarks
Mean of q   11.9448       10.9527        10.3564
SNR (dB)    42.0210       38.9410        37.1568
Scaling
Before watermark detection, the test image is rescaled to its original
size. Image scaling introduces noise into the image. As shown in Table 10.2,
the effect on detection is small.
Multiple Watermarking
Multiple watermarks can be contained in the same image simultaneously.
Because spread spectrum signals are used, the different watermarks are mutually
independent. We embedded 27,621 different sets of three watermarks in the 512 × 512
Lena image; Table 10.3 shows the mean detector output and SNR after multiple watermarks are introduced.
With three watermarks the SNR is still 37.1568 dB, so multiple watermarks can be detected correctly.
CONCLUSION
Image watermarking for copyright protection uses a private
watermark; its aim is high robustness, not large capacity. Multidimensional watermarking in the low to mid-band of DCT coefficients
improves the watermark robustness. It is robust to JPEG compression,
low-pass filtering, and so forth. Experiments show that this scheme has
high robustness.
REFERENCES
1. Voyatzis, G., Nikolaidis, N., and Pitas, I., Digital Watermarking: an Overview, Eus98,
1998.
11
Image Watermarking Method Resistant to Geometric Distortions*
Hyungshin Kim
INTRODUCTION
With advances in digital media technology and the proliferation of
Internet infrastructure, modifying and distributing multimedia data
has become a simple task. People can copy, modify, and reformat digital
media and transmit them over the high-speed wireless Internet with little
effort. Companies owning multimedia content want to secure their
property against illegal use. For this purpose, the digital watermark has
attracted attention from the market. A digital watermark is an invisible
message embedded into the multimedia content. Since its introduction
into the field of multimedia security and copyright protection, the response
from researchers has been overwhelming.
*Some material in this chapter is from "Rotation, scale, and translation invariant image
watermark using higher order spectra," Opt. Eng., 42(2), 2003, by H. S. Kim, Y. J. Baek, and
H. K. Lee, with the permission of SPIE.
0-8493-2773-3/05/$0.00+1.50
2005 by CRC Press
11:1
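$$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} \qquad (11.2)$$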
11:4
where I(u, v) is the Fourier transform of the image i(x,y). Hence, it is clear
that the magnitude of the Fourier transform is invariant to translation in
the spatial domain. This translation property of the Fourier transform is
very useful, as we can use the Fourier magnitude spectrum whenever
we need the translation invariance. Figure 11.3 shows the magnitude
spectrum of the original Lena image and the translated Lena image. It
verifies that the magnitudes of the Fourier transform of translated image
are unchanged after translation.
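The translation property can be checked numerically on a small array. The sketch below assumes a circular (wrap-around) shift, for which the invariance of the DFT magnitude is exact; for a real translated and cropped photograph it holds only approximately. The helper names `dft2` and `circular_shift` and the synthetic test image are hypothetical.

```python
import cmath

def dft2(img):
    """Naive 2-D DFT of a small image given as a list of rows."""
    rows, cols = len(img), len(img[0])
    out = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            acc = 0j
            for x in range(rows):
                for y in range(cols):
                    acc += img[x][y] * cmath.exp(
                        -2j * cmath.pi * (u * x / rows + v * y / cols))
            out_row.append(acc)
        out.append(out_row)
    return out

def circular_shift(img, tx, ty):
    """Translate the image by (tx, ty) with wrap-around."""
    rows, cols = len(img), len(img[0])
    return [[img[(x - tx) % rows][(y - ty) % cols] for y in range(cols)]
            for x in range(rows)]

image = [[(3 * x + 5 * y * y + x * y) % 17 for y in range(6)] for x in range(6)]
shifted = circular_shift(image, 2, 1)

mag = [[abs(c) for c in row] for row in dft2(image)]
mag_shifted = [[abs(c) for c in row] for row in dft2(shifted)]
max_diff = max(abs(a - b)
               for ra, rb in zip(mag, mag_shifted)
               for a, b in zip(ra, rb))
```

Here `max_diff` is at the level of floating-point rounding: the shift changes only the phase of each Fourier coefficient, not its magnitude.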
Scaling of an image with a scaling factor s is expressed as
$$i'(x, y) = i(sx, sy) \qquad (11.5)$$
Note that the scaling factors in the row and column directions are the
same, because we are dealing with rigid-body distortion. The
vector-space representation is given as
$$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} s & 0 \\ 0 & s \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} \qquad (11.6)$$
Figure 11.3. Magnitude Fourier transform of (a) original Lena image and
(b) translated Lena image.
$$x = e^{\rho}\cos\theta, \qquad y = e^{\rho}\sin\theta \qquad (11.7)$$
The sampling grid of the LPM is shown as circles in Figure 11.4. The
points are sampled at exponential grids along the radial direction.
Figure 11.4b shows the LPM result of the Lena image; its vertical and
horizontal directions correspond to the two log-polar axes (radial and angular).
Figure 11.4c and Figure 11.4d show the LPM of two scaled Lena images
with different scale ratios.
As expected, the LPM representation of the original Lena is shifted
along the radial direction. After simple math, the scale in the Cartesian
Figure 11.4. Log-polar mapping examples: (a) LPM sampling grid; (b) LPM of
Lena; (c) LPM of 50% enlarged Lena; (d) LPM of 50% reduced-size Lena.
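The effect of scaling and rotation on log-polar coordinates can be illustrated for a single point. This is a sketch only; the function name `to_log_polar` and the sample values are hypothetical, and the coordinates follow the usual convention of a log-radius and an angle.

```python
import math

def to_log_polar(x, y):
    """Return (rho, theta) with x = exp(rho) * cos(theta), y = exp(rho) * sin(theta)."""
    return math.log(math.hypot(x, y)), math.atan2(y, x)

x, y = 3.0, 4.0
rho, theta = to_log_polar(x, y)

# Scaling the Cartesian coordinates by s shifts rho by log(s) and leaves theta alone.
s = 1.5
rho_s, theta_s = to_log_polar(s * x, s * y)

# Rotating by an angle a shifts theta by a and leaves rho alone.
a = 0.5
xr = x * math.cos(a) - y * math.sin(a)
yr = x * math.sin(a) + y * math.cos(a)
rho_r, theta_r = to_log_polar(xr, yr)
```

Both rigid-body distortions therefore appear as pure translations in the log-polar plane, which is why the LPM images of the scaled Lena are shifted copies of one another along the radial axis.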
11:8
For the spectral aspects of the scaled image, we first look at the 1-D
signal model and then we simply extend our discussion into 2-D image.
The scale-change of an image can be viewed as the sampling rate
change in 1-D sequences. For scaling down (s < 1), decimation is the
related operation, and for scaling up (s > 1), reconstruction is the related
1-D operation. If x(t) is the 1-D signal and its Fourier transform is $X(e^{j\omega})$,
the decimation and reconstruction can be expressed in the spectrum
domain as
$$x(t) \leftrightarrow X(e^{j\omega}), \qquad x(st) \leftrightarrow sX(e^{j\omega/s}) \qquad (11.9)$$
becomes very small and, hence, if the resulting image shows aliasing,
exact recovery of the original signal is impossible. When we reconstruct a
1-D signal at a finer sampling grid, the spectrum is contracted as shown in
Figure 11.5c. The spectrum of the scaled 2-D image is the straightforward
extension of the 1-D case and it is expressed as
$$i(sx, sy) \leftrightarrow s^2 I\!\left(\frac{u}{s}, \frac{v}{s}\right) \qquad (11.10)$$
11:11
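in vector notation,
$$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} \qquad (11.12)$$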
11:13
11:14
Figure 11.7a shows Lena rotated by 30° and Figure 11.7b
shows its Fourier magnitude spectrum. Note the artifacts in the spectrum on
the bright cross. As explained in Reference 12, this phenomenon
results from the image boundaries, and any method using the rotation
Figure 11.6. Log-polar-mapped images of (a) original, (b) 92°, (c) 183°, and
(d) 275° rotated Lena.
11:15
11:16
Figure 11.8 shows the line integral procedure and an example of the
Radon transform of Lena. The projection slice theorem [14] states that
the 1-D Fourier transform of the projection of an image onto a line is the 2-D
Fourier transform of the image evaluated along a radial line. From the
theorem, we can use the 2-D Fourier transform instead of the Radon
transform during implementation.
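For the special case of a projection along the column direction (angle zero), the projection slice theorem can be verified exactly with a discrete transform. The sketch below is an illustration under that assumption; the helpers `dft1` and `dft2_at` and the synthetic image are hypothetical, and other angles would require interpolation that is not shown.

```python
import cmath

def dft1(seq):
    """Naive 1-D DFT."""
    n = len(seq)
    return [sum(seq[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]

def dft2_at(img, u, v):
    """One coefficient of the naive 2-D DFT of img."""
    rows, cols = len(img), len(img[0])
    return sum(img[x][y] * cmath.exp(-2j * cmath.pi * (u * x / rows + v * y / cols))
               for x in range(rows) for y in range(cols))

image = [[(7 * x + 3 * y + x * y) % 11 for y in range(5)] for x in range(8)]

projection = [sum(row) for row in image]                       # line integrals along y
slice_1d = dft1(projection)                                    # 1-D transform of the projection
slice_2d = [dft2_at(image, u, 0) for u in range(len(image))]   # 2-D transform along v = 0

max_error = max(abs(a - b) for a, b in zip(slice_1d, slice_2d))
```

The two slices agree to rounding error, which is the discrete counterpart of replacing the Radon transform by samples of the 2-D Fourier transform along a radial line.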
342
C (I , G )
E (N , O )
x 10
5
4.5
150
4
100
3.5
50
3
0
2.5
2
50
1.5
100
1
150
0.5
0
50
100
150
Figure 11.8. The Radon transform: (a) illustration of line integral; (b) Radon
transform of Lena.
p
Z
0:5
Bf1 , f1 df1
f1 0
Z
0:5
2
I f , I 2f , df
11:17
f1 0
11:18
Z
0:5
2
I f , I 2f , df
0
p
11:19
f
,
s
11:20
p0
Z
0:5
s3 I 2
Z
0:5=s
Z
f
2f
, I
, df
s
s
f
2f
, I
, df
I
s
s
2
0:5
I 2 f , I 2f , df
p
11:21
11:22
11:23
p
Hence, the vector p will be circularly shifted by the rotation angle.
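$$I(f_1, f_2) = \mathrm{DFT}\{i(x, y)\} \qquad (11.24)$$
The M × N polar map $I_p(f, \theta)$ is created from $I(f_1, f_2)$ along N evenly
spaced values of $\theta$ in 0°, ..., 180°, and it is given by
$$I_p(f, \theta) = I(f\cos\theta, f\sin\theta) \qquad (11.25)$$
where
$$f = \sqrt{f_1^2 + f_2^2} \qquad (11.26)$$
$$\theta = \arctan(f_2 / f_1) \qquad (11.27)$$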
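The polar resampling of Equations 11.25 to 11.27 can be sketched as follows. Several details are assumptions made for illustration: the spectrum is passed as a callable so that no interpolation scheme has to be chosen, the angles cover the half-open range from 0° to 180°, the radii are spaced evenly up to 0.5, and the names `polar_map` and `to_polar` are hypothetical.

```python
import math

def polar_map(spectrum, m_radii, n_angles, f_max=0.5):
    """Sample spectrum(f1, f2) on an M x N grid of (f, theta) values."""
    rows = []
    for k in range(m_radii):
        f = f_max * (k + 1) / m_radii               # radial frequency of this row
        row = []
        for j in range(n_angles):
            theta = math.pi * j / n_angles          # evenly spaced angles in [0, pi)
            row.append(spectrum(f * math.cos(theta), f * math.sin(theta)))
        rows.append(row)
    return rows

def to_polar(f1, f2):
    """Inverse relation: radial frequency and angle of a Cartesian frequency pair."""
    return math.hypot(f1, f2), math.atan2(f2, f1)

# A radially symmetric toy spectrum: every row of the polar map must be constant.
ip = polar_map(lambda f1, f2: math.hypot(f1, f2), m_radii=4, n_angles=6)
f, theta = to_polar(0.0, 0.25)
```

In a real implementation the callable would be replaced by an interpolated lookup into the DFT array; that choice affects accuracy and is deliberately left open here.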
Figure 11.9.
procedure.
Z
0:5
f1 0
Z
0:5
f1 0
Ip f1 Ip f1 Ip f1
pw
f1 df1
11:28
11:29
11:30
EXPERIMENTAL RESULTS
For valid watermark generation, r1 and r2 are determined empirically
using unmarked images. The similarity s is measured between unmarked
test images, and the smallest s is chosen for r2. To determine
r1, the robustness of the defined feature vector is tested. We used
Stirmark [2] to generate attacked images. Similarly, s is measured
between the original image and the attacked images, and the largest s is
chosen for r1. For our implementation, we set r1 = 4.5 and r2 = 20.
Feature vectors are modified with 5 7 at randomly selected
angles. The number of insertion angles is randomly determined between
1 and 3. A detection threshold of T = 4.5 is used.
Watermarks are generated using the iterative procedure described in
the section "Watermark System Design." During the iteration, parameters are
adjusted accordingly. Figure 11.10a shows the watermarked Lena image
and Figure 11.10b shows the amplified difference between the original and
watermarked images. The watermarked image has a PSNR of 36 dB and
the embedded signal is invisible. During watermark insertion, we
kept the PSNR of the watermarked images higher than 36 dB.
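For reference, the PSNR figure quoted here can be computed as below. The sketch assumes 8-bit images (peak value 255) stored as lists of rows; the function name `psnr` and the toy images are hypothetical.

```python
import math

def psnr(original, marked, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    count = len(original) * len(original[0])
    mse = sum((a - b) ** 2
              for row_a, row_b in zip(original, marked)
              for a, b in zip(row_a, row_b)) / count
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)

original = [[100 + 8 * x + y for y in range(4)] for x in range(4)]
# Toy "watermark": every pixel changed by +4 or -4, so the mean squared error is 16.
marked = [[p + (4 if (x + y) % 2 == 0 else -4) for y, p in enumerate(row)]
          for x, row in enumerate(original)]

value_db = psnr(original, marked)
```

A uniform change of four gray levels already corresponds to about 36.09 dB, which gives a feeling for how small the embedding distortion must stay to keep the PSNR above 36 dB.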
Figure 11.10.
watermark.
                                    Proposed Approach   Digimarc   Suresign
Scaling                             1.0                 0.72       0.95
Small-angle rotation and cropping   0.95                0.94       0.5
Random geometric distortion         0.93                0.33       0
JPEG compression                    1.0                 0.81       0.95
Distortion
Rotation
Scaling
Random geometric attack
Compression
Gaussian noise
False-Positive Probability
False-Negative Probability
3.36 × 10^-2
3.5 × 10^-6
7.89 × 10^-2
2.8 × 10^-3
6.64 × 10^-15
2.3 × 10^-3
2.21 × 10^-6
2.90 × 10^-3
2.2 × 10^-20
2.85 × 10^-3
DISCUSSIONS
Rotation Invariance
Because we use the rotation property of DFT for rotation invariance, we
need to employ methods that can compensate for the problems identified
in the literature [7,15]. For algorithms that use the Fourier magnitude
CONCLUSION
In this chapter, we have proposed a new RST-invariant watermarking
method based on an invariant feature of the image. We have overviewed
the properties of RST distortion in various aspects. Based on understanding of geometric distortions, we have designed a watermarking
system that is robust against geometric distortions.
A bispectrum feature vector is used as the watermark and this
watermark has a strong resilience to RST attacks. This approach shows
the potential in using a feature vector as a watermark. An iterative
embedding procedure is designed to overcome the problem of inverting
the watermarked image. This method can be generalized for other
embedding functions that do not have an exact inverse function.
In the experiments, we have shown the comparative Stirmark benchmark performance and the empirical probability density functions with
histograms and the ROC curves. Experimental results show that our
scheme is robust against the designed attacks. The use of the bispectrum
feature as an index for an efficient watermarked image database search
may offer new application possibilities. Various embedding techniques
and capacity issues for the generic feature-based watermark system
should be further researched.
Throughout this chapter, we have looked at geometric distortions
in 2-D images and their impact on watermarking systems. The rigid-body
distortions explained in this chapter are only a small fraction of the whole
class of geometric distortions. Nonrigid distortions such as shear,
projection, and general linear distortions are more difficult and are yet to
be solved. We believe that there will be no single method that can
survive all of the known distortions. Instead, watermark developers
should tailor each method to their application. Only in that
way will sufficiently robust solutions reach the commercial
market.
REFERENCES
1. Hartung, F. and Kutter, M., Multimedia watermarking techniques, Proc. IEEE, 87, 1079,
1999.
2. Petitcolas, F.A.P., Anderson, R.J., and Kuhn, M.G., Attacks on Copyright Marking
Systems, in Proceedings 2nd International Workshop on Information Hiding, 1998,
p. 218.
12
Fragile Watermarking for Image Authentication
Ebroul Izquierdo
INTRODUCTION
In an age of pervasive electronic connectivity, hackers, piracy and
fraud, authentication and repudiation of digital media is becoming
more important than ever. Authentication is the process of verification
of the genuineness of an object or entity in order to establish its full or
partial conformity with the original master object or entity, its origin, or
authorship. In this sense, the authenticity of photographs, paintings,
film material, and other artistic achievements of individuals has been
preserved, for many years, by recording and transmitting them using
analog carriers. Such preservation is based on the fact that the reproduction and processing of analog media is time-consuming, involves a heavy
workload, and leads to degradation of the original material. This means
that content produced and stored using analog devices has an in-built
protection against unintentional changes and malicious manipulations.
In fact, conscious changes in analog media are not only difficult, but they
can be easily perceived by a human inspector. As a consequence, the
authenticity of analog content is inherent to the original master picture.
In this electronic age, digital media has become pervasive, completely
substituting its analog counterpart. Because affordable image processing
tools and fast transmission mechanisms are available everywhere, visual
Figure 12.1.
Figure 12.3. Fragile watermarking for complete image verification: the image
is authenticated after each editing, transcoding, and transmission stage.
12:1
12:3
$$\hat{x} = \sum_{\sigma_i(A) \neq 0} \frac{u_{A_i}^{T} b}{\sigma_i(A)}\, v_i \qquad (12.4)$$
12:6
12:7
This parametric family of matrices $B_{\hat{\sigma}_r}$ determines the linear ill-posed operator used in the tamper detection procedure according
to Proposition 1 and the next two conditions.
Uniqueness: For a predefined large real number N, there exists a unique
value $\hat{\sigma}_r(A)$, so that the $L_2$ norm solution of the least squares
problem
$$\min_{x \in \mathbb{R}^p} \|Bx - b\|_2^2 \qquad (12.8)$$
Figure 12.4. Watermarking process.
$$\min_{\hat{\sigma}_r \in [H_0, H_1]} \left\{ \sum_{i=1}^{q} \left( \frac{u_{B_i}^{T} b}{\sigma_i(B_{\hat{\sigma}_r})} \right)^{2} - N^2 \right\} \qquad (12.9)$$
where $u_{B_i}$ is the ith column of the matrix formed with the right
singular vectors of B, $\sigma_i(B)$ are the singular values of B, b is the
right-hand-side vector given in Equation 12.8, and N is a large real
number.
4. Estimation of the watermarked block $\hat{A} = U \hat{S} V^T$ by setting
$$\hat{S} = \mathrm{diag}(\sigma_1(A), \ldots, \sigma_{r-1}(A), \hat{\sigma}_r(A))$$
The last equation shows how to choose the value $\hat{\sigma}_r(A)$ in Equation 12.5,
namely by setting it to the result of the minimization problem in Equation 12.9. Like K, the number N in Equation 12.9 is also secret.
Although it can be assumed that N depends on K, or vice versa, higher
security is achieved when N and K are chosen independently. Thus, the
security of the proposed approach resides in the secrecy of the set of keys
{K, N}. Obviously, the feasibility and effectiveness of this watermarking procedure is based on the existence and uniqueness of a value
$\hat{\sigma}_r(A) \in [H_0, H_1]$ obeying the imperceptibility, fragility, and uniqueness
conditions stated earlier. The corresponding analysis to prove the existence of such a value is given next.
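The role of the secret number N can be illustrated with a deliberately simplified model. The sketch below is not the chapter's algorithm: it assumes that the coefficients $u_{B_i}^T b$ and all but the smallest singular value of B are fixed and known, so that only the last singular value is free, whereas in the text B depends on the chosen value through equations not reproduced here. All names and numbers are hypothetical.

```python
import math

def solution_norm_sq(coeffs, sigmas):
    """Squared norm of the SVD least squares solution: sum of (u_i^T b / sigma_i)^2."""
    return sum((c / s) ** 2 for c, s in zip(coeffs, sigmas))

def pick_smallest_sigma(coeffs, leading_sigmas, target_norm, low, high):
    """Return sigma_r in [low, high] giving a solution norm of target_norm, else None."""
    rest = solution_norm_sq(coeffs[:-1], leading_sigmas)
    remaining = target_norm ** 2 - rest
    if remaining <= 0:
        return None
    sigma_r = abs(coeffs[-1]) / math.sqrt(remaining)
    return sigma_r if low <= sigma_r <= high else None

coeffs = [2.0, 1.0, 0.5]          # toy values of u_i^T b
leading = [4.0, 2.0]              # toy values of the larger singular values
N = 1000.0                        # secret target norm

sigma_r = pick_smallest_sigma(coeffs, leading, N, low=1e-6, high=1e-2)
norm_marked = math.sqrt(solution_norm_sq(coeffs, leading + [sigma_r]))

# A small change in the data (here 10% in one coefficient) moves the norm far from N.
tampered = [2.0, 1.0, 0.55]
norm_tampered = math.sqrt(solution_norm_sq(tampered, leading + [sigma_r]))
```

Because the selected singular value is tiny, the term it divides dominates the norm, so a 10% perturbation of one coefficient shifts the norm by about 100 in this toy setting. This is the magnification that the ill-conditioning argument relies on.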
The feasibility and effectiveness of the proposed watermarking and verification algorithms are based on two specific conditions: the ill-posedness
of the linear operator to be minimized in Equation 12.8 or Equation 12.10
and the existence of a unique $\hat{\sigma}_r(A) \in [H_0, H_1]$ minimizing Equation 12.9
for a fixed value N. If we show that B is highly ill-conditioned, the validity
of the first condition becomes evident. This can be done using the
following propositions.
Lemma 1: Let A and W be two square matrices with the same
dimension and let $\sigma_k(A)$ and $\sigma_k(W)$ be their kth singular values, respectively.
Then,
$$\sigma_{i+j-1}(AW) \le \sigma_i(A)\,\sigma_j(W) \quad \text{for all integers } i \text{ and } j$$
For the proof of this lemma, the reader is referred to Proposition 2.3.12
of Reference 22.
for t r
12:10
x 2 <p
12:13
Figure 12.5. Verification process.
According to Proposition 1, if the watermarked block has been manipulated, the difference between N and the norm recomputed from the received block becomes large. This behavior is due to the ill-conditioning of B. Because the smallest singular value of B is very close
to zero, any modification of $\hat{A}$ will be reflected in B. Thus, the norm of the
least squares solution of Equation 12.13 will be strongly magnified.
Consequently, the difference becomes huge and, with certainty, larger than the detection threshold.
A fundamental difference between this watermarking approach and
others from the literature is that the watermark W is embedded in A by
transforming W according to Equation 12.6 and using the result of this
operation to estimate $\hat{\sigma}_r(A)$. Consequently, the information contained in
the watermark W is first concentrated in $\hat{\sigma}_r(A)$ and then spread in A
according to Equation 12.5. A major advantage of this scheme is that
the distortion or change in the original image can be strictly controlled
using Equation 12.2. Because the distortion induced by the watermarking
Figure 12.6. (a) Original image Lena and (b) watermarked version using blocks
of dimension 8 × 8 and d = 0.2, and (c) result of the verification procedure after
the intensity values of four single pixels have been distorted by a factor of 0.1.
h 0.001
h 100
Tattoo Manipulation
0.4812e 000
2.0011e 000
5.1869e 000
0.1427e 000
1.8862e 000
0.2368e 000
2.9367e 000
0.1990e 000
5.0366e 005
2.3448e 004
3.0000e 004
1.4094e 003
2.9563e 004
2.4147e 003
2.9994e 004
2.0223e 003
3.0000e 004
3.0000e 004
7.5929e 005
3.0000e 004
3.0000e 004
7.9900e 006
3.0000e 004
3.0000e 004
Figure 12.8. Response of the verification algorithm for the tampered image
shown in Figure 12.1a.
Figure 12.9. The vector quantization attack: (a) original fake image,
(b) counterfeit created using blocks of size 16 × 16, (c) 8 × 8, and (d) 4 × 4.
Akw Z k
for k 1
k
Ak1
w Z
else
This simple strategy increases the difficulty of successfully undertaking a vector quantization attack. Basically, using the proposed scheme,
the only way to mount this attack is by replacing large image areas
containing several authenticated blocks. Even so, the blocks at the
border of the swapped area will be recognized as fake blocks.
Swapping Attack
Another related attack consists of swapping blocks of a watermarked
image. This attack will be recognized by the algorithm described previously. As shown in Figure 12.10, even by swapping pairs of consecutive
blocks that have been watermarked together, the verification procedure
detects the changes. In Figure 12.10, the dashed blocks labeled with an
X have been swapped. Because the first tampered block X3 was
authenticated together with V3 (and not with V1), the verification procedure will mark block X3 as tampered. In the first pass, block X4 will be
marked as genuine, because the pairs (X1, X2) and (X3, X4) are watermarked together. However, block V4 will appear as tampered because
it was authenticated together with X2 (not with X4). Consequently, block
X4 will be relabeled as tampered. Finally, block V5 will appear as
authentic, showing that V4 has to be authentic as well. Thus, at the end of
the verification process, only X3 and X4 will be labeled as tampered.
Likewise, tampering by swapping identically positioned blocks from
several authenticated images can be detected.
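The idea that each block is authenticated together with a neighbor, so that swapped blocks betray themselves at the borders of the swapped area, can be mimicked with a keyed hash chain. This is only an analogy under stated assumptions: an HMAC over the previous and current block stands in for the SVD-based watermark of this chapter, the tags are kept in a separate list instead of being embedded in the pixels, and no relabeling pass is modeled. All names are hypothetical.

```python
import hashlib
import hmac

def chain_tags(blocks, key):
    """Tag every block together with its predecessor."""
    tags = []
    prev = b""
    for block in blocks:
        tags.append(hmac.new(key, prev + block, hashlib.sha256).digest())
        prev = block
    return tags

def tampered_positions(blocks, tags, key):
    """Positions whose stored tag no longer matches the recomputed one."""
    expected = chain_tags(blocks, key)
    return [k for k, (stored, fresh) in enumerate(zip(tags, expected))
            if not hmac.compare_digest(stored, fresh)]

key = b"secret key K"
blocks = [bytes([k]) * 16 for k in range(6)]      # six distinct toy "image blocks"
tags = chain_tags(blocks, key)

# Swapping attack: blocks 2 and 3 trade places and carry their marks with them.
blocks[2], blocks[3] = blocks[3], blocks[2]
tags[2], tags[3] = tags[3], tags[2]

flagged = tampered_positions(blocks, tags, key)
```

In this simplified model the two swapped blocks are flagged, and so is the untouched block that follows them, because its tag was computed with a different predecessor. The chapter's two-pass procedure goes further and clears such border blocks again.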
Cropping
Cropping is probably the simplest form of image manipulation.
For instance, the tampered image in Figure 12.1a was created by using
cropped areas from similar images. Given a cropped image, it is
Figure 12.10.
detected.
12:15
REFERENCES
1. Diffie, W. and Hellman, M.E., New directions in cryptography, IEEE Trans. Inf. Theory,
22(6), 644-654, 1976.
2. Signal Processing, 66(3), 1998 (special issue on watermarking).
3. IEEE Transactions on Circuits and Systems of Video Technology, 13(8), 2003
(special issue on authentication, copyright protection and information hiding).
4. Feng, Y. and Izquierdo, E., Robust Local Watermarking on Salient Image Areas, in
Proceedings International Workshop on Digital Watermarking, Seoul, 2002.
5. Proceedings of the IEEE, 87(7), 1999 (special issue on identification and protection of
multimedia information).
6. Cox, I.J., Kilian, J., Leighton, T., and Shamoon, T., Secure spread spectrum
watermarking for images, audio and video, IEEE Trans. Image Process., 6(12),
1673-1686, 1997.
7. Wolfgang, R.B., Podilchuk, C., and Delp, E.J., Perceptual watermarks for digital
images and video, Proc. IEEE, 87(7), 1108-1126, 1999.
8. Ting-Hsu, C. and Ling-Wu, J., Hidden digital watermarks in images, IEEE Trans. Image
Process., 8(1), 58-68, 1999.
9. Barni, M., Bartolini, F., Cappellini, V., Lippi, A., and Piva, A., DWT-Based Technique
for Spatio-Frequency Masking of Digital Signatures, in Proceedings SPIE, Security
Watermarking Multimedia Contents, SPIE, 1999, pp. 31-39.
10. Kundur, D. and Hatzinakos, D., Towards a Telltale Watermarking Technique for
Tamper Proofing, in Proceedings ICIP, Chicago, 1998.
11. Lin, C.-Y. and Chang, S.-F., Semi-Fragile Watermarking for Authenticating JPEG
Visual Content, in Proceedings SPIE, Security and Watermarking of Multimedia
Contents, San Jose, CA, SPIE, 2000, pp. 140-151.
12. Wolfgang, R.B. and Delp, E.J., Fragile Watermarking Using the VW2D Watermark, in
Proceedings SPIE, Security and Watermarking of Multimedia Contents, San Jose, CA,
SPIE, 1999, pp. 204-213.
13. Fridrich, J., Security of Fragile Authentication Watermarks with Localization, in
Proceedings SPIE, SPIE, 2002.
14. Lin, C.-Y. and Chang, S.-F., A robust image authentication method distinguishing JPEG
compression from malicious manipulation, IEEE Trans. Circuits Syst. Video Technol.,
11(2), 153-168, 2001.
13
INTRODUCTION
In recent years, digital watermarking has become a very popular topic,
attracting more and more people from institutes and
companies to its research and application development.
A well-known reason for this is that the rapid growth of the Internet and
the widespread use of digital content create an urgent need for the
protection of intellectual property. Although many technical and legal issues
remain to be solved before digital watermarking
technology can be applied in the real world, more and more digital
watermarking products have entered the market.
Moreover, watermarking techniques are now expected
to be used for more and more kinds of media, such as printed
materials, screen images, cloth materials, and even the images painted on
$$X = S(1 + \alpha W) \qquad (13.1)$$
where S and X are the original signal and the watermarked signal,
respectively, which can be the pixel value in the spatial domain or the
element value in the frequency domain. W is the watermark information,
which can be represented as a rectangular array of numeric elements,
called the watermarking plane. $\alpha$ is a scaling factor to be adjusted
between 0 and 1 to provide a good trade-off between imperceptibility and
robustness.
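The embedding rule of Equation 13.1 can be sketched element by element as follows. The sketch assumes the multiplicative form with a scaling factor, a nonzero host signal, and a nonblind detector that has the original signal available; the function names and sample values are hypothetical.

```python
def embed(signal, watermark, alpha=0.05):
    """Watermarked signal X = S * (1 + alpha * W), element by element."""
    return [s * (1.0 + alpha * w) for s, w in zip(signal, watermark)]

def extract_nonblind(marked, signal, alpha=0.05):
    """Recover W from X when the original S is known (S must be nonzero)."""
    return [(x / s - 1.0) / alpha for x, s in zip(marked, signal)]

signal = [100.0, 120.0, 80.0, 60.0]     # toy host values (pixels or coefficients)
watermark = [1.0, -1.0, 1.0, -1.0]      # toy watermarking plane, flattened

marked = embed(signal, watermark)
recovered = extract_nonblind(marked, signal)
```

A larger scaling factor makes the recovered values easier to separate from noise but changes the host more, which is the trade-off between imperceptibility and robustness mentioned above.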
Equation 13.1 looks uncomplicated, and it may be partly this simplicity that allows many people
from different research fields to take part in watermarking
research without special disciplinary knowledge, as long as
they have some knowledge of digital signal processing. However, as
mentioned earlier, copyright protection by digital watermarking technology does not allow much optimism, because there are still many
crucial technical and political issues to be solved before practical
application, and none of the watermarking methods proposed so far
is robust to most of the common signal manipulations.
Applications of Digital Watermarking
Although digital watermarking technology was originally used to protect
the copyright of digital content, it has other important
applications in practice [1,6]. In general, digital watermarking technology
plays three major roles in practical applications:
1. To establish the content copyright by embedding copyright-related information, as well as some attached information such as the content ID and the standard time, into digital contents
2. To embed digital contents with copy control information (CCI) that indicates the status of the contents, such as "never copy," "one copy allowed," and "copy freely"
3. To embed digital contents with a unique user ID code that specifies the authorized users of the contents
In contrast to nonblind watermarking techniques, the blind watermarking techniques do not need the original signals for watermark
detection, which makes the method more feasible. Therefore, most of the
watermarking products are using blind watermarking techniques.
However, the cost of blind watermarking techniques is that the
robustness of a watermarking scheme will decrease.
Robust and Fragile Watermarking. The objective of robust watermarking
techniques is to embed a watermark into the host signal as robustly as
possible, so that it endures any possible signal manipulation and
unauthorized users cannot destroy or remove it. Therefore, robust watermarking techniques are mostly used for copyright
protection. In contrast, the objective of fragile watermarking
methods is to embed a watermark into the host signal in such a way
that it is destroyed immediately when the watermarked signal is
modified or tampered with. Therefore, fragile watermarking is often
used for tamper-proofing of images.
Reversible and Inseparable Watermarking. Currently, most of the
proposed watermarking approaches belong to inseparable watermarking
techniques: that is, the watermark is embedded in such a way that it will
be difficult for unauthorized users to separate or remove the watermark
from the watermarked signal, because only in this way can the watermark
be effectively used for copyright protection and authentication. However,
in some situations, the watermarked signals are expected to be
reversible, that is, the original signal can be retrieved from a watermarked signal. In practice, there are two kinds of application of reversible
watermarking. First, when the host signals are valuable data, such as a
medical image, military image, artistic work, and so forth, it is usually
required that the original signal can be retrieved completely from the
1. Printed images
2. Printed textual images
1. Before the wide use of digital office tools, such as Microsoft Word, paper documents were usually generated by a word processor. To restore them as digital content, important text documents are usually redigitized by scanning.
2. In general, text documents need to be printed on paper.
1. Geometric distortions
2. Signal distortions
Geometric distortions are hardly visible to the human eye, yet they are among the
most problematic attacks in digital watermarking, because a small
distortion, such as a rotation or a change of scale, causes little change in image
quality but dramatically reduces the robustness of the embedded
watermark. Therefore, geometric distortion is an
active topic in digital watermarking that attracts many researchers. Generally, the major geometric distortions occurring in printing
and scanning are rotation, scale, and translation (RST).
With the above arguments, Cox et al. [9] proposed a method using
spread spectrum watermarking techniques, in which the watermark,
represented by a pseudorandom sequence, is embedded into the
middle range in the DFT domain. Another well-known scheme
embeds the watermark according to a
masking criterion based on a model of the human visual system (HVS),
which exploits the limited dynamic range of the human eye to guarantee
that the watermark is embedded imperceptibly with the greatest robustness.
The details of watermarking using the HVS model can be found in
References 10-13.
The idea behind the methods of type 1 is to identify the distortion
and to measure its exact amount, so that an undistorted watermarked image
can be restored by inverting the distortion
before watermark detection is applied. This can be accomplished by
embedding an additional template along with the general watermark
[14-17]. The template carries information about the geometric transformations undergone by the image and is used to estimate the distortion
parameters needed for geometric correction. Once the scanned image
has been geometrically restored, the general watermark can be
extracted correctly from it.
However, the cost of these methods is a reduction in image fidelity,
because the watermark must be embedded together with the additional template
information.
Some methods have been proposed based on the idea of type 1, in
which the watermark was embedded into the mid-frequency range in the
DFT domain [14-16] or in the DWT domain [17] in the form of a spread
spectrum signal. The template consisted of a number of peaks randomly
arranged in the mid-frequency range in the DFT domain as well. However,
some researchers have complained that these templates may be easy to
remove.
Instead of using an additional template, Kutter [18] has proposed
a method based on the autocorrelation function (ACF) of a specially
designed watermark. In this method [18], the watermark is replicated in
the image so that the autocorrelation of the watermark can be used as a
reference point. Voloshynovskiy et al. [19] have proposed a method
based on the shift-invariant property of the Fourier transform. In this
method [19], the watermark is embedded in a periodic block allocation so
that the watermark pattern can be recovered after geometric distortion.
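Why a replicated watermark provides a geometric reference can be seen from its autocorrelation. The 1-D sketch below is an illustration only and does not reproduce the cited methods: it tiles a short bipolar pattern and locates the lags at which the circular autocorrelation reaches its maximum. The pattern and the function name are hypothetical.

```python
def circular_autocorrelation(seq):
    """Circular autocorrelation of a real sequence for every lag."""
    n = len(seq)
    return [sum(seq[i] * seq[(i + lag) % n] for i in range(n)) for lag in range(n)]

# A 16-sample bipolar tile whose two halves differ, so it has no shorter period.
tile = [1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
watermark = tile * 8                      # the tile replicated eight times

acf = circular_autocorrelation(watermark)
peaks = [lag for lag, value in enumerate(acf) if value == len(watermark)]
period = peaks[1] - peaks[0]
```

The maxima occur exactly at multiples of the tile length. After the image is scaled or rotated, the spacing and orientation of such peaks change accordingly, and measuring that change gives an estimate of the distortion to be inverted.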
The idea behind the methods of type 2 is that the watermark should be
embedded into a domain that is invariant to geometric distortion.
Theoretically, the Fourier-Mellin domain is invariant to
RST distortion. O'Ruanaidh and Pun [20] first proposed a watermarking scheme based on the Fourier-Mellin transform and showed that
the method can be used to produce watermarks that are resistant to
RST distortions. Figure 13.3 is a diagram of a prototype RST-invariant
watermarking scheme. In the proposed method, the watermark embedding is accomplished in the process as follows:
Figure 13.4. The improved Fourier-Mellin method, which avoids mapping the
original image into the RST-invariant domain: (a) the watermark embedding
process; (b) the watermark detecting process.
8 × 8 Threshold matrix
 0 32  8 40  2 34 10 42
48 16 56 24 50 18 58 26
12 44  4 36 14 46  6 38
60 28 52 20 62 30 54 22
 3 35 11 43  1 33  9 41
51 19 59 27 49 17 57 25
15 47  7 39 13 45  5 37
63 31 55 23 61 29 53 21
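The threshold values listed for the 8 × 8 matrix coincide with entries of the standard Bayer ordered-dither matrix, so a matrix of this kind can be generated recursively and applied as sketched below. This is an illustration under assumptions: the recursive construction and the normalization of the thresholds to the range from 0 to 1 are the common textbook choices, not details confirmed by the chapter, and the function names are hypothetical.

```python
def bayer_matrix(n):
    """Recursive Bayer construction of an n x n threshold matrix (n a power of two)."""
    m = [[0]]
    size = 1
    while size < n:
        top = [[4 * v for v in row] + [4 * v + 2 for v in row] for row in m]
        bottom = [[4 * v + 3 for v in row] + [4 * v + 1 for v in row] for row in m]
        m = top + bottom
        size *= 2
    return m

def ordered_dither(gray, thresholds):
    """Halftone an image with values in [0, 1] by tiling the threshold matrix."""
    n = len(thresholds)
    levels = n * n
    return [[1 if g > (thresholds[y % n][x % n] + 0.5) / levels else 0
             for x, g in enumerate(row)]
            for y, row in enumerate(gray)]

matrix = bayer_matrix(8)
second_column = [row[1] for row in matrix]

flat_gray = [[0.5] * 8 for _ in range(8)]       # uniform mid-gray patch
halftone = ordered_dither(flat_gray, matrix)
dots = sum(map(sum, halftone))
```

For a mid-gray patch exactly half of the 64 cells are switched on, and the order in which cells switch on as the gray level rises is what determines the dot pattern that a watermark can later modulate.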
13:3
1,
qx, y 0:5
0,
13:4
13:5
1.
2.
3.
Figure 13.6. Error diffusion: the error d(x, y) of the present pixel is propagated to future pixels as e(x + 1, y) = d(x, y) × 7/16, e(x, y + 1) = d(x, y) × 5/16, and e(x + 1, y + 1) = d(x, y) × 1/16; p(x, y − 1) is a past pixel.
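The error-diffusion weights 7/16, 5/16, and 1/16 are those of the classical Floyd-Steinberg scheme, which can be sketched as follows. Two points are assumptions rather than content recovered from the chapter: the fourth weight of 3/16 toward the lower-left neighbor is taken from the standard algorithm, and the quantizer thresholds at 0.5 on gray values between 0 and 1. The function name is hypothetical.

```python
def floyd_steinberg(gray):
    """Binary halftone of an image with values in [0, 1] by error diffusion."""
    height, width = len(gray), len(gray[0])
    work = [row[:] for row in gray]              # running values including diffused error
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            value = work[y][x]
            out[y][x] = 1 if value >= 0.5 else 0
            d = value - out[y][x]                # quantization error of the present pixel
            if x + 1 < width:
                work[y][x + 1] += d * 7 / 16     # right neighbor
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += d * 3 / 16   # lower left (assumed standard weight)
                work[y + 1][x] += d * 5 / 16           # below
                if x + 1 < width:
                    work[y + 1][x + 1] += d * 1 / 16   # lower right
    return out

gray = [[0.25] * 32 for _ in range(32)]          # uniform patch, 25% coverage
halftone = floyd_steinberg(gray)
coverage = sum(map(sum, halftone)) / (32 * 32)
```

Because the error is passed on rather than discarded, the fraction of printed dots stays close to the input gray level; only the error pushed over the image border is lost.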
Figure 13.8. The design of an optical lens for decoding. The incident light will
focus on shifting dots through the optical lens at the given angle. (From Li, J.Y.,
Chou, T.R., and Wang, H.C., presented at IPPR Conference on Computer Vision,
Graphics and Image Processing, Kinmen, Taiwan, 2003.)
detected with an optical lens that filters the printed image to exhibit the
hidden one.
Figure 13.8 shows the designed optical lens for decoding. The number
of lenticules per inch is decided by the screening resolution in the
halftoning process. For example, if the screening resolution is 150 dpi, the
watermarked image will probably be decoded with an optical
lens of 75 dpi or 150 dpi. The embedded watermark becomes
apparent when the refracted light is aimed at the shifted dots and the
lens is rotated in the appropriate direction.
Watermarking for Printed Textual Images
Roles of Watermarking for Printed Textual Images. The demand for
document security has been increasing in recent years, because with
the fast development of hardcopy techniques, photocopy infringement of copyright has always been an important issue for publishers.
In particular, with the spread of the Internet, an electronic document can
be sent to other persons by e-mail at far less cost than a
hardcopy made on a copy machine. Therefore, the copyright protection and
1. Line-shift coding
2. Word-shift coding
3. Feature coding
Figure 13.9. Window patterns; the top four patterns represent bit 1 and the
lower four patterns represent bit 0.
Figure 13.11.
Figure 13.12.
a coupon.
Figure 13.13.
embedded.
Using the telephone function
Using the bar code function
Using the distributor code function
Using the digital watermarking function
The last method is usually considered the best way of using a mobile
phone to make commodity applications, because with this method,
customers can apply for their chosen commodities by
pushing only one button on the mobile phone; moreover, it does not
require the sample images to be printed with an attached pattern. For
example, compare the two business cards shown in Figure 13.15 and
Figure 13.16, where the information about the Web site address was
inserted in the photo on the card of Figure 13.15 and in the 2-D code on the
card of Figure 13.16. Needless to say, the card printed with the
watermarked face photo looks better. This may be the main
reason why advertisers choose watermarking technology as the method
for applications using mobile phones.
Figure 13.16. An example of a business card on which the 2-D code contained
the information about the Web site address of the company.
Watermarking Techniques of Using Mobile Cameras
Major Problems for Watermarking Using Mobile Cameras. Similar to
the general process of watermarking printed materials, watermarking
using a mobile camera involves the processes of printing and image
capturing. In the printing process, the image is embedded with the
concerned information, such as the Web site address, and then the
watermarked image is printed on paper. In the image capturing process, a
customer can capture the printed image using a mobile camera and then
connect to a Web site with the information extracted from the captured
image. Therefore, the major problem of watermarking using a mobile
camera is the geometric distortion, such as rotation, scale, and
1. Local decoding
2. Center decoding
On June 17, 2003, Kyodo Printing Company [77] announced that it had
succeeded in developing a system for extracting watermark information
from a printed image using a mobile camera [78]. In its system, the
information about the Web site address is embedded in a design pattern
that is thinly spread over the image. The watermark extraction is accomplished by using center decoding. Figure 13.19 shows a demonstration from
Kyodo Printing Company: a sample zoo map in which the block
images indicating the animal locations are embedded with the design
pattern (Figure 13.19a), a signboard showing how to use a mobile camera
to get information about the local animals (Figure 13.19b), and an
enlarged block image in which the design pattern can be clearly observed
(Figure 13.19c).
On July 7, 2003, NTTGroup [79] announced that it had succeeded in
developing a system for extracting watermark information from a
printed image using a mobile camera [80]. In its system, the information
about the Web site address is embedded into a noise pattern, and
the watermarked pattern is then inserted as a background pattern.
The watermark extraction is accomplished by using center decoding.
Figure 13.19. A demo image from Kyodo Printing Company. (a) A sample of a
zoo map where the block images indicating the animal locations were embedded
with the design pattern; (b) a signboard showing how to use a mobile camera to
get information about local animals; (c) an enlarged block image in which the
design pattern can be clearly observed.
Figure 13.21. An example of a poster in which a block image with a white
frame was embedded with the information about a Web site address by using
M. Ken's watermarking technique.
Machine    Decode time (sec)
F505i      4.2
P505i      42.2
D505i      5.6
SH505i     4.1
SO505i     45.1
N505i      70.8
the design pattern or noise pattern and (2) the captured images have to
be transmitted to a computer center for watermark extraction, which
greatly reduces the effectiveness of using a mobile camera for
watermark extraction. Compared with the methods of Kyodo Printing and
NTTGroup, the advantages of M. Ken's method are that (1) the
watermark is embedded using a frequency-domain scheme, so it offers
good image quality as well as high robustness, and (2) the watermark can
be extracted within the mobile camera-phone itself, so the method is
both efficient and effective.
However, there are still critical problems for the practical application
of M. Ken's method. Table 13.3 lists the speed test results of the
experiment using M. Ken's method. In Table 13.3, the ratio of the
slowest decode time (70.8 sec) to the fastest (4.1 sec) is about 17. If decoding can be
accomplished within about 5 sec, watermark decoding on the mobile
phone may be an engaging way of using the mobile camera to
capture the image. However, the experience is much worse if the decoding time
exceeds 1 min. Therefore, unless the computation capability of a
mobile phone reaches about the same level as that of a general-purpose computer,
practical use of a mobile camera to extract watermarks remains
a long way off.
CONCLUSION
In this chapter, we have introduced new directions and challenges in the
research and application of watermarking technology for printed
materials, including watermarking techniques for extracting the watermark from a printed image using mobile camera-phones.
In the second section, we gave a brief overview of current digital
watermarking technology and discussed the corresponding issues. These
are currently very popular topics because they concern
copyright protection of digital content on the Internet, but they
are also controversial issues without any final conclusions.
In the third section, we introduced the watermarking technology
used for printed materials, which is an important and challenging topic in
the DRM system. Figure 13.22 to Figure 13.24 outline this.
Figure 13.22.
materials (1).
Figure 13.23.
materials (2).
Figure 13.24.
materials (3).
Watermarking Scheme
Method 1: Frequency-domain
watermarking
Representative Application:
[64][81][82]
Method 2: Spatial-domain
watermarking
Representative Application:
[79][80]
Representative Application:
[77][78]
Watermarking System
Representative Researches:
[64][81][82]
Representative Researches:
[79][80][81][82]
REFERENCES
1. Liu, Z., Huang, H.C., and Pan, J.S., Digital watermarking backgrounds, techniques,
and industrial applications, Commn. CCISA, 10(1), 78, 2004.
2. Tirkel, A.Z., Rankin, G.A., van Schyndel, R.M., Ho, W.J., Mee, N.R.A., and Osborne, C.F.,
Electronic water mark, Presented at International Symposium on Digital Image
Computing Techniques and Applications, Sydney, Australia, December 8–10, 1993,
p. 666.
3. Tirkel, A.Z. and Hall, T.E., A unique watermark for every image, IEEE Multimedia, 8(4),
30, 2001.
4. Petitcolas, F.A.P., Anderson, R.J., and Kuhn, M.G., Information hiding: a survey,
Proc. IEEE, 87(7), 1062, 1999.
5. Lim, Y., Xu, C., and Feng, D.D., Web-Based Image Authentication Using Invisible
Fragile Watermark, in Proceedings of the Pan-Sydney Area Workshop on Visual
Information Processing 2001, Sydney, 2001, p. 31.
6. Liu, Z. and Inoue, A., Watermark for industrial application, in Intelligent Watermarking
Techniques, Pan, J.S., Huang, H.C., and Jain, L.C., Eds., World Scientific, Company,
Singapore, 2004, chap. 22.
7. Lin, C.Y. and Chang, S.F., Distortion Modeling and Invariant Extraction for Digital
Image Print-and-Scan Process, presented at ISMIP 99, Taipei, 1999.
8. Lin, C.Y., Public Watermarking Surviving General Scaling and Cropping: An
Application for Print-and-Scan Process, presented at Multimedia and Security
Workshop at ACM Multimedia 99, Orlando, FL, 1999.
9. Cox, I.J., Kilian, J., Leighton, F.T., and Shamoon, T., Secure spread spectrum
watermarking for multimedia, IEEE Trans. Image Process., 6(12), 1673, 1997.
10. Swanson, M.D., Zhu, B., and Tewfik, A.H., Transparent Robust Image Watermarking,
in Proceedings of ICIP 96, IEEE International Conference on Image Processing,
Lausanne, 1996, 211.
11. Delaigle, J.F., Vleeschouwer, C.D., and Macq, B., Psychovisual approach to digital
picture watermarking, J. Electron. Imaging, 7(3), 628, 1998.
12. Delaigle, J.F., Vleeschouwer, C.D., and Macq, B., Watermarking algorithm based on a
human visual model, Signal Process.: Image Commn., 66(3), 319, 1998.
13. Wolfgang, R.B., Podilchuk, C.I., and Delp, D.J., Perceptual watermarks for digital
images and video, Proc. IEEE, 87(7), 1108, 1999.
14. Pereira, S. and Pun, T., Fast robust template matching for affine resistant image
watermarking, Lecture Notes in Computer Science, Vol. 1768, Dresden, 1999,
p. 200.
15. Csurka, G., Deguillaume, F., O'Ruanaidh, J.J.K., and Pun, T., A Bayesian approach to
affine transformation resistant image and video watermarking, in Lecture Notes in
Computer Science, Vol. 1768, Springer-Verlag, Berlin, 1999, p. 270.
16. Pereira, S., O'Ruanaidh, J.J.K., Deguillaume, F., Csurka, G., and Pun, T., Template
Based Recovery of Fourier-Based Watermarks Using Log-Polar and Log-Log Maps, in
Proceedings of IEEE Multimedia Systems 99, International Conference on Multimedia
Computing and Systems, Florence, 1999, Vol. 1, p. 870.
17. Kang, X., Huang, J., Shi, Y.Q., and Lin, Y., A DWT-DFT composite watermarking
scheme robust to both affine transform and JPEG compression, IEEE Trans. Circuits
Syst. Video Technol., 13(8), 776, 2003.
Figure 14.1.
Watermark generation.
Figure 14.3. The classification masks that correspond to each one of the five
block classes.
T 00 m, n if jXm, nj > T 0 m, n
0
otherwise
14:1
14:3
Figure 14.5. (a) Original frame from the video sequence Susie, (b) watermarked frame, and (c) amplified difference between the original and the
watermarked frames.
Mean PSNR
for I-Frames Only
38.6 dB
33.1 dB
45.6 dB
35.6 dB
36.5 dB
30 dB
40.4 dB
33.2 dB
flowers
Mobile and calendar
Susie
Table tennis
of all the frames of some commonly used video sequences. In addition, Table 14.1 presents the mean of the peak signal-to-noise ratio (PSNR)
values of the I-frames (watermarked frames) of each video sequence.
MODELING OF QUANTIZED DCT DOMAIN DATA
It is well known from the literature that the low- and mid-frequency
DCT coefficients carry most of the information of an image or video frame.
Thus, they are more finely quantized than the high-frequency coefficients,
which often vanish after the quantization process. The probability
density functions (pdfs) of these coefficients are similar to the Gaussian
pdf in that they remain bell-shaped, but their tails are considerably heavier [27,28].
This is why the low- and mid-frequency DCT coefficients
are often modeled by the heavy-tailed Laplacian, generalized Gaussian,
or Cauchy distributions. In the case of the quantized data examined here,
the DCT coefficients become more discrete-valued, depending, of
course, on the degree of quantization. Nevertheless, their heavy-tailed nature is not significantly affected, as we show through statistical
fitness tests.
A model often used in the literature [29] with heavier tails than the
Normal pdf is the Laplacian distribution
$$f_X(x) = \frac{b}{2}\exp(-b|x|) \qquad (14.4)$$
$$b = \sqrt{\frac{2}{\mathrm{var}(x)}}. \qquad (14.5)$$
2
log t
Q
14:6
14:7
where
u
p
1 16h2
,
2h
vQ
Q2
14:8
$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt \qquad (14.10)$$
2 expba
for a <
expba
otherwise
14:11
Figure 14.8.
14:12
14:13
1 X 2
jY j jY W j2
2
2
14:4
X WY >H1
1 X
W 2 2WY )
2
2
2 <H0
14:15
1 X W2
2
2
14:16
where
14:17
14:18
1 X 2 2
MQ X Q
4
14:20
In the case of the Laplacian likelihood ratio of Equation 14.17, the mean
and variance under H0 are similarly found [3] to be
p
X 2
1
14:21
m0
jXQ j jXQ MQ j jXQ MQ j
2
and
02
2
1X 2
M
j
jX
M
j
jX
Q
Q
Q
Q
4
2
14:22
It can be easily proven [3] that under H1, the mean is simply $m_1 = -m_0$
and the variance does not change (i.e., $\sigma_1^2 = \sigma_0^2$). With the mean and
variance of the Normally distributed likelihood ratio known, the false-alarm
and detection probabilities are respectively given by
$$P_{fa} = Q\!\left(\frac{\eta - m_0}{\sigma_0}\right), \qquad P_{det} = Q\!\left(\frac{\eta - m_1}{\sigma_1}\right) \qquad (14.23)$$
where $\eta$ is the threshold against which the data are compared and Q(x) is
defined in Equation 14.10. For a given $P_{fa}$, we can compute the required
threshold [31] for a watermark to be detected:
$$\eta = m_0 + \sigma_0\, Q^{-1}(P_{fa}) \qquad (14.24)$$
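The relation between the false-alarm probability, the threshold, and the detection probability can be checked numerically. The following is a minimal sketch, assuming that the likelihood ratio is Normally distributed with mean m0 and standard deviation sigma0 under H0 and with mean m1 = -m0 and the same deviation under H1, and that the threshold is set to m0 + sigma0 * Q^{-1}(Pfa). The function names and the numeric values are hypothetical and are not taken from the experiments.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def q_func(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard Normal Z."""
    return 1.0 - _STD_NORMAL.cdf(x)


def q_inv(p):
    """Inverse of Q: the value x for which Q(x) = p."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def threshold_for(p_fa, m0, sigma0):
    """Threshold that yields the requested false-alarm probability."""
    return m0 + sigma0 * q_inv(p_fa)


def detection_probability(eta, m1, sigma1):
    """Probability that the likelihood ratio exceeds eta under H1."""
    return q_func((eta - m1) / sigma1)


# Hypothetical statistics of the likelihood ratio (assumed values).
m0, sigma0 = -4.0, 2.0
m1, sigma1 = -m0, sigma0
p_fa = 1e-3

eta = threshold_for(p_fa, m0, sigma0)
p_det = detection_probability(eta, m1, sigma1)
print(f"threshold = {eta:.3f}, P_det = {p_det:.3f}")
```

Under these assumptions the detection probability depends only on the ratio m0/sigma0 once the false-alarm probability is fixed, which is why a single SNR figure is enough to compare detectors.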
14:25
Susie                                        10,657    10,623    66,339    76,679
Susie with blur                              6,050.5   6,119.7   57,973    55,195
Susie with more blur                         4,652.4   4,607.6   34,316    27,702
Susie with Gaussian blur                     7,313.4   7,290     71,277    66,593
Susie with median filter                     5,515     5,493     114,510   125,140
Subregion Hair of Susie (cropping attack)    75.4      78.1      420.5     443.5
Subregion Eye of Susie (cropping attack)     316.4     316.7     2,648.8   2,485.4
Subregion Lips of Susie (cropping attack)    121.3     120.6     910.1     960.5
Susie                                        10,215    10,210    76,457    75,435
Susie with blur                              8,092.4   8,079.2   7,165     8,131
Susie with more blur                         5,420     5,407     6,737     6,610
Susie with Gaussian blur                     9,062     9,077     9,593     9,668
Susie with median filter                     5,044     5,050     6,259     6,531
Subregion Hair of Susie (cropping attack)    14.3      15.3      66.9      67.8
Subregion Eye of Susie (cropping attack)     157.8     156.9     203.9     229.7
Subregion Lips of Susie (cropping attack)    68.9      69.1      116.5     118.9
very close to the experimental ones for both detectors, thus verifying
the results of the section "Performance Analysis of Statistical Detectors."
Because the experimental results validate the theoretical expressions, it
is possible to evaluate the performance of the two detectors theoretically, before actually conducting experiments. Consequently, the suitability of the proposed detection schemes can be predicted a priori.
Detector Performance Under Attacks
Filtering Attacks. The proposed quantized Laplacian detector is expected to outperform the conventional Gaussian correlator, as it is based on
a more accurate statistical model of the quantized transform domain
data. It is also expected to exhibit increased robustness against various
image and video processing attacks, either malicious or nonmalicious
ones. Experiments are conducted to examine the validity of this expectation in the presence of four common image processing attacks that are
applied independently: A blurring filter is applied twice to the host data,
degrading it more in the second case, a Gaussian blurring operation is
also applied, and, finally, the data are passed through a median filter.
Cropping Attacks. The performances of the two detection schemes are
examined under cropping, a very common and usually nonmalicious
geometric attack. In a cropping attack, a region of the host data may be
removed if it contains information of specific interest. In general, cropping does not necessarily degrade the visual quality of the data, but it
creates quite a few problems in watermark detection. When an image is
cropped, its origin is shifted and synchronization is lost. We consider that
it can be regained by inserting a suitable synchronization signal or by
exhaustive search of the image origin, as proposed in References 32 and
33. However, image cropping creates other problems, apart from loss of
synchronization. In particular, the detector must extract the watermark
Figure 14.9.
Figure 14.12. Monte Carlo runs for the likelihood ratio of the Gaussian
detector for subregion Lips of Susie.
Figure 14.13. Monte Carlo runs for the likelihood ratio of the Laplacian
detector for subregion Lips of Susie.
where $\sigma_w^2$ is the watermark variance or energy and $\sigma_x^2$ is the image
variance. For the original data, WDR ≈ -3 dB, as Table 14.4 shows, which
indicates that the watermark is relatively strong because watermarks are
usually characterized by lower WDRs [2]. It must be emphasized that the
relatively high strength of the watermark in the quantized domain does
not affect its invisibility, as shown in the section "Imperceptible Watermarking in the Compressed Domain" and Figure 14.5. This is due to
the embedding process, which takes into account the properties of the
HVS, the characteristics of each watermarked block, and the effect of
quantization on the data, thus allowing the embedding of quite high
watermark values that still remain invisible. In addition, it must be noted
that the WDR is not an objective measure of the watermark visibility and
is used only as an indicative estimate of the watermark strength. Finally,
Table 14.4 shows that the WDR does not change significantly after the
blurring attacks.
As discussed in the section "Performance Analysis of Statistical Detectors," the ROC curves and, consequently, the performance of the detectors depend solely on the SNR $= m_0^2/\sigma_0^2$, so this quantity is used to
compare the performance of the two detectors in various situations.
Table 14.4 depicts the values of the SNR for the two detectors as well as
TABLE 14.4. Signal-to-noise ratio SNR = m0^2/σ0^2 for Susie under various attacks
Data                        Gaussian SNR (dB)   Laplacian SNR (dB)   WDR (dB)
Susie                       32.335              41.350               -2.962
Susie with blur             28.004              39.610               3.762
Susie with more blur        27.990              36.390               4.594
Susie with Gaussian blur    28.753              39.325               3.905
Susie with median filter    24.242              36.100               5.160
TABLE 14.5. Signal-to-noise ratio SNR = m0^2/σ0^2 for Susie under cropping attacks
Data                        Gaussian SNR (dB)   Laplacian SNR (dB)   WDR (dB)
Susie                       32.335              41.350               2.962
Subregion Hair of Susie     6.507               3.468                4.990
Subregion Eyes of Susie     5.657               5.894                6.493
Subregion Lips of Susie     8.220               10.480               6.030
Figure 14.15.
Figure 14.16.
REFERENCES
1. Swanson, M.D., Zhu, B., and Tewfik, A.H., Transparent Robust Image Watermarking, in
Proceedings IEEE International Conference on Image Processing, Lausanne, 1996,
pp. 211–214.
2. Eggers, J.J. and Girod, B., Quantization effects on digital watermarks, Signal Process.,
81(2), 239–263, 2001.
15
Image Watermarking Robust to Both Geometric Distortion and JPEG Compression
Portions reprinted with permission from X. Kang, J. Huang, Y. Q. Shi and Y. Lin,
A DWT-DFT composite watermarking scheme robust to both affine transform
and JPEG compression, IEEE Trans. on Circuits and Systems for Video Technology,
vol. 13, no. 8, pp. 776–786, Aug. 2003.
INTRODUCTION
Digital watermarking has emerged as a potentially effective tool for
multimedia copyright protection, authentication, and tamper-proofing
[1]. Robustness of watermarking is one of the key issues for some
applications, such as intellectual property protection and covert communication. A serious problem constraining some practical exploitations
of watermarking technology is the insufficient robustness of existing
watermarking algorithms against geometrical distortions such as translation, rotation, scaling, cropping, change of aspect ratio, and shearing.
0-8493-2773-3/05/$0.00+1.50
2005 by CRC Press
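$$\begin{bmatrix} x' \\ y' \end{bmatrix} = B \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix},$$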
the matrix B can be determined by using a template as reference. The
template is embedded into the middle-frequency components of the
magnitude spectrum to avoid interfering with the informative watermark.
To determine the translation parameters, we embed a training sequence in
the DWT domain. To survive all kinds of attack, we use concatenated
coding of BCH and DSSS to encode the message $m = \{m_i;\ i = 1, \ldots, L,\ m_i \in \{0,1\}\}$ ($L = 60$ in our work). To cope with bursts of errors
possibly occurring in a watermark, a newly developed 2-D interleaving technique
[23,24] is exploited. The watermark embedding process is shown in
Figure 15.1.
The Message Encoding and Training Sequence Embedding
in the DWT Domain
The watermark embedding in the DWT domain is implemented through the
following procedures (Figure 15.1):
DWT decomposition: By using Daubechies 9/7 biorthogonal wavelet filters, we apply a four-level DWT to an input image f(x, y)
(512 × 512 × 8 bits, in our work), generating 12 high-frequency
subbands (LHi, HLi, HHi, i = 1, . . ., 4) and one low-frequency subband
(LL4).
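The subband count can be verified with a few lines of arithmetic. The sketch below only enumerates subband names and sizes; it assumes a critically sampled dyadic decomposition in which every level halves both image dimensions, and the function name is hypothetical. It does not perform any wavelet filtering.

```python
def dwt_subbands(size=512, levels=4):
    """List (name, side_length) for each subband of a dyadic 2-D DWT.

    Assumes a square image whose side is divisible by 2**levels and a
    critically sampled transform (each level halves both dimensions).
    """
    bands = []
    side = size
    for level in range(1, levels + 1):
        side //= 2  # LH, HL and HH at this level share the same size
        for name in ("LH", "HL", "HH"):
            bands.append((f"{name}{level}", side))
    bands.append((f"LL{levels}", side))  # the single low-frequency subband
    return bands


bands = dwt_subbands()
for name, side in bands:
    print(f"{name}: {side} x {side}")
```

For a 512 × 512 input this lists three high-frequency subbands at each of the four levels and a 32 × 32 LL4 subband.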
Figure 15.1. The watermark embedding process. (From X. Kang et al., IEEE
Trans. on Circuits and Systems for Video Technology, vol. 13, no. 8,
pp. 776–786, Aug. 2003. With permission.)
DSSS coding
!
1 i < Lc
Figure 15.2. Training data-set. (From X. Kang et al., IEEE Trans. on Circuits
and Systems for Video Technology, vol. 13, no. 8, pp. 776–786, Aug. 2003.
With permission.)
15:1
if xi 1
ft1 and ft2. The angles θi and radii rij, where i = 1, 2 and j = 1, . . ., 7, may be
chosen pseudorandomly as determined by a key. We require at least two
lines in order to resolve ambiguities arising from the symmetry of the
magnitude of the DFT, and we choose to use only two lines because
adding more lines dramatically increases the computational cost of
template detection. We found empirically that seven points per line
are enough to lower the false-positive probability to a satisfactory
level during detection. However, to achieve more robustness against
JPEG compression than the technique reported in Reference 19, a
lower-frequency band, say, ft1 = 200 and ft2 = 305, is used for embedding
the template. This corresponds to 0.2 and 0.3, respectively, in
normalized frequency, which is lower than the band of 0.35–0.37 used in
Reference 19. Because we do not embed the informative watermark in the
magnitude spectrum of the DFT domain, a larger strength of the template
points can be chosen than in Reference 19 to be more robust to JPEG
compression. Concretely, instead of the local average value plus two
times the standard deviation [19], we use the local average value of DFT
points plus five times the standard deviation. According to our experimental
results, this higher-strength and lower-frequency band has little effect on
the invisibility of the embedded watermark (refer to Figure 15.5 and
Figure 15.6).
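The key-dependent choice of template positions can be illustrated as follows. This is a minimal sketch under stated assumptions: the key seeds Python's standard pseudorandom generator, each of the two lines gets one random angle in the upper half-plane, and the seven radii per line are drawn uniformly between ft1 = 200 and ft2 = 305. The function name, the uniform distribution, and the use of real-valued rather than integer DFT coordinates are illustrative choices, not the authors' implementation.

```python
import math
import random


def template_points(key, f_t1=200.0, f_t2=305.0, lines=2, per_line=7):
    """Return (upper, lower) template positions in DFT coordinates.

    upper holds lines * per_line points on radial lines in the upper
    half-plane; lower holds their point reflections, which a real image
    needs because its DFT magnitude is symmetric about the origin.
    """
    rng = random.Random(key)  # the key fixes the pseudorandom choices
    upper = []
    for _ in range(lines):
        theta = rng.uniform(0.0, math.pi)  # one angle per line
        for _ in range(per_line):
            r = rng.uniform(f_t1, f_t2)    # radius inside the chosen band
            upper.append((r * math.cos(theta), r * math.sin(theta)))
    lower = [(-u, -v) for (u, v) in upper]
    return upper, lower


upper, lower = template_points(key=1234)
print(len(upper), "points in the upper half-plane,", len(lower), "mirrored")
```

A detector that knows the key can regenerate exactly the same positions, which is what makes the comparison between embedded and detected template points possible.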
Correspondingly, another set of 14 points are embedded in the lower
half-plane to fulfill the symmetry constraint.
By calculating the inverse FFT, we obtain the DWT–DFT composite watermarked image f''(x, y). The PSNR of f''(x, y) vs. the original
image is 42.5 dB, which is 0.2 dB lower than the PSNR
of f'(x, y) vs. the original image because of the template embedding.
The experimental results demonstrate that the embedded data are
perceptually invisible.
Figure 15.5. Original images: (a) Baboon, (b) Lena, and (c) Plane. (From
X. Kang et al., IEEE Trans. on Circuits and Systems for Video Technology,
vol. 13, no. 8, pp. 776–786, Aug. 2003. With permission.)
Figure 15.6. The watermarked images with PSNR > 42.5 dB. (From X. Kang
et al., IEEE Trans. on Circuits and Systems for Video Technology, vol. 13,
no. 8, pp. 776–786, Aug. 2003. With permission.)
Template Detection
We first detect the template embedded in the DFT domain. By comparing
the detected template with the originally embedded template, we can
determine the affine transformation possibly applied to the test image.
To avoid high computational complexity, we propose an effective method
to estimate the affine transformation matrix.
A linear transform applied in spatial domain results in a corresponding
linear transform in the DFT domain; that is, if a linear transform B is
applied to an image in the spatial domain,
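$$\begin{bmatrix} x \\ y \end{bmatrix} \rightarrow B \begin{bmatrix} x \\ y \end{bmatrix} \qquad (15.3)$$
then, correspondingly, the following transform takes place in the DFT
domain [19]:
$$\begin{bmatrix} u \\ v \end{bmatrix} \rightarrow \left(B^{-1}\right)^{T} \begin{bmatrix} u \\ v \end{bmatrix} \qquad (15.4)$$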
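The correspondence between the spatial and the frequency transform can be checked on a single plane wave. The sketch below assumes that a spatial point x is moved to Bx and a frequency vector (u, v) to the inverse transpose of B applied to (u, v); the phase u·x of the wave must then be unchanged. The matrix entries are borrowed from a StirMark general linear transform setting for illustration only, and the helper names are hypothetical.

```python
def inverse_transpose(B):
    """Return (B^-1)^T for a 2 x 2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = B
    det = a * d - b * c
    # B^-1 = 1/det * [[d, -b], [-c, a]]; transposing swaps the off-diagonals.
    return ((d / det, -c / det), (-b / det, a / det))


def apply(M, p):
    """Multiply the 2 x 2 matrix M by the column vector p."""
    return (M[0][0] * p[0] + M[0][1] * p[1],
            M[1][0] * p[0] + M[1][1] * p[1])


def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]


B = ((1.010, 0.013), (0.009, 1.011))   # illustrative linear transform
point = (12.0, -7.0)                   # arbitrary spatial position
freq = (0.2, 0.3)                      # arbitrary frequency vector

moved_point = apply(B, point)
moved_freq = apply(inverse_transpose(B), freq)

phase_before = dot(freq, point)
phase_after = dot(moved_freq, moved_point)
print(phase_before, phase_after)
```

Because the phase is preserved for every point, a spectral peak of the original image at (u, v) reappears at the transformed frequency, which is the property the template detection relies on.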
$$\begin{bmatrix} u'_{11} & v'_{11} \\ \vdots & \vdots \\ u'_{1l_1} & v'_{1l_1} \\ u'_{21} & v'_{21} \\ \vdots & \vdots \\ u'_{2l_2} & v'_{2l_2} \end{bmatrix}^{T} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} u_{11} & v_{11} \\ \vdots & \vdots \\ u_{1l_1} & v_{1l_1} \\ u_{21} & v_{21} \\ \vdots & \vdots \\ u_{2l_2} & v_{2l_2} \end{bmatrix}^{T} \qquad (15.5)$$
$$\begin{bmatrix} u_{11} & v_{11} & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ u_{1l_1} & v_{1l_1} & 0 & 0 \\ u_{21} & v_{21} & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ u_{2l_2} & v_{2l_2} & 0 & 0 \\ 0 & 0 & u_{11} & v_{11} \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & u_{1l_1} & v_{1l_1} \\ 0 & 0 & u_{21} & v_{21} \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & u_{2l_2} & v_{2l_2} \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} u'_{11} \\ \vdots \\ u'_{1l_1} \\ u'_{21} \\ \vdots \\ u'_{2l_2} \\ v'_{11} \\ \vdots \\ v'_{1l_1} \\ v'_{21} \\ \vdots \\ v'_{2l_2} \end{bmatrix} \qquad (15.6)$$
15:7
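or
$$\begin{bmatrix} M_1 & 0 \\ 0 & M_1 \end{bmatrix} \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} = \begin{bmatrix} N_1 \\ N_2 \end{bmatrix} \qquad (15.8)$$
$$M_1^{T} M_1 A_1 = M_1^{T} N_1 \qquad (15.9)$$
$$M_1^{T} M_1 A_2 = M_1^{T} N_2. \qquad (15.10)$$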
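A least-squares estimate of the four matrix entries from matched template points can be sketched as follows. The code assumes that each detected point (u', v') relates to its original (u, v) by u' = a·u + b·v and v' = c·u + d·v, and it solves the two 2 × 2 normal equations that result when the rows (u, v) are stacked into one matrix. The function name and the sample points are hypothetical; the sample points lie on two radial lines, as a template would.

```python
def estimate_linear_transform(original, detected):
    """Least-squares fit of (a, b, c, d) from matched point pairs.

    original: list of (u, v) template positions as embedded
    detected: list of (u', v') positions found in the test image
    """
    suu = sum(u * u for u, _ in original)
    suv = sum(u * v for u, v in original)
    svv = sum(v * v for _, v in original)
    det = suu * svv - suv * suv  # zero only if all points are collinear

    def solve(targets):
        # Right-hand side of the normal equations for one output coordinate.
        bu = sum(u * t for (u, _), t in zip(original, targets))
        bv = sum(v * t for (_, v), t in zip(original, targets))
        return ((svv * bu - suv * bv) / det, (suu * bv - suv * bu) / det)

    a, b = solve([p[0] for p in detected])
    c, d = solve([p[1] for p in detected])
    return a, b, c, d


# Hypothetical template points on two lines through the origin.
original = [(210.0, 30.0), (280.0, 40.0), (-50.0, 220.0), (-60.0, 264.0)]
true_abcd = (0.98, -0.17, 0.17, 0.98)  # assumed distortion to recover
detected = [(true_abcd[0] * u + true_abcd[1] * v,
             true_abcd[2] * u + true_abcd[3] * v) for u, v in original]

estimate = estimate_linear_transform(original, detected)
print(estimate)
```

With noise-free matches the estimate reproduces the assumed entries; with noisy detections the same equations give the best fit in the mean-square sense, which is why two lines with several points each are more reliable than the minimum of two points.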
nummatches
MSE
15:11
1
$$-\tfrac{1}{2}(512 - M) \le t_x < \tfrac{1}{2}(512 - M); \qquad -\tfrac{1}{2}(512 - N) \le t_y < \tfrac{1}{2}(512 - N) \qquad (15.12)$$
where $t_x$ and $t_y$ are the translation parameters in the spatial domain. This
method demands a heavy computational load when (512 − M) > 16 and
(512 − N) > 16. We dramatically reduce the required computational load
by performing the DWT for at most 256 cases according to the dyadic nature
of the DWT; that is, if an image is translated by $16x_{t1}$ rows and $16y_{t1}$
columns ($x_{t1}, y_{t1} \in \mathbb{Z}$), then the LL4 subband coefficients of the image are
translated by $x_{t1}$ rows and $y_{t1}$ columns accordingly. This property is
Figure 15.7. Resynchronization: (a) the to-be-checked image g(x, y), which is
512 × 512 and has experienced a rotation of 10°, scaling, translation, cropping, and
JPEG compression with a quality factor of 50; (b) the image g'(x, y), which is
504 × 504 and has been recovered from the linear transform applied; (c) the
image I(x, y), which has been padded with 0s to the size 512 × 512; (d) the
resynchronized image, which is 512 × 512 and has been padded with
the mean gray-scale value of the image g(x, y). The embedded message was
finally recovered without error. (From X. Kang et al., IEEE Trans. on Circuits
and Systems for Video Technology, vol. 13, no. 8, pp. 776–786, Aug. 2003.
With permission.)
15:13
where $x_{t1}$ and $y_{t1}$ are the translation parameters in the LL4 subband,
$T_1 = \mathrm{round}\!\left(\tfrac{1}{2}(512 - M)/16\right)$ and $T_2 = \mathrm{round}\!\left(\tfrac{1}{2}(512 - N)/16\right)$. Each time,
we extract the data sequence S in row 16 and column 16 of $LL'_{4t}(x, y)$.
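The dyadic property used to shrink the translation search can be demonstrated on a toy example. The sketch below swaps in the Haar wavelet for the 9/7 filters, because the four-level Haar low-pass subband is, up to a scale factor, simply the mean of each 16 × 16 block; it also assumes cyclic translation and uses a 64 × 64 random image to stay fast. All names are hypothetical. The point it illustrates is that shifting the image by multiples of 16 pixels shifts this LL4 approximation by the corresponding number of coefficients.

```python
import random


def ll4_haar(img):
    """Block means over 16 x 16 tiles (Haar LL4 up to a scale factor)."""
    n = len(img)
    return [[sum(img[r][c]
                 for r in range(br, br + 16)
                 for c in range(bc, bc + 16)) / 256.0
             for bc in range(0, n, 16)]
            for br in range(0, n, 16)]


def cyclic_shift(grid, rows, cols):
    """Translate a square grid down by rows and right by cols, with wraparound."""
    n = len(grid)
    return [[grid[(r - rows) % n][(c - cols) % n] for c in range(n)]
            for r in range(n)]


rng = random.Random(0)
image = [[rng.randrange(256) for _ in range(64)] for _ in range(64)]

x_t1, y_t1 = 1, 2  # translation measured in LL4 coefficients
shifted_image = cyclic_shift(image, 16 * x_t1, 16 * y_t1)

lhs = ll4_haar(shifted_image)                    # transform after shifting
rhs = cyclic_shift(ll4_haar(image), x_t1, y_t1)  # shift after transforming
print(lhs == rhs)
```

Only the residual shifts of 0 to 15 pixels in each direction, 16 × 16 = 256 combinations, require a fresh DWT; larger shifts can be searched directly in the LL4 subband.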
Figure 15.8. Some original images used in our test. (From X. Kang et al., IEEE
Trans. on Circuits and Systems for Video Technology, vol. 13, no. 8,
pp. 776–786, Aug. 2003. With permission.)
Table 15.1. Experimental results with StirMark 3.1. (From X. Kang et al., IEEE
Trans. on Circuits and Systems for Video Technology, vol. 13, no. 8,
pp. 776–786, Aug. 2003. With permission.)
StirMark functions            Lena   Baboon   Plane   Boat   Drop   Pepper   Lake   Bridge
JPEG 10–100                   1      1        1       1      1      1        1      1
Scaling                       1      1        1       1      1      1        1      1
Jitter                        1      1        1       1      1      1        1      1
Cropping_25                   1      1        1       1      1      1        1      1
Aspect ratio                  1      1        1       1      1      1        1      1
Rotation (autocrop, scale)    1      1        1       1      1      1        1      1
General linear transform      1      1        1       1      1      1        1      1
Shearing                      1      1        1       1      1      1        1      1
Gauss filtering               1      1        1       1      1      1        1      1
Sharpening                    1      1        1       1      1      1        1      1
FMLR                          1      1        1       1      1      1        1      1
2 × 2 median_filter           1      0        1       0      1      1        1      0
3 × 3 median_filter           1      0        1       0      1      0        0      0
4 × 4 median_filter           0      0        0       0      1      0        0      0
Random bending                0      0        0       0      0      0        0      0
Figure 15.9. The watermarked images with PSNR > 42.5 dB. (From X. Kang
et al., IEEE Trans. on Circuits and Systems for Video Technology, vol. 13,
no. 8, pp. 776–786, Aug. 2003. With permission.)
less than 4 sec, whereas the extraction takes about 238 sec on a 1.7-GHz Pentium
PC using the C language.
Figure 15.10 shows a marked Lena image that has undergone JPEG
compression with a quality factor of 50 (JPEG_50) in addition to a general
linear transform (a StirMark test function: linear_1.010_0.013_0.009_1.011;
Figure 15.10a) or a rotation of 30° (autocrop, autoscale; Figure 15.10b). In both
cases, the embedded message (60 information bits) can be recovered
with no error. This demonstrates that our watermarking method is
able to resist both affine transforms and JPEG compression. Table 15.1
shows more test results for our proposed algorithm obtained with StirMark
3.1. In Table 15.1, "1" represents that the embedded 60-bit message can
CONCLUSIONS
In this section, we propose a DWT–DFT composite watermarking scheme
that is robust to affine transforms and JPEG compression simultaneously.
The watermarking scheme embeds a template in the magnitude spectrum
of the DFT domain to resist affine transforms and uses a training
sequence embedded in the DWT domain to achieve synchronization
against translation. By using the dyadic property of the DWT, the number
of DWT computations is dramatically reduced, hence lowering
ACKNOWLEDGMENT
The work on this chapter was partially supported by the NSF of China
(60325208, 60133020, 60172067), the NSF of Guangdong (04205407), the Foundation of Education, Ministry of China, and the New Jersey Commission of Science
and Technology via NJWINS.
16
Reversible Watermarks Using a Difference Expansion
Adnan M. Alattar
INTRODUCTION
Watermarking valuable and sensitive images such as artworks and
military and medical images presents a major challenge to most
watermarking algorithms. First, such applications may require the
embedding of several kilobytes of data, but most robust watermarking
algorithms can embed only several hundred bits of data. Second, the
watermarking process usually introduces a slight but irreversible
degradation in the original image. This degradation may reduce the
aesthetic and monetary values of artwork, and it may cause the loss of
significant artifacts in military and medical images. These artifacts may
be crucial for an accurate diagnosis from the medical images or for an
accurate analysis of the military images. Just as importantly, the
degradation may introduce new, misleading artifacts.
The demands of the aforementioned applications can be met by
reversible watermarking techniques. Unlike their robust counterparts,
reversible watermarking techniques are fragile and employ an embedding
process that is completely reversible. Furthermore, some of these
techniques allow the embedding of about a hundred kilobytes of data
Figure 16.1.
Figure 16.2.
N 1, 1
N 1, 2
N 1, N 2
N 1, N 1
16:1a
u D1 v ,
16:1b
define a GRIT.
499
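$$D^{-1} = \begin{bmatrix} 1 & a_1/c \\ 1 & -a_0/c \end{bmatrix}$$
In this case, the transform of the dyad $u = (u_0, u_1)^T$ is the dyad
$v = (v_0, v_1)^T$, whose coefficients are given by
$$v_0 = \left\lfloor \frac{a_0 u_0 + a_1 u_1}{a_0 + a_1} \right\rfloor, \qquad v_1 = u_0 - u_1 \qquad (16.2a)$$
$$u_0 = \left\lceil v_0 + \frac{a_1 v_1}{a_0 + a_1} \right\rceil, \qquad u_1 = \left\lceil v_0 - \frac{a_0 v_1}{a_0 + a_1} \right\rceil \qquad (16.2b)$$
Because $v_0$ is an integer,
$$u_0 = v_0 + \left\lceil \frac{a_1 v_1}{a_0 + a_1} \right\rceil = v_0 + \left\lfloor \frac{a_1 v_1 + a_0 + a_1 - 1}{a_0 + a_1} \right\rfloor, \qquad u_1 = v_0 - \left\lfloor \frac{a_0 v_1}{a_0 + a_1} \right\rfloor \qquad (16.2c)$$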
The above equations indicate that the first coefficient of the transformed dyad is the integer representation of the weighted average of the
elements of the original dyad. The other coefficient is the difference
between the second and the first elements of the original vector.
16:2d
u0 v0
16:2e
which is identical to the transform pair used by Tian [15]. Similar dyadbased transforms can also be derived by changing the values of a0 and a1
and subtracting u0 from u1 instead of subtracting u1 from u0.
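The reversibility of the dyad-based transform can be verified exhaustively on a small range of integers. The sketch below assumes the forward transform takes the floor of the weighted average and the difference u0 − u1, and that the inverse uses the matching ceiling and floor corrections; with a0 = a1 = 1 this reduces to the integer average and difference pair. The function names are hypothetical, and Python's `//` operator is used because it rounds toward minus infinity, which is the floor needed for negative differences.

```python
def dyad_forward(u0, u1, a0=1, a1=1):
    """Map an integer pair to (weighted integer average, difference)."""
    c = a0 + a1
    v0 = (a0 * u0 + a1 * u1) // c  # floor of the weighted average
    v1 = u0 - u1
    return v0, v1


def dyad_inverse(v0, v1, a0=1, a1=1):
    """Recover the original integer pair from (v0, v1)."""
    c = a0 + a1
    u0 = v0 + (a1 * v1 + c - 1) // c  # v0 + ceil(a1 * v1 / c)
    u1 = v0 - (a0 * v1) // c          # v0 - floor(a0 * v1 / c)
    return u0, u1


v = dyad_forward(206, 201)
print(v, dyad_inverse(*v))

# Exhaustive round-trip check over a small range and a few weight choices.
all_reversible = all(
    dyad_inverse(*dyad_forward(u0, u1, a0, a1), a0, a1) == (u0, u1)
    for a0, a1 in [(1, 1), (1, 2), (3, 1)]
    for u0 in range(0, 32)
    for u1 in range(0, 32)
)
print(all_reversible)
```

The rounding in the forward direction loses no information because the discarded fraction is fully determined by the difference v1, which is stored exactly.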
Triplet-Based GRIT
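A triplet-based GRIT can be easily derived from the transform pair given
in the section "The GRIT" by setting N = 3 and selecting a proper 3 × 3 D
matrix. The transform used by Alattar [16] can be obtained by setting
$$D = \begin{bmatrix} a_0/c & a_1/c & a_2/c \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}$$
where $c = a_0 + a_1 + a_2$. The inverse matrix is
$$D^{-1} = \begin{bmatrix} 1 & -a_1/c & -a_2/c \\ 1 & (a_0 + a_2)/c & -a_2/c \\ 1 & -a_1/c & (a_0 + a_1)/c \end{bmatrix}$$
In this case, the transform of the triplet $u = (u_0, u_1, u_2)^T$ is the triplet
$v = (v_0, v_1, v_2)^T$, whose coefficients
are given by
$$v_0 = \left\lfloor \frac{a_0 u_0 + a_1 u_1 + a_2 u_2}{a_0 + a_1 + a_2} \right\rfloor, \qquad v_1 = u_1 - u_0, \qquad v_2 = u_2 - u_0 \qquad (16.3a)$$
$$\begin{aligned} u_0 &= \left\lceil v_0 - \frac{a_1 v_1}{a_0 + a_1 + a_2} - \frac{a_2 v_2}{a_0 + a_1 + a_2} \right\rceil \\ u_1 &= \left\lceil v_0 + \frac{(a_0 + a_2) v_1}{a_0 + a_1 + a_2} - \frac{a_2 v_2}{a_0 + a_1 + a_2} \right\rceil \\ u_2 &= \left\lceil v_0 - \frac{a_1 v_1}{a_0 + a_1 + a_2} + \frac{(a_0 + a_1) v_2}{a_0 + a_1 + a_2} \right\rceil \end{aligned} \qquad (16.3b)$$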
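$$\begin{aligned} u_0 &= v_0 - \left\lfloor \frac{a_1 v_1 + a_2 v_2}{a_0 + a_1 + a_2} \right\rfloor \\ u_1 &= v_0 + \left\lceil \frac{-a_2 v_2 + (a_0 + a_2) v_1}{a_0 + a_1 + a_2} \right\rceil \\ u_2 &= v_0 + \left\lceil \frac{-a_1 v_1 + (a_0 + a_1) v_2}{a_0 + a_1 + a_2} \right\rceil \end{aligned} \qquad (16.3c)$$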
16:3d
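$$v_0 = \left\lfloor \frac{u_0 + u_1 + u_2}{3} \right\rfloor, \qquad v_1 = u_1 - u_0, \qquad v_2 = u_2 - u_0 \qquad (16.3e)$$
$$u_0 = v_0 - \left\lfloor \frac{v_1 + v_2}{3} \right\rfloor, \qquad u_1 = v_1 + u_0, \qquad u_2 = v_2 + u_0 \qquad (16.3f)$$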
which is identical to the transform pair used by Alattar [16].
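The unit-weight triplet transform can be tested in the same way as the dyad case. The sketch below assumes the forward transform is the integer average of the three values together with the two differences from the first value, and that the inverse first recovers the first value and then adds the differences back. The function names are hypothetical.

```python
def triplet_forward(u0, u1, u2):
    """Integer average plus two differences taken against u0."""
    v0 = (u0 + u1 + u2) // 3
    return v0, u1 - u0, u2 - u0


def triplet_inverse(v0, v1, v2):
    """Recover (u0, u1, u2); u0 comes first, the others follow from it."""
    u0 = v0 - (v1 + v2) // 3  # // floors, also for negative sums
    return u0, v1 + u0, v2 + u0


v = triplet_forward(10, 13, 9)
print(v, triplet_inverse(*v))

# Exhaustive round-trip check over a small cube of integer triplets.
triplets_reversible = all(
    triplet_inverse(*triplet_forward(u0, u1, u2)) == (u0, u1, u2)
    for u0 in range(0, 16)
    for u1 in range(0, 16)
    for u2 in range(0, 16)
)
print(triplets_reversible)
```

Compared with a dyad, a triplet yields two differences per integer average, so two bits can be hidden for every three values instead of one bit for every two.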
16:3h
RU G
BV G
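$$c = \sum_{i=0}^{3} a_i$$
$$D^{-1} = \begin{bmatrix} 1 & -a_1/c & -a_2/c & -a_3/c \\ 1 & (a_0 + a_2 + a_3)/c & -a_2/c & -a_3/c \\ 1 & -a_1/c & (a_0 + a_1 + a_3)/c & -a_3/c \\ 1 & -a_1/c & -a_2/c & (a_0 + a_1 + a_2)/c \end{bmatrix}$$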
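$$v_0 = \left\lfloor \frac{a_0 u_0 + a_1 u_1 + a_2 u_2 + a_3 u_3}{a_0 + a_1 + a_2 + a_3} \right\rfloor, \quad v_1 = u_1 - u_0, \quad v_2 = u_2 - u_0, \quad v_3 = u_3 - u_0 \qquad (16.4a)$$
$$\begin{aligned} u_0 &= v_0 - \left\lfloor \frac{a_1 v_1 + a_2 v_2 + a_3 v_3}{a_0 + a_1 + a_2 + a_3} \right\rfloor \\ u_1 &= v_0 + \left\lceil \frac{(a_0 + a_2 + a_3) v_1 - a_2 v_2 - a_3 v_3}{a_0 + a_1 + a_2 + a_3} \right\rceil \\ u_2 &= v_0 + \left\lceil \frac{-a_1 v_1 + (a_0 + a_1 + a_3) v_2 - a_3 v_3}{a_0 + a_1 + a_2 + a_3} \right\rceil \\ u_3 &= v_0 + \left\lceil \frac{-a_1 v_1 - a_2 v_2 + (a_0 + a_1 + a_2) v_3}{a_0 + a_1 + a_2 + a_3} \right\rceil \end{aligned} \qquad (16.4b)$$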
16:4c
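$$v_0 = \left\lfloor \frac{u_0 + u_1 + u_2 + u_3}{4} \right\rfloor, \quad v_1 = u_1 - u_0, \quad v_2 = u_2 - u_0, \quad v_3 = u_3 - u_0 \qquad (16.4d)$$
$$u_0 = v_0 - \left\lfloor \frac{v_1 + v_2 + v_3}{4} \right\rfloor, \quad u_1 = v_1 + u_0, \quad u_2 = v_2 + u_0, \quad u_3 = v_3 + u_0 \qquad (16.4e)$$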
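$$c = \sum_{i=0}^{3} a_i,$$
$$D^{-1} = \begin{bmatrix} 1 & -(c - a_0)/c & -(a_2 + a_3)/c & -a_3/c \\ 1 & a_0/c & -(a_2 + a_3)/c & -a_3/c \\ 1 & a_0/c & (a_0 + a_1)/c & -a_3/c \\ 1 & a_0/c & (a_0 + a_1)/c & (c - a_3)/c \end{bmatrix}$$
$$v_0 = \left\lfloor \frac{a_0 u_0 + a_1 u_1 + a_2 u_2 + a_3 u_3}{a_0 + a_1 + a_2 + a_3} \right\rfloor, \quad v_1 = u_1 - u_0, \quad v_2 = u_2 - u_1, \quad v_3 = u_3 - u_2 \qquad (16.4f)$$
$$u_0 = v_0 - \left\lfloor \frac{(a_1 + a_2 + a_3) v_1 + (a_2 + a_3) v_2 + a_3 v_3}{a_0 + a_1 + a_2 + a_3} \right\rfloor, \quad u_1 = v_1 + u_0, \quad u_2 = v_2 + u_1, \quad u_3 = v_3 + u_2 \qquad (16.4g)$$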
16:5
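$$\hat{v}_1 = 2\left\lfloor \frac{v_1}{2} \right\rfloor + b_1, \quad \hat{v}_2 = 2\left\lfloor \frac{v_2}{2} \right\rfloor + b_2, \quad \ldots, \quad \hat{v}_{N-1} = 2\left\lfloor \frac{v_{N-1}}{2} \right\rfloor + b_{N-1} \qquad (16.6)$$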
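Replacing the least significant bit of a difference value can be sketched in a few lines. The code assumes that a bit b is embedded by clearing the LSB of the difference and adding b, that the bit is read back as the parity of the modified value, and that the original LSB has been saved separately so the difference can be restored. The function names are hypothetical.

```python
def embed_bit(v, b):
    """Replace the least significant bit of the integer v with b."""
    return 2 * (v // 2) + b  # // floors, so negative v is handled too


def extract_bit(v_hat):
    """Read the embedded bit back from the modified difference."""
    return v_hat % 2  # Python's % yields 0 or 1 for negative values as well


def restore(v_hat, saved_lsb):
    """Rebuild the original difference from its saved LSB."""
    return 2 * (v_hat // 2) + saved_lsb


v = -5
saved = v % 2            # original LSB, kept as side information
v_hat = embed_bit(v, 0)  # hide the bit 0
print(v_hat, extract_bit(v_hat), restore(v_hat, saved))
```

The modified value differs from the original by at most one, so the distortion stays small, but the saved LSBs consume part of the payload.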
The pixel values in the red component, I(i, j, 0), are arranged
into the set of N × 1 vectors $U_R = \{u_l^R,\ l = 1, \ldots, L\}$ using the
security key $K_R$.
The pixel values in the green component, I(i, j, 1), are arranged
into the set of N × 1 vectors $U_G = \{u_h^G,\ h = 1, \ldots, H\}$ using the
security key $K_G$.
The pixel values in the blue component, I(i, j, 2), are arranged
into the set of N × 1 vectors $U_B = \{u_p^B,\ p = 1, \ldots, P\}$ using the
security key $K_B$.
1. Form the set of vectors U from the image I(i, j, k) using the security
key K.
2. Calculate V using the forward GRIT f (see Equation 16.1a).
3. Use definitions 1 and 2 in the sections "Definition 1: Expandable" and
"Definition 2: Changeable," respectively, to divide U into the sets
S1, S2, and S3.
4. Form the location map, M; then, compress it using a lossless
compression algorithm, such as joint bi-level image experts group
(JBIG) or an arithmetic compression algorithm, to produce sub-bit-stream
B1. Append a unique end-of-stream (EOS) symbol
to B1 to identify the end of B1. The EOS is optional because the
decompression process during image restoration can be stopped
once M is completely restored.
5. Extract the LSBs of $v_1, v_2, \ldots, v_{N-1}$ of each vector in S2. Concatenate
these bits to form sub-bit-stream B2. One may choose to losslessly
It should be noted here that the size of bit stream B must be less than
or equal to N − 1 times the size of the set $S_4$. To meet this condition, the
values of the thresholds $T_1, T_2, \dots, T_{N-1}$ must be set properly. Also, note
that the algorithm is not limited to RGB images. Using the RGB space in
the previous discussion was merely for illustration purposes, and using
the algorithm with other types of color images is straightforward.
1. Form the set of vectors U from the image $I^w(i, j, k)$ using the security key K.
2. Calculate V using the forward GRIT, f (see Equation 16.1a).
3. Use definition 2 of section Definition 2: Changeable to divide the vectors in U into changeable and nonchangeable vectors. Let $\hat{S}_4$ contain the changeable vectors and $S_3$ contain the nonchangeable vectors. $\hat{S}_4$ has the same vectors as $S_4$, which was constructed during embedding, but the values of the entities in each vector may be different. Similarly, $S_3$ is the same set constructed during embedding because it contains nonchangeable vectors.
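$$v_1 = \left\lfloor \frac{\tilde{v}_1}{2} \right\rfloor, \qquad v_2 = \left\lfloor \frac{\tilde{v}_2}{2} \right\rfloor, \qquad \dots, \qquad v_{N-1} = \left\lfloor \frac{\tilde{v}_{N-1}}{2} \right\rfloor \tag{16.7}$$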
PAYLOAD SIZE
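To be able to embed data into the host image, the size of the bit stream
B must be less than or equal to N − 1 times the size of the set $S_4$. This
means that

$$\|B_1\| + \|B_2\| + \|B_3\| \le (N - 1)\left(\|S_1\| + \|S_2\|\right) \tag{16.8}$$

$$\|B_3\| = (N - 1)\,\|S_1\| - \|B_1\| \tag{16.9}$$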
For Tian's algorithm, the bit stream size is $\|B_3\| = \|S_1\| - \|B_1\|$, which
can be obtained from Equation 16.9 by setting N = 2.
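As a worked illustration of this relationship, the sketch below evaluates the payload as (N − 1) times the number of selected expandable vectors minus the size of the compressed location map. The function name and the sample numbers are assumptions chosen for illustration; they are not measurements from the chapter.

```python
def payload_bits(n: int, expandable_vectors: int, compressed_map_bits: int) -> int:
    """Payload size: (N - 1) * ||S1|| - ||B1||."""
    return (n - 1) * expandable_vectors - compressed_map_bits


if __name__ == "__main__":
    # Pairs (N = 2): 1000 expandable pairs and a 200-bit compressed map.
    print(payload_bits(2, 1000, 200))        # 800

    # Quads (N = 4) on one 512 x 512 component, idealized case where every
    # quad is expandable and the map compresses to (almost) nothing.
    quads = (512 * 512) // 4
    bits = payload_bits(4, quads, 0)
    print(bits, bits / (512 * 512))          # 196608 0.75
```

The idealized quad case gives (N − 1)/N = 0.75 bit per pixel per component, which shows why the upper bound approaches 1 bit per pixel only as N grows.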
16:10
Equation 16.10 indicates that the algorithm is effective when N and the
number of selected expandable vectors are reasonably large. In this case,
it does not matter if the binary map, M, is difficult to compress (because
its size is very small). However, when each vector is formed from N
consecutive pixels (rowwise or columnwise) in the image and when N is
large, the number of expandable vectors may decrease substantially;
consequently, the values of the thresholds $T_1, T_2, \dots, T_{N-1}$ must be
increased to maintain the same number of selected expandable vectors.
This increase causes a decrease in the quality of the embedded image.
Such a decrease can be ignored by many applications because the
embedding process is reversible and the original image can be obtained
at any time. In this case, the algorithm becomes more suitable for
low-signal-to-noise-ratio (SNR) embedding than for high-SNR embedding. To
maximize $\|B_1\|$ for high-SNR embedding, either N must be kept relatively
small or each vector must be formed from adjacent pixels in a 2-D
area in the image. The quad (N = 4) structure given in the next section
satisfies both requirements simultaneously.
The maximum payload size is achieved when N is extremely large, so that
(N − 1)/N approaches 1, and all vectors in the image are expandable. The
binary map, in this case, will be extremely compressible because
it contains no zeros. Substituting these limiting values in Equation
16.10, we find that the maximum possible payload size equals the area of
the image. Hence, the maximum capacity of this algorithm is 1 bit/pixel
per color component.
When =N 1, the payload size in Equation 16.10 becomes
negative. In this case, nothing can be embedded into the image.
wh
2 2
16:11
16:12
where 0 < < 1 is a constant that controls the speed of convergence. T(0)
is a preset value that reflects the relative weights between the entities of
the vector used in the difference expansion transform.
Figure 16.5.
depicted in Figure 16.5 for quad vectors. Figure 16.5 suggests four different
quad structures, each of which can be used in a different iteration, for a
total of four iterations. The vectors $u^0$, $u^1$, $u^2$, and $u^3$ are different permutations of the same vector u. For $u^0$, the GRIT is performed based on $u_0$, so
the closer $u_0$ is to $u_1$, $u_2$, and $u_3$, the smaller the difference is and, hence, the
smaller the embedding error is. Similarly, for $u^1$, $u^2$, and $u^3$, the GRIT will be
based on the $u_1$, $u_2$, and $u_3$ components, respectively. This use of permutations
lets the algorithm exploit the correlation within a quad completely.
Cross-Color Embedding
To hide even more data, the algorithm can be applied across color
components after it is applied independently to each color component. In
this case, the vector u contains the color components (R, G, B) of each
pixel arranged in a predefined order. The GRIT for the cross-color
arrangement is given in Equation 16.3i and Equation 16.3j.
Although the spirit of the payload size analysis of section Payload
Size applies to the cross-color vectors, the results must be slightly
modified to reflect the fact that the number of vectors, in this case, equals
the area of the location map, which equals the area of the original image.
Hence,

$$\|B_3\| = 2\,\|S_1\| - \|B_1\|$$
kB3 k 2 w h
16:13
EXPERIMENTAL RESULTS
Tian [15] implemented a special case of the algorithm we detailed
in section Algorithm for Reversible Watermark for the dyads vector
when $a_0 = a_1 = 1$. However, we implemented the general form of the
algorithm when $a_0 = a_1 = \dots = a_{N-1} = 1$ and tested it with spatial
triplets, spatial quads, cross-color triplets, and cross-color quads. In all
cases, we used a random binary sequence derived from a uniformly
distributed noise as a watermark signal. We tested the algorithm with the
512 × 512 RGB images: Lena, Baboon, and Fruits. In all of the experiments,
we set $T_1 = T_2 = \dots = T_{N-1} = C$ and adjusted the value of C to produce
the desired peak SNR (PSNR). We used a payload that consists of pure
text obtained from a typical text document.
Spatial Triplets
A spatial triplet is a 1 × 3 or 3 × 1 vector formed from three consecutive
pixel values in the same color component rowwise or columnwise,
respectively. We applied the algorithm recursively to each color
component: first to the columns and then to the rows. The payload size
embedded into each of the test images (all color components) is plotted
against the PSNRs of the resulting watermarked image in Figure 16.6. The
plot indicates that the achievable embedding capacity depends on the
nature of the image itself. Some images can bear more bits with lower
distortion in the sense of PSNR than others. Images with many low-frequency contents and high correlation, like Lena and Fruits, produce
Figure 16.6. Embedded payload size vs. PSNR for colored images embedded
using spatial triplet-based algorithm.
Figure 16.9. Embedded payload size vs. PSNR for colored images embedded
using the spatial quad-based algorithm.
embed 482 kbits (1.84 bits/pixel) at 24.73 dB and 87 kbits (0.33 bits/pixel)
at 36.6 dB.
The visual quality of the watermarked image is shown in Figure 16.10
and Figure 16.11 for Lena and Baboon embedded at low, medium, and
high SNRs. In general, the quality of the embedded images is better than
that obtained by the algorithm using spatial triplets. Also, the sharpening
effect is less noticeable.
Figure 16.12 combines Figure 16.9 and Figure 16.6. Figure 16.12 reveals
that the spatial quad-based and the spatial triplet-based algorithms seem
to have different operation ranges with some overlap. At higher PSNRs,
the spatial triplet-based algorithm was unable to generate many results,
but it can be observed from the tendency of the curves that the spatial
quad-based algorithm seems to have superior performance compared to
the spatial triplet-based algorithm. This result is because 2 × 2 spatial
quads have a higher correlation than 1 × 3 spatial triplets and because
the single location map used by the spatial quad-based algorithm is
smaller than each of the two location maps used by the spatial triplet-based algorithm (one location map for each pass).
On the other hand, although the spatial quad-based algorithm was
unable to generate many results at lower PSNRs, it can be observed from
the tendency of the curves that the spatial triplet-based algorithm seems
to have superior performance. This behavior is attributed to the fact
that the spatial triplet-based algorithm is applied to the image twice
(rowwise and columnwise), whereas the quad-based algorithm is applied
only once.
Figure 16.11. Baboon embedded using the spatial quad-based algorithm: (a)
original, (b) 24.73 dB embedded with 481,624 bits (1.84 bits/pixel), (c) 30.19 dB
embedded with 258,053 bits (0.98 bits/pixel), (d) 40.00 dB embedded with
39,829 bits (0.15 bits/pixel).
Figure 16.13. Comparison between the performance of the spatial triplet-based algorithm and the spatial quad-based algorithm applied to the image
twice.
lower than those using spatial vectors. Hence, for a given PSNR level, it is
better to use spatial vectors than cross-color vectors.
Also, Figure 16.14 and Figure 16.15 clearly show that the cross-color
algorithm with equal weighting has almost the same performance as the
cross-color algorithm with different weightings with all test images
except Lena at PSNR greater than 30. Although the equal-weighting
Figure 16.14. Embedded payload size vs. PSNR for colored images embedded
using cross-spectral with equal-weighting GRITs.
Figure 16.15. Embedded payload size vs. PSNR for colored images embedded
using cross-spectral with different-weighting GRITs.
algorithm was able to embed small payloads at these higher PSNRs, the
different-weighting algorithm was not.
Upon closer inspection of the Lena image, we noted that the blue
channel of Lena is very close to the green channel. Also, upon further inspection of the equal-weighting and different-weighting GRIT transforms,
we noted that when the red or blue channel is close in value to the green
channel, the dynamic range of G after expansion according to Equation
16.15 becomes wider for the different-weighting transform than for the
equal-weighting transform. Hence, in this case, the equal-weighting
GRIT algorithm has the potential of producing more expandable vectors
and a location map of less entropy than the different-weighting GRIT
algorithm. Indeed, this was the case with the Lena image, as can be seen
in Figure 16.16 and Figure 16.17.
Comparison with Other Algorithms in the Literature
We also compared the performance of the proposed algorithm with that
of Tian's described in Reference 15 using gray-scale Lena and Barbara
images. Recall that Tian's algorithm uses spatial pairs rather than
spatial triplets and spatial quads. The results are plotted in Figure 16.18
for the spatial triplet-based algorithm and in Figure 16.19 for the spatial
quad-based algorithm. As expected, Figure 16.18 indicates that our spatial
triplet-based algorithm outperforms Tian's at low PSNRs, but Tian's
algorithm outperforms ours at high PSNRs. In contrast, Figure 16.19
indicates that our spatial quad-based algorithm outperforms Tian's at
PSNRs higher than 35 dB, but Tian's algorithm marginally outperforms
Figure 16.16. Payload size vs. PSNR for Lena colored image using cross-spectral with equal-weighting and different-weighting GRIT transforms.
Figure 16.17. Size of compressed map vs. PSNR for Lena colored image using
cross-spectral with equal-weighting and different-weighting GRIT transforms.
We also compared our proposed algorithm with that of Celik [12] using
gray-scale Lena and Barbara images. The results are plotted in
Figure 16.20 for the spatial triplet-based algorithm and in Figure 16.21
for the spatial quad-based algorithm. Figure 16.20 indicates that our
spatial triplet-based algorithm also outperforms Celik's at low PSNRs, but
our algorithm has similar performance to Celik's at high PSNRs. In
contrast, Figure 16.21 indicates that our quad-based algorithm is superior
to Celik's at almost all PSNRs.
SUMMARY
In this chapter, we described a family of reversible watermarking
algorithms that has very high capacity and causes low distortion in
the image. This family is based on the expansion of the difference
coefficients of a GRIT of vectors of an arbitrary size. Test results of the
Part IV
Multimedia
Data Hiding,
Fingerprinting,
and
Authentication
17
Lossless
Data Hiding
Fundamentals, Algorithms,
and Applications
Yun Q. Shi, Guorong Xuan, and Wei Su
INTRODUCTION
Data hiding has recently been proposed as a promising technique
for the purpose of information assurance, authentication, fingerprint,
security, data mining, copyright protection, and so forth. By data hiding,
pieces of information represented by some data are embedded in a cover
medium in such a way that the resultant medium, often referred to as
marked medium or stego medium, is perceived with no difference from
the original cover medium. Many data hiding algorithms have been
proposed in the past several years. As will be shown, in most cases,
the cover medium will experience some permanent distortion due to
data hiding and cannot be inverted back to the original medium. In
the analogy to the classification of data compression algorithms, the
vast majority of data hiding algorithms can be referred to as lossy data
hiding.
The fact that most of the current data hiding algorithms reported in
the literature are not lossless can be shown as follows. For instance, with
the most popularly utilized spread spectrum watermarking techniques,
0-8493-2773-3/05/$0.00+1.50
2005 by CRC Press
Figure 17.1.
artifacts.
Figure 17.2. Data embedding block diagram of the IWT-based lossless data
hiding algorithm (IIWT = inverse IWT).
Finally, the gray scale of the peak point is either replaced by that of its
immediately neighboring point or kept intact to embed binary 1 and 0,
respectively. This algorithm has a quite large embedding capacity, ranging from 0.019 to 0.31 bpp, while keeping a very high visual quality for
all images (the PSNR of marked images vs. original images is guaranteed
to be higher than 48 dB). This is indeed a peculiar advantage of this
method. In addition, this algorithm is simple and can be implemented
rather quickly.
Celik et al. presented a high-capacity, low-distortion reversible data
hiding technique [14]. In the embedding phase, the host signal is
quantized and the residual is obtained. The algorithm adopts the
lossless image compression algorithm, with the quantized values as
side information, to efficiently compress the quantization residuals to
create a high capacity for the payload data. The compressed residual and
payload data are concatenated and embedded into the host signal via the
generalized LSB modification method. The experimental results show
that the PSNR and capacity are satisfactory.
Table 17.1 (surviving entries). IWT-based method payload at 35 dB and 44 dB
for three test images: 0.50 bpp and 0.15 bpp; 0.45 bpp and 0.12 bpp;
0.70 bpp and 0.30 bpp.
Tian recently presented a new high-capacity, reversible data embedding algorithm in Reference 15. In the algorithm, two techniques are
employed (difference expansion and generalized LSB embedding) to
achieve a very high embedding capacity while keeping the distortion low.
The main idea of this technique is described next. For a pair of pixel
values x and y, the algorithm first computes the integer average l and
difference h of x and y, where h = x − y. Then, h is shifted to the left-hand
side by 1 bit and the to-be-embedded bit b is appended into the LSB. This
is equivalent to h′ = 2h + b, where h′ denotes the expanded difference,
which explains the term difference expansion. Finally, the new x and y
are calculated based on the new difference value h′ and the original
integer average value l. In this way, the marked image is obtained. To
avoid overflow and underflow, the algorithm only embeds data into the
pixel pairs that will not lead to overflow and underflow. Therefore, a two-dimensional binary bookkeeping image is losslessly compressed and
embedded as overhead.
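A minimal sketch of difference expansion on a single pixel pair follows. It assumes 8-bit pixels and the usual integer average and difference pair, with x and y rebuilt as l + floor((h + 1)/2) and l − floor(h/2); the function names are illustrative, and returning `None` stands in for the bookkeeping of pairs that cannot be used.

```python
from typing import Optional, Tuple


def de_embed(x: int, y: int, b: int) -> Optional[Tuple[int, int]]:
    """Embed one bit b into the pair (x, y); None if it would overflow."""
    l = (x + y) // 2          # integer average
    h = x - y                 # difference
    h2 = 2 * h + b            # expanded difference h' = 2h + b
    x2 = l + (h2 + 1) // 2
    y2 = l - h2 // 2
    if not (0 <= x2 <= 255 and 0 <= y2 <= 255):
        return None           # pair skipped; would be flagged in the map
    return x2, y2


def de_extract(x2: int, y2: int) -> Tuple[int, int, int]:
    """Recover the original pair and the embedded bit."""
    l = (x2 + y2) // 2        # the average is preserved by embedding
    h2 = x2 - y2
    b = h2 - 2 * (h2 // 2)    # LSB of the expanded difference
    h = h2 // 2
    return l + (h + 1) // 2, l - h // 2, b


if __name__ == "__main__":
    print(de_embed(206, 201, 1))   # (209, 198)
    print(de_extract(209, 198))    # (206, 201, 1)
```

The pair (206, 201) has l = 203 and h = 5; embedding b = 1 gives h′ = 11 and the marked pair (209, 198), from which the original pair and the bit are recovered exactly.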
It has been reported in Reference 15 that the embedding capacity
achieved by the difference expansion method is higher than that
achieved in both References 10 and 14. Our experimental works demonstrate that both the difference expansion method [15] and the IWT-based
method [12] have achieved similar high embedding capacity. Some
experimental results are listed in Table 17.1.
In summary, several lossless data hiding algorithms having large
embedding capacity have been presented and discussed in this
sub-section. The difference expansion method [15] and the IWT-based
method [12] may have achieved the highest embedding capacity in this
category, whereas the histogram-modification-based method [13] may
have kept the highest guaranteed visual quality of marked images in
terms of PSNR among all of the existing methods.
Figure 17.3. Embedding a binary 1: (a) mass center vector of zone A, (b) mass
center vector of zone B, (c) counterclockwise rotated mass center vector of
zone A, (d) clockwise rotated mass center vector of zone B. (From Y. Q. Shi et al.,
Proceedings of IEEE International Symposium on Circuits and Systems, Vol. II,
pp. 33–36, Vancouver, Canada, May 2004. With permission.)
Figure 17.4. Medical picture 1: (a) original and (b) marked with severe
salt-and-pepper noise.
Figure 17.5. Medical picture 3: (a) original and (b) marked with some
salt-and-pepper noise (the letters on the four sides have become blurred).
Figure 17.6. Woman image (N1A): (a) original and (b) marked with severe color
distortion due to severe salt-and-pepper noise (the color of half of her hair has
become dark red and that of most of palm of her right hand has become green).
Figure 17.7. Cafe image (N2A): (a) original and (b) marked with severe color
distortion due to severe salt-and-pepper noise (the color of the table surfaces
has turned to yellow).
has become green. Figure 17.7 provides another such example, where
the salt-and-pepper noise has turned the white coffee tables yellow.
Second, the marked images do not have a high enough PSNR. Table 17.2
contains test results for the eight medical images. The PSNR of marked
images is as low as 26 dB (as 476 information bits are embedded in an image
of 512 × 512 × 8). Note that the salt-and-pepper noise exists in each of
Table 17.2.
Image     PSNR (dB)   Robustness (bpp)   Salt-and-pepper Noise
Mpic 1    9.28        1.0                Severe
Mpic 2    4.73        2.0                Severe
Mpic 3    26.38       0.8                Severe
Mpic 4    26.49       0.6                Severe
Mpic 5    26.49       0.6                Severe
Mpic 6    5.60        1.6                Severe
Mpic 7    9.64        0.8                Severe
Mpic 8    5.93        2.8                Severe

Table 17.3.
Image     PSNR (dB)   Robustness (bpp)   Salt-and-pepper Noise
N1A       17.73       0.8                Severe
N2A       17.73       2.2                Severe
N3A       23.73       0.6                Severe
N4A       19.67       1.2                Severe
N5A       17.28       1.2                Severe
N6A       23.99       0.6                Severe
N7A       20.66       1.4                Severe
N8A       14.32       1.4                Severe
18
Attacking
Multimedia
Protection Systems
via Blind Pattern
Matching
Darko Kirovski and Jean-Luc Dugelay
THE TARGET
Significantly increased levels of multimedia piracy over the last decade
have put the movie and music industry under pressure to deploy a
standardized antipiracy technology for multimedia content. Initiatives,
such as the Secure Digital Music Initiative (SDMI) [1] and the Digital
Versatile Disk (DVD) Copy Control Association (CCA) [2], have been
established to develop open technology specifications that protect the
playing, storing, and distributing of digital music and video. Although
the demand for deploying and standardizing a digital rights management (DRM) technology is strong from both media studios and
information technology companies, technically, a DRM system that can
provide a cryptographic level of multimedia protection has yet to be
developed.
Content Screening
In a typical content screening scenario, a copyright owner protects her
distribution rights simply by hiding a unique and secret watermark in
her multimedia clips. Whereas both the original multimedia content and
the key used to generate the secret must be safely guarded by
the owner, the marked copy can be distributed using a public
communication channel such as the Internet. In general, the marked
content can be distributed in plaintext over the communication channel.
The client's media player searches the distributed content for hidden
information without the presence of the original clip. We refer to such
watermark detection as "blind." If the secret mark is detected, the
player must verify, prior to playback, whether it has a license to play the
content. Only when the license is authentic and valid may the media player
play the protected clip. By default, unmarked content is considered
unprotected and is played without any barriers. Hence, a content
screening system consists of two subsystems: a watermark detector and
a DRM agent that handles license management via standard cryptographic tools. An example of a DRM agent is Microsoft's Media Player
9 DRM system [3]. A content screening scenario is illustrated in
Figure 18.1.
A content screening system that relies on watermarks must fulfill
several requirements. First, watermarks must be imperceptible to the
human auditory system (HAS). For example, in the case of music, marked
audio clips should be perceptually indistinguishable from the original
audio signals by people with extraordinary auditory senses, commonly
called "golden ears." An additional requirement that relates to system
security is that the embedded secret is robust to attacks; that is, it cannot
be removed from the multimedia clip without the knowledge of the secret
Fingerprinting
In a typical scenario that uses multimedia marking for forensic purposes,
illustrated in Figure 18.2, studios create a uniquely marked content copy
for each individual user request. User-specific distinct watermarks are
Figure 18.2.
18:1
Attack Trade-offs
Considering the issues related to the BPM attack and presented in
section Rationale Behind the Blind Pattern-Matching Attack, we identify
several important trade-off decisions that the adversary needs to make
before applying the attack. The trade-offs, TO.1 to TO.4, reflect on the
following important performance metrics: reduction in correlation,
distortion, and speed.
TO.1.
TO.2.
TO.3.
TO.4.
ATTACK STEPS
Step I: Signal Partitioning
For improved perceptual quality of the resulting multimedia clip, the protected signal $z = \hat{x} + \hat{w}$ is partitioned into a set of blocks $\{p_1, \dots, p_P\}$,
where each block $p_i = \{h_j \cdot z_{(i-1)N/2 + j},\ j = 1, \dots, N\}$ overlaps its neighbors and is windowed with an analysis windowing function $h \in \mathbb{R}^N$ that
yields perfect reconstruction with its synthesis counterpart. With no loss
of generality, we assume that $\hat{x} + \hat{w}$ is a one-dimensional signal.
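The partitioning step can be sketched as follows. The example assumes a sine window used for both analysis and synthesis with a hop of N/2, which is one common choice that gives perfect reconstruction under overlap-add; the chapter does not state which window the authors used, and the function names are illustrative.

```python
import math
from typing import List


def sine_window(n: int) -> List[float]:
    """Window h_j whose squares sum to 1 across 50%-overlapped blocks."""
    return [math.sin(math.pi * (j + 0.5) / n) for j in range(n)]


def partition(z: List[float], n: int) -> List[List[float]]:
    """Split z into analysis-windowed blocks of length n with hop n/2."""
    h, hop = sine_window(n), n // 2
    return [[h[j] * z[s + j] for j in range(n)]
            for s in range(0, len(z) - n + 1, hop)]


def overlap_add(blocks: List[List[float]], n: int, length: int) -> List[float]:
    """Apply the synthesis window and overlap-add the blocks."""
    h, hop = sine_window(n), n // 2
    out = [0.0] * length
    for i, block in enumerate(blocks):
        for j in range(n):
            out[i * hop + j] += h[j] * block[j]
    return out


if __name__ == "__main__":
    z = [float(i % 7) for i in range(16)]
    blocks = partition(z, 8)
    y = overlap_add(blocks, 8, len(z))
    # Samples covered by two blocks (indices 4..11 here) are reconstructed.
    print(max(abs(y[i] - z[i]) for i in range(4, 12)) < 1e-9)
```

Only the first and last half-blocks are not covered by two windows, so they are attenuated; a real implementation would pad the signal at both ends.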
Step II: Search for the Substitution Base
Finding perceptually similar blocks of certain music or video content is a
challenging and computationally expensive task. For a given set of transforms $T = \{\tau_1, \dots, \tau_{|T|}\}$ and for each point $p_i$ in the multimedia clip, we
want to find a set $B_i$ of K best-matched blocks in $\{z, \tau_1 z, \dots, \tau_{|T|} z\}$
denoted as $B_i = \{b_1, \dots, b_K\}$, with individual points denoted as $b_j =
h \cdot \{z_{s_j}, \dots, z_{s_j + N - 1}\}$ or $b_j = h \cdot \tau_k \{z_{s_j}, \dots, z_{s_j + N - 1}\}$ if a transform is used,
where $s_j$ indexes the location of $b_j$ in z or its transform $\tau_k z$. Before
we define the search process, we adopt a normalized and squared
Euclidean distance between two N-dimensional points a and b as a similarity metric:

$$\phi(a, b) = \frac{\sum_{k=1}^{N} (a_k - b_k)^2}{\sigma_a \sigma_b N} \tag{18.2}$$

where $\sigma_a$ and $\sigma_b$ are standard deviations of vectors a and b, respectively.
Because maximized normalized correlation corresponds to a minimal
Euclidean distance in $L_2$, the search for top K matches in z against each
$p_i$ can be sped up as follows. We first compute the normalized block
convolution of the complex conjugate of $p_i$ with respect to z. This can be
done rather fast using the fast Fourier transform (FFT) and the overlap-add fast convolution method [25]. The same method can be used to
efficiently compute the standard deviation of every point $p_j$ in z. The
complexity of this step is $O(M \log_2 N)$ assuming N is a power of 2. The
top K correlated blocks in z and its transforms that do not overlap $p_i$
constitute the substitution base $B_i$ for $p_i$.
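The similarity metric and the search for the K best non-overlapping matches can be sketched as below. This is a brute-force illustration that skips windowing and transforms and does not use the FFT-based speedup described above; the function names and the test signal are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def std(v: Sequence[float]) -> float:
    """Population standard deviation."""
    m = sum(v) / len(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance normalized by sigma_a * sigma_b * N."""
    d = sum((x - y) ** 2 for x, y in zip(a, b))
    return d / (std(a) * std(b) * len(a))


def top_k_matches(z: Sequence[float], start: int, n: int,
                  k: int) -> List[Tuple[float, int]]:
    """K most similar length-n blocks in z that do not overlap the target."""
    p = z[start:start + n]
    candidates = []
    for s in range(len(z) - n + 1):
        if abs(s - start) < n:
            continue              # overlaps the target block
        b = z[s:s + n]
        if std(b) == 0.0:
            continue              # metric undefined for constant blocks
        candidates.append((similarity(p, b), s))
    return sorted(candidates)[:k]


if __name__ == "__main__":
    # Periodic test signal: blocks one period apart are near-identical.
    z = [math.sin(2 * math.pi * i / 16) for i in range(64)]
    for score, pos in top_k_matches(z, 16, 16, 2):
        print(pos % 16 == 0, score < 1e-9)
```

A value of zero means the two blocks are equal, so smaller scores indicate better substitution candidates.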
Step III: Computing the Replacement
This step of the algorithm is crucial, as it resolves the trade-offs related
to the selection of and the inherent d the two most important
metrics of the BPM attack. First, we review one restriction of the attack.
18:3
Bounds and " define the safe and perceptually valid distance,
respectively, that a sample of the replacement block must have. In the
remainder of this subsection, we review several algorithms for computing
a such that the above constraints are satisfied.
Algorithm A0 computes the replacement block a as follows. We assume
that it is given a set B of K points such that for each point bi 2 B,
the following relation holds: jbi pj ". The replacement block a
is computed from the selected blocks in B such that its similarity
with respect to p is maximized. More formally, we construct a
matrix s 2 fRgKN , where each row of this matrix represents one block
from B. We aim to compute a vector such that ks pk is minimized. The least square solution to this set of linear equations, commonly
called pseudoinverse of s, equals sT s1 sT p. A temporary replacement block a0 is now computed as a0 s. Per sample a0j , three cases can
occur:
1.
2.
3.
18:4
Fourier transform (DFT) filter bank, used in conjunction with analysis and
synthesis windows that provide perfect reconstruction. We consider
MCLT analysis blocks with 2048 transform coefficients and a 0.5
overlap. Each block of coefficients is normalized and psychoacoustically
masked using an off-the-shelf masking model [29]. Similarity is explored
exclusively in the audible part of the frequency subband where watermarks are hidden. In this case, we bound this subband within 200 Hz and
7 kHz [20]. Figure 18.3 illustrates the signal processing primitives used to
prepare blocks of audio content for substitution.
Watermark length is assumed to be greater than 1 sec. In addition, we
assume that watermark chips may be replicated along the time axis for at
most one second [20]. Thus, for a given block, we do not search for
potential substitution blocks within 1 sec of it.
In the experiments presented in this section, we considered the
following five audio clips:
clip 1: Ace of Base, Ultimate Dance Party 1999, Cruel Summer (Blazin
Rhythm Remix)
clip 2: Steely Dan, Gaucho, Babylon Sisters
clip 3: Pink Floyd, The Wall, Comfortably Numb
clip 4: Dave Matthews Band, Crash, Crash Into Me
clip 5: Unidentified classical piece, produced by SONY Ent., selected
because of exceptional perceptual randomness (available upon
request from the authors)
Figure 18.4. Music self-similarity: a similarity diagram for five different 2048-long MCLT blocks taken from clip 1 with a substitution database of 240 MCLT
blocks taken from the same clip. Zero-similarity denotes equality. The abscissa x
denotes the index of a particular MCLT block. The ordinate denotes the
similarity $\phi(x, b_i)$ of the corresponding block x with respect to the selected five
MCLT blocks with indices $b_i$, $i \in \{122, 127, 132, 137, 142\}$. Here, similarity is
computed in the MCLT domain.
Figure 18.6. Music self-similarity: result of the search process for the source
point $p_i = \{z_{560001}, \dots, z_{564096}\}$ within the first million samples of clip 1.
The search was executed using the following MATLAB script: test =
clip(560001:564096); plot(fftfilt(flipud(test),clip)./(fftfilt
(ones(4096,1),clip)*norm(test))).
Figure 18.7. Probability density function of the similarity function $\phi(B, R')$ for
two different cases: K = 1 (left) and K = 10 (right).
Table 18.1.
K      clip 1   clip 2   clip 3   clip 4   clip 5   Average improvement over K = 1
1      3.5714   5.2059   5.3774   5.3816   5.8963   N/A
10     2.3690   3.2528   3.5321   3.5193   4.0536   1.741
20     2.2666   3.0576   3.3792   3.3664   3.7968   1.914
30     2.2059   2.9255   3.3061   3.2762   3.5613   2.032
50     2.1284   2.6595   3.1702   3.1209   3.0635   2.253
100    1.9512   2.1331   2.8631   2.7439   1.8719   2.775
Note: Results are reported on the dB scale. Average effective block dimensionality is
N ≈ 400.
Finally, Table 18.1 quantifies the improvement in the average distortion $\phi(b, a')$ as K increases from 1 to 100. We conclude that the BPM attack
in our experimental setup induces between 1.5 and 3 dB distortion noise with
respect to the marked copy.
Effect of the Attack on Watermark Detection
In order to evaluate the effect of the BPM attack with Algorithm A0 on direct
sequence spread spectrum (DSSS) watermarks, we conducted another set
of experiments. We used DSSS sequences with a 1-dB amplitude that
spread over 240 consecutive 2048-long MCLT blocks (approximately
11 sec long), where only the audible frequency magnitudes in the 2- to
7-kHz subband were marked. We did not use chip replication, as its effect
on watermark detection is orthogonal with respect to the BPM attack.
Figure 18.8 shows how normalized correlation of a spread spectrum
watermark detector is affected by the increase of the parameter K. During
the attack, we replaced each target block p with its computed replacement block a following the recipe presented in section Step III: Computing the Replacement. In Figure 18.8, we show two results. First, we
show the average normalized correlation value (left ordinate) across 10
different tests for watermark detection within marked content (curves
marked WM) and within marked content attacked with our attack for
several values of K ∈ {1, 10, 20, 30, 50, 100} (curves marked RA). Second,
we show on the right ordinate the signal distortion caused by the BPM
attack: the minimal, average, and maximal distortion across all five audio
clips. We can conclude from the diagram that for small values of K,
its increase results in greatly improved distortion metrics, whereas
for large values of K, the computed replacement vectors are too similar
with respect to the target blocks, which results in a lesser effect on the
normalized correlation.
Figure 18.8. Response of a DSSS watermark detector to the BPM attack. The
abscissa quantifies the change in parameter K from 1 to 100 for a fixed
watermark amplitude of 1 dB. The left ordinate shows the increase of the
normalized correlation as K increases. The results are obtained for five full
songs in different genres. The right ordinate shows the corresponding minimal,
maximal, and average distortions with respect to the set of benchmark clips due
to the BPM attack.
Figure 18.9. Illustration of the process of searching for similar blocks within
the original marked image and its transforms.
18:5
$$\sum_{x=1}^{n} \sum_{y=1}^{n} \left( s \cdot g(x, y) + o - f(x, y) \right)^2 \tag{18.6}$$
Only a given part of the domain block is substituted with the range
block. In our case, we used a circular mask inscribed in the block.
Overlapping range blocks have been used. Consequently, specific
care must be taken during the reconstruction. A simple substitution
The PSNR and the wPSNR are computed using the Y channel only of the YUV color space.
Figure 18.10. Attack against D*******. The figure illustrates, from left to
right, the original image, the image marked with D******* watermarking
software, and the image produced by the BPM attack.
Figure 18.11. Attack against S***I**. The figure illustrates, from left to right,
the original image, the image marked with S***I** watermarking software, and
the image produced by the BPM attack.
Figure 18.12. Attack against S***S***. The figure illustrates, from left to right,
the original image, the image marked with S***S*** watermarking software,
and the image produced by the BPM attack.
REFERENCES
1. The Secure Digital Music Initiative, http://www.sdmi.org.
2. The DVD Copy Control Association, http://www.dvdcca.org.
3. Architecture of Windows Media Rights Manager, http://www.microsoft.com/windows/windowsmedia/wm7/drm/architecture.aspx.
4. Rivest, R.L., Shamir, A., and Adleman, L., A method for obtaining digital signatures and public-key cryptosystems, Commun. ACM, 21(2), 120–126, 1978.
5. Kirovski, D., Malvar, H., and Yacobi, Y., A dual watermarking and fingerprinting system, ACM Multimedia, 2002, pp. 372–381.
19
Digital Media Fingerprinting: Techniques and Trends
William Luh and Deepa Kundur
INTRODUCTION
The ease with which digital data can be exactly reproduced has made piracy, the illegal distribution of content, attractive to pirates. As illegal copies of digital data, such as video and audio, proliferate over the Internet, interest in protecting intellectual property and copyrighted material has grown. One method of protecting copyrighted material is called digital fingerprinting. Although other means of digital data security exist, fingerprinting aims to deter pirates from distributing illegal copies, rather than actively preventing them from doing so.
Fingerprinting is a method of embedding a unique, inconspicuous serial number (fingerprint) into every copy of digital data that would be legally sold. The buyer of a legal copy is discouraged from distributing illegal copies, as these illegal copies can always be traced back to the owner via the fingerprint. In this sense, fingerprinting is a passive form of security, meaning that it is effective after an attack has been applied, as opposed to active forms of security, such as encryption, which is effective from the point it is applied to when decryption takes place.
In fingerprinting a given dataset, all legal copies of the digital data are similar, with the exception of the unique imperceptible fingerprints; that is, all copies of the digital data appear visually or audibly indistinguishable. A coalition of pirates can, therefore, exploit this weakness by comparing their digital data, looking for differences, and then possibly
0-8493-2773-3/05/$0.00+1.50
2005 by CRC Press
19:1
[Figure: taxonomy of collusion attacks. Labels recoverable from the figure: Codebook Attacks; Linear (Average; Optimal Linear Estimation in the MSE sense; FIR filtering; White Noise; GMAC; Others); Nonlinear Optimal Estimation (in the MSE sense); Codebook Attacks for binary alphabets (Set to 0; AND; OR; XOR; Random; Majority Voting); Order Statistics Attacks (Min; Max; Median; Minmax; Modified Negative; Randomized Negative).]
In general, attacks that are strictly applied to codebooks are not practically useful for multimedia fingerprinting because they do not effectively model collusion directly applied to media files; however, they are discussed because these attacks can serve as a basis for designing collusion-resistant codes, as in References 5–7. If the alphabet is binary, such as {0, 1}, any binary operator such as AND, OR, and XOR can be used as an attack. In the Set to 0 attack, $\gamma = \gamma_1 \gamma_2 \cdots \gamma_l$ is the codeword created by the colluders, where $\gamma_i = 0$ if $\gamma_i^{(j)} \neq \gamma_i^{(k)}$ for some $j \neq k$; otherwise $\gamma_i = \gamma_i^{(j)}$, where $\gamma_i^{(1)} = \gamma_i^{(2)} = \cdots = \gamma_i^{(M)}$ and $\gamma_i^{(j)}$ denotes the $i$th symbol of colluder $j$'s codeword. This attack sets any differing bits (in the case of a binary alphabet) to 0 while preserving bits that are the same across all users. This attack can be modified so that differing bits are always set to a symbol of the alphabet that appears in the colluders' codewords.
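The binary codebook attacks described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the list-of-bits representation, and the tie-breaking rule in the majority vote are assumptions made for the example.

```python
def set_to_zero(codewords):
    """Keep positions where all colluders agree; set every differing position to 0."""
    return [col[0] if len(set(col)) == 1 else 0 for col in zip(*codewords)]

def bitwise_and(codewords):
    """Position-wise AND across all colluders' codewords."""
    return [int(all(col)) for col in zip(*codewords)]

def bitwise_or(codewords):
    """Position-wise OR across all colluders' codewords."""
    return [int(any(col)) for col in zip(*codewords)]

def bitwise_xor(codewords):
    """Position-wise XOR (parity) across all colluders' codewords."""
    return [sum(col) % 2 for col in zip(*codewords)]

def majority_vote(codewords):
    """Most frequent bit per position; ties resolve to 0 (an arbitrary choice)."""
    return [int(2 * sum(col) > len(col)) for col in zip(*codewords)]

# Three hypothetical colluders holding 5-bit codewords.
colluders = [
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
]

for attack in (set_to_zero, bitwise_and, bitwise_or, bitwise_xor, majority_vote):
    print(attack.__name__, attack(colluders))
```

For a binary alphabet, Set to 0 and AND coincide, which the example makes visible; they would differ for larger alphabets.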
19:3
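$$\bar{c}^{\,j}(x,y)=\int_{-\infty}^{\infty} z\,\frac{f_{C^{j}(x,y),\,\tilde{C}_{1}^{j}(x,y),\,\ldots,\,\tilde{C}_{M}^{j}(x,y)}\bigl(z,\tilde{c}_{1}^{\,j}(x,y),\ldots,\tilde{c}_{M}^{\,j}(x,y)\bigr)}{f_{\tilde{C}_{1}^{j}(x,y),\,\ldots,\,\tilde{C}_{M}^{j}(x,y)}\bigl(\tilde{c}_{1}^{\,j}(x,y),\ldots,\tilde{c}_{M}^{\,j}(x,y)\bigr)}\,dz \qquad (19.4)$$
$f_{C^{j}(x,y),\,\tilde{C}_{1}^{j}(x,y),\,\ldots,\,\tilde{C}_{M}^{j}(x,y)}$ is the joint probability density function (pdf) of the jth-frame, (x, y)th-pixel random variables of $c$ and $\{\tilde{c}_i\}_{i=1}^{M}$. $f_{\tilde{C}_{1}^{j}(x,y),\,\ldots,\,\tilde{C}_{M}^{j}(x,y)}$ is the joint pdf of the jth-frame, (x, y)th-pixel random variables $\{\tilde{c}_i\}_{i=1}^{M}$. To be practical, Equation 19.4 requires that the pdfs be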
somehow estimated or known. Colluders must therefore resort to more
practical techniques, such as those found in the Linear category. The
simplest Linear attack averages the set of fingerprinted multimedia, as in
$$\bar{c}^{\,j}(x,y)=\frac{1}{M}\sum_{i=1}^{M}\tilde{c}_{i}^{\,j}(x,y) \qquad (19.5)$$
As noted in Reference 5, Equation 19.5 is fair; all members of the coalition contribute equally, hence no member is at a higher risk of being
caught than his colleagues.
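The averaging attack of Equation 19.5 can be illustrated with a small sketch. The frame representation (nested lists of pixel values), the function name, and the toy host frame and fingerprints are assumptions chosen for the example, not data from the chapter.

```python
def average_attack(copies):
    """Pixel-wise mean of M equally sized fingerprinted frames (Equation 19.5)."""
    m = len(copies)
    rows, cols = len(copies[0]), len(copies[0][0])
    return [[sum(c[y][x] for c in copies) / m for x in range(cols)]
            for y in range(rows)]

# Hypothetical 2x2 host frame and two additive +/-1 fingerprints.
host = [[100, 120], [90, 110]]
fingerprints = [
    [[1, -1], [1, 1]],
    [[-1, 1], [1, -1]],
]
copies = [[[h + w for h, w in zip(hr, wr)] for hr, wr in zip(host, fp)]
          for fp in fingerprints]

colluded = average_attack(copies)
print(colluded)
```

Where the two fingerprints disagree, the averaged pixel returns to the host value; where they agree, the fingerprint survives. Every colluder's copy receives the same weight 1/M, which is the fairness property noted above.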
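$$\bar{c}^{\,j}(x,y)=\sum_{i=1}^{M}\alpha_{i}\Bigl(\tilde{c}_{i}^{\,j}(x,y)-E\bigl[\tilde{C}_{i}^{j}(x,y)\bigr]\Bigr)+E\bigl[C^{j}(x,y)\bigr] \qquad (19.6)$$
$$E\Bigl[\Bigl(C^{j}(x,y)-E\bigl[C^{j}(x,y)\bigr]-\sum_{i=1}^{M}\alpha_{i}\bigl(\tilde{C}_{i}^{j}(x,y)-E\bigl[\tilde{C}_{i}^{j}(x,y)\bigr]\bigr)\Bigr)\Bigl(\tilde{C}_{k}^{j}(x,y)-E\bigl[\tilde{C}_{k}^{j}(x,y)\bigr]\Bigr)\Bigr]=0 \qquad (19.7)$$
for k = 1, 2, . . . , M.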
Although Equation 19.7 may seem intimidating, it is a linear system of M equations (for k = 1, 2, . . . , M) in M unknowns (the weights $\alpha_i$, for i = 1, 2, . . . , M). The expected values are more easily estimated than the pdfs in Equation 19.4. The finite impulse response (FIR) filtering attack averages $\{\tilde{c}_{i}^{\,j}(x,y)\}_{i=1}^{M}$, then applies an FIR filter as in Equation 19.8.
$$\bar{c}^{\,j}(x,y)=h(x,y)*\frac{1}{M}\sum_{i=1}^{M}\tilde{c}_{i}^{\,j}(x,y) \qquad (19.8)$$
h(x, y) is an FIR two-dimensional (2-D) spatial filter and $*$ is the 2-D convolution operator. The goal of Equation 19.8 is to further attenuate the fingerprint by filtering the average. Additive noise can be added to Equation 19.8, as in the Gaussian multiple access channel (GMAC) [8].
The Order Statistic attacks found in Reference 9 consist of the min, max, median, minmax, modified negative, and randomized negative attacks, defined in Equations 19.9 to 19.14, respectively:
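$$\bar{c}^{\,j}(x,y)=\min\bigl\{\tilde{c}_{1}^{\,j}(x,y),\tilde{c}_{2}^{\,j}(x,y),\ldots,\tilde{c}_{M}^{\,j}(x,y)\bigr\} \qquad (19.9)$$
$$\bar{c}^{\,j}(x,y)=\max\bigl\{\tilde{c}_{1}^{\,j}(x,y),\tilde{c}_{2}^{\,j}(x,y),\ldots,\tilde{c}_{M}^{\,j}(x,y)\bigr\} \qquad (19.10)$$
$$\bar{c}^{\,j}(x,y)=\operatorname{median}\bigl\{\tilde{c}_{1}^{\,j}(x,y),\tilde{c}_{2}^{\,j}(x,y),\ldots,\tilde{c}_{M}^{\,j}(x,y)\bigr\} \qquad (19.11)$$
$$\bar{c}^{\,j}(x,y)=\frac{1}{2}\Bigl(\min_{i}\tilde{c}_{i}^{\,j}(x,y)+\max_{i}\tilde{c}_{i}^{\,j}(x,y)\Bigr) \qquad (19.12)$$
$$\bar{c}^{\,j}(x,y)=\min_{i}\tilde{c}_{i}^{\,j}(x,y)+\max_{i}\tilde{c}_{i}^{\,j}(x,y)-\operatorname{median}_{i}\,\tilde{c}_{i}^{\,j}(x,y) \qquad (19.13)$$
$$\bar{c}^{\,j}(x,y)=\begin{cases}\min_{i}\tilde{c}_{i}^{\,j}(x,y) & \text{with probability } p\\ \max_{i}\tilde{c}_{i}^{\,j}(x,y) & \text{with probability } 1-p\end{cases} \qquad (19.14)$$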
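The six order statistic attacks can be sketched as one function applied to the M values that the colluders hold at a single pixel. The function name, the string labels for the attacks, and the sample values are assumptions for illustration; the formulas follow the usual definitions of these attacks (minmax as the mean of min and max, modified negative as min plus max minus median).

```python
import random
from statistics import median

def order_statistic_attack(values, kind, p=0.5, rng=random):
    """Combine the colluders' values at one pixel using the named order statistic."""
    lo, hi, med = min(values), max(values), median(values)
    if kind == "min":
        return lo
    if kind == "max":
        return hi
    if kind == "median":
        return med
    if kind == "minmax":
        return (lo + hi) / 2
    if kind == "modneg":
        return lo + hi - med
    if kind == "randneg":
        # min with probability p, max with probability 1 - p
        return lo if rng.random() < p else hi
    raise ValueError(f"unknown attack: {kind}")

# Hypothetical values of one pixel across five fingerprinted copies.
pixel_values = [98, 101, 103, 107, 99]

for kind in ("min", "max", "median", "minmax", "modneg", "randneg"):
    print(kind, order_statistic_attack(pixel_values, kind))
```

Applying the function independently at every pixel of every frame yields the colluded copy.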
$$\mathrm{weight}\bigl(x|_{B_{s-1}}\bigr)<\frac{k}{2}-\sqrt{\frac{k}{2}\log\frac{2n}{\varepsilon}}$$
The concept of unique intersections has a nice geometric interpretation. For fingerprinting codes that can detect at most two colluders, the codewords that make up the codebook can be represented by the edges of the triangle in Figure 19.5. Any two users have a unique intersection point, which is a vertex of the triangle. Even if the users remove the detectable marks, the intersection will remain intact, revealing the identities of the two colluders. If the colluders do not remove all of the detectable marks (i.e., some edge is left over), then the leftover edge can be used to detect the colluders. Hence, it is in the best interest of colluders to remove detectable marks. A possible attack that can cripple this system is to remove leftover edges but leave the vertices intact. As will be seen later, when the geometric shapes live in higher dimensions and the colluders do not know where a vertex may be embedded, it is difficult to generate this attack.
Figure 19.5. [Figure labels: User 1, User 2, User 3; edge intersection between users 2 and 3.]
Figure 19.6 depicts a tetrahedron, where the sides represent the codewords that can detect at most three colluders. The four sides (planes) represent codewords for four users. When two users collude, they share a unique edge. When three users collude, they share a unique vertex. For codewords that can detect at most four colluders, a geometric shape in four dimensions is used. In general, codewords that can detect at most n colluders require shapes in n dimensions; that is, O(n) bases are required. The hyperplanes of the higher-dimension "hypertetrahedron" represent the codewords. These hyperplanes are derived from finite projective spaces PG(d, q), where d is the dimension that the hypertetrahedron lives in and q + 1 is the number of points on a hyperplane, where q is a prime number. PG(d, q) is constructed from a vector space of dimension d + 1 over the Galois field GF(q). The details of this construction can be found in References 20 and 21.
The tracing algorithm A detects the marked points that have not been
removed, and it determines the largest projective subspace spanned by
these points. In the theory of projective geometry, hyperplanes can be
4.
$$\mathbf{w}_{j}=\sum_{i=1}^{v} b_{ij}\,\mathbf{u}_{i} \qquad (19.15)$$
In Equation 19.15, $\mathbf{w}_j$ is the watermark created via modulation for user j, $\{\mathbf{u}_i\}_{i=1}^{v}$ is a set of v orthogonal basis signals, and $b_{1j} b_{2j} \cdots b_{vj}$ is an ACC for user j, where $b_{ij} \in \{\pm 1\}$. ACCs have the property that the length is $v = O(\sqrt{n})$ while providing collusion resistance for up to n colluders. This is an
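$$T(i)=\frac{\mathbf{y}^{T}\mathbf{u}_{i}}{\sqrt{\sigma^{2}\,\lVert\mathbf{u}_{i}\rVert^{2}}} \qquad (19.16)$$
$\mathbf{y}$ is the received vector that has been attacked by collusion and AWGN with variance $\sigma^2$. $T(i)$ is compared to a threshold e that is determined empirically from simulations. The output is
$$r_{i}=\begin{cases}1, & T(i)\ge e\\ 0, & T(i)<e\end{cases} \qquad (19.17)$$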
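The modulation and thresholded detection described above can be sketched as follows. Several things here are assumptions made only for illustration: the basis signals are the rows of a 4 x 4 Hadamard matrix, the detection statistic is taken to be the correlation of the received vector with each basis signal normalized by the square root of the noise variance times the basis energy, the collusion is a noise-free average, and the values of sigma and the threshold are arbitrary.

```python
import math

def modulate(bits, basis):
    """Watermark w_j = sum_i b_ij * u_i for code bits b_ij in {+1, -1}."""
    length = len(basis[0])
    return [sum(b * u[k] for b, u in zip(bits, basis)) for k in range(length)]

def detect(y, basis, sigma, threshold):
    """Hard decision r_i = 1 if T(i) >= threshold, else 0, for each basis signal."""
    decisions = []
    for u in basis:
        energy = sum(c * c for c in u)
        t = sum(a * b for a, b in zip(y, u)) / math.sqrt(sigma ** 2 * energy)
        decisions.append(1 if t >= threshold else 0)
    return decisions

# Rows of a 4x4 Hadamard matrix: mutually orthogonal, each with energy 4.
basis = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

# Two hypothetical users whose code bits differ only in position 2.
w_a = modulate([+1, +1, -1, +1], basis)
w_b = modulate([+1, -1, -1, +1], basis)

# Noise-free averaging collusion (host signal assumed already subtracted).
y = [(a + b) / 2 for a, b in zip(w_a, w_b)]

r = detect(y, basis, sigma=1.0, threshold=1.0)
print(r)
```

With these toy values, only the positions where both colluders carry +1 survive the threshold, so the detected vector is the AND of their code bits once -1 is mapped to 0.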
Fingerprinting in a Broadcast Channel Environment
Problem Formulation. We next consider how fingerprinting is integrated into a broadcast
2. Scrambled video signal: $d(c, \hat{c}) \gg T$. The encrypted video $\hat{c}$ does not resemble the unencrypted video c, which means that eavesdroppers will be watching an unintelligible video.
3. Unique fingerprinted videos: $\forall i \neq j$, $d(\tilde{c}_i, \tilde{c}_j) \neq 0$ and $d(c, \tilde{c}_i) < T$. $\tilde{c}_i$ should also contain codewords that are collusion resistant (Definition 4) and the watermarking scheme should be robust to common feasible attacks found in the watermarking literature.
2.
3.
Figure 19.8.
REFERENCES
1. Cox, I. J., Kilian, J., Leighton, T. F., and Shamoon, T., Secure spread spectrum watermarking for multimedia, IEEE Trans. Image Process., 6, 1673, 1997.
2. Deguillaume, F., Csurka, G., and Pun, T., Countermeasures for unintentional and intentional video watermarking attacks, in IS&T/SPIE's 12th Annual Symposium, Electronic Imaging 2000: Security and Watermarking of Multimedia Content II, Wong, P. W. and Delp, E. J., Eds., SPIE Proceedings, Vol. 3971, SPIE, 2000.
3. Boneh, D. and Shaw, J., Collusion-secure fingerprinting for digital data, IEEE Trans. Inf. Theory, 44, 1897, 1998.
4. Stinson, D. R., Cryptography: Theory and Practice, CRC Press, Boca Raton, FL, 1995, p. 24.
5. Trappe, W., Wu, M., Wang, Z. J., and Liu, K. J. R., Anti-collusion fingerprinting for multimedia, IEEE Trans. Signal Process., 51, 1069, 2003.
6. Trappe, W., Wu, M., and Liu, K. J. R., Collusion-resistant fingerprinting for multimedia, presented at IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), Orlando, FL, 2002, Vol. 4, p. 3309.
7. Trappe, W., Wu, M., and Liu, K. J. R., Anti-collusion Fingerprinting for Multimedia, Technical Research Report, Institute for Systems Research, University of Maryland, Baltimore, 2002.
8. Su, J. K., Eggers, J. J., and Girod, B., Capacity of Digital Watermarks Subjected to an Optimal Collusion Attack, presented at European Signal Processing Conference (EUSIPCO 2000), Tampere, Finland, 2000.
9. Zhao, H., Wu, M., Wang, Z. J., and Liu, K. J. R., Nonlinear Collusion Attacks on Independent Fingerprints for Multimedia, in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Hong Kong, 2003.
10. Pfitzmann, B. and Schunter, M., Asymmetric fingerprinting, in Advances in Cryptology: EUROCRYPT '96, International Conference on the Theory and Application of Cryptographic Techniques, Maurer, U. M., Ed., Lecture Notes in Computer Science, Vol. 1070, Springer-Verlag, Berlin, 1996, p. 84.
11. Pfitzmann, B. and Waidner, M., Anonymous fingerprinting, in Advances in Cryptology: EUROCRYPT '97, International Conference on the Theory and Application of Cryptographic Techniques, Fumy, W., Ed., Lecture Notes in Computer Science, Vol. 1233, Springer-Verlag, Berlin, 1997, p. 88.
12. Chor, B., Fiat, A., Naor, M., and Pinkas, B., Tracing traitors, IEEE Trans. Inf. Theory, 46, 893, 2000.
13. Brown, I., Perkins, C., and Crowcroft, J., Watercasting: Distributed watermarking of multicast media, in First International Workshop on Networked Group Communication '99, Risa, L. and Fdida, S., Eds., Lecture Notes in Computer Science, Vol. 1736, Springer-Verlag, Berlin, 1999, p. 286.
14. Judge, P. and Ammar, M., WHIM: Watermarking multicast video with a hierarchy of intermediaries, IEEE Commun. Mag., 39, 699, 2002.
15. Wang, Y., Doherty, J. F., and Van Dyck, R. E., A watermarking algorithm for fingerprinting intelligence images, in Proceedings of Conference on Information Sciences and Systems, Baltimore, MD, 2001.
16. Ergun, F., Kilian, J., and Kumar, R., A note on the limits of collusion-resistant watermarks, in Advances in Cryptology: EUROCRYPT '99, International Conference on the Theory and Application of Cryptographic Techniques, Stern, J., Ed., Lecture Notes in Computer Science, Vol. 1592, Springer-Verlag, Berlin, 1999, p. 140.
20
Scalable Image and Video Authentication*
Dimitrios Skraparlis
INTRODUCTION
Because digital multimedia data are constantly increasing in quantity and availability, concerns are being expressed regarding the authenticity (by means of origin authentication) and integrity of multimedia data; the ease and efficiency of tampering with digital media has created a need for media authentication systems based on cryptographic techniques. In other words, the trustworthiness of digital media is in question.
Digital media are a key element of the consumer electronics industry. As technology penetrates more and more into everyday life, security has begun to emerge as a primary concern, and media authentication technology will play an important role in the success of present and future multimedia services. In any case, data origin and data integrity verification would enhance the quality of multimedia services.
Traditional authentication methods (cyclic redundancy checks (CRCs) or cryptographic checksums, and digital signatures) are not fit for modern and future multimedia standards. Modern coding standards introduce scalability and tend to be more content-oriented. Therefore, it would be necessary to devise image and video authentication methods
* © 2003 IEEE. Reprinted, with permission, from Skraparlis, D., Design of an efficient authentication method for modern image and video, IEEE Transactions on Consumer Electronics, May 2003, pp. 417–426, Vol. 49, issue 2.
Figure 20.1. How to do source and integrity authentication without the need
for feedback. MDC stands for message digest code (a hash function).
Figure 20.2.
feedback.
The contents of sections "Scalability" and "Content-Based Multimedia" constitute the new directions of research that are presented in section "New Directions in Authentication Method Design."
Design of the Authentication Method Summary
To summarize, the design methodology for the authentication system is
the following. For authentication via securely providing authentication
information, the designer needs to:
However, before tackling these issues, we will review prior art.
PRIOR ART
In the following subsections, we will review prior art, drawn either from commercial applications or from the open literature. This review gives an overview of prior art in the subject and presents the shortcomings and inefficiencies of the methods, which create the need for more complete or advanced techniques. Moreover, some of the prior art techniques will form a base for the novel advances presented in section "New Directions in Authentication Method Design."
Traditional (General Purpose and Multimedia)
Authentication Systems
The general problem of data integrity authentication has traditionally
been addressed by commercial picture authentication systems and image
authentication cameras, which are based on common digital signature
standards.
However, as digital signatures have proven to be rather inefficient for multimedia data, and hash and sign methods (section "Hash and Sign Method") are still deemed inefficient and inappropriate for streams and multicasting, other techniques based on stream signatures have been presented recently in the literature and are analyzed in the following subsection.
Efficient Data Authentication Techniques: Stream Signatures
Stream signatures aim to provide efficient source and data authentication for scalable data. For example, in modern image and video coding
standards such as JPEG2000 and MPEG-4, some receivers opt for not
receiving the multimedia data at the maximum rate (best quality). In that
case, traditional authentication systems utterly fail, because verification is only possible when the media has been fully received.
Hash and Sign Method. The simplest but most inefficient stream
signature is the hash and sign method (see Figure 20.3), which is used
in commercial authentication systems. The bit stream is first divided into
blocks of equal size and then the hash value of each block is signed by
As one can see in Figure 20.4, every node in the tree contains the hash of its children. During transmission, every packet p has overhead equal to the size of the hash values that constitute the path in the tree that leads to the packet. For data comprising k packets, the number of hash values $n_p$ that must precede each packet for successful authentication is either $n_p = \log_2 k$, $\sum_{p=1}^{k} n_p = k \log_2 k$, when packet-loss robustness is required, or $0 \le n_p \le \log_2 k$, $\sum_{p=1}^{k} n_p = k$, when a minimum number of transmitted packets is required, at the cost of added complexity at the receiver.
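The tree construction and the per-packet overhead of log2 k hash values can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 stands in for whatever hash the system uses, the number of packets is a power of two, the helper names are invented for the example, and the signature over the root is omitted.

```python
import hashlib

def h(data):
    """Hash helper; SHA-256 is an assumption, not the chapter's choice."""
    return hashlib.sha256(data).digest()

def build_tree(packets):
    """Return all tree levels: levels[0] holds leaf hashes, levels[-1] the root."""
    level = [h(p) for p in packets]
    levels = [level]
    while len(level) > 1:
        # Each parent contains the hash of its two children.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels, index):
    """Sibling hashes from the leaf up to (not including) the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify(packet, index, path, root):
    """Recompute the root from one packet and its authentication path."""
    node = h(packet)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

packets = [f"packet-{i}".encode() for i in range(8)]
levels = build_tree(packets)
root = levels[-1][0]          # in a real system the sender signs this value
path = auth_path(levels, 5)
print(len(path), verify(packets[5], 5, path, root))
```

Because each packet carries its own path, it can be verified on arrival regardless of which other packets were lost, which corresponds to the packet-loss-robust case above.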
Offline or Recursive Method. The recursive stream authentication
method of Reference [9] has less overhead than the tree methods,
requiring less buffering at the receiver end. Looking at Figure 20.5, block
Figure 20.4.
signature.
Figure 20.5.
An important characteristic of watermarking is that watermarking techniques do modify the data, although these modifications are supposed to be imperceptible to the consumer. Quality degradation and modification of the original artistic medium may dissatisfy the candidate recipients of this technology: artists and consumers. Also, certain applications may strictly not allow any modification of the data, as in the case of evidential images.
The complexity and the computational power demand of state-of-the-art watermarking techniques are deemed to be unreasonable; this is a weak point when targeting consumer electronics and portable devices. Watermarking is probably better suited to intellectual property protection than to casual data integrity control.
Labeling in the jpm and jpx File Formats. The jpx format is described as an extension in JPEG2000 (ISO/IEC FCD15444-2, Annex L: extended file format syntax). It allows the insertion of XML descriptors and MPEG-7 metadata. It also incorporates an Intellectual Property Rights (IPR) box and a Digital Signature box that uses certain hash and digital signature algorithms in order to protect any data part in the .jpx file.
The term "box" refers to a binary sequence that contains objects and has the general form |size|type|contents|. Unknown types of box are to be ignored by normal decoders (according to the standard's L.10 directive), so there is the possibility of creating custom boxes without losing compatibility. This, in fact, enables one to create more diverse labeling methods, such as the one presented in this chapter.
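The |size|type|contents| layout and the rule that unknown box types are ignored can be sketched with a small reader. The sketch assumes the JP2-family header convention of a 4-byte big-endian length followed by a 4-byte type, with a length of 1 signalling an 8-byte extended length and a length of 0 meaning the box runs to the end of the data. The box type `auth` and the helper names are hypothetical and are not defined by any standard.

```python
import struct

def make_box(box_type, payload):
    """Serialize one box as |size|type|contents| with a 32-bit size field."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def iter_boxes(buf):
    """Yield (type, contents) for each top-level box in buf."""
    pos = 0
    while pos + 8 <= len(buf):
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:                      # extended 64-bit length follows the type
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:                    # box extends to the end of the data
            size = len(buf) - pos
        if size < header or pos + size > len(buf):
            raise ValueError("malformed box")
        yield box_type, buf[pos + header:pos + size]
        pos += size

# A codestream box followed by a hypothetical custom authentication box.
stream = make_box(b"jp2c", b"\x00\x01\x02") + make_box(b"auth", b"tag-bytes")

KNOWN = {b"jp2c"}                          # what a legacy decoder understands
legacy_view = [(t, c) for t, c in iter_boxes(stream) if t in KNOWN]
aware_view = dict(iter_boxes(stream))
print(legacy_view, aware_view[b"auth"])
```

A decoder that only knows `jp2c` skips the custom box by its size field and still decodes the image, while an authentication-aware reader can pick up the label.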
The .jpm (ISO/IEC FCD15444-6) standard concerns the transmission of images through the Internet and it also allows the use of XML and IPR boxes. In both of these formats, JPEG2000 codestreams in jpc and jp2 format can be inserted as boxes (named Media Data box and Contiguous Codestream box in the .jpm format).
The incorporation of labeling methods in the jpx and jpm standards is very simple and possibly more robust than using the JPEG2000 labeling methods described earlier.
Labeling in Motion-JPEG2000 and MPEG-4. The JPEG2000, Motion-JPEG2000, MPEG-4, and QuickTime standards have some common characteristics: They all share the same file format structure that comprises objects. Boxes in Motion-JPEG2000 are called atoms in QuickTime and MPEG-4 and they are formatted as |size|type|contents|. Furthermore, unknown objects are ignored, a directive that allows the definition of objects specifically designed for our authentication application. The reader is referred to References 26 and 27.
Content-based Multimedia
Advanced data integrity techniques include content-based signatures.
Content-based digital signatures are data integrity techniques applied
to the data content, such as low-level image descriptors (edges [28]
with the popular Canny edge detector [29], textures, histograms [30],
In that case, we could say that the encoded image is not "content-oriented" but, instead, that it is encoded hierarchically in a content-oriented way. This means that, on the one hand, the image is coded losslessly (no information is being left out), but, on the other hand, the encoded image's packet hierarchy is carefully chosen so that the image elements most likely to be important are encoded first.
To conclude, JPEG2000's design enables us to do integrity authentication of the scalable bit stream by using scalable labeling and at the same time establish a connection to the original image's content. This fact enables the characterization of the labeled encoded JPEG2000 image as a "self-contained" content-based digital signature. This type of digital signature is very useful in reliable multimedia storage in insecure systems (section "Application Scenarios").
An example of the complete authentication signature for the low-speed Internet is presented in Figure 20.6. In general, the length of the
blocks that the stream signature operates on should actually be selected
according to the application needs; the key factor would be the data rate,
thus providing low overhead for the signature bits in each case.
Optimizing the Stream Signature Technique for Scalable Video. A modification to stream signatures that exists since the creation of authentication
Figure 20.6. These successive instances of the JPEG2000 image have 160-bit
authentication tags embedded after each coding termination. Numbers show
the file size in bytes after the insertion of each tag.
ACKNOWLEDGMENT
The author would like to thank Associate Prof. Dimitrios Mitrakos of
Aristotle University of Thessaloniki, Department of Electrical and
Computer Engineering for his encouragement and support during the
early version of this work, as found in Reference 36.
REFERENCES
1. ISO/IEC JTC 1/SC 29/WG 1, Coding of still pictures, JPEG 2000 Part I Final Committee Draft Version 1.0, ISO/IEC FCD15444-1: 2000 (V1.0, 16 March 2000).
2. Tutorial on JPEG2000 and overview of the technology, http://jj2000.epfl.ch/jj_tutorials/index.html.
3. Rabbani, M. and Santa-Cruz, D., The JPEG 2000 Still-Image Compression Standard, presented at International Conference in Image Processing (ICIP), Thessaloniki, Greece, 2001.
4. Christopoulos, C., Skodras, A., and Ebrahimi, T., The JPEG2000 still image coding system: An overview, IEEE Trans. Consumer Electron., 46(4), 1103–1127, 2000.
5. Special edition on JPEG 2000, Ebrahimi, T., Christopoulos, C., and Leed, D.T., Eds., Signal Processing: Image Communication, Science Direct, 17(1), January 2002.
6. Adams, M., The JPEG-2000 still image compression standard, ISO/IEC JTC 1/SC 29/WG 1 N 2412, December 2002.
7. Merkle, R., A certified digital signature, in Advances in Cryptology, Crypto '89, Brassard, G., Ed., Lecture Notes in Computer Science, Vol. 435, Springer-Verlag, New York, 1989, pp. 218–238.
8. Menezes, A., van Oorschot, P., and Vanstone, S., Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, 1997.
9. Gennaro, R. and Rohatgi, P., How to Sign Digital Streams, presented at Crypto '97, 1997.
10. Perrig, A., et al., Efficient Authentication and Signing of Multicast Streams over Lossy Channels, presented at IEEE Symposium on Security and Privacy 2000, pp. 56–73.
11. Golle, P. and Modadugu, N., Authenticating Streamed Data in the Presence of Random Packet Loss, presented at ISOC Network and Distributed System Security Symposium, 2001, pp. 13–22.
12. Wong, C. and Lam, S., Digital signatures for flows and multicasts, IEEE/ACM Trans. on Networking, 7(4), 502–513, 1999.
13. Miner, S. and Staddon, J., Graph-Based Authentication of Digital Streams, presented at IEEE Symposium on Security and Privacy, 2001.
14. Anderson, R., et al., A new family of authentication protocols, ACM Operating Systems Review, 1998, Vol. 32, no. 4, ACM, pp. 9–20.
15. Rohatgi, P., A Compact and Fast Hybrid Signature Scheme for Multicast Packet Authentication, in Proceedings of 6th ACM Conference on Computer and Communication Security, 1999.
16. Perrig, A., et al., Efficient and Secure Source Authentication for Multicast, presented at Network and Distributed System Security Symposium, NDSS '01, 2001.
17. Mintzer, F., Lotspiech, J., and Morimoto, N., Safeguarding digital library contents and users. Digital watermarking. D-Lib on-line Mag., 1997, http://www.dlib.org.
18. Anderson, R. and Petitcolas, F., Information Hiding: An Annotated Bibliography, http://www.cl.cam.ac.uk/~fapp2/steganography/bibliography/.
19. Petitcolas, F., Anderson, R., and Kuhn, M., Attacks on copyright marking systems, Second Workshop on Information Hiding, Lecture Notes in Computer Science, Aucsmith, D., Ed., 1998, Vol. 1525, pp. 218–238.
21
Signature-Based Media Authentication
Qibin Sun and Shih-Fu Chang
INTRODUCTION
Basic Concepts and Definitions
Multimedia authentication is a relatively new research area compared
to other traditional research areas such as multimedia compression.
Different researchers with different research backgrounds may have
different understandings of the term authentication. For example,
people from multimedia watermarking usually use the term authentication to refer to content integrity protection; people from biometrics may
use the term to refer to source identification, verification, and so forth.
To help the reader, we first introduce some concepts and definitions
borrowed from cryptography [1,2] in which the digital signature
techniques have been well studied for traditional computer systems.
We then extend them for multimedia applications.
Data authentication: Data authentication is a process by which the authorized receivers, and perhaps the arbiters, determine that the particular data were most probably sent by the authorized transmitter and have not subsequently been altered or substituted. Data authentication is usually associated with data integrity and nonrepudiation (i.e., source identification) because these issues are very often related to each other: Data that have been altered effectively have a new source; and if a source cannot be determined, then the question of alteration cannot be settled (without reference to the original source).
Data integrity: Data integrity means that the receiver can verify that the data have not been altered by even 1 bit during transmission. The attacker should not be able to substitute false data for the real data.
Nonrepudiation: Nonrepudiation, also called source identification, means the data sender should not be able to falsely deny the fact that he sent the data.
Figure 21.1.
Figure 21.2.
Figure 21.6.
One example for lossless watermarking is to compress the least significant bit (LSB) portion of the multimedia data by a lossless data
compression algorithm such as ZIP and then insert the signature into the
free space earned by this lossless data compression algorithm. Interested
readers can refer to Chapter 17 of this handbook for more details on how
to embed the data in a lossless way.
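The idea of freeing space by compressing the LSB plane can be sketched as follows. This is only an illustration under stated assumptions: zlib stands in for the ZIP-style compressor, the 6-byte length header and the helper names are invented for the example, the image is a flat list of 8-bit pixel values, and the signature is an opaque byte string.

```python
import struct
import zlib

def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)

def unpack_bits(data, count):
    """Inverse of pack_bits for the first `count` bits."""
    return [(data[i // 8] >> (7 - i % 8)) & 1 for i in range(count)]

def embed(pixels, signature):
    """Replace the LSB plane with: header | compressed original LSBs | signature."""
    compressed = zlib.compress(pack_bits([p & 1 for p in pixels]), 9)
    payload = struct.pack(">IH", len(compressed), len(signature)) + compressed + signature
    if len(payload) * 8 > len(pixels):
        raise ValueError("LSB plane does not compress enough to hold the signature")
    bits = unpack_bits(payload, len(payload) * 8)
    bits += [0] * (len(pixels) - len(bits))
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def extract(marked):
    """Recover the signature and restore the original pixels exactly."""
    data = pack_bits([p & 1 for p in marked])
    n_comp, n_sig = struct.unpack_from(">IH", data, 0)
    compressed = data[6:6 + n_comp]
    signature = data[6 + n_comp:6 + n_comp + n_sig]
    original_bits = unpack_bits(zlib.decompress(compressed), len(marked))
    return signature, [(p & ~1) | b for p, b in zip(marked, original_bits)]

# Hypothetical smooth image: mostly even pixel values, a few odd ones.
pixels = [(i // 16) * 2 % 256 for i in range(4096)]
for i in range(0, 4096, 97):
    pixels[i] |= 1

marked = embed(pixels, b"placeholder-signature")
signature, restored = extract(marked)
print(signature, restored == pixels)
```

The scheme works only when the LSB plane is compressible; for noisy images the capacity check fails, which is why the sketch raises an error instead of embedding.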
Referring to Figure 21.7, we can see that the crypto hash module in Figure 21.1 has been replaced with a feature extraction module in order to tolerate some incidental distortions. The replacement is applied because acceptable manipulations will cause changes to the content features, although the changes may be small compared to content-altering attacks. Such allowable changes make the features unsuitable for crypto hashing. (Any minor change to the features may cause a significant difference in the hashed code due to the nature of crypto hashing methods such as MD5 and SHA-1.) Accordingly, as a result of the incompatibility with crypto hashing, the generated signature size is proportional to the size of the content, which is usually very large. In typical digital signature algorithms, signature signing is more computationally expensive than signature verifying; this also results in a time-consuming signing process because the size of the formed signature is much greater than 1024 bits [1,2] and it has to be broken into small pieces (less than 1024 bits) for signing. On the other hand, omitting crypto hashing on the selected features will make the decision of authenticity, which is usually based on
Xie et al. [27] proposed a content hash solution for image authentication called Approximate Message Authentication Codes (AMACs).
The AMAC is actually a probabilistic checksum calculated by applying a
series of random XOR operations followed by two rounds of majority
Figure 21.8.
To achieve the requirements described, another type of signature-based approach was proposed [30–35]. Crypto hashing was incorporated into the proposed approaches to fix the length of the generated signature and enhance system security. Figure 21.9 illustrates the whole framework, which comprises three main modules. Signature
Input
Original image to be signed Io
Begin
2.
End
Hash the concatenated codeword sequence Z to obtain H(Z)
Sign H(Z) with the owner's private key to obtain the signature S
End
Output
Watermarked image Iw
Message   Codeword (Message + PCB)
0000      0000 000
0001      0001 111
0010      0010 110
0011      0011 001
0100      0100 101
0101      0101 010
0110      0110 011
0111      0111 100
1000      1000 011
1001      1001 100
1010      1010 101
1011      1011 010
1100      1100 110
1101      1101 001
1110      1110 000
1111      1111 111
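The relation between messages, PCBs, and the hashed codeword sequence Z can be sketched as follows. The PCB values of the four single-bit messages are read off the message/PCB listing above; deriving every other PCB by XOR assumes the code is linear, which the listing is consistent with. SHA-256, the bit-string serialization of Z, and the function names are assumptions for the example, and the signing step with the owner's private key is omitted.

```python
import hashlib

# PCBs of the single-bit messages 1000, 0100, 0010, 0001 (leftmost bit first).
UNIT_PCB = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]

def pcb(message):
    """XOR together the unit PCBs selected by the 1 bits of a 4-bit message."""
    parity = (0, 0, 0)
    for bit, unit in zip(message, UNIT_PCB):
        if bit:
            parity = tuple(a ^ b for a, b in zip(parity, unit))
    return parity

def encode(message):
    """Codeword = message followed by its PCB."""
    return tuple(message) + pcb(message)

def hash_codewords(messages):
    """H(Z) over the concatenated codeword sequence Z, serialized as '0'/'1' text."""
    z = "".join(str(b) for m in messages for b in encode(m))
    return hashlib.sha256(z.encode("ascii")).hexdigest()

# Hypothetical 4-bit feature messages extracted from three blocks.
features = [(0, 1, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0)]

print([encode(m) for m in features])
print(hash_codewords(features))
```

Because H(Z) covers every codeword, changing a single message bit changes the hash, while the PCBs give the verifier a way to correct small feature deviations before hashing.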
TABLE 21.2.
Media               Manipulations listed in the source
JPEG [31]           Lossy compression, transcoding
JPEG2000 [32]       Lossy compression, rate transcoding
MPEG-4/object [33]  Lossy compression, object translation/scaling/rotation, segmentation errors
MPEG-1/2 [34]       Lossy compression, frame resizing, and frame dropping
JPEG [35]           Lossy compression, noise, packet loss
[Other headings that survive from the table: Malicious Attacks, Features, Watermarking. Other surviving entries, whose rows cannot be recovered: DCT quantization; wavelet coefficients; bitplane modification.]
Figure 21.11.
Multicycle compression. Typically, this involves reencoding a compressed image to a lower bit rate. The reencoding may be repeated multiple times. In practice, there is a minimal requirement for image quality. Thus, one may set a minimal acceptable bit rate below which the image will be considered unauthentic. Such manipulations will unavoidably introduce some incidental distortions. Assume that we want a 1-bpp JPEG2000 compressed image. One example is that the original image repeatedly undergoes JPEG2000 compression; that is, the original image is compressed to 1 bpp and decompressed to an 8-bit raw image (one cycle), then compressed again and again at 1 bpp. Furthermore, we know that JPEG2000 provides much flexibility in compressing an image to a targeted compression bit rate, such as directly encoding from a raw image or truncating or parsing a compressed codestream. Another example of obtaining a 1-bpp JPEG2000 compressed image is to compress the original image at full rate, 4 bpp, and 2 bpp, respectively, and then truncate to the targeted bit rate of 1 bpp.
Format or codec variations. Differences may exist between different implementations of the JPEG2000 codec by different companies. Such differences can be due to different accuracies of representation in the domains of the (quantized) WT, color transformation, and the pixel domain.
Watermarking. Image data are manipulated when authentication feature codes are embedded back into the image. Such a manipulation should not cause the resulting image to be considered unauthentic.
Figure 21.14.
Figure 21.15.
Figure 21.16.
(signing).
Figure 21.17.
(verifying).
The verifying operation is also similar to the lossy one, with the exception of watermark extraction. The code block is divided into patches, and the
difference value of each patch is calculated in the same way as in lossless
signing. For each patch, if the value is beyond the threshold, a bit of 1 is
extracted and the difference value is shifted back to its original position,
which means that the original coefficients are recovered. If the value is
inside the threshold, a bit of 0 is extracted and nothing needs to be
done. Finally, ECC decoding is applied to the extracted bit sequence
to obtain the correct watermark bits.
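The extraction rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not give the shift amount, so the sketch assumes that embedding a 1 moved the patch difference value away from zero by a fixed, known `shift` that is larger than twice the threshold. The names `extract_bits`, `threshold`, and `shift` are hypothetical, and the ECC step is omitted.

```python
def extract_bits(diff_values, threshold, shift):
    """Extract watermark bits from patch difference values and undo the shift.

    Assumption (not from the text): a 1 was embedded by moving the difference
    value away from zero by `shift`; a 0 left the value untouched.
    """
    bits = []
    restored = []
    for d in diff_values:
        if abs(d) > threshold:
            # Value lies beyond the threshold: a 1 was embedded here.
            bits.append(1)
            # Shift the value back toward zero to recover the original.
            restored.append(d - shift if d > 0 else d + shift)
        else:
            # Value lies inside the threshold: a 0, nothing to undo.
            bits.append(0)
            restored.append(d)
    return bits, restored


if __name__ == "__main__":
    bits, restored = extract_bits([1, 9, -10, -2], threshold=3, shift=8)
    print(bits, restored)
```

In a complete verifier, the extracted bit sequence would then be passed through the ECC decoder before being compared with the expected watermark.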
More detailed test results are given in Reference 36. Figure 21.18 and
Figure 21.19 compare the image quality and file size before and after
signing. The image is encoded with a 9/7 filter (with and without the lossy
watermark) and a 5/3 filter (with and without the lossless watermark),
respectively. The image quality drops slightly when a
watermark is embedded, and there is no significant difference between the file
sizes.
In summary, the main contributions of the proposed solution could be
listed as follows.
Figure 21.19. File size comparison between compressed and compressed plus
signed JPEG2000 images.
used for locating malicious modifications and the final content-based signature is obtained by cryptographically hashing all
corresponding codewords (message PCB) to make sure that no
catastrophic security holes exist.
The proposed solution can be incorporated into various security
protocols (symmetric and asymmetric): By using the
ECC scheme on extracted features, we can obtain stable crypto
hash results, thus gaining more room and flexibility in designing
authentication protocols in real applications. For example, by
using PKI, the content signer and content verifier can own different
keys. Working under PKI will also let the proposed solution
be easily incorporated into current data-based authentication
platforms.
The proposed solution is fully compatible with the JPEG2000
encoder and decoder (Part 1). Based on the description
given earlier, we can see that all features are directly extracted
from EBCOT and the generated watermarks are also embedded
back in the procedure of EBCOT. The proposed solution is
fully compatible with JPEG2000 codec and could efficiently
co-work with the procedure of JPEG2000 image compression and
decompression.
Figure 21.20.
Usually content hashing is very robust to the acceptable manipulations. However, a robust content hashing is also robust to the
local attacks. Analysis on the security of content hashing is a little
complicated because it relates to the following issues. The first one is the
feature selected for content hashing: A good feature representation is
the first step to design a secure content hashing scheme. The second one
is the length of the generated content hash code because it implicitly
means how detailed the content is represented given the same feature
1 A typical brute force attack against crypto hashing: An adversary would like to find two
random messages, $M$ and $M'$, such that $H(M) = H(M')$. It is named the birthday attack
because it is analogous to finding two people with the same birthday [1].
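To make the footnote concrete, the following sketch runs a birthday search against a deliberately weakened hash. It is an illustration only: the helper names `truncated_hash` and `birthday_collision` are hypothetical, and SHA-256 truncated to a few bits stands in for a short hash code so that a collision appears after a few thousand trials rather than an infeasible number.

```python
import hashlib
from itertools import count


def truncated_hash(message: bytes, n_bits: int) -> int:
    """Return the top n_bits of SHA-256(message) as an integer."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - n_bits)


def birthday_collision(n_bits: int = 24):
    """Find two distinct messages whose truncated hashes are equal."""
    seen = {}  # maps truncated hash value -> first message that produced it
    for i in count():
        message = str(i).encode()
        h = truncated_hash(message, n_bits)
        if h in seen:
            return seen[h], message
        seen[h] = message


if __name__ == "__main__":
    m1, m2 = birthday_collision(24)
    print(m1, m2, truncated_hash(m1, 24) == truncated_hash(m2, 24))
```

For an n-bit hash, a collision is expected after roughly 2^(n/2) trials, which is why the length of the generated hash code matters for security.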
Figure 21.21. (a) Original image; (b) attacked image of (a). (c) Original image;
(d) attacked image of (c).
Figure 21.22. The histograms of face images and the attacked image.
(a) Histogram of Figure 21.21a; (b) histogram of Figure 21.21b; (c) the modified
image of Figure 21.21b but with the same histogram as Figure 21.21a.
Figure 21.23.
cation.
In Figure 21.24 (first row), the image on the left is taken as the
reference and the other four images on the right are taken as the attack
images. Our purpose is to modify these four images until they can
pass the authentication whose signature is based on the reference
image. We illustrate three examples for every 8 × 8 DCT block: DC
coefficients only (the second row), three AC coefficients (the third row),
and DC plus two AC coefficients (the fourth row). Visually, DC plus AC
coefficients are the best selection; in practice, as many DCT
coefficients as possible should be used.
Complete
authentication
Fragile
authentication
Semifragile
(watermarking)
Semifragile
(nonhashing)
Semifragile
(content hashing)
Semifragile
(crypto hashing)
Security
Attack
localization
Yes
Yes
No
No
Yes
Yes
No
Very strong
Yes
Yes
No
Yes
Weak
Yes
Yes
Yes
Yes
Very large
Some
Some
Yes
Yes
Some
Large
Some
No
Yes
Yes
Yes
1024 bits
Strong
Yes
Part V
Applications
22
Digital Watermarking Framework: Applications, Parameters, and Requirements
Ken Levy and Tony Rodriguez
INTRODUCTION
Digital watermarks are digital data that are embedded into content
and may survive analog conversion and standard processing. Ideally, the
watermark data are not perceptible to the human eye and ear, but can be
read by computers. Digital watermarks, as a class of techniques, are
capable of being embedded in any content, including images, text, audio,
and video, on any media format, including analog and digital.
Digital watermark detection is based on statistical mathematics
(i.e., probability). In addition, there are numerous digital watermarking
techniques. As such, for a framework chapter like this, many descriptions
include terms such as "usually" or "likely," owing to the probabilistic nature
of detection and to the need to generalize across the numerous techniques.
Importantly, digital watermarks are traditionally part of a larger system
and have to be analyzed in terms of that system. This chapter reviews
a framework that includes digital watermark classifications, applications,
0-8493-2773-3/05/$0.00+1.50
2005 by CRC Press
Help technology and solution providers design appropriate watermark algorithms and systems
Aid potential customers in understanding the applicability of
technology and solutions to their markets
Local data
Persistent identifier that links to a database
The local data can control the actions of the equipment that detected
the digital watermark or has value to the user without requiring a remote
database. The persistent identifier links the content to a database,
usually remote, which may contain any data related to that content, such
as information about the content, content owner, distributor, recipient,
rights, similar content, URL, and so forth.
Digital watermarking algorithms can also be classified as robust and
fragile. Although there are other types, such as semifragile, tamper-evident, and invertible, this chapter is limited to the base types of robust
and fragile to simplify our discussion. A robust digital watermark should
survive standard processing of the content and malicious attacks up to
the point where the content loses its value (economic or otherwise) as
dictated by the specific application and system. It can, for example, be
used for local control and identification.
A fragile digital watermark, on the other hand, is intended to be brittle
in the face of a specific transformation. The presence or absence of the
watermark can be taken as evidence that the content has been altered.
For some applications, this can also be achieved by using appropriate fingerprinting or hashing techniques in conjunction with a robust
watermark. Regardless of the implementation, these techniques are
traditionally used when the desire is to determine if the content has been
manipulated. In many cases, it is desirable to employ both robust
and fragile watermarks in the same content: one to act as a persistent
identifier and for local control and the other as an indicator that the
content has been modified or has gone through a specific transformation.
Another classification is based on the detection design parameter
in watermarking algorithms, whether they are designed to do blind
Figure 22.1.
Annotation
Annotation refers to hiding information, usually about the content, in
the content. This approach may be more robust than using headers
because annotations are part of the content and can take less space than
headers because the information is part of the content. Most robust
techniques do not have the data capacity for annotations, but invertible
techniques can work perfectly for annotations. An example involves
embedding a person's medical information in an X-ray with an invertible
technique so the X-ray can be read in unmodified form even after
embedding.
Copyright Communication
Content often circulates anonymously, without identification of the
owner or an easy means to contact the owner or distributor to obtain
rights for use. Digital watermarks enable copyright holders to communicate their ownership, usually with a public detector, thereby helping
to protect their content from unauthorized use, enabling infringement
detection and promoting licensing. The watermark payload carries a
persistent copyright owner identifier that can be linked to information
about the content owner and copyright information in a linked database.
For example, photographs can be embedded with the photographer
owner's ID to determine whether two photos were taken from a similar
location at a similar time, or that one is an edited copy of another. The
same can occur with video, such as TV news.
Authentication
Digital watermarks can provide authentication by verifying that the
content is genuine and from an authorized source. The digital watermark
identifies the source or owner of the content, usually in a private system.
The system can recognize the private watermark on the local machine
or link the content owner to a private database for authentication. For
example, surveillance video recorders can be embedded with an ID that
links it to that specific video recorder. Additionally, ID cards can be
embedded with the authorized jurisdiction.
TABLE 22.1.
Robustness
Reliability (FP)
Payload
Granularity
Annotations
Pro
Low
Very
kilobytes
N/A
Copyright
Communications
Copy Protection
Pro
Very
Very
32 bits
Fine
Pro
Extremely
Extremely
8 bits
Fine
Monitoring
Consumer
Very
Very
32 bits
Fine
Filtering
Consumer
Very
Very
8 bits
Fine
Authentication
Pro
Extremely
Extremely
32 bits
Fine
Integrity
Pro
Low
Extremely
32 bits
Very fine
Forensic
Tracking
DAM
Pro/consumer
Extremely
Extremely
32 bits
Coarse
Very
Very
32 bits
Fine
DRM
Pro
Extremely
Very
32 bits
Fine
Remote
Triggering
e-Commerce
Consumer
Embed any
Detect any
Embed any
Detect fast
Embed any
Detect very fast
Embed any
Detect very fast
Embed any
Detect very fast
Embed any
Detect very fast
Embed any
Detect very fast
Embed very fast
Detect any
Embed any
Detect very fast
Embed any
Detect very fast
Embed any
Detect any
Embed any
Detect very fast
Very
Very
32 bits
Fine
Extremely
Very
32 bits
Fine
Pro
Consumer
Perceptibility Key: Pro = acceptable to professional; Consumer = acceptable to consumer. Performance Key: Any = the faster the better. False
Positive (FP) Key: extremely = $10^{-9}$; very = $10^{-6}$; not very = $10^{-3}$ (approximate values).
Perceptibility
WORKFLOW
Workflow is an essential part of a digital watermarking system and
framework. The watermarking solution must cause minimal disturbances
on the workflow for the customer to adopt the solution because changes
in workflow can be more expensive for the customer than the watermarking solution. The requirements for the applications in Table 22.1 are
chosen to minimize workflow problems. For example, in forensic tracking,
because each piece of content can require a unique ID at the time of
distribution, the embedder must be very efficient (as shown in Table 22.1)
to not cause workflow troubles.
However, workflow is extremely dependent on system details and
cannot be generalized. For example, determining whether the input to the
detector is SDI, MPEG-2, or analog video can be critical, but it is highly
system dependent. As such, workflow is not considered further within
this chapter.
SUMMARY
In summary, this framework includes classifications, definitions of
distinct but related watermark applications, six important watermark
algorithm parameters, requirements for applications in terms of these
parameters, and workflow. This digital watermark framework is helpful to
technology and solution providers in designing an appropriate watermark algorithm and solution, and to customers for evaluating the
watermark technology and related solutions.
23
Digital Rights Management for Consumer Devices
Jeffrey Lotspiech
INTRODUCTION
The ongoing digital revolution has presented both a threat and a promise
to the entertainment industry. On one hand, digital technologies promise
inexpensive consumer devices, convenient distribution and duplication, and flexible new business models. On the other hand, digital copies
are perfect copies, and the industry risks losing substantial revenue
because of casual end-user copying and redistribution. The entertainment industry and the technology companies that service it have been
attacking this problem with increasing sophistication. This chapter outlines the history of this effort and concludes with some discussion of
a possible future direction.
Digital rights management in consumer electronics devices, to date, is
about restricting unauthorized copies. As such, it has been the subject
of much controversy and misunderstanding. "Bits want to be free" is
John Perry Barlow's catchy phrase [1], and others, even cryptographers
such as Bruce Schneier, have joined the chorus. Schneier [2] argues that
any technology that tries to prevent the copying of bits is doomed to
failure. This is probably true, but misses the point. Bits can be freely
copied, moved, and exchanged, but it is possible that the bits themselves
are encrypted. The devices that have the keys necessary to decrypt the
MACROVISION™
Soon after video cassette recorders (VCRs) hit the market in the late
1970s, the race was on for a technology that would prevent consumers
from simply copying prerecorded movies for their friends. As early as 1979,
Figure 23.1.
DIVX
After examining a successful copy protection technology like CPRM, it
might provide some balance to consider a copy protection scheme
that failed in the market. Divx was a technology developed by Circuit
City together with some Hollywood-based investors. It was an innovative
idea, with some interesting consumer and distribution advantages that
the company seemed singularly unable to articulate. Fundamentally, Divx
was a rental replacement technology. Built on top of the DVD technology,
the Divx player would allow the consumer to play the disk for a 48-h
period. If the user wanted to play it again later or if he had obtained the
disk from a friend who had already used the 48-h period, that was
permitted, but it would require an additional payment.
Divx was implemented by requiring the player to have a phone connection to a clearinghouse and periodically to call and report usage. If the
consumer blocked the phone calls, the player would eventually refuse
to play disks. A consumer might be able to obtain a few weeks of free play
by that trick, but it would be a one-time attack.
What were the advantages of the Divx technology? It had the potential to revolutionize the way the movie rental business works. Today,
the most difficult aspect of renting movies, from the point of view of the
outlet offering them, is maintaining inventory. In other words, the outlets
must keep track of the returned movies. Most retail stores shy away
from this complication. Divx disks, however, were designed not to be
$(g^x)^y = (g^y)^x$.
There is no efficient way to calculate $x$ if you are given the
remainder of $g^x$ after dividing by some number (such as a large
prime), even if you know $g$. Thus, if $g$ and $x$ are large numbers,
finding $x$ is an intractable calculation.
All the devices in the system agree upon a number $g$ and a prime $P$;
these are wired in at the factory. One device picks a random $x$ and
transmits the remainder of $g^x$ after dividing by $P$; the other device picks a
random $y$ and transmits the remainder of $g^y$ after dividing by $P$. The first
device raises $g^y$ to the $x$ power; the second device raises $g^x$ to the $y$
power. Thus, they both calculate the remainder of $g^{xy}$ after dividing by $P$,
and this becomes the common key in which they encrypt other keys. In
effect, they have established a secret value while having a completely
public conversation.
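The exchange just described can be sketched as follows. This is a toy illustration, not the DTCP protocol: the prime `P` (the Mersenne prime 2^127 - 1) and the base `G` are assumptions chosen only so that the example runs quickly, the function names are hypothetical, and no signatures or elliptic curve arithmetic are included.

```python
import secrets

# Toy public parameters, assumed for illustration only; real systems use
# much larger, carefully chosen values "wired in at the factory".
P = 2**127 - 1  # a known Mersenne prime
G = 3


def make_keypair():
    """Pick a random private exponent and compute the public value."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    public = pow(G, private, P)             # remainder of G**private mod P
    return private, public


def shared_key(their_public, my_private):
    """Raise the other device's public value to our private exponent."""
    return pow(their_public, my_private, P)


if __name__ == "__main__":
    x, gx = make_keypair()  # first device
    y, gy = make_keypair()  # second device
    # Only gx and gy cross the (public) channel.
    print(shared_key(gy, x) == shared_key(gx, y))
```

Both devices arrive at the remainder of g^(xy) after dividing by P, although neither private exponent was ever transmitted.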
In DTCP, each device signs its Diffie-Hellman messages with the
private key that corresponds to the public key in its certificate; this is
called an authenticated Diffie-Hellman protocol. Second, the signatures
in the messages and the values in the Diffie-Hellman protocol are not
integers, but points on an elliptic curve. Although a discussion of this
elliptic curve cryptography is too detailed for this chapter, it does not
change the underlying Diffie-Hellman mathematical principle. Instead,
The other values that are used to calculate the domain key are a domain identifier and, of
course, a media key from a media key block. The latter guarantees that only compliant
devices can calculate the domain key.
12
eBay is a trademark of eBay Inc.
REFERENCES
1. Barlow, J.P., The economy of ideas, Wired, Vol. 2.03, March 1994.
2. Schneier, B., The Futility of Digital Copy Prevention, http://schneier.com, May 15,
2001.
3. Fiat, A. and Naor, M., Broadcast encryption, in Advances in Cryptology: Crypto '93,
Lecture Notes in Computer Science Vol. 773, Springer-Verlag, Berlin, 1994,
pp. 480–491.
4. Wallner, D.M., Harder, H.J., and Agee, R.C., Key management for multicast: issues
and architectures, IETF draft wallner-key, July 1997. ftp://ftp.ietf.org/internet-drafts/
draft-wallner-key-arch-01.txt.
5. Wong, C.K., Gouda, M., and Lam, S., Secure group communications using key
graphs, in Proceedings of ACM SIGCOMM '98, 1998, p. 68.
6. Naor, D., Naor, M., and Lotspiech, J., Revocation and tracing routines for stateless
receivers, in Advances in Cryptology: Crypto 2001, Lecture Notes in Computer
Science Vol. 2139, Springer-Verlag, Berlin, 2001, p. 41.
24
Adult Image Filtering for Internet Safety
Huicheng Zheng, Mohamed Daoudi,
Christophe Tombelle, and Chabane Djeraba
INTRODUCTION
Internet filters allow schools, libraries, companies, and personal computer users to manage the information that end users can access. Filters
may be needed for legal, ethical, or productivity reasons. On the
legal side, many new laws require schools and libraries to enforce an
Internet safety policy that protects minors and adults from access to
visual content that is obscene, child pornography, or harmful
to minors. More generally:
Parents can check their home computer for evidence of inappropriate usage by children, as well as protect their children from files
that may exist there already.
Teachers can use the content filters to make sure no improper documents have compromised school and university computer systems. So, schools need to limit access to the Web sites so children
will not be exposed to objectionable or inappropriate documents.
Companies can use the content filters to make sure their systems
are not contaminated with adult or otherwise unsuitable documents
and to determine if files were downloaded during office hours.
Paper Objective
This chapter is aimed at the detection of adult images appearing on the
Internet. Indeed, images are an essential part of today's World Wide Web.
Statistics gathered from more than 4 million HTML Web pages reveal that 70.1%
of Web pages contain images and that, on average, there are about 18.8
images per HTML Web page [10]. These images are mostly used to make
Web content attractive or to add graphical items, such as navigational
arrows, to mostly textual content.
However, images are also contributing to harmful (e.g., pornographic)
or even illegal (e.g., pedophilic) Internet content. Therefore effective
filtering of images is of paramount importance in an Internet filtering
solution.
Protecting children from harmful content on the Internet, such as
pornography and violence, is increasingly a research topic of concern.
Fleck et al. [11] detect naked people with an algorithm involving a skin
filter and a human figure grouper. The WIPE system [12] uses Daubechies
wavelets, moment analysis, and histogram indexing to provide semantically meaningful feature vector matching. Jones and Rehg [13] propose
techniques for skin color detection and simple features for adult image
detection. Bosson et al. [14] propose a pornographic image detection
system that is also based on skin detection and the multilayer perceptron
(MLP) classifier.
Figure 24.1.
Figure 24.2. The output of skin detection is a gray-scale skin map with the
gray level indicating the probability of skin.
where (xs, xt, ys, yt) > 0 are parameters that should be set to satisfy
the constraints. This model is referred to as the first-order model (FOM).
Parameter estimation in the context of MaxEnt is still an active research
subject, especially in situations where the likelihood function cannot
be computed for a given value of the parameters. This is the case here.
We use the Bethe tree approximation to deal with parameter estimation
and build a model TFOM (tree approximation of FOM). We use the BP
algorithm to obtain an exact and fast solution. The detailed description
can be found in our previous work [21].
Our TFOM model outperforms the baseline model in Reference 13,
as shown in Reference 21 in the context of skin pixel detection rate and
false-positive rate. The output of skin detection is a skin map indicating
the probabilities of skin on pixels. In Figure 24.2, skin maps of different
kinds of people are shown.
Vezhnevets et al. [22] recently compared some of the most widely used skin
detection techniques and concluded that our skin detection algorithm [21]
gives the best performance in terms of pixel classification rates.
Figure 24.3. The original input image (left); the global fit ellipse on the skin
map (middle); the local fit ellipse on the skin map (right).
six features are computed on the largest skin region of the input image:
(1) distance from the centroid of the largest skin region to the center of
the image, (2) angle of the major axis of the LFE from the horizontal axis,
(3) ratio of the minor axis to the major axis of the LFE, (4) ratio of the area
of the LFE to that of the image, (5) average skin probability inside the LFE,
and (6) average skin probability outside the LFE.
Evidence from Reference 14 shows that the MLP classifier offers a
statistically significant performance advantage over several other approaches, such
as the generalized linear model, the k-nearest-neighbor classifier, and the
support vector machine. In this chapter, we adopt the MLP classifier. We
train the MLP classifier on 5084 patterns from the training set. In the
testing phase, the MLP classifier makes a quick decision on the pattern
in one pass, and the output is a number $o_p \in [0, 1]$, corresponding to the
degree of adult content. One can set a proper threshold to obtain the binary decision.
EXPERIMENTAL RESULTS
All experiments are made using the following protocol. The database
contains 10,168 photographs, which are imported from the Compaq
Database and the Poesia Database. It is split randomly into two equal parts,
with 1,297 adult photographs and 3,787 other photographs
in each part. One part is used as the training set and the other one,
the test set, is left aside for the receiver operating characteristics curve
computation. The performance figures are initially collated into a
confusion matrix (Table 24.1), where the letters are defined as follows:
Actual            Harmless (Accepted) Pages    Harmful (Blocked) Pages    Total
Harmless pages    a                            b                          A
Harmful pages     c                            d                          B
Total             C                            D
Figure 24.4.
            Harmful    Harmless    Total    Precision    Recall    F-Measure
Harmful     910        200         1110     0.82         0.91      0.86
Harmless    90         800         890      0.90         0.80      0.85
Total       1000       1000        2000
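The precision, recall, and F-measure values in the table above can be reproduced from its counts. The sketch below assumes the balanced F-measure, F = 2PR / (P + R), which is not stated in this excerpt but matches the tabulated numbers; the function name `precision_recall_f` is hypothetical.

```python
def precision_recall_f(true_pos, false_pos, false_neg):
    """Return (precision, recall, balanced F-measure) for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Harmful class: 910 correct, 200 harmless pages wrongly included,
    # 90 harmful pages missed.
    print([round(v, 2) for v in precision_recall_f(910, 200, 90)])
    # Harmless class: 800 correct, 90 harmful pages wrongly included,
    # 200 harmless pages missed.
    print([round(v, 2) for v in precision_recall_f(800, 90, 200)])
```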
Figure 24.5. Experimental results on nonadult images. Below the images are
the associated outputs of the MLP.
Figure 24.6.
The elapsed time of image filtering is about 0.18 sec per image.
Compared with 6 min in Reference 11 and 10 sec per image in Reference
12, our system is more practical. Figure 24.5 shows some examples of
nonadult images with the corresponding neural network outputs.
However, there are also some cases where this detector does not
work well. In Figure 24.6, several such examples are presented. The
first adult image is not detected because the skin appears almost white
due to overexposure. We see that most of the skin is not detected on
the skin map. The second adult image contains two connected large
frames. The LFE of this image will then be very large, and the average
skin probability inside this LFE will be very small. The third image is
benign, but it is classified as adult because the toy dog has a skinlike
color and the average skin probabilities inside the GFE and the LFE are
very high. The fourth image is a portrait, but it is classified as adult because it
exposes a lot of skin and even the hair and the clothes have skinlike
colors. We believe that skin detection based solely on color information
cannot do much more, so perhaps some other types of information are
needed to improve the adult image detection performance. For example,
some kind of face detector could be implemented to improve the results.
However, generally, adult images in Web pages tend to appear together
and are surrounded by text, which could be an important clue for the
adult content detector.
To improve the performance of our filters, we can use a face detector in the adult image filter. Research in face detection has progressed
significantly. The best systems recognize 90% of faces, with about
5% false positives. This is good performance and getting much better.
In 3–5 years, the computer vision community will have many good face-detection methods. This might help in identifying pornography, because
skin with a face is currently more of a problem than skin without a face.
Face-detection technology probably can be applied to very specific body
parts; text and image data and connectivity information also will help.
However, distinguishing semantic concepts such as hard-core from soft-core pornography will remain difficult. These terms are rough, and the
relationships between visual features of images and these semantic
concepts are not evident. In general, images tend to appear together and
are surrounded by text in Web pages, so combining image filtering with text analysis
could improve the performance of our image filters.
REFERENCES
1. van Rijsbergen, C. J., Information Retrieval, Butterworths, Stoneham, MA, 1979.
2. Salton, G. and McGill, M. J., Introduction to Modern Information Retrieval, McGraw-Hill,
New York, 1983.
3. Frakes, W. B. and Baeza-Yates, R., Information Retrieval: Data Structures & Algorithms,
Prentice-Hall, Englewood Cliffs, NJ, 1992.
4. Lesk, M., Books, Bytes, and Bucks, Morgan Kaufmann, San Francisco, 1997.
5. Strzalkowski, T., Ed., Natural Language Information Retrieval, Kluwer Academic,
Boston, 1999.
6. Rosch, E., Principles of categorization, in Cognition and Categorization, Rosch, E. and
Lloyd, B. B., Eds., Lawrence Erlbaum Associates, Hillsdale, NJ, 1978, pp. 27–48.
7. Spitz, L. and Dengel, A., Eds., Document Analysis Systems, World Scientific, Singapore,
1995.
8. Jonassen, D. H., Semantic network elicitation: tools for structuring hypertext, in
Hypertext: State of the Art, McAleese, R. and Green, C., Eds., Intellect Limited, Oxford,
1990.
25
Combined Indexing and Watermarking of 3-D Models Using the Generalized 3-D Radon Transforms
Petros Daras, Dimitrios Zarpalas,
Dimitrios Tzovaras, Dimitrios Simitopoulos,
and Michael G. Strintzis
INTRODUCTION
Increasingly in the last decade, improved modeling tools and scanning
mechanisms as well as the World Wide Web are enabling access to and
widespread distribution of high-quality three-dimensional (3-D) models.
Thus, companies or copyright owners who present or sell their 3-D
models are facing copyright-related problems.
Watermarking techniques have long been used for the provision of
robust copyright protection of multimedia material [1,2] as well as for
multimedia annotation, with indexing and labeling information [3]. In
f x dx
25:1
x2Cg, l
f x dx
25:2
x2g, l
$$\mathrm{RIT}_f(g) = \int_{x \in L(g)} f(x)\,dx \qquad (25.3)$$
The CIT is a slight modification of the RIT where the cylinder CYLg is
used instead of the line Lg. The radius of the cylinder is Th and its axis
Figure 25.1.
$$\mathrm{CIT}_f(g) = \int_{x \in CYL(g)} f(x)\,dx \qquad (25.4)$$
$\sqrt{|x|^2 - |x \cdot g|^2} \le Th$.
The discrete form of CIT, which will be used for the actual extraction of
the shape descriptors, is given by
$$\mathrm{CIT}_f(g_i) = \sum_{x_j \in CYL(g_i)} f(x_j), \quad i = 1, \ldots, N_{CYL}, \; j = 1, \ldots, J \qquad (25.5)$$
where $N_{CYL}$ is the total number of cylinders and $J$ is the total number
of points $x_j$. An illustration of CIT is given in Figure 25.1, where the red
dots indicate the points $x_j$, the green line segments indicate the lines
$L(g_i)$, and the yellow cylinders $CYL(g_i)$ indicate the cylindrical integration
area.
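The discrete CIT can be sketched directly from its definition: for each cylinder orientation, sum the function values of the sample points that fall inside that cylinder. The sketch below makes assumptions that the excerpt does not confirm: each orientation is a unit vector through the origin, the cylinder extends along the whole axis in both directions, and a point lies inside when its distance from the axis does not exceed the radius Th. The function name `discrete_cit` is hypothetical.

```python
import math


def discrete_cit(points, values, directions, radius):
    """Sum f(x_j) over the points x_j inside each cylinder CYL(g_i).

    points     : list of (x, y, z) sample points x_j
    values     : list of function values f(x_j), same length as points
    directions : list of unit vectors g_i (assumed to pass through the origin)
    radius     : cylinder radius Th
    """
    result = []
    for g in directions:
        total = 0.0
        for x, fx in zip(points, values):
            along = sum(xc * gc for xc, gc in zip(x, g))  # x . g
            sq_norm = sum(xc * xc for xc in x)            # |x|^2
            # Distance from x to the axis; max() guards against rounding.
            dist = math.sqrt(max(sq_norm - along * along, 0.0))
            if dist <= radius:
                total += fx
        result.append(total)
    return result


if __name__ == "__main__":
    pts = [(0, 0, 1), (0, 0.05, 2), (1, 0, 0), (0.5, 0.5, 0)]
    vals = [1.0, 1.0, 1.0, 1.0]
    dirs = [(0, 0, 1), (1, 0, 0)]
    print(discrete_cit(pts, vals, dirs, radius=0.1))
```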
3-D MODEL PREPROCESSING
A 3-D model $M$ is composed of a set of vertices $V$ and a set of connections
between the vertices. Each vertex $v_i$ has three coordinates in the
Cartesian space, $v_i = \{x_i, y_i, z_i\}$. Before applying the proposed transform, a
Model rotation and translation. Let $Q$ be the class of vectors for all
pairs of vertices of the 3-D model. The vector $q_1$ is calculated,
where $|q_1| = \max\{|q| : q \in Q\}$. Further, the most distant vertex $v_d$
from $q_1$ and its projection $O'$ to $q_1$ are found. Then, the vector
$q_2 = \overrightarrow{O'v_d}$ is formed. The point $O' = \{\bar{x}, \bar{y}, \bar{z}\}$ is the new origin of the
model. The model is translated so that the new origin coincides
with the old origin:
$$x_i' = x_i - \bar{x}, \quad y_i' = y_i - \bar{y}, \quad z_i' = z_i - \bar{z} \qquad (25.6)$$
2.
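The translation step can be sketched as follows. This is an illustration under assumptions: "most distant vertex from q1" is read as the vertex farthest from the line through the two endpoints of q1, only the translation is performed (the rotation and scaling steps are not shown), and the function name `translate_to_new_origin` is hypothetical. The pairwise search is quadratic in the number of vertices, which is acceptable only for small models.

```python
import math
from itertools import combinations


def _sub(a, b):
    return tuple(ac - bc for ac, bc in zip(a, b))


def _dot(a, b):
    return sum(ac * bc for ac, bc in zip(a, b))


def translate_to_new_origin(vertices):
    """Translate the model so that O' (projection of v_d onto q1) is the origin."""
    # q1: the longest vector between any pair of vertices.
    a, b = max(combinations(vertices, 2), key=lambda p: math.dist(p[0], p[1]))
    q1 = _sub(b, a)
    q1_len_sq = _dot(q1, q1)

    def project(v):
        # Orthogonal projection of v onto the line through a with direction q1.
        t = _dot(_sub(v, a), q1) / q1_len_sq
        return tuple(ac + t * qc for ac, qc in zip(a, q1))

    # v_d: the vertex farthest from that line; O' is its projection.
    v_d = max(vertices, key=lambda v: math.dist(v, project(v)))
    new_origin = project(v_d)
    moved = [_sub(v, new_origin) for v in vertices]
    return moved, new_origin


if __name__ == "__main__":
    verts = [(0, 0, 0), (10, 0, 0), (3, 2, 0), (5, -1, 1)]
    moved, origin = translate_to_new_origin(verts)
    print(origin, moved)
```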
The translated, rotated, and scaled model is then placed into a bounding
sphere with radius Ra d max .
Figure 25.2.
25:7
1
1 expddc
25:8
N
CYL
X
CITf gi W jgk gi j; , dc
25:9
i1
i 1, . . . , N
25:10
N
X
g 0 ti
25:11
i1
F3 g
N
X
gti
25:12
i1
F4 g maxfgti g minfgti g,
i 1, . . . , N
25:13
k 1, 2, 3, 4
25:14
Gk Fk CITf , ,
k 1, 2, 3, 4
25:15
and
k, j 1, 2, 3, 4
25:16
Bkj Fj Gk ,
k, j 1, 2, 3, 4
25:17
and
25:18
i 1, . . . , NCYL
25:19
Figure 25.3.
2.
$$M_{i1} = \frac{1}{N_{S_i}} \sum_{j=1}^{N_{S_i}} D_{ij1} = \frac{1}{N_{S_i}} \sum_{j=1}^{N_{S_i}} v_{ij1}^T \mathbf{j}_i, \quad i = 1, \ldots, K \qquad (25.20)$$
Whenever NPi is odd, the watermarking procedure simply bypasses the vertex with
minimum projection length in the set Si2 .
Figure 25.4.
$$M_{i2} = \frac{1}{N_{S_i}} \sum_{j=1}^{N_{S_i}} D_{ij2} = \frac{1}{N_{S_i}} \sum_{j=1}^{N_{S_i}} v_{ij2}^T \mathbf{j}_i, \quad i = 1, \ldots, K \qquad (25.21)$$
where $D_{ij2}$ is the distance between each vertex $v_{ij2} \in CYL(g_i)$ and
the axis $L(g_i)$ in $S_{i2}$ (Figure 25.6) and $N_{S_i}$ is the total number of
vertices in set $S_{i2}$.
The difference $a_i = M_{i2} - M_{i1}$ is then calculated. If $a_i > 0$, the
watermark is embedded in the vertices of the set $S_{i2}$, otherwise,
Figure 25.5. Example of the creation of two sets. For the cylinder with
orientation $g_1$, $i = 1$, the vertices $\{v_{111}, \ldots, v_{141}\}$ with projection lengths
$\{l_{11}, \ldots, l_{14}\}$ form the set $S_{11}$. The vertices $\{v_{152}, \ldots, v_{182}\}$ with projection lengths
$\{l_{15}, \ldots, l_{18}\}$ form the set $S_{12}$.
vW
ij
8
< vij
if ai bi > 0
if ai bi < 0
vij dDi ji
25:22
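$$a_i = M_{i2} - M_{i1} = \frac{1}{N_{S_i}} \sum_{j=1}^{N_{S_i}} v_{ij2}^T \mathbf{j}_i - \frac{1}{N_{S_i}} \sum_{j=1}^{N_{S_i}} v_{ij1}^T \mathbf{j}_i \qquad (25.23)$$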
Let the watermark be embedded in the vertices of the set Si1 . Then,
NS
W
aW
i Mi2 Mi1
Thus,
8
NS
NS
>
>
1 Xi T
1 Xi T
>
>
v
j
v ji
i
>
ij2
>
NSi j1 ij1
>
< NSi j1
aW
i
>
>
NS
NS
>
>
1 Xi T
1 Xi
>
>
vij2 ji
vij1 dDi ji T ji
>
: NS
NS
i
j1
NS
1 Xi T
1 Xi W T
vij2 ji
v ji
NSi j1
NSi j1 ij1
if ai bi > 0
25:23a
if ai bi < 0
25:23b
j1
Watermark Detection
The block diagram of the watermark detection procedure is depicted
in Figure 25.7. Let $M^d$ be the model, $v^d_{ij}$ the vertices, $S^d_{i1}$ and $S^d_{i2}$ the sets,
$D^d_{ij}$ the distances, $M^d_{i1}$ and $M^d_{i2}$ the mean values of the distances, and $a^d_i$
the difference $M^d_{i2} - M^d_{i1}$, after geometric attacks. The watermark is
detected as follows:
1. The CIT vector $u_{CIT}^{d}$ of the model $M^{d}$ is calculated and the values
of its components are sorted in descending order. The first $K$ of
the values are selected and the corresponding cylinders $\mathrm{CYL}_{g_i}$
are identified.
2. As in step 1 of the embedding procedure, in each selected $\mathrm{CYL}_{g_i}$
the two sets $S_{i1}^{d} = \{v_{ij1}^{d}\}$ and $S_{i2}^{d} = \{v_{ij2}^{d}\}$ are found according to their
$l_{ij}^{d} = (v_{ij}^{d})^{T} g_i$. The sets $S_{i1}^{d}$ and $S_{i2}^{d}$ are, however, identical to the sets
$S_{i1}$ and $S_{i2}$ because the projections of their vertices onto the axis of
the cylinder they belong to remain the same. Further, $M_{i1}^{d} = M_{i1}^{W}$
and $M_{i2}^{d} = M_{i2}^{W}$ and $a_i^{d} = a_i^{W}$. Thus, the watermark sequence can be
easily extracted using the formula:
$$a_i^{d} = \operatorname{sign}\left(M_{i2}^{d} - M_{i1}^{d}\right), \qquad i = 1, \ldots, N_{Pi} \qquad (25.24)$$
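The sign-based extraction rule can be illustrated with a minimal Python sketch for a single cylinder. It is not the chapter's implementation and rests on several assumptions: each vertex is reduced to its scalar distance from the cylinder axis, the bit is represented as +1 or -1, the first set is always the one modified (the text selects the set according to the sign of the difference), and the shift rule with its `margin` parameter is a hypothetical choice made only so that the example is complete.

```python
from statistics import mean


def mean_difference(dist_s1, dist_s2):
    """Return a = M2 - M1, the difference of the mean axis distances."""
    return mean(dist_s2) - mean(dist_s1)


def embed_bit(dist_s1, dist_s2, bit, margin=0.05):
    """Embed one bit (+1 or -1) so that sign(M2 - M1) equals the bit.

    If the sign of a already agrees with the bit, nothing is changed.
    Otherwise every distance in S1 is shifted by the same amount, chosen
    here (hypothetically) as |a| + margin, so that the new difference
    has magnitude `margin` and the required sign.
    """
    a = mean_difference(dist_s1, dist_s2)
    if a * bit > 0:
        return list(dist_s1), list(dist_s2)
    shift = -bit * (abs(a) + margin)
    return [d + shift for d in dist_s1], list(dist_s2)


def detect_bit(dist_s1, dist_s2):
    """Recover the bit as the sign of the mean-distance difference."""
    return 1 if mean_difference(dist_s1, dist_s2) > 0 else -1


s1 = [1.0, 1.2, 0.8]   # axis distances of the vertices in the first set
s2 = [1.5, 1.4, 1.6]   # axis distances of the vertices in the second set

w1, w2 = embed_bit(s1, s2, bit=-1)
recovered = detect_bit(w1, w2)
# Reordering the vertices leaves the means, and hence the bit, unchanged.
recovered_reordered = detect_bit(w1[::-1], w2[::-1])
```

Because only the two mean values enter the decision, the detector needs neither the original model nor the original vertex order, which is consistent with the robustness to vertex reordering claimed for the method.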
$$\mathrm{Precision} = \frac{N_{detection}}{N_{detection} + N_{false}} \qquad (25.26)$$
$$\mathrm{Recall} = \frac{N_{detection}}{N_{detection} + N_{miss}} \qquad (25.27)$$
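The two retrieval measures can be computed directly from the three counts. The short Python sketch below assumes the usual reading of the symbols, which the surviving text does not spell out: `n_detection` is the number of relevant models retrieved, `n_false` the number of irrelevant models retrieved, and `n_miss` the number of relevant models that were not retrieved. The function name and the handling of empty denominators are illustrative choices.

```python
def precision_recall(n_detection, n_false, n_miss):
    """Return (precision, recall) for one query.

    n_detection: relevant models that were retrieved
    n_false:     irrelevant models that were retrieved
    n_miss:      relevant models that were not retrieved
    """
    retrieved = n_detection + n_false
    relevant = n_detection + n_miss
    # Guard against empty result sets or empty classes (assumed convention: 0.0).
    precision = n_detection / retrieved if retrieved else 0.0
    recall = n_detection / relevant if relevant else 0.0
    return precision, recall


# Example: 8 models retrieved, 6 of them relevant, 4 relevant models missed.
precision, recall = precision_recall(n_detection=6, n_false=2, n_miss=4)
```

Sweeping the number of retrieved models and recording the resulting pairs gives the kind of precision-recall diagram used in the comparisons.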
Figure 25.8. Comparison of the proposed method (CIT) against the method
proposed in Reference 9 in terms of precision-recall diagram using the
Princeton database.
Figure 25.9. Query results using the proposed method in the Princeton
database. The query models are depicted in the first horizontal line.
the precision of the proposed method is 13% higher than the method
in Reference 9.
Figure 25.11 illustrates the results produced by the proposed
method in the new database. The models in the first horizontal line
are the query models and the rest are the first seven retrieved models.
The similarity between the query model and the retrieved ones is
obvious.
Figure 25.10. Comparison of the proposed method (CIT) against the method
proposed in Reference 9 in terms of precision-recall diagram using the new
database.
These results were obtained using a personal computer (PC) with a
2.4-GHz Pentium IV processor running Windows 2000. On average, the
time needed for the extraction of the feature vectors for one 3-D model
is 12 sec, whereas the time needed for the comparison of two feature
vectors is 0.1 msec. Clearly, even though the time needed for the extraction of the feature vectors is relatively high, the retrieval performance is
excellent.
Experimental Results for 3-D Model Watermarking
The proposed 3-D model watermarking technique for data hiding was
tested using models from the above databases. It was specifically tested
for the following:
1. Robustness to geometric attacks and vertex reordering. The geometric attacks tested were translation, rotation, and uniform
scaling. Due to the preprocessing steps applied to each
model prior to embedding and detecting a watermark sequence,
the percentage of correct extraction, as expected, was 100% for
K = 16, 24, and 32 bits. Similarly, because the coordinates of the
vertices do not depend on their order, the percentage of correct
Figure 25.11. Query results using the proposed method in the new database.
The query models are depicted in the first horizontal line.
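2.
$$\mathrm{SNR} = \frac{\sum_{i=1}^{N_M} \left(x_i^{2} + y_i^{2} + z_i^{2}\right)}{\sum_{i=1}^{N_M} \left[\left(x_i - x_i^{W}\right)^{2} + \left(y_i - y_i^{W}\right)^{2} + \left(z_i - z_i^{W}\right)^{2}\right]} \qquad (25.28)$$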
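The SNR of Equation 25.28 compares the energy of the original vertex coordinates with the squared displacement introduced by embedding. A minimal Python sketch follows; the function name, the tuple-based vertex representation, and the convention of returning infinity for an unmodified model are assumptions made for illustration.

```python
def snr(original, watermarked):
    """Ratio of coordinate energy to squared embedding distortion.

    Both arguments are equal-length sequences of (x, y, z) vertex tuples,
    with the i-th watermarked vertex corresponding to the i-th original one.
    """
    if len(original) != len(watermarked):
        raise ValueError("vertex lists must have the same length")
    signal = sum(x * x + y * y + z * z for x, y, z in original)
    noise = sum(
        (x - xw) ** 2 + (y - yw) ** 2 + (z - zw) ** 2
        for (x, y, z), (xw, yw, zw) in zip(original, watermarked)
    )
    # An untouched model has zero distortion (assumed convention: infinite SNR).
    return float("inf") if noise == 0 else signal / noise


model = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
marked = [(1.0, 0.0, 0.5), (0.0, 2.0, 0.0)]   # one vertex displaced by 0.5 along z
ratio = snr(model, marked)
```

A higher ratio means the watermarked vertices lie closer to the originals, which is the sense in which the measure relates to imperceptibility.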
3.
Figure 25.13.
Figure 25.14.
Figure 25.16.
Figure 25.18.
CONCLUSIONS
A novel technique for 3-D model indexing and watermarking, based on a
generalized Radon transform (GRT), was presented. The form of the GRT
implemented was the cylindrical integration transform (CIT). After proper
The overall method can be applied to any given model without any preprocessing to fix model degeneracies.
The descriptor vectors are invariant with respect to translation, rotation, and scaling of a 3-D model.
The complexity of the object-matching procedure is minimal, because matching involves a simple comparison of vectors.
The watermarking method is robust to geometric attacks such as translation, rotation, and uniform scaling.
The watermarking method is robust to vertex reordering attacks.
The watermark is imperceptible regardless of the length of the watermark sequence.
The extraction of the watermark sequence is very fast and accurate.
26
Digital Rights Management Issues for Video
Sabu Emmanuel and Mohan S. Kankanhalli
INTRODUCTION
A huge number of digital assets involving media such as text, audio,
and video are being created these days. Digital asset management involves the creation, transfer, storage, and consumption of these
assets. It is chiefly in the context of transfer that Digital
Rights Management (DRM) becomes a major issue. The term DRM refers
to a set of technologies and approaches that establish a trust relationship among the parties involved in the creation and transaction of a digital asset. The creator creates a digital asset and the owner owns it.
The creators need not be the owners of a digital asset, as in the case
of employees creating a digital asset for their company. In this case,
although the employees are the creators, the ownership of the digital
asset can reside with the company. The digital asset needs to be transferred from the owner to the consumer, usually through a hierarchy of
distributors. Therefore, the parties involved in the creation
and transaction of a digital asset are creators, owners, distributors, and consumers.
Each of these parties has its own rights; namely, creators have creator
rights, owners have owner rights, distributors have distributor rights, and
be noted that in this chain, the created digital asset moves from the
creator to the consumer through the owner and distributors. Also, the
fee for the digital asset flows from the consumer to the creator through
the owner and distributors. This can be seen in Figure 26.1. Often, the owner
has already paid the creators the fee for the creation of the digital asset,
as in the case of the employee-employer relationship. Therefore, the
owner is the one concerned about the loss of revenue due to piracy
and unauthorized usage. The owners can deliver the digital asset directly
to the consumers, or they may use a hierarchy of distributors (Figure 26.1).
The hierarchy of distributors is preferred for reasons of
business scalability and viability. Each of the distributors
may have its own existing consumer base and infrastructure,
which the owners can easily tap into to deliver their digital
asset to the consumers. When there is a hierarchy of distributors, building trust relationships between the owner and the distributor, and between
the distributor and the subdistributor, becomes more complex. This is
because, whereas the owner is concerned about revenue loss due
to malpractices of the distributors and of the end consumer, the
distributors are concerned about false framing by the owners
and about revenue loss due to the
malpractices of the subdistributors and the end consumer. Because the
consumer is at the end of the distribution chain, the consumer is concerned
about the digital asset, its quality and integrity, and also false framing by
the distributor or owner [11]. There should also be a trust relationship
Authentication is needed to reliably identify each consumer or distributor, so that in the event of a DRM contract
violation, the violating party can be identified. This
can be implemented using authentication protocols. To trace
the consumer or distributor who violated the DRM contract and to declare the
ownership of the digital asset, digital watermarking techniques can be
used. The watermarking technique should be robust against
attacks and noninvertible, so that it can prove the ownership of the digital asset.
Because the distribution is subscription based, only the consumers (those who
have paid the subscription fee) should receive a clear video for viewing,
whereas nonconsumers should not. This
is the confidentiality requirement, and it can be implemented
using encryption techniques.
Next, we list the owner's requirements for free distribution with
archival importance:
For tamper detection, one can employ fragile watermarks. For tracing
the consumer or distributor who has tampered with the video and also
to declare the ownership of the video, noninvertible robust watermarking techniques can be used. Authentication can be implemented
using authentication protocols.
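To make the idea of tamper detection with a fragile watermark concrete, the following Python sketch embeds a keyed checksum in the least significant bits (LSBs) of a frame. It is a generic textbook-style LSB scheme, not a method described in this chapter; the frame is modeled as a flat list of 8-bit pixel values, and the function names, the choice of HMAC-SHA256, and the key are all hypothetical.

```python
import hashlib
import hmac

MAC_BITS = 256  # SHA-256 output length in bits


def _mac_bits(pixels, key):
    """HMAC-SHA256 of the frame with the carrier LSBs cleared, as a bit list."""
    cleared = [p & 0xFE for p in pixels[:MAC_BITS]] + list(pixels[MAC_BITS:])
    digest = hmac.new(key, bytes(cleared), hashlib.sha256).digest()
    return [(byte >> (7 - k)) & 1 for byte in digest for k in range(8)]


def embed_fragile(pixels, key):
    """Write the MAC bits into the LSBs of the first 256 pixels."""
    if len(pixels) < MAC_BITS:
        raise ValueError("frame must contain at least 256 pixels")
    bits = _mac_bits(pixels, key)
    carrier = [(p & 0xFE) | b for p, b in zip(pixels, bits)]
    return carrier + list(pixels[MAC_BITS:])


def is_intact(pixels, key):
    """True if the embedded LSBs still match the MAC of the frame."""
    bits = _mac_bits(pixels, key)
    return all((p & 1) == b for p, b in zip(pixels, bits))


frame = [(7 * i) % 256 for i in range(1024)]   # stand-in for 8-bit pixel data
key = b"hypothetical-owner-key"

marked = embed_fragile(frame, key)
tampered = list(marked)
tampered[500] ^= 0x10                          # alter one pixel outside the carrier
```

Any change to the frame, whether to the carrier bits or to the remaining pixels, causes `is_intact` to fail, which is exactly the fragility wanted for archival material. Tracing and ownership claims need a separate robust watermark, as the text notes.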
For free distribution with no archival importance, there need not
be any rights associated, as the digital asset is free to use in whichever
way the consumer desires.
Rights Languages
In order to manage the rights of the parties, the rights are to be specified in a machine-understandable way [1,51-53]. For this purpose,
a rights data dictionary of allowed words and allowed constructs of
these words is defined. The allowed constructs are often defined using
an XML schema, which forms the rights language. One of the rights
languages is the eXtensible rights Markup Language (XrML), defined by
ContentGuard Inc., and another is the Open Digital Rights Language (ODRL)
from IPR Systems Pty Ltd. Both have their own rights data dictionary
and vocabulary for the language. These languages express the terms and
conditions over any content, including permissions, constraints, obligations, offers, and agreements with rights holders. XrML is designed to be
used in either single-tier or multi-tier channels of distribution with the
TABLE 26.1.
Identifier Codes                                      Type of Content
BICI (Book Item and Component Identifier)             Book content
SICI (Serial Item and Contribution Identifier)        Serial content
DOI (Digital Object Identifier)                       Any creation
URI (Uniform Resource Identifier)                     Any creation
ISAN (International Standard Audiovisual Number)      Audiovisual programs
ISBN (International Standard Book Number)             Books
ISMN (International Standard Music Number)            Printed music
ISRC (International Standard Recording Code)          Sound recordings
ISSN (International Standard Serial Number)           Serials
ISWC (International Standard Musical Work Code)       Musical works
UMID (Unique Material Identifier)                     Audiovisual content
Index
A
AAP. See Association of American
Publishers
Access control lists, 34
Access keys, secure distribution of, 12
Adaptive filtering topology, 71
ADSL. See Asymmetric Digital Subscriber
Line
Adult image filtering, 715732
content filtering, image categorization
and, 720
experimental results, 727730
image detection, 726727
internet safety, 715732
multimedia filtering approaches, 720722
paper objective, 72723
skin detection, 723726
first-order model, 725726
methodology, 723725
notations, 723725
structure of system, 723
Advanced encryption standard, 96, 99103
AES decryption, 102103
AES encryption, 99101
AES key schedule, 101102
Advanced Television Systems Committee,
15
Aegis, by Maples and Spanos, 1995, 110111
AES. See Advanced Encryption Standard
AFMA. See American Film Marketing
Association
American Film Marketing Association, 10
Analog copy protection methods, 5
Analog recording, characteristics of, 5
Anticollusion fingerprinting for multimedia,
594596
Architectural works
copyright protection extended to, 9
Audio watermarking, high-capacity
real-time, with perfect correlation
sequence, repeated insertion,
283310
Audio watermarking real-time,
high-capacity, perfect correlation
sequence/repeated insertion,
283310
experimental results, 294308
experimental audio clip, PN binary
data profiles, 297298
using perfect sequences, 298299
using uniformly redundant array,
299303
proposed watermarking technique,
291294
audio similarity measure, 293294
watermark embedding, 291
watermark extraction, 291292
zero sidelobe, sequences with
autocorrelation of, 286291
correlation properties, 287
correlation property, 289291
perfect sequences, 286288
product theorem, 287288
synthesis of perfect sequences,
286287
synthesis of URAs, 289
uniformly redundant array,
288291
AudioFlatness, 266
Audiovisual works, copyright, 8
Australian Copyright Amendment Digital
Agenda Act 2000, 772
Authentication, 529672. See also under
specific technique
digital watermarking, 391
integrity, entity authentication,
contrasted, 624
scalable image, 605628
signature-based, 629672
tree-based signatures, 35
Avalanche property, loss of, 137138
B
Benchmarking, digital watermarking,
249250
Berne Convention, 8, 49
United States joins, 9
Binary images, watermarking for, 415419
exterior watermark embedding methods,
417419
Broadcast encryption, 707708
Broadcast flag, 21, 705707
Broadcast monitoring, digital
watermarking, 257
Brussels Convention Relating to
Distribution of Programme-Carrying
Signals Transmitted by Satellite, 49
Brute force attack, 98
BSA. See Business Software Alliance
Bulky data, slow speed, trade-off between,
135136
Business models, 5053
Business Software Alliance, 10
Business software applications, losses due
to piracy, 11
C
Cable, security, 1821
Capability certificates, 34
Captioning. See Labeling
Categories of copyrightable, not
copyrightable items, 8
Centralized group control, 30
Centralized Tree-Based Key Management,
32
Chameleon cipher, 598599
Chaos-based encryption for digital images,
videos, 133168
chaos-based image, video encryption,
147156
chaos-based video encryption,
154156
chaos-based watermarking, 154
fractallike curves, image encryption
schemes based on, 149151
joint image, video encryption schemes,
142143
selective encryption, 140142
special features, 138139
survey, 139147
two-dimensional chaotic maps, image
encryption schemes based on,
147149
generic video encryption schemes,
146147
MPEG encryption schemes, 145146
video encryption schemes, 145147
Yen et al.s image encryption schemes,
151153
Chaos-based watermarking, 154
Chaotic Key-Based Algorithm, 152
Consumer devices, digital rights
management (Continued)
macrovision, 692693
personal digital domain, 708712
recordable media, content protection on,
697700
serial copy management system,
693695
Consumer Electronics Association, 15, 45
Consumers, Schools, and Libraries Digital
Rights Management Awareness Act
of 2003 bill, draft legislation, 50
Content authentication, digital
watermarking, 255256
Content Protection System Architecture, 42
Content scrambling system, 695697
Content streaming, defined, 651652
Content transformation, various domains,
652
Context-based arithmetic coding, discrete
wavelet transform, 143
Continuous-tone watermarking techniques,
402408
robust watermark structure, 402403
RST robust watermarking, 403408
Contrast masking, 442
Conversational multimedia scenario,
174175
Copy protection, digital watermarking,
253254
Copy Protection Technical Working Group,
15
Copy Protection Technologies CPT
subgroup of technical module, 45
Copyright
defined, 69
derivation of term, 6
limitations on rights provided by, 6
notable dates in U.S. history of, 9
securing of, timing, 7
Copyright Industries in U.S. Economy, 10
Copyright law
codified as Title 17 of U.S. Code, 6, 9
first revision of, 9
fourth revision of, 9
international, 8
second revision of, 9
third revision of, 9
Copyright piracy, trade losses due to, 11
Copyright system in Library of Congress,
centralization of, 8
Copyright Treaty, World Intellectual
Property Organization, 49, 772
D
DACA. See Australian Copyright
Amendment Digital Agenda Act 2000
DARPA. See Defense Advance Research
Projects Agency
Data embedding, 473475
Data encryption key, 39
Data Encryption Standard, 96
Data hiding, 529672
digital watermarking, 392
lossless, 531548
categories, 533541
for fragile authentication, 533535
for high embedding capacity,
535538
for semifragile authentication, 538541
watermarking, 742747
selecting regions for watermarking,
743
watermark detection, 746747
watermark embedding, 743746
Data rate controller, reversible watermarks,
512
Data source authentication, 3435
multicast, 29
Decision boundary, modifying, 96
DeCSS, software utility, 42
Defense Advance Research Projects
Agency, 28
Degradation. See Light encryption
Delayed rekeying, 40
DES, Data Encryption Standard, 96
Desynchronization attacks, 554, 582
Detection algorithms, 6569
asymmetric detectors, 67
correlation-based detectors, 6567
quantization-based detector, 6769
quantized index modulation, 6768
quantized projection watermarking,
6869
Detector structure, secure, 8490
algorithm, 84, 8788
attacker choices, 8890
fractal generation, 85
modifying decision boundary, 8586
practical implementation, 8687
Diagonal edge, 441
Difference expansion, reversible
watermarks, 493528
comparison with other algorithms,
522524
cross-color embedding, 513514, 519522
decoding, 482484
definition, 506507
difference expansion, 505506
dyad-based GRIT, 500501
embedding, 505507
experimental results, 515524
generalized reversible integer transform,
498505
LSB embedding in GRIT domain, 506
payload size, 510512
quad-based GRIT, 503505
recursive, cross-color embedding,
513514
recursive embedding, 513514
spatial quads, 517519
spatial triplets, 515517
translation registration, decoding,
482484
vector definition, 498499
Diffie-Hellman method
message sequence chart for, 184
multimedia internet keying, 183184
Digital rights management (Continued)
macrovision, 692693
personal digital domain, 708712
recordable media, content protection
on, 697700
serial copy management system,
693695
video, 759788
asset management, 763764
business aspect, 769771
consumers, 769
content declarations, 778
creators, 766
cryptographic techniques, 773774
digital cinema, 782784
digital video broadcasts, 780781
digital video delivery, 763764
distributors, 768769
legal aspect, 771772
owners, 766767
pay-per-usage, 770
pay-per-usage-per usage constrained,
770
preview, 770
purchase, 770
rights languages, 777778
rights requirements, 764769
social aspect, 771
storage, 763
stored media, 781782
subscription based, 770
technical aspect, 772784
trading protocols, 776777
usage controls, 764
watermarking techniques, 775776
Digital signature technique, 618
Digital systems, universe of, 13
Digital transmission content protection,
701704
Digital video, credibility problems, 623
Digital Video Broadcasting, 17
Digital video broadcasts, video digital
rights management, 780781
Digital video delivery, 763764
Digital watermarking, 675690
algorithm parameters, 683686
annotation, 678
applications, 677683
authentication, 680
broadcast, internet, 679680
for broadcast monitoring, 257
classification of applications, 251
classifications, 676677
for binary images, 415419
blind, nonblind watermarking, 395
capability, 393394
challenges to watermarking by using
mobile cameras, 426429
classifications of digital watermarking,
394396
continuous-tone watermarking
techniques, 402408
copyright protection, 391
data hiding, 392
digital watermarking system, 389
exterior watermark embedding
methods, 417419
extracting watermark using mobile
cameras, 419429
features of digital watermarking,
392394
frequency domain watermarking, 390
generalized scheme for watermark
embedding, 390
halftone image watermarking
techniques, 408413
for halftone images, 410413
halftoning technique, defined, 408410
imperceptibility, 392
interior watermark embedding
methods, 416417
limitation of watermarking technology,
397
mobile phone, 420422
printed images, 399
for printed images, 400402
printed textual images, 400
for printed textual images, 413414
problems of watermarking for printed
images, 400401
public, private watermarking, 396
reversible, inseparable watermarking,
395396
robust, fragile watermarking, 395
robust watermark structure, 402403
robustness, 392393
roles of watermarking printed for
textual images, 413414
RST robust watermarking, 403408
spatial-domain watermarking,
389390
standardization of watermarking
technology, 397398
for text documents, 414415
watermarking scheme, 424425
watermarking system, 425426
E
ECMs. See Entitlement Control Messages
Economic impact of piracy, 1112
Economy importance of copyright
industries, 1011
Embedding distortion, 223
Embedding one bit in spatial domain,
digital watermarking techniques,
224227
Embedding reversible watermark, 508509
EMMs. See Entitlement Management
Messages
Encryption, 93218
compression, trade-off between, 136
dependence on compression, 136
streaming media, 197218
challenges, 200205
digital data stream encryption,
209211
dynamic network challenge, 204
enabling transcoding without
decryption, 211212
encryption in RTP, 206207
loss resilient scalability, 213215
PGS streaming media encryption
algorithm, 212213
potential bit-rate increase, 201202
potential cost increase, 201
protocol, 200
rate variation challenge, 202204
real-time constraint, 200201
sample streaming media encryption
system, 207209
scalable streaming media encryption,
209215
streaming media, defined, 198
streaming media system, 198200
transcoding challenge, 204205
vs. authentication, 624625
Encryption security, 598
Encryption standard, 99103
Encryption techniques, 95132
advanced encryption standard, 99103
AES decryption, 102103
AES encryption, 99101
AES key schedule, 101102
audio/speech encryption techniques,
29130
G.723.1 Speech Codec by Wu and Kuo,
2000, selective encryption algorithm
for, 128
MP3 Security Methods by Thorwirth,
Horvatic, Weis, and Zhao, 2000,
129130
Perception-Based Partial Encryption
Algorithm by Servetti and
De Martin, 2002, 128129
cryptanalysis, 9899
image encryption techniques, 124127
Entity authentication, integrity
authentication, contrasted, 624
Entropy codec, video, 107
Error tolerability, 139
ESA. See Entertainment Software
Association
Evaluation confusion matrix, 728
Experimental verification, 280
Extensible Markup Language, 776
Extensible Markup Language documents,
776
EXtensible rights Markup Language, 14
Extracting watermark using mobile
cameras
F
Fair use doctrine, 67
False-negative errors, 226
False-positive errors, 226
File sharing, 51
Filtering, image, for internet safety, 715732
Filtering of images, adult, 715732
adult image detection, 726727
content filtering, image categorization
and, 720
experimental results, 727730
multimedia filtering approaches, 720722
paper objective, 72723
skin detection, 723726
first-order model, 725726
methodology, 723725
notations, 723725
structure of system, 723
Fingerprint size, 265
Fingerprint streams, 267
Fingerprintblock, 267
Fingerprinting, 255, 529672
audio, robust identification, 261282
digital media, 577604
anticollusion fingerprinting for
multimedia, 594596
attacks, 582586
in broadcast channel environment,
596599
chameleon cipher, 598599
definitions, 578581
figures of merit, 599600
fingerprinting code, 578580
limitations, 589
multimedia collusion, 583586
notation, 582583
Fragile watermarks, 256
Frequency division, multibit watermarking,
244
Frequency domain
image, 106
video, 106
Frequency domain watermarking, 390
G
G.723.1 Speech Codec by Wu and Kuo, 2000,
selective encryption algorithm for,
128
Generalized three-dimensional radon
transform, 737738
three-dimensional models,
watermarking, indexing, 733759
Generic attack, 6984
on linear detectors, 7176
on QIM scheme, 7680
on quantization-based schemes, 7683
on quantized projection scheme, 8083
Generic framework, for system security
evaluation, 666
Geneva Convention for Protection of
Producers of Phonograms Against
Unauthorized Duplication of Their
Phonograms, 49
Geometric attacks, robustness, 751752
Geometric distortion
image watermarking resistant to, 331358
capacity, 355
complexity, 355
embedding without exact inversion,
355356
experimental results, 349352
geometric distortions, 334342
rotation invariance, 353355
RST-invariant watermarking method,
342349
watermark system design, 345349
watermarking framework, 333334
image watermarking robust to, JPEG
compression, 467492
message encoding, 472475
template detection, 479482
template embedding in DFT domain,
475478
training sequence embedding in DWT
domain, 472475
translation registration, decoding,
482484
H
Halftone image watermarking techniques,
408413
halftoning technique, defined, 408410
roles of watermarking printed for textual
images, 413414
watermarking techniques for halftone
images, 410413
watermarking techniques for printed
textual images, 414
HCIE. See Hierarchical Chaotic Image
Encryption
Health supervision, 419
Hearable frequency spectrum, audio
quality layers, 129130
Heterogeneous networks
security in, intellectual property
multimedia, 173
streaming media over, to various
devices, layout, 202
Hierarchical Chaotic Image Encryption, 152
Hierarchical key-based schemes, 31
Hierarchical key distribution trees, 31
Hierarchical node-based schemes, 31
Hierarchy of intermediaries
watermarking multicast video with, 588
watermarking with, 3738
High-capacity real-time audio
watermarking, perfect correlation
sequence/repeated insertion,
283310
audio similarity measure, 293294
correlation properties, 287
correlation property, 289291
experimental audio clip, PN binary data
profiles, 297298
experimental results, 294308
perfect sequences, 286288
product theorem, 287288
proposed watermarking technique,
291294
synthesis of perfect sequences,
286287
synthesis of URAs, 289
uniformly redundant array, 288291
using perfect sequences, 298299
using uniformly redundant array,
299303
watermark embedding, 291
watermark extraction, 291292
zero sidelobe, sequences with
autocorrelation of, 286291
High definition content protection,
704705
High embedding capacity, lossless data
hiding, 535538
Hitachi, Ltd., 15
Home networking environments, efforts
addressing security in, 45
Home supervision, 419
Horizontal edge, 441
Huffman tree mutation process, 122
Hybrid signatures, 35
I
IANA. See Internet Assigned Numbers
Authority
IDCT. See Inverted discrete cosine
transform
Identification card, 419
IETF. See Internet Engineering Task
Force
IGMP. See Internet Group Management
Protocol
IIPA. See International Intellectual Property
Alliance
Ill-posed operator, secure image
authentication, 365378
singular-value decomposition, linear
ill-posed operators, 366367
verification procedure, 373378
watermark generation, 367368
watermarking process, 368373
Image authentication, fragile watermarking,
359386
attacks/countermeasures, 378384
cropping, 381382
estimation, set of keys, 382384
stego image attack, 384
swapping attack, 381
vector quantization attack, 378381
conventional watermarking, 362365
Image watermarking robust to both
geometric distortion, JPEG
compression, 467492
watermark embedding, 472478
message encoding, 472475
template embedding in DFT domain,
475478
training sequence embedding in DWT
domain, 472475
watermark extraction with
resynchronization, 479484
template detection, 479482
translation registration, decoding,
482484
Immediate rekeying, 40
Imperceptibility, 312
evaluation of, in digital watermarking,
245248
Imperceptible watermarking in compressed
domain, 440446
generation of spreading sequences,
440441
perceptual analysis, 441443
quantized domain embedding, 443446
Indexing, watermarking, three-dimensional
models
generalized three-dimensional radon
transforms, 733759
using generalized three-dimensional
radon transforms, 733759
Individual sender authentication, 35
Industrial property, 4
Informed coding
watermark embedding, 241
watermarking, 235238
Informed detector, 223
Informed embedding, 233
watermark embedding, 241
watermarking, 233235
Informed watermark detection, 242
Inseparable watermarking, 395396
Integrity authentication, entity
authentication, contrasted, 624
Intellectual property
creation of, 4
protection for, 11
Intellectual property, categories of, 4
Intellectual property multicast security, 29
Intellectual property multimedia, 169196
basic mechanisms, 171173
conversational multimedia scenario,
174175
countermeasures, 170171
Internet Open Trading Protocol, 776
Internet Protocol Security, 776
Internet Research Task Force, 27
Inverse DWT, 475
Inverted discrete cosine transform, 319321
IP. See Intellectual property
IR, immediate rekeying, 40
IRTF. See Internet Research Task Force
J
JPEG2000, 617
ability for content oriented encoding, 617
applications of, 651652
labeling, 614615
JPEG compression
image watermarking robust to, 467492
message encoding, 472475
template detection, 479482
template embedding in DFT domain,
475478
training sequence embedding in DWT
domain, 472475
translation registration, decoding,
482484
watermark embedding, 472478
watermark extraction with
resynchronization, 479484
image watermarking robust to both
geometric distortion, 467492
JPEG Images, encryption methods, by
Droogenbroeck and Benedett, 2002,
126127
K
Key ciphers, symmetric, 12
Key encryption key, 39
Key management, multicast group, 29
Keystream, 135
Known-plaintext attack, 98, 135
L
Labeling, image content, relationship
between, 616617
Labeling technique, digital signature
technique, 618
Legacy content, 279
Legal solutions for security protection,
4550
M
MAC. See Message Authentication Code
Macrovision, 692693
MACs. See Multiple Message Authentication
Codes
Magnetic storage capacity, 5
Main technologies, spread spectrum, 242
Management schemes, classification of, 31
Masking, watermark, cover merging, 242
Masking thresholds, calculating, 318319
Matsushita Electric Industrial Co., Ltd., 15
Media authentication, signature-based,
629672
complete authentication, 636638
content authentication, 632, 639649
begin, 644645
content hashing, 641642
crypto hashing, 643644
end, 645
fragile authentication, 639
input, 644
nonhashing, 640641
output, 645649
semifragile authentication, 640649
system setup, 644
data integrity, 631
data vs. content, 632
date authentication, 630
digital signature schemes, 631632
incidental distortion, 632633
intentional distortion, 632633
JPEG2000 authentication framework,
651652
nonrepudiation, 631
one-way hash function, 631
performance evaluation, 661669
error correction coding, 666668
feature extraction, 664666
hash, 662663
Media authentication, signature-based
(Continued)
system robustness evaluation,
668669
system security evaluation, 661668
signature-based content authentication,
634636
unified signature-based authentication
framework for JPEG2000 images,
649661
complete authentication mode,
654655
semifragile content authentication
(lossless), 658661
semifragile content authentication
(lossy), 655658
Member authentication, 34
Message Authentication Code, 35
MHT-Encryption Scheme and MSI-Coder,
by Wu and Kuo, 2000, 2001, 121123
Mirror-Like Image Encryption, 153
Mis-identification rates, 280
Mitsubishi Electric Corporation, 15
MLIE. See Mirror-Like Image Encryption
Mobile cameras, extracting watermark
using, 419429
about current mobile market, 419420
advertisers, 422423
challenges to watermarking by using
mobile cameras, 426429
mobile phone, 420422
watermarking scheme, 424425
watermarking system, 425426
Modern cryptography, overview, 96103
Modern symmetric key cryptosystems, 96
Modifying decision boundary, secure
detector structure, 8586
Modulated watermark, 223
Motion Picture Association of America,
10, 16
Motion pictures
added to classes of protected works, 9
copyright, 8
losses due to piracy, 11
Moving Picture Expert Group, 17, 2526
Moving Picture Expert Group video data,
quantized, robust watermark
detection from, 437466
detector performance under attacks,
455462
experimental results, 454462
imperceptible watermarking in
compressed domain, 440446
robustness, 326329
scaling, 329
spread spectrum signal, 313315
watermark in DCT domain considering
HVS, 315317
Multimedia content protection, players in,
1517
Multimedia data hiding, 529672
Multimedia encryption, 93218
advanced encryption standard, 99103
AES decryption, 102103
AES encryption, 99101
AES key schedule, 101102
audio/speech encryption techniques,
29130
G.723.1 Speech Codec by Wu and Kuo,
2000, selective encryption algorithm
for, 128
MP3 Security Methods by Thorwirth,
Horvatic, Weis, and Zhao, 2000,
129130
Perception-Based Partial Encryption
Algorithm by Servetti and De
Martin, 2002, 128129
cryptanalysis, 9899
image encryption techniques, 124127
Partial Encryption Algorithms by
Cheng and Li, 2000, 124126
Selective Bitplane Encryption
Algorithm by Podesser, Schmidt,
and Uhl, 2002, 127
selective encryption methods for
Raster and JPEG Images by
Droogenbroeck and Benedett,
2002, 126127
modern cryptography, overview,
96103
multimedia security, 103105
public key cryptosystems, 9798
symmetric key cryptosystems, 9697
video encryption techniques, 105124
Aegis by Maples and Spanos, 1995,
110111
Format-Compliant Configurable
Encryption by Wen et al., 2002, 123
MHT-Encryption Scheme and
MSI-Coder by Wu and Kuo, 2000
and 2001, 121123
Partial Encryption Algorithms for
Videos by Cheng and Li, 2000,
120121
SECMPEG by Meyer and Gadegast,
1995, 109110
N
Nonblind watermarking, 395
O
Oblivious watermark detection, 242
ODRL. See Open Digital Rights Language
OeBF. See Open eBook Forum
Offline security, 607
Old, new technologies for distribution,
storage, difference between, 4
OMA. See Open Mobile Alliance
Omnibus Trade and Competitive Act of
1988, 11
Open Digital Rights Language, 13–14
Open eBook Forum, 27
Open Mobile Alliance, 14
Open-system security, 607
Optical storage capacity, 5
P
Pantomimes, copyright, 8
Parallel embedding/detection, still image,
multidimensional watermark,
311–330
joint detection, 321–326
hypothesis testing, 322–323
joint detection, 324–326
multidimensional watermark embedding,
313–321
embedding of multidimensional
watermark, 317–321
spread spectrum signal, 313–315
watermark in DCT domain considering
HVS, 315–317
robustness, 326–329
JPEG compression, 326
low-pass filtering, 326–329
multiple watermarking, 329
scaling, 329
Partial Encryption Algorithms by Cheng
and Li, 2000, 124–126
Partial Encryption Algorithms for Videos,
by Cheng and Li, 2000, 120–121
Patchwork, spatial domain watermarking
technique, 227–228
Pattern matching, blind, 549–576
attack steps, 559–562
algorithm A0, 560
algorithm A1, 560–561
algorithm A2, 561–562
block substitution, 562
computing replacement, 559–562
search for substitution base, 559
signal partitioning, 559
attacking multimedia protection systems
with, 549–576
audio signals, 562–569
analysis of similarity function,
564–567
audio processing for BPM attack,
562–563
effect of attack on watermark
detection, 567–569
images, 569–573
image processing for BPM attack,
569–571
Personal digital domain, 708–712
Philips audio fingerprinting, 266–278
extraction algorithm, 267–270
false-positive analysis, 270–273
guiding principles, 266–267
search algorithm, 273–278
Photographs, added to protected works, 9
Pictorial works, copyright, 8
Piggybacking, 35
Pioneer Electronic Corporation, 15
Piracy
economic impact of, 11–12
as growing threat, 36
Plaintext, 12
Podesser, Schmidt, and Uhl, Selective
Bitplane Encryption Algorithm
by, 127
Practical implementation, secure detector
structure, 86–87
Preshared key method
intellectual property multimedia,
multimedia internet keying, 181–182
message sequence chart for, 181
Preview, video digital rights management,
770
Printed images. See also Printed materials
watermarking, 399
watermarking for, 400–402
problems of watermarking for printed
images, 400–401
watermarking techniques for printed
images, 401–402
Printed materials
digital watermarking, 387–436
digital watermarking technology,
387–436
about current mobile market, 419–420
administration of watermarking
technology, 397
advertisers, 422–423
applications of digital watermarking,
390–392
authentication, 391
blind, nonblind watermarking, 395
capability, 393–394
challenges to watermarking by using
mobile cameras, 426–429
classifications of digital watermarking,
394–396
continuous-tone watermarking
techniques, 402–408
copyright protection, 391
data hiding, 392
Printed textual images
watermarking, 400
watermarking for, 413–414
Prints, added to protected works, 9
Prior art, video authentication, scalable
image, 610–616
content-based multimedia, 615–616
medium oriented authentication
techniques, 613–615
stream signatures, 610–613
traditional systems, 610–613
Private key cipher, 134
Private watermarking, 396
Projective geometric codes, digital media
fingerprinting, 592–594
Protection scheme vulnerabilities, 63–92
current detection algorithms, 65–69
asymmetric detectors, 67
correlation-based detectors, 65–67
quantization-based detector, 67–69
quantized index modulation, 67–68
quantized projection watermarking,
68–69
generic attack, 69–84
on linear detectors, 71–76
on QIM scheme, 76–80
on quantization-based schemes, 76–83
on quantized projection scheme,
80–83
secure detector structure, 84–90
algorithm, 84
algorithm performance, 87–88
attacker choices, 88–90
fractal generation, 85
modifying decision boundary, 85–86
practical implementation, 86–87
Public key cipher, 12, 13, 135
Public key cryptosystems, 97–98
Public-key method
intellectual property multimedia,
multimedia internet keying,
182–183
message sequence chart for, 182
Public watermarking, 396
Q
Quadtree decomposition, digital images,
143
Quadtree for black and white binary image,
example of, 125
Quantized projection embedding, ACF with,
81
Quantized projection watermarking,
quantization-based detector, 68–69
R
Random Seed Encryption Subsystem, 153
Raster
encryption methods, by Droogenbroeck
and Benedett, 2002, 126–127
JPEG Images, selective encryption
methods for, by Droogenbroeck
and Benedett, 2002, 126–127
Reading watermark, restoring original
image, 509–510
Real-time audio watermarking, high-capacity, perfect correlation
sequence/repeated insertion,
283–310
experimental results, 294–308
experimental audio clip, PN binary
data profiles, 297–298
using perfect sequences, 298–299
using uniformly redundant array,
299–303
proposed watermarking technique,
291–294
audio similarity measure, 293–294
watermark embedding, 291
watermark extraction, 291–292
zero sidelobe, sequences with
autocorrelation of, 286–291
correlation properties, 287
correlation property, 289–291
perfect sequences, 286–288
product theorem, 287–288
synthesis of perfect sequences,
286–287
synthesis of URAs, 289
uniformly redundant array, 288–291
Real-time transport protocol, secure,
186–194
key usage, 189–190
protocol overview, 186–189
efficiency, 188–189
protecting RTCP, 188
protecting RTP, 187–188
SRTP, key management, 192–194
key refresh, 193–194
rekeying, 192–193
SRTP processing, 190–191
Reversible watermarks (Continued)
generalized reversible integer transform
dyad-based GRIT, 500–501
vector definition, 498–499
LSB embedding in GRIT domain, 506
recursive, cross-color embedding
cross-color embedding, 514
recursive embedding, 513–514
translation registration, decoding,
quad-based GRIT, 503–505
using difference expansion, 493–528
RFCs. See Internet Engineering Task Force
IETF Request for Comments
Rights expression languages, 13
Rights management, digital, consumer
devices, 691–714
Robust hash, 641
Robust identification
audio, watermarking/fingerprinting,
261–282
audio fingerprint definition, 263–264
audio fingerprinting parameters,
265–266
audio/speech encryption techniques,
127–130
extraction algorithm, 267–270
false-positive analysis, 270–273
fingerprints, watermarks, compared,
278–280
guiding principles, 266–267
Philips audio fingerprinting, 266–278
search algorithm, 273–278
Robust watermark detection, from
quantized MPEG video data,
437–466
detector performance under attacks,
455–462
experimental results, 454–462
imperceptible watermarking in
compressed domain, 440–446
generation of spreading sequences,
440–441
perceptual analysis, 441–443
quantized domain embedding,
443–446
modeling of quantized DCT domain data,
446–448
performance analysis, statistical
detectors, 452–454
watermarking detection, 448–452
detection based on hypothesis testing,
448–451
statistical detectors, 451–452
S
Satellite, security, 18–21
Scalability, 138
Scalable image, video authentication,
605–628
application, 621
authentication, vs. encryption,
624–625
entity authentication, integrity
authentication, contrasted, 624
integrity authentication, 607–608
labeling, image content, relationship
between, 616–617
method design, 607–610, 616–623
modern coding standards, 609
multimedia data formats, 608
multimedia security, 624–625
multiscale image labeling, 616
prior art, 610–616
content-based multimedia, 615–616
medium oriented authentication
techniques, 613–615
stream signatures, 610–613
traditional systems, 610–613
Scalable streaming media encryption,
209–215
digital data stream encryption,
209–211
enabling transcoding without
decryption, 211–212
loss resilient scalability, 213–215
PGS streaming media encryption
algorithm, 212–213
Scalable streaming media encryption
scheme, transcoding without
decryption, 211–212
Scrambled video signal, 597
Scrambling, 18
video, 105
SCTE. See Society of Cable
Telecommunications Engineers
Sculptural works, copyright, 8
SECMPEG, by Meyer and Gadegast, 1995,
109–110
Secure detector structure, 84–90
algorithm, 84
algorithm performance, 87–88
attacker choices, 88–90
fractal generation, 85
modifying decision boundary, 85–86
practical implementation, 86–87
Secure real-time transport protocol, 104,
186–194
key management, 192–194
key refresh, 193–194
rekeying, 192–193
key usage, 189–190
processing, 190–191
protocol overview, 186–189
efficiency, 188–189
protecting RTCP, 188
protecting RTP, 187–188
transforms, 191–192
Secure Sockets Layer and Transport Layer
Security, 776
Selective Bitplane Encryption Algorithm
by Podesser, Schmidt, and Uhl,
2002, 127
Selective encryption, 104, 109
Selective Scrambling Algorithm, by Zeng
and Lei, 2002, 123–124
Selective video encryption, 105–124
Aegis by Maples and Spanos, 1995,
110–111
Format-Compliant Configurable
Encryption by Wen et al., 2002, 123
MHT-Encryption Scheme and MSI-Coder
by Wu and Kuo, 2000 and 2001,
121–123
Partial Encryption Algorithms for
Videos by Cheng and Li, 2000,
120–121
SECMPEG by Meyer and Gadegast, 1995,
109–110
Selective Scrambling Algorithm by Zeng
and Lei, 2002, 123–124
Signature-based media authentication
(Continued)
hash, 662–663
system robustness evaluation,
668–669
system security evaluation, 661–668
signature-based content authentication,
634–636
unified signature-based authentication
framework for JPEG2000 images,
649–661
complete authentication mode,
654–655
semifragile content authentication
(lossless), 658–661
semifragile content authentication
(lossy), 655–658
Simulcrypt, 20–21
Single-frame attacks, 582
Skin detection, in filtering of images,
723–726
first-order model, 725–726
methodology, 723–725
notations, 723–725
Smart card fingerprinting system, 600–601
Society of Cable Telecommunications
Engineers, 16
Soft listening, 266
Sony Corporation, 15
Sound recordings, copyright, 8
Source authentication, 35
Spatial domain
image, 106
video, 107
Spatial-domain watermarking, 389–390
Spatial workspace domain, 243
Speech encryption techniques, 127–130
audio fingerprint definition, 263–264
audio fingerprinting parameters,
265–266
Speed of internet connection, 5
Spread spectrum, main technologies, 242
Spread spectrum technique, watermarking,
229–230
SRTP. See Secure Real-time Transport
Protocol
Standard, advanced encryption, 99–103
Standardization, watermarking technology,
397–398
Statistical attacks, 582
Statute of Anne, first copyright law, derived
from English copyright law, 9
Stego image attack, 384
Subgroup control, 30
Subscription based, video digital rights
management, 770
Swapping attack, 381
cropping, 381–382
Symmetric cipher, 134–135
Symmetric key cryptosystems, 96–97
Syntax-awareness, 138
System enhancement, digital watermarking,
257–258
T
TCP. See Transport Control Protocol
TCP/IP. See Transmission Control
Protocol/Internet Protocol
Technical solutions, 12–45
Terrestrial distribution, security, 18–21
Text documents, watermarking for,
414–415
Textual images, watermarking for, 413–414
Textured block, 441
Three-dimensional content-based search,
retrieval, 734–735, 739–741
Three-dimensional model preprocessing,
738–739
Three-dimensional model watermarking,
indexing, using generalized three-dimensional radon transforms,
733–759
Three-dimensional radon transform
generalized, 737–738
three-dimensional models,
watermarking, indexing, 733–759
Time Warner Inc., 15
Timestamp, 587
Title 17 of U.S. Code, copyright law codified
as, 9
Toshiba Corporation, 15
Trade Act of 1974, protection for
intellectual property rights, 11
Trading protocols, video digital rights
management, 776–777
Training sequence, 473
Transaction tracking, 255
Transcodability, 138
Transform DCT, workspace domain, 243
Transform domains, watermarking, 228–233
in DCT domain, 229–230
in DFT domain, 231–232
spread spectrum technique, 229–230
in wavelet domain, 230–231
U
UDP. See User Datagram Protocol
Unintentional attacks, 582
Universal Copyright Convention, 8, 49
Universal Image Quality Index, 248
Universal Plug and Play, 45
Universe of digital systems, 13
U.S. copyright industries, 9–12
U.S. Copyright Office service unit of
Library, 8
U.S. economy, importance of copyright
industries to, 10–11
User Datagram Protocol, 28
V
Variable encryption, defined, 109
VCRs. See Video cassette recorders
VDSL. See Very high bit-rate DSL
Vector quantization attack, 378–381
Vertex reordering, robustness, 751–752
Vertical edge, 441
Very high bit-rate DSL, 5
Victor Company of Japan, Ltd., 15
Video, digital rights management, 759–788
asset management, 763–764
digital video delivery, 763–764
storage, 763
business aspect, 769–771
content declarations, 778
cryptographic techniques, 773–774
digital cinema, 782–784
digital video broadcasts, 780–781
legal aspect, 771–772
pay-per-usage, 770
pay-per-usage-per usage constrained, 770
preview, 770
purchase, 770
rights languages, 777–778
Video, digital rights management
(Continued)
rights requirements, 764–769
consumers, 769
creators, 766
distributors, 768–769
owners, 766–767
social aspect, 771
stored media, 781–782
subscription based, 770
technical aspect, 772–784
trading protocols, 776–777
usage controls, 764
watermarking techniques, 775–776
Video authentication
scalable image, 605–628
application, 621
authentication, vs. encryption,
624–625
content-based multimedia, 615–616
entity authentication, integrity
authentication, contrasted, 624
integrity authentication, 607–608
labeling, image content, relationship
between, 616–617
medium oriented authentication
techniques, 613–615
method design, 607–610, 616–623
modern coding standards, 609
multimedia data formats, 608
multimedia security, 624–625
multiscale image labeling, 616
prior art, 610–616
stream signatures, 610–613
traditional systems, 610–613
scalable image and, 605–628
application, 621
authentication, vs. encryption,
624–625
entity authentication, integrity
authentication, contrasted, 624
integrity authentication, 607–608
labeling, image content, relationship
between, 616–617
method design, 607–610, 616–623
modern coding standards, 609
multimedia data formats, 608
multimedia security, 624–625
multiscale image labeling, 616
Video cassette recorders, 5
Video digital rights management, 759–788
rights requirements
consumers, 769
creators, 766
distributors, 768–769
owners, 766–767
Video Electronics Standards Association,
45
Video Encryption Algorithm
by Qiao and Nahrstedt, 1997, 112–115
by Shi, Wang, and Bhargava, 1998, 1999,
115–118
Video Encryption Methods, by Alattar,
Al-Regib, Al-Semari, 1999, 119–120
Video encryption techniques, 105–124
selective video encryption, 105–124
Aegis by Maples and Spanos, 1995,
110–111
Format-Compliant Configurable
Encryption by Wen et al., 2002, 123
MHT-Encryption Scheme and MSI-Coder by Wu and Kuo, 2000 and
2001, 121–123
Partial Encryption Algorithms for
Videos by Cheng and Li, 2000,
120–121
SECMPEG by Meyer and Gadegast,
1995, 109–110
Selective Scrambling Algorithm by
Zeng and Lei, 2002, 123–124
Video Encryption Algorithm by Qiao
and Nahrstedt, 1997, 112–115
Video Encryption Algorithms by Shi,
Wang, and Bhargava, 1998 and
1999, 115–118
Video Encryption Methods by Alattar,
Al-Regib, and Al-Semari, 1999,
119–120
Zigzag Permutation Algorithm by
Tang, 1996, 111–112
video scrambling, 105
Video scrambling, 105
Video transmission on internet, 621
Vulnerabilities, multimedia protection
schemes, 63–92
current detection algorithms, 65–69
asymmetric detectors, 67
correlation-based detectors, 65–67
quantization-based detector, 67–69
quantized index modulation, 67–68
quantized projection watermarking,
68–69
generic attack, 69–84
on linear detectors, 71–76
on QIM scheme, 76–80
on quantization-based schemes, 76–83
on quantized projection scheme,
80–83
secure detector structure, 84–90
algorithm, 84
algorithm performance, 87–88
attacker choices, 88–90
fractal generation, 85
modifying decision boundary, 85–86
practical implementation, 86–87
W
Watercasting, 37
Watermark, cover merging
addition, 241
masking, 242
quantization, 241
Watermark detection
blind or oblivious, 242
informed, 242
not using original data in, 312
quantization index modulation, 242
robust, from quantized MPEG video
data, 437–466
Watermark detector, 222
Watermark embedding, 291, 472–478
blind, 241
generalized scheme for, 390
informed coding, 241
informed embedding, 241
message encoding, 472–475
template embedding in DFT domain,
475–478
Watermark estimation attack, 554
Watermark extraction, 291–292
Watermark extraction time, 753
Watermark removal, with minimum
distortion, 73
Watermarked cover C^ w, 223
Watermarking, 35–36, 219–528
applications, 221–260
audio, robust identification, 261–282
digital, 675–690
administration of watermarking
technology, 397
advertisers, 422–423
algorithm parameters, 683–686
annotation, 678
applications, 677–683
applications of digital watermarking,
390–392
authentication, 391, 680
Watermarking (Continued)
printed textual images, 400
for printed textual images, 413–414
problems of watermarking for printed
images, 400–401
public, private watermarking, 396
reliability, false positives, 685
remote triggering, 683
reversible, inseparable watermarking,
395–396
robust, fragile watermarking, 395
robust watermark structure, 402–403
robustness, 392–393, 684–685
roles of watermarking printed for
textual images, 413–414
RST robust watermarking, 403–408
spatial-domain watermarking, 389–390
spread spectrum technique, 229–230
standardization of watermarking
technology, 397–398
for text documents, 414–415
watermarking scheme, 424–425
watermarking system, 425–426
in wavelet domain, 230–231
workflow, 688
digital watermarking
applications of, 250–258
for broadcast monitoring, 257
classification of applications, 251
for content authentication, 255–256
for copy protection, 253–254
for copyright protection, 251–253
for fingerprinting, 254–255
for system enhancement, 257–258
digital watermarking applications, 251
benchmarking, 249–250
evaluation of, 240–250
imperceptibility, evaluation of,
245–248
digital watermarking techniques, 222–250
communication, watermarking as, 224
digital watermarking systems, 222–223
embedding one bit in spatial domain,
224–227
informed coding, 235–238
informed embedding, 233–235
multibit payload, 238–239
patchwork, spatial domain
watermarking technique, 227–228
in transform domains, 228–233
distributed, 37
fragile
attacks/countermeasures, 378–384
rotation invariance, 353–355
RST-invariant watermarking method,
342–349
watermark system design, 345–349
watermark detection, 346
watermark embedding, 346–349
watermarking framework, 333–334
Watermarking systems, classification of,
241–244
Watermarking techniques, 221–260
applications, survey of, 221–260
video digital rights management,
775–776
Watermarking technology, digital, new
trends, printed materials, 387–436
Watermarks
fingerprints, compared, 278–280
reversible
algorithm for reversible watermark,
507–510
comparison with other algorithms in
literature, 522–524
cross-color embedding, 513–514,
519–522
data rate controller, 512
decoding, 482–484
difference expansion, 493–528
dyad-based GRIT, 500–501
embedding, 505–507
embedding reversible watermark,
508–509
experimental results, 515–524
generalized reversible integer transform, 498–505
LSB embedding in GRIT domain, 506
payload size, 510–512
quad-based GRIT, 503–505
reading watermark, restoring original
image, 509–510
recursive, cross-color embedding,
513–514
recursive embedding, 513–514
spatial quads, 517–519
spatial triplets, 515–517
translation registration, decoding,
482–484
X
XML, Extensible Markup Language, 776
XrML. See EXtensible rights Markup
Language
Y
Yen, et al., encryption schemes for MPEG
videos, 154
Z
Zero sidelobe, sequences with
autocorrelation, perfect correlation
sequence/repeated insertion,
286–291
correlation properties, 287
correlation property, 289–291
perfect sequences, 286–288
product theorem, 287–288
synthesis of perfect sequences, 286–287
synthesis of URAs, 289
uniformly redundant array, 288–291
Zigzag Permutation Algorithm, by Tang,
1996, 111–112