[Downloads] Rework how hashes are calculated for download files.
Prior to this CL, downloads assumed that there would be a stable
serialization of hash state. This serialization was meant to be
persisted and used if the download was subsequently resumed, thus
avoiding rehashing the portion of the download completed so far.
However, there's no standard serialization for SHA256 hash state. Any
change to the serialization format would render any historical data
invalid. While it's possible to come up with a reasonable and stable
format for hash state, the resulting complexity isn't worth it.
Fortunately, the code for persisting the hash state was never written.
Hence it's not too late to change things.
With this CL, downloads will behave as follows with respect to hashes:
* An in-progress download will calculate the SHA256 hash of the bytes
written to disk so far using a crypto::SecureHash object.
* Progress notifications from DownloadItem / DownloadFile no longer
report a hash state.
* If the download completes successfully, DownloadFile exposes the
crypto::SecureHash object representing the final hash state via the
DownloadDestinationObserver interface. DownloadItemImpl uses this
object to calculate the final hash which is then made available via
DownloadItem::GetHash() for completed downloads.
* In the event of an interruption, DownloadFile will expose the
crypto::SecureHash object representing the hash state.
DownloadItemImpl uses a clone of the hash state to obtain a SHA256
hash of the partial data. This hash is available via
DownloadItem::GetHash() on interrupted downloads. DownloadItemImpl
also keeps the crypto::SecureHash object in case the download is
resumed later.
* Resuming downloads pass the crypto::SecureHash object representing the
partial state via DownloadSaveInfo. If a crypto::SecureHash object
isn't available (e.g. because the download was restored from history),
then DownloadItemImpl can optionally pass along a SHA256 hash of the
partial file. If, for some reason, the partial state of the download is
abandoned (e.g. because of a validation error), then
DownloadRequestCore destroys the crypto::SecureHash object and
resets the prefix hash so that the download can restart from the
beginning.
* When DownloadManagerImpl receives a StartDownload() callback (which
happens when a response is available for a download request), the
crypto::SecureHash object passed within DownloadSaveInfo is used to
construct a new DownloadFile.
* A newly created DownloadFile assumes that a crypto::SecureHash object
passed to it correctly represents the hash state of the partial
file.
* In the absence of a crypto::SecureHash object, DownloadFile reads the
partial file and calculates the partial hash state in a new
crypto::SecureHash object. If a prefix hash value is available, then
the hash of the partial file is compared against this prefix hash. A
mismatch causes a FILE_HASH_MISMATCH error, which in turn causes the
download to abandon its partial state and restart.
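The rules above can be sketched in a few lines. This is a minimal, self-contained model, not Chromium code. `StreamingHash`, `SaveInfo`, `PartialHash`, `InitHashState`, and `AbandonPartialState` are hypothetical names that only loosely mirror crypto::SecureHash, DownloadSaveInfo, and DownloadFile. FNV-1a (64-bit, non-cryptographic) stands in for SHA256 so the example compiles without Chromium. The sketch covers three behaviors: the partial hash is taken from a clone, a passed-in state is adopted as-is, and when no state exists the partial file is rehashed and checked against a prefix hash.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string_view>
#include <utility>

// Stand-in for crypto::SecureHash. Streams FNV-1a (64-bit) instead of SHA256
// so the sketch compiles on its own; FNV-1a is NOT a cryptographic hash.
class StreamingHash {
 public:
  // Feeds bytes as they are written to disk.
  void Update(std::string_view bytes) {
    assert(!finished_);  // Finish() ends the stream, as SHA256 finalization does.
    for (unsigned char c : bytes) {
      state_ ^= c;
      state_ *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
  }
  // Independent copy of the state, analogous to cloning a SecureHash.
  std::unique_ptr<StreamingHash> Clone() const {
    return std::make_unique<StreamingHash>(*this);
  }
  // Finalizes the stream and returns the digest.
  std::uint64_t Finish() {
    finished_ = true;
    return state_;
  }

 private:
  std::uint64_t state_ = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  bool finished_ = false;
};

// Hypothetical stand-in for the hash-related fields of DownloadSaveInfo.
struct SaveInfo {
  std::unique_ptr<StreamingHash> hash_state;  // live state, if still in memory
  std::optional<std::uint64_t> prefix_hash;   // e.g. restored from history
};

enum class InitResult { kOk, kFileHashMismatch };

// Interrupted download: hash the partial data via a clone so the original
// state stays usable if the download is resumed later.
std::uint64_t PartialHash(const StreamingHash& state) {
  return state.Clone()->Finish();
}

// New download file: adopt the passed-in state as-is, or rebuild it from the
// partial file and compare against the prefix hash when one is available.
InitResult InitHashState(std::string_view partial_file, SaveInfo& info,
                         std::unique_ptr<StreamingHash>& out) {
  if (info.hash_state) {
    out = std::move(info.hash_state);  // trusted to match partial_file
    return InitResult::kOk;
  }
  auto rebuilt = std::make_unique<StreamingHash>();
  rebuilt->Update(partial_file);
  if (info.prefix_hash && PartialHash(*rebuilt) != *info.prefix_hash)
    return InitResult::kFileHashMismatch;  // caller abandons partial state
  out = std::move(rebuilt);
  return InitResult::kOk;
}

// Abandoning partial state: drop the hash state and the prefix hash so the
// next attempt starts from the beginning.
void AbandonPartialState(SaveInfo& info) {
  info.hash_state.reset();
  info.prefix_hash.reset();
}
```

Adopting the passed-in state without reading the file keeps in-memory resumption cheap. The prefix-hash comparison runs only when no state survived, and on a mismatch the caller is left to discard the partial data and restart.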
These rules establish the following invariants:
* All downloads calculate the SHA256 hash of their contents.
* Regardless of how a download is started or resumed, by the time it is
completed successfully, DownloadItem::GetHash() correctly reports the
SHA256 hash of the downloaded bytes.
* Regardless of how a download is started or resumed, an interrupted
download with a received byte count > 0 will always report the correct
SHA256 hash of the partial data in DownloadItem::GetHash().
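The invariants can be checked mechanically. The sketch below assumes FNV-1a (64-bit, non-cryptographic) in place of SHA256. It uses hypothetical helpers (`HashUpdate`, `HashInChunks`, `InvariantsHold`) rather than Chromium APIs. The check interrupts a download after every byte count > 0 and confirms that the partial hash is correct. It also confirms that resuming either with the kept state or by rehashing the partial file ends at the same hash as a single uninterrupted pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// FNV-1a (64-bit) as a non-cryptographic stand-in for SHA256. The state is a
// plain value, so keeping it across an interruption is just a copy.
constexpr std::uint64_t kFnvOffsetBasis = 14695981039346656037ULL;
constexpr std::uint64_t kFnvPrime = 1099511628211ULL;

std::uint64_t HashUpdate(std::uint64_t state, std::string_view bytes) {
  for (unsigned char c : bytes) {
    state ^= c;
    state *= kFnvPrime;
  }
  return state;
}

// Feeds `bytes` as if they arrived in writes of at most `chunk` bytes.
std::uint64_t HashInChunks(std::uint64_t state, std::string_view bytes,
                           std::size_t chunk) {
  assert(chunk > 0);
  for (std::size_t i = 0; i < bytes.size(); i += chunk)
    state = HashUpdate(state, bytes.substr(i, chunk));
  return state;
}

// Interrupts the download after every possible byte count > 0 and checks that
// (1) the partial hash matches a one-shot hash of the bytes on disk, and
// (2) both resumption paths, keeping the in-memory state or rehashing the
//     partial file, end at the one-shot hash of the whole body.
bool InvariantsHold(std::string_view body, std::size_t chunk) {
  const std::uint64_t one_shot = HashUpdate(kFnvOffsetBasis, body);
  for (std::size_t received = 1; received <= body.size(); ++received) {
    const std::string_view on_disk = body.substr(0, received);
    const std::string_view rest = body.substr(received);
    const std::uint64_t state = HashInChunks(kFnvOffsetBasis, on_disk, chunk);
    if (state != HashUpdate(kFnvOffsetBasis, on_disk)) return false;
    const std::uint64_t kept = HashInChunks(state, rest, chunk);
    const std::uint64_t rehashed =
        HashInChunks(HashUpdate(kFnvOffsetBasis, on_disk), rest, chunk);
    if (kept != one_shot || rehashed != one_shot) return false;
  }
  return true;
}
```

Both checks hold because a streaming hash depends only on the byte sequence, not on how the bytes were split across writes or attempts. That property is what lets the hash state be handed from one DownloadFile to the next.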
Note that this CL doesn't add code to persist the hash for an
interrupted download. In order to keep the size of the CL sane, the
download history changes will be made in a follow-up CL.
BUG=7648
BUG=563684
Review URL: https://codereview.chromium.org/1751603002
Cr-Commit-Position: refs/heads/master@{#381158}
diff --git a/content/browser/download/download_manager_impl.cc b/content/browser/download/download_manager_impl.cc
index cdea5c32..ca423c4 100644
--- a/content/browser/download/download_manager_impl.cc
+++ b/content/browser/download/download_manager_impl.cc
@@ -120,6 +120,7 @@
const std::string& last_modified,
int64_t received_bytes,
int64_t total_bytes,
+ const std::string& hash,
DownloadItem::DownloadState state,
DownloadDangerType danger_type,
DownloadInterruptReason interrupt_reason,
@@ -140,6 +141,7 @@
last_modified,
received_bytes,
total_bytes,
+ hash,
state,
danger_type,
interrupt_reason,
@@ -364,17 +366,16 @@
if (info->result == DOWNLOAD_INTERRUPT_REASON_NONE) {
DCHECK(stream.get());
- download_file.reset(file_factory_->CreateFile(
- *info->save_info, default_download_directory, info->url(),
- info->referrer_url, delegate_ && delegate_->GenerateFileHash(),
- std::move(info->save_info->file), std::move(stream),
- download->GetBoundNetLog(), download->DestinationObserverAsWeakPtr()));
-
- if (download_file.get() && delegate_) {
- download_file->SetClientGuid(
- delegate_->ApplicationClientIdForFileScanning());
- }
+ download_file.reset(
+ file_factory_->CreateFile(std::move(info->save_info),
+ default_download_directory,
+ std::move(stream),
+ download->GetBoundNetLog(),
+ download->DestinationObserverAsWeakPtr()));
}
+ // It is important to leave info->save_info intact in the case of an interrupt
+ // so that the DownloadItem can salvage what it can out of a failed resumption
+ // attempt.
download->Start(std::move(download_file), std::move(info->request_handle),
*info);
@@ -617,6 +618,7 @@
const std::string& last_modified,
int64_t received_bytes,
int64_t total_bytes,
+ const std::string& hash,
DownloadItem::DownloadState state,
DownloadDangerType danger_type,
DownloadInterruptReason interrupt_reason,
@@ -642,6 +644,7 @@
last_modified,
received_bytes,
total_bytes,
+ hash,
state,
danger_type,
interrupt_reason,