Save-Page-As-Complete-HTML: Better handling of <object> elements.
Save-Page-As-Complete-HTML saves resources (e.g. images) and subframes
(e.g. html documents) into local files referenced from the main saved
file by rewritten URIs. Before this CL, resources and subframes
embedded via <object> elements were not saved locally. This CL saves
them as well.
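
As a rough illustration of the idea (not the actual Chromium
implementation), the sketch below collects the data attribute of each
<object> element and rewrites it to a local path. The class name, the
helper names, the "page_files" directory and the naming scheme are all
hypothetical, and the string-replace rewriting is deliberately naive.

```python
from html.parser import HTMLParser


class ObjectDataCollector(HTMLParser):
    """Collects the data attribute of every <object> element (illustrative)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "object":
            for name, value in attrs:
                if name == "data" and value:
                    self.urls.append(value)


def local_name_map(urls, directory="page_files"):
    """Assigns each distinct URL a hypothetical local path, in first-seen order."""
    mapping = {}
    for url in urls:
        if url not in mapping:
            basename = url.rsplit("/", 1)[-1] or "resource"
            mapping[url] = f"{directory}/{len(mapping)}_{basename}"
    return mapping


def rewrite_object_data(html, mapping):
    """Rewrites data="..." references via naive string replacement."""
    for url, local in mapping.items():
        html = html.replace(f'data="{url}"', f'data="{local}"')
    return html


page = (
    '<p><object data="http://example.com/a.png"></object>'
    '<object data="http://example.com/frame.html"></object></p>'
)
collector = ObjectDataCollector()
collector.feed(page)
mapping = local_name_map(collector.urls)
saved = rewrite_object_data(page, mapping)
```

Note that the sketch treats both the image and the html document the
same way, as plain resources, which mirrors what this CL does.
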
Long note about the distinction between frames and other savable resources:
- Why it is desirable to distinguish between subframes and other resources:
When reporting savable resources for a frame, we report subframes
separately from all the other resources (e.g. images or css
stylesheets). This is because for those other resources 1) their
content doesn't change after being fetched from the network and 2) we
are not rewriting references to other resources (impossible for
images? possible for stylesheets - bug?). Frames, on the other hand,
1) can change their content at runtime (e.g. via javascript) and we
want to save their current (not original) content and 2) have to have
their links rewritten, so they point to the locally saved resources.
- Why we don't distinguish between frames and other savable resources
in the current CL:
The current CL always reports <object data=...> as resources (never
as frames). This means that the *original* (not current) frame
contents are saved, and that links in the frame's html are not
rewritten to point to the locally saved files.
This is probably ok:
- Not a problem: Saving these resources is an improvement - previously
the images and frames linked from <object> elements would not be
saved locally.
- Not a problem: Saving the original content is not a regression -
previously the links in <object> elements would not be rewritten and
therefore would point to the original URI, so the content would be
fetched again from a server.
- Acceptable problem: Not rewriting links in subframes embedded via
an <object> element means that relative links used by the subframe's
html might be broken (i.e. these links would now be resolved relative
to the locally saved copy of the subframe).
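
To make the "acceptable problem" concrete, here is a small sketch of
how the same relative link resolves against the original subframe URL
versus the locally saved copy. All URLs and paths are made-up examples.

```python
from urllib.parse import urljoin

# Hypothetical locations of the subframe before and after saving.
original_frame_url = "http://example.com/widgets/frame.html"
saved_frame_url = "file:///home/user/saved_files/frame.html"

# A relative link inside the subframe's html that is left unrewritten.
relative_link = "images/logo.png"

# Resolved against the original location, the link reaches the server.
before = urljoin(original_frame_url, relative_link)

# Resolved against the saved copy, the link points at a local path
# where nothing was saved, so it is likely to be broken.
after = urljoin(saved_frame_url, relative_link)
```
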
The "acceptable problem" above seems acceptable because:
- We have a net improvement when saved files are opened without the
ability to fetch the original resources from the network: after the
CL local copies exist, whereas previously some resources/frames would
not load at all.
- Determining whether an <object> contains a subframe or another
resource is not possible in general - a parent frame doesn't have
visibility into the URL and/or mime-type of an out-of-process (OOP)
child frame (this is enforced via the --site-per-process cmdline
switch).
- A long-term fix is being proposed at crrev.com/1442463002
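
The reporting behavior described in the long note above can be
sketched roughly as follows. This is only an illustration under
assumed names (SavableReport, report_savable and the (tag, url) input
shape are hypothetical), not the real code: subframes are reported
separately from other resources, and <object> is always reported as a
resource because the parent frame cannot tell what it contains.

```python
from dataclasses import dataclass, field


@dataclass
class SavableReport:
    """Hypothetical container: subframes are kept apart from resources."""
    resources: list = field(default_factory=list)
    subframes: list = field(default_factory=list)


def report_savable(elements):
    """Classifies (tag, url) pairs the way this CL is described to."""
    report = SavableReport()
    for tag, url in elements:
        if tag in ("iframe", "frame"):
            # Current content is saved and links are rewritten.
            report.subframes.append(url)
        elif tag in ("img", "link", "object"):
            # <object> is always a resource here, even if it holds html.
            report.resources.append(url)
    return report


report = report_savable([
    ("img", "a.png"),
    ("iframe", "child.html"),
    ("object", "embedded.html"),
])
```
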
BUG=553478
Review URL: https://codereview.chromium.org/1416113012
Cr-Commit-Position: refs/heads/master@{#362082}