High-level overview of Save-Page-As code
This document describes code under //content/browser/download,
restricting the scope to code handling Save-Page-As functionality
(i.e. leaving out other downloads-related code).
This document focuses on a high-level overview and on aspects of the code that
span multiple compilation units (hoping that individual compilation units
are described by their code comments or by their code structure).
Classes overview
- SavePackage class
  - coordinates overall save-page-as request
  - created and owned by WebContents (ref-counted today, but it is unnecessary - see https://crbug.com/596953)
  - UI-thread object
- SaveFileCreateInfo::SaveFileSource enum
  - classifies SaveItem and SaveFile processing into 2 flavours:
    - SAVE_FILE_FROM_NET (see SaveFileResourceHandler)
    - SAVE_FILE_FROM_DOM (see "Complete HTML" section below)
- SaveItem class
  - tracks saving a single file
  - created and owned by SavePackage
  - UI-thread object
- SaveFileManager class
  - coordinates between the download sequence and the UI thread
    - Gets requests from SavePackage and communicates results back to SavePackage on the UI thread.
    - Shepherds data (received from the network OR from DOM) into the download sequence - via SaveFileManager::UpdateSaveProgress
  - created and owned by BrowserMainLoop (ref-counted today, but it is unnecessary - see https://crbug.com/596953)
  - The global instance can be retrieved by the Get method.
- SaveFile class
  - tracks saving a single file
  - created and owned by SaveFileManager
  - download sequence object
- SaveFileCreateInfo POD struct
  - short-lived object holding data passed to callbacks handling the start of saving a file.
- MHTMLGenerationManager class
  - singleton that manages progress of jobs responsible for saving individual MHTML files (represented by MHTMLGenerationManager::Job).
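To make the split between UI-thread objects and download-sequence objects more concrete, here is a minimal, single-threaded sketch of how data might be shepherded from the UI side into the download sequence. Everything in it is an assumption made for illustration: FakeSequence, SketchSaveFile and SketchSaveFileManager are hypothetical stand-ins, and the simplified UpdateSaveProgress signature below is not the real one from //content.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for the download sequence: posted tasks run later,
// in FIFO order, when RunUntilIdle() is called.
class FakeSequence {
 public:
  void PostTask(std::function<void()> task) { tasks_.push(std::move(task)); }

  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      task();
    }
  }

 private:
  std::queue<std::function<void()>> tasks_;
};

// Simplified model of a per-file object living on the download sequence.
struct SketchSaveFile {
  std::string bytes;
};

// Simplified model of the coordinator between the two sides.
class SketchSaveFileManager {
 public:
  explicit SketchSaveFileManager(FakeSequence* download_sequence)
      : download_sequence_(download_sequence) {
    assert(download_sequence_ != nullptr);
  }

  // Called from the "UI thread" side. The data is not written immediately;
  // it is forwarded to the download sequence as a task.
  void UpdateSaveProgress(int save_item_id, std::string data) {
    download_sequence_->PostTask(
        [this, save_item_id, data = std::move(data)] {
          files_[save_item_id].bytes += data;
        });
  }

  // Bytes accumulated so far for one item (empty if nothing was written).
  std::string BytesFor(int save_item_id) const {
    auto it = files_.find(save_item_id);
    return it == files_.end() ? std::string() : it->second.bytes;
  }

 private:
  FakeSequence* download_sequence_;
  std::map<int, SketchSaveFile> files_;
};
```

In this sketch, calling UpdateSaveProgress only queues work; the per-file state changes once the fake sequence runs its tasks, which mirrors the idea that SaveFile objects are touched only on the download sequence.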
Overview of the processing flow
Save-Page-As flow starts with WebContents::OnSavePage.
The flow is different depending on the save format chosen by the user
(each flow is described in a separate section below).
Complete HTML
Very high-level flow of saving a page as "Complete HTML":
- Step 1: SavePackage asks all frames for "savable resources" and creates a SaveItem for each of the files that need to be saved.
- Step 2: SavePackage first processes SAVE_FILE_FROM_NET SaveItems and asks SaveFileManager to save them.
- Step 3: SavePackage handles the remaining SAVE_FILE_FROM_DOM SaveItems and asks each frame to serialize its DOM/HTML (each frame gets from SavePackage a map covering local paths that need to be referenced by the frame). Responses from frames get forwarded to SaveFileManager to be written to disk.
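The ordering described in the steps above (network-sourced items first, DOM-sourced items afterwards) can be sketched as follows. This is an illustration under stated assumptions, not the actual SavePackage logic: SketchSource, SketchSaveItem and ProcessingOrder are hypothetical names, and the example URLs are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified counterpart of the two flavours
// SAVE_FILE_FROM_NET and SAVE_FILE_FROM_DOM.
enum class SketchSource { kFromNet, kFromDom };

// Hypothetical, simplified counterpart of a SaveItem.
struct SketchSaveItem {
  std::string url;
  SketchSource source;
};

// Returns the URLs in the order they would be processed in this sketch:
// every from-network item first, then every from-DOM item. The relative
// order inside each group is preserved.
std::vector<std::string> ProcessingOrder(std::vector<SketchSaveItem> items) {
  std::stable_partition(items.begin(), items.end(),
                        [](const SketchSaveItem& item) {
                          return item.source == SketchSource::kFromNet;
                        });
  std::vector<std::string> order;
  order.reserve(items.size());
  for (const SketchSaveItem& item : items) {
    order.push_back(item.url);
  }
  return order;
}
```

One plausible reason for this ordering, consistent with Step 3, is that frames need the map of local paths before they can serialize their DOM; the sketch models only the ordering, not the path map itself.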
MHTML
Very high-level flow of saving a page as MHTML:
- Step 1: WebContents::GenerateMHTML is called by either SavePackage (for Save-Page-As UI) or Extensions (via the chrome.pageCapture extensions API) or by an embedder of WebContents (since this is public API of //content).
- Step 2: MHTMLGenerationManager creates a new instance of MHTMLGenerationManager::Job that coordinates generation of the MHTML file by sequentially (one-at-a-time) asking each frame to write its portion of MHTML to a file handle. Other classes (i.e. SavePackage and/or SaveFileManager) are not used at this step at all.
- Step 3: When done, MHTMLGenerationManager destroys the MHTMLGenerationManager::Job instance and calls a completion callback, which in the case of Save-Page-As will end up in SavePackage::OnMHTMLGenerated.
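The one-frame-at-a-time pattern from Step 2, followed by the completion callback from Step 3, might look roughly like the sketch below. All names (SketchFrame, SketchMhtmlJob, Run) are hypothetical, a std::ostream stands in for the file handle, and the text written per frame is a placeholder rather than real MHTML.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical frame that can contribute one portion of the output.
struct SketchFrame {
  std::string name;
  std::string html;
};

// Hypothetical job: asks each frame, strictly in sequence, to append its
// portion to a shared output, then reports the result through a callback.
class SketchMhtmlJob {
 public:
  using Completion =
      std::function<void(bool success, std::size_t frames_written)>;

  SketchMhtmlJob(std::vector<SketchFrame> frames, std::ostream* out)
      : frames_(std::move(frames)), out_(out) {
    assert(out_ != nullptr);
  }

  void Run(const Completion& done) {
    std::size_t written = 0;
    for (const SketchFrame& frame : frames_) {
      // Placeholder separator; a real MHTML part has MIME headers instead.
      *out_ << "--frame: " << frame.name << "\n" << frame.html << "\n";
      if (!out_->good()) {
        done(false, written);
        return;
      }
      ++written;
    }
    done(true, written);
  }

 private:
  std::vector<SketchFrame> frames_;
  std::ostream* out_;
};
```

In the Save-Page-As case, the callback passed to Run would play the role that SavePackage::OnMHTMLGenerated plays in the real flow; the sketch itself does not involve SavePackage or SaveFileManager, matching Step 2.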
Note: The MHTML format is disabled by default in the Save-Page-As UI on Windows, macOS
and Linux (it is the default on Chrome OS), but for testing this can be easily
changed using the --save-page-as-mhtml command line switch.
HTML Only
Very high-level flow of saving a page as "HTML Only":
SavePackage creates only a single SaveItem (always SAVE_FILE_FROM_NET) and asks SaveFileManager to process it (as in the Complete HTML individual SaveItem handling above).
Other relevant code
Pointers to related code outside of //content/browser/download:
- End-to-end tests:
  - //chrome/browser/download/save_page_browsertest.cc
  - //chrome/test/data/save_page/...
- Other tests:
  - //content/browser/download/*test*.cc
  - //content/renderer/dom_serializer_browsertest.cc - single process... 😕
- Elsewhere in //content:
  - //content/renderer/savable_resources...
- Blink:
  - //third_party/blink/public/web/web_frame_serializer...
  - //third_party/blink/renderer/core/frame/web_frame_serializer_impl... (used for Complete HTML today; should use FrameSerializer instead in the long-term - see https://crbug.com/328354).
  - //third_party/blink/renderer/core/frame/frame_serializer... (used for MHTML today)
  - //third_party/blink/renderer/platform/mhtml/mhtml_archive...