Image Capture API
This folder contains the implementation of the W3C Image Capture API. Image Capture was shipped in Chrome M59; please consult the Implementation Status if you think a feature should be available and isn't.
This API is structured around the ImageCapture class and a number of
extensions to the MediaStreamTrack feeding it (let's call them
theImageCapturer and theTrack, respectively).
API Mechanics
takePhoto() and grabFrame()
-
takePhoto()returns the result of a single photographic exposure as aBlobwhich can be downloaded, stored by the browser or displayed in animgelement. This method uses the highest available photographic camera resolution. -
grabFrame()returns a snapshot of the live video intheTrackas anImageBitmapobject which could (for example) be drawn on acanvasand then post-processed to selectively change color values. Note that theImageBitmapwill only have the resolution of the video track — which will generally be lower than the camera's still-image resolution.
(Adapted from the blog post)
Photo settings and capabilities
The photo-specific options and settings are associated to theImageCapturer or
theTrack depending on whether a given capability/setting has an immediately
recognisable effect on theTrack, in other words if it's "live" or not. For
example, changing the zoom level is instantly reflected on the theTrack,
while enabling red eye reduction, if available, is not.
| Object | Type | Example |
|---|---|---|
PhotoCapabilities |
non-live capabilities | theImageCapturer.getPhotoCapabilities() |
MediaTrackCapabilities |
live capabilities | theTrack.getCapabilities() |
PhotoSettings |
non-live settings | theImageCapturer.takePhoto(photoSettings) |
MediaTrackSettings |
live settings | theTrack.getSettings() |
Other topics
Are takePhoto() and grabFrame() the same?
These methods would not produce the same results as explained in this issue comment:
Let me reconstruct the conversion steps each image goes through in CrOs/Linux; [...]
a) Live video capture produces frames via
V4L2CaptureDelegate::DoCapture()[1]. The original data (from the WebCam) comes in either YUY2 (a 4:2:2 format) or MJPEG, depending if the capture is smaller than 1280x720p or not, respectively.
b) This
V4L2CaptureDelegatesends the capture frame to a conversion stage to I420 [2]. I420 is a 4:2:0 format, so it has lost some information irretrievably. This I420 format is the one used for transporting video frames to the rendered.
c) This I420 is the input to
grabFrame(), which produces a JS ImageBitmap, unencoded, after converting the I420 into RGBA [3] of the appropriate endian-ness.
What happens to
takePhoto()? It takes the data from the Webcam in a) and either returns a JPEG Blob [4] or converts the YUY2 [5] and encodes it to PNG using the default compression value (6 in a 0-10 scale IIRC) [6].
IOW:
- for smaller video resolutions:
OS -> YUY2 ---> I420 --> RGBA --> ImageBitmap grabFrame()
|
+--> RGBA --> PNG ---> Blob takePhoto()
- and for larger video resolutions:
OS -> MJPEG ---> I420 --> RGBA --> ImageBitmap grabFrame()
|
+--> JPG ------------> Blob takePhoto()
Where every conversion to-I420 loses information and so does the encoding to PNG. Even a conversion
RGBA --> I420 --> RGBAwould not produce the original image. (Plus, when you showImageBitmapand/or Blob on an<img>or<canvas>there are more stages of decoding and even colour correction involved!)
With all that, I'm not surprised at all that the images are not pixel accurate! 😃
Why are PhotoCapabilities.fillLightMode and MediaTrackCapabilities.torch separated?
Because they are different things: torch means flash constantly on/off whereas
fillLightMode means flash always-on/always-off/auto when taking a
photographic exposure.
torch lives in theTrack because the effect can be seen "live" on it,
whereas fillLightMode lives in theImageCapture object because the effect
of modifying it can only be seen after taking a picture.
Testing
Image Capture web tests are located in web_tests/external/mediacapture-image.