@ohos.security.identifySensitiveContent (Identifying Sensitive Content)
This module identifies sensitive information in a specified file based on the input Policy. The system matches the file content against the provided Policy (including sensitive labels, keyword sets, and regular expressions) and returns the matched sensitive content.
NOTE
The initial APIs of this module are supported since API version 21. Newly added APIs will be marked with a superscript to indicate their earliest API version.
Modules to Import
import { identifySensitiveContent } from '@kit.DataProtectionKit';
identifySensitiveContent.scanFile
scanFile(filePath: string, identifyPolicies: Array<Policy>): Promise<Array<MatchResult>>
Identifies sensitive content in a specified file based on the configured policy and returns the identified result array, including the matched sensitivity labels, matched content, and number of matched items. This API uses a promise to return the result.
Required permissions: ohos.permission.ENTERPRISE_DATA_IDENTIFY_FILE
System capability: SystemCapability.Security.DataLossPrevention
Parameters
| Parameter | Type | Mandatory | Description |
|---|---|---|---|
| filePath | string | Yes | Path of the file to scan. The path must be a physical path, and the file it points to must exist and be accessible. |
| identifyPolicies | Array<Policy> | Yes | Array of policies used to identify sensitive content. Each policy defines an identification rule (a sensitive label, keywords, and a regular expression). The system scans the file content against these rules and returns the matching results. |
Return value
| Type | Description |
|---|---|
| Promise<Array<MatchResult>> | Promise used to return the identification result of sensitive content. If the operation is successful, the matching result array is returned. If the operation fails, an error code is returned. |
Error codes
For details about the error codes, see Universal Error Codes and DLP Service Error Codes.
| Error Code | Error Message |
|---|---|
| 201 | permission denied. |
| 801 | Capability not supported. |
| 19110001 | Parameter error. Possible causes: 1. Incorrect policy format. 2. Invalid parameter range. |
| 19110002 | Sensitive file content identification timed out. |
| 19110003 | The file is not supported. Possible causes: 1. The file path does not exist. 2. The file type is not supported. 3. The file permission is not supported. |
| 19110004 | A system error has occurred. |
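The table above can be turned into a small lookup helper for logging. The following is a minimal sketch, not part of the module: the names `SCAN_ERROR_SUMMARY`, `describeScanError`, and `isWorthRetrying` are hypothetical, the summaries are shortened from the table, and treating only the timeout code as retryable is an assumption of this sketch rather than documented behavior.

```typescript
// Short summaries derived from the error code table above.
// All names in this block are hypothetical helpers, not module APIs.
const SCAN_ERROR_SUMMARY: Record<number, string> = {
  201: 'Permission denied',
  801: 'Capability not supported',
  19110001: 'Parameter error',
  19110002: 'Identification timed out',
  19110003: 'File not supported',
  19110004: 'System error',
};

// Returns a short summary for a known code, or a generic text otherwise.
function describeScanError(code: number): string {
  const summary = SCAN_ERROR_SUMMARY[code];
  return summary !== undefined ? summary : 'Unknown error ' + code;
}

// Assumption of this sketch: only a timeout may succeed when the same
// call is repeated unchanged. The other codes need a different input,
// permission, or device, so retrying as-is is unlikely to help.
function isWorthRetrying(code: number): boolean {
  return code === 19110002;
}
```

In a `catch` handler, the numeric code of the rejected error could be passed to `describeScanError` to produce a compact log line.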
Example
// Import the sensitive content identification module and the error type.
import { identifySensitiveContent } from '@kit.DataProtectionKit';
import { BusinessError } from '@kit.BasicServicesKit';

// Physical path of the file to scan.
let filepath = "/data/app/el1/bundle/public/bundleName/test.txt";
// Configure the policies for sensitive content identification.
let policies: Array<identifySensitiveContent.Policy> = [
  {"sensitiveLabel":"1", "keywords":[], "regex":""}
];
try {
  // Call scanFile to identify sensitive content in the file.
  identifySensitiveContent.scanFile(filepath, policies).then((records) => {
    // Identification succeeded; records holds the match results.
    console.info('scanFile finish, result count: ' + records.length);
  }).catch((err: BusinessError) => {
    // Identification failed.
    console.error('scanFile failed, code: ' + err.code + ', message: ' + err.message);
  });
} catch (err) {
  // Handle exceptions thrown synchronously by the call.
  console.error('scanFile error, message: ' + (err as BusinessError).message);
}
Policy
Defines the policy for sensitive content identification.
- Within a single policy, the keywords and the regular expression are applied in sequence as a two-level match. Keyword matching runs first. When a keyword matches, regular expression matching runs within a 100-byte window around the match: from 50 bytes before the matched position of the keyword to 50 bytes after it.
- Multiple policies are independent of each other, and each policy is applied separately during scanning.
- sensitiveLabel marks each matching result so that the policy that produced it can be identified.
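The two-level matching described above can be pictured with the following minimal sketch. It is an illustration only, not the system implementation. It assumes ASCII text (so that one character equals one byte), takes the "matched position" to be the start of the keyword, uses the JavaScript `RegExp` engine as a stand-in for the system's regular expression engine, and records only the first regular expression match per window. `PolicySketch` and `twoLevelMatchSketch` are hypothetical names.

```typescript
// Hypothetical local mirror of the documented Policy shape.
interface PolicySketch {
  sensitiveLabel: string;
  keywords: string[];
  regex: string;
}

// Half of the 100-byte window described in the text.
const HALF_WINDOW = 50;

// Returns the regex matches found near each keyword occurrence.
// Assumes ASCII input, so string indexes equal byte offsets.
function twoLevelMatchSketch(text: string, policy: PolicySketch): string[] {
  if (policy.keywords.length === 0 || policy.regex === '') {
    // The text above only describes policies with both parts set.
    throw new Error('This sketch only models policies with keywords and a regex');
  }
  const hits: string[] = [];
  for (const keyword of policy.keywords) {
    if (keyword === '') {
      continue; // An empty keyword would match everywhere; skip it.
    }
    // Level 1: case-sensitive keyword search.
    let pos = text.indexOf(keyword);
    while (pos !== -1) {
      // Level 2: regex search limited to the window around the keyword.
      const start = Math.max(0, pos - HALF_WINDOW);
      const end = Math.min(text.length, pos + HALF_WINDOW);
      const found = new RegExp(policy.regex).exec(text.slice(start, end));
      if (found !== null) {
        hits.push(found[0]);
      }
      pos = text.indexOf(keyword, pos + keyword.length);
    }
  }
  return hits;
}
```

With the keyword `card` and the regular expression `\d{4}-\d{4}` (written as `'\\d{4}-\\d{4}'` in a string literal), the sketch finds the number in `card: 1234-5678` but not when more than 50 characters separate the keyword from the number.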
System capability: SystemCapability.Security.DataLossPrevention
| Name | Type | Read-Only | Optional | Description |
|---|---|---|---|---|
| sensitiveLabel | string | No | No | Label of an identification policy, which is used to identify and classify matching results. The value is a string of 1 to 30 bytes. |
| keywords | Array<string> | No | No | Keyword set, which is used to match sensitive keywords in a file. The system searches for these keywords in the file content and returns the identification result if a keyword is matched. The keywords are case-sensitive. The array can contain a maximum of 50 elements, and each element can contain a maximum of 30 bytes. |
| regex | string | No | No | Regular expression used to match sensitive content. The system performs pattern matching on the file content based on the regular expression and returns the matched content. The value contains 0 to 512 characters. When entering the string, make sure that special characters such as backslashes (\), double quotation marks ("), and newline characters are escaped as intended, so that the API receives the regular expression you meant to pass. |
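The limits in the table above can be checked before calling the API. The following is a minimal sketch under stated assumptions: byte lengths are counted as UTF-8 (the table does not name an encoding), and `PolicyShape`, `utf8ByteLength`, and `validatePolicySketch` are hypothetical helper names, not part of the module.

```typescript
// Hypothetical local mirror of the documented Policy shape.
interface PolicyShape {
  sensitiveLabel: string;
  keywords: string[];
  regex: string;
}

// Counts the UTF-8 byte length of a string (assumed encoding).
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // Surrogate pair: one 4-byte code point.
      i++;        // Skip the low surrogate.
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Returns a list of violated limits; an empty list means the policy
// is within the limits stated in the table.
function validatePolicySketch(policy: PolicyShape): string[] {
  const problems: string[] = [];
  const labelBytes = utf8ByteLength(policy.sensitiveLabel);
  if (labelBytes < 1 || labelBytes > 30) {
    problems.push('sensitiveLabel must be 1 to 30 bytes');
  }
  if (policy.keywords.length > 50) {
    problems.push('keywords can contain at most 50 elements');
  }
  for (const keyword of policy.keywords) {
    if (utf8ByteLength(keyword) > 30) {
      problems.push('keyword exceeds 30 bytes: ' + keyword);
    }
  }
  if (policy.regex.length > 512) {
    problems.push('regex can contain at most 512 characters');
  }
  return problems;
}
```

Running such a check first gives a more specific message than the generic parameter error. The sketch does not verify that the regular expression itself is well formed.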
MatchResult
Describes an identification result of sensitive content.
System capability: SystemCapability.Security.DataLossPrevention
| Name | Type | Read-Only | Optional | Description |
|---|---|---|---|---|
| sensitiveLabel | string | Yes | No | Label of the identification policy that produced this result. It corresponds to sensitiveLabel in the input policy. |
| matchContent | string | Yes | No | Matched sensitive content segment, that is, the text matched by a keyword or the regular expression. |
| matchNumber | number | Yes | No | Total number of matched items. |
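A returned result array can be condensed into per-label totals for reporting. The following is a minimal sketch, not part of the module. It assumes that the matchNumber of each entry counts the matches for that entry, so that summing per label is meaningful; the table above does not state this explicitly. `MatchResultShape` and `totalMatchesByLabel` are hypothetical names.

```typescript
// Hypothetical local mirror of the documented MatchResult shape.
interface MatchResultShape {
  sensitiveLabel: string;
  matchContent: string;
  matchNumber: number;
}

// Sums matchNumber per sensitiveLabel.
// Assumption: matchNumber is a per-entry count that can be added up.
function totalMatchesByLabel(results: MatchResultShape[]): Record<string, number> {
  // A prototype-free object avoids clashes with inherited keys
  // such as "constructor".
  const totals: Record<string, number> = Object.create(null);
  for (const result of results) {
    totals[result.sensitiveLabel] = (totals[result.sensitiveLabel] || 0) + result.matchNumber;
  }
  return totals;
}
```

The array passed to the `then` callback of scanFile could be handed to such a helper to log one line per policy label.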