Data Factory > file-conversion-unarchive
TL;DR
This task allows extracting files from an archive.
Name: file-conversion-unarchive
Max execution time: 15 mins
Target
Extract the files present in one or more archives.
Example of use in a job
- Using the list of files present in the Zip
  1. FTP Get: Retrieve the image Zip from the MFT
  2. Unzip: Unzip the archive
  3. XSLT: Prepare the product file
  4. Import Items: Import the products
Inputs and Outputs
json
{
  "name": "file-conversion-unarchive",
  "taskReferenceName": "b",
  "description": "The business description of the task",
  "type": "SUB_WORKFLOW",
  "optional": false,
  "inputParameters": {
    "mode": "FILE",
    "file": "${a.output.file}"
  }
}
Inputs
TIP
If at least one task parameter (whether mandatory or not) is invalid, task execution stops and the returned status is FAILED. For instance:
- The entered mode is not valid (it does not correspond to `FILE`, `FILES`, or `REQUEST`)
- The `request` file is not present, whereas the mode is `REQUEST`
| Property | Description |
|---|---|
| mode | Mandatory - Enum - FILE, FILES, REQUEST |
| request | Mandatory if mode = REQUEST - File. The XML file that defines the archives to process |
| file | Mandatory if mode = FILE - File. An archive |
| files | Mandatory if mode = FILES - Array of File. A list of archives |
| compressionMethod | If mode = FILE or mode = FILES - Enum - zip, 7z, tar, gzip. If no compression method is specified, the application attempts to detect the most appropriate decompression mode. |
| password | If mode = FILE or mode = FILES - String. If applicable, the password to decrypt the archive |
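The rules in the table can be checked before a job is submitted. The following is a minimal sketch, not the task's actual validation logic: the function name `validate_inputs` and the wording of the messages are hypothetical, and only the constraints listed above are covered.

```python
VALID_MODES = {"FILE", "FILES", "REQUEST"}
VALID_COMPRESSION_METHODS = {"zip", "7z", "tar", "gzip"}

# Input that must be present for each mode, as listed in the inputs table.
REQUIRED_INPUT_BY_MODE = {"FILE": "file", "FILES": "files", "REQUEST": "request"}


def validate_inputs(params):
    """Return a list of problems; an empty list means the inputs look valid.

    Hypothetical helper: it mirrors the documented rules, not the task's code.
    """
    mode = params.get("mode")
    if mode not in VALID_MODES:
        return [f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}"]

    problems = []
    required = REQUIRED_INPUT_BY_MODE[mode]
    if not params.get(required):
        problems.append(f"{required} is mandatory when mode = {mode}")

    # compressionMethod is only described for the FILE and FILES modes.
    method = params.get("compressionMethod")
    if mode in ("FILE", "FILES") and method is not None:
        if method not in VALID_COMPRESSION_METHODS:
            problems.append(f"compressionMethod {method!r} is not supported")
    return problems


if __name__ == "__main__":
    print(validate_inputs({"mode": "FILE", "file": "${a.output.file}"}))
    print(validate_inputs({"mode": "REQUEST"}))
```

Running such a check locally only avoids an obviously invalid submission; the task itself remains the authority on what it accepts.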
Outputs
| Property | Description |
|---|---|
| report | File. The task execution report in XML format (see below). Not available in the VSCode extension for Data Factory |
| allFilesTransformed | Enum - YES, NO. YES if all the files could be unzipped, otherwise NO. Note: if the file cannot be unzipped, the task goes into error. |
| file | File. The first file present in the archive, null if the archive is empty |
| files | Array of File. All the files present in the archive(s) |
| listing | File. XML file that lists the generated files |
Input request document details
xml
<Archives>
  <Archive>
    <Url>https://prodstoragevazc.blob.core.windows.net/(...)/7a597747dbc536e53eeb47760b5145d002b5bc25.zip</Url>
    <Compression-Method>7z</Compression-Method>
    <Password>********</Password>
  </Archive>
</Archives>
| XPath | Description | Occurrence |
|---|---|---|
| Archives | Root | 1 |
| Archives/Archive | One element for each archive to be processed. | 0..* |
| Archives/Archive/Compression-Method | The type of compression to use. Possible values: zip, 7z, tar, gzip | 0..1 |
| Archives/Archive/Password | The password to use for processing the archive. | 0..1 |
| Archives/Archive/Url | The URL of the file to be processed | 1 |
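A request document with this structure can be generated rather than written by hand. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `build_request` and the dictionary keys (`url`, `compression_method`, `password`) are illustrative assumptions, not part of the task's interface.

```python
import xml.etree.ElementTree as ET


def build_request(archives):
    """Build the <Archives> request document from a list of dictionaries.

    Hypothetical helper: each dictionary needs a "url" key and may carry
    "compression_method" and "password".
    """
    root = ET.Element("Archives")
    for entry in archives:
        if not entry.get("url"):
            raise ValueError("Archives/Archive/Url is required for every archive")
        archive = ET.SubElement(root, "Archive")
        ET.SubElement(archive, "Url").text = entry["url"]
        # Optional elements (occurrence 0..1) are written only when provided.
        if entry.get("compression_method"):
            ET.SubElement(archive, "Compression-Method").text = entry["compression_method"]
        if entry.get("password"):
            ET.SubElement(archive, "Password").text = entry["password"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # example.com is a placeholder URL.
    print(build_request([{"url": "https://example.com/a.zip", "compression_method": "zip"}]))
```

Omitting `Compression-Method` leaves the element out of the document entirely, which matches its 0..1 occurrence.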
Listing document details
xml
<Files>
  <File source="file.zip">
    <Url>https://app.product-live.com/data-factory/d05a74cf11788d8f3ae9bf0e0e028dde66f0c83005c5e0d1211b0069945c0c11</Url>
    <Path>/foolder-1/products-1.pdf</Path>
    <Sha256>4045f7aa376c69197dfe731117010343028a2c06d2620623c324858a84e799a1</Sha256>
    <Md5>4124bc0a9335c27f086f24ba207a4912</Md5>
    <Size>1954387</Size>
  </File>
  <File source="file.zip">
    <Url>https://app.product-live.com/data-factory/fb26911d77fe9a9dc44b111eef5b5db7ca2019c8038445662f29b20c54cb6f29</Url>
    <Path>/products-1.pdf</Path>
    <Sha256>c4d5710e040627a040f0fd7a6ceddfc642e8cddc8a83a7e6c104d67a2ca493c7</Sha256>
    <Md5>4124bc0a9335c27f086f24ba207a4912</Md5>
    <Size>1954387</Size>
  </File>
</Files>
| XPath | Description | Occurrence |
|---|---|---|
| Files | Root | 1 |
| Files/File | A file extracted from one of the provided archives | 0..* |
| Files/File/@source | The name of the source archive | 0..1 |
| ./Url | The URL of the current file | 1 |
| ./Path | The full path to the file within the archive. It always starts with the character / | 1 |
| ./Size | The size of the current file, in bytes | 1 |
| ./Sha256 | The SHA-256 hash of the current file | 1 |
| ./Md5 | The MD5 hash of the current file | 1 |
| ./Archive/Url | The URL of the archive from which the current file is taken | 1 |
| ./Archive/Metadata | The metadata associated with the archive from which the current file is taken | 0..1 |
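A downstream step can read the listing to locate each extracted file and to check a downloaded copy against the recorded size and SHA-256 hash. This is a minimal sketch using the Python standard library; `parse_listing` and `matches_listing` are hypothetical names, and the download step itself is left out.

```python
import hashlib
import xml.etree.ElementTree as ET


def parse_listing(xml_text):
    """Return one dictionary per Files/File element of a listing document."""
    entries = []
    for node in ET.fromstring(xml_text).findall("File"):
        entries.append({
            "source": node.get("source"),  # optional attribute, may be None
            "url": node.findtext("Url"),
            "path": node.findtext("Path"),
            "sha256": node.findtext("Sha256"),
            "md5": node.findtext("Md5"),
            "size": int(node.findtext("Size")),
        })
    return entries


def matches_listing(entry, content):
    """Check downloaded bytes against the size and SHA-256 recorded in the listing."""
    return (len(content) == entry["size"]
            and hashlib.sha256(content).hexdigest() == entry["sha256"])
```

Comparing the size first is cheap and catches truncated downloads before the hash is computed.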
Execution Report Details
General notes
- The report follows the execution report standard structure.
Example
xml
<Report task="file-conversion-unarchive" start-at="2023-01-20T13:27:54.810Z" task-cid="..." job-cid="..." end-at="2023-01-20T13:28:51.414Z" duration-ms="56604">
  <Input name="mode">(the mode passed as a task parameter)</Input>
  <Input name="request">(the URL of the request file, if present)</Input>
  <Log type="warning" code="URL_NOT_REACHABLE">
    <Metadata name="xpath"></Metadata>
    <Metadata name="path">//abc/a.jpg</Metadata>
    <Message>The url provided is invalid or the content is unreachable: https://www.google.com/abc/a.jpg</Message>
  </Log>
</Report>
Available logs
| Code | Message | Metadata | Type | Remarks |
|---|---|---|---|---|
| URL_NOT_REACHABLE | The url provided is invalid or the content is unreachable: | url, archiveName | warning | The archive is not processed and allFilesTransformed=NO |
| UNABLE_TO_OPEN_FILE | Unable to open archive: | url, archiveName | error | The archive is not processed and allFilesTransformed=NO |
| INVALID_PASSWORD | Invalid password for archive: | url, archiveName | error | The archive is not processed and allFilesTransformed=NO |
| ARCHIVE_SIZE_TOO_BIG | Archive size is too big: | url, archiveName | error | Raised if the archive size, or the sum of all files inside the current archive, is too large (see below). The archive is not processed, the transformation of the other archives continues, and allFilesTransformed=NO |
| TOO_MANY_FILES | Archive contains too many files: | url, archiveName | error | The archive is not processed, the transformation of the other archives continues, and allFilesTransformed=NO |
| ARCHIVE_PROCESSED_SUCCESSFULLY | Archive processed successfully: | url, archiveName | info | |
| UNABLE_TO_EXTRACT_FILE | Unable to extract file: | archiveUrl, archiveName, filePath | warning | The transformation of the other files continues and allFilesTransformed=NO |
| UNARCHIVE_PENDING | Unarchive pending | url, archiveName | info | |
| UNARCHIVE_FAILED_TO_UPLOAD | Failed to upload file: | url, archiveName | error | The transformation of the other files continues and allFilesTransformed=NO |
| UNARCHIVE_ENDED_ERROR | Unarchive ended with error | url, archiveName | error | The transformation of the other archives continues and allFilesTransformed=NO |
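To act on these logs in a later step, the report can be parsed and its entries grouped by type. The sketch below assumes the report has the shape shown in the example (a `Report` root with `Log` children carrying `type` and `code` attributes); `summarize_report` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_report(xml_text):
    """Return (logs, counts): the parsed Log entries and a count per log type."""
    root = ET.fromstring(xml_text)
    logs = []
    for log in root.findall("Log"):
        logs.append({
            "type": log.get("type"),
            "code": log.get("code"),
            "message": log.findtext("Message", default=""),
            # Metadata elements are flattened into a name -> value mapping.
            "metadata": {m.get("name"): (m.text or "") for m in log.findall("Metadata")},
        })
    counts = Counter(entry["type"] for entry in logs)
    return logs, counts
```

A caller could, for instance, treat any non-zero `counts["error"]` or `counts["warning"]` as a sign that some archives or files need attention.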
Technical notes
- This task uses the tool provided by 7-Zip.
Functional limitations and recommendations
| Item | Limit | Recommendation |
|---|---|---|
| Input archive document size | 1GB | 100MB |
| Max size of all files present in an input archive document once unarchived | 6GB | 500MB |
| Maximum number of files within an input archive | 15,000 | 200 |
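A Zip archive can be checked against these limits locally before it is sent to the task. The following is a rough sketch with Python's `zipfile` module, under stated assumptions: "GB" is read as 2**30 bytes, directory entries are not counted as files, and the returned codes merely reuse the log codes documented above. The function name `check_zip_limits` is hypothetical, and other formats (7z, tar, gzip) are not covered.

```python
import os
import zipfile

GIB = 1024 ** 3  # assumption: the table's "GB" is read as 2**30 bytes

MAX_ARCHIVE_BYTES = 1 * GIB
MAX_UNARCHIVED_BYTES = 6 * GIB
MAX_FILES = 15_000


def check_zip_limits(path, max_archive_bytes=MAX_ARCHIVE_BYTES,
                     max_unarchived_bytes=MAX_UNARCHIVED_BYTES,
                     max_files=MAX_FILES):
    """Return the log codes this Zip archive would likely trigger, if any."""
    with zipfile.ZipFile(path) as archive:
        # Assumption: directory entries do not count as files.
        members = [info for info in archive.infolist() if not info.is_dir()]

    codes = []
    # file_size is the uncompressed size recorded in the Zip directory.
    total_unarchived = sum(info.file_size for info in members)
    if os.path.getsize(path) > max_archive_bytes or total_unarchived > max_unarchived_bytes:
        codes.append("ARCHIVE_SIZE_TOO_BIG")
    if len(members) > max_files:
        codes.append("TOO_MANY_FILES")
    return codes
```

Passing the recommended values (100 MB, 500 MB, 200 files) as the three limits turns the same function into a check against the recommendations rather than the hard limits.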