Data Factory > file-generation-pdf β
TL;DR;
This task allows you to generate one or multiple pdf file(s) from one or multiple html file(s).
Name:
Max execution time: 120 mins
Target β
Generate one or more pdf documents.
Example of use in a job β
- Generate a "4 product per page" catalog
1. Export Items - Export the products selected by the user
2. XSLT - Transform to the format expected by the Generate PDF task
3. Generate PDF - Generate a PDF file for all selected products2
3
- Generate a summary file for each product
1. Export Items - Export the products selected by the user
2. Split XML - Split into 1 file per product
3. XSLT - Transform to the format expected by the Generate PDF task
4. Generate PDF - Generate a PDF file for all selected products
5. Archive - Put all files in a zip2
3
4
5
Inputs and Outputs β
JSON task representation
{
"name": "file-generation-pdf",
"taskReferenceName": "b",
"description": "The business description of the task",
"type": "SUB_WORKFLOW",
"optional": false,
"inputParameters": {
"mode": "FILE",
"file": "${file_transformation_html.output.file}",
"settings": {
"width": "210mm",
"height": "297mm"
}
}
}2
3
4
5
6
7
8
9
10
11
12
13
14
15
Inputs β
TIP
If at least one task parameter (whether mandatory or not) is invalid, task execution is stopped and the returned status is FAILED. For example:
- The proposed value for the
modeproperty is invalid. - The file specified for the
fileproperty does not exist
| Property | Description |
|---|---|
mode | Mandatory β Enum - FILE, FILES |
file | Mandatory if mode = FILE - File |
files | Mandatory if mode = FILES - Array of File |
settings | Mandatory β Object |
settings.width | Mandatory β String The width of the documents, must integrate the unit used, mm or cm. |
settings.height | Mandatory β String The height of the documents, must integrate the unit used, mm or cm. |
Outputs β
| Property | Description |
|---|---|
listing | File An XML file that lists the generated pdf documents |
allFilesGenerated | Enum - YES, NOIf all files could be generated: YES otherwise NO. |
files | Array of File The generated files |
Details on HTML documents expected as input in file and files. β
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>products</title>
<style>
body {
margin: 0;
padding: 0;
font-family: 'Lucida Sans', 'Lucida Sans Regular', 'Lucida Grande', 'Lucida Sans Unicode', Geneva, Verdana, sans-serif;
}
</style>
</head>
<body>
Hello pdf world!
</body>
</html>2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
- The file must be a valid HTML document
- The name of the file to be generated must be entered in the
titletag in theheadsection of the document. It must not contain the extension.pdf. - If the tag is not present, the name
no-title.pdfwill be applied. - Images or media present in the assets folder of the job can be referenced in the document.
- Examples of use: propose a static cover page, propose icons for the most products.
- the reference is made in a relative way to the source HTML document (see example below)
Example of an HTML file containing references to files in the assets directory β
In the example below, the job is built this way:
assets\
pdf-1\
images\
image.jpg
print.css
index.html
REDME.md
job.json2
3
4
5
6
7
8
The index.html file looks like this
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" >
<link href="https://fonts.googleapis.com/css?family=Roboto&display=swap" rel="stylesheet" >
<link rel="stylesheet" type="text/css" href="./print.css" >
<title>Mon catalogue</title>
</head>
<body>
<div class="page page-A4">
<h1 class="big-red-title">Texte rouge, taille de police 45</h1>
<img src="./images/image.jpg" height="50%" />
</div>
</body>
</html>2
3
4
5
6
7
8
9
10
11
12
13
14
15
Note
The paths to the resources in the HTML are from the assets directory within the zip representing the job. A file with the following path - assets/images/image.jpg - must be referenced as follows in the target HTML document - images/image.jpg.
The json description of the job is presented in the file job.json as follows
{
"name": "file-generation-pdf",
"taskReferenceName": "b",
"description": "The business description of the task",
"type": "SUB_WORKFLOW",
"optional": false,
"inputParameters": {
"mode": "FILE",
"file": "file://assets/index.html",
"settings": {
"width": "210mm",
"height": "297mm"
}
}
}2
3
4
5
6
7
8
9
10
11
12
13
14
15
The path to the image.jpg and print.css files are expressed relative to the position of the HTML document.
The above example can be downloaded from the repository Product-Live/data-factory-job-collection
The result of this job is as below:

If on the other hand the HTML file was presented as below (with links to files that would not be present in the job). The PDF file would still be generated.
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" >
<link href="https://fonts.googleapis.com/css?family=Roboto&display=swap" rel="stylesheet" >
<link rel="stylesheet" type="text/css" href="./print-2.css" >
<title>Mon catalogue</title>
</head>
<body>
<div class="page page-A4">
<h1 class="big-red-title">Texte rouge, taille de police 45</h1>
<img src="./images/image-unknown.jpg" height="50%" />
</div>
</body>
</html>2
3
4
5
6
7
8
9
10
11
12
13
14
15
The generated PDF is as below:

Details on the listing document β
<Files>
<File>
<Url>https://app.product-live.com/files-data-factory/d05a74cf11788d8f3ae9bf0e0e028dde66f0c83005c5e0d1211b0069945c0c11</Url>
<File-Name>products-1.pdf</File-Name>
</File>
<File>
<Url>https://app.product-live.com/files-data-factory/fb26911d77fe9a9dc44b111eef5b5db7ca2019c8038445662f29b20c54cb6f29</Url>
<File-Name>products-2.pdf</File-Name>
</File>
</Files>2
3
4
5
6
7
8
9
10
| XPath | Description | Occurrence |
|---|---|---|
Files | Root | 1 |
File | For each file to be recovered | 0..* |
Url | Url of the file | 1 |
File-Name | The name of the file | 1 |
Technical details β
- We use
playwrightto generate the PDF files.- Previously
puppeteerwas used but due to resource consumption issue, we switched toplaywright. The provided features are the same. Playwright Test was created specifically to accommodate the needs of end-to-end testing. Playwright supports all modern rendering engines including Chromium, WebKit, and Firefox. Test on Windows, Linux, and macOS, locally or on CI, headless or headed with native mobile emulation of Google Chrome for Android and Mobile Safari.(see playwright.dev)
- Previously