Data Factory > file-generation-pdf

TL;DR;

This task allows you to generate one or multiple pdf file(s) from one or multiple html file(s).

Name:

Max execution time: 120 mins

Target

Generate one or more pdf documents.

Example of use in a job

Generate a "4 product per page" catalog

1.	Export Items - Export the products selected by the user
2.	XSLT - Transform to the format expected by the Generate PDF task
3.	Generate PDF - Generate a PDF file for all selected products

Generate a summary file for each product

1.	Export Items - Export the products selected by the user
2.	Split XML - Split into 1 file per product
3.	XSLT - Transform to the format expected by the Generate PDF task
4.	Generate PDF - Generate a PDF file for all selected products
5.	Archive - Put all files in a zip

Inputs and Outputs

JSON task representation

json

{
  "name": "file-generation-pdf",
  "taskReferenceName": "b",
  "description": "The business description of the task",
  "type": "SUB_WORKFLOW",
  "optional": false,
  "inputParameters": {
    "mode": "FILE",
    "file": "${file_transformation_html.output.file}",
    "settings": {
      "width": "210mm",
      "height": "297mm"
    }
  }
}

Inputs

TIP

If at least one task parameter (whether mandatory or not) is invalid, task execution is stopped and the returned status is FAILED. For example:

The proposed value for the mode property is invalid.
The file specified for the file property does not exist

Property	Description
`mode`	Mandatory – Enum - `FILE`, `FILES`
`file`	Mandatory if `mode = FILE` - File
`files`	Mandatory if `mode = FILES` - Array of File
`settings`	Mandatory – Object
`settings.width`	Mandatory – String The width of the documents, must integrate the unit used, `mm` or `cm`.
`settings.height`	Mandatory – String The height of the documents, must integrate the unit used, `mm` or `cm`.

Outputs

Property	Description
`listing`	File An XML file that lists the generated pdf documents
`allFilesGenerated`	Enum - `YES`, `NO` If all files could be generated: `YES` otherwise `NO`.
`files`	Array of File The generated files

Details on HTML documents expected as input in `file` and `files`.

html

<html lang="en">
<head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>products</title>
    <style>
        body {
            margin: 0;
            padding: 0;
            font-family: 'Lucida Sans', 'Lucida Sans Regular', 'Lucida Grande', 'Lucida Sans Unicode', Geneva, Verdana, sans-serif;
        }
    </style>
</head>
<body>
    Hello pdf world!
</body>
</html>

The file must be a valid HTML document
The name of the file to be generated must be entered in the title tag in the head section of the document. It must not contain the extension .pdf.
If the tag is not present, the name no-title.pdf will be applied.
Images or media present in the assets folder of the job can be referenced in the document.
- Examples of use: propose a static cover page, propose icons for the most products.
- the reference is made in a relative way to the source HTML document (see example below)

Example of an HTML file containing references to files in the `assets` directory

In the example below, the job is built this way:

assets\
    pdf-1\
      images\
          image.jpg
      print.css
    index.html
REDME.md
job.json

The index.html file looks like this

html

<html lang="en">
  <head>
      <meta charset="UTF-8" />
      <meta name="viewport" content="width=device-width, initial-scale=1.0" >
      <link href="https://fonts.googleapis.com/css?family=Roboto&display=swap" rel="stylesheet" >
      <link rel="stylesheet" type="text/css" href="./print.css" >
      <title>Mon catalogue</title>
  </head>
  <body>
      <div class="page page-A4">
          <h1 class="big-red-title">Texte rouge, taille de police 45</h1>
          <img src="./images/image.jpg" height="50%" />
      </div>
  </body>
</html>

Note

The paths to the resources in the HTML are from the assets directory within the zip representing the job. A file with the following path - assets/images/image.jpg - must be referenced as follows in the target HTML document - images/image.jpg.

The json description of the job is presented in the file job.json as follows

json

{
  "name": "file-generation-pdf",
  "taskReferenceName": "b",
  "description": "The business description of the task",
  "type": "SUB_WORKFLOW",
  "optional": false,
  "inputParameters": {
    "mode": "FILE",
    "file": "file://assets/index.html",
    "settings": {
      "width": "210mm",
      "height": "297mm"
    }
  }
}

The path to the image.jpg and print.css files are expressed relative to the position of the HTML document.

The above example can be downloaded from the repository Product-Live/data-factory-job-collection

The result of this job is as below:

If on the other hand the HTML file was presented as below (with links to files that would not be present in the job). The PDF file would still be generated.

html

<html lang="en">
  <head>
      <meta charset="UTF-8" />
      <meta name="viewport" content="width=device-width, initial-scale=1.0" >
      <link href="https://fonts.googleapis.com/css?family=Roboto&display=swap" rel="stylesheet" >
      <link rel="stylesheet" type="text/css" href="./print-2.css" >
      <title>Mon catalogue</title>
  </head>
  <body>
      <div class="page page-A4">
          <h1 class="big-red-title">Texte rouge, taille de police 45</h1>
          <img src="./images/image-unknown.jpg" height="50%" />
      </div>
  </body>
</html>

The generated PDF is as below:

Details on the `listing` document

xml

<Files>
  <File>
    <Url>https://app.product-live.com/files-data-factory/d05a74cf11788d8f3ae9bf0e0e028dde66f0c83005c5e0d1211b0069945c0c11</Url>
    <File-Name>products-1.pdf</File-Name>
  </File>
  <File>
    <Url>https://app.product-live.com/files-data-factory/fb26911d77fe9a9dc44b111eef5b5db7ca2019c8038445662f29b20c54cb6f29</Url>
    <File-Name>products-2.pdf</File-Name>
  </File>
</Files>

XPath	Description	Occurrence
`Files`	Root	1
`File`	For each file to be recovered	0..*
`Url`	Url of the file	1
`File-Name`	The name of the file	1

Technical details

We use playwright to generate the PDF files.
- Previously puppeteer was used but due to resource consumption issue, we switched to playwright. The provided features are the same.
- Playwright Test was created specifically to accommodate the needs of end-to-end testing. Playwright supports all modern rendering engines including Chromium, WebKit, and Firefox. Test on Windows, Linux, and macOS, locally or on CI, headless or headed with native mobile emulation of Google Chrome for Android and Mobile Safari. (see playwright.dev)

Data Factory > file-generation-pdf ​

Target ​

Example of use in a job ​

Inputs and Outputs ​

Inputs ​

Outputs ​

Details on HTML documents expected as input in file and files. ​

Example of an HTML file containing references to files in the assets directory ​

Details on the listing document ​

Technical details ​

Data Factory > file-generation-pdf

Target

Example of use in a job

Inputs and Outputs

Inputs

Outputs

Details on HTML documents expected as input in `file` and `files`.

Example of an HTML file containing references to files in the `assets` directory

Details on the `listing` document

Technical details