# Traceability > General Information
## Introduction
This feature (or set of features) is designed to provide a way to track the changes made to any entity (item, field, classification, etc.) inside the Product-Live platform, regardless of how the data was updated (UI, API, Data Factory import, etc.).
## Terminology
- Audit Log: A log, or audit log, is a single record of a change made to an entity inside the Product-Live platform
## Key Concepts
- The traceability feature is based on the Audit Trail concept
- Every change made to any entity inside the Product-Live platform is recorded in the database
- Every change is recorded with the following information:
  - The user who made the change
  - The date and time of the change
  - A key to uniquely identify the type of change and the entity impacted (update of an item, update of a field, update of a classification, etc.)
  - The entity identifier (item id, field id, classification id, etc.)
  - The entity data after the change
## Entities Audited
The following entities are audited to date:
- Account
- Item
- Suggestion
## Compliance and regulations
- Our contract with our customers includes a clause that allows them to request the deletion of all their data from our platform. Every log entry should be linked to an account, so that it can be deleted if such a request is made.
- The retention period of the logs is defined globally for the platform, and is currently set to 1 year. This period is configurable through a configuration file.
## Audit Logs
### Audit Log structure
#### Log Type
The log type is composed of the following elements:
- The entity type (account, item, suggestion, ...)
- The action performed (update, delete, ...)
Example:

- `account.update_account_plan`
- `user.login`
- `item.delete`
#### Log data
The log contains the following fields:
- `TenantId`: The log unique identifier (generated by LogAnalytics). It is of type `guid` (global unique id) in KQL.
- `TimeGenerated`: The date and time of the log ingestion (generated by LogAnalytics). It is of type `datetime` in KQL.
- `EntityType`: The entity type: `account`, `item`, `suggestion`. It is of type `string`.
- `PlType`: The log type: `account.create`, `account.update`, `account.update_account_plan`, `item.create`, `item.update`, `item.delete`, `suggestion.apply`. It is of type `string`.
- `Version`: The version of the audit log entry (defines the structure used in `Details`).
- `Details`: The log details (specific to the log type). It is of type `dynamic` in KQL, and an object with values of type string in JS.
- `Metadata`: Additional relevant metadata such as the user agent of the user who performed the action, information regarding the device used, etc. It is of type `dynamic` in KQL, and an object with values of type string in JS.
Some other columns (generated by LogAnalytics) are sometimes added when querying the logs:
- `TenantId`: The tenant id on Azure
- `Type`: The table storing the logs
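For illustration, here is a minimal TypeScript sketch of how such an entry could be typed on the application side. The `AuditLogEntry` name, the literal unions and the `number` type for `Version` are assumptions, not the actual platform types:

```typescript
// Hypothetical typing of an audit log entry as produced by the application.
// TenantId, TimeGenerated and Type are added by LogAnalytics at ingestion or
// query time, so they are not part of the payload sent by the platform.
type EntityType = "account" | "item" | "suggestion";

type PlType =
  | "account.create"
  | "account.update"
  | "account.update_account_plan"
  | "item.create"
  | "item.update"
  | "item.delete"
  | "suggestion.apply";

interface AuditLogEntry {
  EntityType: EntityType;
  PlType: PlType;
  // Version of the audit log entry, defining the structure used in Details.
  // Assumed numeric; the document does not state its exact type.
  Version: number;
  // Log details, specific to the log type (dynamic in KQL, string values in JS).
  Details: Record<string, string>;
  // Additional metadata: user agent, device information, etc.
  Metadata: Record<string, string>;
}
```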
### Expected volumetry
The expected volumetry of the logs is the following:
| Log type | Volume per day (to date) | Expected volume per day (in 1 year) | Expected volume per year |
|---|---|---|---|
| account.update_account_plan | 15 | 50 | 20 000 |
| account.update_status | 10 | 30 | 10 000 |
| account.create | 1 | 10 | 1 000 |
| account.update | 50 | 100 | 40 000 |
| user.login | 1 000 | 3 000 | 1 000 000 |
| item.app_update_field | 3 000 | 10 000 | 4 000 000 |
| item.app_delete_field | 300 | 1 000 | 400 000 |
| item.data_factory_create | 2 000 | 5 000 | 2 000 000 |
| item.data_factory_update | - | 50 000 | 20 000 000 |
| item.data_factory_delete | - | 10 000 | 4 000 000 |
| item.api_create | - | 1 000 | 400 000 |
| item.api_update | - | 10 000 | 4 000 000 |
| Log type | Volume per day (to date) | Expected volume per day (in 1 year) | Expected volume per year |
|---|---|---|---|
| image_api.fetch_image | - | 30 000 | 10 000 000 |
| data_factory.job_launch | 10 000 | 20 000 | 8 000 000 |
| data_factory.job_end | 10 000 | 20 000 | 8 000 000 |
| data_factory.task_start | 200 000 | 400 000 | 150 000 000 |
| data_factory.task_end | 200 000 | 400 000 | 150 000 000 |
Note: We expect to ingest around 40 million log entries per year.
## Possible applications
This feature has many possible applications, including:
- Trace who performed an operation, when it happened, and what the data was before and after the change; in particular, in the event of an investigation by the support department into a suspected bug on the platform
- Export the traced data periodically, so that it can be digested by third-party tools for internal use by Product-Live:
  - Identify the users/accounts that perform the most editing within the grid
  - Measure the evolution of product data collection within a given account
## Technical implementation
### Terminology
- Azure Monitor: resource in Azure that can ingest audit logs. In order to configure the ingestion of audit logs, two resources are needed:
  - a Data Collection Endpoint, specifying the endpoints for ingesting the logs;
  - a Data Collection Rule, specifying the format of the logs and the stream name. A stream corresponds to a table in the LogAnalytics workspace.
- LogAnalytics workspace: resource in Azure that can store and query logs. The logs are stored in tables, and can be queried using the Kusto Query Language.
- LogAnalytics client: used in yuba as a wrapper around the `@azure/monitor-ingestion` and `@azure/monitor-query` clients. It is responsible for sending the audit logs to an Azure Monitor, and for querying the LogAnalytics workspace to retrieve logs (a rough sketch follows after this list).
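As an illustration only, a minimal sketch of what such a wrapper might look like, built on the public `@azure/monitor-ingestion` and `@azure/monitor-query` packages. The class name, endpoint, rule id, stream name and workspace id below are placeholders, not the actual yuba implementation:

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { LogsIngestionClient } from "@azure/monitor-ingestion";
import { LogsQueryClient, LogsQueryResultStatus } from "@azure/monitor-query";

// Sketch of a LogAnalytics client wrapper; every identifier below is a placeholder.
export class AuditLogAnalyticsClient {
  private credential = new DefaultAzureCredential();
  private ingestionClient = new LogsIngestionClient(
    "https://<data-collection-endpoint>.ingest.monitor.azure.com",
    this.credential
  );
  private queryClient = new LogsQueryClient(this.credential);

  // Sends a batch of audit logs to the stream declared in the Data Collection Rule.
  async send(logs: Record<string, unknown>[]): Promise<void> {
    await this.ingestionClient.upload("<data-collection-rule-id>", "Custom-AuditLogs_CL", logs);
  }

  // Runs a Kusto query against the LogAnalytics workspace over the last `days` days.
  async query(kusto: string, days: number): Promise<unknown[][]> {
    const result = await this.queryClient.queryWorkspace("<workspace-id>", kusto, {
      startTime: new Date(Date.now() - days * 24 * 3600 * 1000),
      endTime: new Date(),
    });
    if (result.status === LogsQueryResultStatus.Success) {
      return result.tables[0]?.rows ?? [];
    }
    throw new Error("LogAnalytics query did not complete successfully");
  }
}
```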
### The flow of an audit log
When an entity is created or updated, an audit log is generated with the available information and sent to a RabbitMQ queue. In ashitaka, an audit log consumer retrieves the logs. The consumer is also in charge of formatting each log and hydrating it with missing information if needed. Note that the data added to the log in this step is not supposed to change over time (for example, the table id of an item). The consumer then sends the log to Azure Monitor, via the LogAnalytics client.
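A simplified sketch of such a consumer, assuming a queue named `audit-logs`, the wrapper client sketched above and a `RABBITMQ_URL` environment variable (all three are assumptions, not the actual ashitaka code):

```typescript
import amqp from "amqplib";

// Hypothetical consumer: retrieves audit logs from RabbitMQ, hydrates them and
// forwards them to Azure Monitor through the LogAnalytics client wrapper.
export async function startAuditLogConsumer(
  client: { send(logs: Record<string, unknown>[]): Promise<void> }
): Promise<void> {
  const connection = await amqp.connect(process.env.RABBITMQ_URL ?? "amqp://localhost");
  const channel = await connection.createChannel();
  const queue = "audit-logs"; // assumed queue name

  await channel.assertQueue(queue, { durable: true });
  await channel.consume(queue, async (message) => {
    if (!message) return;
    const log = JSON.parse(message.content.toString());

    // Hydrate the log with missing information that is not expected to change
    // over time (for example, the table id of an item), then forward it.
    const hydrated = { ...log /*, additional immutable fields */ };

    await client.send([hydrated]);
    channel.ack(message);
  });
}
```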
The LogAnalytics client is also used to query the LogAnalytics workspace. In the case of the audit logs for the account entity, kaonashi uses the LogAnalytics client to generate reports on account activity. The logs can be filtered by account id, user id and activity type, and a Kusto query is generated based on these filters.
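For instance, such a query could be assembled along these lines; the `AuditLogs_CL` table name and the `Details` field names are assumptions, and the actual kaonashi query generation may differ:

```typescript
// Builds a Kusto query for account activity reports from optional filters.
// "AuditLogs_CL", Details.accountId and Details.userId are placeholder names.
function buildAccountActivityQuery(filters: {
  accountId?: string;
  userId?: string;
  activityType?: string;
}): string {
  const lines = ["AuditLogs_CL", "| where EntityType == 'account'"];
  if (filters.accountId) lines.push(`| where tostring(Details.accountId) == '${filters.accountId}'`);
  if (filters.userId) lines.push(`| where tostring(Details.userId) == '${filters.userId}'`);
  if (filters.activityType) lines.push(`| where PlType == '${filters.activityType}'`);
  lines.push("| sort by TimeGenerated desc");
  return lines.join("\n");
}

// Example: all plan updates performed on a given account.
const kusto = buildAccountActivityQuery({
  accountId: "1234",
  activityType: "account.update_account_plan",
});
```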
For testing the flow of an audit log, a mock replaces the LogAnalytics client in tests. The mock uses MongoDB to store the logs and then query them, replacing the calls to Azure.
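A possible shape for this mock, assuming the wrapper interface sketched earlier (the class, database and collection names are placeholders):

```typescript
import { Collection, Document, MongoClient } from "mongodb";

// Test double standing in for the LogAnalytics client: logs are stored in a
// MongoDB collection, and tests query that collection instead of running Kusto.
export class AuditLogAnalyticsClientMock {
  private collection: Collection;

  constructor(mongo: MongoClient) {
    this.collection = mongo.db("tests").collection("audit_logs");
  }

  async send(logs: Record<string, unknown>[]): Promise<void> {
    // TimeGenerated is normally added by LogAnalytics at ingestion time.
    await this.collection.insertMany(logs.map((log) => ({ ...log, TimeGenerated: new Date() })));
  }

  // Tests assert on stored logs with an equivalent MongoDB filter.
  async findLogs(filter: Document): Promise<Document[]> {
    return this.collection.find(filter).toArray();
  }
}
```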
## Q&A
What is the retention period of the logs?
The retention period of the logs is defined globally for the platform, and is currently set to 1 year.
Can this feature be used to track the volume of a certain type of operation over a given period of time?
Yes, this feature can be used to track the volume of a certain type of operation over a given period of time. However, the data provided is raw, and to obtain a particular metric it is necessary to reprocess it. The purpose of this document is not to show how this data can be reprocessed; many solutions are possible, such as reprocessing selected logs within a Data Factory job.
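As a rough illustration only, a raw count per log type and per day can also be obtained directly from the workspace with a simple Kusto aggregation (the `AuditLogs_CL` table name is assumed):

```typescript
// Kusto query counting operations per log type and per day over the last 30 days.
// The result would still need to be reprocessed to build a finished metric or report.
const operationsPerDayQuery = `
AuditLogs_CL
| where TimeGenerated > ago(30d)
| summarize operations = count() by PlType, bin(TimeGenerated, 1d)
| sort by TimeGenerated asc
`;
```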