Maintain Elasticsearch indices
The events collected by Analytics continuously enlarge the index that Elasticsearch stores on disk. Depending on how intensively the systems analyzed via service.monitor are used, this index can grow significantly. Growth of 1 GB per day in file system usage is not uncommon, which poses particular challenges for the infrastructure operating service.monitor.
Sooner or later, measures will have to be taken which weigh the long-term implications of the data collected against the operational capability of the system. For this decision, it is worth taking a look at the types of events that are currently collected by service.monitor. The following types of events are available within map.apps:
- Start of the map.apps application
- Map interactions (zoom, pan)
- Tool interactions
- JavaScript console events
On the server side, the following event types are recorded:
- Requests to protected services (security.manager)
- General server requests (security.manager, map.apps, service.monitor)
The event types serve different analysis purposes, occur at different frequencies, and take up different amounts of space in the index. The following table suggests how long each type should be kept.
Event type | Avg. event size in KB | Recommended storage time in index | Use in dashboards or widgets (selection) |
Console log events | 0.81 | 1 day - 1 month | Table of errors and warnings, errors and warnings per app and per browser |
map.apps app start events | 0.70 | forever | Many analyses are based on this type of event: Geo-IP localization, app usage frequency, user operating environment, … |
map.apps map interactions | 0.69 | 6 months - forever | Used services, requests against ArcGIS Online, heatmap and display of levels of detail |
map.apps tool usage | 0.52 | 6 months - forever | Actions used in the apps, users are searching, users are selecting |
Server events of security.manager & map.apps | 0.35 | 6 months - forever | Use of protected services, all server requests |
Console log events contribute the least to long-term usage analysis, yet they occur frequently and have the largest average event size. From an operational point of view, we therefore recommend deleting them regularly. The same applies, to a much lesser degree, to map interaction and tool usage events. Always weigh this recommendation against your own analysis requirements.
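To make the trade-off between retention and disk usage tangible, the average event sizes from the table can be combined with daily event counts. The following Python sketch is only a rough estimate: the dictionary keys and the daily event counts are hypothetical, and the real on-disk size also depends on Elasticsearch overhead such as replicas and compression.

```python
# Average event sizes in KB, taken from the table above.
# The dictionary keys are illustrative labels, not field values from the product.
AVG_EVENT_SIZE_KB = {
    "console_log": 0.81,
    "app_start": 0.70,
    "map_interaction": 0.69,
    "tool_usage": 0.52,
    "server_event": 0.35,
}


def steady_state_size_mb(events_per_day, retention_days):
    """Estimate the raw event volume in MB once expired events are deleted.

    events_per_day: event type -> number of events per day (assumed values)
    retention_days: event type -> number of days the events are kept
    """
    total_kb = 0.0
    for event_type, per_day in events_per_day.items():
        total_kb += per_day * AVG_EVENT_SIZE_KB[event_type] * retention_days[event_type]
    return total_kb / 1024


# Hypothetical daily volumes, not measured values.
events_per_day = {"console_log": 500_000, "map_interaction": 300_000}
retention_days = {"console_log": 30, "map_interaction": 180}
print(round(steady_state_size_mb(events_per_day, retention_days)), "MB")
```

Varying the retention of the console log events in such a calculation shows quickly how much a shorter storage time reduces the volume.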
Break down and manage an existing Elasticsearch index
In the analytics part of the product delivery, under examples/split-events-by-type, there is a template for a Logstash pipeline that reads existing indexes matching the pattern logstash-* and writes their events to different indexes depending on the event type.
After the events have been copied successfully, the old indexes can be closed or deleted.
Procedure
Data migration can take place while the system is running, but it should not include indexes that are still being written to. For this reason, check the input section of the split-events-pipeline.conf file first.
By default, all indices whose names start with "logstash-" are read from a local Elasticsearch cluster ("localhost").
These settings may have to be adapted to your environment.
The output section defines how events are split according to their event type.
It uses if/else conditions to check for the presence of certain values and then redirects the events to differently named indexes (logstash-map, logstash-tool, …).
This section may also have to be modified if Elasticsearch runs on another host.
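The routing idea behind the output section can be pictured with a few lines of Python. This is a minimal sketch and not the shipped Logstash configuration: the field name event_type and its values are hypothetical placeholders, since the actual pipeline checks its own fields. The index names logstash-map, logstash-tool, logstash-log and logstash-other are the ones mentioned in this document.

```python
def target_index(event):
    """Pick a target index for an event, mirroring an if/else chain.

    'event_type' and its values are hypothetical; the shipped pipeline
    may test different fields and values.
    """
    event_type = event.get("event_type")
    if event_type == "map":
        return "logstash-map"
    elif event_type == "tool":
        return "logstash-tool"
    elif event_type == "log":
        return "logstash-log"
    else:
        # Everything that matches no condition ends up in a catch-all index.
        return "logstash-other"


print(target_index({"event_type": "tool"}))
print(target_index({"message": "no type set"}))
```

The final else branch matters: without a catch-all index, events that match none of the conditions would be lost during the copy.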
The new indexes can additionally be split by time.
This is done by adding a time pattern to the index name, for example:
- logstash-other-%{+YYYY}: creates indices that group events by the year in which they occurred
- logstash-map-%{+YYYY.MM}: creates indices that group events by the month in which they occurred
- logstash-log-%{+YYYY.MM.dd}: creates indices that group events by the day on which they occurred
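The effect of such a time pattern can be illustrated in Python. The sketch below only imitates how the date part of an index name is filled in from the event timestamp (in UTC) and supports just the three tokens used above; the function name expand_index_name is hypothetical, and Logstash itself performs this substitution internally.

```python
import re
from datetime import datetime, timezone

# Only the tokens used in the examples above are translated.
_TOKEN_TO_STRFTIME = {"YYYY": "%Y", "MM": "%m", "dd": "%d"}


def expand_index_name(pattern, timestamp):
    """Replace %{+...} in an index name with the formatted UTC timestamp."""

    def render(match):
        fmt = match.group(1)
        for token, strf in _TOKEN_TO_STRFTIME.items():
            fmt = fmt.replace(token, strf)
        return timestamp.astimezone(timezone.utc).strftime(fmt)

    return re.sub(r"%\{\+([^}]+)\}", render, pattern)


event_time = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
for pattern in (
    "logstash-other-%{+YYYY}",
    "logstash-map-%{+YYYY.MM}",
    "logstash-log-%{+YYYY.MM.dd}",
):
    print(expand_index_name(pattern, event_time))
```

The finer the pattern, the more indexes are created, but the more precisely old data can later be closed or deleted.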
The run-pipeline.bat file contains a command to run the pipeline.
The command line argument --configtest checks the pipeline for correctness.
The paths to Logstash and to the pipeline configuration must be adjusted to your system beforehand.
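A typical sequence is to validate the configuration first and only then start the actual migration. The following Python sketch wraps both calls; it is not the content of run-pipeline.bat. The function names and both paths are hypothetical, and it assumes a Logstash version that accepts -f for the configuration file and --configtest for validation.

```python
import subprocess


def logstash_command(logstash_bin, pipeline_conf, config_test=False):
    """Build the argument list for a Logstash call."""
    command = [logstash_bin, "-f", pipeline_conf]
    if config_test:
        # Only validate the configuration; no events are processed.
        command.append("--configtest")
    return command


def run_pipeline(logstash_bin, pipeline_conf):
    """Validate the pipeline, then run it. Raises CalledProcessError on failure."""
    subprocess.run(logstash_command(logstash_bin, pipeline_conf, config_test=True), check=True)
    subprocess.run(logstash_command(logstash_bin, pipeline_conf), check=True)


# Hypothetical paths; adjust both to the local installation.
print(logstash_command(r"C:\logstash\bin\logstash.bat", "split-events-pipeline.conf", config_test=True))
```

Because check=True aborts on a non-zero exit code, the migration does not start if the configuration test fails.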
Benefit
After the data has been migrated successfully into the new indexes, you can check the size of each index in the file system. On this basis, and taking your business requirements into account, you can start managing the indexes actively: old indexes that are no longer required are closed or deleted. Closing an index relieves Elasticsearch, since fewer events have to be searched. Deleting an index additionally frees disk space.
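Index sizes can also be queried from Elasticsearch itself, and closing or deleting an index is a single REST call each. The following Python sketch builds these requests with the standard library. It assumes an unsecured cluster at http://localhost:9200; the function names and the index name in the example are hypothetical. Deleting an index cannot be undone, so verify the name before sending the request.

```python
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def list_sizes_request(pattern="logstash-*"):
    """List matching indexes with document count and store size (_cat API)."""
    return Request(f"{ES_URL}/_cat/indices/{pattern}?v&h=index,docs.count,store.size", method="GET")


def close_index_request(index):
    """Close an index: its data stays on disk but is no longer searched."""
    return Request(f"{ES_URL}/{index}/_close", method="POST")


def delete_index_request(index):
    """Delete an index and free its disk space (irreversible)."""
    return Request(f"{ES_URL}/{index}", method="DELETE")


def send(request):
    """Send a prepared request and return the response body as text."""
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Building a request does not contact the cluster; only send() does.
request = close_index_request("logstash-2019.01.01")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from send() makes it possible to review the exact calls before anything is changed on the cluster.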
Structure and maintain future indices
In order to minimize future manual effort, it makes sense to ensure that event types are correctly distributed when indexes are saved for the first time. For this reason, version 4.3.1 is delivered with a more complex output pattern. The pattern can be used to decide individually which indexes/event types should be available with a limited or long retention period.
The automatic management of indexes can either be scripted or implemented with Elasticsearch Curator.
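A scripted variant could start from a function that decides which indexes have exceeded their retention period. The sketch below assumes daily indexes named with a date suffix in the form YYYY.MM.dd; the function name and the retention rules are hypothetical and need to be aligned with your own retention decisions.

```python
import re
from datetime import date, timedelta

# Matches daily index names such as "logstash-log-2024.01.31".
_DAILY_INDEX = re.compile(r"^(?P<prefix>.+)-(?P<y>\d{4})\.(?P<m>\d{2})\.(?P<d>\d{2})$")


def expired_indices(index_names, retention_days, today):
    """Return daily indexes that are older than the retention of their prefix.

    retention_days: index prefix -> days to keep. Prefixes without a rule
    are never reported, i.e. they are kept forever.
    """
    expired = []
    for name in index_names:
        match = _DAILY_INDEX.match(name)
        if not match:
            continue  # not a daily index, leave it alone
        prefix = match.group("prefix")
        if prefix not in retention_days:
            continue  # no rule defined for this event type
        index_day = date(int(match.group("y")), int(match.group("m")), int(match.group("d")))
        if today - index_day > timedelta(days=retention_days[prefix]):
            expired.append(name)
    return expired


names = ["logstash-log-2024.01.01", "logstash-log-2024.03.01", "logstash-map-2024.01"]
print(expired_indices(names, {"logstash-log": 30}, today=date(2024, 3, 10)))
```

The returned names can then be closed or deleted in a scheduled job. Elasticsearch Curator covers the same use case through configuration instead of custom code.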
Subsequent anonymisation of user information according to GDPR (DSGVO)
The Logstash pipeline described in the article Checking and Ensuring DSGVO Conformity is stored under the path examples/anonymize-user-dsgvo.