Maintain Elasticsearch indices

The events collected by Analytics are stored in the index managed by Elasticsearch. Depending on how intensively the systems analyzed via service.monitor are used, this index can grow considerably. Growth of 1 GByte per day in the file system is not uncommon, which poses particular challenges for the service.monitor operating infrastructure.

Sooner or later, measures have to be taken that weigh the long-term value of the collected data against the operational capability of the system. For this decision, it is worth looking at the types of events that service.monitor currently collects. The following event types are available within map.apps:

  • Start of the map.apps application

  • Map interactions (zoom, pan)

  • Tool interactions

  • JavaScript console events

For events recorded on the server side, the following event types exist:

  • Requests to protected services (security.manager)

  • General server requests (security.manager, map.apps, service.monitor)

The event types serve different analysis purposes, occur at different frequencies, and take up different amounts of space in the index. The following table contains suggestions for how long each type should be retained.

Event type | Avg. event size (kB) | Recommended retention in index | Use in dashboards or widgets (selection)
Console log events | 0.81 | 1 day to 1 month | Table of errors and warnings, errors and warnings per app and per browser
map.apps app start events | 0.7 | forever | Many analyses are based on this event type: Geo-IP localization, app usage frequency, user operating environment, …
map.apps map interactions | 0.69 | 6 months to forever | Used services, requests against ArcGIS Online, heatmap and display of level of detail
map.apps tool usage | 0.52 | 6 months to forever | Actions used in the apps, what users search for, what users select
Server events of security.manager & map.apps | 0.35 | 6 months to forever | Use of protected services, all server requests

Since the console log events contribute the least to long-term usage analysis, but occur frequently and require the most storage space, we recommend deleting such events regularly from an operational point of view. To a much lesser extent, this also applies to the map interaction and tool usage events. These recommendations must always be weighed against your individual analysis requirements.

Break down and manage an existing Elasticsearch index

In the Analytics part of the product delivery, under examples/split-events-by-type, you will find a template for a Logstash pipeline that reads existing indices matching the pattern logstash-* and writes their events to different indices depending on the event type. After the events have been transferred (copied) successfully, the old indices can be closed or deleted.

Procedure

The data migration can take place while the system is running, but should not affect indices that are still being written to. Therefore, look at the input section of the split-events-pipeline.conf file first. In the default configuration, all indices whose names start with "logstash-" are read from a local Elasticsearch cluster ("localhost"). These settings may need to be adapted to local conditions.
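
A minimal sketch of such an input section is shown below; the host and index pattern are assumptions and may differ from the shipped split-events-pipeline.conf:

input {
  elasticsearch {
    hosts => ["localhost"]   # adapt if Elasticsearch runs on a different host
    index => "logstash-*"    # read all existing event indices
  }
}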

The output section defines how events are split according to their event type. It uses if/else conditions to check for the presence of certain values and then routes the events to indices with different names (logstash-map, logstash-tool, …​). This section may also need to be modified if Elasticsearch is available on another host. It is also conceivable to additionally break down the new indices by time. This can be done by adding time patterns to the index names, as in the following examples; a simplified sketch of a complete output section follows them.

logstash-other-%{+YYYY}
Creates indices that group events by the year of their occurrence.

logstash-map-%{+YYYY.MM}
Creates indices that group events by the month of their occurrence.

logstash-log-%{+YYYY.MM.dd}
Creates indices that group events by the day of their occurrence.

The run-pipeline.bat file contains a command to run the pipeline. The command line argument --configtest can be used to check the pipeline configuration for correctness. Beforehand, the paths to Logstash and to the pipeline configuration must be adjusted to match the local system.
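
As a rough sketch, the call in run-pipeline.bat could look like the following; the paths are placeholders and the exact arguments depend on the installed Logstash version:

REM Paths are examples and must be adjusted to the local installation
C:\logstash\bin\logstash.bat -f C:\service.monitor\analytics\examples\split-events-by-type\split-events-pipeline.conf --configtest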

Benefit

After the data has been successfully migrated to the new indices, you can evaluate the size of the individual indices in the file system. On this basis, and taking your business requirements into account, you can start active index management: old indices that are no longer required are closed or deleted. Closing an index relieves Elasticsearch, since fewer events have to be searched. Deleting an index additionally frees up hard disk space.

Query index size

Print index name and disk usage
curl 'http://localhost:9200/_cat/indices/log*?h=i,ss'

Close index

Close index 'my_index'
curl -XPOST 'http://localhost:9200/my_index/_close'

Delete index

Delete index 'my_index'
curl -XDELETE 'http://localhost:9200/my_index'

Structure and maintain future indices

To minimize future manual effort, it makes sense to ensure that event types are distributed to the correct indices when events are indexed for the first time. For this reason, version 4.3.1 ships with a more complex output pattern. The pattern allows you to decide individually which indices/event types should be kept for a limited period and which should be kept long-term.

The automatic management of indices can either be scripted or implemented with Elasticsearch Curator.
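
As an illustration of the scripted approach, the following sketch deletes the daily console log index that is 30 days old. It assumes daily indices named logstash-log-YYYY.MM.dd, a Unix shell with GNU date, and Elasticsearch on localhost; it would be run once per day, for example via cron:

#!/bin/sh
# Delete the console log index created exactly 30 days ago
OLD_DATE=$(date -d "30 days ago" +%Y.%m.%d)
curl -XDELETE "http://localhost:9200/logstash-log-${OLD_DATE}"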

Subsequent anonymisation of user information according to the DSGVO (GDPR)

The Logstash pipeline described in the article Checking and Ensuring DSGVO Conformity is stored under the path examples/anonymize-user-dsgvo.