Safe FME Server Monitoring (API v3)

Possible checks

  • FME Server Health Check

  • FME Server Engines available on server

  • FME Server Engines ready for job execution

  • Check for delayed FME server job execution

  • Job Execution Success Check from FME Server Schedules

Execution

Almost all FME Server checks combine a specific request with specific expectations placed on the FME Server. For example, to check the availability of FME Server engines, the user must define, via an 'expectation', how many engines are expected on the server.

FME Server Health Check

Create a monitoring service, give it a name, select Safe FME Server (API v3) as the type and define the HTTP address of the FME Server (e.g. https://fmeserver.<myhost>.org). Because FME Server is access-protected, a credential template that stores the user and password usually has to be created beforehand. The standard requests include an entry 'Health Check' that defines the correct request.

FME Server Engines available on server

To be notified when engines registered on the server fail, an expectation must be defined. The following expectation expects two engine instances on the FME Server:

{"type" : "FmeEngineCountExpectation", "content":[2]}
[Figure: FME engine count]

In the definition of the monitoring service, another job can now be created and assigned the standard request 'Engine Check'. To activate the previously created expectation for this job, select the corresponding entry on the 'Quality of Service' tab of this job.

FME Server Engines ready for job execution

The number of server engines that are free for job execution can be checked with the same monitoring job and an additional expectation. The example below expects one engine not to be running a job at the time of the test.

{"type" : "FmeEngineAvailableExpectation", "content":[1]}

Check for delayed FME server job execution

The FmeJobTimeShiftExpectation is used to check for delayed jobs (in execution or in the queue). It allows any time value within an FME job item to be checked against the current time and a user-defined duration. The section below shows an FME job item from the queue.

{
	"request": {
		"publishedParameters": [
			{
				"name": "FME_SECURITY_ROLES",
				"raw": "fmeadmin fmesuperuser user:admin"
			},
			{
				"name": "FME_SECURITY_USER",
				"raw": "admin"
			}
		],
		"workspacePath": "\"admin_test/test_workspace_1/test_workspace_1.fmw\"",
		"TMDirectives": {
			"rtc": false,
			"ttc": -1,
			"description": "",
			"tag": "admin_test",
			"priority": -1,
			"ttl": -1
		},
		"NMDirectives": {
			"directives": [],
			"successTopics": [],
			"failureTopics": []
		}
	},
	"workspace": "test_workspace_1.fmw",
	"engineHost": "",
	"timeQueued": "2021-02-18T14:00:00+01:00",
	"description": "",
	"repository": "admin_test",
	"userName": "admin",
	"sourceType": "SCHEDULES",
	"id": 80959,
	"sourceName": "admin_test/create_test_log_2",
	"engineName": "",
	"timeSubmitted": "2021-02-18T14:00:00+01:00",
	"status": "QUEUED"
}

If an expectation is defined as shown below, service.monitor compares the timeSubmitted field of each FME job item with a reference time: the time of the evaluation minus the configured duration (here: one hour). If current time - duration > timeSubmitted, an error in the sense of service.monitor is triggered.

{"type" : "FmeJobTimeShiftExpectation", "content":[{"duration":"PT1H","attribute":"timeSubmitted"}]}

In this concrete case: if the job has been in the queue for longer than an hour, service.monitor sends a notification about this FME job.
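The comparison described above can be sketched in a few lines of Python. This is only an illustration of the rule current time - duration > timestamp, not the service.monitor implementation. The function name is_delayed and the reduced job item are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def is_delayed(job_item, attribute, duration, now):
    """Return True if now - duration > job_item[attribute] (hypothetical helper)."""
    # FME job items carry ISO 8601 timestamps with a UTC offset.
    timestamp = datetime.fromisoformat(job_item[attribute])
    return now - duration > timestamp


# Reduced job item, using values from the queue example above.
job_item = {
    "id": 80959,
    "status": "QUEUED",
    "timeSubmitted": "2021-02-18T14:00:00+01:00",
}

one_hour = timedelta(hours=1)  # corresponds to the duration "PT1H"

# 14:00+01:00 equals 13:00 UTC.
after_30_minutes = datetime(2021, 2, 18, 13, 30, tzinfo=timezone.utc)
after_90_minutes = datetime(2021, 2, 18, 14, 30, tzinfo=timezone.utc)

print(is_delayed(job_item, "timeSubmitted", one_hour, after_30_minutes))  # False
print(is_delayed(job_item, "timeSubmitted", one_hour, after_90_minutes))  # True
```

Both timestamps are timezone-aware, so the comparison stays correct even if the monitoring host and the FME Server use different offsets.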

Use timeSubmitted, timeStarted or timeQueued in conjunction with the standard requests 'Jobs queued' and 'Jobs running' to perform the desired functional checks.

Job Execution Success Check from FME Server Schedules

The 'FmeScheduleJobExecutionExpectation' is suitable for the continuous monitoring of FME server schedules. This can be used to check the success of mission-critical FME schedules.

  1. First, identify the category and the name of the schedule.

  2. In a new monitoring job, select the example request 'Schedule check' and add the category and name to the request.

  3. Execute the test request and identify the periodicity of the schedule (e.g. cron string or day-based execution at a defined time).

  4. Design a matching monitoring template based on the periodicity of the schedule. It defines the rhythm at which service.monitor runs the check and the times by which a successful FME job execution is expected to have taken place.

  5. Define an FmeScheduleJobExecutionExpectation that additionally defines the duration within which a successful execution must have taken place.

The figure below shows a successful (expected) check of an FME job: the reference time, calculated from the execution time of the monitoring job and the duration stored in the expectation, lies before the start time of the last FME job execution.

[Figure: FME schedule expectation]
{"type" : "FmeScheduleJobExecutionExpectation", "content":[{"duration":"PT10M"}]}

At the actual execution time of the monitoring job, the schedule is queried from FME Server. Based on the schedule information, service.monitor then generates a request to FME Server that queries the last successful job execution of this schedule (and its start time). Based on this information, service.monitor decides whether the expectation is met or an error is triggered.

Therefore: if now - duration > timeStarted, an error in the sense of service.monitor is triggered.

Duration notation: This follows the ISO 8601 duration format. For example, PT10M means a duration of 10 minutes, and PT12H30M5S means 12 hours, 30 minutes and 5 seconds.
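To make the notation concrete, the following minimal Python sketch converts such duration strings into timedelta values. It is an illustration only and makes assumptions. The helper parse_duration is hypothetical. It handles just whole-number days, hours, minutes and seconds, not the full ISO 8601 grammar (no years, months, weeks or fractions). It says nothing about how service.monitor parses durations internally.

```python
import re
from datetime import timedelta

# Subset of the ISO 8601 duration format: P[nD][T[nH][nM][nS]]
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Convert a duration such as 'PT10M' into a timedelta (hypothetical helper)."""
    match = _DURATION.match(text)
    # Collect only the components that are actually present in the string.
    parts = {}
    if match:
        parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    # Reject strings without any component ("P", "PT") or with a dangling "T".
    if not parts or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)


print(parse_duration("PT10M"))       # 0:10:00
print(parse_duration("PT12H30M5S"))  # 12:30:05
```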