# GET Health Check
Design discussion for new Health Check implementation.
# Objectives
The goal for this design is to implement a new Health check for mojaloop switch services that allows for a greater level of detail.
It Features:
- Clear HTTP Statuses (no need to inspect the response to know there are no issues)
- ~Backwards compatibility with existing health checks~ - No longer a requirement. See this discussion (opens new window).
- Information about the version of the API, and how long it has been running for
- Information about sub-service (kafka, logging sidecar and mysql) connections
# Request Format
/health
Uses the newly implemented health check. As discussed here (opens new window) since there will be no added connection overhead (e.g. pinging a database) as part of implementing the health check, there is no need to complicate things with a simple and detailed version.
Responses Codes:
200
- Success. The API is up and running, and is sucessfully connected to necessary services.502
- Bad Gateway. The API is up and running, but the API cannot connect to necessary service (eg.kafka
).503
- Service Unavailable. This response is not implemented in this design, but will be the default if the api is not and running
# Response Format
Name | Type | Description | Example |
---|---|---|---|
status | statusEnum | The status of the service. Options are OK and DOWN . See statusEnum below. | "OK" |
uptime | number | How long (in seconds) the service has been alive for. | 123456 |
started | string (ISO formatted date-time) | When the service was started (UTC) | "2019-05-31T05:09:25.409Z" |
versionNumber | string (semver) | The current version of the service. | "5.2.5" |
services | Array<serviceHealth> | A list of services this service depends on, and their connection status | see below |
# serviceHealth
Name | Type | Description | Example |
---|---|---|---|
name | subServiceEnum | The sub-service name. See subServiceEnum below. | "broker" |
status | enum | The status of the service. Options are OK and DOWN | "OK" |
# subServiceEnum
The subServiceEnum enum describes a name of the subservice:
Options:
datastore
-> The database for this service (typically a MySQL Database).broker
-> The message broker for this service (typically Kafka).sidecar
-> The logging sidecar sub-service this service attaches to.cache
-> The caching sub-service this services attaches to.
# statusEnum
The status enum represents status of the system or sub-service.
It has two options:
OK
-> The service or sub-service is healthy.DOWN
-> The service or sub-service is unhealthy.
When a service is OK
: the API is considered healthy, and all sub-services are also considered healthy.
If any sub-service is DOWN
, then the entire health check will fail, and the API will be considered DOWN
.
# Defining Sub-Service health
It is not enough to simply ping a sub-service to know if it is healthy, we want to go one step further. These criteria will change with each sub-service.
# datastore
For datastore
, a status of OK
means:
- An existing connection to the database
- The database is not empty (contains more than 1 table)
# broker
For broker
, a status of OK
means:
- An existing connection to the kafka broker
- The necessary topics exist. This will change depending on which service the health check is running for.
For example, for the central-ledger
service to be considered healthy, the following topics need to be found:
topic-admin-transfer
topic-transfer-prepare
topic-transfer-position
topic-transfer-fulfil
# sidecar
For sidecar
, a status of OK
means:
- An existing connection to the sidecar
# cache
For cache
, a status of OK
means:
- An existing connection to the cache
# Swagger Definition
Note: These will be added to the existing swagger definitions for the following services:
ml-api-adapter
central-ledger
central-settlement
central-event-processor
email-notifier
{
/// . . .
"/health": {
"get": {
"operationId": "getHealth",
"tags": [
"health"
],
"responses": {
"default": {
"schema": {
"$ref": "#/definitions/health"
},
"description": "Successful"
}
}
}
},
// . . .
"definitions": {
"health": {
"type": "object",
"properties": {
"status": {
"type": "string",
"enum": [
"OK",
"DOWN"
]
},
"uptime": {
"description": "How long (in seconds) the service has been alive for.",
"type": "number",
},
"started": {
"description": "When the service was started (UTC)",
"type": "string",
"format": "date-time"
},
"versionNumber": {
"description": "The current version of the service.",
"type": "string",
"example": "5.2.3",
},
"services": {
"description": "A list of services this service depends on, and their connection status",
"type": "array",
"items": {
"$ref": "#/definitions/serviceHealth"
}
},
},
},
"serviceHealth": {
"type": "object",
"properties": {
"name": {
"description": "The sub-service name.",
"type": "string",
"enum": [
"datastore",
"broker",
"sidecar",
"cache"
]
},
"status": {
"description": "The connection status with the service.",
"type": "string",
"enum": [
"OK",
"DOWN"
]
}
}
}
}
}
# Example Requests and Responses:
Successful Legacy Health Check:
GET /health HTTP/1.1
Content-Type: application/json
200 SUCCESS
{
"status": "OK"
}
Successful New Health Check:
GET /health?detailed=true HTTP/1.1
Content-Type: application/json
200 SUCCESS
{
"status": "OK",
"uptime": 0,
"started": "2019-05-31T05:09:25.409Z",
"versionNumber": "5.2.3",
"services": [
{
"name": "broker",
"status": "OK",
}
]
}
Failed Health Check, but API is up:
GET /health?detailed=true HTTP/1.1
Content-Type: application/json
502 BAD GATEWAY
{
"status": "DOWN",
"uptime": 0,
"started": "2019-05-31T05:09:25.409Z",
"versionNumber": "5.2.3",
"services": [
{
"name": "broker",
"status": "DOWN",
}
]
}
Failed Health Check:
GET /health?detailed=true HTTP/1.1
Content-Type: application/json
503 SERVICE UNAVAILABLE
# Sequence Diagram
Sequence design diagram for the GET Health