feat: Handle JSON-like structured logs better #763

Merged: 3 commits, Apr 21, 2025
9 changes: 9 additions & 0 deletions docker/otel-collector/config.yaml
@@ -27,6 +27,15 @@ receivers:
processors:
  transform:
    log_statements:
      - context: log
        error_mode: ignore
        statements:
          # JSON parsing: Extends log attributes with the fields from structured log body content, either as an OTEL map or
          # as a string containing JSON content.
          - set(log.cache, ExtractPatterns(log.body, "(?P<0>(\\{.*\\}))")) where IsString(log.body)
          - merge_maps(log.attributes, ParseJSON(log.cache["0"]), "upsert") where IsMap(log.cache)
          - flatten(log.attributes) where IsMap(log.cache)
          - merge_maps(log.attributes, log.body, "upsert") where IsMap(log.body)
      - context: log
        error_mode: ignore
        conditions:
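To get a feel for what the pattern in the first statement captures, the sketch below applies the same greedy expression, `\{.*\}`, to a body string taken from the test data. It is only an approximation of the extraction step: `grep -E` uses POSIX extended regular expressions rather than the collector's regex engine, and the helper name `extract_json_candidate` is made up for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of this PR): print the span that runs from
# the first '{' to the last '}' in the given string, if there is one.
# Returns grep's exit status, so it is non-zero when nothing matches.
extract_json_candidate() {
  printf '%s\n' "$1" | grep -oE '\{.*\}'
}

# Body string copied from data/auto-parse/json-string/input.json
extract_json_candidate 'should find a wrapped JSON object {"found":true,"position":"wrapped"} between text'
# prints: {"found":true,"position":"wrapped"}
```

Because the match is greedy, a body such as `{note} this is very much not JSON ...` still yields a candidate (`{note}`); that candidate is not valid JSON, and the expected output for the `default` test data shows no attributes being added for it.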
56 changes: 56 additions & 0 deletions smoke-tests/otel-collector/README.md
@@ -0,0 +1,56 @@
# OpenTelemetry Collector Smoke Tests

This directory contains smoke tests for validating the OpenTelemetry Collector functionality in HyperDX.

## Prerequisites

Before running the tests, ensure you have the following tools installed:

- [Bats](https://github.com/bats-core/bats-core) - Bash Automated Testing System
- [Docker](https://www.docker.com/) and Docker Compose
- [curl](https://curl.se/) - Command line tool for transferring data
- [ClickHouse client](https://clickhouse.com/docs/en/integrations/sql-clients/clickhouse-client) - Command-line client for ClickHouse

## Running the Tests

To run all the tests:

```bash
cd smoke-tests/otel-collector
bats *.bats
```

To run a specific test file:

```bash
bats hdx-1453-auto-parse-json.bats
```

## Test Structure

- `*.bats` - Test files written in Bats
- `data/` - Test data used by the tests
- `test_helpers/` - Utility functions for the tests
- `docker-compose.yaml` - Docker Compose configuration for the test environment

## Debugging

If you need to debug the tests, you can set the `SKIP_CLEANUP` environment variable to prevent the Docker containers from being torn down after the tests complete:

```bash
SKIP_CLEANUP=1 bats hdx-1453-auto-parse-json.bats
```

or

```bash
SKIP_CLEANUP=true bats hdx-1453-auto-parse-json.bats
```

With `SKIP_CLEANUP` enabled, the test containers will remain running after the tests complete, allowing you to inspect logs, connect to the containers, and debug issues.

To manually clean up the containers after debugging:

```bash
docker compose down
```
@@ -0,0 +1 @@
SELECT ResourceAttributes, LogAttributes FROM otel_logs WHERE ResourceAttributes['suite-id'] = 'auto-parse' AND ResourceAttributes['test-id'] = 'default' ORDER BY TimestampTime FORMAT CSV
@@ -0,0 +1,4 @@
"{'suite-id':'auto-parse','test-id':'default'}","{}"
"{'suite-id':'auto-parse','test-id':'default'}","{}"
"{'suite-id':'auto-parse','test-id':'default'}","{}"
"{'suite-id':'auto-parse','test-id':'default'}","{}"
53 changes: 53 additions & 0 deletions smoke-tests/otel-collector/data/auto-parse/default/input.json
@@ -0,0 +1,53 @@
{
  "resourceLogs": [
    {
      "resource": {
        "attributes": [
          {
            "key": "suite-id",
            "value": {
              "stringValue": "auto-parse"
            }
          },
          {
            "key": "test-id",
            "value": {
              "stringValue": "default"
            }
          }
        ]
      },
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "timeUnixNano": "1901999580000000000",
              "body": {
                "stringValue": "[note] this is very much not JSON even though it starts with an array char"
              }
            },
            {
              "timeUnixNano": "1901999580000000001",
              "body": {
                "stringValue": "{note} this is very much not JSON even though it starts with an object char"
              }
            },
            {
              "timeUnixNano": "1901999580000000002",
              "body": {
                "stringValue": "NOTE: this is very much not JSON"
              }
            },
            {
              "timeUnixNano": "1901999580000000003",
              "body": {
                "stringValue": "this has some {Key {Value { '{' } } invalid JSON in it"
              }
            }
          ]
        }
      ]
    }
  ]
}
@@ -0,0 +1 @@
SELECT ResourceAttributes, LogAttributes FROM otel_logs WHERE ResourceAttributes['suite-id'] = 'auto-parse' AND ResourceAttributes['test-id'] = 'json-string' ORDER BY TimestampTime FORMAT CSV
@@ -0,0 +1,5 @@
"{'suite-id':'auto-parse','test-id':'json-string'}","{'attr.intValue':'1','found':'false','message':'this should be parsed into a map'}"
"{'suite-id':'auto-parse','test-id':'json-string'}","{'bodyAttr':'12345','message':'this has an existing user attribute that should be preserved.','userAttr':'true'}"
"{'suite-id':'auto-parse','test-id':'json-string'}","{'found':'true','position':'trailing'}"
"{'suite-id':'auto-parse','test-id':'json-string'}","{'found':'true','position':'leading'}"
"{'suite-id':'auto-parse','test-id':'json-string'}","{'found':'true','position':'wrapped'}"
67 changes: 67 additions & 0 deletions smoke-tests/otel-collector/data/auto-parse/json-string/input.json
@@ -0,0 +1,67 @@
{
  "resourceLogs": [
    {
      "resource": {
        "attributes": [
          {
            "key": "suite-id",
            "value": {
              "stringValue": "auto-parse"
            }
          },
          {
            "key": "test-id",
            "value": {
              "stringValue": "json-string"
            }
          }
        ]
      },
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "timeUnixNano": "1901999580000000000",
              "body": {
                "stringValue": "{\"attr\":{\"intValue\": 1},\"found\":false,\"message\":\"this should be parsed into a map\"}"
              }
            },
            {
              "timeUnixNano": "1901999580000000001",
              "attributes": [
                {
                  "key": "userAttr",
                  "value": {
                    "boolValue": true
                  }
                }
              ],
              "body": {
                "stringValue": "{\"bodyAttr\":12345,\"message\":\"this has an existing user attribute that should be preserved.\"}"
              }
            },
            {
              "timeUnixNano": "1901999580000000002",
              "body": {
                "stringValue": "should find the trailing JSON object {\"found\":true,\"position\":\"trailing\"}"
              }
            },
            {
              "timeUnixNano": "1901999580000000003",
              "body": {
                "stringValue": "{\"found\":true,\"position\":\"leading\"} should find the leading JSON object "
              }
            },
            {
              "timeUnixNano": "1901999580000000004",
              "body": {
                "stringValue": "should find a wrapped JSON object {\"found\":true,\"position\":\"wrapped\"} between text"
              }
            }
          ]
        }
      ]
    }
  ]
}
@@ -0,0 +1 @@
SELECT ResourceAttributes, LogAttributes FROM otel_logs WHERE ResourceAttributes['suite-id'] = 'auto-parse' AND ResourceAttributes['test-id'] = 'otel-map' ORDER BY TimestampTime FORMAT CSV
@@ -0,0 +1 @@
"{'suite-id':'auto-parse','test-id':'otel-map'}","{'account-id':'550e8400-e29b-41d4-a716-446655440000','message':'data sent as OTEL map should also extend the log attributes','user-id':'1234'}"
56 changes: 56 additions & 0 deletions smoke-tests/otel-collector/data/auto-parse/otel-map/input.json
@@ -0,0 +1,56 @@
{
  "resourceLogs": [
    {
      "resource": {
        "attributes": [
          {
            "key": "suite-id",
            "value": {
              "stringValue": "auto-parse"
            }
          },
          {
            "key": "test-id",
            "value": {
              "stringValue": "otel-map"
            }
          }
        ]
      },
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "timeUnixNano": "1901999580000000000",
              "body": {
                "kvlistValue": {
                  "values": [
                    {
                      "key": "message",
                      "value": {
                        "stringValue": "data sent as OTEL map should also extend the log attributes"
                      }
                    },
                    {
                      "key": "user-id",
                      "value": {
                        "stringValue": "1234"
                      }
                    },
                    {
                      "key": "account-id",
                      "value": {
                        "stringValue": "550e8400-e29b-41d4-a716-446655440000"
                      }
                    }
                  ]
                }
              }
            }
          ]
        }
      ]
    }
  ]
}
32 changes: 32 additions & 0 deletions smoke-tests/otel-collector/hdx-1453-auto-parse-json.bats
@@ -0,0 +1,32 @@
#!/usr/bin/env bats

load 'test_helpers/utilities.bash'
load 'test_helpers/assertions.bash'

setup_file() {
  validate_env
  docker compose up --build --detach
  wait_for_ready "otel-collector" "http://localhost:4318"
}

teardown_file() {
  attempt_env_cleanup
}

@test "JSON string body content should be parsed and stored as log attributes" {
  emit_otel_data "http://localhost:4318" "data/auto-parse/json-string"
  sleep 1
  assert_test_data "data/auto-parse/json-string"
}

@test "OTEL map content should be stored as log attributes" {
  emit_otel_data "http://localhost:4318" "data/auto-parse/otel-map"
  sleep 1
  assert_test_data "data/auto-parse/otel-map"
}

@test "all other content should skip storing values in log attributes" {
  emit_otel_data "http://localhost:4318" "data/auto-parse/default"
  sleep 1
  assert_test_data "data/auto-parse/default"
}
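
Each test above waits a fixed `sleep 1` between emitting data and asserting on it. Should that ever prove too short on a slow machine, one option is to poll instead. The following is a minimal sketch of a generic retry helper; `retry_until` and its parameters are hypothetical and are not part of this PR or the repository's `test_helpers`.

```bash
#!/usr/bin/env bash

# Hypothetical helper: run a command up to <attempts> times, pausing
# <delay> seconds between failed tries. Returns 0 on the first success
# and 1 if every attempt fails.
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the pause after the final attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

A call might look like `retry_until 10 1 assert_test_data "data/auto-parse/json-string"`, assuming `assert_test_data` can safely be invoked more than once within a test.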
9 changes: 9 additions & 0 deletions smoke-tests/otel-collector/test_helpers/utilities.bash
@@ -72,3 +72,12 @@ emit_otel_data() {
fi
return 0
}

attempt_env_cleanup() {
  # Check if we should keep the test containers running
  if [[ "${SKIP_CLEANUP}" == "1" ]] || [[ "$(echo "${SKIP_CLEANUP}" | tr '[:upper:]' '[:lower:]')" == "true" ]]; then
    echo "🔍 SKIP_CLEANUP is set, skipping container cleanup" >&3
    return 0
  fi
  docker compose down
}
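
The condition in `attempt_env_cleanup` accepts `1` or any capitalization of `true`. The sketch below isolates that check in a standalone function so it can be tried without Docker. The name `should_skip_cleanup` is hypothetical, and `${SKIP_CLEANUP:-}` is an assumption added here so the check also behaves when the variable is unset under `set -u`.

```bash
#!/usr/bin/env bash

# Hypothetical standalone version of the SKIP_CLEANUP check.
# Succeeds (exit 0) when SKIP_CLEANUP is "1" or "true" in any letter case.
should_skip_cleanup() {
  local value
  # Lowercase with tr rather than ${var,,} so it also runs on bash 3.x.
  value="$(printf '%s' "${SKIP_CLEANUP:-}" | tr '[:upper:]' '[:lower:]')"
  [[ "$value" == "1" || "$value" == "true" ]]
}

SKIP_CLEANUP=TRUE
if should_skip_cleanup; then
  echo "keeping containers"
else
  echo "tearing down"
fi
# prints: keeping containers
```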