DataPrime aggregation expressions

This guide provides a glossary of IBM® Cloud Logs DataPrime expressions that you can use for aggregation.

`any_value`

Returns any non-null expression value in the group. If expression is not defined, it defaults to the $data object.

any_value(expression: any?)

Returns null if all expression values in the group are null.

Example:

groupby $m.severity calculate any_value($d.url)

`avg`

Calculates the average value of a numerical expression in the group.

avg(expression: number)

Example:

groupby $m.severity calculate avg($d.duration) as average_duration

`count`

Counts non-null expression values. If expression is not defined, all rows will be counted.

count(expression: any?) [into <keypath>]

An alias can be provided to override the keypath the result will be written to.

For example, the following part of a query

count() into $d.num_rows

will result in a single row of the following form:

{ "num_rows": 7532 }

`count_if`

Counts non-null expression values on rows which satisfy the condition. If expression is not defined, all rows that satisfy condition will be counted.

count_if(condition: bool, expression: any?)

Example:

groupby $m.severity calculate count_if($d.duration > 500) as $d.high_duration_logs
groupby $m.severity calculate count_if($d.duration > 500, $d.company_id) as $d.high_duration_logs

`distinct_count`

Counts non-null distinct expression values.

distinct_count(expression: any)

Example:

groupby $l.applicationname calculate distinct_count($d.username) as active_users

`distinct_count_if`

Counts non-null distinct expression values on rows which satisfy condition.

distinct_count_if(condition: bool, expression: any)

Example:

groupby $l.applicationname calculate distinct_count_if($m.severity == 'Error', $d.username) as users_with_errors

`max`

Calculates the maximum value of a numerical expression in the group.

max(expression: number | timestamp)

Example:

groupby $m.severity calculate max($d.duration)

`min`

Calculates the minimum value of a numerical expression in the group.

min(expression: number | timestamp)

Example:

groupby $m.severity calculate min($d.duration)

`percentile`

Calculates the approximate n-th percentile value of a numerical expression in the group.

percentile(percentile: number, expression: number, error_threshold: number?)

Since the percentile calculation is approximate, the accuracy may be controlled with the error_threshold parameter which ranges from 0 to 1 (defaults to 0.01). A lower value will result in better accuracy at the cost of longer query times.

Example:

groupby $m.severity calculate percentile(0.99, $d.duration) as p99_latency

`sample_stddev`

Computes the sample standard deviation of a numerical expression in the group.

sample_stddev(expression: number)

Example:

groupby $m.severity calculate sample_stddev($d.duration)

`sample_variance`

Computes the variance of a numerical expression in the group.

sample_variance(expression: number)

Example:

groupby $m.severity calculate sample_variance($d.duration)

`stddev`

Computes the standard deviation of a numerical expression in the group.

stddev(expression: number)

Example:

groupby $m.severity calculate stddev($d.duration)

`sum`

Calculates the sum of a numerical expression in the group.

sum(expression: number)

Example:

groupby $m.severity calculate sum($d.duration) as total_duration

`variance`

Computes the variance of a numerical expression in the group.

variance(expression: number)

Example:

groupby $m.severity calculate variance($d.duration)

DataPrime Expressions in Aggregations

When querying with the groupby operator, you can apply an aggregation function (such asavg, max, sum) to the bucket of results. This feature gives you the ability to manipulate an aggregation expression inside the expression itself, allowing you to calculate and manipulate your data simultaneously.

Example 1

This examples takes logs which have some connect_duration and batch_duration fields, and calculates the ratio between the averages of those durations, per region.

# Query
source logs
  | groupby region calculate avg(connect_duration) / avg(batch_duration)

Example 2

This query calculates the percentage of logs which don’t have a kubernetes_pod_name out of the total number of logs. The calculation is done per subsystem.

# Query
source logs
| groupby $l.subsystemname calculate
  sum(if(kubernetes.pod_name != null,1,0)) / count() as pct_without_pod_name

Example 3

This query calculates the ratio between the maximum and minimum salary per department, and provides a Based on N Employees string as an additional column per row.

# Query
source logs
| groupby department_id calculate
    max(salary) / min(salary) as salary_ratio
    `Based on {count()} Employees`

Example 4

This query calculates the ratio between error logs and info logs.

source logs
| groupby $m.timestamp / 1h as hour calculate
    count_if($m.severity == '5') / count_if($m.severity == '3') as error_to_info_ratio