topic=logFormatting context=distributedSystems
Structured logging in Java for distributed systems
Where do application logs end up anyway?
Applications typically write to standard output. Their logs can either be handled
by the app's supervising process or be passed to the next supervision
layer to be processed. For example, Kubernetes collects the logs
that containers write to stdout and stderr and stores them in node-specific
log files (usually under /var/log/containers, with file names derived from the pod, namespace and container names).
From there, they can be accessed with tools such as kubectl logs.
Some logs are written directly to local log files,
e.g. /var/log/app.log. At the application level,
loggers can be configured
to write to files in a custom location, e.g. with FileHandler from Java's java.util.logging.
Linux, for example, has two common logging systems:
syslog, which writes plain-text files under /var/log, and the systemd journal, which stores
logs in binary format, usually under /var/log/journal.
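To make the FileHandler option concrete, here is a minimal sketch that configures a java.util.logging logger to write to a file in a location of our choosing. The logger name com.example.filelogging and the temporary file are placeholders for this example; a real application would point the handler at a stable path such as the /var/log/app.log mentioned above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class FileLoggingExample {
    // Writes one message through a java.util.logging Logger into the given file.
    public static Path logToFile(Path logFile, String message) throws IOException {
        Logger logger = Logger.getLogger("com.example.filelogging"); // hypothetical logger name
        logger.setUseParentHandlers(false); // do not also print through the default console handler
        // append=true keeps existing content, e.g. across application restarts
        FileHandler handler = new FileHandler(logFile.toString(), true);
        handler.setFormatter(new SimpleFormatter()); // plain-text, two lines per record
        logger.addHandler(handler);
        try {
            logger.info(message);
        } finally {
            handler.close(); // flushes and releases the file and its .lck lock file
            logger.removeHandler(handler);
        }
        return logFile;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("app", ".log"); // placeholder location
        logToFile(file, "Application started");
        System.out.print(Files.readString(file));
    }
}
```

The resulting file is plain text that tools like grep can search, which is exactly what stops scaling once the same application runs on many machines.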
To summarize, the way application logs are handled depends on the supervision layers, i.e. the init system: each of them can handle the logs or pass them to the layer above. The logs end up either in a readable file format or in a non-readable binary format such as the systemd journal.
On a single machine, grepping through a process's log files is often enough to retrieve the relevant entries. When a process runs on multiple machines, however, hunting down the file that contains the information you are looking for becomes unrealistic, and the approach breaks down completely when you need to follow a particular data flow through multiple services across all their instances.
Common solutions in distributed systems
The logs generated by services often need to be processed, aggregated, stored and analyzed in a central location. A single logging location has its own benefits: it simplifies complex troubleshooting searches without requiring direct access to the systems, and it can back up the data to protect against loss.
In its simplest form, a Linux-based application can send logs over syslog to remote servers for centralized logging.
Alternatively, log forwarding agents can collect logs and push them to a remote location. They usually run on the application's server or alongside its containers; Fluentd, Logstash and Filebeat are common options.
Once forwarded, aggregation solutions write the logs to a central store, for instance CloudWatch Logs, Elasticsearch, Grafana Loki or even Datadog.
However, correlated logs and efficient troubleshooting won't magically happen just because we've forwarded some logs. How can the aggregator tell apart, and correlate, all the logs it receives?
The last piece of the puzzle is structured logging.
Structured logging involves logging data in a machine-readable format,
typically JSON, XML, CLF (Common Log Format) or key-value pairs. JSON is a common choice,
but logfmt, a key-value pair format, is also easy for both humans and machines to read and write.
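To show what a logfmt line looks like, here is a small, hypothetical encoder that uses only the JDK. logfmt has no formal specification; the quoting rule below (wrap values containing whitespace, `=`, quotes or backslashes in double quotes, and escape them) follows the common convention rather than an official grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class LogfmtEncoder {
    // Renders key-value pairs as a single logfmt line, in the map's iteration order.
    public static String encode(Map<String, String> fields) {
        StringJoiner line = new StringJoiner(" ");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            line.add(e.getKey() + "=" + formatValue(e.getValue()));
        }
        return line.toString();
    }

    private static String formatValue(String value) {
        if (value == null || value.isEmpty()) {
            return ""; // rendered as "key=" for an empty value
        }
        boolean needsQuotes = value.chars()
                .anyMatch(c -> c <= ' ' || c == '"' || c == '=' || c == '\\');
        if (!needsQuotes) {
            return value;
        }
        // Escape backslashes first so the quote and newline escapes are not doubled
        String escaped = value.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("level", "INFO");
        fields.put("message", "Received encryption request");
        fields.put("shift", "10");
        System.out.println(encode(fields));
        // level=INFO message="Received encryption request" shift=10
    }
}
```

The same event in JSON would need braces and quoted keys; logfmt keeps one flat, greppable line per event while staying easy to parse.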
Structured logging in Java
Frameworks
The Java ecosystem offers several logging frameworks: the built-in java.util.logging, as well as Log4j, Logback and Tinylog. The SLF4J abstraction layer lets developers select or change the logging framework at deployment time, which gives great flexibility. Common pairings are SLF4J with Logback or with Log4j.
Components
Key components in the Java logging API include:
- Loggers: capture events (LogRecord objects in java.util.logging) and pass them to their appenders,
- Appenders/Handlers: deliver log records to a destination and format them using layouts,
- Filters: decide whether a given event is logged or discarded; they can be attached to loggers or appenders,
- Layouts/Formatters: define the format of the log entries.

Configuring those components is specific to each framework.
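As one concrete case, here is a minimal sketch wiring the four components together with the built-in java.util.logging API, which needs no extra dependency. The logger name and the LogfmtFormatter class are illustrative; the formatter does no escaping, so it is not a complete logfmt implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.StreamHandler;

public class JulComponentsExample {
    // Formatter (layout): renders each LogRecord as a logfmt-style line (no escaping).
    static class LogfmtFormatter extends Formatter {
        @Override
        public String format(LogRecord record) {
            return "level=" + record.getLevel().getName()
                    + " logger=" + record.getLoggerName()
                    + " message=\"" + formatMessage(record) + "\""
                    + System.lineSeparator();
        }
    }

    public static String run() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Handler (appender): sends records to a destination, here an in-memory buffer
        StreamHandler handler = new StreamHandler(sink, new LogfmtFormatter());
        // Filter: let through only WARNING and above
        handler.setFilter(r -> r.getLevel().intValue() >= Level.WARNING.intValue());

        // Logger: creates LogRecords and hands them to its handlers
        Logger logger = Logger.getLogger("com.example.components"); // hypothetical name
        logger.setUseParentHandlers(false); // skip the default console handler
        logger.addHandler(handler);
        try {
            logger.info("Cache warmed up");     // rejected by the filter
            logger.warning("Disk almost full"); // accepted and formatted
        } finally {
            handler.flush(); // StreamHandler buffers output until flushed
            logger.removeHandler(handler);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```

Logback and Log4j follow the same structure, but their appenders, filters and layouts are usually declared in a configuration file rather than in code.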
SLF4J/Log4j: ThreadContext & MDC
In multithreaded applications, tracking events can be challenging,
especially with simultaneous users. Log4j's ThreadContext helps
distinguish log entries by adding unique identifiers such as
session IDs, user IDs or device IDs. It implements the Mapped Diagnostic Context (MDC) concept
and stores data as key-value pairs: ThreadContext.put(key, value) adds a new key-value pair
to the context. SLF4J exposes the same idea through its MDC class.
This data is stored on a per-thread basis and stays attached to the thread until it is removed, potentially for the thread's whole lifetime. Because servers typically reuse threads from a pool, a value left behind by one request would also appear in the logs of the next request handled by that thread.
Hence, it is crucial to clear the context once it is no
longer needed, using ThreadContext.clearMap() (MDC.clear() in SLF4J),
to prevent memory issues and keep log data accurate.
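To see why clearing matters, the sketch below uses a hypothetical SimpleMdc class: a per-thread map held in a ThreadLocal, which captures the basic idea behind an MDC but is not the actual SLF4J or Log4j implementation. A single-thread executor stands in for a server's thread pool, so consecutive tasks run on the same reused thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical, simplified stand-in for an MDC: one map per thread, held in a ThreadLocal.
public final class SimpleMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private SimpleMdc() {}

    public static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    public static String get(String key) {
        return CONTEXT.get().get(key);
    }

    // Drops the current thread's map entirely, similar in spirit to MDC.clear()
    public static void clear() {
        CONTEXT.remove();
    }

    // Returns {value seen after a task forgot to clear, value seen after proper clearing}
    public static String[] demo() throws Exception {
        // One worker thread reused for every task, like a tiny server thread pool
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Task 1 sets a request ID and never clears it
            pool.submit(() -> put("requestId", "req-1")).get();
            // Task 2 runs on the same thread and sees the stale value
            String leaked = pool.submit(() -> get("requestId")).get();

            // Task 3 follows the safe pattern: put, do the work, clear in finally
            pool.submit(() -> {
                put("requestId", "req-2");
                try {
                    // ... logging for this request would happen here ...
                } finally {
                    clear();
                }
            }).get();
            String afterClear = pool.submit(() -> get("requestId")).get();
            return new String[] {leaked, afterClear};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] result = demo();
        System.out.println("stale value: " + result[0]); // req-1
        System.out.println("after clear: " + result[1]); // null
    }
}
```

The second task never set a request ID, yet it sees "req-1"; with the clear-in-finally pattern the next task starts from an empty context.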
In practice
I experimented with a dummy “encryption” web server built with Spring Boot. It processes web requests carrying a text body and a shift value, and returns the text encrypted with a simple Caesar cypher (more on cyphers to come, yay!).
Note that spring-boot-starter-web already pulls in Logback (through spring-boot-starter-logging), which SLF4J binds to. By default, logs go to a pre-configured console appender; for simplicity, we keep a ConsoleAppender and only give it a custom pattern layout. The custom layout for our application includes the timestamp, level, logger, message, thread name, shift applied, request ID and text length.
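package com.encrypter;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.springframework.stereotype.Service;

import java.lang.invoke.MethodHandles;
import java.util.UUID;

@Service
public class EncryptionService {
    private static final Logger logger = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());

    public String encrypt(String text, int shift) {
        String requestId = generateRequestId();
        try {
            // Populate the MDC inside the try block so that the finally clause always
            // clears it, even if populating it fails (e.g. on a null text)
            setMdcContext(requestId, text, shift);
            // Log the incoming request; the layout appends the MDC values
            logger.info("Received encryption request for Caesar cypher");
            return caesarCipherEncrypt(text, shift);
        } catch (Exception e) {
            logger.error("Error during encryption", e);
            return "Encryption failed";
        } finally {
            // Server threads are pooled and reused, so always clear the context
            clearMdcContext();
        }
    }

    private String generateRequestId() {
        return UUID.randomUUID().toString();
    }

    private void setMdcContext(String requestId, String text, int shift) {
        MDC.put("requestId", requestId);
        MDC.put("shift", String.valueOf(shift));
        MDC.put("textLength", String.valueOf(text.length()));
    }

    private void clearMdcContext() {
        MDC.clear();
    }

    private String caesarCipherEncrypt(String text, int shift) {
        // Not the focus of this post: a minimal Caesar cipher added so the class compiles.
        // It shifts ASCII letters and leaves every other character unchanged.
        StringBuilder result = new StringBuilder(text.length());
        int normalizedShift = ((shift % 26) + 26) % 26; // also handles negative shifts
        for (char c : text.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                result.append((char) ('a' + (c - 'a' + normalizedShift) % 26));
            } else if (c >= 'A' && c <= 'Z') {
                result.append((char) ('A' + (c - 'A' + normalizedShift) % 26));
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }
}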
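Spring Boot exposes basic configuration properties such as the logging level; beyond those, the logger is configured in the framework's native
configuration format. In this case, that is the XML file logback.xml. The layout is defined in the <pattern> element, which also pulls the
data that uniquely identifies a request out of the MDC with %X{key}. The format follows
logfmt. Note that strict logfmt expects values containing spaces to be quoted, which %message is not here, so a logfmt parser may split the message into several keys.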
<configuration>
    <!-- Logback Console Appender -->
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <!-- Logfmt format for structured logs: %X{key} reads a value from the MDC,
                 and %n ends each entry with a line separator -->
            <pattern>timestamp=%date{yyyy-MM-dd'T'HH:mm:ss.SSSZ} level=%level logger=%logger message=%message thread=%thread shift=%X{shift} requestId=%X{requestId} textLength=%X{textLength}%n</pattern>
        </encoder>
    </appender>
    <!-- Root logger configuration -->
    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
    </root>
</configuration>
Example logs generated by a unit test run and by a request sent from the terminal are shown below.
timestamp=2025-02-14T17:31:42.441+0000 level=INFO logger=class com.encrypter.EncryptionService message=Received encryption request for Caesar cypher thread=main shift=10 requestId=1977873c-c2af-4656-888a-1bce2eff0f08 textLength=17
timestamp=2025-02-14T17:39:29.292+0000 level=INFO logger=com.encrypter.EncryptionService message=Received encryption request for Caesar cypher thread=http-nio-8080-exec-1 shift=19 requestId=8bf517fd-700e-4a10-ac4f-d9279a64aa2b textLength=18
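On the aggregator side, structured lines can be parsed back into fields and correlated by request ID. The hypothetical LogfmtParser below is a minimal sketch of that step, not how any particular aggregator works; it assumes values with spaces are double-quoted, which the message field in the sample output above is not, so the input lines in main are shortened, quoted adaptations of those samples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LogfmtParser {
    // Parses one logfmt line into key-value pairs, keeping their order.
    // Supports bare values (key=value), quoted values (key="a b") and bare keys (key).
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        int i = 0;
        int n = line.length();
        while (i < n) {
            while (i < n && line.charAt(i) == ' ') i++; // skip separators
            if (i >= n) break;
            int keyStart = i;
            while (i < n && line.charAt(i) != '=' && line.charAt(i) != ' ') i++;
            String key = line.substring(keyStart, i);
            if (key.isEmpty()) { i++; continue; } // stray '=' without a key
            String value = "";
            if (i < n && line.charAt(i) == '=') {
                i++;
                if (i < n && line.charAt(i) == '"') {
                    StringBuilder sb = new StringBuilder();
                    i++; // opening quote
                    while (i < n && line.charAt(i) != '"') {
                        if (line.charAt(i) == '\\' && i + 1 < n) i++; // backslash escapes the next char
                        sb.append(line.charAt(i));
                        i++;
                    }
                    i++; // closing quote
                    value = sb.toString();
                } else {
                    int valueStart = i;
                    while (i < n && line.charAt(i) != ' ') i++;
                    value = line.substring(valueStart, i);
                }
            }
            fields.put(key, value);
        }
        return fields;
    }

    // Groups parsed lines by a correlation key such as requestId.
    public static Map<String, List<Map<String, String>>> groupBy(List<String> lines, String key) {
        Map<String, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            Map<String, String> fields = parse(line);
            groups.computeIfAbsent(fields.getOrDefault(key, ""), k -> new ArrayList<>()).add(fields);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical input: the sample lines above, shortened and with the message quoted
        List<String> lines = List.of(
                "level=INFO message=\"Received encryption request for Caesar cypher\" requestId=1977873c-c2af-4656-888a-1bce2eff0f08 textLength=17",
                "level=INFO message=\"Received encryption request for Caesar cypher\" requestId=8bf517fd-700e-4a10-ac4f-d9279a64aa2b textLength=18");
        groupBy(lines, "requestId").forEach((id, entries) ->
                System.out.println(id + " -> " + entries.size() + " entries"));
    }
}
```

Parsed without quotes, the same message would yield message=Received followed by empty keys such as encryption and request, which is why quoting free-text fields matters for downstream tooling.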
Further reading…
- Logging best practices: article available in the references.
- Asynchronous logging is a technique where logs are processed by a separate thread, which aims to reduce the logging overhead on the execution thread in low-latency software. It is a widely documented and discussed topic, out of scope for this short post; an article can be found in the references.
- While on the topic of Log4j, most readers will be familiar with the “Log4Shell” security incident of 2021, affecting Log4j 2 versions before 2.16; more details in the references. This leads us to a different article, under the DevOps tag, covering dependency version scanning for vulnerabilities.
References
- loggly.com
- SLF4J - MDC
- logfmt
- Oracle - Logging Diagram
- Logging Best Practices
- Asynchronous Logging
- Log4Shell