Quick Forensics Analysis of Apache logs

Published: 2024-03-29
Last Updated: 2024-03-29 06:31:27 UTC
by Xavier Mertens (Version: 1)

Sometimes, you have to quickly investigate a web server's logs for potential malicious activity. If you're lucky, the logs are already indexed in real time in a log management solution and you can launch hunting queries automatically. If that's not the case, you can download all the logs to a local system or a cloud instance and index them manually. However, that's not always the easiest or fastest approach given the amount of data to process.

These days, I always try to process data as close as possible to its source and download only the investigation results. This reduces bandwidth usage as well as the local resources needed (memory, CPU, ...).

I had to analyze a huge set of Apache logs (the current log plus all the archived ones, covering one year) and used the following solution: mal2csv[1] (Malformed Access Logs to CSV). As the name says, the main purpose of this tool is to convert an Apache access log into a CSV file (easier to process in some cases), but it has two interesting extra features:

It decodes obfuscated content (encoding is common in web attacks) into human-readable text.
It checks log entries against the PHPIDS[2] regex rules to identify known malicious requests.

Interesting log entries are stored in separate files for further review.
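mal2csv's own decoding code isn't reproduced here. As a rough illustration of what decoding an obfuscated request means, here is a hypothetical shell helper (urldecode is not part of mal2csv) that removes one layer of URL percent-encoding using Python's standard urllib.parse.unquote_plus. The sample argument is a made-up encoded variant of the ThinkCMF probe shown later; real attacks often stack several encodings, which a single pass like this won't fully undo.

```shell
# urldecode: hypothetical helper (not part of mal2csv) that removes one layer
# of URL encoding; '+' is turned into a space, as in query strings.
urldecode() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote_plus(sys.argv[1]))' "$1"
}

# Hypothetical encoded request; prints
# /?a=fetch&content=<php>die(@md5(HelloThinkCMF))</php>
urldecode '/?a=fetch&content=%3Cphp%3Edie(%40md5(HelloThinkCMF))%3C%2Fphp%3E'
```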

Docker was available on the web server, so I built a Docker image for the forensic analysis to avoid polluting the server with extra tools (the image was deleted after processing). The Dockerfile is simple:

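FROM ubuntu:latest
LABEL maintainer="Xavier Mertens <xmertens@isc.sans.edu>"
# Prevent apt from prompting during the non-interactive build
ARG DEBIAN_FRONTEND=noninteractive
# Only git (to fetch mal2csv) and Python 3 (to run it) are needed
RUN apt-get update && \
    apt-get install -y git python3 && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /opt
RUN git clone https://github.com/RandomRhythm/mal2csv.git
WORKDIR /opt/mal2csv
ENTRYPOINT ["python3", "./mal2csv.py"]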

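Once the image is built and tagged mal2csv:1.0 (for example with "docker build -t mal2csv:1.0 ." run from the directory containing the Dockerfile), the access log files can be analyzed like this (assuming they are in Apache's default location on Debian/Ubuntu, /var/log/apache2):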

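# mkdir -p /var/tmp/results
# for F in /var/log/apache2/access.log*
do
  # Drop a .gz suffix so that access.log.2.gz becomes access.log.2.txt
  B=$(basename "$F" .gz)
  # zcat -f decompresses archives and passes plain-text logs through unchanged
  zcat -f "$F" >"/var/tmp/results/$B.txt"
  docker run -it --rm -v /var/tmp/results:/data mal2csv:1.0 -i "/data/$B.txt" -o "/data/$B.txt" -d -l -p -r -f
done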

This loop processes all access.log files one by one, decompressing each of them into /var/tmp/results before handing it to mal2csv. For every log, three files are created. Example:

-rw-r--r--  1 root    root    20488876 Mar 28 15:33 access.log.txtLogOutput.Formatted
-rw-r--r--  1 root    root      880986 Mar 28 15:33 access.log.txtLogOutput.Formatted.IDS
-rw-r--r--  1 root    root     1418806 Mar 28 15:33 access.log.txtLogOutput.Formatted.interesting

The "Output.Formatted" file contains all events converted to CSV. The other two are more interesting:

The "Formatted.IDS" file lists the events that match PHPIDS rules:

"24","Detects basic obfuscated JavaScript script injections","GET /config/.env HTTP/1.1"
"35","Detects common comment types","GET /phpMyAdmin+++---/index.php HTTP/1.1"
"20","Detects JavaScript language constructs","GET /index.php?s=/Index/\\think\\app/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=HelloThinkPHP21 HTTP/1.1"
"8","Detects self-executing JavaScript functions","GET /?a=fetch&content=<php>die(@md5(HelloThinkCMF))</php> HTTP/1.1"
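With a year of logs, this file can be long. A quick way to see which rules fire most often is to count the lines per rule description. The sketch below is a hypothetical helper (ids_summary is not part of mal2csv) that assumes every line has the three quoted fields shown above; splitting on the "," separator makes the description the second field, and the first column is ignored.

```shell
# ids_summary: hypothetical helper that counts how many requests matched each
# PHPIDS rule, most frequent first. Assumes lines of the form
# "number","description","request", so the description is field 2 when
# splitting on the "," separator.
ids_summary() {
  awk -F'","' 'NF >= 3 { print $2 }' "$@" | sort | uniq -c | sort -rn
}
```

For example, "ids_summary /var/tmp/results/*.IDS" ranks the rules across all processed logs at once.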

The "Formatted.interesting" file contains the original events that match a PHPIDS rule. Now you know where to focus your investigation.
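One natural next step is to see which clients sent those requests. The hypothetical helper below (top_sources is not part of mal2csv) ranks the first whitespace-separated field of each line. It assumes the file holds raw Apache entries starting with the client address (%h), as in the default "common" and "combined" LogFormat; check a few lines first, because a different layout would make the first field meaningless.

```shell
# top_sources: hypothetical helper that lists the 20 most frequent client
# addresses. Assumes each line is a raw Apache entry whose first field is
# the client address (%h), as in the common/combined LogFormat.
top_sources() {
  awk '{ print $1 }' "$@" | sort | uniq -c | sort -rn | head -n 20
}
```

For example, "top_sources /var/tmp/results/*.interesting" gives a first list of addresses worth looking up or blocking.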

That's a pretty straightforward way to perform a quick first analysis of your logs! Note that mal2csv can also process Microsoft IIS logs (use the "-m" command-line switch) and that the detection rules are located in two files: