Tuesday, May 2, 2017

Time Perspective

Telling time in forensic computing can be complicated. User interfaces hide the complexity, usually displaying time stamps in a human readable format, e.g., 2017-05-02 18:36:23. But the time stamp is usually not recorded in this format. Instead, it is recorded in seconds (e.g., 1493750183), and generally not even as an integer when viewed at the binary level (e.g., 0xa7d10859).
In this post, I will not tackle decoding hexadecimal time stamps to human readable form. Instead, I will focus on another issue: “it is recorded in seconds” from what starting point?

Epochs

Think of an epoch as a line in the sand, if you will, a starting point. Unix-like systems use "unixepoch" time as their starting point, which is 1970-01-01. Windows systems use "Windows Time," which starts at 1601-01-01, and Macs have for some time used Mac Absolute Time, which starts at 2001-01-01. So, as you can see, time is a matter of perspective, and the perspective is not limited to the operating system. Application programmers can choose any epoch they wish (WebKit Time, GPS Time, etc.), and it is not uncommon to find many different epochs in use on a single device. Experienced examiners readily recognize some time formats based on their length and starting values, but none of us can convert them to a human readable format without programming assistance.
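To make those starting points concrete, the offset between each epoch and unixepoch can be computed with Python's datetime module. This is just a quick sketch using the epoch dates listed above:

```python
import datetime

# epoch start dates discussed above
epochs = {
    'Windows Time': '1601-01-01',
    'unixepoch': '1970-01-01',
    'Mac Absolute Time': '2001-01-01',
}

unix_start = datetime.datetime(1970, 1, 1)

for name, date in epochs.items():
    start = datetime.datetime.strptime(date, '%Y-%m-%d')
    offset = int((start - unix_start).total_seconds())
    print('{}: {:+d} seconds from unixepoch'.format(name, offset))
```

Running this shows Windows Time starts 11,644,473,600 seconds before unixepoch and Mac Absolute Time starts 978,307,200 seconds after it; those two constants turn up again and again in forensic work.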

Converting time stamps

I was recently tasked with creating a super timeline of data from many different devices, mostly computers and mobile phones. Most of the data was stored in SQLite databases, but the automated tools in my arsenal did not auto-process all of the databases of interest. I used a graphical SQLite browser to study the databases and experiment with queries until I could export the evidentiary content, and I then used Python to extract the data into a format from which I could synthesize the timeline. The chief problem was that I had several different time formats, meaning they used different epochs, and SQLite only understands unixepoch. While it is still possible to convert non-unixepoch time stamps in SQLite (i.e., by adding or subtracting the difference between the foreign epoch and unixepoch), it is clunky and requires a little research and initial calculation to be successful.
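The in-SQLite approach looks like this: shift the foreign time stamp by its offset from unixepoch, then let SQLite's datetime() function render it. The sketch below converts a Mac Absolute Time value by adding the 978,307,200-second difference between the two epochs, using a hypothetical in-memory table named ts_demo:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table ts_demo (mac_ts integer)')  # hypothetical table
conn.execute('insert into ts_demo values (515442983)')

# shift the Mac Absolute Time stamp to unixepoch, then render it
row = conn.execute(
    "select datetime(mac_ts + 978307200, 'unixepoch') from ts_demo"
).fetchone()
print(row[0])  # 2017-05-02 18:36:23
```

It works, but every foreign epoch needs its own precomputed offset hard-coded into the query, which is exactly the clunkiness described above.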

The Pythonic Way

I wanted a simple function that would take an epoch and a time stamp as arguments, and then return a human readable format for inclusion in the timeline. In that way, it could flex depending on the format of the data source.
import datetime

def convert_ts(epoch, ts):
    '''(str, int) --> str
    Takes a timestamp (ts) in seconds and returns a human readable format 
    based on a provided epoch.  Times are UTC.
    
    >>> convert_ts('1970-01-01', 0)
    '1970-01-01 00:00:00'
    >>> convert_ts('1970-01-01', 1493750183)
    '2017-05-02 18:36:23'
    >>> convert_ts('2001-01-01', 515442983)
    '2017-05-02 18:36:23' '''
    
    delta = datetime.datetime.strptime(epoch, "%Y-%m-%d")
    conversion = delta + datetime.timedelta(seconds=ts)
    return conversion.strftime("%Y-%m-%d %H:%M:%S")
Application of the function is quite simple, as you can see from the sample execution in the comments of the function. But I’ll quickly demonstrate how a dictionary and the function could be used to evaluate unknown time stamps.
>>> epochs = {
...    'win': '1601-01-01',
...    'unx': '1970-01-01',
...    'mac': '2001-01-01'}
>>> for item, epoch in epochs.items():
...     print(item, convert_ts(epoch, 1493750183))
unx 2017-05-02 18:36:23
win 1648-05-02 18:36:23
mac 2048-05-02 18:36:23
As you can see, the time stamp is evaluated against all three epochs in the dictionary and printed to the terminal. The examiner can look at the dates, consider them in the context of the data source, and determine that the time stamp is likely unixepoch if the other dates make no sense. The dictionary could grow to evaluate as many epochs as required.
The chief point here is that a time stamp has no meaning without the context of its epoch. The Python function is just a simple demonstration of a flexible way to change epochs and evaluate time stamps.

Tuesday, September 29, 2015

Compression and Android Gmail

Every registered Android mobile device has an associated Google account. Google accounts usually mean Gmail. And, for investigators interested in the Gmail content stored on Androids, that content can be found in the /data/com.google.android.gm/databases directory in a database named in the following format:

mailstore.[GoogleAccount]@gmail.com.db

The database contains 23 tables (at least at the time of this writing), the most interesting of which is messages.

The messages table has 41 fields (or columns). To obtain the basic email content (say, for keyword searching), an investigator would likely want to export the sender's and receiver's addresses, the date sent or received, the subject line, and the message body, at the very least. There is plenty more to be gleaned from the database, but your investigation will dictate your needs.

Caution
Automated tools do not provide the full wealth of data to be found in the mailstore database. It is always a good idea to become familiar with the database schema to learn its full potential for your investigation.
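One way to get familiar with a schema is to query it directly: SQLite exposes table definitions through sqlite_master and column definitions through PRAGMA table_info. A sketch, run against a hypothetical local copy of the database:

```python
import sqlite3

conn = sqlite3.connect('mailstore.db')  # hypothetical local copy

# list every table in the database
for (name,) in conn.execute(
        "select name from sqlite_master where type = 'table'"):
    print(name)

# list the columns of the messages table
for row in conn.execute('pragma table_info(messages)'):
    print(row[1], row[2])  # column name and declared type
```

A few minutes spent reading the output like this often turns up fields that automated tools silently skip.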

The Big Squeeze

If you have experience searching SQLite databases, you might be thinking, "Why go to the trouble of exporting messages from the database? SQLite strings are usually UTF-8, so I can just search the database with regular expressions or plain keywords." Well, there is a catch when it comes to email content in the Gmail mailstore database: zlib compression.

Short message bodies are written to the body field of the messages table as a plain text string; in a recent exam, the longest message I found in this field was 98 bytes. Longer message bodies are compressed using the zlib algorithm and stored in the bodyCompressed field. While SQLite can store compressed data, it has no built-in function to decompress fields within databases. Instead, it stores such data as a blob type, and it is up to the database user to decompress it.

Note
The SQLite blob type is sort of a catch-all for any type of data. Data is stored in the format in which it was input.
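Because the blob keeps the bytes exactly as written, a zlib-compressed body can usually be recognized by its first two bytes: zlib streams begin with 0x78, followed most commonly by 0x01, 0x9c, or 0xda depending on the compression level. A short sketch:

```python
import zlib

data = zlib.compress(b'A longer message body stored in bodyCompressed.')

# zlib streams start with a 0x78 CMF byte (deflate, 32K window);
# the second byte varies with the compression level
print(data[:2].hex())  # '789c' at the default level

# round-trip to confirm the stream decompresses cleanly
assert zlib.decompress(data).startswith(b'A longer')
```

Spotting that header in a hex view is a quick sanity check before writing any extraction code.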

Extracting Messages

Python is a good option for exporting messages from the Gmail mailstore database. It can both open and query SQLite databases, and it can decompress the long message bodies.

Exporting Gmail Messages
import sqlite3
import zlib

# open and query the database
conn = sqlite3.connect('mailstore.db')  # database name abbreviated
c = conn.cursor()
c.execute("""select _id, fromAddress,
    datetime(dateSentMS/1000, 'unixepoch', 'localtime'),
    datetime(dateReceivedMS/1000, 'unixepoch', 'localtime'),
    case when body not null then body
    else bodyCompressed end
    from messages""")
rows = c.fetchall()

# iterate through the rows and decompress the long messages
for row in rows:
    id, _from, sent, recv, body = row
    try:
        body = zlib.decompress(body)
    except (zlib.error, TypeError):
        pass
    print('{}|{}|{}|{}|{}'.format(id, _from, sent, recv, body))
Note
The final line can be adapted to your own needs, e.g., writing the content to a new file or database, or using Python regular expressions to search the content.

Some Explanation

The example above is just that: an example. It is intended, like all my posts, to remind me how to process the data and to demonstrate how just a few lines of Python can be leveraged to extract it. The script could have been shorter, but that would have come at the cost of clarity. That said, there is still some explanation to be had:

The SQLite query in the c.execute method might need some dissecting for you to understand what I did.

select
    _id,
    fromAddress,
    datetime(dateSentMS/1000, 'unixepoch',   'localtime'),
    datetime(dateReceivedMS/1000, 'unixepoch', 'localtime'),
    case
        when body not Null
        then body
        else bodyCompressed
    end
from messages

The dateSentMS and dateReceivedMS fields are recorded in milliseconds since 1/1/1970 (Unix epoch). I let SQLite do the date conversion for me, rather than doing it in Python, and I converted from Unix epoch to local time. The case statement pulls a little trickery to select the body field or the bodyCompressed field. Basically, the row's body field is checked to see if it is populated. If so, it is returned. If not, the contents of the bodyCompressed field are returned.
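The same fallback can also be expressed with SQLite's coalesce() function, which returns its first non-null argument. A sketch against a hypothetical two-column table:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table demo (body text, bodyCompressed blob)')  # hypothetical
conn.execute("insert into demo values ('short message', null)")
conn.execute("insert into demo values (null, x'789c')")  # stand-in compressed blob

# coalesce() picks body when it is populated, bodyCompressed otherwise
for row in conn.execute('select coalesce(body, bodyCompressed) from demo'):
    print(row[0])
```

Either form works; the case statement simply spells out the logic more explicitly.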

In the body decompression section of the script, the contents of each row are assigned to the variables id, _from, sent, recv, and body. The try/except clause attempts to decompress the body. If it fails, as it will on the short message bodies, the contents of the body variable are used as is. Finally, the row is printed in pipe-delimited fashion.

