AboutDFIR.com – The Definitive Compendium Project
Digital Forensics & Incident Response

Blog Post

App Timeline Provider – SRUM Database

The System Resource Usage Monitor (SRUM) is a commonly parsed artifact available on Windows 8+ systems. On a basic level, SRUM appears to be the backend database supporting the Task Manager. Its tables are stored in an Extensible Storage Engine (ESE) database saved as SRUDB.dat. Generally, 30 to 60 days of data are saved in this database. The data is written to the database approximately every hour and around shutdowns. Some of the tables within this database are routinely parsed – including Application Resource Monitor, Push Notifications, Energy Usage (and Long-Term), Network Activity, and Network Connections. However, other tables exist in this artifact, and in this post I am going to take a deeper dive into one – App Timeline Provider. I will cover my research method, the table’s default values, the columns of interest (AudioOutS, AudioInS, InFocusS, KeyboardInputS, MouseInputS, UserInputS), some interesting findings, a sample of raw data, and some ideas for future research.

To start, let us consider a basic scenario. In law enforcement, we investigate financial crimes. Like DFIR, this frequently includes spreadsheets and documents that are often updated by the same people. In our scenario, a spreadsheet was found to have been edited on Wednesday by an unexpected party. This was caught on Friday morning. Timelining of artifacts shows that around the time of the spreadsheet edit, the user was also invited to a virtual meeting. The employee states they have not edited any spreadsheets this week and do not remember any meetings around that time. 

There are plenty of places to start trying to find application times and file interactions, but where would you start to look for data to show what application was visible on screen? What about what was interacted with using mouse and keyboard, or what may have played audio or accepted audio input? How could you show that the user not only may have attended a meeting but also typed something into the spreadsheet software? 

Raw SRUM Database View

Based on the research and data below, I recommend reviewing the App Timeline Provider table {5C8CF1C7-7257-4F13-B223-970EF5939312} in the SRUM database. Unfortunately, this table only appears to save seven days’ worth of data. This table is currently partially parsed by some tools; however, many tools are not parsing a large amount of the data that is available. This table appears to capture the duration of audio input, audio output, user input (keyboard input and mouse input), and in-focus time in whole seconds per application in columns, among other things. This table includes the “UserId” column, which can be used to tie the app activity to a specific account on the machine. The columns I have reviewed for this post all follow the format “[Artifact]S” – whatever that “artifact” is, with its duration in Seconds (ex: InFocusS). These columns operate as a timer in whole seconds for the associated application. For example: if the input starts, runs for 10 seconds, and stops, the column will read 10. If the input then takes a 10-minute pause and continues for 5 more seconds, the column will ignore the paused time and simply add 5 to the 10 seconds from the first input, reading 15 for that session.
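As a minimal sketch, that stopwatch behavior can be modeled as summing only the active bursts. The function name and interval format here are my own illustrative assumptions, not SRUM structures:

```python
# Model of the [Artifact]S stopwatch behavior described above: only active
# time counts, and pauses between bursts of activity contribute nothing.
# The (start, end) interval format is an assumption for illustration.

def accumulate_seconds(bursts):
    """bursts: (start, end) pairs in seconds; returns total active time."""
    return sum(end - start for start, end in bursts)

# 10 s of input, a 10-minute pause, then 5 more seconds of input:
# the pause is ignored, so the column would read 15 for that session.
session_total = accumulate_seconds([(0, 10), (610, 615)])  # -> 15
```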

Research Method

Screenshot of the Virtual Machine test environment

This research began in approximately March of 2022. I am still working full-time in a loosely related career field, so all research efforts had to work around that. Though it took multiple sessions and wipes, I used a clean installation of Windows 10 Professional 21H2 (Build 19044.1645) run in VMware Workstation Pro. All on-screen activity in the virtual machine was recorded using OBS Studio with TimeApp (EZ Tool) running on top. That visual data was manually logged after the fact and compared to log data. Using KAPE, artifacts were pulled using the SRUM Target and processed with the SrumECmd Module at multiple intervals, and the raw database was viewed using ESEDatabaseView. These two data points were combined in spreadsheets for comparison and review.

These artifacts were all at least minimally tested in multiple browsers (Edge, Chrome, Firefox, Pale Moon, Vivaldi, Brave, OperaGX, Yandex, and Tor) and using a few different applications (VLC, OBS, Solitaire, Zoom, Teams) and normal system applications and sounds (pop up notifications, user access control, speaker test, etc.). Overall, I conducted eight separate full sessions (6+ hours) of test data – starting very simple with one application per hour and ending with normal system usage. The findings were also validated against two other non-research data sets.  

Due to the human element of this research, all visual data is “approximately within 1 second” of log time for the most basic simple test. As more data is added to a test, the margin of human error gets a bit wider. Part of the reason for the larger margin of error is that I cannot be sure when Windows starts a timer versus when I catch the frame-by-frame pause. Over the course of numerous instances, the gap continues to widen if I am off by even a fraction of a second. To attempt to mitigate this, I consistently reported start and end times based on on-screen cues across the different tests. 

Default Values 

App Timeline Provider Sample

The first thing to address is the columns I have highlighted in gray above. In viewing numerous data sets over multiple sample machines and days, there appear to be default values for each column. Depending on whether the column is a timeline entry or a duration column, there are two different values that appear to indicate there is no entry for that column. Throughout this research, I have also referred to them as null values and empty values but have decided that default value is the most accurate. It appears that this value is only changed when there is something to be added. For the timeline columns, the default value is 3038287259199220266. In duration of time columns, that value is 707406378.  

For example, if the value in “InFocusTimeline” is the default value of 3038287259199220266, then “InFocusS” should have the default value of 707406378 as you should not have an InFocusTimeline entry if there was no InFocusS time. This has shown to be true in the reverse as well. I have checked across multiple data sets specifically for all the below-researched artifacts and this has held true in most cases, however, there are exceptions – review the “Interesting Findings” section below for details.
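A small helper capturing that pairing rule might look like the following. This is a sketch based on the observations above, not documented Microsoft behavior, and it does not account for the exceptions noted:

```python
# Default ("no entry") sentinels observed in the App Timeline Provider table.
TIMELINE_DEFAULT = 3038287259199220266
DURATION_DEFAULT = 707406378

def has_activity(timeline_value, duration_s):
    """Return True only when both paired columns have moved off their
    default values. Exceptions exist, so treat this as a first-pass check."""
    return timeline_value != TIMELINE_DEFAULT and duration_s != DURATION_DEFAULT

# A row still holding both defaults recorded no activity for that artifact.
```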


AudioOutS is a column displaying Audio Output in Seconds. The timer starts when Windows receives output and stops when the output stops. This can be shown visually by right-clicking on the volume in the Windows 10 taskbar and clicking “Open Volume Mixer”. I have defined “receiving output” as the program has tried to send audio to an output device. It does not matter what the output device is – tested with wired speakers, display speakers, and wireless headphones. To try to break it down, I have found three states for audio output.

State | Activity | Incrementing in AudioOutS
Default | Audio is sent from an application to Windows; Windows sends it to an output device. | True
System Muted | Audio is sent from an application to Windows; Windows is muted and does not send it to an output device. | True, since Windows is still receiving the output
Application Muted | Audio is “playing” but muted within the application; output is never sent to Windows nor the output device. | False

A test of playing a video in multiple browsers showed that audio does not increment in this column if the audio is muted within the website or using a web browser’s “tab mute” feature. This is because the browser is not sending any audio to the speakers – I have defined this as Application Muted. However, it would increment if only the system’s output is muted (ex: speakers are muted in Windows taskbar or through system sounds/mixer) – System Muted. In this case, the browser is still sending audio to the speakers, but they are muted by the system.  

I did run into some audio-specific “Interesting Findings” that I will include here too. When audio input is also available for an application session, the AudioInS and true audio out time are added together, and that combined value is what is written to AudioOutS. To get the true audio out time, simply subtract any AudioInS time from the AudioOutS value. This was true whether AudioInS was written to the table as expected or not. This may mean that if the time in AudioOutS exceeds the expected duration of the application, AudioInS time was likely included but not properly written. Additional testing on a per-application basis would be needed.
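That subtraction can be sketched as follows. The sentinel handling is my own assumption, reusing the default duration value discussed earlier in the post:

```python
DURATION_DEFAULT = 707406378  # "no entry" sentinel for duration columns

def true_audio_out(audio_out_s, audio_in_s):
    """Recover true output time when AudioOutS holds input + output combined."""
    if audio_in_s == DURATION_DEFAULT:  # no input recorded for this session
        return audio_out_s
    return audio_out_s - audio_in_s

# Chrome example from "Interesting Findings" below: 49 s in AudioOutS and
# 23 s in AudioInS yields ~26 s of true audio output.
out_only = true_audio_out(49, 23)  # -> 26
```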

Scenario: This data point would have some number of seconds displayed for a virtual meeting application if a meeting were attended. The number of seconds should be more than whatever is in AudioInS if the user activated their microphone in the application, even if they did not speak. 

Summary: This column would be useful in showing that audio was played and sent to the speakers but does not prove that the user heard it from the speakers. It also can be used as a building block for AudioInS. 


AudioInS is a column displaying Audio Input in Seconds. I have defined this as the program was accessing some form of audio input – a microphone or external audio mixer are possible examples. It does not necessarily mean that someone was audibly speaking into their microphone for the entire duration. For example, if I join a Zoom meeting using computer audio, even if I never speak, it will log to AudioInS. However, if I never allow Zoom to access the microphone or do not have one, it will not log.

This one is less consistently logged on a per application basis. For example, Firefox does not appear to log AudioInS, nor do Pale Moon or Tor – Firefox forks. However, the AudioOutS was still accurate with the visual audio input time plus visual audio output time. This would be difficult to “prove” on the backend when just viewing the logs, but it shows that in some applications, AudioInS displaying default values does not necessarily mean there was no audio in. Additional testing on a per application basis would be needed for validation. 

Scenario: This data point would likely have several seconds displayed in a virtual meeting application if a meeting were attended and the user allowed the microphone to be accessed by the application. Again, it would not prove the user spoke in the meeting but that they would have the ability to use their computer to do so. A default value here would not necessarily disprove participation, only that the application may not log it or the user called in. 

Summary: This column would be useful in showing that the microphone/audio input was accessed by an application but not all applications log it as expected.  


InFocusS is a column displaying the time that a window is “In Focus.” I have defined “In Focus” to be “highlighted on the Windows taskbar” and likely visible as the foreground window on the screen. For example, if two applications are open at the same time, whichever one is highlighted on the taskbar is the “In Focus” application and increments seconds to this data point, while the background one does not. This still works if the applications are split 50/50 on screen, as one of them is still “in focus” as far as the taskbar recognizes/highlights; however, both are still clearly visible.

This data point saw more variation when comparing the visual on-screen time notes and the logs when more complicated scenarios were tested. I think this is because I consistently recorded from the time the application appeared in the taskbar or the window opened on-screen, whichever was first. However, it may have been counting from either the click to open the application or when the process started, which I could not consistently visually see in my tests. Many longer or more complicated tests were up to 5 seconds off from on-screen time and on rare occasions, much farther (60 seconds in two tests out of 11 multi-window examples). 

Scenario: In this data point, I would expect to find InFocusS time for both the spreadsheet application and the virtual meeting application. It would show that the user at least opened a spreadsheet and did have the meeting software open.  

Summary: This data would be useful in showing that the application in question was opened and visible to the user for the provided number of seconds.  


KeyboardInputS is a column displaying Keyboard Input in Seconds. This starts and stops with keyboard input. For example, the time it takes for me to type this is being recorded to the SRUM database. The timer stops incrementing within one second of when I stop typing and starts again when I start. Based on some browser tests, it appears this can also be triggered for a second of input just by clicking into a text input box in a program – for example, clicking into the address bar in a browser but not typing. Also, it does not matter if the keyboard input is displayed on-screen or not. For example, a version of Solitaire tested does not use keyboard input, but I typed in the window and it still recorded that input time. Similarly, video games use the keyboard but do not “display” keystrokes on screen; that data is still recorded here.

Scenario: I would expect to find some duration of keyboard input in the spreadsheet software if that machine were used to edit a spreadsheet. This data point may also have a non-default value in the virtual meeting software if a text chat were used.  

Summary: This data would be useful in showing that an application received keyboard input, even if it does not take or display traditional typing input.  


MouseInputS is a column displaying Mouse Input in Seconds. This starts and stops when the mouse starts/stops moving within the frame of the window that is in focus on the taskbar as defined above. The more complicated the test, the more difficult tracking this became as start/stop time is hard to gauge visually.

As an example, I open Notepad and immediately click into File Explorer. File Explorer is now in focus and would start to receive input. I then take my mouse outside of the File Explorer window, stopping File Explorer from receiving input, and move it over the background Notepad instance. Even though File Explorer is in focus, it does not have the mouse over its window, so it is not incrementing Mouse Input. And even though the mouse is moving over the Notepad window, it is not in focus, so it is not incrementing Mouse Input. 

Scenario: This data may be important to show that the user was interacting with this application while it was on screen. For example, if the spreadsheet application was opened and the mouse input is at 60, it is reasonable to assume a lot of mouse movement (a whole minute’s worth) occurred during the session and the user certainly knew they had opened a spreadsheet. 

Summary: This data can validate that an application was in focus and that the user was interacting with the mouse over that application. 


Generally speaking, if UserInputS has an entry, it contains a combination of the time in MouseInputS and KeyboardInputS. Since this data point is true to the activity and not just adding the column data, UserInputS could potentially be a larger number than the sum of Mouse and Keyboard if mouse and keyboard input were occurring at the same time. With some system applications or duplicate processes (for example, the duplicated rows mentioned in “Interesting Findings” below), there can be default values in the keyboard/mouse columns but true activity values in the duplicate columns. It also appears that in some system processes, there may be supporting processes that are contributing to those data points that I am less familiar with. 
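One plausible model for this behavior – an assumption on my part, not confirmed SRUM internals – is that UserInputS tracks the union of activity intervals rather than simply adding the per-column totals, which is why it need not equal MouseInputS plus KeyboardInputS:

```python
def union_seconds(intervals):
    """Seconds covered by any (start, end) activity interval, counting
    overlapping mouse/keyboard time once rather than twice. The interval
    representation is hypothetical, for illustration only."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend overlapping span
        else:
            merged.append([start, end])  # start a new span
    return sum(e - s for s, e in merged)

# Mouse active 0-10 s and keyboard active 5-12 s: the columns would read
# 10 and 7, but the combined activity only spans 12 unique seconds.
combined = union_seconds([(0, 10), (5, 12)])  # -> 12
```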

Scenario: This data can be used to validate findings that the user provided input to the session. I would default to the specific type of input rather than using this column. 

Summary: From my research at this time, I found this to be better as a validation of other findings than an independently useful metric. 

Interesting Findings 

The first interesting finding mentioned above was that it appears some applications have their input column data divided into two or more rows for the same session. For example, for browser data, one row may contain audio data (in/out) and UserInputS and one row contains InFocusS, KeyboardInputS, and MouseInputS. In my research thus far, adding the rows’ duration data together has created a full data point that is accurate to the expected data based on the visual logs. I have found that the duplicate rows are written to the database at the same time and some row pairs will also have the same duration or span columns and EndTime data (available but unresearched columns in App Timeline Provider).  
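A sketch of how those split rows might be recombined is below. The grouping key is my assumption, based on the observation that paired rows share a write time; the column names "AppId", "UserId", and "WriteTime" are placeholders for whatever identifiers your parser exposes:

```python
from collections import defaultdict

DURATION_DEFAULT = 707406378  # "no entry" sentinel for duration columns

def combine_split_rows(rows, duration_cols):
    """Sum non-default duration values across rows that appear to belong to
    the same session (same app, user, and database write time)."""
    sessions = defaultdict(lambda: {col: 0 for col in duration_cols})
    for row in rows:
        key = (row["AppId"], row["UserId"], row["WriteTime"])
        for col in duration_cols:
            value = row.get(col, DURATION_DEFAULT)
            if value != DURATION_DEFAULT:  # skip "no entry" sentinels
                sessions[key][col] += value
    return dict(sessions)

# Two rows for one browser session - one carrying audio data, one carrying
# in-focus data - collapse into a single, complete data point.
```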

Further, specific to browsers, it does not appear to matter if the browser window is “incognito” or “private,” the data is still recorded the same as a normal browsing session. I tested this in Vivaldi, OperaGX, Edge, Firefox, Chrome, and Yandex. I believe this is because it is the operating system storing this data, not the browser. This held true of running Tor as well – though since it is Firefox-based, it did not log AudioInS (did not even allow it, nor did Pale Moon) but KeyboardInputS, MouseInputS, InFocusS, and AudioOutS all had expected data logged to the App Timeline Provider.  

Next, the audio columns were challenging. It did not help that I somehow repeatedly created almost exactly duplicate AudioInS and AudioOutS in my tests, so it took a long time to realize the values were combined in AudioOutS when both existed – an exceedingly long time. However, once I did, it was interesting to see how the different applications handled it. For example, in the final test, Chrome logged 23 seconds of AudioInS and 49 seconds of AudioOutS. Compare this to the visual data of audio input at ~22.3 seconds and audio output at ~26.7. The sum of the visual audio input and audio output is 49 seconds. On the other hand, Firefox does not log AudioInS by default but logged 205 seconds to AudioOut. Compare this to the visual data of audio input at ~40.4 seconds and audio output at ~160 seconds. This leaves the sum of the two visual data points at ~200.4 seconds. This is longer than the “DurationMS” time of 179.9 seconds and visual on-screen time of ~170.8 seconds. If I found that anomalous data in the logs on a case, I may want to dig into browser history and see where Firefox may have been receiving audio input. (Note: I have not shared my findings on DurationMS at this time because it also has its own other anomalies.)

Last, some other Windows processes do weird things that I do not fully understand but intend to continue to research. While Windows is logging the data, I simply do not understand all of them. As a result, some of these system processes do break the rules above but, as far as my user activity examples/tests are concerned, this would not affect my findings. Also, I would only recommend findings from applications that have been validated through app-specific testing. Many of the odd findings are limited to system processes so if a filter is applied to include only the suspect usernames and their activity, a lot of the anomalous data is removed. 
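That filtering step can be sketched roughly as follows, assuming the parser output has been exported to CSV. The "UserName" column name is a placeholder assumption; match it to your tool's actual output:

```python
import csv
import io

def rows_for_users(csv_text, users, user_column="UserName"):
    """Keep only rows attributed to the given accounts, dropping the noisy
    system-process rows. The column name is a placeholder assumption."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get(user_column) in users]

# Hypothetical two-row sample: the SYSTEM row is filtered out.
sample = (
    "UserName,ExeInfo,InFocusS\n"
    "DESKTOP\\alice,excel.exe,120\n"
    "NT AUTHORITY\\SYSTEM,svchost.exe,0\n"
)
suspect_rows = rows_for_users(sample, {"DESKTOP\\alice"})
```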


SRUM Data and Visual Data Sample

At this point, I hope I have shown exactly where you would look for data to show what application was visible on screen, what was interacted with using the mouse and keyboard, and what may have sent audio output or accepted audio input – SRUM’s App Timeline Provider. While I cannot guarantee that all data from all applications is logged as expected to this database, I can say that in my tests, if there was data other than the default value, then something happened in that application regarding that artifact. If there is a “1” in an [Artifact]S column, 1 second of something was reported to have happened there. As with any artifact, it is recommended to validate the findings for that specific application since, as shown in my tests, some do things that are different from the default, or expected, logging. 

Scenario Summary: In our spreadsheet and meeting scenario, I would expect to find non-default data in KeyboardInputS, MouseInputS, UserInputS, and InFocusS for the spreadsheet and AudioInS, AudioOutS, MouseInputS, UserInputS, InFocusS, and possibly KeyboardInputS for the virtual meeting. This data would not prove that the spreadsheet was edited and saved by the user but it would show that the user interacted with the application around the same time and typed into the application to then be combined with other artifact findings. This data would also not prove the virtual meeting was malicious in nature, however, it would show that the user is either incredibly forgetful or not being truthful about attending a virtual meeting during that time. 

Data Review  

I am also making the final set of test data available for review in spreadsheet form on Google Sheets, along with the above visual test data. It is easier to do it this way, as there is a lot of data and it would add “pages” to this write-up. In this test, I was specifically looking at the above artifacts except for MouseInputS, which I skipped only because it is very time-consuming to visually log. If interested, I do have other shorter tests that validate the MouseInputS findings.

The test data available here contains the following: SrumECmd output was copied to columns A-L, an empty column was left in column M, and the raw ESE database output was copied in columns N-BE.

This data is representative of the following on-screen activity: In a fresh virtual machine, on May 14, 2022, I installed Firefox, Yandex, Edge, Chrome, Vivaldi, Pale Moon, and OperaGX. On May 15, 2022, I began recording and clipped out the relevant portion in the above video. I ran each of the installed browsers to play a portion of a YouTube video and do a microphone input test using a website that had been validated as logging appropriately in previous tests. This session includes keyboard input into the address bar of each browser, audio out from YouTube, and audio in for the mic test. “In Focus” time was tested with each browser and by running two browsers at the same time doing the same tests – specifically, Firefox & Yandex and Vivaldi & Chrome. This is a slightly more realistic set of test data that shows how the data appears on a normal machine. If you watch the YouTube video above, you will see that I have both the Mixer and Sound settings up on the right side of the screen; this is because I was not capturing the audio in/out within the VM, as I suspected this would introduce new anomalies in the data. These two applications appear on the spreadsheet as “SndVol.exe” and “rundll32.exe” if you are curious.

The spreadsheet was then filtered down and sorted to display the relevant data in a clean and simple format. However, if you are interested in the full data, making a copy within Google Sheets or downloading a copy should allow you to remove my “protection” from the sheets and do as you wish.

Future Research 

Each of the above-mentioned artifacts has an “[Artifact]Timeline” entry that I currently have hypotheses for but no completed research. I expect that there is a 0 time that increments up but I am not sure when that 0 time begins – hypotheses include CPU start time, first process start time, or OS start time. It does appear that separate applications must operate on different timelines, and I believe this may involve interaction with another separate table in the SRUM database. 

I suspect that some system processes that are providing anomalous data to these findings are likely supporting other processes, similar to the duplicate rows discussed in “Interesting Findings.”

I am also interested in the EndTime, DurationMS, and SpanMS in this table and how those columns interact. In theory, one would think you could subtract the DurationMS from the EndTime and find the start time. I have found that some of the duration/span data is accurate to on-screen data. However, on just as many occasions the data appeared rounded to approximately one-minute increments or completely broken. At this time, I am not sure how this is determined. I suspect it has to do with the write time of the SRUM database overlapping with application duration. This was briefly mentioned at the end of “Interesting Findings” as well.
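When a row's duration data does look trustworthy, the start-time arithmetic above is straightforward. This sketch assumes EndTime has already been decoded to a datetime by your parser:

```python
from datetime import datetime, timedelta

def estimated_start(end_time, duration_ms):
    """Estimate session start as EndTime minus DurationMS; treat the result
    as approximate given the rounded/broken duration values noted above."""
    return end_time - timedelta(milliseconds=duration_ms)

# A ~180 s session ending at 14:30:00 would have started near 14:27:00.
end = datetime(2022, 5, 15, 14, 30, 0)
start = estimated_start(end, 179_900)
```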

I have also started some very preliminary research on DiskRaw, NetworkTailRaw, NetworkBytesRaw, and DisplayRequiredS columns. However, each has proven to require more time than I am able to commit right now.  

Final Notes

To knock out this research, for about two months, I watched myself click, wiggle the mouse, talk, and watch YouTube in a virtual machine at frame-by-frame speed and entered those timestamps into a spreadsheet. I considered downloading a keylogger to do this live, but the added time to test and validate the keylogger was not appealing. I would then copy all of this data to a combined spreadsheet with tool output and raw output and filter until I found things worth tracking. Once I found them, I spent an entire day testing those specific findings in a fresh virtual machine. Using a virtual machine also negated at least a full day’s worth of tests because the keyboard input to my personal machine invaded the virtual machine and inflated the data artificially. To its credit, the applications were still “in focus” and I probably should have known better at that point. I am hopeful that these findings will assist in further research and investigations. 

If anyone has questions about this research or suggestions for follow up, you are welcome to reach out to me on Twitter, LinkedIn, or even using the form on this website.  Thank you for reading!

Finally, a quick note to say thank you to Andrew Rathbun for reviewing this research along the way and assuring me that I was not wasting my time after the fifth or sixth mostly unhelpful, but not failed, test. This research spawned out of a conversation we had about a different table in the SRUM database which ended up above my skill set at the time. I do plan to revisit it, now that I have spent time staring into the void that is SRUM.
