Memory Leak – Microsoft.Reporting.WinForms.LocalReport

Several weeks ago I was asked to assist with a sluggish ASP.NET C# MVC application. This application is an OLTP system and processes several thousand orders a day. It was hosted on a 64-bit Windows Server 2012 R2 machine with 16 cores. After discussing it with the developer, I was told that the application performs acceptably for a while and then becomes sluggish (behavior that suggests some kind of leak). At times it becomes extremely slow, and rebooting the machine solves the problem. The problem showed up more frequently on days with above-average volumes.

After downloading Process Explorer from Sysinternals, I looked at the w3wp.exe processes (there were several, but two of them stood out). Process Explorer does not display handle information by default; I will probably blog about that separately. There are many articles out there that explain how to look for handles using Process Explorer, and they should get you started. There is also an excellent blog explaining the theoretical limit of handles. It is a theoretical limit because there are many other thresholds an application can hit that cause severe performance degradation long before the limit is reached.

I knew that we had a potential handle leak. There were also other issues within the application that were not related to the handle leak. The challenge was to isolate the handle leak from defects in the program logic, third-party hardware behaviors (like non-genuine printer cartridges) and various other user errors. Debugging a live system with several hundred users connected wasn't happening. We were able to get a 4.5 GB memory dump.
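Once the dump is open in WinDbg (more on that below), the !handle extension gives a quick sanity check on handle usage; a minimal sketch, with the understanding that the actual counts depend entirely on the dump:

0:000> !handle 0 0

The per-type count summary at the end of the output makes it easy to spot which handle type is running away.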

When I moved the memory dump to a machine where I could analyze it, I found that the production server was running an older version of the .NET Framework, so the Microsoft symbol server wasn't much help. I ended up copying the complete Framework64 folder from the server to the local machine (into a separate folder, not the local Microsoft.NET\Framework folder) and loaded SOS from it using the .load command.

[Screenshot: loading SOS with .load]
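For reference, loading SOS from the copied folder looked roughly like this; the paths below are made-up examples, not the actual ones from this engagement:

0:000> .load D:\Dumps\Framework64\v4.0.30319\SOS.dll
0:000> .cordll -ve -u -lp D:\Dumps\Framework64\v4.0.30319

The second command is only needed if the debugger cannot find a matching DAC (mscordacwks) on its own and has to be pointed at the copied folder.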
The .time command shows the overall CPU time consumed by the process alongside the system uptime; the numbers made it clear that this wasn't CPU contention.

[Screenshot: .time output]
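The commands themselves are trivial:

0:000> .time
0:000> !runaway

.time shows the process uptime next to the kernel and user CPU time. !runaway is a companion command I usually run alongside it; it breaks the user-mode CPU time down per thread, which is handy when CPU is the suspect.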

The !dumpheap command displays the managed heap. With -stat, the output contains the method table, object instance count, total memory allocated and the type, from left to right. !dumpheap -stat reported about 164k lines of output and 9.7 million object references. Many of these objects pointed to dynamically created report expressions.

[Screenshot: !dumpheap -stat output]
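The command is simply:

0:000> !dumpheap -stat

With output this large (~164k lines), it is worth wrapping the command in a log file (see the logging section further down). If you already suspect a type, -type narrows the output to a substring match on the type name, e.g. !dumpheap -stat -type ReportExpression; the type name here is just an illustration.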

Typically, the next logical step is to use !dumpheap -mt <<method table address>> to display all the individual object instances and then review the individual objects to understand them better.

[Screenshot: !dumpheap output]
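With a method table address taken from the -stat output, that looks like the following; the addresses are illustrative, not from the actual dump:

0:000> !dumpheap -mt 6f2a4b10
0:000> !do 02a3c4d8

!dumpheap -mt lists every instance of that type with its address and size, and !do (DumpObj) expands an individual instance so you can inspect its fields.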

Looking at the loaded modules using the !eeheap -loader command showed that the worker process had 2622 AppDomains (with a loader heap size of 537 MB). Dumping one of the modules using the !DumpModule command pointed to a dynamically loaded expression library associated with SQL Server Reporting Services.

[Screenshots: !eeheap -loader output]
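Roughly, the sequence was as follows (the module address is illustrative):

0:000> !eeheap -loader
0:000> !DumpModule 1ca30b4c

!dumpdomain is another useful command at this stage; it lists every AppDomain in the process along with the assemblies loaded into each, which makes 2622 AppDomains hard to miss.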

At this point, I knew that the issue might be related to the dynamically loaded expression assembly and its associated AppDomain that failed to unload. What was keeping the AppDomain from unloading? The dynamic expression library is used by the LocalReport object, which is part of the Microsoft.Reporting.WinForms namespace.

Further research showed that creating a new AppDomain is by design. The developers at Microsoft had identified an issue with the dynamic expression library being loaded into the default AppDomain and using up more memory each time a LocalReport object was created. An assembly, once loaded into an AppDomain, can never be unloaded. Hence the dynamic assembly is loaded into a new AppDomain, and if all goes well, that AppDomain can be safely unloaded. As per the documentation, calling LocalReport.ReleaseSandboxAppDomain() should unload the AppDomain. Unfortunately, the unload never happens because there is a problem with the internal implementation of the Dispose method.

Using a decompiler (JetBrains dotPeek), I was able to identify the event delegate.

[Screenshot: the event delegate]

Looking at the implementation, the handlers are not removed in the Dispose method, which in turn keeps the AppDomain from unloading. I hope this gets resolved in a future release. For now, we are recycling the application pool more frequently to work around the memory leak, and the developers are looking at an alternate solution to print the labels.

[Screenshot: the Dispose implementation]
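The same conclusion can usually be confirmed from the dump side by rooting one of the leaked objects; a sketch, with an illustrative address:

0:000> !gcroot 0325f7e4

In a case like this you would expect the root chain to run through the event handler that Dispose never detaches.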

One of the questions I was asked while presenting my findings to the project manager was why this suddenly became a problem, since the application had been in production for over two years. The issue had been there from the very beginning. The out-of-the-box (OOTB) application pool recycle settings were causing the host process to shut down after a period of inactivity (this app was used only between 8am and 5pm). So the process had enough time to shut down after hours, and the problem never surfaced until the transaction volume increased, at which point the application started hitting thresholds.

It always helps to set baselines. Keep tabs on system configuration, transaction volume and major system change events. This will greatly shorten the time it takes to troubleshoot performance issues.

 

 

WinDbg – Welcome to Logging

Performing dump analysis is oftentimes challenging due to the nature of the beast: the UI, or the lack thereof :). Some commands are extremely time consuming to re-run, and, like me, you may just hate scrolling. Besides, there is only so much output you can review by scrolling.

Logging can be enabled before running commands that produce tons of output. We can also create a separate log file for each command by opening a new log file before executing it. From there, we can open the log files in an external editor like Notepad++ and work with WinDbg on another monitor.

To enable logging, go to the Edit menu -> Open/Close Log File…

[Screenshot: Edit menu, Open/Close Log File]

Provide the log file name and click OK, and you are all done. Check the Append checkbox to append to an existing file; otherwise the file gets overwritten.

[Screenshot: log file dialog]

After I am done capturing the logs that I need, I come back to the same dialog and use the Close Open Log File button to close the file. This disables logging while I take the file offline for analysis.
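The same thing can be scripted from the command window with .logopen and .logclose, which is convenient when you want one file per command; the file name below is just an example:

0:000> .logopen C:\Dumps\logs\dumpheap-stat.log
0:000> !dumpheap -stat
0:000> .logclose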

The output can also be copied to Excel and formatted to quickly sift through.

.foreach WinDbg

Executing a command repeatedly over a list of values is one of the coolest features in WinDbg. In the example below, I used one method table from the output of a !FinalizeQueue command. !FinalizeQueue, similar to !DumpHeap, provides the method table, object count, size allocated and the type.

  • Here 6dcf1230 is one of the method tables that I was interested in. I wanted to see if any objects of this type were rooted.
  • myvar is a variable that holds each address returned by the !DumpHeap command; -short makes !DumpHeap return only the addresses. Maybe in another article I will cover how to process the tokens.
  • The .echo command needs no introduction; in this context it is used twice, to display the address and to separate the output.
.foreach (myvar {!DumpHeap /d -mt 6dcf1230 -short}) {.echo myvar;!gcroot -all myvar;.echo **************;}

 

Fun with WinDbg

I was looking into a memory leak using a 32-bit w3wp memory dump. Looking at the !FinalizeQueue output, I saw System.Data.DataTableCollection objects in Gen2. Acting on a hunch, I dumped the method table.

[Screenshot]
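Dumping the method table follows the same !dumpheap -mt pattern described earlier; the method table address below is illustrative:

0:000> !dumpheap -mt 6e1b3c88

This lists the address of every DataTableCollection instance on the heap.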

I dumped a couple of the addresses listed above.

[Screenshot]

I looked at the dataSet and _list fields, which brought me to the _items array below.

[Screenshot]
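Walking the object graph is just repeated !do calls, feeding the value of the field you care about into the next command; the addresses below are illustrative:

0:000> !do 021f5a30
0:000> !do 021f5a6c

The first dumps the DataTableCollection (whose _list field is an ArrayList), and the second dumps that ArrayList, whose _items field points to the underlying array.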

The list of items is actually an Array.

[Screenshot]

I dumped the array and reviewed one of the array elements, as shown below.

[Screenshot]
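The array itself can be expanded with !DumpArray (alias !da), and each element address fed back into !do; the addresses are illustrative:

0:000> !DumpArray 021f5a88
0:000> !do 021f5b10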

Viewing the value for tablename showed the string below. This is one of the objects used to retain Trace data.

[Screenshot]

I am not posting the rest of the table structure, but if you are interested, you are welcome to follow the steps. Trace.axd was still active, and I ended up disabling it. Now back to looking into the actual issue.

Defrag Tools – Debugging User Mode Crash Dumps Redux

By: Andrew Richards, Chad Beeder

https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-167-Debugging-User-Mode-Crash-Dumps-Redux/player

Recover Unencrypted Password using WinDbg

An interesting challenge at work today: a colleague of mine wanted to recover a password saved as part of an application's configuration. The administrator who had set up this server was no longer with the team!

I tried a couple of options, but I am only listing the one that worked.

[Screenshot: Cognos password screen]

I launched the application and left the particular window with the password prompt open. This configuration utility loads each section dynamically.

After the configuration window was loaded and visible, the next step was to take a FULL memory dump (using Process Explorer).

Using WinDbg, I searched for the username as shown below. The command does an ASCII search from the base address to the maximum address (32-bit). Refer to the documentation for the 64-bit syntax.

s -a 0 L?80000000 "Your Text"

[Screenshot: ASCII string search results]

Press ALT+5 or go to the View | Memory menu.

[Screenshot: View | Memory]

I typed in the address displayed in the ASCII search above. I was lucky enough to stumble upon a section of memory that had the unencrypted configuration, with the username and password right next to each other. The security risk this might have posed is another topic.

[Screenshot: password in memory]
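If you prefer to stay in the command window instead of the Memory view, the same bytes can be printed inline; da dumps memory as ASCII and du as Unicode (the address below is illustrative):

0:000> da 05c3f2a0
0:000> du 05c3f2a0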

I also noticed that the other database credentials were not part of this memory dump. I had to redo these steps from the beginning to recover the other passwords. There were numerous other passwords that had to be recovered, but that is for another post.

 

Debugging Cognos Transformer (Cogtr.exe) Crash using WinDbg

Cogtr.exe is the Cognos Transformer executable, which can execute in batch mode (without a UI). Once in a while this process used to hang. It seemed random for the most part, but I did notice that the likelihood of it happening increased when multiple cubes were being refreshed in parallel. Since that pointed to some shared resource, I used Process Monitor and Process Explorer to see if there were any common locks or files being shared; Cogtr does use the file system to denote locks for the project file. There wasn't anything that I could correlate. When this process hangs, the scheduled refresh of that particular cube fails, prompting end users to escalate. Ending the task seemed to get things back to normal.

After taking a full memory dump of a hung process, I used WinDbg to open the dump file.

The following commands set the symbol path to the default Microsoft symbol server and reload the symbols.

0:000> .symfix
0:000> .reload

The next step is to get the stack trace of the thread where the process was stuck. Bang analyze (!analyze) does just that; -v is for verbose mode.

0:000> !analyze -v

[Screenshot: !analyze -v output]

Starting with the stack trace, reading it bottom up and ignoring the missing frames (since I did not have symbol files for cogtr.exe), the "0027e3e4 68faa900 004ff028 03634f50 00000030 mfc100!CWinApp::ShowAppMessageBox+0x120" frame caught my attention. Cogtr was supposed to be running in batch mode; a message box wasn't supposed to be displayed without a UI. This could be the problem. The next step was to get the IBM engineers engaged. We were then able to identify that each cube needed its own preference file. More on the -f option here.

[Screenshot]
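For hangs like this I also like to walk the stacks by hand rather than rely only on !analyze; a minimal sketch (the thread number is illustrative):

0:000> ~*kb
0:000> ~0s
0:000> kv

~*kb dumps the stack of every thread, ~0s switches to the thread of interest, and kv shows that thread's stack along with frame data.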

After configuring each cube refresh with its own preference file, the issue was resolved.