What is malware analysis?

What is malware?

Malware is short for malicious software. It is basically any software that harms your computer by running code that you did not intend to run. It could delete sensitive files, abuse the CPU, memory or network, or do a combination of these.

There are many more examples of malicious software. Some of them may be a little odd, like the Sunburst (UNC2452) attack, in which the backdoor was built to stay quiet rather than to cause visible damage.

But most malware causes serious harm, and some malware is installed on your phone or computer with your own permission.

When you sign up to use a particular piece of software, you agree to some idea of the functions it is expected to perform for you, but it may also run ads or do other things you do not expect.

Malware does not spread by unknown means but by one you know quite well: email.

Email is the most popular means of spreading malware and infecting machines.

The way things work in the big bad world of the Internet does not involve a great deal of human thinking, since it is all done by programs running on various computers, phones and other connected devices.

The way malware spreads can be unpredictable in some cases, as when unpatched software with vulnerabilities runs on machines and the infection spreads from there. It can also be triggered by users clicking on email links, opening attachments or running a macro inside an Excel document.
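As a rough illustration of the attachment route, the sketch below parses a raw message with Python's standard `email` module and lists attachments whose extension is on a watch list. The extension list, the function name and the sample message are all made up for this example; a real mail filter would look at far more than the file name.

```python
from email import message_from_string
from pathlib import PurePosixPath

# Hypothetical watch list; an operator would tune this to their own policy.
RISKY_EXTENSIONS = {".exe", ".js", ".vbs", ".vbe", ".scr", ".xlsm", ".docm"}

def risky_attachments(raw_message: str) -> list[str]:
    """Return attachment filenames whose extension is on the watch list."""
    msg = message_from_string(raw_message)
    flagged = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename and PurePosixPath(filename.lower()).suffix in RISKY_EXTENSIONS:
            flagged.append(filename)
    return flagged

# A made-up message carrying a macro-enabled Excel workbook.
SAMPLE = """\
From: sender@example.com
To: you@example.com
Subject: Invoice
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

Please see the attached invoice.
--XYZ
Content-Type: application/vnd.ms-excel.sheet.macroEnabled.12
Content-Disposition: attachment; filename="invoice.xlsm"
Content-Transfer-Encoding: base64

UEsDBA==
--XYZ--
"""

print(risky_attachments(SAMPLE))  # ['invoice.xlsm']
```

Checking only the extension is easy to evade by renaming the file, which is one reason content-based scanning matters.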

Malware has always been part of the Internet landscape. Nobody really likes it or wants it, but just like E-mail spam it is a business that shows no sign of abating.

How does malware work?

Malware works by piggybacking on other delivery mechanisms: perfectly legitimate activities like email, web browsing or network management software.

Nowadays more and more mobile phones are getting connected. Computers and mobile phones run a variety of operating systems, so malware designed for one operating environment may not always get an opportunity to replicate in another.

But the crackers and attackers who write malware know how to make it spread, infect and run.

The motive behind the development of malicious software need not always be money. A disgruntled employee or a hostile government may use it to compromise an adversary's systems or simply to hurt someone.

Whether a piece of software is well behaved is a test that can quite easily be applied to label it as malicious or otherwise. Laws against malware may or may not exist, but it is better for software developers not to depend on laws and governments and to fight this menace on our own.

If software developers can create malware, then by the same logic software can also defeat it.

Lots of folks think malware means viruses and worms that spread very quickly and cause visible harm or annoyance. But if you look at the next section you will be surprised.

Malware authors show incredible ingenuity and creativity, and the categories of malware are correspondingly varied.

It is not very clear why malware gets written, and there seems to be infinite variety in the authors' motivations as well.

Malware categories

Here are the categories of malware that are popularly understood:

  • adware,
  • backdoor,
  • bot,
  • ddos,
  • dropper,
  • exploit-kit,
  • keylogger,
  • ransomware,
  • remote-access-trojan,
  • resource-exploitation,
  • rogue-security-software,
  • rootkit,
  • screen-capture,
  • spyware,
  • trojan,
  • virus,
  • worm

Despite this breathtaking variation, the most common way for malware to infect and spread has always been email. SMTP as a method for transporting arbitrary binary data, or links to binary data that can be run, has never lost its appeal.

Why?

This is a question to be addressed to malware writers. But suffice it to say that malware has always been around and will be for the foreseeable future.

It is in our own interest to learn how to identify malware, categorize it, report it to the authorities and act on it.

How does malware spread?

In this age of viral marketing, where things being shared, liked and consumed en masse is so important for the success of any company, business or idea, we must first understand where this idea of viral comes from.

A virus on the Internet is supposed to spread at a very rapid pace, so rapid that no human could make things happen so fast.

It is due to automation or the speed of software doing things 1000 to a million times quicker than humans.

Whilst we sleep and rest software keeps spreading mindlessly.

It is like fire in a forest of dead trees.

With plenty of fuel and flammable material, spreading is easy. In the same way, with computers, phones and high-speed Internet available even in the remotest corners of the world and even in outer space, trojans, trapdoors and exploits can easily find ways to replicate.

The first thing malware does after infecting a system is look for others it can infect within the environment.

This is similar to how missionary movements work: the minute there is a convert, the first thing he or she is taught is to spread the gospel to as many people as possible.

The ability of malware to replicate itself can sometimes be very educational about how smart the malware authors are, since they must know human and program behavior to foresee how they can create more copies and more infections.

Since software is by definition an ideal candidate for replication and duplication, this is not a problem for the malware as long as its binary can run and does not stay dormant.

The networked nature of all sorts of big and small devices with various computing capacity does not make things any easier either.

What is YARA?

YARA, whose name is explained as either "YARA: Another Recursive Acronym" or "Yet Another Ridiculous Acronym", is a rules engine created by Victor Alvarez, a Spanish developer, in 2013.

It came out of VirusTotal, and it is a phenomenal malware analysis toolkit and rule language. YARA identifies malware not by a static whole-file fingerprint, as was previously the norm, but by the textual and binary patterns that characterize a malware family.

Before 2013, when YARA first came about, the technique was to depend on unique file signatures, as relied upon by DCC (Distributed Checksum Clearinghouse), Vipul's Razor or the Team Cymru Malware Hash Registry.

But as attackers learned to alter their file fingerprints to defeat this, such methods started losing their value.

The method was to keep a database of known malicious file signatures, which users could query for a possible match.
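To see why a whole-file fingerprint is so easy to defeat, here is a minimal sketch of a hash lookup. The "sample" bytes and the in-memory set standing in for the signature database are invented for illustration; real registries are queried over the network.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Whole-file fingerprint, the same kind of value a hash registry stores."""
    return hashlib.md5(data).hexdigest()

# Stand-in for a real sample; the bytes are made up.
known_sample = b"pretend this is a malicious binary"

# Stand-in for the database of known malicious fingerprints.
known_bad_hashes = {md5_hex(known_sample)}

def is_known_bad(data: bytes) -> bool:
    return md5_hex(data) in known_bad_hashes

# The attacker appends a single padding byte; behavior is unchanged.
variant = known_sample + b"\x00"

print(is_known_bad(known_sample))  # True
print(is_known_bad(variant))       # False
```

One changed byte produces a completely different hash, so every trivial variant needs its own database entry.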

But YARA took a different approach to the whole thing.

Instead of looking for a fingerprint that can easily be altered, YARA looks for regular expressions, strings, binary patterns and other telltale signs of malware. Here are some sample YARA rules from GitHub.

import "hash"

rule Maldoc_APT10_MenuPass {
   meta:
      description = "Detects APT10 MenuPass Phishing"
      author = "Colin Cowie"
      reference = "https://www.fireeye.com/blog/threat-research/2018/09/apt10-targeting-japanese-corporations-using-updated-ttps.html"
      date = "2018-09-13"
   strings:
      $s1 = "C:\\ProgramData\\padre1.txt"
      $s2 = "C:\\ProgramData\\padre2.txt"
      $s3 = "C:\\ProgramData\\padre3.txt"
      $s5 = "C:\\ProgramData\\libcurl.txt"
      $s6 = "C:\\ProgramData\\3F2E3AB9"
   condition:
      any of them or
      hash.md5(0, filesize) == "4f83c01e8f7507d23c67ab085bf79e97" or
      hash.md5(0, filesize) == "f188936d2c8423cf064d6b8160769f21" or
      hash.md5(0, filesize) == "cca227f70a64e1e7fcf5bccdc6cc25dd"
}
rule FE_LEGALSTRIKE_MACRO {
       meta:version=".1"
       filetype="MACRO"
       author="Ian.Ahl@fireeye.com @TekDefense"
       date="2017-06-02"
       description="This rule is designed to identify macros with the specific encoding used in the sample 30f149479c02b741e897cdb9ecd22da7."
strings:
       // OBSFUCATION
       $ob1 = "ChrW(114) & ChrW(101) & ChrW(103) & ChrW(115) & ChrW(118) & ChrW(114) & ChrW(51) & ChrW(50) & ChrW(46) & ChrW(101)" ascii wide
       $ob2 = "ChrW(120) & ChrW(101) & ChrW(32) & ChrW(47) & ChrW(115) & ChrW(32) & ChrW(47) & ChrW(110) & ChrW(32) & ChrW(47)" ascii wide
       $ob3 = "ChrW(117) & ChrW(32) & ChrW(47) & ChrW(105) & ChrW(58) & ChrW(104) & ChrW(116) & ChrW(116) & ChrW(112) & ChrW(115)" ascii wide
       $ob4 = "ChrW(58) & ChrW(47) & ChrW(47) & ChrW(108) & ChrW(121) & ChrW(110) & ChrW(99) & ChrW(100) & ChrW(105) & ChrW(115)" ascii wide
       $ob5 = "ChrW(99) & ChrW(111) & ChrW(118) & ChrW(101) & ChrW(114) & ChrW(46) & ChrW(50) & ChrW(98) & ChrW(117) & ChrW(110)" ascii wide
       $ob6 = "ChrW(110) & ChrW(121) & ChrW(46) & ChrW(99) & ChrW(111) & ChrW(109) & ChrW(47) & ChrW(65) & ChrW(117) & ChrW(116)" ascii wide
       $ob7 = "ChrW(111) & ChrW(100) & ChrW(105) & ChrW(115) & ChrW(99) & ChrW(111) & ChrW(118) & ChrW(101) & ChrW(114) & ChrW(32)" ascii wide
       $ob8 = "ChrW(115) & ChrW(99) & ChrW(114) & ChrW(111) & ChrW(98) & ChrW(106) & ChrW(46) & ChrW(100) & ChrW(108) & ChrW(108)" ascii wide
       $obreg1 = /(\w{5}\s&\s){7}\w{5}/
       $obreg2 = /(Chrw\(\d{1,3}\)\s&\s){7}/
       // wscript
       $wsobj1 = "Set Obj = CreateObject(\"WScript.Shell\")" ascii wide
       $wsobj2 = "Obj.Run " ascii wide

condition:
        (
              (
                      (uint16(0) != 0x5A4D)
              )
              and
              (
                      all of ($wsobj*) and 3 of ($ob*)
                      or
                      all of ($wsobj*) and all of ($obreg*)
              )
       )
}
rule FE_LEGALSTRIKE_MACRO_2 {
       meta:version=".1"
       filetype="MACRO"
       author="Ian.Ahl@fireeye.com @TekDefense"
       date="2017-06-02"
       description="This rule was written to hit on specific variables and powershell command fragments as seen in the macro found in the XLSX file3a1dca21bfe72368f2dd46eb4d9b48c4."
strings:
       // Setting the environment
       $env1 = "Arch = Environ(\"PROCESSOR_ARCHITECTURE\")" ascii wide
       $env2 = "windir = Environ(\"windir\")" ascii wide
       $env3 = "windir + \"\\syswow64\\windowspowershell\\v1.0\\powershell.exe\"" ascii wide
       // powershell command fragments
       $ps1 = "-NoP" ascii wide
       $ps2 = "-NonI" ascii wide
       $ps3 = "-W Hidden" ascii wide
       $ps4 = "-Command" ascii wide
       $ps5 = "New-Object IO.StreamReader" ascii wide
       $ps6 = "IO.Compression.DeflateStream" ascii wide
       $ps7 = "IO.MemoryStream" ascii wide
       $ps8 = ",$([Convert]::FromBase64String" ascii wide
       $ps9 = "ReadToEnd();" ascii wide
       $psregex1 = /\W\w+\s+\s\".+\"/
condition:
       (
              (
                      (uint16(0) != 0x5A4D)
              )
              and
              (
                      all of ($env*) and 6 of ($ps*)
                      or
                      all of ($env*) and 4 of ($ps*) and all of ($psregex*)
              )
       )
}
rule FE_LEGALSTRIKE_RTF {
    meta:
        version=".1"
        filetype="MACRO"
        author="joshua.kim@FireEye.com"
        date="2017-06-02"
        description="Rtf Phishing Campaign leveraging the CVE 2017-0199 exploit, to point to the domain 2bunnyDOTcom"

    strings:
        $header = "{\\rt"

        $lnkinfo = "4c0069006e006b0049006e0066006f"

        $encoded1 = "4f4c45324c696e6b"
        $encoded2 = "52006f006f007400200045006e007400720079"
        $encoded3 = "4f0062006a0049006e0066006f"
        $encoded4 = "4f006c0065"

        $http1 = "68{"
        $http2 = "74{"
        $http3 = "07{"

        // 2bunny.com
        $domain1 = "32{\\"
        $domain2 = "62{\\"
        $domain3 = "75{\\"
        $domain4 = "6e{\\"
        $domain5 = "79{\\"
        $domain6 = "2e{\\"
        $domain7 = "63{\\"
        $domain8 = "6f{\\"
        $domain9 = "6d{\\"

        $datastore = "\\*\\datastore"

    condition:
        $header at 0 and all of them
}

Or


rule Contains_VBE_File : maldoc
{
    meta:
        author = "Didier Stevens (https://DidierStevens.com)"
        description = "Detect a VBE file inside a byte sequence"
        method = "Find string starting with #@~^ and ending with ^#~@"
    strings:
        $vbe = /#@~\^.+\^#~@/
    condition:
        $vbe
}

It is not at all hard to learn how to write YARA rules.
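To get a feel for what a simple rule does, the Contains_VBE_File rule above can be approximated with an ordinary regular expression. This is only a sketch in plain Python, not how the YARA engine is implemented, and the test inputs are invented.

```python
import re

# Same pattern as the $vbe string in the Contains_VBE_File rule.
VBE_PATTERN = re.compile(rb"#@~\^.+\^#~@")

def contains_vbe(data: bytes) -> bool:
    """Mimic the rule's condition: true if the $vbe pattern occurs anywhere."""
    return VBE_PATTERN.search(data) is not None

# Invented inputs: one with an encoded-script marker pair, one without.
print(contains_vbe(b"junk#@~^AAAA==encoded^#~@junk"))  # True
print(contains_vbe(b"just some plain text"))           # False
```

Unlike the hash approach, the match survives padding or other changes elsewhere in the file, as long as the marker pair itself is present.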

But YARA also has limitations. It can only guess at behavior based on the internal makeup of the malware executable.

To really glean details about the behavior we need a sandbox like Cuckoo Sandbox or Joe Sandbox.

The sandbox environments above, as well as websites like Malware zoo, help you analyze malware’s behavior without fear of spreading the infection.

It is networked computers and devices that let malware spread, and it is always risky to play with malware, hence the sandboxed environment.

In case you wish to get a taste of this process, Cuckoo Sandbox is open source and free. But getting it going is a lot of work.

What is MalwareBazaar?

The open source world on GitHub is full of YARA rule repositories and malware samples, and you also have sites like MalwareBazaar and commercial alternatives like VirusTotal, FireEye and so on.

I wish to emphasize the MalwareBazaar API service, since it is free even for commercial use and SpamCheetah uses it as well.
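A hash lookup against MalwareBazaar might look roughly like the sketch below. The endpoint URL, the `get_info` query name and the `Auth-Key` header are written from memory of the public abuse.ch documentation and should be treated as assumptions to verify there. The function name is hypothetical, and the hash is one of the MD5 values from the APT10 rule earlier in this post. The request is only built here, not sent, so the sketch runs offline.

```python
import urllib.parse
import urllib.request

# Assumed endpoint; check the current abuse.ch documentation before use.
API_URL = "https://mb-api.abuse.ch/api/v1/"

def build_hash_query(file_hash: str, auth_key: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request asking about one file hash."""
    body = urllib.parse.urlencode({"query": "get_info", "hash": file_hash}).encode()
    # The Auth-Key header is an assumption; leave it out if no key is given.
    headers = {"Auth-Key": auth_key} if auth_key else {}
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")

req = build_hash_query("4f83c01e8f7507d23c67ab085bf79e97")
print(req.get_method(), req.full_url)
print(req.data.decode())

# Sending is left commented out on purpose:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.read().decode())
```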

What is Malware zoo?

We already saw Malware zoo above, and there are many similar malware repositories where users and programmers can get a taste of how malware is written and what category of activity it falls under.

Malware needs to be categorized based on which software it runs on, and how it behaves. The categories we saw above all fit into this idea.

How to analyze it?

The analysis of malware is normally done using the techniques we saw in this blog. But that is not all. What does analysis of malware mean after it has done all its damage?

Preemptive strikes are not possible here anyway. But triage and analysis are vital too. As the attackers evolve, so do the big Internet names in cyber security. If you Google the Sunburst attack, you will find that plenty of top-notch companies had to work together for months to identify the malware and understand its behavior.

The threat posed by malware often looks innocuous, and sometimes it is not even something that can be readily understood, as in the case above.

But malware does leave a bad aftertaste, since none of us likes someone entering our house and stealing things from it.

Espionage is one of the goals, but the goals are as diverse as what the attacker can plan, execute and target.

Sometimes the targeting is based on geolocation, and location services are vital to thwart such an attack. Or the target could be a single business or government entity.

What is wrong with YARA rules?

As we saw above, YARA rules can be used to detect malware. But they cannot capture the dynamic behavior of the malware executable. For that we need to do a lot more than YARA alone.

Moreover, writing good YARA rules also takes some getting used to.
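Part of the learning curve is reading conditions such as `uint16(0) != 0x5A4D` combined with `all of ($wsobj*) and 3 of ($ob*)`, as in the FE_LEGALSTRIKE_MACRO rule above. The sketch below spells that logic out in plain Python under simplifying assumptions: the `$ob*` strings are shortened stand-ins, only ASCII matching is done (the rule also asks for `wide`), and input too short to hold a header is treated as no match.

```python
import struct

# Shortened stand-ins for the $wsobj* and $ob* strings of the rule.
WSOBJ = [b'Set Obj = CreateObject("WScript.Shell")', b"Obj.Run "]
OB = [
    b"ChrW(114) & ChrW(101)",
    b"ChrW(120) & ChrW(101)",
    b"ChrW(117) & ChrW(32)",
    b"ChrW(58) & ChrW(47)",
]

def uint16_le(data: bytes, offset: int = 0):
    """Read a little-endian 16-bit integer, or None if the data is too short."""
    if len(data) < offset + 2:
        return None
    return struct.unpack_from("<H", data, offset)[0]

def count_hits(data: bytes, patterns) -> int:
    """Count how many of the patterns occur at least once in the data."""
    return sum(1 for pattern in patterns if pattern in data)

def macro_condition(data: bytes) -> bool:
    header = uint16_le(data)
    # 0x5A4D is "MZ" read little-endian: the file must NOT be a PE executable.
    not_pe = header is not None and header != 0x5A4D
    all_wsobj = count_hits(data, WSOBJ) == len(WSOBJ)  # all of ($wsobj*)
    three_ob = count_hits(data, OB) >= 3               # 3 of ($ob*)
    return not_pe and all_wsobj and three_ob

# Invented macro text containing both wsobj strings and three ob strings.
macro = b'Set Obj = CreateObject("WScript.Shell")\nObj.Run cmd\n' + b" & ".join(OB[:3])
print(macro_condition(macro))          # True
print(macro_condition(b"MZ" + macro))  # False
```

The header check is what keeps the rule from firing on Windows executables that merely embed the same strings.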

What other way is there?

If the objective is to fight malware, then that is a moving target. With constantly shifting goalposts, the rules of the game must also keep changing, and in today’s world we have sandboxes, YARA rules and E-mail security software like SpamCheetah that can break the spread of spam, viruses, trojans, trapdoors and other forms of malware.

SpamCheetah comes out of the box with various malware scanning methods that look at attachments, the message body and URLs, the last of these through URI blocklists such as SURBL.
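A URI blocklist check works by pulling hostnames out of the message and asking a DNS zone whether they are listed. The sketch below is not SpamCheetah's implementation. It only shows the general idea, using `multi.surbl.org` as the lookup zone, a deliberately simple URL pattern and an invented message body. Real SURBL lookups are meant to use the registered domain rather than the full hostname, and the DNS query itself is left out so the example runs offline.

```python
import re
from urllib.parse import urlparse

# Deliberately simple URL pattern; real mail needs more careful parsing.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")
SURBL_ZONE = "multi.surbl.org"

def extract_hosts(body: str) -> list[str]:
    """Return the distinct hostnames of URLs found in a message body."""
    hosts = []
    for url in URL_PATTERN.findall(body):
        host = urlparse(url).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts

def surbl_query_names(body: str) -> list[str]:
    """Build the DNS names a blocklist lookup would resolve."""
    return [f"{host}.{SURBL_ZONE}" for host in extract_hosts(body)]

body = "Claim your prize at http://evil.example/win or https://evil.example/again"
print(surbl_query_names(body))  # ['evil.example.multi.surbl.org']
```

A name that resolves means the host is listed, while a failed lookup means it is not.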