Efficient Static Analysis of Executables For Detecting Malicious Behaviors

Podobne

[ Pobierz całość w formacie PDF ]

the techniques described by Wilson in [37]. This technique is similar to the
Needleman-Wunsch Algorithm used in comparing sequences of DNA samples (See
Figure 18). In addition, this is also similar to the problem of finding the Longest
Common Sequence (LCS) as is described by Cormen et. al. in [38]. Additionally,
it has the same cost of O(nm), where n and m are the lengths of the two strings.
The technique can be conceptualized as loading the two signatures inside a matrix,
with one signature placed on the horizontal axis and the second signature placed on
the vertical axis, and then computing the score of the best path to each cell, as is
shown in Figure 19. Any matches are marked off with an (X). Beginning with the
uppermost left cell, the score of the best path to that cell is calculated by adding the
highest score to the top and left of the cell with a 1 if a match occurs and a 0 if
match does not occur. Once the scores have been calculated, the highest score is
where the best alignment occurs. The correct alignment is constructed by working
backwards from the maximum match and adding gaps or performing deletions.
When finished, the best alignment has been achieved. See Table 5. It is important
to note that even when the best alignment has been achieved, it is possible that the
Euclidean distance will be larger than when the vectors were not aligned.
9
Formal alignment is a precise quantitative scoring system for matches and gaps.
52
10
Figure 18. The Needleman-Wunsch Algorithm used in aligning DNA sequences.
1.1 1.2 1.3 1.4 1.5 1.6 1.1 1.2 1.3 1.4 1.5 1.6
2.1 2.1 0 0 0 0 0 0
1.1 (X) 1.1 1 0 0 0 0 0
1.2 (X) 1.2 0 2 1 1 1 1
1.3 (X) 1.3 0 1 3 2 2 2
1.4 (X) 1.4 0 1 2 4 3 3
1.5 (X) 1.5 0 1 2 3 5 4
Figure 19. The sequence alignment algorithm in action on the sequences from Table 5. Match
= +1, Mismatch = 0.
We can use the other classical similarity measures including cosine
measure, Pearson s Correlation, and the Jaccard coefficient, for greater accuracy
when comparing two signatures. Each of the similarity measures has advantages
and disadvantages. The cosine measure is given in Figure 20 and it measures the
cosine of the angle between the two vectors. The cosine measure does not depend
on the length of the vectors (i.e. signatures), meaning that signatures of different
10
This image taken from slides at http://www.maths.tcd.ie/~lily/pres2/sld007.htm
53
lengths, but similar composition will be treated as similar as opposed to radically
different.
S1S2
Cosine Measure =
S1 " S2
Figure 20. Cosine similarity measure.
The Jaccard coefficient measures the proportion of overlap (the shared
elements) between two vectors to the total number of elements shared between the
two vectors. The formula for the Jaccard coefficient is shown in Figure 21.
S1S2
Jaccard Coefficient =
S1 + S2 - S1S2
Figure 21. Jaccard Coefficient.
Pearson s Correlation measure is another popular similarity measure. It
measures the degree to which two variables (or vectors) are related. It reflects the
strength of linear relationship (i.e. how close the point are to forming a straight
line) between the two variables. It ranges from +1 to -1, with +1 meaning a perfect
positive linear relationship, and -1 means the opposite. Figure 22 has examples of
various Pearson s Correlations and linear relationships. The formula for Pearson s
Correlation is given in Figure 23. Pearson s Correlation measure has the advantage
of minimizing the effects of a single wildly different element if the rest of the
elements are similar.
54
11
Figure 22. Examples of Pearson's Correlations. (a) has a perfect positive linear relationship
(+1). (b) has a perfect negative linear relationship (-1). (c) has a strong, but not perfect
positive linear relationship. (d) has no linear relationship (0).
"S1"S2
S2 -
"S1
N
Pearson s Correlation = r =
2 2
�# �#�# �#
("S1) ("S2)
2 2
�# �#�# �#
- -
"S1 "S2
�# �#�# �#
N N
�# �#�# �#
Figure 23. Pearson's Correlation.
11
Image composed from images found at http://davidmlane.com/hyperstat/index.html
55
4 Chapter 4
4.1 Conclusion
Current virus signatures used by AV scanners are alarmingly weak as
demonstrated by [2, 3, and 21]. A need for stronger signatures based on different
information exists. One solution to this problem is to use Functional Flow
Signature (FFSig), which is a virus signature that would encompass a specific
sequence of API calls, instead of the small bit of binary code currently used for
signatures which could be easily modified.
In this thesis, we present and describe a methodology for efficiently and
statically analyzing a potentially malicious Windows executable, extracting specific
sequences of Win32 API calls made by the executable, storing them efficiently by
utilizing the method presented in [5], and then using several similarity measures to
compare known and unknown signatures. This forms the basis for FFSig. Our [ Pobierz całość w formacie PDF ]

zanotowane.pl

doc.pisz.pl

pdf.pisz.pl

cukierek.xlx.pl