Notes for Nov. 18, 2013 Based upon Chapter 8 of Essentials of Software Engineering, 3rd Ed. cs561 SW Engineering Design Metrics Early complexity measures included the Halstead Complexity Metric Developed in the 1970's by Maurice Halstead n1 = # of distinct operators n2 = # of distinct operands N1 = sum of all occurrences of n1 N2 = sum of all occurrences of n2 Examples: "+" operator "IF" operator In "a + b", a and b are distinct operands Two other measures exist for the Halstead Metric Program vocabulary or unique tokens n = n1 + n2 Program length N = N1 + N2 (total occurrences) All unique tokes form the unique vocabulary (operands + operators) Notice that the program length is different from the number of lines of code There are other measures, too Volume - V = N * log2 (n) Potential Volume - V@ = (2 + n2@) * log2(2 + n2@) Program implementation level - L = V@/V Effort - E = V/L Volume Volume is straightforward in computation (see formula above) Potential Volume Represents the "most succinct" program f(x1, x2, x3, ..., x_z) n2@ = number of operands used by f, so n2@ = z Program Implementation Level The program implementation level is how close we are to an ideal implementation Effort This is an estimation of effort. This could refer to how hard a program is to implement or to understand Halstead Metric Recap Notice that this metric measures logical complexity of a program, but not its structure or logic. ************************************ McCabe Cyclomatic Complexity Cyclomatic complexity - program quality is directly related to the complexity of the control flow or number of branches in the design or source. Calculate cyclomatic complexity from program or design via control flow diagram (flowchart) Cyclomatic complexity = E - N + 2p E - # of edges in graph N - # of nodes in graph p - # of connected components (often 1) Cyclomatic complexity is also # binary decisions + 1 # closed regions + 1 2 binary decisions + 1 = 3 2 regions + 1 = 3 7 edges - 6 nodes + 2*1 = 3 Important to understand structural complexity and code risk More risk --> more testing needed Per the Software Engineering Institute (SEI) Cyclomatic Complexity <= 10 implies low risk Cyclomatic Complexity >= 50 implies high risk This also refers to the maximum number of linearly independent tasks in the code. *********************************************** Henry Kaufera Information Flow Structural metric for intermodular flow Measures information flow into and out of modules including: -Parameter passing -Global variable access -Inputs -Outputs Fan-in - # of calls to a module and # of global data variables accessed by a module Fan-out - # of modules called by a module and # of global data variables modified by a module Structural complexity of a module p is Cp Cp = (fan-in * fan-out)^2 Example: Module x Module y Module z Fan-in 3 5 1 Fan-out 2 7 4 Cx = (3*2)^2 = 36 Cy = (5*7)^2 = 1175 Cz = (1*4)^2 = 16 Total Henry Kafura Complexity for x, y, and z is 36 + 16 + 1175 = 1227 Modified Henry Kafura Complexity Updated by Henry and Selig to include internal complexity called Cip HCp = Cip * (fan-in * fan-out)^2 Cip is determined by a cyclomatic or Halstead metric. High number means a program is complex, but it does not necessarily refer to understandability or quality ************************************************* Higher level complexity measures Program level and inter program level complexity/interaction -Developed by Card and Glass These include -Structural Complexity -Data Complexity -System Complexity Structural complexity of module y is defined as S_y = (fan-out_y)^2 Data Complexity of module y is defined as D_y = P_y / (fan-out_y + 1) P_y - # of variables passed to and from module y System Complexity C_y is the sum of structural and data complexity C_y = S_y + D_y Notice that this measure is based primarily upon fan-out ******************************************** Program Slice and Data Slice Cohesion Measures data token - variable or constant slice - all statements affecting a variable of interest in a program, procedure, or method data slice - all data tokens in a slice that will affect a variable of interest glue tokens - data tokens that lie in more than one data slice superglue tokens - data tokens that lie in every data slice weak functional cohesion = # glue tokens / total # of data tokens strong functional cohesion = # superglue tokens / total # data tokens Example: See figure 8.2 from the book. Could count occurrences of the variables in a program max(int [] arr, int size) { int max = arr[0]; for(int i = 0; i < size; i++) if(arr[i] > max) max = arr[i]; return max; } Notice that the for loop is a compound statement and contains four data tokens that all refer to the same data slice for max. Notice that all the data tokens refer only to the slice for max Thus we have high cohesion. ******************************************** Coupling Measurement Fenton and Melton (1990) measure coupling as follows: Assign coupling types levels (e.g. 1 = data coupling (not as bad)... 5 = content coupling (worse) ) Evaluate modules x and y by identifying the tightest coupling between x and y and store it as variable i Count the number of types of coupling between x and y and save it as n Pair-wise coupling between x and y is defined as C(x,y) = i + (n/(n+1)) Given a SW system with n modules, x_1... x_n, the overall coupling, C(SW system), is defined as the median of the set of C(x_i, x_j) for all i, j between 1…n ******************************************** Object-Oriented Design Metrics C-K metrics identified by Chidamber and Kemerer (1994) included: -Weighted Methods per Class (WMC) weighted sum of all methods in a class Assume there are n methods m_1 ... m_n WMC = sum(w_i), i = 1...n w_i is the weight for method i This weight could be as simple as assigning 1 for each method or could be more complex by using another metric such as cyclomatic complexity as the weight -Depth of Inheritance Tree (DIT) height of the inheritance tree -Number of Children (NoC) Number of children of a class (i.e. immediate subclasses) -Coupling between object classes (CBO) number of object classes to which another class is coupled higher values often increase complexity and cause programs to become more error-prone because increased coupling means there is increased interclass dependency -Response for a class (RFC) set of methods involved in a response to a message sent to an object of a class this is related to coupling and highly correlated to CBO and WMC Large RFC implies a highly error prone program -Lack of cohesion in methods (LCOM) count of the difference between the number of similar and dissimilar method pairs P - set of method pairs with non common instance variables Q - set of method pairs with common instance variables LCOM = #P - #Q, where # represents set size or cardinality High LCOM implies weak cohesion and possibly higher complexity within a class ******************************************** Law of Demeter Objects should send messages to the following: -Object itself -Object's attributes (instance variables) -Parameters of methods in the object itself -Objects created by the methods of this object -Objects returned from a call to one of this object's methods -Any object in a collection of the items listed above