Notes for Nov. 18, 2013
Based upon Chapter 8 of Essentials of Software Engineering, 3rd Ed.
cs561 SW Engineering

Design Metrics

Early complexity measures included the Halstead Complexity Metric
	Developed in the 1970's by Maurice Halstead

	n1 = # of distinct operators
	n2 = # of distinct operands
	N1 = sum of all occurrences of n1
	N2 = sum of all occurrences of n2

	Examples:
	"+" operator
	"IF" operator

	In "a + b", a and b are distinct operands

Two other measures exist for the Halstead Metric

	Program vocabulary or unique tokens

	n = n1 + n2

	Program length

	N = N1 + N2 (total occurrences)

	All unique tokes form the unique vocabulary (operands + operators)

	Notice that the program length is different from the number of lines of code

There are other measures, too

	Volume - V  = N * log2 (n)
	Potential Volume - V@ = (2 + n2@) * log2(2 + n2@)
	Program implementation level - L = V@/V
	Effort - E = V/L

Volume

	Volume is straightforward in computation (see formula above)

Potential Volume

	Represents the "most succinct" program

	f(x1, x2, x3, ..., x_z)

	n2@ = number of operands used by f, so

	n2@ = z

Program Implementation Level

	The program implementation level is how close we are to an ideal implementation

Effort

	This is an estimation of effort.

	This could refer to how hard a program is to implement or to understand

Halstead Metric Recap

	Notice that this metric measures logical complexity of a program, but not its structure or logic.

************************************

McCabe Cyclomatic Complexity

	Cyclomatic complexity - program quality is directly related to the complexity of the control flow
	or number of branches in the design or source.

	Calculate cyclomatic complexity from program or design via control flow diagram (flowchart)

	Cyclomatic complexity = E - N + 2p

	E - # of edges in graph
	N - # of nodes in graph
	p - # of connected components (often 1)

Cyclomatic complexity is also 
	# binary decisions + 1
	# closed regions + 1


	2 binary decisions + 1 = 3
	2 regions + 1 = 3
	7 edges - 6 nodes + 2*1 = 3

Important to understand structural complexity and code risk

	More risk --> more testing needed

	Per the Software Engineering Institute (SEI)

		Cyclomatic Complexity <= 10 implies low risk
		Cyclomatic Complexity >= 50 implies high risk

	This also refers to the maximum number of linearly independent tasks in the code.

***********************************************

Henry Kaufera Information Flow

	Structural metric for intermodular flow
	Measures information flow into and out of modules including:
		-Parameter passing
		-Global variable access
		-Inputs
		-Outputs

	Fan-in - # of calls to a module and # of global data variables accessed by a module
	Fan-out - # of modules called by a module and # of global data variables modified by a 
		module

	Structural complexity of a module p is Cp

	Cp = (fan-in * fan-out)^2

	Example:

		Module x	Module y	Module z
	Fan-in		3		5		1
	Fan-out	2		7		4

	Cx = (3*2)^2 = 36
	Cy = (5*7)^2 = 1175
	Cz = (1*4)^2 = 16

	Total Henry Kafura Complexity for x, y, and z is 36 + 16 + 1175 = 1227

Modified Henry Kafura Complexity

	Updated by Henry and Selig to include internal complexity called Cip

	HCp = Cip * (fan-in * fan-out)^2

	Cip is determined by a cyclomatic or Halstead metric.

	High number means a program is complex, but it does not necessarily refer to 
	understandability or quality

*************************************************

Higher level complexity measures

	Program level and inter program level complexity/interaction
		-Developed by Card and Glass

	These include
		-Structural Complexity
		-Data Complexity
		-System Complexity

	Structural complexity of module y is defined as 

		S_y = (fan-out_y)^2

	Data Complexity of module y is defined as

		D_y = P_y / (fan-out_y + 1)

	P_y - # of variables passed to and from module y

	System Complexity C_y is the sum of structural and data complexity

		C_y = S_y + D_y

	Notice that this measure is based primarily upon fan-out

********************************************

Program Slice and Data Slice Cohesion Measures

	data token - variable or constant
	slice - all statements affecting a variable of interest in a program, procedure, or method
	data slice - all data tokens in a slice that will affect a variable of interest
	glue tokens - data tokens that lie in more than one data slice
	superglue tokens - data tokens that lie in every data slice

	weak functional cohesion = # glue tokens / total # of data tokens

	strong functional cohesion = # superglue tokens / total # data tokens

Example:
	See figure 8.2 from the book.

	Could count occurrences of the variables in a program

	max(int [] arr, int size)
	{
	    int max = arr[0];
	    for(int i = 0; i < size; i++)
	        if(arr[i] > max) max = arr[i];
	    return max;
	}

	Notice that the for loop is a compound statement and contains four data tokens
	that all refer to the same data slice for max.

	Notice that all the data tokens refer only to the slice for max

	Thus we have high cohesion.

********************************************

Coupling Measurement

Fenton and Melton (1990) measure coupling as follows:

	Assign coupling types levels (e.g. 1 = data coupling (not as bad)... 5 = content coupling
	(worse) )
	
	Evaluate modules x and y by identifying the tightest coupling between x and y and store it
	as variable i

	Count the number of types of coupling between x and y and save it as n

	Pair-wise coupling between x and y is defined as C(x,y) = i + (n/(n+1))

	Given a SW system with n modules, x_1... x_n, the overall coupling, C(SW system), is
	defined as the median of the set of C(x_i, x_j) for all i, j between 1…n

********************************************

Object-Oriented Design Metrics

C-K metrics identified by Chidamber and Kemerer (1994) included:

	-Weighted Methods per Class (WMC)
		weighted sum of all methods in a class
		Assume there are n methods m_1 ... m_n
		WMC = sum(w_i), i = 1...n
		w_i is the weight for method i
		This weight could be as simple as assigning 1 for each method or could be more
		complex by using another metric such as cyclomatic complexity as the weight
	-Depth of Inheritance Tree (DIT)
		height of the inheritance tree

	-Number of Children (NoC)
		Number of children of a class (i.e. immediate subclasses)

	-Coupling between object classes (CBO)
		number of object classes to which another class is coupled

		higher values often increase complexity and cause programs to become more
		error-prone because increased coupling means there is increased interclass
		dependency

	-Response for a class (RFC)
		set of methods involved in a response to a message sent to an object of a class
		this is related to coupling and highly correlated to CBO and WMC
		Large RFC implies a highly error prone program

	-Lack of cohesion in methods (LCOM)
		count of the difference between the number of similar and dissimilar method pairs

		P - set of method pairs with non common instance variables
		Q - set of method pairs with common instance variables

		LCOM = #P - #Q, where # represents set size or cardinality

		High LCOM implies weak cohesion and possibly higher complexity within a class

********************************************

Law of Demeter

Objects should send messages to the following:

	-Object itself
	-Object's attributes (instance variables)
	-Parameters of methods in the object itself
	-Objects created by the methods of this object
	-Objects returned from a call to one of this object's methods
	-Any object in a collection of the items listed above