# Performance Index Calculation

The Software Development Performance Index (SDPI) framework codifies a balanced set of outcome measures that, when used within CA Agile Central® Unlimited Edition, can give you feedback on your own teams and organization. This document explains the SDPI and how these metrics are calculated. To learn more, visit www.rallydev.com.

This topic includes the following:

- Decision versus outcome measurements
- Scores
- Timebox granularity
- Snapshots and the temporal data model
- Real teams
- Measurements

#### Time buckets

Each metric is calculated for a particular time bucket. The summary SDPI charts are most commonly shown in quarters. The drill-down charts are most commonly shown in months.

#### Real teams from projects

The project entity in CA Agile Central is the team container but its hierarchical nature means that some projects represent other organizational entities (meta-teams, divisions, departments, and so on). Some may even represent projects. To determine which project entities are actually teams, we use a Bayesian classifier that looks at how much work is contained in the project, how close to the leaf of the hierarchy it sits, and a number of other characteristics.

#### Team size

We heuristically extract team membership by looking at who is working on what items and who is the owner of those work items. We then determine what fraction of the time each person is working on each team. The team size is the sum of these fractions.

#### Percentile scoring

The units for each raw metric are different. For some metrics higher is better whereas lower is better for others. To make it easier to interpret the metric and enable the aggregation of dissimilar units into a single index, raw metrics are converted into a percentile score across the entire distribution of all similar metrics. Higher is always better for percentiles.
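As a rough illustration of this conversion, the sketch below ranks one raw value against a population and flips the scale for metrics where lower is better. The function name, the 0 to 99 scaling, and the sample numbers are assumptions made for the example, not the product's actual implementation.

```python
from bisect import bisect_left

def percentile_score(value, population, higher_is_better=True):
    """Return a 0-99 score for `value` relative to `population` (assumed scaling)."""
    ranked = sorted(population)
    n = len(ranked)
    # Number of population values strictly below this one.
    below = bisect_left(ranked, value)
    # Scale so the lowest value maps to 0 and the highest to 99.
    percentile = min(99, (99 * below) // (n - 1)) if n > 1 else 0
    # Flip the scale for metrics where a lower raw value is better.
    return percentile if higher_is_better else 99 - percentile

# Hypothetical raw metrics for five teams.
throughput_per_member = [0.8, 1.1, 1.5, 2.0, 2.4]
defect_density = [0.01, 0.02, 0.05, 0.09, 0.20]

best_throughput = percentile_score(2.4, throughput_per_member)
worst_density = percentile_score(0.20, defect_density, higher_is_better=False)
```

With this scaling, the team with the highest throughput and the team with the lowest defect density both score 99.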

#### Calculating the index

The SDPI is made up of several dimensions. Each raw metric is percentile scored, and one or more of those scores are averaged to make up a particular dimension. For example, the quality dimension is the percentile score of defect density for defects found in production averaged with the percentile score of defect density for defects found in test. To calculate the overall SDPI, we take the average of the contributing dimensions' scores. If there are four dimensions, each one contributes at most 25 points to the final SDPI score.
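The averaging described above can be sketched as follows. The dimension names match the sections of this document, but the scores themselves are made-up values for illustration.

```python
from statistics import mean

def dimension_score(metric_scores):
    """A dimension is the average of the percentile scores that feed it."""
    return mean(metric_scores)

def sdpi(dimension_scores):
    """The overall index is the average of the contributing dimensions."""
    return mean(dimension_scores.values())

# Hypothetical percentile scores for one team in one time bucket.
dimensions = {
    "responsiveness": 62,
    "quality": dimension_score([70, 50]),  # production and test defect density
    "productivity": 48,
    "predictability": 80,
}
index = sdpi(dimensions)
```

Here the quality dimension is 60 and the resulting index is 62.5.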

#### Responsiveness score from Time in Process (TiP)

Time in Process (TiP) is the amount of time (in fractional days) that a work item spends in a particular state. Weekends, holidays, and non-work hours are not counted. We take the median TiP of all the work items that completed in a particular time bucket (say, January 2013) and record that as the TiP for that time bucket. While other parameters are possible, we primarily look at the TiP of User Stories, and we define In Process as ScheduleState equal to In Progress or Completed.

#### Quality score from Defect Density

Defect density is the count of defects divided by man days, where man days is team size times the number of workdays in that time bucket. This results in a metric that represents the number of defects per team member per workday.

CA Agile Central looks at both the defects found in production as well as those found in test and other areas as indicated by the Environment field in CA Agile Central. We sense whether defects are typically being recorded in CA Agile Central for each of these types for each team over a time period and only use it if it passes this test. We will take either as the quality score or the average of the two if both are reliably recorded.

#### Productivity score from throughput and team size

Throughput is the count of user stories, defects, and features completed in a given time period. The productivity score is the percentile scoring of this throughput normalized by the team size. While defects and features are shown in the drill-down charts, currently only user stories contribute to the productivity score of built-in scorecards.

#### Predictability score from throughput variability

Throughput variability is the standard deviation of throughput for a given team over three monthly periods divided by the average of the throughput for those same three months. This is referred to as the Coefficient of Variation (CoV) of throughput. Only user stories are considered for this predictability score.

## Decision versus outcome measurements

The measurements below are generally targeted at characterizing a decision or an outcome. An organization either decides to split people across many projects or they dedicate them to one. The Percent Dedicated Work measurement extracts this decision. Defect density is an example of an outcome measurement.

Although not strictly accurate, they can be thought of as input and output variables in a correlation analysis.

## Scores

Raw outcome measures are translated into a score so they can be easily interpreted as indicators of performance. Scores closer to 100 are good; scores closer to 0 are bad. The raw measure and the score are both available for analysis.

## Timebox granularity

Unless otherwise specified, each metric specified below is calculated for each of the following timeboxes:

- Month
- Quarter (Calendar)
- 3-month (sliding)
- 6-month (sliding)
- 12-month (sliding)
- Iteration (coming soon)

The sliding window measurements are useful when trying to identify a correlation where the impact of a decision measurement for a given month might correlate with the outcome measurement over the course of several following months. For instance, field-reported defects come in over time, so we would expect a change in this measurement to be evident for several months after the impacting decision. The empirical evidence supports this trailing effect: bad-decision metrics (non-dedicatedness) correlate best with the six-month trailing defect density metric.

## Snapshots and the temporal data model

We do not directly measure things like Percent Dedicated Work. It and the other measurements specified in this document are built from snapshots of changes representing transactions of users working with artifacts in their project management, source code management, build, or bug tracking systems. A detailed discussion of this data model including its data structures, constraints, and operations can be found here. Many of the details of calculating these metrics cannot be understood without at least a basic understanding of this underlying snapshot data structure and temporal data model.

## Real teams

In addition to being associated with a timebox, every measurement in the data set is also associated with a team. Our data set does not have a strict definition of a team. Rather, it includes the concept of a team or project hierarchy, where higher-level entries might represent divisions or teams of teams and lower-level entries represent the team itself. It is also common for a team to break their work down into project streams. This is a typical team or project tree:

- Division ABC
  - Meta-team I
    - Team A
      - Team A – project 1
      - Team A – project 2
    - Team B
  - Meta-team II
    - Team C
    - Team D
- Division XYZ
  - ...

Since the data is non-attributable and large (25,000 projects), we have no way of determining directly which entries in this tree represent a real team. Instead, we heuristically extract this using a Bayesian classifier. The features that the classifier uses include:

- The number of levels from the leaf nodes of the current branch of the project tree. Real teams tend to be at the leaf nodes, which is 0, or one level up, which is 1.
- The number of work items in progress in the node.
- The full-time equivalent value for the node. Real teams tend to have between 5 and 8 members; outside this range, the probability of being a real team decreases.

## Measurements

### Percent Dedicated Work

This measurement indicates how much of the work for a given team is done by people dedicated to that team.

**Type: Decision**

**Formula**

- Find all transactions (snapshots) for stories, defects, and tasks that are in progress, have no children, have an owner (user), and are not blocked.
- Sum all transactions by user $U_{total}$, by project $P_{total}$, and by user contribution to a project $U_{project}$, where $U_{total} > 5$ (that is, users with a total transaction count less than or equal to five are not counted towards $U_{project}$ or $P_{total}$).
- Find the percent of a user’s total work each project represents:

  $$U_{percent} = 100 \cdot \frac{U_{project}}{U_{total}}$$

- Count as dedicated for a given project the users whose $U_{percent}$ is greater than 70% for that project. This threshold was determined by experimentation with a training set of data from teams with known, dedicated members.
- For each project, sum the dedicated user transactions: $P_{dedicated} = \sum U_{project}$, for all dedicated members.
- Find the percent of dedicated work for each project:

  $$100 \cdot \frac{P_{dedicated}}{P_{total}}$$
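The steps above can be sketched in a few lines. This is a minimal illustration that assumes each qualifying snapshot has already been reduced to a `(user, project)` pair and that the five-transaction filter applies to a user's total count; the function name and sample data are hypothetical.

```python
from collections import defaultdict

MIN_TRANSACTIONS = 5    # users at or below this total are ignored
DEDICATED_PERCENT = 70  # a user is dedicated to a project above this share

def percent_dedicated_work(transactions):
    """transactions: iterable of (user, project) pairs, one per snapshot."""
    u_project = defaultdict(int)  # (user, project) -> transaction count
    u_total = defaultdict(int)    # user -> transaction count
    for user, project in transactions:
        u_project[(user, project)] += 1
        u_total[user] += 1

    p_total = defaultdict(int)      # project -> counted transactions
    p_dedicated = defaultdict(int)  # project -> transactions by dedicated users
    for (user, project), count in u_project.items():
        if u_total[user] <= MIN_TRANSACTIONS:
            continue  # data cleaning: too few transactions to count
        p_total[project] += count
        if 100 * count / u_total[user] > DEDICATED_PERCENT:
            p_dedicated[project] += count

    return {p: 100 * p_dedicated[p] / p_total[p] for p in p_total}

# Hypothetical data: ann is 80% on A, bob is split evenly, cat is filtered out.
sample = ([("ann", "A")] * 8 + [("ann", "B")] * 2 +
          [("bob", "A")] * 5 + [("bob", "B")] * 5 +
          [("cat", "A")] * 3)
result = percent_dedicated_work(sample)
```

In this sample, project A has 8 of 13 counted transactions from a dedicated member (about 61.5%), and project B has none.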

Data cleaning

The transactions of any user with five or fewer transactions in a given timebox and project pair are ignored when calculating $U_{project}$ or $P_{total}$. This omits data from people who are not true team members (managers, administrators, and so on).

### Full-time equivalent

This measurement is an indicator of team size including contributions from part-time contributors to the team.

Type: Decision

Formula

- Find all transactions (snapshots) for stories, defects, and tasks that are in progress, have no children, have an owner (user), and are not blocked.
- Sum all transactions by user $U_{total}$, by project $P_{total}$, and by user contribution to a project $U_{project}$, where $U_{total} > 5$ (that is, users with a total transaction count less than or equal to five are not counted towards $U_{project}$ or $P_{total}$). If the revision editor and owner are different people, the edit is associated with both people.
- Find the fraction of a user’s total work each project represents:

  $$U_{fte} = \frac{U_{project}}{U_{total}}$$

- Sum the full-time equivalent for each project: $P_{fte} = \sum U_{fte}$.
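A minimal sketch of the full-time equivalent calculation follows. It assumes the input is a list of `(user, project)` pairs in which edits have already been attributed to both editor and owner where they differ; the function name and sample data are hypothetical.

```python
from collections import Counter

def full_time_equivalent(transactions, min_transactions=5):
    """Return project -> FTE from a list of (user, project) pairs."""
    u_project = Counter(transactions)                    # (user, project) -> count
    u_total = Counter(user for user, _ in transactions)  # user -> count
    p_fte = Counter()
    for (user, project), count in u_project.items():
        # Skip users with too few transactions to count as team members.
        if u_total[user] > min_transactions:
            p_fte[project] += count / u_total[user]
    return dict(p_fte)

# Hypothetical data: ann is 0.8 on A and 0.2 on B, bob is 0.5 on each.
sample = ([("ann", "A")] * 8 + [("ann", "B")] * 2 +
          [("bob", "A")] * 5 + [("bob", "B")] * 5 +
          [("cat", "A")] * 3)
fte = full_time_equivalent(sample)
```

Each counted user's fractions sum to 1 across projects, so the total FTE over all projects equals the number of counted users (two in this sample).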

### Team stability

This is an indication of the team's stability. For example, given:

Month n:

- George: 90% dedicated
- Joe: 50%
- Jen: 80%

Month n + 1:

- George: 75% (-15% delta)
- Jen: 100% (+20%)
- Jeff: 25% (new) (+25%)
- Joe: missing (-50%)

The TeamGrowth metric for the team would be .2 + .25 = .45 divided by the current team size (2) or 22.5%.

The TeamShrinkage metric for the team would be |-.15| + |-.5| = .65 divided by the old team size (2.2) or 29.55%.

The total volatility would be the sum of the two prior metrics or roughly 52% and Team Stability would be 100 - 52/2 = 74.

Type: Decision

Formula

- Find all transactions (snapshots) for stories, defects, and tasks that are in progress, have no children, have an owner (user), and are not blocked.
- Sum all transactions by user $U_{total}$, by project $P_{total}$, and by user contribution to a project $U_{project}$, where $U_{total} > 5$ (that is, users with a total transaction count less than or equal to 5 are not counted towards $U_{project}$ or $P_{total}$).
- Find the fraction of a user’s total work each project represents for all time periods:

  $$U_{fte} = \frac{U_{project}}{U_{total}}$$

- Sum the full-time equivalent for each project for all time periods: $P_{fte} = \sum U_{fte}$.
- For each project and each pair of adjacent time periods ($t$ and $t - 1$), compute the team growth, team shrinkage, and team stability as illustrated in the example above.
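The George, Joe, Jen, and Jeff example from this section can be reproduced with a short sketch. The formulas here are inferred from that worked example (growth over the current team size, shrinkage over the old team size, stability as 100 minus half the total volatility) and are an assumption rather than a confirmed specification.

```python
def team_stability(previous, current):
    """previous/current map each member to an FTE fraction for one period.

    Both periods are assumed to have a non-zero team size.
    """
    members = set(previous) | set(current)
    deltas = [current.get(m, 0.0) - previous.get(m, 0.0) for m in members]
    # Positive deltas are growth, relative to the current team size.
    growth = sum(d for d in deltas if d > 0) / sum(current.values())
    # Negative deltas are shrinkage, relative to the old team size.
    shrinkage = sum(-d for d in deltas if d < 0) / sum(previous.values())
    volatility = growth + shrinkage
    stability = 100 - 100 * volatility / 2
    return growth, shrinkage, stability

month_n = {"George": 0.90, "Joe": 0.50, "Jen": 0.80}
month_n1 = {"George": 0.75, "Jen": 1.00, "Jeff": 0.25}
growth, shrinkage, stability = team_stability(month_n, month_n1)
```

For this data, growth is 0.225, shrinkage is roughly 0.295, and stability rounds to 74.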

### Process type

This measurement is an indicator of what type of agile process a team is using.

Type: Decision

Formula

- Find all snapshots for stories whose ScheduleState >= In-Progress and that have no children.
- Sum the total number of unique stories $S_{total}$ for each project in each time period.
- Sum the total number of unique stories that have a non-null field $S_{field}$ for each project in each time period, where *field* is each of c_KanbanState, Iteration, TaskActualTotal, TaskRemainingTotal, TaskEstimateTotal, and PlanEstimate.
- For each project in each time period, divide the sum for each field by the total number of unique stories and multiply by 100 to get the percent of stories with the field:

  $$P_{field} = 100 \cdot \frac{S_{field}}{S_{total}}$$

- After calculating the percent of stories with each field, the project is assigned a value for process type $T_{process}$ as specified in the following table:

| $T_{process}$ | if... |
|---|---|
| Kanban, ScrumBan | $P_{iterations} \ge 90$ |
| Kanban, No Iterations | $P_{iterations} < 90$ |
| Iterative, Scrum, Full | $P_{iterations} \ge 90 \land P_{planEstimate} \ge 50 \land P_{taskEstimateTotal} \ge 50$ |
| Iterative, Scrum, Story points only | $P_{iterations} \ge 90 \land P_{planEstimate} \ge 50 \land P_{taskEstimateTotal} < 50$ |
| Iterative, Scrum, Tasks only | $P_{iterations} \ge 90 \land P_{planEstimate} < 50 \land P_{taskEstimateTotal} \ge 50$ |
| Iterative, Other | $P_{iterations} \ge 90 \land P_{planEstimate} < 50 \land P_{taskEstimateTotal} < 50$ |
| Other, Estimates | $P_{iterations} < 90 \land (P_{planEstimate} \ge 50 \lor P_{taskEstimateTotal} \ge 50)$ |
| Other, No estimates | $P_{iterations} < 90 \land P_{planEstimate} < 50 \land P_{taskEstimateTotal} < 50$ |
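The non-Kanban rows of the table translate directly into a small decision function. This sketch leaves out the two Kanban rows because the table as shown gives no c_KanbanState threshold to separate them from the iterative rows; the function and parameter names are hypothetical.

```python
def percent_with_field(s_field, s_total):
    """Percent of unique stories that have a non-null value for a field."""
    return 100 * s_field / s_total

def classify_process(p_iterations, p_plan_estimate, p_task_estimate_total):
    """Assign a process type from field-usage percentages (Kanban rows omitted)."""
    has_points = p_plan_estimate >= 50
    has_tasks = p_task_estimate_total >= 50
    if p_iterations >= 90:
        if has_points and has_tasks:
            return "Iterative, Scrum, Full"
        if has_points:
            return "Iterative, Scrum, Story points only"
        if has_tasks:
            return "Iterative, Scrum, Tasks only"
        return "Iterative, Other"
    if has_points or has_tasks:
        return "Other, Estimates"
    return "Other, No estimates"

# Hypothetical project: 95 of 100 stories have an Iteration, 80% have PlanEstimate.
process = classify_process(percent_with_field(95, 100), 80, 10)
```

The example project is classified as "Iterative, Scrum, Story points only".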

### Time in Process (TiP) and responsiveness score

Time in process (TiP) is a measure for an individual work item (story, defect, feature) indicating how much work-day time (excluding non-work hours, weekends, and holidays) it spent in process. For stories and defects, in process is defined by the ScheduleState field being either In-Progress or Completed (which often means In-Test). For features, in process is when ActualStartDate is set and PercentDoneByStoryCount is less than 100%.

Although not calculated in exactly the same way, TiP is analogous to the common definition of cycle time or lead time. For a given project and timebox pair, an aggregation (median, or p50) of the TiP of the work items that completed during that timebox for that project is computed. The responsiveness score is based on the percentile of the median. Higher values will result in lower scores, and vice versa.

The median (p50) is used rather than the arithmetic mean as the aggregation because the distribution of TiP measurements for individual work items is far from normal and frequently includes outliers. Median deals well with the non-normal distribution and does not allow a single outlier to greatly impact the measurement like an arithmetic mean would. The data set also includes p75, p85, p95, p99 representing the 75th, 85th, 95th, and 99th percentile coverage levels for the set of completed work items but we currently only use the p50 (median) to calculate the score.

Type: Outcome

Variations:

Stories, defects, and features

Formula

- Find all stories, defects, and features that were In-Progress, then moved to Completed within the time frame under consideration.
  - Stories and defects are considered completed when *ScheduleState* ≥ *Accepted*.
  - Features are considered completed when *PercentDoneByStoryCount* → 100%.
- Calculate a TiP value for each of those stories, defects, and features.
  - Story and defect TiP is the duration where *In Progress* ≤ *ScheduleState* < *Accepted*.
  - Feature TiP is the duration between *ActualStartDate* and when *PercentDoneByStoryCount* → 100%.
- The responsiveness score is the percentile rank of the p50 TiP value for stories.
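To make the aggregation concrete, the sketch below computes a simplified TiP for a few stories and takes the median. It is deliberately coarse: it counts whole weekdays only and ignores holidays and work hours, which the real measurement accounts for. The dates and the helper name are made up for illustration.

```python
from datetime import date, timedelta
from statistics import median

def workdays_between(start, end):
    """Count weekdays in [start, end); holidays and work hours are ignored here."""
    days = (end - start).days
    return sum(1 for i in range(days)
               if (start + timedelta(days=i)).weekday() < 5)

# Hypothetical (entered In-Progress, reached Accepted) dates for three stories.
stories = [
    (date(2013, 1, 7), date(2013, 1, 9)),    # Monday to Wednesday
    (date(2013, 1, 10), date(2013, 1, 15)),  # Thursday to Tuesday, spans a weekend
    (date(2013, 1, 14), date(2013, 1, 29)),  # a long-running outlier
]
tips = [workdays_between(start, end) for start, end in stories]
p50_tip = median(tips)
```

The individual values are 2, 3, and 11 workdays, so the p50 is 3 and the outlier does not dominate it.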

### Defect density and quality score

Defect density is the count of defects over some normalizing size measurement. In our case we use the team's man-days (FTE * the number of working days in the period) as a proxy for size.

Type: Outcome

Variations:

All defects (Defect) or just defects found in production (ReleasedDefect)

Formula

- Count all defects $D_{all}$ and defects released to production $D_{released}$ for each project.
- Calculate defect density $E$ for each project by:

  $$E_{all} = \frac{D_{all}}{P_{fte} \cdot W} \qquad E_{released} = \frac{D_{released}}{P_{fte} \cdot W}$$

  where $P_{fte}$ is the project’s full-time equivalent and $W$ is the number of working days in the time period under consideration.
- For each project, determine if either defects or released defects are being tracked by checking if the defect count is greater than zero for the year granularity that ends at the same time as the granularity under consideration. For example, if the granularity is a quarter ending on 2013-01-01, we check the full year ending on 2013-01-01 to see if the defect count for the year is non-zero.
- Compute defects per 1000 man days by:

  $$S_{all} = 1000 \cdot E_{all}$$

  $$S_{released} = 1000 \cdot E_{released}$$

- For each project where defect data is tracked, compute the quality score. Defect density is scored based on percentiles, inverted so that a lower density gives a higher score. If a project has the highest measured value for defect density, it is in the 99th percentile, therefore its score is 99 − 99 = 0. If a project has the lowest measured value for defect density, it is in the 0th percentile, therefore its score is 99 − 0 = 99.

  $$Q_{all} = 99 - \operatorname{percentile}(S_{all})$$

  $$Q_{released} = 99 - \operatorname{percentile}(S_{released})$$

- The total quality score is the quality score for all defects: $Q_{total} = Q_{all}$. Projects not tracking defects will have no quality score.
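The raw part of this calculation (before percentile scoring) is simple arithmetic, sketched below with defect density taken as the defect count divided by FTE times working days. The function names and the sample numbers are illustrative assumptions.

```python
def defect_density(defect_count, p_fte, working_days):
    """Defects per team member per workday."""
    return defect_count / (p_fte * working_days)

def defects_per_1000_man_days(defect_count, p_fte, working_days):
    """Defect density rescaled to defects per 1000 man days."""
    return 1000 * defect_density(defect_count, p_fte, working_days)

def is_tracked(defect_count_trailing_year):
    """A project counts as tracking defects if the trailing year has any."""
    return defect_count_trailing_year > 0

# Hypothetical project: 12 defects, 6.0 FTE, 20 working days in the month.
s_all = defects_per_1000_man_days(defect_count=12, p_fte=6.0, working_days=20)
tracked = is_tracked(defect_count_trailing_year=85)
```

This project has 0.1 defects per team member per workday, or about 100 defects per 1000 man days.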

### Throughput and productivity score

Throughput is a measure of how much work is completed in a given time period. Within a single team, throughput can be compared over time. However, the size of a work item can vary greatly by context, so it is difficult to compare throughput across teams unless the size of a work item is controlled. For instance, some organizations will require that each story should be between 0.5 and 3 man days of work. We do not know this information, so when calculating the score we look at the number of completed stories normalized by the team size (FTE). Throughput per team member is scored based on percentiles. Higher values result in higher scores, and vice versa.

Type: Outcome

Variations:

- Defects, stories, or features
- Counts or story points: The formula below describes the computation by counts of these items. However, we also compute throughput (or velocity) for stories and defects using the sum of the story points of all work items that make the appropriate transition. We do not yet have a good mechanism to identify which teams consistently use story points so the counts are the preferred variation at this time. The development of iteration-based measures is underway and includes research to explore better use of story points.

Formula

- Compute throughput $T$ as the sum of:
  - The count of all stories and defects that transitioned forward into the accepted state minus the count of all stories that transitioned backwards out of the accepted state.
  - The count of all features that transitioned forward to 100% complete by story count minus the count of all features that were 100% complete by story count but transitioned backward into < 100% complete by story count.
- Compute the throughput per team member by dividing throughput by full-time equivalent:

  $$T_{fte} = \frac{T}{P_{fte}}$$

- Score $T_{fte}$ based on its percentile. If a project has the highest measured value for $T_{fte}$, it is in the 99th percentile, and 99 is its score. If a project has the lowest measured value for $T_{fte}$, it is in the 0th percentile, and 0 is its score.
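A small sketch of the net-count arithmetic follows. It assumes forward and backward transition counts are already available per work item type and, as a simplification, subtracts backward transitions for every type; the names and numbers are hypothetical.

```python
def net_throughput(forward, backward):
    """Items that moved into the done state minus items that moved back out."""
    return forward - backward

def throughput_per_member(stories, defects, features, p_fte):
    """Each argument is a (forward, backward) pair of transition counts."""
    total = sum(net_throughput(fwd, back)
                for fwd, back in (stories, defects, features))
    return total / p_fte

# Hypothetical month: 30 stories accepted (2 reopened), 10 defects (1 reopened),
# 3 features completed, for a team of 5.0 FTE.
t_fte = throughput_per_member(stories=(30, 2), defects=(10, 1),
                              features=(3, 0), p_fte=5.0)
```

The net throughput is 40 items, or 8.0 per team member.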

### Throughput variation and predictability score

Having a stable throughput can be as important as having a high throughput. The coefficient of variation of throughput across several time periods is calculated and translated into a score.

Formula

- For each project, compute throughput for each month $T_i$ as the count of all stories that transitioned forward into the accepted state minus the count of all stories that transitioned backwards out of the accepted state.
- For each group of three and six adjacent months $T_i$, compute the mean $\bar{T}$, the standard deviation $\sigma_T$, and the coefficient of variation:

  $$CoV = \frac{\sigma_T}{\bar{T}}$$

- Score $CoV$ based on its percentile. If a project has the highest measured value for $CoV$, it is in the 99th percentile, therefore its score is 99 − 99 = 0. If a project has the lowest measured value for $CoV$, it is in the 0th percentile, therefore its score is 99 − 0 = 99.
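The coefficient of variation over sliding groups of months can be sketched as below. The document does not say whether a population or sample standard deviation is used, so the choice of `pstdev` (population) here is an assumption, as are the function names and the sample data.

```python
from statistics import mean, pstdev

def coefficient_of_variation(throughputs):
    """Standard deviation divided by the mean (population form assumed)."""
    return pstdev(throughputs) / mean(throughputs)

def rolling_cov(monthly, window=3):
    """CoV for each group of `window` adjacent months."""
    return [coefficient_of_variation(monthly[i:i + window])
            for i in range(len(monthly) - window + 1)]

# Hypothetical net story throughput for five consecutive months.
monthly_throughput = [10, 12, 14, 12, 12]
covs = rolling_cov(monthly_throughput)
```

A perfectly steady team has a CoV of 0, which is the lowest possible value and therefore maps to the best predictability score.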