Surge Capacity in Labs

A good lab process design will accommodate all known and predictable workload volumes and volatilities. This includes non-routine, event-based or issue-based workloads that, based on recent history, can reasonably be expected to occur at some point. However, it will not normally include spare resources for exceptional or unprecedented spikes in workload, as carrying them can add significant cost. Also, because the extra resources tend to get absorbed into routine operations, they can actually reduce ongoing productivity.

A better approach therefore is to create a defined plan to rapidly increase short-term capacity. This should be done ahead of time rather than trying to solve the problem from scratch when there is a sudden increase in lab workloads. A ‘Surge Capacity’ plan should identify incremental strategies to quickly increase the lab’s short-term capacity, and it should be agreed in advance with all relevant stakeholders.

  • What is a surge?  - A surge is when the short-term Lab workload (i.e. the demand) is significantly above the normal capacity of the Lab
  • Surge Capacity is the ability of a lab to rapidly increase its short-term testing capacity above the normal capacity / levelled demand rate.

Typical causes of Surges in Demand

Event Based:

Surges in demand can occur due to exceptional events that generate testing workloads over and above normal routine testing. Examples include:

  • ‘Positive’ Micro Events (EM blooms / Sterility issues / Media Run failures etc.)
  • Manufacturing Equipment Issues and associated validations / re-validations
  • Adverse Events – Support for Investigations

Testing Backlogs:

By definition, a backlog can only occur if the rate of testing falls below the incoming demand for a period of time. Equally, a backlog can only be cleared if the rate of testing exceeds the incoming demand for a period of time. Common reasons for testing backlogs include:

  • Lab Equipment failure or failure of the Test
  • Insufficient Analyst resources to test at the levelled demand rate. For example due to:
    • Vacation Season
    • Long term absence (your lab process design should include cover for normal short-term illness)
    • Exceptional short-term absence (e.g. flu season / your country wins the world cup)
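The backlog definition above can be illustrated with a minimal sketch. The function name and all of the daily figures are hypothetical and chosen only to show the mechanics: the backlog grows while testing runs below demand and only shrinks once testing runs above demand.

```python
def simulate_backlog(demand_per_day, tests_per_day, opening_backlog=0):
    """Return the end-of-day backlog (in samples) for each day."""
    backlog = opening_backlog
    history = []
    for demand, tested in zip(demand_per_day, tests_per_day):
        # The backlog changes by (demand - tested); it cannot go below zero
        # because the lab cannot test samples that have not arrived.
        backlog = max(0, backlog + demand - tested)
        history.append(backlog)
    return history


# Hypothetical figures: demand is steady at 20 samples/day. An equipment
# failure cuts testing to 10/day for three days, then a surge plan lifts
# testing to 30/day until the backlog is cleared.
demand = [20] * 8
tested = [20, 10, 10, 10, 30, 30, 30, 20]
print(simulate_backlog(demand, tested))  # [0, 10, 20, 30, 20, 10, 0, 0]
```

In this illustration, three days at half capacity take three days at one and a half times capacity to recover, which is why the surge response needs to be defined before the backlog appears.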

Changes in Demand:

There can be significant increases in testing workloads due to unpredicted or uncommunicated increases in production volumes. The changes may be short term or temporary (for example, if Production is building stock to cover a planned shutdown) or longer term (for example, if sales demand for the product has increased). If the increase is permanent or long term, the levelled demand for the lab should be re-calculated, the process re-designed, and the lab resources permanently increased to match the new levelled demand. Surge capacity can be used to cover the interim period while this is happening, or to cover the whole increase if the change in demand is short term.

Constraints on Lab Capacity

The Capacity of Labs is normally constrained by

  • The amount of trained Analyst Resources available (measured in Analyst hours per period or FTE)
  • The availability of Test Equipment (measured in System Hours per period)
  • The operating hours and days (e.g. 8/5, 24/7)

Generating additional capacity to deal with surges in workloads will normally require temporary increases in some or all of these constraints.
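As a rough sketch of how these constraints interact, the example below treats capacity as the number of runs per week allowed by the tightest constraint. The function name, the hours per run and the weekly hours are all hypothetical assumptions, and operating hours are assumed to act through the system hours available.

```python
def weekly_run_capacity(analyst_hours, system_hours,
                        analyst_hours_per_run, system_hours_per_run):
    """Return (runs per week, binding constraint) for one test method."""
    limits = {
        "analysts": analyst_hours // analyst_hours_per_run,
        "equipment": system_hours // system_hours_per_run,
    }
    # The constraint that allows the fewest runs sets the lab's capacity.
    binding = min(limits, key=limits.get)
    return limits[binding], binding


# Hypothetical normal week: 3 analysts x 30 testing hours = 90 analyst hours,
# 2 systems x 40 operating hours = 80 system hours.
print(weekly_run_capacity(90, 80, 6, 8))   # (10, 'equipment')

# Hypothetical surge week: operating hours extended to 70 per system,
# giving 140 system hours with the same analysts.
print(weekly_run_capacity(90, 140, 6, 8))  # (15, 'analysts')
```

Once the equipment constraint is relaxed in this illustration, analyst hours become the limit, so a surge plan usually has to address more than one constraint.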

Creating Surge Capacity

At its simplest, you can only increase lab capacity by having more and/or larger runs. Of course, enabling larger or additional runs usually requires addressing some or all of the common lab constraints.     

Larger Runs will sometimes require additional analyst resources (the increase in the work content for each run may require an additional analyst to do some of the tasks).

More runs will typically require additional analyst resources plus extra equipment hours and/or increased hours of operation.

Increasing Testing Analyst Resources

Options for temporarily increasing the availability of analysts to support more or larger test runs include:

  • Temporary suspension of non-testing activities e.g.
    • Project Work
    • ‘Non-Test Tasks’
  • Temporary transfer of Analysts from other Labs or sites
  • Annualised Hours contracts / increased hours from part time people
  • Simple overtime

Increasing Equipment Resources (i.e. available system hours)

In the short term, it is not usually feasible to source, buy, install, validate and / or qualify additional equipment. Therefore, the most viable option for temporarily increasing the availability of equipment is to temporarily extend the lab’s hours of operation. This can be done via

  • Overtime
  • Modified, Extended or Additional shifts
  • Weekend working

Note: If the equipment already runs unattended into the “off shift” (e.g. HPLC systems), the additional shifts will need to cover the time after the run finishes.

Other Options

In some circumstances, it may be possible to transfer surge volumes to other entities, for example:

  • To other labs on the site
  • To other sites within the company
  • To specialist contract testing labs

This would obviously require prior qualifications and approvals to be in place.

Defining a surge capacity Process / Plan:

Evaluate the risk of surges in workload in your lab for each of the common causes, along with their potential impact on testing capacity.

Test Backlogs - Equipment Failure:

For each key piece of equipment evaluate:

  • What is the likelihood of a significant failure in the next 12 months? (you can use historical data to inform your opinion)
  • How much redundancy do you already have? (i.e. compare the demand in system hours v the capacity in system hours)
  • How long will it take to repair? (what is the service level agreement with your supplier?)

You can then score the risk as follows:

Risk Score = (% risk of occurrence x likely downtime in working days) / (% of demand in equipment redundancy)

  • Example 1:  A 20% risk of failure in the next 12 months x 10 working days to fix / 40% existing redundancy = 5
  • Example 2: A 10% risk of failure in the next 12 months x 10 working days to fix / 80% existing redundancy = 1.25

For equipment with higher scores, consider investing in additional redundancy and / or define a specific surge capacity strategy.
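The equipment scoring above can be sketched in a few lines. The function names are hypothetical, and the redundancy helper is one possible reading of "compare the demand in system hours v the capacity in system hours", namely spare system hours expressed as a percentage of demand.

```python
def redundancy_pct(demand_hours, capacity_hours):
    """Spare system hours as a percentage of demand (an assumed definition)."""
    return (capacity_hours - demand_hours) * 100 / demand_hours


def equipment_risk_score(failure_risk_pct, downtime_days, redundancy):
    """Score = % risk of failure x downtime in working days / % redundancy."""
    if redundancy <= 0:
        # With no spare capacity the ratio is undefined; treat the equipment
        # as top priority rather than dividing by zero.
        raise ValueError("no redundancy: treat as highest risk")
    return failure_risk_pct * downtime_days / redundancy


# The two examples from the text.
print(equipment_risk_score(20, 10, 40))  # 5.0
print(equipment_risk_score(10, 10, 80))  # 1.25

# Hypothetical system: 100 hours of demand against 140 hours of capacity.
print(redundancy_pct(100, 140))          # 40.0
```

The scores are only meaningful relative to each other, so they are best used to rank equipment rather than as absolute measures.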

Event Based:

For each potential cause of extraordinary workloads (e.g. Sterility positive, unplanned validation studies, major investigations etc.) evaluate:

  • What is the likelihood of a significant event in the next 12 months? (you can use historical data to inform your opinion)
  • How much extraordinary work is an event likely to create? (in additional test runs & associated effort or equipment time)

You can score the risk as follows:

Risk Score = % risk of occurrence x additional work (in analyst testing days)

  • Example 1: a 20% risk of occurrence x 10 analyst days of extra testing = 2
  • Example 2: a 10% risk of occurrence x 10 analyst days of extra testing = 1

For events with higher scores, define a specific surge capacity strategy.
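A minimal sketch of the event scoring, used here to rank potential events so that the highest scores get a surge strategy first. The function names are hypothetical, and the likelihoods and analyst days attached to the event names are illustrative values taken from the two examples above, not measured data.

```python
def event_risk_score(risk_pct, extra_analyst_days):
    """Score = % risk of occurrence x additional work in analyst testing days."""
    return risk_pct * extra_analyst_days / 100


def rank_events(events):
    """Return (name, score) pairs, highest score first."""
    scored = [(name, event_risk_score(risk, days))
              for name, (risk, days) in events.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical inputs: event -> (% risk in next 12 months, extra analyst days).
events = {
    "Unplanned validation study": (10, 10),
    "Sterility positive": (20, 10),
}
for name, score in rank_events(events):
    print(f"{name}: {score}")
# Sterility positive: 2.0
# Unplanned validation study: 1.0
```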

Our consultants can provide further information on the above and discuss any aspect of Real Lean Transformation. Simply set up a call today.