Exercise 2: MC estimation of Reliability & Availability

Problem statement
A continuously monitored component has a constant failure rate λ = 3×10⁻³ h⁻¹ and a constant repair rate μ = 25×10⁻³ h⁻¹.
The mission time is TM = 1000 h. Use Monte Carlo simulation to estimate:
 1. Instantaneous availability   A(t)
 2. Time-dependent reliability   R(t)
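As a quick sanity check before simulating, the reference quantities implied by the problem statement can be computed directly. This is a minimal sketch; the variable names `mttf`, `mttr`, `a_ss` and `r_end` are illustrative and do not come from the exercise files.

```python
import math

# Parameters from the problem statement
failure_rate = 3e-3    # λ, per hour
repair_rate = 25e-3    # μ, per hour
mission_time = 1000    # TM, hours

mttf = 1 / failure_rate                             # mean time to failure, hours
mttr = 1 / repair_rate                              # mean time to repair, hours
a_ss = repair_rate / (failure_rate + repair_rate)   # steady-state availability
r_end = math.exp(-failure_rate * mission_time)      # reliability at the mission time

print(f"MTTF = {mttf:.1f} h, MTTR = {mttr:.1f} h")
print(f"A_ss = {a_ss:.4f}, R(TM) = {r_end:.4f}")
```

These values give a target for the simulation: the availability curve should level off near `a_ss`, while the reliability curve should decay to about `r_end` at the end of the mission.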
Step A — Create time axis & counters
import numpy as np

# Common to both problems
mission_time = 1000   # h
time_step = 1         # h
time_axis = np.arange(0, mission_time + time_step, time_step)

# Reliability counter (from EX_2_2)
counter_f = np.zeros(len(time_axis))

# Unavailability counter (from EX_2_1)
counter_unavailability = np.zeros(len(time_axis))
Visualization of the empty time axis
Step B — Sampling failure and repair times
# Inverse-transform sampling from the exponential distribution
# Failure time: t_f = -ln(1 - r) / λ,  r ~ U(0,1)
# Repair time:  t_r = -ln(1 - r) / μ,  r ~ U(0,1)
t = t - np.log(1 - np.random.rand()) / rate
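The sampling rule above can be checked in isolation. The following is a minimal sketch, assuming NumPy's `default_rng` generator instead of the legacy `np.random.rand`; the helper name `sample_exponential`, the seed, and the sample size are illustrative choices, not part of the exercise files.

```python
import numpy as np

def sample_exponential(rate, rng, size=None):
    """Draw exponential transition times by inverse-transform sampling."""
    r = rng.random(size)             # r ~ U[0, 1)
    return -np.log(1.0 - r) / rate   # 1 - r lies in (0, 1], so the log is finite

rng = np.random.default_rng(seed=0)  # arbitrary seed, for repeatability only
failure_rate = 3e-3                  # λ from the problem statement, per hour
samples = sample_exponential(failure_rate, rng, size=200_000)

# The sample mean should be close to the theoretical mean 1/λ
print("sample mean:", samples.mean(), " theoretical mean:", 1 / failure_rate)
```

Using `1 - r` rather than `r` inside the logarithm avoids evaluating `log(0)`, since `rng.random` can return exactly 0 but never 1.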

Key difference: Reliability vs Availability

Reliability R(t)

Sample one failure time tf.
All bins from tf onward get +1.
No repair — once failed, stays failed.
R(t) = 1 − counter_f / N

Availability A(t)

Alternating failure/repair cycle.
Only downtime bins get +1.
Component is repaired and can fail again.
A(t) = 1 − counter_unavailability / N

Reliability — one MC trial
Sample a single failure time:   tf = −ln(1 − r) / λ
From tf onward, the component is permanently failed (no repair).
for i in range(N):
    t = 0
    t = t - np.log(1 - np.random.rand()) / lambda_
    # Increment the failure count for ALL time steps from the failure onward
    # (the index is ceil(t) because time_step = 1)
    counter_f[int(np.ceil(t)):] += 1
[Interactive panel: sampled tf (hours); bins affected (+1); reference value E[tf] = 1/λ ≈ 333 h]
Why all bins after tf?
Reliability asks: "Has the component survived up to time t without any failure?"
Once it fails at tf, the answer is NO for every t ≥ tf. There is no repair.
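The reliability procedure described above can be put together as a self-contained script. This is a sketch under stated assumptions: the number of trials `N`, the seed, and the use of `default_rng` are choices made here for illustration, and the index is divided by `time_step` so that the code would also work for a step other than 1.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed
lambda_ = 3e-3                        # failure rate, per hour
mission_time = 1000                   # hours
time_step = 1                         # hours
time_axis = np.arange(0, mission_time + time_step, time_step)
N = 20_000                            # number of trials (assumed value)

counter_f = np.zeros(len(time_axis))
for _ in range(N):
    t_f = -np.log(1 - rng.random()) / lambda_          # single failure time
    first_failed_bin = int(np.ceil(t_f / time_step))   # first grid point >= t_f
    counter_f[first_failed_bin:] += 1                  # failed from there onward

rel_mc = 1 - counter_f / N
rel_true = np.exp(-lambda_ * time_axis)
print("max |R error| =", np.abs(rel_mc - rel_true).max())
```

If the sampled failure time lies beyond the mission time, the slice is empty and the trial contributes no failures, which is the intended behaviour.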
Availability — one MC trial
The component alternates between working and failed states.
Each failure time: tf = −ln(1−r)/λ  |  Each repair time: tr = −ln(1−r)/μ
Only the downtime intervals (from failure to repair) get +1 in the counter.
for i in range(N):
    t = 0
    state = 0   # 0 = working, 1 = failed
    while t < mission_time:
        if state == 0:
            # Working → sample failure
            t = t - np.log(1 - np.random.rand()) / failure_rate
            state = 1
            lower_bound = np.searchsorted(time_axis, t)
        else:
            # Failed → sample repair
            t = t - np.log(1 - np.random.rand()) / repair_rate
            state = 0
            upper_bound = min(np.searchsorted(time_axis, t) - 1, len(time_axis) - 1)
            counter_unavailability[lower_bound:upper_bound + 1] += 1
[Interactive panel: number of transitions; total downtime (h); fraction of time down; steady-state availability Ass = μ/(λ+μ) ≈ 0.893]
Why only downtime bins?
Availability asks: "Is the component working right now at time t?"
Unlike reliability, the component gets repaired and returns to service. Only the intervals when it is actually down count against availability.
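One way to make the alternating cycle easier to inspect is to separate the simulation of a trial from the bin counting. The sketch below is an illustrative restructuring, not the exercise's own solution: the helper `downtime_intervals`, the seed, and `N` are assumptions introduced here.

```python
import numpy as np

def downtime_intervals(failure_rate, repair_rate, mission_time, rng):
    """Simulate one trial and return its (t_fail, t_repair) downtime intervals.

    The last repair time may exceed mission_time; it is left unclipped.
    """
    intervals = []
    t = 0.0
    while t < mission_time:
        t += -np.log(1 - rng.random()) / failure_rate   # working -> failed
        if t >= mission_time:
            break                                       # failure falls outside the mission
        t_fail = t
        t += -np.log(1 - rng.random()) / repair_rate    # failed -> working
        intervals.append((t_fail, t))
    return intervals

rng = np.random.default_rng(seed=2)   # arbitrary seed
failure_rate, repair_rate, mission_time = 3e-3, 25e-3, 1000
time_axis = np.arange(0, mission_time + 1, 1)
N = 5_000                             # number of trials (assumed value)

counter_unavailability = np.zeros(len(time_axis))
for _ in range(N):
    for t_fail, t_repair in downtime_intervals(failure_rate, repair_rate, mission_time, rng):
        lo = np.searchsorted(time_axis, t_fail)     # first grid point >= t_fail
        hi = np.searchsorted(time_axis, t_repair)   # first grid point >= t_repair
        counter_unavailability[lo:hi] += 1          # grid points with t_fail <= t < t_repair

availability_mc = 1 - counter_unavailability / N
total_rate = failure_rate + repair_rate
availability_true = repair_rate / total_rate + failure_rate / total_rate * np.exp(-total_rate * time_axis)
print("max |A error| =", np.abs(availability_mc - availability_true).max())
```

When a failure and its repair fall between the same two grid points, `lo` equals `hi` and no bin is incremented, so very short outages are invisible at this time resolution.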
MC accumulation over N trials
After running N trials, estimate:
R(t) = 1 − counter_f / N   and   A(t) = 1 − counter_unavailability / N

Compare against analytical expressions:
Rtrue(t) = exp(−λt)   |   Atrue(t) = μ/(λ+μ) + (λ/(λ+μ))·exp(−(λ+μ)t)
# Reliability (from EX_2_2_solution.py)
Rel_MC = 1 - counter_f / N
Rel_true = np.exp(-lambda_ * time_axis)

# Availability (from EX_2_1_solution.py)
availability_mc = 1 - counter_unavailability / N
availability_true = (
    repair_rate / (failure_rate + repair_rate)
    + failure_rate / (failure_rate + repair_rate)
    * np.exp(-(failure_rate + repair_rate) * time_axis)
)
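To judge whether a discrepancy between the MC and analytical curves is plausible, the maximum error can be compared with the binomial standard error of the estimator, sqrt(p(1 − p)/N). The sketch below assumes a hypothetical helper `mc_error_summary` and, to stay short, builds the reliability estimate with a vectorised count (`np.searchsorted` on sorted failure times) rather than the trial-by-trial loop; both are illustrative choices, not taken from the solution files.

```python
import numpy as np

def mc_error_summary(estimate, truth, n_trials):
    """Return the max absolute error and the pointwise binomial standard error."""
    max_abs_error = float(np.max(np.abs(estimate - truth)))
    std_error = np.sqrt(truth * (1 - truth) / n_trials)
    return max_abs_error, std_error

rng = np.random.default_rng(seed=3)   # arbitrary seed
lambda_ = 3e-3
time_axis = np.arange(0, 1001, 1)
N = 10_000                            # number of trials (assumed value)

# Vectorised equivalent of the counter: number of trials with t_f <= t at each grid point
t_f = np.sort(-np.log(1 - rng.random(N)) / lambda_)
n_failed = np.searchsorted(t_f, time_axis, side="right")
rel_mc = 1 - n_failed / N
rel_true = np.exp(-lambda_ * time_axis)

max_err, std_err = mc_error_summary(rel_mc, rel_true, N)
print("max |R error| =", max_err)
print("R(500) MC =", rel_mc[500], " R(500) true =", rel_true[500])
print("standard error at t = 500 h =", std_err[500])
```

A maximum error of a few standard errors is what one would expect from sampling noise alone; a much larger value usually points to an indexing or bookkeeping bug rather than to too few trials.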
[Figure: Reliability, MC vs analytical. Reported values: max |R error|, R(500) MC, R(500) true]
[Figure: Availability, MC vs analytical. Reported values: max |A error|, A(500) MC, Ass true]