Exercise 2: MC estimation of Reliability & Availability
Problem statement
A continuously monitored component has constant failure rate λ = 3×10⁻³ h⁻¹
and repair rate μ = 25×10⁻³ h⁻¹.
Mission time T_M = 1000 h. Use Monte Carlo simulation to estimate:
1. Instantaneous availability A(t)
2. Time-dependent reliability R(t)
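For constant λ and μ both quantities have well-known closed forms, which make a useful reference when checking the Monte Carlo curves. The sketch below assumes the component starts in the working state at t = 0; the function names analytic_reliability and analytic_availability are illustrative and not part of the exercise files.

```python
import numpy as np

failure_rate = 3e-3   # λ in 1/h (from the problem statement)
repair_rate = 25e-3   # μ in 1/h (from the problem statement)
mission_time = 1000   # T_M in h


def analytic_reliability(t, lam):
    """R(t) = exp(-λ t): constant failure rate, no repair."""
    return np.exp(-lam * np.asarray(t, dtype=float))


def analytic_availability(t, lam, mu):
    """A(t) of a two-state (up/down) model that is up at t = 0."""
    t = np.asarray(t, dtype=float)
    total = lam + mu
    return mu / total + (lam / total) * np.exp(-total * t)


time_axis = np.arange(0, mission_time + 1, 1)
R_exact = analytic_reliability(time_axis, failure_rate)
A_exact = analytic_availability(time_axis, failure_rate, repair_rate)

print(f"R(T_M) = {R_exact[-1]:.4f}")   # exp(-3), about 0.0498
print(f"A(T_M) = {A_exact[-1]:.4f}")   # close to μ/(λ+μ), about 0.8929
```

By the end of the mission R(t) has decayed to about 5%, while A(t) has settled near μ/(λ+μ).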
Step A — Create time axis & counters
# Common to both problems
import numpy as np

mission_time = 1000   # h
time_step = 1         # h
time_axis = np.arange(0, mission_time + time_step, time_step)

# Reliability counter (from EX_2_2)
counter_f = np.zeros(len(time_axis))

# Unavailability counter (from EX_2_1)
counter_unavailability = np.zeros(len(time_axis))
counter_f: counts how many trials have already failed at each bin → used for reliability
counter_unavailability: counts how many trials are currently down at each bin → used for availability
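Both counters are indexed by bin, so every sampled continuous time has to be mapped to a position on time_axis. A minimal sketch of that mapping follows; the helper name first_bin_at_or_after is hypothetical. With time_step = 1 the index from np.searchsorted coincides with int(np.ceil(t)) for times inside the mission; for other step sizes only the searchsorted index is a valid bin position.

```python
import numpy as np

mission_time = 1000
time_step = 1
time_axis = np.arange(0, mission_time + time_step, time_step)


def first_bin_at_or_after(t):
    """Index of the first grid point with time_axis[index] >= t."""
    return int(np.searchsorted(time_axis, t))


# t = 5.0    -> bin 5    (lands exactly on a grid point)
# t = 5.2    -> bin 6    (next grid point after the event)
# t = 1200.0 -> bin 1001 (past the last bin, so a slice from here is empty)
for t in (5.0, 5.2, 1200.0):
    print(t, first_bin_at_or_after(t))
```

An index equal to len(time_axis) is harmless in a slice such as counter_f[index:], which is why events sampled after the mission time leave the counters unchanged.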
Reliability R(t)
Sample one failure time tf.
All bins from tf onward get +1. No repair — once failed, stays failed.
R(t) = 1 − counter_f / N
Availability A(t)
Alternating failure/repair cycle.
Only downtime bins get +1.
Component is repaired and can fail again.
A(t) = 1 − counter_unavailability / N
Reliability — one MC trial
Sample a single failure time: tf = −ln(1 − r) / λ
From tf onward, the component is permanently failed (no repair).
for i in range(N):
    t = 0
    # lambda_ is the failure rate λ
    t = t - np.log(1 - np.random.rand()) / lambda_
    # Increment failure count for ALL time steps after failure
    # (int(np.ceil(t)) is a valid bin index because time_step = 1)
    counter_f[int(np.ceil(t)):] += 1
Green = component working | Red = component failed (forever)
Readouts for the sampled trial: sampled tf (hours), bins affected (+1), and E[tf] = 1/λ ≈ 333 h.
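The relation E[tf] = 1/λ can be checked numerically by drawing many failure times with the inverse-transform formula tf = −ln(1 − r)/λ. This is a rough sketch: the sample size and the seed are arbitrary choices, not values from the exercise.

```python
import numpy as np

failure_rate = 3e-3                  # λ in 1/h (from the problem statement)
rng = np.random.default_rng(42)      # arbitrary seed, for reproducibility only

# Inverse-transform sampling: r is uniform on [0, 1), so 1 - r is never zero
r = rng.random(200_000)
tf_samples = -np.log(1 - r) / failure_rate

sample_mean = tf_samples.mean()
expected_mean = 1 / failure_rate     # about 333.3 h

# The fraction of samples surviving past 1000 h should be close to exp(-λ · 1000)
frac_surviving = np.mean(tf_samples > 1000)

print(f"sample mean  = {sample_mean:.1f} h (expected {expected_mean:.1f} h)")
print(f"P(tf > 1000) = {frac_surviving:.4f} (expected {np.exp(-3):.4f})")
```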
Why all bins after tf?
Reliability asks: "Has the component survived up to time t without any failure?"
Once it fails at tf, the answer is NO for every t ≥ tf. There is no repair.
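The reliability trial described above can be wrapped into a self-contained function and compared with exp(−λt). This is a sketch under stated assumptions: the function name estimate_reliability, the trial count of 10,000 and the seed are illustrative, and np.searchsorted replaces int(np.ceil(t)) so that the bin lookup also works when time_step is not 1.

```python
import numpy as np


def estimate_reliability(failure_rate, mission_time, n_trials, time_step=1, seed=0):
    """Monte Carlo estimate of R(t) on a regular time grid (no repair)."""
    rng = np.random.default_rng(seed)
    time_axis = np.arange(0, mission_time + time_step, time_step)
    counter_f = np.zeros(len(time_axis))

    for _ in range(n_trials):
        # One failure time per trial, by inverse-transform sampling
        tf = -np.log(1 - rng.random()) / failure_rate
        # Every bin at or after tf sees the component as failed
        first_failed_bin = np.searchsorted(time_axis, tf)
        counter_f[first_failed_bin:] += 1

    return time_axis, 1 - counter_f / n_trials


time_axis, R_hat = estimate_reliability(3e-3, 1000, n_trials=10_000)
R_exact = np.exp(-3e-3 * time_axis)
max_error = np.max(np.abs(R_hat - R_exact))
print(f"max |R_hat - R_exact| = {max_error:.4f}")
```

Because each trial adds +1 to a contiguous tail of bins, the estimated curve can never increase with time, as a reliability curve should.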
Availability — one MC trial
The component alternates between working and failed states.
Each failure time: tf = −ln(1 − r)/λ
Each repair time: tr = −ln(1 − r)/μ
Only the downtime intervals (from failure to repair) get +1 in the counter.
for i in range(N):
    t = 0
    state = 0  # 0 = working, 1 = failed
    while t < mission_time:
        if state == 0:  # Working → sample failure
            t = t - np.log(1 - np.random.rand()) / failure_rate
            state = 1
            lower_bound = np.searchsorted(time_axis, t)
        else:  # Failed → sample repair
            t = t - np.log(1 - np.random.rand()) / repair_rate
            state = 0
            upper_bound = min(np.searchsorted(time_axis, t) - 1, len(time_axis) - 1)
            counter_unavailability[lower_bound:upper_bound + 1] += 1
Green = UP (working) | Red = DOWN (failed, being repaired)
Top row: continuous sampled times tf, tr → Bottom row: discrete bin indices lower_bound, upper_bound
Readouts for the sampled trial: transitions, total downtime (h), fraction down, and steady-state availability A_ss = μ/(λ+μ) ≈ 0.893.
Why only downtime bins?
Availability asks: "Is the component working right now at time t?"
Unlike reliability, the component gets repaired and returns to service. Only the intervals
when it is actually down count against availability.
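The alternating failure/repair trial can be sketched as a self-contained function and checked against the closed-form availability of a two-state model that is working at t = 0. The function name estimate_availability, the trial count and the seed are assumptions made for illustration. The explicit state flag is replaced here by one failure/repair pair per loop pass, which marks the same bins.

```python
import numpy as np


def estimate_availability(failure_rate, repair_rate, mission_time,
                          n_trials, time_step=1, seed=0):
    """Monte Carlo estimate of A(t) for alternating failure/repair cycles."""
    rng = np.random.default_rng(seed)
    time_axis = np.arange(0, mission_time + time_step, time_step)
    counter_unavailability = np.zeros(len(time_axis))
    last_bin = len(time_axis) - 1

    for _ in range(n_trials):
        t = 0.0
        while t < mission_time:
            # Working: sample the next failure time
            t -= np.log(1 - rng.random()) / failure_rate
            if t >= mission_time:
                break  # failure falls outside the mission, nothing to mark
            lower_bound = np.searchsorted(time_axis, t)
            # Failed: sample the repair completion time
            t -= np.log(1 - rng.random()) / repair_rate
            upper_bound = min(np.searchsorted(time_axis, t) - 1, last_bin)
            # Only bins between failure and repair count as down
            counter_unavailability[lower_bound:upper_bound + 1] += 1

    return time_axis, 1 - counter_unavailability / n_trials


lam, mu = 3e-3, 25e-3
time_axis, A_hat = estimate_availability(lam, mu, 1000, n_trials=10_000)
A_exact = mu / (lam + mu) + lam / (lam + mu) * np.exp(-(lam + mu) * time_axis)
max_error = np.max(np.abs(A_hat - A_exact))
print(f"max |A_hat - A_exact| = {max_error:.4f}")
print(f"late-time mean of A_hat = {A_hat[-200:].mean():.4f}")
```

A failure and repair that both fall between two neighbouring grid points give an empty slice, so outages shorter than one time step may not register in any bin.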
MC accumulation over N trials
After running N trials, estimate: R(t) = 1 − counter_f / N and
A(t) = 1 − counter_unavailability / N
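Each trial contributes a 0 or a 1 to every bin, so the count in a bin is binomial and the estimate in that bin has standard error sqrt(p(1 − p)/N), where p = counter / N. A minimal sketch of the final step with that error estimate attached is given below; the helper name estimate_with_standard_error and the toy counter values are made up for illustration.

```python
import numpy as np


def estimate_with_standard_error(counter, n_trials):
    """Turn a per-bin count of 'bad' trials into an estimate and its standard error."""
    p_bad = np.asarray(counter, dtype=float) / n_trials
    estimate = 1 - p_bad
    # Binomial standard error of a proportion estimated from n_trials trials
    std_error = np.sqrt(p_bad * (1 - p_bad) / n_trials)
    return estimate, std_error


# Toy counter for four bins and N = 1000 trials (illustrative values only)
counter_f = np.array([0, 250, 500, 1000])
R_hat, R_se = estimate_with_standard_error(counter_f, n_trials=1000)

print(R_hat)   # estimated R(t) per bin
print(R_se)    # largest where the estimate is near 0.5
```

The same helper applies unchanged to counter_unavailability. Since the error shrinks as 1/sqrt(N), halving it takes four times as many trials.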