What is it?
Reliability is the probability that the service will function as intended, without failure, for a given period of time.
How is it measured?
Reliability is measured using two metrics: Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR).
Mean Time Between Failures
MTBF = (total elapsed time - total downtime) / total number of failures.
You want your mean time between failures to be as high as possible: a high value means failures are rare relative to the time the service spends running, which shows your system is reliable.
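As a rough illustration, the formula above can be written as a small Python function. The function name, the choice of hours as the unit, and the sample figures (a 720-hour month with 6 hours of downtime across 3 failures) are assumptions made up for this sketch, not measurements from a real service.

```python
def mean_time_between_failures(total_elapsed_hours, total_downtime_hours, failure_count):
    """Return MTBF in hours: (elapsed time - downtime) / number of failures."""
    if failure_count <= 0:
        # With no recorded failures the ratio is undefined, so we refuse to guess.
        raise ValueError("failure_count must be at least 1")
    return (total_elapsed_hours - total_downtime_hours) / failure_count


# Hypothetical month: 720 hours elapsed, 6 hours of downtime, 3 failures.
mtbf_hours = mean_time_between_failures(720.0, 6.0, 3)
print(mtbf_hours)  # 238.0
```

With these made-up numbers the service runs for 238 hours on average between failures.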
Mean Time To Repair
MTTR = (total time taken to repair problems in the service) / total number of repairs.
We want to be able to repair and recover from a problem quickly, so we would like this value to be as low as possible.
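The same calculation can be sketched in Python. The function name, the use of minutes, and the three sample repair durations are illustrative assumptions only.

```python
def mean_time_to_repair(repair_durations_minutes):
    """Return MTTR in minutes: total repair time / number of repairs."""
    if not repair_durations_minutes:
        # No repairs recorded, so there is nothing to average.
        raise ValueError("at least one repair duration is required")
    return sum(repair_durations_minutes) / len(repair_durations_minutes)


# Hypothetical incident log: three repairs taking 30, 45 and 15 minutes.
mttr_minutes = mean_time_to_repair([30.0, 45.0, 15.0])
print(mttr_minutes)  # 30.0
```

Here the three repairs took 90 minutes in total, so the average repair took 30 minutes.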
The relationship between availability and reliability
By now you have understood that Availability and Reliability are both important metrics for measuring a service's compliance with its agreed-upon service level objectives.
While availability is driven by loss of service over time, reliability is based on the frequency and impact of failures. Together, the two allow any user or stakeholder to assess the health of the service.
So the more reliable a system is, the less likely it is to be unavailable: the lower the reliability, the lower the availability. Similarly, the higher the availability, the higher the reliability generally is.
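One common way to make this relationship concrete is the steady-state approximation availability = MTBF / (MTBF + MTTR). This formula is not stated in the text above. It is brought in here as an assumption, and it holds only when all downtime is repair time and every failure is followed by exactly one repair. The function name and sample values below are hypothetical.

```python
def availability_from(mtbf_hours, mttr_hours):
    """Approximate availability as MTBF / (MTBF + MTTR).

    Assumes all downtime is repair time and both values use the same unit.
    """
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Same repair time (2 hours), but the second service fails about twice as often.
more_reliable = availability_from(238.0, 2.0)
less_reliable = availability_from(118.0, 2.0)

print(round(more_reliable, 4))  # 0.9917
print(round(less_reliable, 4))  # 0.9833
```

Under these assumptions, halving the time between failures while keeping repair time fixed lowers availability, which matches the statement that lower reliability leads to lower availability.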