What is Latency?

KB ID 0001874

What is Latency?

I hear people use the word ‘Latency’ a lot, mostly without ever really understanding what it is. Unlike its close relations bandwidth and throughput*, which are measurements of data, latency is a measurement of TIME, and in a lot of scenarios it is variable depending on what’s happening.

*Note: Too little bandwidth or throughput can also increase latency.

There will always be latency, because we are bound by the laws of physics: passing a ‘light pulse’ down a fibre optic cable from London to Paris will take less time than passing that same light pulse from London to New York. We call this propagation delay.
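If you want to put some rough numbers on that, the sum is simply distance divided by the speed of the signal. Here’s a quick Python sketch (the distances are approximate straight-line figures, and real fibre paths are longer, so treat the output as illustrative only):

    # A back-of-an-envelope propagation delay calculator.
    # Assumption: light in glass fibre travels at roughly two thirds
    # of its speed in a vacuum.
    SPEED_OF_LIGHT = 299_792_458          # metres per second, in a vacuum
    FIBRE_SPEED = SPEED_OF_LIGHT * 0.67   # approximate speed in optical fibre

    def propagation_delay_ms(distance_km: float) -> float:
        """One-way propagation delay in milliseconds over a fibre run."""
        return (distance_km * 1_000) / FIBRE_SPEED * 1_000

    # Approximate great-circle distances (real fibre paths are longer).
    print(f"London to Paris    ~{propagation_delay_ms(344):.1f} ms one way")
    print(f"London to New York ~{propagation_delay_ms(5_570):.1f} ms one way")

Run that and London to Paris comes out under 2 ms one way, while London to New York is nearer 28 ms – and no amount of bandwidth can change that. Propagation delay is only one of the things that make up latency though: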

  1. Propagation Delay: This is the time it takes for a signal to travel from the sender to the receiver through the physical medium (such as fiber optics or copper cables). The speed of propagation is close to the speed of light but can vary slightly depending on the medium.
  2. Transmission Delay: This is the time required to push all the packet’s bits onto the wire. It is influenced by the size of the packet and the transmission rate of the network (see the sketch after this list).
  3. Processing Delay: This is the time taken by network devices like routers and switches to process the packet header and make forwarding decisions. Processing delays are generally very small but can add up across multiple devices.
  4. Queuing Delay: This occurs when a packet waits in a queue before it can be transmitted. Queuing delays can vary significantly depending on the network congestion and the configuration of the network devices.
  5. Propagation Distance: The physical distance between the source and destination plays a critical role in latency. Longer distances naturally result in higher latency due to the increased time it takes for signals to travel.
  6. Network Congestion: High traffic volumes can cause congestion in the network, leading to increased queuing delays and, consequently, higher overall latency.
  7. Bandwidth and Throughput: Although bandwidth is the maximum rate of data transfer, actual throughput can be lower due to various factors, including network congestion and overheads. Lower throughput can contribute to higher latency.
  8. Protocol Overheads: Different network protocols have various overheads associated with them. For instance, the Transmission Control Protocol (TCP) has higher overhead due to its error-checking and recovery features compared to the User Datagram Protocol (UDP).
  9. Hardware and Software Limitations: The performance of network hardware (like routers, switches, and network interface cards) and software (such as drivers and network stacks) can impact latency. Faster and more efficient hardware and software reduce latency.
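Items 1 and 2 above are simple arithmetic, so they are easy to sanity-check yourself. As a minimal illustration of transmission delay (the frame size and link speeds below are just common examples):

    # Transmission (serialisation) delay: the time to clock every bit of
    # a packet onto the wire - packet size divided by link speed.
    def transmission_delay_ms(packet_bytes: int, link_bps: float) -> float:
        return (packet_bytes * 8) / link_bps * 1_000

    # A standard 1500-byte Ethernet frame on three common link speeds:
    for label, bps in [("10 Mbps", 10e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
        print(f"{label:>7}: {transmission_delay_ms(1500, bps):.4f} ms per frame")

Notice the same frame takes a thousand times longer to serialise onto a 10 Mbps link than onto a 10 Gbps one – which is one of the ways low bandwidth feeds back into latency (see the note above).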

Latency is typically measured in milliseconds (ms) and can be assessed using various tools and techniques, such as ping tests and traceroute commands. Lower latency is especially crucial for applications requiring real-time interaction, such as online gaming, video conferencing, and financial trading systems.
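If you want to measure it yourself and ICMP (ping) is blocked by a firewall somewhere, timing a TCP handshake is a serviceable stand-in, because a TCP connect takes roughly one round trip. A minimal sketch (the hostname and port are placeholders – point it at something you’re allowed to poke):

    import socket
    import time

    def tcp_rtt_ms(host: str, port: int = 443, samples: int = 5) -> float:
        """Average TCP handshake time in ms - a rough proxy for round-trip latency."""
        times = []
        for _ in range(samples):
            start = time.perf_counter()
            with socket.create_connection((host, port), timeout=5):
                times.append((time.perf_counter() - start) * 1_000)
        return sum(times) / len(times)

    print(f"Average round trip: {tcp_rtt_ms('www.example.com'):.1f} ms")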

Minimizing network latency involves optimizing network infrastructure, improving hardware and software efficiency, and ensuring adequate bandwidth and throughput to handle the expected traffic load.

What is Latency and Why is this Important?

Well, the complaint is nearly always “We are experiencing latency issues“, usually when the ‘users’ are having performance issues with ‘something’. Now sometimes the problem IS the network (shock & horror). But all the bandwidth/throughput and low latency in the world will not help you if you have a poorly coded application, or your DNS is not set up correctly.

But it’s not just old and poorly coded applications that require low latency. Some application platforms we take for granted can suffer, for example:

  1. Online Gaming: Real-time multiplayer online games require low latency to ensure smooth gameplay and quick reactions. High latency can result in lag, making the gaming experience frustrating and uncompetitive.
  2. Video Conferencing: Applications like Zoom, Microsoft Teams, and Skype require low latency to facilitate real-time communication. High latency can cause delays, leading to awkward conversations and reduced communication quality.
  3. Voice over IP (VoIP): Services like Skype, WhatsApp, and other internet-based telephony services need low latency to provide clear and immediate voice communication. High latency can cause echo and delays, making conversations difficult.
  4. Financial Trading: Stock trading platforms and high-frequency trading systems rely on low latency to execute trades in milliseconds. Even minor delays can result in significant financial losses or missed trading opportunities.
  5. Telemedicine: Remote medical consultations, surgeries, and other healthcare services often require low latency to ensure accurate diagnostics and timely intervention.
  6. Augmented Reality (AR) and Virtual Reality (VR): AR and VR applications need low latency to provide immersive and responsive experiences. High latency can cause motion sickness and degrade the user experience.
  7. Industrial Automation and Control Systems: Manufacturing processes, robotics, and other industrial applications require low latency for precise control and real-time monitoring to ensure safety and efficiency.
  8. Autonomous Vehicles: Self-driving cars and drones rely on low latency for real-time data processing and decision-making to navigate safely and respond to dynamic environments.
  9. Cloud Gaming: Services like Google Stadia, NVIDIA GeForce Now, and Xbox Cloud Gaming stream games from the cloud to users’ devices. Low latency is critical to provide a responsive gaming experience comparable to playing on a local console or PC.
  10. Smart Grids: Advanced electrical grid systems require low latency for real-time monitoring and control to manage power distribution efficiently and respond to fluctuations in demand and supply.
  11. Remote Desktop Applications: Tools like Remote Desktop Protocol (RDP) and Virtual Network Computing (VNC) require low latency to provide a seamless and responsive experience when accessing and controlling a remote computer.
  12. Live Streaming: Interactive live streaming platforms like Twitch and YouTube Live require low latency to ensure minimal delay between the broadcaster and viewers, enabling real-time interaction through chat and other features.

Ensuring low latency for these applications often involves optimizing network infrastructure, using efficient communication protocols, and sometimes deploying edge computing to process data closer to the source.

Related Articles, References, Credits, or External Links

NA

 

What are IOPS?

KB ID 0001833

My IOPS History

I was on a call this morning where IOPS (Input / Output Operations Per Second) were being discussed. I have a love / hate relationship with IOPS, insofar as they are ONLY any use when you are comparing apples with apples, and more importantly (which is the bit we don’t talk about) when we have defined what an apple is. Because one man’s Golden Delicious is another man’s Bramley cooking apple, (that was deep eh?)

A few years ago, when I was back on the tools, I was installing a storage system for a client (it was a virtual storage array) and we had benchmarked it with some software at 95 thousand IOPS. The vendor that supplied the storage pulled the support for it, so we were left red faced trying to source an alternative. Everything we installed came out with a figure of less than 95 thousand IOPS – as far as the customer was concerned, we had promised him one thing and delivered another.

So What Are IOPS?

Let’s say you want to buy a car. These days, with environmental concerns and the cost of fuel, one of the things you might want to compare is the ‘Miles per Gallon‘ fuel consumption. Let’s say one of your choices has an MPG figure of 96Mpg (154.5Kpg). Well that’s dandy, but I guarantee that figure was tested in an environment that gave the manufacturer the best possible outcome, so unless you are going to drive at 56 miles an hour constantly, with the highest rated fuel, on a rolling road, and never stop or brake, then ACTUAL RESULTS MAY VARY. And who is to say car vendor A used the same tests as car vendor B? And there are also THREE DIFFERENT SIZES for a gallon, and countries that don’t use gallons will convert from litres, which can’t be done without a lot of decimal places.

IOPS suffer from similar problems, e.g. Storage Vendor ‘A’ will say, “we deliver 1.2 million IOPS”, and Vendor ‘B’ will say “we deliver 1.8 million IOPS” – so Vendor B is the better option, right? Well NO, and that’s why you need to know how the figures are derived.

The figure that gets derived relies heavily on the following factors (the sketch after this list shows how much difference just one of them can make).

  • Block size / sector size of the storage.
  • Resiliency/RAID level of the storage.
  • Actual physical storage media (e.g. Spinning disk/nearline/midline/SSD).
  • Actual physical connection fabric (e.g. SAS/Fibre Channel/iSCSI).
  • Size of data written.
  • Size of data read.
  • Sequential or random read/writes, or a blend of the two.
  • Concurrent workload (testing an array with no load is like driving an F1 race car on a closed motorway).
  • Storage QoS – if you’re in a ‘shared’ storage environment your IOPS may be ‘capped’.
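To see how much difference just one of those factors makes, here’s a minimal sketch of the block size problem (the 500 MB/s throughput figure is an arbitrary example, not any real array’s spec). For a fixed throughput, IOPS is simply throughput divided by block size:

    # For a fixed amount of throughput, IOPS = throughput / block size,
    # so the SAME array "delivers" wildly different IOPS figures
    # depending on the block size used for the test.
    THROUGHPUT_MB_PER_SEC = 500   # arbitrary example figure, not a real array

    for block_kb in (4, 8, 64, 256):
        iops = (THROUGHPUT_MB_PER_SEC * 1_024) / block_kb
        print(f"{block_kb:>3} KB blocks: {iops:>9,.0f} IOPS")

So a vendor testing with 4 KB blocks can quote 128,000 IOPS from exactly the same kit that only ‘delivers’ 2,000 IOPS at 256 KB blocks. Same apples, very different numbers.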

What are IOPS: Throughput and Latency

THROUGHPUT is normally used in conjunction with IOPS; throughput is a figure measured in bps (bits per second) or Bps (bytes per second). So if we know this figure AND we have an IOPS figure (that we know how it was derived), then we can make a comparison? Well no, there’s a third thing we didn’t take into consideration – LATENCY, which is the amount of time it takes to get an operation to and from the storage array. Why is that important? Let’s say we have an ‘All SSD’ array with blistering throughput and IOPS figures, but your 10+ year old Solaris 7 servers cannot match that through their 5+ year old HBAs – then your ‘experience’ is going to be bad.

OK, that’s a severe example, but put it in a real world scenario: I work for a service provider, we provide storage. If we say we will warrant X thousand IOPS, and a customer that just consumes storage from us connects their Solaris 7 servers to that storage and says, “we are only getting half that performance”, whose responsibility is it to investigate why? This is why, if you look at the large hyperscalers, when they give you performance info they will give you IOPS (without telling you what those IOPS are!) and they will give you throughput (that they will cap, usually at xMbps). Because latency is not really their problem – search their documentation and they deliberately only use the word latency to say things like ‘Ultra low latency SSD” or that “SSD provides lower latency than HDD“.
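If you want to see how IOPS and latency hang together mathematically, Little’s Law from queueing theory gives the relationship: IOPS = outstanding I/Os ÷ latency. A quick sketch (the queue depth and latency figures are made-up illustrations):

    # Little's Law applied to storage: IOPS = outstanding I/Os / latency.
    def iops(queue_depth: int, latency_ms: float) -> float:
        return queue_depth / (latency_ms / 1_000)

    # The same queue depth at four different latencies - latency drives
    # the IOPS you will actually see, whatever the datasheet claims.
    for latency_ms in (0.5, 1.0, 5.0, 20.0):
        print(f"QD 32 at {latency_ms:>4} ms: {iops(32, latency_ms):>8,.0f} IOPS")

Which is exactly why the old servers with the slow HBAs above never see the array’s headline figure – their end-to-end latency caps the IOPS they can achieve.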

So Why the Ben Affleck Meme?

Because of the three things you need to take into consideration when looking at storage performance, an IOPS figure without any definition means nothing. (And remember this is storage performance, not application performance, because a poorly coded DB application from 1987 can be on the best hardware in the world and still be awful – and your DB consultant will blame the storage or the network, because he can earn several hundred pounds a day while you bust a gut proving otherwise.)

I do like an analogy (as you’ve seen). What are IOPS? IOPS are the digital equivalent of giving 50 teenage boys some ribbon and a Sharpie, telling them all to make a tape measure and find out who is the best endowed, then deciding (without seeing the tape measures) based on who came up with the biggest number.

Related Articles, References, Credits, or External Links

NA

SBS – Alert – ‘The following disk has low idle time’

KB ID 0000583 

Problem

I got this alert forwarded to me from a client that I’d put new hard drives in for, a few weeks ago.

Alert:

The following disk has low idle time, which may cause slow response time when reading or writing files to the disk. Disk: {Number} {Drive Letter}: Review the Disk Transfers/sec and % Idle Time counters for the PhysicalDisk performance object. If the Disk Transfers/sec counter is consistently below 150 while the % Idle Time counter remains very low (close to 0), there may be a problem with the disk driver or hardware. If the review shows that the disk is functioning properly, use Task Manager to determine which processes are causing the majority of the disk activity. You can attempt to correct the problem by stopping and then restarting those processes. You can disable this alert or change its threshold by using the Change Alert Notifications task in the Server Management Monitoring and Reporting taskpad.

Solution

1. It’s telling me to review some counters (Start > Run > perfmon {Enter}). I added the counters that it asked me to, and sure enough this disk was getting thrashed, with a very high disk latency.
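If you’d rather capture the same figure from a script than sit watching Perfmon, here’s a minimal sketch using the third-party psutil Python library (assumes ‘pip install psutil’; note it samples the machine-wide counters, whereas the alert looks at one physical disk):

    import time
    import psutil   # third-party library: pip install psutil

    # Sample the disk I/O counters twice, five seconds apart, and derive
    # transfers/sec - the same figure the alert asks you to review.
    INTERVAL = 5

    before = psutil.disk_io_counters()
    time.sleep(INTERVAL)
    after = psutil.disk_io_counters()

    transfers = ((after.read_count - before.read_count)
                 + (after.write_count - before.write_count))
    per_sec = transfers / INTERVAL
    print(f"Disk transfers/sec: {per_sec:.0f} (alert threshold of interest: 150)")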

2. While discussing it in the office, a colleague suggested I check the BBWC (battery-backed write cache) on the RAID card. Sure enough, a quick look at the System Management Homepage shows:

3. The battery has failed on the internal E200i RAID card. The server in question was an HP ML350 (G5), so my first thought was to update the firmware for the RAID card (if for no other reason than it’s the first thing HP would ask me to do if I logged a call). This did not resolve the problem, so I logged the call for a replacement (the server is under a Care Pack).

4. After fitting, I left it 24 hours for the battery to charge, and checked it again.

Note: Latency has dropped from 1100 to 70.

Related Articles, References, Credits, or External Links

NA