Load balancing (computing): Difference between revisions

Revision as of 09:38, 17 June 2025 by Nyq (talk \| contribs; extended confirmed users; 5,846 edits): m lc common nouns. Tags: Visual edit, Mobile edit, Mobile web edit, Advanced mobile edit
Latest revision as of 12:30, 6 August 2025 by Erel Segal (talk \| contribs; extended confirmed users, IP block exemptions; 14,576 edits): →See also. Tag: Visual edit: Switched
(9 intermediate revisions by 4 users not shown)
Line 50:
For [[shared-memory]] computers, managing write conflicts greatly slows the individual execution speed of each computing unit. However, they can work perfectly well in parallel. Conversely, in the case of message exchange, each of the processors can work at full speed. On the other hand, when it comes to collective message exchange, all processors are forced to wait for the slowest processors to start the communication phase. In reality, few systems fall into exactly one of the categories. In general, the processors each have an internal memory to store the data needed for the next calculations and are organized in successive [[Computer cluster\|clusters]]. Often, these processing elements are then coordinated through [[distributed memory]] and [[message passing]]. Therefore, the load balancing algorithm should be tailored to the specific parallel architecture. Otherwise, there is a risk that the efficiency of parallel [[problem solving]] will be greatly reduced.

====Hierarchy====

Line 65:
===Fault tolerance===
Especially in large-scale [[computing cluster]]s, it is not tolerable to execute a [[parallel algorithm]] that cannot withstand the failure of a single component. Therefore, [[fault tolerant]] algorithms are being developed which can detect outages of processors and recover the computation.<ref>{{cite book \|last1=Punetha Sarmila \|first1=G. \|last2=Gnanambigai \|first2=N. \|last3=Dinadayalan \|first3=P. \|title=2015 2nd International Conference on Electronics and Communication Systems (ICECS) \|chapter=Survey on fault tolerant — Load balancing algorithms in cloud computing \|date=2015 \|pages=1715–1720 \|doi=10.1109/ECS.2015.7124879 \|isbn=978-1-4799-7225-8 \|s2cid=30175022 }}</ref>

==Approaches==

Line 95:
====Others====
Other methods of assignment also exist:
* Less work: ~~Assign~~assign more tasks to the servers that are performing less work{{clarify\|date=November 2022}} (the method can also be weighted).
* Hash: allocates queries according to a [[hash table]].
* Power of ~~Two~~two ~~Choices~~choices: pick two servers at random and choose the better of the two options.<ref>{{cite web \|title=NGINX and the "Power of Two Choices" Load-Balancing Algorithm \|url=https://www.nginx.com/blog/nginx-power-of-two-choices-load-balancing-algorithm/ \|website=nginx.com \|archive-url=https://web.archive.org/web/20191212194243/https://www.nginx.com/blog/nginx-power-of-two-choices-load-balancing-algorithm/ \|archive-date=2019-12-12 \|date=2018-11-12}}</ref><ref>{{cite web \|title=Test Driving "Power of Two Random Choices" Load Balancing \|url=https://www.haproxy.com/blog/power-of-two-load-balancing/ \|website=haproxy.com \|archive-url=https://web.archive.org/web/20190215173140/https://www.haproxy.com/blog/power-of-two-load-balancing/ \|archive-date=2019-02-15 \|date=2019-02-15}}</ref>

===Master-~~Worker~~worker ~~Scheme~~scheme===
[[Master/slave (technology)\|Master-~~Worker~~worker]] schemes are among the simplest dynamic load balancing algorithms. A master distributes the workload to all workers (also sometimes referred to as "slaves"). Initially, all workers are idle and report this to the master. The master answers worker requests and distributes the tasks to them. When it has no more tasks to give, it informs the workers so that they stop asking for tasks.

The advantage of this system is that it distributes the burden very fairly. In fact, if one does not take into account the time needed for the assignment, the execution time would be comparable to the prefix sum seen above.

The problem with this algorithm is that it has difficulty adapting to a large number of processors because of the large amount of communication required. This lack of [[scalability]] makes it quickly inoperable in very large servers or very large parallel computers. The master acts as a [[Bottleneck (software)\|bottleneck]].
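The request-and-reply exchange described above can be sketched with threads and queues. This is a minimal illustration, not a reference implementation: the names `run_master_worker`, `STOP`, and the squaring "task" are assumptions made for the example, and threads stand in for the processors.

```python
import queue
import threading
from collections import deque

STOP = object()  # hypothetical sentinel: "no more tasks, stop asking"


def master(tasks, requests, replies):
    """Answer each idle-worker request with a task, or STOP once none remain."""
    pending = deque(tasks)
    active = len(replies)
    while active:
        worker_id = requests.get()  # a worker reports that it is idle
        if pending:
            replies[worker_id].put(pending.popleft())
        else:
            replies[worker_id].put(STOP)
            active -= 1


def worker(worker_id, requests, reply, results, lock):
    """Repeatedly report idle, receive a task, and process it."""
    while True:
        requests.put(worker_id)  # tell the master this worker is idle
        task = reply.get()
        if task is STOP:
            return
        value = task * task  # placeholder computation (assumption)
        with lock:
            results.append((worker_id, task, value))


def run_master_worker(tasks, n_workers=3):
    requests = queue.Queue()  # single channel into the master: the bottleneck
    replies = [queue.Queue() for _ in range(n_workers)]
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=master, args=(tasks, requests, replies))]
    threads += [
        threading.Thread(target=worker, args=(i, requests, replies[i], results, lock))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for worker_id, task, value in sorted(run_master_worker(range(8)), key=lambda r: r[1]):
        print(f"task {task} -> {value} (worker {worker_id})")
```

Every request passes through the single `requests` queue, which is where the bottleneck appears as the number of workers grows. Which worker handles which task varies from run to run.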
[[File:Master-Worker_and_bottleneck.png\|thumb\|Master-~~Worker~~worker and bottleneck]]
However, the quality of the algorithm can be greatly improved by replacing the master with a task list that can be used by different processors. This algorithm is a little more difficult to implement, but it promises much better scalability, though still insufficient for very large computing centers.

Line 145:
:www.example.org NS two.example.org
However, the [[zone file]] for {{mono\|www.example.org}} on each server is different such that each server resolves its own IP address as the A-record.<ref>{{Cite web\|url=https://www.zytrax.com/books/dns/ch8/a.html\|title=Chapter 8 - IPv4 Address (A) Record\|website=www.zytrax.com}}</ref> On server ''one'' the zone file for {{mono\|www.example.org}} reports:
:@ in a 192.0.2.1

Line 191:
; Priority activation : When the number of available servers drops below a certain number, or the load gets too high, standby servers can be brought online.
; [[TLS acceleration\|TLS ~~Offload~~offload and ~~Acceleration~~acceleration]] : TLS (or its predecessor SSL) acceleration is a technique of offloading cryptographic protocol calculations onto specialized hardware. Depending on the workload, processing the encryption and authentication requirements of a [[Transport Layer Security\|TLS]] request can become a major part of the demand on the web server's CPU; as the demand increases, users will see slower response times, as the TLS overhead is distributed among web servers. To remove this demand on web servers, a balancer can terminate TLS connections, passing HTTPS requests as HTTP requests to the web servers. If the balancer itself is not overloaded, this does not noticeably degrade the performance perceived by end-users. The downside of this approach is that all of the TLS processing is concentrated on a single device (the balancer) which can become a new bottleneck. Some load balancer appliances include specialized hardware to process TLS.
Instead of upgrading the load balancer, which is quite expensive dedicated hardware, it may be cheaper to forgo TLS offload and add a few web servers. Also, some server vendors such as Oracle/Sun now incorporate cryptographic acceleration hardware into their CPUs such as the T2000. F5 Networks incorporates a dedicated TLS acceleration hardware card in their local traffic manager (LTM) which is used for encrypting and decrypting TLS traffic. One clear benefit of TLS offloading in the balancer is that it enables the balancer to do balancing or content switching based on data in the HTTPS request.
; [[Distributed denial of service~~\|Distributed Denial of Service~~]] (DDoS) attack protection : Load balancers can provide features such as [[SYN cookies]] and delayed-binding (the back-end servers don't see the client until it finishes its TCP handshake) to mitigate [[SYN flood]] attacks and generally offload work from the servers to a more efficient platform.
; [[HTTP compression]] : HTTP compression reduces the amount of data to be transferred for HTTP objects by utilising gzip compression available in all modern web browsers. The larger the response and the further away the client is, the more this feature can improve response times. The trade-off is that this feature puts additional CPU demand on the load balancer and could be done by web servers instead.
; [[TCP offload]] : Different vendors use different terms for this, but the idea is that normally each HTTP request from each client is a different TCP connection. This feature utilises HTTP/1.1 to consolidate multiple HTTP requests from multiple clients into a single TCP socket to the back-end servers.
; TCP buffering : The load balancer can buffer responses from the server and spoon-feed the data out to slow clients, allowing the web server to free a thread for other tasks faster than it would if it had to send the entire response to the client directly.
; Direct ~~Server~~server ~~Return~~return : An option for asymmetrical load distribution, where request and reply have different network paths.
; Health checking

Line 284:
Load balancing is often used to implement [[failover]], the continuation of service after the failure of one or more of its components. The components are monitored continually (e.g., web servers may be monitored by fetching known pages), and when one becomes unresponsive, the load balancer is informed and no longer sends traffic to it. When a component comes back online, the load balancer starts rerouting traffic to it. For this to work, there must be at least one component in excess of the service's capacity ([[N+1 redundancy]]). This can be much less expensive and more flexible than failover approaches where every single live component is paired with a single backup component that takes over in the event of a failure ([[dual modular redundancy]]). Some [[RAID]] systems can also use a [[hot spare]] for a similar effect.<ref name="IBM">{{cite web \|url=https://www.ibm.com/support/knowledgecenter/en/SSVJJU_6.4.0/com.ibm.IBMDS.doc_6.4/ds_ag_srv_adm_dd_failover_load_balancing.html \|title=Failover and load balancing \|website=IBM \|accessdate=6 January 2019}}</ref>

This technique can increase [[fault tolerance]] by enabling quick substitutions for the most complicated, most failure-prone parts of a system. However, it can make the load balancer itself a [[single point of failure]].
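The failover behaviour described above (stop sending traffic to an unresponsive component, resume once it is back) can be sketched as a round-robin selector that consults a health check before each choice. The class name `FailoverBalancer` and the `is_healthy` callable are hypothetical; a real balancer would typically probe back ends periodically, for example by fetching a known page, rather than checking on every request.

```python
import itertools


class FailoverBalancer:
    """Round-robin over back ends, skipping any that fail the health check."""

    def __init__(self, backends, is_healthy):
        self.backends = list(backends)
        self.is_healthy = is_healthy  # assumed callable: backend -> bool
        self._cursor = itertools.cycle(range(len(self.backends)))

    def pick(self):
        # Try each back end at most once per call, starting at the cursor.
        for _ in range(len(self.backends)):
            backend = self.backends[next(self._cursor)]
            if self.is_healthy(backend):
                return backend
        raise RuntimeError("no healthy backend available")


if __name__ == "__main__":
    down = set()  # stand-in for the result of real health probes
    lb = FailoverBalancer(["a", "b", "c"], lambda b: b not in down)
    print([lb.pick() for _ in range(3)])
    down.add("b")      # "b" becomes unresponsive: traffic is no longer sent to it
    print([lb.pick() for _ in range(4)])
    down.discard("b")  # "b" comes back online and receives traffic again
    print([lb.pick() for _ in range(3)])
```

With three back ends and one down, the remaining two absorb all requests, which only works if they have spare capacity, as in the N+1 arrangement mentioned above.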
=== Data ~~Ingestion~~ingestion for AI ~~Model~~model ~~Training~~training ===
Increasingly, load balancing techniques are being used to manage high-volume data ingestion pipelines that feed [[artificial intelligence]] [[AI training\|training]] and [[inference]] systems, sometimes referred to as “[[AI Factory\|AI factories]].” These AI-driven environments require continuous processing of vast amounts of structured and unstructured data, placing heavy demands on networking, storage, and computational resources.<ref>{{Cite web \|title=Optimize Traffic Management for AI Factory Data Ingest \|url=https://www.f5.com/company/blog/ai-factory-traffic-management-data-ingest \|access-date=2025-01-30 \|website=F5, Inc. \|language=en-US}}</ref> To maintain the necessary high throughput and low latency, organizations commonly deploy load balancing tools capable of advanced TCP optimizations, connection pooling, and adaptive scheduling. Such features help distribute incoming data requests evenly across servers or nodes, prevent congestion, and ensure that compute resources remain efficiently utilized.<ref>{{Cite web \|title=Optimize, Scale, and Secure AI Interactions \|url=https://www.f5.com/solutions/use-cases/optimize-scale-and-secure-ai \|access-date=2025-01-30 \|website=F5, Inc. \|language=en-US}}</ref>

Line 291 ⟶ 293:
==See also==
~~{{Div col\|colwidth=25em}}~~
* [[Affinity mask]]
* [[Application ~~Delivery~~delivery ~~Controller~~controller]]
* [[Autoscaling]]
* [[Cloud computing]]

Line 300 ⟶ 301:
* [[Edge computing]]
* [[InterPlanetary File System]]
* [[Network ~~Load~~load ~~Balancing~~balancing]]
* [[Optimal job scheduling]] - the computational problem of finding an optimally balanced schedule.
* [[SRV record]]
~~{{div col end}}~~

==References==