{{short description|Set of techniques to improve the distribution of workloads across multiple computing resources}}
[[File:Elasticsearch Cluster August 2014.png|thumb|Diagram illustrating user requests to an [[Elasticsearch]] cluster being distributed by a load balancer]]
In [[computing]], '''load balancing''' is the process of distributing a set of [[Task (computing)|tasks]] over a set of [[System resource|resources]] (computing units), with the aim of making their overall processing more efficient. Load balancing can optimize response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.
Load balancing is the subject of research in the field of [[parallel computers]]. Two main approaches exist: static algorithms, which do not take into account the state of the different machines, and dynamic algorithms, which are usually more general and more efficient but require exchanges of information between the different computing units, at the risk of a loss of efficiency.
====Size of tasks====
Perfect knowledge of the [[execution time]] of each of the tasks makes it possible to reach an optimal load distribution (see the [[prefix sum]] algorithm).<ref name="Sequential and parallel algorithms">{{cite book |last1=Sanders |first1=Peter |last2=Mehlhorn |first2=Kurt |last3=Dietzfelbinger |first3=Martin |last4=Dementiev |first4=Roman |title=Sequential and parallel algorithms and data structures : the basic toolbox |date=11 September 2019 |publisher=Springer |isbn=978-3-030-25208-3}}</ref> Unfortunately, this is an idealized case: knowing the exact [[execution time]] of each task in advance is extremely rare.
For this reason, there are several techniques to get an idea of the different execution times. First of all, in the fortunate scenario of having tasks of relatively homogeneous size, it is possible to consider that each of them will require approximately the average execution time. If, on the other hand, the execution time is very irregular, more sophisticated techniques must be used. One technique is to add some [[metadata]] to each task. Depending on the previous execution time for similar metadata, it is possible to make inferences for a future task based on statistics.<ref>{{cite journal |last1=Liu |first1=Qi |last2=Cai |first2=Weidong |last3=Jin |first3=Dandan |last4=Shen |first4=Jian |last5=Fu |first5=Zhangjie |last6=Liu |first6=Xiaodong |last7=Linge |first7=Nigel |title=Estimation Accuracy on Execution Time of Run-Time Tasks in a Heterogeneous Distributed Environment |journal=Sensors |date=30 August 2016 |volume=16 |issue=9 |pages=1386 |doi=10.3390/s16091386|pmid=27589753 |pmc=5038664 |bibcode=2016Senso..16.1386L |s2cid=391429 |doi-access=free }}</ref>
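The metadata-based estimation described above can be sketched in Python. This is an illustrative sketch, not code from any cited source; the class and method names (''RuntimeEstimator'', ''record'', ''estimate'') are hypothetical:

```python
from collections import defaultdict

class RuntimeEstimator:
    """Estimate a task's execution time as the mean of past runs
    that share the same metadata key (e.g. task type)."""

    def __init__(self, default=1.0):
        self.default = default           # fallback when no history exists
        self.history = defaultdict(list)

    def record(self, metadata, runtime):
        # Store the observed runtime under the task's metadata key.
        self.history[metadata].append(runtime)

    def estimate(self, metadata):
        runs = self.history[metadata]
        return sum(runs) / len(runs) if runs else self.default

est = RuntimeEstimator()
est.record("resize-image", 2.0)
est.record("resize-image", 4.0)
print(est.estimate("resize-image"))  # mean of past runs: 3.0
print(est.estimate("transcode"))     # no history yet: falls back to 1.0
```

In practice one would bucket metadata (input size, task type) and use a richer statistic than the mean, but the principle is the same: infer a future task's cost from past tasks with similar metadata.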
====Segregation of tasks====
Another feature of the tasks critical for the design of a load balancing algorithm is their ability to be broken down into subtasks during execution. The "tree-shaped computation" algorithm presented later takes great advantage of this specificity.
===Static and dynamic algorithms===
A load balancing algorithm is "static" when it does not take into account the state of the system for the distribution of tasks. Here, the system state includes measures such as the [[Load (computing)|load level]] (and sometimes even overload) of certain processors. Instead, assumptions about the overall system are made beforehand, such as the arrival times and resource requirements of incoming tasks. In addition, the number of processors, their respective power, and communication speeds are known. Therefore, static load balancing aims to associate a known set of tasks with the available processors in order to minimize a certain performance function. The crux lies in the design of this performance function.
Static load balancing techniques are commonly centralized around a router, or [[Master/slave (technology)|master]], which distributes the loads and optimizes the performance function.
The advantage of static algorithms is that they are easy to set up and extremely efficient in the case of fairly regular tasks (such as processing [[HTTP]] requests from a website). However, there is still some statistical variance in the assignment of tasks which can lead to the overloading of some computing units.
====Dynamic====
Unlike static load distribution algorithms, dynamic algorithms take into account the current load of each of the computing units (also called nodes) in the system. In this approach, tasks can be moved dynamically from an overloaded node to an underloaded node in order to receive faster processing. While these algorithms are much more complicated to design, they can produce excellent results, in particular, when the execution time varies greatly from one task to another.
Dynamic load balancing architecture can be more [[Modular design|modular]] since it is not mandatory to have a specific node dedicated to the distribution of work. When tasks are uniquely assigned to a processor according to their state at a given moment, it is a unique assignment. If, on the other hand, the tasks can be permanently redistributed according to the state of the system and its evolution, this is called dynamic assignment.<ref>{{cite journal |last1=Alakeel |first1=Ali |title=A Guide to Dynamic Load Balancing in Distributed Computer Systems |journal=International Journal of Computer Science and Network Security}}</ref>
===Hardware architecture===
====Heterogeneous machines====
[[Parallel computing]] infrastructures are often composed of units of different [[computing power]], which should be taken into account for the load distribution.
For example, lower-powered units may receive requests that require a smaller amount of computation, or, in the case of homogeneous or unknown request sizes, receive fewer requests than larger units.
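A minimal sketch of this weighted distribution in Python, assuming each unit's relative computing power is expressed as an integer capacity (the function name ''weighted_assign'' is hypothetical):

```python
import itertools

def weighted_assign(tasks, capacities):
    """Statically assign tasks to units in proportion to each unit's
    relative computing power (a weighted round-robin sketch)."""
    # Build a cyclic schedule in which unit i appears capacities[i] times.
    schedule = [i for i, c in enumerate(capacities) for _ in range(c)]
    assignment = {i: [] for i in range(len(capacities))}
    for task, unit in zip(tasks, itertools.cycle(schedule)):
        assignment[unit].append(task)
    return assignment

# Unit 0 is three times as powerful as unit 1, so it receives
# three quarters of the requests.
result = weighted_assign(list(range(8)), capacities=[3, 1])
print(len(result[0]), len(result[1]))  # → 6 2
```

Real schedulers interleave the schedule more evenly (e.g. smooth weighted round-robin), but the proportionality idea is the same.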
For [[shared-memory]] computers, managing write conflicts greatly slows down the speed of individual execution of each computing unit. However, they can work perfectly well in parallel. Conversely, in the case of message exchange, each of the processors can work at full speed. On the other hand, when it comes to collective message exchange, all processors are forced to wait for the slowest processors to start the communication phase.
In reality, few systems fall into exactly one of the categories. In general, the processors each have an internal memory to store the data needed for the next calculations and are organized in successive [[Computer cluster|clusters]]. Often, these processing elements are then coordinated through [[distributed memory]] and [[message passing]]. Therefore, the load balancing algorithm should be uniquely adapted to a parallel architecture. Otherwise, there is a risk that the efficiency of parallel [[problem solving]] will be greatly reduced.
====Hierarchy====
Adapting to the hardware structures seen above, there are two main categories of load balancing algorithms. On the one hand, there are those where tasks are assigned by a "master" and executed by "workers" who keep the master informed of the progress of their work; the master can then take charge of assigning or reassigning the workload in the case of a dynamic algorithm. The literature refers to this as [[Master/slave (technology)|master-worker]] architecture. On the other hand, the control can be distributed between the different nodes: the load balancing algorithm is then executed on each of them, and the responsibility for assigning tasks (as well as re-assigning and splitting as appropriate) is shared.
Since the design of each load balancing algorithm is unique, the previous distinction must be qualified. Thus, it is also possible to have an intermediate strategy, with, for example, "master" nodes for each sub-cluster, which are themselves subject to a global "master". There are also multi-level organizations, with an alternation between master-slave and distributed control strategies. The latter strategies quickly become complex and are rarely encountered. Designers prefer algorithms that are easier to control.
===Fault tolerance===
Especially in large-scale [[computing cluster]]s, it is not tolerable to execute a [[parallel algorithm]] that cannot withstand the failure of one single component. Therefore, [[fault tolerant]] algorithms are being developed which can detect outages of processors and recover the computation.<ref>{{cite book |last1=Punetha Sarmila |first1=G. |last2=Gnanambigai |first2=N. |last3=Dinadayalan |first3=P. |title=2015 2nd International Conference on Electronics and Communication Systems (ICECS) |chapter=Survey on fault tolerant — Load balancing algorithms in cloud computing |date=2015 |pages=1715–1720 |doi=10.1109/ECS.2015.7124879 |isbn=978-1-4799-7225-8 |s2cid=30175022 }}</ref>
==Approaches==
If the tasks are independent of each other, if their respective execution times are known in advance, and if the tasks can be subdivided, there is a simple and optimal algorithm.
[[File:Load_Balancing_divisible_tasks.png|thumb|370px|Load balancing algorithm depending on divisibility of tasks]]
By dividing the tasks in such a way as to give the same amount of computation to each processor, all that remains to be done is to group the results together. Using a [[prefix sum]] algorithm, this division can be calculated in [[logarithmic time]] with respect to the number of processors.{{fact|date=October 2022}}
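The prefix-sum division can be sketched in Python. This illustrative version (the name ''split_by_prefix_sum'' is hypothetical) cuts the task sequence at the whole-task boundaries nearest to each processor's equal share; with truly divisible tasks the cuts could fall inside a task:

```python
from bisect import bisect_left
from itertools import accumulate

def split_by_prefix_sum(costs, p):
    """Split a sequence of tasks into p contiguous chunks of (nearly)
    equal total cost, using a prefix sum of the task costs."""
    prefix = list(accumulate(costs))   # prefix[i] = total cost of tasks 0..i
    total = prefix[-1]
    # Find, for each processor boundary k/p, the first task whose
    # prefix sum reaches that share of the total work.
    cuts = [bisect_left(prefix, total * k / p) for k in range(1, p)]
    chunks, start = [], 0
    for cut in cuts + [len(costs) - 1]:
        chunks.append(list(range(start, cut + 1)))
        start = cut + 1
    return chunks

# Two processors, six tasks: each chunk carries a total cost of 4.
print(split_by_prefix_sum([1, 1, 1, 1, 2, 2], p=2))  # → [[0, 1, 2, 3], [4, 5]]
```

The prefix sums themselves can be computed in parallel in logarithmic time, which is where the overall logarithmic bound comes from.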
If, however, the tasks cannot be subdivided (i.e., they are [[Linearizability|atomic]]), although optimizing task assignment is a difficult problem, it is still possible to approximate a relatively fair distribution of tasks, provided that the size of each of them is much smaller than the total computation performed by each of the nodes.<ref name="Sequential and parallel algorithms"/>
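For atomic tasks, a classic approximation of a fair distribution is the greedy longest-processing-time-first (LPT) rule: repeatedly give the largest remaining task to the least-loaded node. A minimal sketch (the name ''greedy_lpt'' is hypothetical; this is one well-known heuristic, not the only one):

```python
import heapq

def greedy_lpt(task_sizes, p):
    """Assign atomic (indivisible) tasks to p identical nodes by always
    giving the next-largest task to the currently least-loaded node."""
    loads = [(0, node) for node in range(p)]   # (current load, node id)
    heapq.heapify(loads)
    assignment = {node: [] for node in range(p)}
    for size in sorted(task_sizes, reverse=True):
        load, node = heapq.heappop(loads)      # least-loaded node
        assignment[node].append(size)
        heapq.heappush(loads, (load + size, node))
    return assignment

result = greedy_lpt([7, 5, 4, 3, 1], p=2)
print(sorted(sum(v) for v in result.values()))  # → [10, 10]
```

As the text notes, the approximation is good precisely when each task is small relative to a node's total work, so no single assignment decision can unbalance the result by much.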
====Others====
Of course, there are other methods of assignment as well:
* Least work: assigns the next task to the server that has performed the least work so far.
* Hash: allocates queries according to a [[hash table]].
* Power of two choices: pick two servers at random and choose the better of the two options.
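The "power of two choices" method is simple to sketch. The illustrative simulation below (function name ''power_of_two_choices'' is hypothetical) shows why it works: sampling just two servers and picking the less loaded one keeps the load spread small:

```python
import random

def power_of_two_choices(loads, rng):
    """Sample two distinct servers and return the index of the less
    loaded one (the 'power of two choices' method)."""
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b

rng = random.Random(42)
loads = [0] * 8                       # 8 servers, initially idle
for _ in range(800):                  # 800 incoming requests
    loads[power_of_two_choices(loads, rng)] += 1
print(max(loads) - min(loads))        # imbalance stays very small
```

Purely random assignment would leave a noticeably larger gap between the most and least loaded servers; comparing just two random candidates reduces that gap dramatically.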
===Master-worker scheme===
[[Master/slave (technology)|Master-worker]] schemes are among the simplest dynamic load balancing algorithms. A master distributes the workload to all workers (also sometimes referred to as "slaves"). Initially, all workers are idle and report this to the master. The master answers worker requests and distributes the tasks to them. When it has no more tasks to give, it informs the workers so that they stop asking for tasks.
The advantage of this system is that it distributes the burden very fairly. In fact, if one does not take into account the time needed for the assignment, the execution time would be comparable to the prefix sum seen above.
The problem with this algorithm is that it has difficulty adapting to a large number of processors because of the high amount of necessary communications. This lack of [[scalability]] makes it quickly inoperable in very large servers or very large parallel computers. The master acts as a [[Bottleneck (software)|bottleneck]].
[[File:Master-Worker_and_bottleneck.png|thumb|Master-worker architecture and its bottleneck]]
However, the quality of the algorithm can be greatly improved by replacing the master with a task list that can be used by different processors. Although this algorithm is a little more difficult to implement, it promises much better scalability, although still insufficient for very large computing centers.
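The shared-task-list variant can be sketched with a thread-safe queue. This is an illustrative sketch, assuming squaring a number stands in for real work; in a distributed setting the queue would be a networked service rather than an in-process object:

```python
import queue
import threading

def worker(tasks, results):
    """Each worker pulls from the shared task list until it is empty,
    removing the need for a central master to hand out work."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return                      # no work left: worker stops itself
        results.put(task * task)        # stand-in for real computation

tasks, results = queue.Queue(), queue.Queue()
for n in range(10):
    tasks.put(n)

threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.queue))  # all ten tasks processed exactly once
```

Because workers pull tasks instead of waiting for a master to push them, no single node mediates every assignment, which is what improves scalability over the plain master-worker scheme.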
===Internet-based services===
{{more citations needed section|date=December 2010}}
One of the most commonly used applications of load balancing is to provide a single Internet service from multiple [[Server (computing)|server]]s, sometimes known as a [[server farm]]. Commonly load-balanced systems include popular [[web site]]s, large [[Internet Relay Chat]] networks, high-bandwidth [[File Transfer Protocol]] (FTP) sites, [[Network News Transfer Protocol]] (NNTP) servers, [[Domain Name System]] (DNS) servers, and databases.
Another more effective technique for load-balancing using DNS is to delegate {{mono|www.example.org}} as a sub-___domain whose zone is served by each of the same servers that are serving the website. This technique works particularly well where individual servers are spread geographically on the Internet. For example:
:one.example.org A 192.0.2.1
:two.example.org A 203.0.113.2
:www.example.org NS one.example.org
:www.example.org NS two.example.org
However, the [[zone file]] for {{mono|www.example.org}} on each server is different such that each server resolves its own IP address as the A-record. On server ''one'' the zone file for {{mono|www.example.org}} reports:
:@ in a 192.0.2.1
On server ''two'' the same zone file contains:
:@ in a 203.0.113.2
This way, when a server is down, its DNS will not respond and the web service does not receive any traffic. If the line to one server is congested, the unreliability of DNS ensures less HTTP traffic reaches that server. Furthermore, the quickest DNS response to the resolver is nearly always the one from the network's closest server, ensuring geo-sensitive load-balancing {{Citation needed|date=November 2014}}. A short [[Time to live|TTL]] on the A-record helps to ensure traffic is quickly diverted when a server goes down. Consideration must be given to the possibility that this technique may cause individual clients to switch between individual servers in mid-session.
====Client-side random load balancing====
Another approach to load balancing is to deliver a list of server IPs to the client, and then to have the client randomly select the IP from the list on each connection.
With this approach, the method of delivery of a list of IPs to the client can vary and may be implemented as a DNS list (delivered to all the clients without any round-robin), or via hardcoding it to the list. If a "smart client" is used, detecting that a randomly selected server is down and connecting randomly again, it also provides [[fault tolerance]].
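A "smart client" of this kind can be sketched as follows. This is an illustrative sketch, not code from any real library: the function ''connect'' and the health predicate ''is_up'' are hypothetical, and the addresses come from the documentation range used earlier in the section:

```python
import random

def connect(server_ips, is_up, rng=random):
    """Client-side random load balancing with fault tolerance: pick a
    random IP from the delivered list, and retry with another random
    pick if the chosen server turns out to be down."""
    candidates = list(server_ips)
    while candidates:
        ip = rng.choice(candidates)
        if is_up(ip):
            return ip
        candidates.remove(ip)          # don't retry a known-dead server
    raise ConnectionError("no server reachable")

ips = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
up = lambda ip: ip != "192.0.2.2"      # simulate one failed server
chosen = connect(ips, up, rng=random.Random(0))
print(chosen in ips)  # → True
```

With many independent clients choosing uniformly at random, the aggregate traffic spreads close to evenly across the servers, and the retry step supplies the fault tolerance the text describes.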
=====Scheduling algorithms=====
Numerous [[scheduling algorithm]]s, also called load-balancing methods, are used by load balancers to determine which back-end server to send a request to.
Simple algorithms include random choice, [[Round-robin scheduling|round robin]], or least connections.<ref name=":0">{{Cite web|url=https://f5.com/resources/white-papers/load-balancing-101-nuts-and-bolts|archive-url=https://web.archive.org/web/20171205223948/https://f5.com/resources/white-papers/load-balancing-101-nuts-and-bolts|url-status=dead|archive-date=2017-12-05|title=Load Balancing 101: Nuts and Bolts|date=2017-12-05|publisher=[[F5, Inc.]]}}</ref>
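The two simplest methods can be sketched in a few lines each. This is illustrative only; the names ''RoundRobin'' and ''least_connections'' are hypothetical, and a real balancer would track connection counts as requests open and close:

```python
import itertools

class RoundRobin:
    """Cycle through the back-end servers in a fixed order."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

def least_connections(active):
    """Pick the server currently holding the fewest open connections.

    `active` maps server name -> current connection count."""
    return min(active, key=active.get)

rr = RoundRobin(["a", "b", "c"])
print([rr.pick() for _ in range(4)])                # → ['a', 'b', 'c', 'a']
print(least_connections({"a": 3, "b": 1, "c": 2}))  # → b
```

Round robin ignores server state entirely (a static method), while least connections uses a cheap proxy for current load (a simple dynamic method).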
=====Persistence=====
; Priority activation
: When the number of available servers drops below a certain number, or the load gets too high, standby servers can be brought online.
; [[TLS acceleration|TLS offload and acceleration]]
: TLS (or its predecessor SSL) acceleration is a technique of offloading cryptographic protocol calculations onto specialized hardware. Depending on the workload, processing the encryption and authentication requirements of a [[Transport Layer Security|TLS]] request can become a major part of the demand on the Web Server's CPU; as the demand increases, users will see slower response times, as the TLS overhead is distributed among Web servers. To remove this demand on Web servers, a balancer can terminate TLS connections, passing HTTPS requests as HTTP requests to the Web servers. If the balancer itself is not overloaded, this does not noticeably degrade the performance perceived by end-users. The downside of this approach is that all of the TLS processing is concentrated on a single device (the balancer) which can become a new bottleneck. Some load balancer appliances include specialized hardware to process TLS. Instead of upgrading the load balancer, which is quite expensive dedicated hardware, it may be cheaper to forgo TLS offload and add a few web servers. Also, some server vendors such as Oracle/Sun now incorporate cryptographic acceleration hardware into their CPUs such as the T2000. F5 Networks incorporates a dedicated TLS acceleration hardware card in their local traffic manager (LTM) which is used for encrypting and decrypting TLS traffic. One clear benefit to TLS offloading in the balancer is that it enables it to do balancing or content switching based on data in the HTTPS request.
; [[Distributed denial of service]] (DDoS) attack protection
: Load balancers can provide features such as [[SYN cookies]] and delayed-binding (the back-end servers don't see the client until it finishes its TCP handshake) to mitigate [[SYN flood]] attacks and generally offload work from the servers to a more efficient platform.
; [[HTTP compression]]
: HTTP compression reduces the amount of data to be transferred for HTTP objects by utilising gzip compression available in all modern web browsers.
; [[TCP offload]]
: Different vendors use different terms for this, but the idea is that normally each HTTP request from each client is a different TCP connection. This feature utilises HTTP/1.1 to consolidate multiple HTTP requests from multiple clients into a single TCP socket to the back-end servers.
; TCP buffering
: The load balancer can buffer responses from the server and spoon-feed the data out to slow clients, allowing the web server to free a thread for other tasks faster than it would if it had to send the entire response to the client directly.
; Direct Server Return
: An option for asymmetrical load distribution, where request and reply have different network paths.
; Health checking
: The balancer polls servers for application-layer health and removes failed servers from the pool.
====Shortest Path Bridging====
[[Shortest path bridging|Shortest Path Bridging]] (SPB) allows all links to be active through multiple equal-cost paths, provides faster convergence times to reduce downtime, and simplifies the use of load balancing in [[mesh network]] topologies by allowing traffic to load share across all paths of a network.
The IEEE approved the [[IEEE 802.1aq]] standard in May 2012, also known as Shortest Path Bridging (SPB).
Many telecommunications companies have multiple routes through their networks or to external networks. They use sophisticated load balancing to shift traffic from one path to another to avoid [[network congestion]] on any particular link, and sometimes to minimize the cost of transit across external networks or improve [[Reliability (computer networking)|network reliability]].
Another way of using load balancing is in [[network monitoring]] activities. Load balancers can be used to split huge data flows into several sub-flows and use several network analyzers, each reading a part of the original data. This is very useful for monitoring fast networks like [[10 Gigabit Ethernet|10GbE]] or STM64, where complex processing of the data may not be possible at [[wire speed]].
===Data center networks===
Load balancing is widely used in [[data center]] networks to distribute traffic across many existing paths between any two servers.<ref name=architecture/>
Static load balancing distributes traffic by computing a hash of the source and destination addresses and port numbers of traffic flows and using it to determine how flows are assigned to one of the existing paths. Dynamic load balancing assigns traffic flows to paths by monitoring bandwidth use on different paths. Dynamic assignments can also be proactive or reactive. In the former case, the assignment is fixed once made, while in the latter the network logic keeps monitoring available paths and shifts flows across them as network utilization changes (with arrival of new flows or completion of existing ones). A comprehensive overview of load balancing in datacenter networks has been made available.<ref name=architecture/>
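The static flow-hashing scheme described above can be sketched as follows. This is an illustrative sketch of the general idea (ECMP-style hashing of a flow's five-tuple); the function ''pick_path'' is hypothetical, and real switches use hardware hash functions rather than SHA-256:

```python
import hashlib

def pick_path(src_ip, dst_ip, src_port, dst_port, proto, n_paths):
    """Map a traffic flow to one of n_paths by hashing its addresses and
    ports, so every packet of the same flow follows the same path."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths

path = pick_path("10.0.0.1", "10.0.0.2", 40000, 443, "tcp", n_paths=4)
# The mapping is deterministic: repeating the lookup for the same flow
# always yields the same path, which preserves packet ordering.
print(0 <= path < 4)  # → True
```

The deterministic mapping is what makes the scheme static: it never reacts to how busy a path currently is, which is exactly the limitation dynamic schemes address by monitoring bandwidth use.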
===Failovers===
Load balancing is often used to implement [[failover]]—the continuation of service after the failure of one or more of its components. The components are monitored continually (e.g., web servers may be monitored by fetching known pages), and when one becomes unresponsive, the load balancer is informed and no longer sends traffic to it. When a component comes back online, the load balancer starts rerouting traffic to it. For this to work, there must be at least one component in excess of the service's capacity ([[N+1 redundancy]]). This can be much less expensive and more flexible than failover approaches where every single live component is paired with a single backup component that takes over in the event of a failure ([[dual modular redundancy]]). Some [[RAID]] systems can also utilize [[hot spare]] for a similar effect.
This technique can increase [[fault tolerance]] by enabling quick substitutions for the most complicated, most failure-prone parts of a system. However, it can make the load balancer itself a [[single point of failure]].
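The monitoring-and-rerouting loop can be sketched minimally. This is an illustrative sketch, assuming a boolean ''health_check'' probe per backend and the simplest possible routing policy; the names ''route'' and the backend labels are hypothetical:

```python
def route(backends, health_check):
    """Failover sketch: probe each backend and send traffic only to
    responsive ones. With N+1 redundancy at least one healthy backend
    should always remain."""
    healthy = [b for b in backends if health_check(b)]
    if not healthy:
        raise RuntimeError("all backends down")
    return healthy[0]                 # simplest policy: first healthy one

backends = ["app1", "app2", "app3"]
check = lambda b: b != "app1"         # simulate app1 failing its probe
print(route(backends, check))  # → app2
```

When ''app1'' recovers and its probe succeeds again, the same loop naturally starts routing traffic back to it, which is the "comes back online" behavior described above.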
=== Data ingestion for AI model training ===
Increasingly, load balancing techniques are being used to manage high-volume data ingestion pipelines that feed [[artificial intelligence]] [[AI training|training]] and [[inference]] systems—sometimes referred to as “[[AI Factory|AI factories]].” These AI-driven environments require continuous processing of vast amounts of structured and unstructured data, placing heavy demands on networking, storage, and computational resources.<ref>{{Cite web |title=Optimize Traffic Management for AI Factory Data Ingest |url=https://www.f5.com/company/blog/ai-factory-traffic-management-data-ingest |access-date=2025-01-30 |website=F5, Inc. |language=en-US}}</ref> To maintain the necessary high throughput and low latency, organizations commonly deploy load balancing tools capable of advanced TCP optimizations, connection pooling, and adaptive scheduling. Such features help distribute incoming data requests evenly across servers or nodes, prevent congestion, and ensure that compute resources remain efficiently utilized.<ref>{{Cite web |title=Optimize, Scale, and Secure AI Interactions |url=https://www.f5.com/solutions/use-cases/optimize-scale-and-secure-ai |access-date=2025-01-30 |website=F5, Inc. |language=en-US}}</ref>
When deployed in large-scale or high-performance AI environments, load balancers also mitigate bandwidth constraints and accommodate varying data governance requirements—particularly when sensitive training data cannot be sent to third-party cloud services. By routing data locally (on-premises) or across private clouds, load balancers allow AI workflows to avoid public-cloud bandwidth limits, reduce transit costs, and maintain compliance with regulatory standards. As AI models expand in size (often measured by billions or even trillions of parameters), load balancing for data ingestion has grown in importance for maintaining the reliability, scalability, and cost efficiency of AI factories.
==See also==
* [[Affinity mask]]
* [[Application delivery controller]]
* [[Autoscaling]]
* [[Cloud computing]]
* [[Edge computing]]
* [[InterPlanetary File System]]
* [[Network load balancing]]
* [[Optimal job scheduling]] – the computational problem of finding an optimally balanced schedule
* [[SRV record]]
==References==
==External links==
{{commons category|Load balancing (computing)}}
{{Authority control}}