Load balancing (computing): Difference between revisions

[[File:Elasticsearch Cluster August 2014.png|thumb|Diagram illustrating user requests to an [[Elasticsearch]] cluster being distributed by a load balancer. (Example for [[Wikipedia]].)]]
 
In [[computing]], '''load balancing''' is the process of distributing a set of [[Task (computing)|tasks]] over a set of [[System resource|resources]] (computing units), with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.
 
Load balancing is the subject of research in the field of [[parallel computers]]. Two main approaches exist: static algorithms, which do not take into account the state of the different machines, and dynamic algorithms, which are usually more general and more efficient but require exchanges of information between the different computing units, at the risk of a loss of efficiency.
 
====Segregation of tasks====
Another feature of the tasks critical for the design of a load balancing algorithm is their ability to be broken down into subtasks during execution. The ''tree-shaped computation'' algorithm presented later takes great advantage of this specificity.
 
===Static and dynamic algorithms===
A load balancing algorithm is "static" when it does not take into account the state of the system for the distribution of tasks. Here, the system state includes measures such as the [[Load (computing)|load level]] (and sometimes even overload) of certain processors. Instead, assumptions about the overall system are made beforehand, such as the arrival times and resource requirements of incoming tasks. In addition, the number of processors, their respective power and communication speeds are known. Therefore, static load balancing aims to associate a known set of tasks with the available processors in order to minimize a certain performance function. The trick lies in the design of this performance function.
 
Static load balancing techniques are commonly centralized around a router, or [[Master/slave (technology)|master]], which distributes the loads and optimizes the performance function. This minimization can take into account information related to the tasks to be distributed, and derive an expected execution time.
 
The advantage of static algorithms is that they are easy to set up and extremely efficient in the case of fairly regular tasks (such as processing [[HTTP]] requests from a website). However, there is still some statistical variance in the assignment of tasks which can lead to the overloading of some computing units.
 
====Dynamic====
Unlike static load distribution algorithms, dynamic algorithms take into account the current load of each of the computing units (also called nodes) in the system. In this approach, tasks can be moved dynamically from an overloaded node to an underloaded node in order to receive faster processing. While these algorithms are much more complicated to design, they can produce excellent results, in particular, when the execution time varies greatly from one task to another.
 
Dynamic load balancing architecture can be more [[Modular design|modular]] since it is not mandatory to have a specific node dedicated to the distribution of work. When tasks are uniquely assigned to a processor according to their state at a given moment, it is a unique assignment. If, on the other hand, the tasks can be permanently redistributed according to the state of the system and its evolution, this is called dynamic assignment.<ref>{{cite journal |last1=Alakeel |first1=Ali |title=A Guide to Dynamic Load Balancing in Distributed Computer Systems |journal=International Journal of Computer Science and Network Security (IJCSNS) |date=November 2009 |volume=10 |url=https://www.researchgate.net/publication/268200851}}</ref> Obviously, a load balancing algorithm that requires too much communication in order to reach its decisions runs the risk of slowing down the resolution of the overall problem.
 
===Hardware architecture===
====Heterogeneous machines====
[[Parallel computing]] infrastructures are often composed of units of different [[computing power]], which should be taken into account for the load distribution.
 
For [[shared-memory]] computers, managing write conflicts greatly slows down the speed of individual execution of each computing unit. However, they can work perfectly well in parallel. Conversely, in the case of message exchange, each of the processors can work at full speed. On the other hand, when it comes to collective message exchange, all processors are forced to wait for the slowest processors to start the communication phase.
In reality, few systems fall into exactly one of the categories. In general, the processors each have an internal memory to store the data needed for the next calculations and are organized in successive [[Computer cluster|clusters]]. Often, these processing elements are then coordinated through [[distributed memory]] and [[message passing]]. Therefore, the load balancing algorithm should be uniquely adapted to a parallel architecture. Otherwise, there is a risk that the efficiency of parallel [[problem solving]] will be greatly reduced.
 
====Hierarchy====
Adapting to the hardware structures seen above, there are two main categories of load balancing algorithms. On the one hand, there are those in which tasks are assigned by a "master" and executed by "workers" who keep the master informed of the progress of their work; the master can then take charge of assigning or reassigning the workload in the case of a dynamic algorithm. The literature refers to this as [[Master/slave (technology)|master-worker]] architecture. On the other hand, the control can be distributed between the different nodes. The load balancing algorithm is then executed on each of them and the responsibility for assigning tasks (as well as re-assigning and splitting as appropriate) is shared. The latter category assumes a dynamic load balancing algorithm.
 
Since the design of each load balancing algorithm is unique, the previous distinction must be qualified. Thus, it is also possible to have an intermediate strategy, with, for example, "master" nodes for each sub-cluster, which are themselves subject to a global "master". There are also multi-level organizations, with an alternation between master-slave and distributed control strategies. The latter strategies quickly become complex and are rarely encountered. Designers prefer algorithms that are easier to control.
 
===Fault tolerance===
Especially in large-scale [[computing cluster]]s, it is not tolerable to execute a [[parallel algorithm]] that cannot withstand the failure of one single component. Therefore, [[fault tolerant]] algorithms are being developed which can detect outages of processors and recover the computation.<ref>{{cite book |last1=Punetha Sarmila |first1=G. |last2=Gnanambigai |first2=N. |last3=Dinadayalan |first3=P. |title=2015 2nd International Conference on Electronics and Communication Systems (ICECS) |chapter=Survey on fault tolerant &mdash; Load balancing algorithms in cloud computing |date=2015 |pages=1715–1720 |doi=10.1109/ECS.2015.7124879 |isbn=978-1-4799-7225-8 |s2cid=30175022 }}</ref>
 
==Approaches==
[[File:Load_Balancing_divisible_tasks.png|thumb|Load balancing algorithm depending on divisibility of tasks]]
 
By dividing the tasks in such a way as to give the same amount of computation to each processor, all that remains to be done is to group the results together. Using a [[prefix sum]] algorithm, this division can be calculated in [[logarithmic time]] with respect to the number of processors.{{citation needed|date=October 2022}}
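The idea can be sketched as follows in Python. This is a minimal sequential illustration with made-up task costs; on a real parallel machine the prefix sum itself would be computed cooperatively in logarithmic time, which is the point of the technique.

```python
# Sketch: dividing a list of task costs into near-equal-work contiguous chunks
# using a prefix sum. Costs and processor count are illustrative.
from itertools import accumulate
from bisect import bisect_left

def split_by_work(costs, num_procs):
    """Assign contiguous ranges of tasks so each processor gets ~equal total cost."""
    prefix = list(accumulate(costs))          # prefix[i] = total cost of tasks 0..i
    total = prefix[-1]
    bounds = [0]
    for p in range(1, num_procs):
        # first index where the cumulative cost reaches p/num_procs of the total
        bounds.append(bisect_left(prefix, total * p / num_procs) + 1)
    bounds.append(len(costs))
    return [(bounds[p], bounds[p + 1]) for p in range(num_procs)]

ranges = split_by_work([5, 1, 1, 1, 4, 4, 2, 2], 2)   # total work = 20
```

Once the split points are known, each processor works on its own range independently and only the final grouping of results remains.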
 
If, however, the tasks cannot be subdivided (i.e., they are [[Linearizability|atomic]]), although optimizing task assignment is a difficult problem, it is still possible to approximate a relatively fair distribution of tasks, provided that the size of each of them is much smaller than the total computation performed by each of the nodes.<ref name="Sequential and parallel algorithms"/>
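One common way to approximate such a fair distribution of atomic tasks is a greedy heuristic that always hands the next task to the currently least-loaded node. The following sketch is illustrative and not taken from the cited source:

```python
# Sketch: greedy assignment of indivisible tasks to the least-loaded node,
# using a min-heap keyed on current load. Task sizes are illustrative.
import heapq

def greedy_assign(task_sizes, num_nodes):
    """Assign each task to the node with the smallest load so far."""
    heap = [(0, n) for n in range(num_nodes)]        # (load, node id)
    assignment = [[] for _ in range(num_nodes)]
    for size in task_sizes:
        load, n = heapq.heappop(heap)                # least-loaded node
        assignment[n].append(size)
        heapq.heappush(heap, (load + size, n))
    return assignment

parts = greedy_assign([3, 3, 2, 2, 2], 2)
```

When each task is small relative to a node's total work, the resulting loads differ from the optimum by at most one task size, which is why the approximation works well in the regime described above.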
====Others====
Of course, there are other methods of assignment as well:
* Less work: assign more tasks to the servers performing less work{{clarify|date=November 2022}} (the method can also be weighted).
* Hash: allocates queries according to a [[hash table]].
* Power of two choices: pick two servers at random and choose the better of the two options.<ref>{{cite web |title=NGINX and the "Power of Two Choices" Load-Balancing Algorithm |url=https://www.nginx.com/blog/nginx-power-of-two-choices-load-balancing-algorithm/ |website=nginx.com |archive-url=https://web.archive.org/web/20191212194243/https://www.nginx.com/blog/nginx-power-of-two-choices-load-balancing-algorithm/ |archive-date=2019-12-12 |date=2018-11-12}}</ref><ref>{{cite web |title=Test Driving "Power of Two Random Choices" Load Balancing |url=https://www.haproxy.com/blog/power-of-two-load-balancing/ |website=haproxy.com |archive-url=https://web.archive.org/web/20190215173140/https://www.haproxy.com/blog/power-of-two-load-balancing/ |archive-date=2019-02-15 |date=2019-02-15}}</ref>
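The "power of two choices" method from the list above can be sketched in a few lines of Python. Server names and load counters here are illustrative:

```python
# Sketch of "power of two choices": sample two servers at random and send the
# request to whichever currently has fewer active requests.
import random

def pick_server(loads):
    """loads: dict mapping server name -> number of active requests."""
    a, b = random.sample(list(loads), 2)
    return a if loads[a] <= loads[b] else b

loads = {"s1": 10, "s2": 3, "s3": 7}
chosen = pick_server(loads)
loads[chosen] += 1    # the chosen server takes on one more request
```

Sampling only two servers avoids querying the load of the whole pool on every request, yet still steers traffic away from the most loaded servers.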
 
===Master-worker scheme===
[[Master/slave (technology)|Master-worker]] schemes are among the simplest dynamic load balancing algorithms. A master distributes the workload to all workers (also sometimes referred to as "slaves"). Initially, all workers are idle and report this to the master. The master answers worker requests and distributes the tasks to them. When it has no more tasks to give, it informs the workers so that they stop asking for tasks.
 
The advantage of this system is that it distributes the burden very fairly. In fact, if one does not take into account the time needed for the assignment, the execution time would be comparable to the prefix sum seen above.
 
The problem with this algorithm is that it has difficulty adapting to a large number of processors because of the high amount of necessary communications. This lack of [[scalability]] makes it quickly inoperable in very large servers or very large parallel computers. The master acts as a [[Bottleneck (software)|bottleneck]].
[[File:Master-Worker_and_bottleneck.png|thumb|Master-worker and bottleneck]]
 
However, the quality of the algorithm can be greatly improved by replacing the master with a task list that can be used by different processors. Although this algorithm is a little more difficult to implement, it promises much better scalability, although still insufficient for very large computing centers.
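The shared task list can be illustrated with a thread-safe queue that workers pull from directly, with no master in the loop. The worker count and task contents below are illustrative:

```python
# Sketch: replacing the master with a shared task list. Workers pull tasks
# from a thread-safe queue until it is drained, then stop on their own.
import queue
import threading

tasks = queue.Queue()
for n in range(20):
    tasks.put(n)

results = []
lock = threading.Lock()

def worker():
    while True:
        try:
            n = tasks.get_nowait()   # pull the next task; no central master
        except queue.Empty:
            return                   # queue drained: stop asking for tasks
        with lock:
            results.append(n * n)    # stand-in for real work

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a distributed setting the in-process queue would be replaced by a shared data structure reachable by all processors, and contention on that structure is what still limits scalability.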
 
===Internet-based services===
{{Multiple issues|section=y|
{{cleanup|section|date=December 2010}}
{{more citations needed|section|date=December 2010}}
}}
 
One of the most commonly used applications of load balancing is to provide a single Internet service from multiple [[Server (computing)|server]]s, sometimes known as a [[server farm]]. Commonly load-balanced systems include popular [[web site]]s, large [[Internet Relay Chat]] networks, high-bandwidth [[File Transfer Protocol]] (FTP) sites, [[Network News Transfer Protocol]] (NNTP) servers, [[Domain Name System]] (DNS) servers, and databases.
 
Another more effective technique for load-balancing using DNS is to delegate {{mono|www.example.org}} as a sub-___domain whose zone is served by each of the same servers that are serving the website. This technique works particularly well where individual servers are spread geographically on the Internet. For example:
<pre>one.example.org A 192.0.2.1
two.example.org A 203.0.113.2
www.example.org NS one.example.org
www.example.org NS two.example.org
</pre>
However, the [[zone file]] for {{mono|www.example.org}} on each server is different such that each server resolves its own IP address as the A-record.<ref>{{Cite web|url=https://www.zytrax.com/books/dns/ch8/a.html|title=Chapter 8 - IPv4 Address (A) Record|website=www.zytrax.com}}</ref> On server ''one'' the zone file for {{mono|www.example.org}} reports:
<pre>@ in a 192.0.2.1
</pre>
On server ''two'' the same zone file contains:
<pre>@ in a 203.0.113.2
</pre>
 
This way, when a server is down, its DNS will not respond and the web service does not receive any traffic. If the line to one server is congested, the unreliability of DNS ensures less HTTP traffic reaches that server. Furthermore, the quickest DNS response to the resolver is nearly always the one from the network's closest server, ensuring geo-sensitive load-balancing {{Citation needed|date=November 2014}}. A short [[Time to live|TTL]] on the A-record helps to ensure traffic is quickly diverted when a server goes down. Consideration must be given to the possibility that this technique may cause individual clients to switch between individual servers in mid-session.
 
====Client-side random load balancing====
Another approach to load balancing is to deliver a list of server IPs to the client, and then to have the client randomly select the IP from the list on each connection.<ref>{{cite web |url=https://gameserverarchitecture.com/2015/10/pattern-client-side-load-balancing/ |title=Pattern: Client Side Load Balancing |date=October 15, 2015 |archive-url=https://web.archive.org/web/20201129020628/https://gameserverarchitecture.com/2015/10/pattern-client-side-load-balancing/?shared=email&msg=fail |archive-date=2020-11-29 |url-status=usurped}}</ref><ref name="ithare">{{Cite web|url=http://ithare.com/chapter-vib-server-side-architecture-front-end-servers-and-client-side-random-load-balancing/ |title=Server-Side Architecture. Front-End Servers and Client-Side Random Load Balancing|date=December 28, 2015|website=IT Hare on Soft.ware}}</ref> This essentially relies on all clients generating similar loads, and the [[law of large numbers]]<ref name="ithare" /> to achieve a reasonably flat load distribution across servers. It has been claimed that client-side random load balancing tends to provide better load distribution than round-robin DNS; this has been attributed to caching issues with round-robin DNS, which in the case of large DNS caching servers tend to skew the distribution, while client-side random selection remains unaffected regardless of DNS caching.<ref name="ithare" />
 
With this approach, the method of delivery of a list of IPs to the client can vary and may be implemented as a DNS list (delivered to all the clients without any round-robin), or via hardcoding it to the list. If a "smart client" is used, detecting that a randomly selected server is down and connecting randomly again, it also provides [[fault tolerance]].
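A "smart client" of the kind described above can be sketched as follows. The server list and the `connect()` stub are illustrative stand-ins for a real connection attempt:

```python
# Sketch of client-side random load balancing with retry: pick a random
# server IP; if the connection fails, pick again among the remaining ones.
import random

SERVERS = ["192.0.2.1", "203.0.113.2", "198.51.100.3"]

def connect(ip):
    # Stand-in for a real TCP/HTTP connection attempt.
    return ip != "192.0.2.1"          # pretend the first server is down

def smart_connect(servers):
    candidates = list(servers)
    while candidates:
        ip = random.choice(candidates)
        if connect(ip):
            return ip
        candidates.remove(ip)         # server looks down: retry among the rest
    raise ConnectionError("no server reachable")

ip = smart_connect(SERVERS)
```

Because each client retries independently, a failed server is routed around without any central coordination, which is the fault-tolerance property mentioned above.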
=====Scheduling algorithms=====
Numerous [[scheduling algorithm]]s, also called load-balancing methods, are used by load balancers to determine which back-end server to send a request to.
Simple algorithms include random choice, [[Round-robin scheduling|round robin]], or least connections.<ref name=":0">{{Cite web|url=https://f5.com/resources/white-papers/load-balancing-101-nuts-and-bolts|archive-url=https://web.archive.org/web/20171205223948/https://f5.com/resources/white-papers/load-balancing-101-nuts-and-bolts|url-status=dead|archive-date=2017-12-05|title=Load Balancing 101: Nuts and Bolts|date=2017-12-05|publisher=[[F5, Inc.|F5]]|access-date=2018-03-23}}</ref> More sophisticated load balancers may take additional factors into account, such as a server's reported load, least response times, up/down status (determined by a monitoring poll of some kind), the number of active connections, geographic ___location, capabilities, or how much traffic it has recently been assigned.
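Two of the simple policies named above can be sketched in a few lines; the server names and connection counts are illustrative:

```python
# Sketch of two simple scheduling policies a load balancer might use.
import itertools

SERVERS = ["a", "b", "c"]

# Round robin: cycle through the servers in order.
rr = itertools.cycle(SERVERS)
rr_picks = [next(rr) for _ in range(5)]    # a, b, c, a, b

# Least connections: pick the server with the fewest active connections.
active = {"a": 4, "b": 1, "c": 2}
least = min(active, key=active.get)        # "b"
```

Round robin needs no state beyond a cursor, while least connections requires the balancer to track each back-end's active connection count.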
 
=====Persistence=====
; Priority activation
: When the number of available servers drops below a certain number, or the load gets too high, standby servers can be brought online.
; [[TLS acceleration|TLS offload and acceleration]]
: TLS (or its predecessor SSL) acceleration is a technique of offloading cryptographic protocol calculations onto specialized hardware. Depending on the workload, processing the encryption and authentication requirements of a [[Transport Layer Security|TLS]] request can become a major part of the demand on the Web Server's CPU; as the demand increases, users will see slower response times, as the TLS overhead is distributed among Web servers. To remove this demand on Web servers, a balancer can terminate TLS connections, passing HTTPS requests as HTTP requests to the Web servers. If the balancer itself is not overloaded, this does not noticeably degrade the performance perceived by end-users. The downside of this approach is that all of the TLS processing is concentrated on a single device (the balancer) which can become a new bottleneck. Some load balancer appliances include specialized hardware to process TLS. Instead of upgrading the load balancer, which is quite expensive dedicated hardware, it may be cheaper to forgo TLS offload and add a few web servers. Also, some server vendors such as Oracle/Sun now incorporate cryptographic acceleration hardware into their CPUs such as the T2000. F5 Networks incorporates a dedicated TLS acceleration hardware card in their local traffic manager (LTM) which is used for encrypting and decrypting TLS traffic. One clear benefit to TLS offloading in the balancer is that it enables it to do balancing or content switching based on data in the HTTPS request.
; [[Distributed denial of service|Distributed Denial of Service]] (DDoS) attack protection
: Load balancers can provide features such as [[SYN cookies]] and delayed-binding (the back-end servers don't see the client until it finishes its TCP handshake) to mitigate [[SYN flood]] attacks and generally offload work from the servers to a more efficient platform.
; [[HTTP compression]]
: HTTP compression reduces the amount of data to be transferred for HTTP objects by utilising gzip compression available in all modern web browsers. The larger the response and the further away the client is, the more this feature can improve response times. The trade-off is that this feature puts additional CPU demand on the load balancer and could be done by web servers instead.
; [[TCP offload]]
: Different vendors use different terms for this, but the idea is that normally each HTTP request from each client is a different TCP connection. This feature utilises HTTP/1.1 to consolidate multiple HTTP requests from multiple clients into a single TCP socket to the back-end servers.
; TCP buffering
: The load balancer can buffer responses from the server and spoon-feed the data out to slow clients, allowing the web server to free a thread for other tasks faster than it would if it had to send the entire response to the client directly.
; Direct server return
: An option for asymmetrical load distribution, where request and reply have different network paths.
; Health checking
 
====Shortest Path Bridging====
[[TRILL (computing)|TRILL]] (Transparent Interconnection of Lots of Links) allows an [[Ethernet]] network to have an arbitrary topology, and enables per-flow pair-wise load splitting by way of [[Dijkstra's algorithm]], without configuration and user intervention. The catalyst for TRILL was an event at [[Beth Israel Deaconess Medical Center]] which began on 13 November 2002.<ref>{{cite web |title=All Systems Down |url=https://community.cisco.com/legacyfs/online/legacy/0/9/8/140890-All%20Systems%20Down%20-%20Scott%20Berinato(CIO).pdf |website=cio.com |publisher=IDG Communications, Inc. |access-date=9 January 2022 |archive-url=https://web.archive.org/web/20200923200221if_/https://community.cisco.com/legacyfs/online/legacy/0/9/8/140890-All%20Systems%20Down%20-%20Scott%20Berinato(CIO).pdf |archive-date=23 September 2020 |url-status=dead}}</ref><ref>{{cite web |title=All Systems Down |url=https://www.computerworld.com/article/2581420/all-systems-down.html |website=cio.com |publisher=IDG Communications, Inc. |access-date=9 January 2022 |archive-url=https://web.archive.org/web/20220109020703/https://www.computerworld.com/article/2581420/all-systems-down.html |archive-date=9 January 2022 |url-status=dead}}</ref> The concept of Rbridges<ref>{{cite web |title=Rbridges: Transparent Routing |url=https://courses.cs.washington.edu/courses/cse590l/05sp/papers/rbridges.pdf |website=courses.cs.washington.edu |publisher=Radia Perlman, Sun Microsystems Laboratories |access-date=9 January 2022 |archive-url=https://web.archive.org/web/20220109030037/https://courses.cs.washington.edu/courses/cse590l/05sp/papers/rbridges.pdf |archive-date=9 January 2022 |url-status=dead}}</ref> [sic] was first proposed to the [[Institute of Electrical and Electronics Engineers]] in 2004,<ref>{{cite web |title=Rbridges: Transparent Routing |url=https://www.researchgate.net/publication/4102976 |website=researchgate.net |publisher=Radia Perlman, Sun Microsystems; Donald Eastlake 3rd, Motorola}}</ref> which in 2005<ref>{{cite web |title=TRILL Tutorial |url=http://www.postel.org/rbridge/trill-tutorial.pdf |website=postel.org |publisher=Donald E. Eastlake 3rd, Huawei |access-date=2022-01-14 |archive-date=2023-03-29 |archive-url=https://web.archive.org/web/20230329233902/http://www.postel.org/rbridge/trill-tutorial.pdf |url-status=dead }}</ref> rejected what came to be known as TRILL, and in the years 2006 through 2012<ref>{{cite web |title=IEEE 802.1: 802.1aq - Shortest Path Bridging |url=https://ieee802.org/1/pages/802.1aq.html |website=ieee802.org |publisher=Institute of Electrical and Electronics Engineers }}</ref> devised an incompatible variation known as [[Shortest Path Bridging]].
 
The IEEE approved the [[IEEE 802.1aq]] standard in May 2012.
Many telecommunications companies have multiple routes through their networks or to external networks. They use sophisticated load balancing to shift traffic from one path to another to avoid [[network congestion]] on any particular link, and sometimes to minimize the cost of transit across external networks or improve [[Reliability (computer networking)|network reliability]].
 
Another way of using load balancing is in [[network monitoring]] activities. Load balancers can be used to split huge data flows into several sub-flows and use several network analyzers, each reading a part of the original data. This is very useful for monitoring fast networks like [[10 Gigabit Ethernet|10GbE]] or STM64, where complex processing of the data may not be possible at [[wire speed]].<ref>{{cite conference |last1=Noormohammadpour |first1=Mohammad |last2=Raghavendra |first2=Cauligi S. |title=Poster abstract: Minimizing flow completion times using adaptive routing over inter-datacenter wide area networks |book-title=IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) |date=2018 |isbn=978-1-5386-5979-3 |doi=10.1109/INFCOMW.2018.8406853 |pages=1–2 |arxiv=1802.09080}}</ref>
 
===Data center networks===
Load balancing is widely used in [[data center]] networks to distribute traffic across many existing paths between any two servers.<ref name=architecture>{{cite journal |last1=Noormohammadpour |first1=Mohammad |last2=Raghavendra |first2=Cauligi S. |title=Datacenter Traffic Control: Understanding Techniques and Tradeoffs |journal=IEEE Communications Surveys & Tutorials |volume=20 |issue=2 |pages=1492–1525 |date=2018 |issn=1553-877X |doi=10.1109/COMST.2017.2782753 |doi-access=free |arxiv=1712.03530 |url=https://hal.science/hal-01811647/document}}</ref> It allows more efficient use of network bandwidth and reduces provisioning costs. In general, load balancing in datacenter networks can be classified as either static or dynamic.
 
Static load balancing distributes traffic by computing a hash of the source and destination addresses and port numbers of traffic flows and using it to determine how flows are assigned to one of the existing paths. Dynamic load balancing assigns traffic flows to paths by monitoring bandwidth use on different paths. Dynamic assignments can also be proactive or reactive. In the former case, the assignment is fixed once made, while in the latter the network logic keeps monitoring available paths and shifts flows across them as network utilization changes (with arrival of new flows or completion of existing ones). A comprehensive overview of load balancing in datacenter networks has been made available.<ref name=architecture/>
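The static, hash-based assignment described above can be sketched as follows. The tuple fields and path count are illustrative; real switches use hardware hash functions rather than SHA-256:

```python
# Sketch of static flow-to-path assignment: hash the flow's addresses and
# ports, and let the hash select one of the available paths.
import hashlib

NUM_PATHS = 4

def path_for_flow(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PATHS

# Packets of the same flow always hash to the same path:
p1 = path_for_flow("10.0.0.1", "10.0.1.9", 51512, 443)
p2 = path_for_flow("10.0.0.1", "10.0.1.9", 51512, 443)
```

Keeping all packets of a flow on one path preserves in-order delivery, but the scheme ignores actual path utilization, which is exactly the limitation dynamic schemes address.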
 
===Failovers===
Load balancing is often used to implement [[failover]]—the continuation of service after the failure of one or more of its components. The components are monitored continually (e.g., web servers may be monitored by fetching known pages), and when one becomes unresponsive, the load balancer is informed and no longer sends traffic to it. When a component comes back online, the load balancer starts rerouting traffic to it. For this to work, there must be at least one component in excess of the service's capacity ([[N+1 redundancy]]). This can be much less expensive and more flexible than failover approaches where every single live component is paired with a single backup component that takes over in the event of a failure ([[dual modular redundancy]]). Some [[RAID]] systems can also utilize [[hot spare]] for a similar effect.<ref name="IBM">{{cite web |url=https://www.ibm.com/support/knowledgecenter/en/SSVJJU_6.4.0/com.ibm.IBMDS.doc_6.4/ds_ag_srv_adm_dd_failover_load_balancing.html |title=Failover and load balancing |website=IBM |access-date=6 January 2019}}</ref>
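A health-check loop of this kind can be sketched as follows. The `check()` function is an illustrative stand-in for fetching a known page from each server:

```python
# Sketch: a health check that removes unresponsive servers from the pool
# and restores them when they recover.
def check(server, reachable):
    # Stand-in for an HTTP probe of a known page on the server.
    return server in reachable

def update_pool(servers, reachable):
    """Return the subset of servers that passed the health check."""
    return [s for s in servers if check(s, reachable)]

servers = ["web1", "web2", "web3"]
pool = update_pool(servers, reachable={"web1", "web3"})          # web2 is down
# Later, web2 comes back online and rejoins the pool:
pool = update_pool(servers, reachable={"web1", "web2", "web3"})
```

In practice the probe runs on a timer, and traffic is only dispatched to servers currently in the pool, giving the N+1 behavior described above.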
 
This technique can increase [[fault tolerance]] by enabling quick substitutions for the most complicated, most failure-prone parts of a system. However, it can make the load balancer itself a [[single point of failure]].
 
=== Data ingestion for AI model training ===
Increasingly, load balancing techniques are being used to manage high-volume data ingestion pipelines that feed [[artificial intelligence]] [[AI training|training]] and [[inference]] systems—sometimes referred to as “[[AI Factory|AI factories]].” These AI-driven environments require continuous processing of vast amounts of structured and unstructured data, placing heavy demands on networking, storage, and computational resources.<ref>{{Cite web |title=Optimize Traffic Management for AI Factory Data Ingest |url=https://www.f5.com/company/blog/ai-factory-traffic-management-data-ingest |access-date=2025-01-30 |website=F5, Inc. |language=en-US}}</ref> To maintain the necessary high throughput and low latency, organizations commonly deploy load balancing tools capable of advanced TCP optimizations, connection pooling, and adaptive scheduling. Such features help distribute incoming data requests evenly across servers or nodes, prevent congestion, and ensure that compute resources remain efficiently utilized.<ref>{{Cite web |title=Optimize, Scale, and Secure AI Interactions |url=https://www.f5.com/solutions/use-cases/optimize-scale-and-secure-ai |access-date=2025-01-30 |website=F5, Inc. |language=en-US}}</ref>
 
When deployed in large-scale or high-performance AI environments, load balancers also mitigate bandwidth constraints and accommodate varying data governance requirements—particularly when sensitive training data cannot be sent to third-party cloud services. By routing data locally (on-premises) or across private clouds, load balancers allow AI workflows to avoid public-cloud bandwidth limits, reduce transit costs, and maintain compliance with regulatory standards. As AI models expand in size (often measured by billions or even trillions of parameters), load balancing for data ingestion has grown in importance for maintaining the reliability, scalability, and cost efficiency of AI factories.
 
==See also==
 
{{Div col|colwidth=25em}}
* [[Affinity mask]]
* [[Application delivery controller]]
* [[Autoscaling]]
* [[Cloud computing]]
* [[Edge computing]]
* [[InterPlanetary File System]]
* [[Network load balancing]]
* [[Optimal job scheduling]] – the computational problem of finding an optimally balanced schedule.
* [[SRV record]]
{{div col end}}
 
==References==
==External links==
{{commons category|Load balancing (computing)}}
* {{webarchive |url=https://web.archive.org/web/20230329234046/http://www.udaparts.com/document/articles/snpisec.htm |title=Server routing for load balancing with full auto failure recovery}}
{{Authority control}}