I've decided to try to adapt a series of presentations that I did on BGP for the blog. So here goes with the first installment.
So, let's start with the real basics. What is BGP? I'm going to skip the historical stuff, other than to say that BGP-4 (the fourth version of BGP) is the current Border Gateway Protocol. It was designed to fill the need for a routing protocol that could communicate routes across the Internet backbone and make routing decisions based upon policy.
There's quite a bit there, so let's look at a few things a little more. Most people, if they're familiar with routing protocols at all, are familiar with what are called "IGPs" or Interior Gateway Protocols. These protocols are designed to provide the information needed inside of an organization (or Autonomous System - you'll see this term more later) in order for my computer, for instance, to talk to another computer in that same organization.
Example:
My computer, 10.1.1.1/24, wants to talk to a server at 10.3.3.3/24. We're not in the same subnet, which usually means that there are router hops between my machine and that server. How do the routers know where these subnets are? They (usually) use IGPs to share that information with each other. Router A, which knows about 10.1.1.0/24, can tell router B about it, while router B, which knows about 10.3.3.0/24, can tell router A about it. At that point, the routers know where to send packets and communication via IP can ensue. Fast recovery of routes is important here, because we don't want stretches of the network down for long periods of time. Overhead be darned - we're not talking about a lot of routes here in the grand scheme of things. Detailed information about how to reach networks is also important so that optimal routing decisions are made.
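To make the router A / router B exchange a little more concrete, here's a toy sketch in Python. The `Router` class and its `advertise_to` and `next_hop` methods are hypothetical names for illustration only; a real IGP like OSPF exchanges link-state information and runs its own path computation rather than simply copying prefixes between tables. The lookup picks the most specific matching prefix, which is how a router chooses among overlapping routes.

```python
import ipaddress


class Router:
    """Toy router that learns prefixes from neighbors (hypothetical, not a real IGP)."""

    def __init__(self, name, connected):
        self.name = name
        # Routing table: prefix -> next hop ("direct" for locally connected subnets)
        self.table = {ipaddress.ip_network(p): "direct" for p in connected}

    def advertise_to(self, neighbor):
        # Tell the neighbor about every prefix we know; it reaches them via us.
        # setdefault keeps any route the neighbor already has (e.g. its own subnets).
        for prefix in self.table:
            neighbor.table.setdefault(prefix, self.name)

    def next_hop(self, dest):
        addr = ipaddress.ip_address(dest)
        matches = [p for p in self.table if addr in p]
        if not matches:
            return None  # no route to this destination
        # Longest-prefix match: the most specific route wins.
        best = max(matches, key=lambda p: p.prefixlen)
        return self.table[best]


router_a = Router("A", ["10.1.1.0/24"])
router_b = Router("B", ["10.3.3.0/24"])

print(router_a.next_hop("10.3.3.3"))  # None: A hasn't heard about 10.3.3.0/24 yet
router_a.advertise_to(router_b)
router_b.advertise_to(router_a)
print(router_a.next_hop("10.3.3.3"))  # B
print(router_b.next_hop("10.1.1.1"))  # A
```

Before the exchange, router A has no idea how to reach the server's subnet; after each router advertises what it knows, both directions work.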
BGP isn't like that. BGP is for communicating BETWEEN organizations, and here policy can become very important. For example, BGP is used to tell the greater Internet which routes are reachable via your organization. ISPs use it to tell each other what routes they have. It's also used to pick, based upon whatever policy you have, which route into or out of your organization should be used.
So, IGP - how do I get around inside of my organization/company/collective. BGP - what do I want other organizations to know about my connectivity and how do I want them to connect to me if there are several options.
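Here's a rough sketch of what "pick a route based upon policy" can look like when you have two upstream ISPs offering a path to the same prefix. Everything here is a simplifying assumption: `isp_a`, `isp_b`, the preference numbers, and the `best_path` helper are hypothetical, and the real BGP best-path selection process has many more steps. It's only loosely modeled on its first couple of steps, where a locally configured preference is checked before AS path length. The prefix and AS numbers come from the ranges reserved for documentation.

```python
from dataclasses import dataclass


@dataclass
class Path:
    prefix: str
    via: str        # which upstream provider we learned the route from
    as_path: list   # AS numbers the route has traversed


# Hypothetical local policy: prefer isp_a (say, the cheaper circuit) over isp_b.
POLICY_PREFERENCE = {"isp_a": 200, "isp_b": 100}


def best_path(candidates):
    # Policy first (higher preference wins), then shorter AS path as a tie-breaker.
    return max(
        candidates,
        key=lambda p: (POLICY_PREFERENCE.get(p.via, 0), -len(p.as_path)),
    )


candidates = [
    Path("203.0.113.0/24", "isp_a", [64500, 64501, 64502]),
    Path("203.0.113.0/24", "isp_b", [64510, 64502]),
]
chosen = best_path(candidates)
print(chosen.via)  # isp_a: policy beats the shorter AS path via isp_b
```

The point is that the "best" route isn't necessarily the shortest one. It's whichever one your policy says to use, and only when the policy doesn't care does something like path length break the tie.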
Now, because BGP is responsible for communicating routes for the entire Internet (and there are a lot of them), scale is really important. Scale, scale, scale - did I mention scale? Try to carry the full Internet routing table in, say, OSPF, and you'll have a lot of pools of molten metal in your data center where routers used to be. Bad idea.
Here's the thing, though. A routing protocol can be scalable, fast, reliable - pick any two. BGP picked scalable and reliable, which means that it isn't that fast. Sure, there are things you can do to speed it up, but the more of that you do, the more you subtract from its scalability. That's not a big deal if you're using it for a private MPLS network connection and you only have, say, 20 routes. If you're carrying multiple copies of the full Internet table because you have connections to several ISPs, then that's another story.
Reliability:
BGP is built on top of TCP. It uses TCP port 179 to communicate with its peers (statically defined peers - none of that multicast discovery stuff that many IGPs do), and this gives its updates tremendous reliability. TCP handles acknowledgements and retransmission, so either an update gets there or the session fails and BGP knows about it. BGP doesn't have to worry about updates being silently lost, which makes it very reliable. It also allows it to be very quiet. Apart from small periodic keepalive messages, it only sends updates when there's an actual change, unlike many IGPs, which periodically refresh their routing information whether there's a change or not.
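To show how little BGP has to add on top of TCP, here's a sketch that builds and parses the smallest BGP message, the KEEPALIVE. The layout follows the common message header defined in RFC 4271: a 16-octet marker of all ones, a 2-octet total length, and a 1-octet type, with KEEPALIVE (type 4) carrying no body at all. The function names are hypothetical, and this is nowhere near a working BGP speaker: there's no OPEN exchange, no hold timer, and no UPDATE handling.

```python
import struct

BGP_PORT = 179           # peers reach each other over a TCP session to this port
MARKER = b"\xff" * 16    # RFC 4271 header: 16-octet marker, all ones
HEADER_LEN = 19          # marker (16) + length (2) + type (1)
MSG_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def build_keepalive():
    # A KEEPALIVE is just the common header: length 19, type 4, no body.
    return MARKER + struct.pack("!HB", HEADER_LEN, 4)


def parse_header(data):
    if len(data) < HEADER_LEN or data[:16] != MARKER:
        raise ValueError("not a BGP message header")
    length, msg_type = struct.unpack("!HB", data[16:HEADER_LEN])
    return length, MSG_TYPES.get(msg_type, "UNKNOWN")


def send_keepalive(sock):
    # sock is assumed to be an already-established TCP connection to a peer
    # on port 179. TCP does the acknowledging and retransmitting, so BGP
    # doesn't need any delivery-confirmation scheme of its own.
    sock.sendall(build_keepalive())


msg = build_keepalive()
print(len(msg), parse_header(msg))  # 19 (19, 'KEEPALIVE')
```

Nineteen bytes every so often is the "quiet" part. Beyond keepalives like this, the session only carries traffic when there's an actual routing change to report.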
Scalable:
BGP does a number of things to maximize its scalability. As mentioned above, it doesn't talk much, which keeps network traffic down. It also puts in delays before it notifies a neighbor of a change: BGP batches up updates and only sends them when it's time for the next batch to go out. Contrast this with OSPF, which immediately sends out updates as soon as it knows anything. That's great for speedy convergence, but bad for scale, as every update increases CPU load on the receiving router. BGP generally also uses longer keepalive and hold timers (its version of "hello" intervals) to keep track of its neighbors than other protocols do. This again lowers the amount of traffic BGP puts on the wire, but also means that convergence (how soon the network is completely aware of changes) takes longer.
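Here's a toy sketch of the batching idea. In real BGP this pacing is governed by the Minimum Route Advertisement Interval (MRAI); RFC 4271 suggests 30 seconds for external peers, and the example borrows that figure. Everything else is a simplification: `UpdateBatcher` and its methods are hypothetical, it uses one timer for everything, there's no jitter, and time is simulated rather than read from a clock.

```python
class UpdateBatcher:
    """Toy batcher: collect route changes, send at most one batch per interval.

    Hypothetical names; an illustration of the batching idea only, not how
    any real BGP implementation schedules its updates.
    """

    def __init__(self, interval):
        self.interval = interval   # minimum seconds between batches
        self.pending = {}          # prefix -> latest state ("up" or "down")
        self.last_flush = None     # time of the last batch; None means never sent

    def route_changed(self, prefix, state, now):
        # Later changes overwrite earlier ones, so a route that flaps several
        # times within one interval costs only a single entry in the next batch.
        self.pending[prefix] = state
        return self.maybe_flush(now)

    def maybe_flush(self, now):
        if not self.pending:
            return []
        if self.last_flush is not None and now - self.last_flush < self.interval:
            return []  # too soon since the last batch; keep collecting
        batch = sorted(self.pending.items())
        self.pending.clear()
        self.last_flush = now
        return batch


batcher = UpdateBatcher(interval=30)
sent = [
    batcher.route_changed("198.51.100.0/24", "down", now=0),   # sent right away
    batcher.route_changed("198.51.100.0/24", "up", now=5),     # held
    batcher.route_changed("198.51.100.0/24", "down", now=10),  # held, overwrites "up"
    batcher.route_changed("203.0.113.0/24", "down", now=12),   # held
    batcher.route_changed("198.51.100.0/24", "up", now=20),    # held, overwrites "down"
    batcher.maybe_flush(now=30),                               # interval elapsed
]
print(sum(1 for batch in sent if batch))  # 2
print(sent[-1])  # [('198.51.100.0/24', 'up'), ('203.0.113.0/24', 'down')]
```

Five changes turned into two messages, and the neighbor never saw the intermediate flaps of 198.51.100.0/24. That's the trade-off in miniature: less work for the receiving router, but the second batch reflects changes up to 25 seconds late.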