Routing and resource discovery Peer-to-peer networks generally implement some form of virtual
overlay network on top of the physical network topology, where the nodes in the overlay form a
subset of the nodes in the physical network. Data is still exchanged directly over the underlying
TCP/IP network, but at the
application layer peers can communicate with each other directly, via the logical overlay links (each of which corresponds to a path through the underlying physical network). Overlays are used for indexing and peer discovery, and make the P2P system independent from the physical network topology. Based on how the nodes are linked to each other within the overlay network, and how resources are indexed and located, we can classify networks as
unstructured or
structured (or as a hybrid between the two).
Unstructured networks Unstructured peer-to-peer networks do not impose a particular structure on the overlay network by design, but rather are formed by nodes that randomly form connections to each other. (
Gnutella,
Gossip, and
Kazaa are examples of unstructured P2P protocols). Because there is no structure globally imposed upon them, unstructured networks are easy to build and allow for localized optimizations to different regions of the overlay. Also, because the role of all peers in the network is the same, unstructured networks are highly robust in the face of high rates of "churn"—that is, when large numbers of peers are frequently joining and leaving the network. However, the primary limitations of unstructured networks also arise from this lack of structure. In particular, when a peer wants to find a desired piece of data in the network, the search query must be flooded through the network to find as many peers as possible that share the data. Flooding causes a very high amount of signaling traffic in the network, uses more
CPU/memory (by requiring every peer to process all search queries), and does not ensure that search queries will always be resolved. Furthermore, since there is no correlation between a peer and the content managed by it, there is no guarantee that flooding will find a peer that has the desired data. Popular content is likely to be available at several peers and any peer searching for it is likely to find the same thing. But if a peer is looking for rare data shared by only a few other peers, then it is highly unlikely that the search will be successful.
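The flooding mechanism can be illustrated with a short sketch. The example below is purely illustrative and not modeled on any particular protocol; the Peer class, the flood_search function, and the default TTL of 4 hops are assumptions made for the example.

```python
# Illustrative sketch of TTL-limited query flooding in an unstructured overlay.
# Not modeled on any specific protocol; all names here are hypothetical.

class Peer:
    def __init__(self, peer_id: str, shared_files: set):
        self.peer_id = peer_id
        self.shared_files = shared_files
        self.neighbors = []  # randomly formed overlay links to other Peer objects


def flood_search(origin: Peer, filename: str, ttl: int = 4) -> set:
    """Return ids of peers sharing `filename` that are reachable within `ttl` hops."""
    hits = set()
    visited = {origin.peer_id}              # suppress duplicate processing of the query
    frontier = [origin]
    while frontier and ttl > 0:
        next_frontier = []
        for peer in frontier:
            for neighbor in peer.neighbors:  # the query is forwarded to every neighbor
                if neighbor.peer_id in visited:
                    continue
                visited.add(neighbor.peer_id)
                if filename in neighbor.shared_files:
                    hits.add(neighbor.peer_id)
                next_frontier.append(neighbor)
        frontier = next_frontier
        ttl -= 1                             # the query expires after a fixed hop count
    return hits
```

Every peer reached must process the query whether or not it holds the file, which is the signaling and CPU cost described above, and a rare file held only by peers beyond the TTL horizon is simply never found.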
Structured networks In structured peer-to-peer networks the overlay is organized into a specific topology, and the protocol ensures that any node can efficiently search the network for a file or resource, even if that resource is extremely rare. The most common type of structured P2P network implements a distributed hash table (DHT) to identify and locate nodes and resources, in which a variant of consistent hashing is used to assign ownership of each file to a particular peer. This enables peers to search for resources on the network using a
hash table: that is, (
key,
value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key. However, in order to route traffic efficiently through the network, nodes in a structured overlay must maintain lists of neighbors that satisfy specific criteria. This makes them less robust in networks with a high rate of
churn (i.e. with large numbers of nodes frequently joining and leaving the network). More recent evaluations of P2P resource discovery solutions under real workloads have pointed out several issues in DHT-based solutions, such as the high cost of advertising/discovering resources and static and dynamic load imbalance. Notable distributed networks that use DHTs include
Tixati, an alternative to
BitTorrent's distributed tracker, the
Kad network, the
Storm botnet, and
YaCy. Some prominent research projects include the
Chord project,
Kademlia,
PAST storage utility,
P-Grid, a self-organized and emerging overlay network, and
CoopNet content distribution system. DHT-based networks have also been widely utilized for accomplishing efficient resource discovery for
grid computing systems, as they aid in resource management and scheduling of applications.
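The key-to-peer assignment that underlies a DHT can be sketched with consistent hashing on a circular identifier space, in the spirit of Chord-like designs. The ToyDHT class, the SHA-1-based identifier space, and the successor rule below are simplifying assumptions for illustration, not the specification of any particular DHT.

```python
# Illustrative consistent-hashing sketch: keys and nodes share one identifier
# ring, and each key is owned by the first node clockwise from its hash.
import hashlib
from bisect import bisect_left


def ring_id(name: str, bits: int = 160) -> int:
    """Map a node address or a key to a point on the identifier ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** bits)


class ToyDHT:
    def __init__(self, node_addresses):
        # Each node owns the keys between its predecessor's id and its own id.
        self.ring = sorted((ring_id(addr), addr) for addr in node_addresses)

    def responsible_node(self, key: str) -> str:
        """Successor rule: the first node at or after hash(key), wrapping around."""
        ids = [node_id for node_id, _ in self.ring]
        index = bisect_left(ids, ring_id(key)) % len(self.ring)
        return self.ring[index][1]


dht = ToyDHT(["node-a:4000", "node-b:4000", "node-c:4000"])
print(dht.responsible_node("example-file.iso"))  # deterministic owner of this key
```

Because only the keys adjacent to a joining or leaving node change ownership, consistent hashing limits how much data must be reassigned; the cost under churn comes instead from keeping each node's neighbor lists accurate enough for routing.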
Hybrid models Hybrid models are a combination of peer-to-peer and
client–server models. A common hybrid model is to have a central server that helps peers find each other.
Spotify was an example of a hybrid model until 2014. There are a variety of hybrid models, all of which make trade-offs between the centralized functionality provided by a structured server/client network and the node equality afforded by pure unstructured peer-to-peer networks. Currently, hybrid models have better performance than either pure unstructured networks or pure structured networks because certain functions, such as searching, require centralized functionality but benefit from the decentralized aggregation of nodes provided by unstructured networks.
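A minimal sketch of the "central server helps peers find each other" pattern is shown below. The RendezvousServer class and its method names are hypothetical and for illustration only: the server keeps just an index of which peers hold which content, while the content itself is exchanged peer-to-peer.

```python
# Hypothetical sketch of a central rendezvous/index server in a hybrid model.
# Only peer discovery is centralized; data transfer remains peer-to-peer.
from collections import defaultdict


class RendezvousServer:
    def __init__(self):
        self.index = defaultdict(set)  # content id -> set of peer addresses

    def announce(self, content_id: str, peer_addr: str) -> None:
        """A peer registers that it can serve `content_id`."""
        self.index[content_id].add(peer_addr)

    def lookup(self, content_id: str) -> list:
        """Return peers to download from; the download itself never touches the server."""
        return sorted(self.index[content_id])


server = RendezvousServer()
server.announce("album-123", "198.51.100.7:6881")
server.announce("album-123", "203.0.113.5:6881")
print(server.lookup("album-123"))   # the requesting peer then connects to these peers
```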
CoopNet content distribution system CoopNet (Cooperative Networking) was a system for off-loading serving to peers who have recently
downloaded content, proposed by computer scientists Venkata N. Padmanabhan and Kunwadee Sripanidkulchai, working at
Microsoft Research and
Carnegie Mellon University. When a
server experiences an increase in load it redirects incoming peers to other peers who have agreed to
mirror the content, thus off-loading load from the server. All of the information is retained at the server. This system makes use of the fact that the bottleneck is more likely to be in the outgoing bandwidth than in the
CPU, hence its server-centric design. It assigns peers to other peers who are 'close in
IP' (same prefix range) in an attempt to use locality. If multiple peers are found with the same file, the node is directed to choose the fastest of those neighbors.
Streaming media is transmitted by having clients
cache the previous stream, and then transmit it piece-wise to new nodes.
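The redirection behaviour described above can be sketched as follows. The CoopNetServer class, the fixed capacity model, and the shared-IP-prefix heuristic are simplifying assumptions for illustration, not the published CoopNet design.

```python
# Hedged sketch of CoopNet-style redirection: an overloaded server redirects a
# client to a recent downloader whose IP prefix is closest to the client's.
import ipaddress


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading bit prefix of two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


class CoopNetServer:
    def __init__(self, capacity: int):
        self.capacity = capacity                 # requests the server serves itself
        self.active = 0
        self.mirrors = {}                        # filename -> list of mirror peer IPs

    def request(self, filename: str, client_ip: str) -> str:
        if self.active < self.capacity:
            self.active += 1
            # Peers that downloaded from the server may later act as mirrors.
            self.mirrors.setdefault(filename, []).append(client_ip)
            return "served by the origin server"
        candidates = self.mirrors.get(filename, [])
        if not candidates:
            return "no mirror available"
        # Redirect to the mirror closest in IP prefix, to exploit locality.
        best = max(candidates, key=lambda ip: shared_prefix_len(ip, client_ip))
        return f"redirected to peer {best}"
```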
Security and trust Peer-to-peer systems pose unique challenges from a
computer security perspective. Like any other form of
software, P2P applications can contain
vulnerabilities. What makes this particularly dangerous for P2P software, however, is that peer-to-peer applications act as servers as well as clients, meaning that they can be more vulnerable to
remote exploits.
Routing attacks Since each node plays a role in routing traffic through the network, malicious users can perform a variety of "routing attacks", or
denial of service attacks. Examples of common routing attacks include "incorrect lookup routing", whereby malicious nodes deliberately forward requests incorrectly or return false results; "incorrect routing updates", where malicious nodes corrupt the routing tables of neighboring nodes by sending them false information; and "incorrect routing network partition", where newly joining nodes bootstrap via a malicious node that places them in a partition of the network populated by other malicious nodes. Studies analyzing the spread of malware on P2P networks found, for example, that 63% of the answered download requests on the
gnutella network contained some form of malware, whereas only 3% of the content on
OpenFT contained malware. In both cases, the top three most common types of malware accounted for the large majority of cases (99% in gnutella, and 65% in OpenFT). Another study analyzing traffic on the
Kazaa network found that 15% of the 500,000 file sample taken were infected by one or more of the 365 different
computer viruses that were tested for. Corrupted data can also be distributed on P2P networks by modifying files that are already being shared on the network. For example, on the
FastTrack network, the
RIAA managed to introduce faked chunks into downloads and downloaded files (mostly
MP3 files). Files infected with the RIAA virus were unusable afterwards and contained malicious code. The RIAA is also known to have uploaded fake music and movies to P2P networks in order to deter illegal file sharing. Consequently, the P2P networks of today have seen an enormous increase in their security and file verification mechanisms. Modern
hashing,
chunk verification and different encryption methods have made most networks resistant to almost any type of attack, even when major parts of the respective network have been replaced by faked or nonfunctional hosts.
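Chunk-level verification of this kind can be sketched as follows, assuming a list of trusted per-chunk digests obtained out of band (for example, from a signed metainfo file). The helper names, the chunk size, and the choice of SHA-256 are assumptions made for the example.

```python
# Sketch of per-chunk hash verification: a chunk received from an untrusted
# peer is kept only if its digest matches the trusted digest for that index.
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 KiB chunks; an arbitrary choice for the example


def chunk_digests(data: bytes) -> list:
    """Trusted per-chunk digests, normally distributed out of band."""
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]


def verify_chunk(index: int, chunk: bytes, trusted: list) -> bool:
    """Accept a downloaded chunk only if it matches the trusted digest."""
    return hashlib.sha256(chunk).hexdigest() == trusted[index]


# A failed check means the chunk is discarded and re-requested from another
# peer, so faked chunks never enter the locally assembled file.
```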
Resilient and scalable computer networks The decentralized nature of P2P networks increases robustness because it removes the
single point of failure that can be inherent in a client–server based system. As nodes arrive and demand on the system increases, the total capacity of the system also increases, and the likelihood of failure decreases. If one peer on the network fails to function properly, the whole network is not compromised or damaged. In contrast, in a typical client–server architecture, clients share only their demands with the system, but not their resources. In this case, as more clients join the system, fewer resources are available to serve each client, and if the central server fails, the entire network is taken down.
Distributed storage and search YaCy, a free distributed search engine, runs on a peer-to-peer network instead of making requests to centralized index servers. There are both advantages and disadvantages in P2P networks related to the topic of data
backup, recovery, and availability. In a centralized network, the system administrators are the only forces controlling the availability of files being shared. If the administrators decide to no longer distribute a file, they simply have to remove it from their servers, and it will no longer be available to users. Along with leaving the users powerless in deciding what is distributed throughout the community, this makes the entire system vulnerable to threats and requests from the government and other large forces. For example,
YouTube has been pressured by the
RIAA,
MPAA, and the entertainment industry to filter out copyrighted content. Although server-client networks are able to monitor and manage content availability, they can have more stability in the availability of the content they choose to host. A client should not have trouble accessing obscure content that is being shared on a stable centralized network. P2P networks, however, are more unreliable in sharing unpopular files because sharing files in a P2P network requires that at least one node in the network has the requested data, and that node must be able to connect to the node requesting the data. This requirement is occasionally hard to meet because users may delete or stop sharing data at any point. In a P2P network, the community of users is entirely responsible for deciding which content is available. Unpopular files eventually disappear and become unavailable as fewer people share them. Popular files, however, are highly and easily distributed. Popular files on a P2P network are more stable and available than files on central networks. In a centralized network, a simple loss of connection between the server and clients can cause a failure, but in P2P networks, the connections between every node must be lost to cause a data-sharing failure. In a centralized system, the administrators are responsible for all data recovery and backups, while in P2P systems, each node requires its own backup system. Because of the lack of central authority in P2P networks, forces such as the recording industry,
RIAA,
MPAA, and the government are unable to delete or stop the sharing of content on P2P systems.

==Applications==