Supplementary Materials for Distributed Processing
Supplementary Materials for Distributed Processing
The materials in this page are solely used for teaching purpose and provided as is. Use at your own risk. You're welcomed to use it for non-commercial purposes.- 資料處理的演進
- 分散式處理的特性
- 分散式處理的目標
- 何謂通透性?
- 分散式處理的優缺點
- 分散式處理的架構
- Distributed Processing vs. Parallel Processing
- Tightly coupled vs. loosely coupled
- 常見的分散式應用系統
- socket and RPC
- Challenges
資料處理的演進
- In the 1970s, data or file processing with minicomputers. Problems are
- consistency.
- duplication.
- In the 1980s, database processing with mainframes. However, this architecture has a serious limitation: centralization (bottleneck and availability). To overcome the problem, and due to the following reasons,
- cheap and fast personal computers,
- friendly of GUI (graphical user interface),
- popularity of the Internet,
- ARPANET (sharing thru telnet/ftp/email)
- Gopher, WAIS
- World Wide Web (HTML by Tim Berners-Lee)
- globalization.
- integration.
- 分散式處理 (distributed processing).
- distributed object technology (COM/DCOM, CORBA, RMI, etc.)
- web services
- mobile and distributed processing??
- clustering
-- Abraham Kang, JavaWorld
A cluster is a group of application servers that transparently run
your J2EE application as if it were a single entity. To scale, you
should include additional machines within the cluster. To minimize
downtime, make sure every component of the cluster is redundant.
- grid computing??
-- Vladimir Silva, IBM developerWorks
Grids are environments that enable software applications to integrate
instruments, displays, computational and information resources that are
managed by diverse organizations in widespread locations. Grid
computing is all about sharing resources that are located in different
places based on different architectures and belonging to different
management domains. Computer Grids will create a powerful pool of
computing resources capable of running the most demanding scientific
and engineering applications required by researchers and businesses
today.
Based on IBM's observation, business utilizing the Internet technology is natually going through three stages: (Alfredo Guitierrez, "e-business on demand: A developer's roadmap", IBM developerWorks, 02/17/2002.)
- Access: Enable transactions against core business systems using simple Web publishing and point solutions.
- Enterprise integration: Use the Web to integrate business processes across enterprises. Link internal and external systems, both across enterprises and beyond enterprise boundaries.
- e-business on demand: Use the Web to adapt dynamically to customer and market requirements. Change business models. Combine people, technologies, and processes in new ways.
分散式處理的特性
- 多於一部的處理器(可為個人電腦、工作站、mainframe、超級/平行電腦等),並可集中於一處或者分散在各地。
- 資料是分散的
- process logic 是分散的
- control 是分散的
- 共同完成所指定的工作
- 資源是共享的
分散式處理的目標
The goal is to permit the user, application programmer (and programs), and administrator to deal with a collection of computers very much as if the set of machines were a single computer; the so-called single system image. This objective is usually accomplished through the followings:- move data and processing functions closer to the users that need those services, thereby to improve the system's responsiveness and reliability.
- data or processing are to be accompllished by another node is to make transparent to the system user. Transparency 中文翻譯為 「通透性」,在理想的分散式系統中, transparency 指的是
Transparency is defined as the concealent of separation from the user and the application programmers (並不一定包含系統管理員), so that the system is perceived as a whole rather than as a collection of independent components.
何謂通透性?
ANSA and ISO RM-ODP 定義了八種形式的通透性,而 Coulouris et. al. 改寫成下列 定義:- Access transparency enables local and remote files and other objects to be accessed using identical operations.
- Location transparency enables objects to be accessed without knowledge of their location.
- Concurrency transparency enables several processes to operate concurrently on shared data without interference between them.
- Replication transparency enables multiple instances of files and other data to be used to increase reliability and performance without knowledge of the replcas by users or application programs.
- Failure transparency enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components. A good example is email failure.
- Mobility transparency allows the movement of resources and clients within a system without affecting the operation of users or programs. A good example is mobile phone users.
- Performance transparency allows the system to be reconfigured to improve performance as loads vary.
- Scaling transparency allows the system and applications to expand in scale without change to the system structure or the application algorithms.
分散式處理的優缺點
- 優點:
- reliable: if the large, centralized system fails, the entire system fails. On the other hand, in the distributed approach, only one node fails.
- 沒有 performance 上的瓶頸
- 由於 locality,可將系統管理分散(當然整合又是一個問題)
- upgrade 比較容易:大系統會整體受到影響,分散式系統卻只有部分受到影響。
- 缺點:
- multiple-node transactions are slower
- contention and deadlock: user A locks resource X and is trying to access resource Y, while user B locks resource Y and is trying to access resource X.
- potential for failure: since communication line is slow, this increases the probablity of failure.
分散式處理的架構
常見的分散式處理架構有:- client-server (2 tier;主從式架構): thin or fat client.
- 3 tier or N tier(多層式架構): a service is provided by multiple servers.
- variations on the client-server model
- mobile code (for exampe: Java applets)
- mobile agent: a running program (including code and data) that travels from one computer to another in a network carrying out a task on someone's behalf, such as collecting information, evetually returning with the results.
- mobile devices and spontaneous networking
- Spontaneous networking is used to encompass applications that involve the connection of both mobile and non-mobile devices to networks in a more informal manner. (IMHO, it is also called pervasive computing) The key features are (1) easy connection to a local network and (2) easy integration with local services. For example, there is a user with a digital camera and MP3 player. When s/he steps into a conference room, the user can easily pipe the MP3 music to the conference room's speaker and re-play the images in her/his digital camera through a (wireless) network.
- P2P. The central component of the P2P application is the resource. A resource can be anything addressable -- a filesystem, a phone book, a database, or a directory.
- Features
- Direct interactions between peers.
- The number of peers is large and the number of different roles is small.
- Discovery: individual peers pop in and out of existence.
- explicit point-to-point configuration
- dynamic discovery models:
- the directory service model
- the network model: No single peer knows the structure of the entire network or the identity of every peer participating in the network. Instead, peers know only of the neighbor peers.
- the multicast model: The sender does not need to know how many receivers exist or any exist at all. All clients that are tuned to the proper channel (a combination of special IP address and port number) will receive a copy of the message. Pro: easy; Cons: complicated when routing multicast traffic across the subnets.
- Examples
- USENET, 1979, by Tom Truscott and Jim Ellis, exchange files.
- FidoNet, 1984, by Tom Jinnings, exchange messages between BBS systems.
- ICQ
- Gnutella Network
- Try out Todd's very simple P2P programs.
- Features
Distributed Processing vs. Parallel Processing
- 為什麼要平行處理 (parallel processing)?
- 常見的平行處理架構:
- multiprocessors
- multicomputers
- parallel computer and virtual parallel computer (PVM, parallel virtual machine; MPI, message passing interfaces; Linda).
Tightly coupled vs. loosely coupled
- tightly coupled: integrated a number of processors into an integrated hardware system under the control of a single OS. (一般來說,其特色為 shared memory and high-speed communication links.)
- loosely coupled: shared resources are provided by some of the computers and are accessed by system software that run in all of the computers, using network to coordinate their work and to transfer data between them. (一般來說,其特色為 message passing (or non-shared memory) and slower communication links.)
- Pros and cons: (source: Bloor Research NA - May, 2002)
- Tight coupling is comparatively cumbersome (but is inherently reliable, secure, and tunable).
- Loose coupling provides benefits such as dynamic lookup and heterogeneous, cross-platform interoperability (but may require an organization find and integrate supplemental software for security, reliability, manageability, and other mission-critical purposes).

Source: Bloor Research NA - May, 2002
常見的分散式應用系統
- NFS (Network File System): distributed file systems
- NIS/NIS+ (or Yellow Page)
- DNS (Domain Name System)
- Proxy Server
socket and RPC
既然在分散式處理的環境中有多於一部以上的電腦,那麼電腦和電腦之間 要如何溝通呢?況且每一部電腦上,大多同時都有一個以上的 process 在執行中,我們要如何決定呢?一般來說,要達到 process 與 process 之間能夠溝通,必須借由 Inter-Process Communication (IPC) 的機制。 而要能達到 IPC 的其中一個最基本的方式就是使用 BSD socket(另一個 是 System V Transport Layer Interface)。想要深入了解 IPC 請去修系上開的「網路程式設計」, 在此我們只用一個簡單的範例說明。這個範例是從 "Interprocess Communication in the UNIX 4.2BSD Environment" by Chung-Ta King and Lionel M. Ni 的文章修改而來。
- 在 sun20.im.cyut.edu.tw 上執行 daemon。(原始碼 daemon.c)記錄 socket port number, 假設是 1111.
- 在 mail.cyut.edu.tw 上執行 client sun20.im.cyut.edu.tw 1111 (原始碼 client.c)
- 反過來執行也可以。
- 討論 www, telnet, ftp 等程式是如何達成的。
- blocking vs. non-blocking (synchronous vs. asynchronous)

Source: George Coulouris, Jean Dollimore, and Tim Kindberg, Distributed Systems: Concepts and Design, 3rd Edition, Addison-Wesley, 2001.
RPC 的基本架構如上圖所示(這個圖以及下列的範例程式取至 George Coulouris, Jean Dollimore, and Tim Kindberg, Distributed Systems: Concepts and Design, 3rd Edition, Addison-Wesley, 2001)。 若以 Sun RPC 為例,程式開發人員只需要寫出 client program、service procedure、以及 XDR (eXternal Data Representation) 的介面定義程式。 在此我們利用 Coulouris et. al. 書上的範例作說明,假設在 service procedure 提供了 read_2 的服務, 這個服務會傳回 "RPC is amazing" 的字串給呼叫他的 client program。 而要完成這項需求我們需要
- Sun XDR (an interface definition language):Sun XDR 是以 program number 以及 version number 來判斷呼叫哪一個介面。在這個 範例中,program number 是 9999 而 version number 是 2。
/*
* FileReadWrite service interface definition in file FileReadWrite.x
*/
const MAX = 1000;
typedef int FileIdentifier;
typedef int FilePointer;
typedef int Length;
struct Data {
int length;
char buffer[MAX];
};
struct writeargs {
FileIdentifier f;
FilePointer position;
Data data;
};
struct readargs {
FileIdentifier f;
FilePointer position;
Length length;
};
program FILEREADWRITE {
version VERSION {
void WRITE(writeargs)=1;
Data READ()=2;
}=2;
} = 9999; - client program
/* File : C.c - Simple client of the ReadWrite service. */
#include
#include
#include "FileReadWrite.h"
main(int argc, char ** argv)
{
CLIENT *clientHandle;
char *serverName;
Data *data;
if(argc != 2) {
printf("usage: %s hostname\n", argv[0]);
exit(1);
}
serverName = argv[1];
clientHandle= clnt_create(serverName, FILEREADWRITE,
VERSION, "udp"); /* creates socket and a client handle*/
if (clientHandle==NULL){
clnt_pcreateerror(serverName); /* unable to contact server */
exit(1);
}
data = read_2(clientHandle); /* call to remote read procedure */
/* print out the received data */
if(data != NULL)
printf("%s\n", data->buffer);
else
printf("Received nothing from RPC\n");
clnt_destroy(clientHandle); /* closes socket */
} - service procedure: 注意,這個程式裡面並沒有 main() 函數。
/* File S.c - server procedures for the FileReadWrite service */
#include
#include
#include
#include "FileReadWrite.h"
void * write_2(writeargs *a)
{
/* do the writing to the file */
printf("write_2 is called\n");
}
Data * read_2()
{
static Data result; /* must be static */
strcpy(result.buffer, "RPC is amazing"); /* do the reading from the file */
result.length = 15; /* amount read from the file */
return &result;
}
- 執行 rpcgen FileReadWrite.x 以產生 header file (FileReadWrite.h), client stub procedure (FileReadWrite_clnt.c), server stub procedure (FileReadWrite_svc.c, 包含 main()), 以及 communication module (FileReadWrite_xdr.c).
- compile client program gcc -o C C.c FileReadWrite_clnt.c FileReadWrite_xdr.c -lnsl
- compile service procedure gcc -o S S.c FileReadWrite_svr.c FileReadWrite_xdr.c -lnsl
Challenges
The challenges arising from the construction of distributed systems are- the heterogeneity of components (both hardware and software)
- standards:
- may take months or even years to complete
- will there be one universal standard? Beta vs. VHS, DVD-R/W vs. DVD+R/W vs. DVD-RAM, ebXML vs. BizTalk
- open systems
- standards:
- openness: allows components to be added or replaced.
- the key interfaces are published.
- security
- Encryption
- Trust: authentication and authorization.
- scalability: the ability to work well when the number of users increases.
- failure handling: if the completion of one transaction involves more than two database servers? Techniques for dealing failures:
- detecting failures
- masking failures: some failures that have been detected can be hidden or made less severe. For example, re-transmit packets whenever they were corrupted.
- tolerating failures: for example, connection failure --> re-try automatically or have user to decide.
- recovery from failures: for example, roll back.
- redundancy: for example, backup DNS or database replication.
- concurrency of components:
- database: concurrency control.
- problems with caching and replication.
- version control on programs or documents: projects such as CVS or WebDAV.
- transparency
Last Updated: Friday, 02-May-2003 09:13:10 CST
Written by: Eric Jui-Lin Lu