Tech Blog

Wednesday, February 22, 2006

Identifying unused indexes in Oracle9i

PURPOSE
-------

The purpose of this document is to explain how to find unused indexes
using the new feature in Oracle9: "Identifying Unused Indexes" via
ALTER INDEX MONITORING USAGE, as mentioned in Oracle9i Database
Administrator's Guide, Chapter 11.

The clause MONITORING / NOMONITORING USAGE is useful in determining
whether an index is being used.

SCOPE & APPLICATION
-------------------

This article is intended for database Administrators who want to
identify unused indexes in their database.

IDENTIFYING UNUSED INDEXES
--------------------------

You can find indexes that are not being used by using the ALTER INDEX
MONITORING USAGE functionality over a period of time that is
representative of your workload.

PART 1 will demonstrate the new feature using a simple example.

PART 2 will give a detailed instruction how to identify all unused
indexes in the database.

PART 1 - Monitoring usage of indexes - a simple example
---------------------------------------------------------

To demonstrate the new feature, you can use the following example:
(a) Create and populate a small test table
(b) Create Primary Key index on that table
(c) Query v$object_usage: the monitoring has not started yet
(d) Start monitoring of the index usage
(e) Query v$object_usage to see the monitoring in progress
(f) Issue the SELECT statement which uses the index
(g) Query v$object_usage again to see that the index has been used
(h) Stop monitoring of the index usage
(i) Query v$object_usage to see that the monitoring stopped

Detailed steps:

(a) Create and populate a small test table

create table products
(prod_id number(3),
prod_name_code varchar2(5));

insert into products values(1,'aaaaa');
insert into products values(2,'bbbbb');
insert into products values(3,'ccccc');
insert into products values(4,'ddddd');
commit;

(b) Create Primary Key index on that table

alter table products
add (constraint products_pk primary key (prod_id));

(c) Query v$object_usage: the monitoring has not started yet

column index_name format a12
column monitoring format a10
column used format a4
column start_monitoring format a19
column end_monitoring format a19
select index_name,monitoring,used,start_monitoring,end_monitoring
from v$object_usage;

no rows selected

(d) Start monitoring of the index usage

alter index products_pk monitoring usage;

Index altered.

(e) Query v$object_usage to see the monitoring in progress

select index_name,monitoring,used,start_monitoring,end_monitoring
from v$object_usage;

INDEX_NAME MONITORING USED START_MONITORING END_MONITORING
------------ ---------- ---- ------------------- -------------------
PRODUCTS_PK YES NO 04/25/2001 15:43:13

Note: Column MONITORING='YES', START_MONITORING gives the timestamp.

(f) Issue the SELECT statement which uses the index

First, make sure that index will be used for this statement.
Create plan_table in your schema, as required by Oracle Autotrace
utility:

@$ORACLE_HOME/rdbms/admin/utlxplan

Table created.

Use Oracle Autotrace utility to obtain the execution plan:

set autotrace on explain
select * from products where prod_id = 2;
.
.
Execution Plan
------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'PRODUCTS'
2 1 INDEX (UNIQUE SCAN) OF 'PRODUCTS_PK' (UNIQUE)

set autotrace off

Now, since you know the index will be used for this query, issue the
actual SELECT statement:

select * from products where prod_id = 2;

PROD_ID PROD_
---------- -----
2 bbbbb

(g) Query v$object_usage again to see that the index has been used

select index_name,monitoring,used,start_monitoring,end_monitoring
from v$object_usage;

INDEX_NAME MONITORING USED START_MONITORING END_MONITORING
------------ ---------- ---- ------------------- -------------------
PRODUCTS_PK YES YES 04/25/2001 15:43:13

Note: Column USED='YES'.

(h) Stop monitoring of the index usage

alter index products_pk nomonitoring usage;

Index altered.

(i) Query v$object_usage to see that the monitoring stopped

select index_name,monitoring,used,start_monitoring,end_monitoring
from v$object_usage;

INDEX_NAME MONITORING USED START_MONITORING END_MONITORING
------------ ---------- ---- ------------------- -------------------
PRODUCTS_PK NO YES 04/25/2001 15:43:13 04/25/2001 15:48:44

Note: Column MONITORING='NO', END_MONITORING gives the timestamp.

PART 2 - How to identify all unused indexes in the database
-------------------------------------------------------------

To identify all unused indexes in the database, you can do the
following:
(a) Create a SQL script to start monitoring all indexes except those
owned by users SYS and SYSTEM
(b) Create another script to stop monitoring all indexes except those
owned by users SYS and SYSTEM
(c) Connect as a user with ALTER ANY INDEX system privilege and run
the start monitoring script
(d) Perform normal activities in your database
(e) After a period of time that is representative of your workload,
run the stop monitoring script
(f) Query v$object_usage to see what indexes have not been used

Detailed steps:

(a) Create a SQL script to start monitoring all indexes except those
owned by users SYS and SYSTEM

set heading off
set echo off
set feedback off
set pages 10000
spool startmonitor.sql
select 'alter index '||owner||'.'||index_name||' monitoring usage;'
from dba_indexes
where owner not in ('SYS','SYSTEM');
spool off

(b) Create another script to stop monitoring all indexes except those
owned by users SYS and SYSTEM

set heading off
set echo off
set feedback off
set pages 10000
spool stopmonitor.sql
select 'alter index '||owner||'.'||index_name||' nomonitoring usage;'
from dba_indexes
where owner not in ('SYS','SYSTEM');
spool off

(c) Connect as a user with ALTER ANY INDEX system privilege and run
the newly created script to start monitoring.

@startmonitor

(d) Perform normal activities in your database

(e) After a period of time that is representative of your workload,
connect as a user with ALTER ANY INDEX system privilege and run
the script to stop monitoring.

@stopmonitor

(f) Query v$object_usage in join with dba_indexes, to see what indexes
have not been used

select d.owner, v.index_name
from dba_indexes d, v$object_usage v
where v.used='NO' and d.index_name=v.index_name;

Tuesday, December 13, 2005

Ajax: A New Approach to Web Applications

by Jesse James Garrett
February 18, 2005

If anything about current interaction design can be called “glamorous,” it’s creating Web applications. After all, when was the last time you heard someone rave about the interaction design of a product that wasn’t on the Web? (Okay, besides the iPod.) All the cool, innovative new projects are online.

Despite this, Web interaction designers can’t help but feel a little envious of our colleagues who create desktop software. Desktop applications have a richness and responsiveness that has seemed out of reach on the Web. The same simplicity that enabled the Web’s rapid proliferation also creates a gap between the experiences we can provide and the experiences users can get from a desktop application.

That gap is closing. Take a look at Google Suggest. Watch the way the suggested terms update as you type, almost instantly. Now look at Google Maps. Zoom in. Use your cursor to grab the map and scroll around a bit. Again, everything happens almost instantly, with no waiting for pages to reload.

Google Suggest and Google Maps are two examples of a new approach to web applications that we at Adaptive Path have been calling Ajax. The name is shorthand for Asynchronous JavaScript + XML, and it represents a fundamental shift in what’s possible on the Web.

Defining Ajax

Ajax isn’t a technology. It’s really several technologies, each flourishing in its own right, coming together in powerful new ways. Ajax incorporates:

standards-based presentation using XHTML and CSS;
dynamic display and interaction using the Document Object Model;
data interchange and manipulation using XML and XSLT;
asynchronous data retrieval using XMLHttpRequest;
and JavaScript binding everything together.

The classic web application model works like this: Most user actions in the interface trigger an HTTP request back to a web server. The server does some processing — retrieving data, crunching numbers, talking to various legacy systems — and then returns an HTML page to the client. It’s a model adapted from the Web’s original use as a hypertext medium, but as fans of The Elements of User Experience know, what makes the Web good for hypertext doesn’t necessarily make it good for software applications.

Figure 1: The traditional model for web applications (left) compared to the Ajax model (right).

This approach makes a lot of technical sense, but it doesn’t make for a great user experience. While the server is doing its thing, what’s the user doing? That’s right, waiting. And at every step in a task, the user waits some more.

Obviously, if we were designing the Web from scratch for applications, we wouldn’t make users wait around. Once an interface is loaded, why should the user interaction come to a halt every time the application needs something from the server? In fact, why should the user see the application go to the server at all?

How Ajax is Different

An Ajax application eliminates the start-stop-start-stop nature of interaction on the Web by introducing an intermediary — an Ajax engine — between the user and the server. It seems like adding a layer to the application would make it less responsive, but the opposite is true.

Instead of loading a webpage, at the start of the session, the browser loads an Ajax engine — written in JavaScript and usually tucked away in a hidden frame. This engine is responsible for both rendering the interface the user sees and communicating with the server on the user’s behalf. The Ajax engine allows the user’s interaction with the application to happen asynchronously — independent of communication with the server. So the user is never staring at a blank browser window and an hourglass icon, waiting around for the server to do something.

Figure 2: The synchronous interaction pattern of a traditional web application (top) compared with the asynchronous pattern of an Ajax application (bottom).

Every user action that normally would generate an HTTP request takes the form of a JavaScript call to the Ajax engine instead. Any response to a user action that doesn’t require a trip back to the server — such as simple data validation, editing data in memory, and even some navigation — the engine handles on its own. If the engine needs something from the server in order to respond — if it’s submitting data for processing, loading additional interface code, or retrieving new data — the engine makes those requests asynchronously, usually using XML, without stalling a user’s interaction with the application.

Who’s Using Ajax

Google is making a huge investment in developing the Ajax approach. All of the major products Google has introduced over the last year — Orkut, Gmail, the latest beta version of Google Groups, Google Suggest, and Google Maps — are Ajax applications. (For more on the technical nuts and bolts of these Ajax implementations, check out these excellent analyses of Gmail, Google Suggest, and Google Maps.) Others are following suit: many of the features that people love in Flickr depend on Ajax, and Amazon’s A9.com search engine applies similar techniques.

These projects demonstrate that Ajax is not only technically sound, but also practical for real-world applications. This isn’t another technology that only works in a laboratory. And Ajax applications can be any size, from the very simple, single-function Google Suggest to the very complex and sophisticated Google Maps.

At Adaptive Path, we’ve been doing our own work with Ajax over the last several months, and we’re realizing we’ve only scratched the surface of the rich interaction and responsiveness that Ajax applications can provide. Ajax is an important development for Web applications, and its importance is only going to grow. And because there are so many developers out there who already know how to use these technologies, we expect to see many more organizations following Google’s lead in reaping the competitive advantage Ajax provides.

Moving Forward

The biggest challenges in creating Ajax applications are not technical. The core Ajax technologies are mature, stable, and well understood. Instead, the challenges are for the designers of these applications: to forget what we think we know about the limitations of the Web, and begin to imagine a wider, richer range of possibilities.

It’s going to be fun.

Ajax Q&A

March 13, 2005: Since we first published Jesse’s essay, we’ve received an enormous amount of correspondence from readers with questions about Ajax. In this Q&A, Jesse responds to some of the most common queries.

Q. Did Adaptive Path invent Ajax? Did Google? Did Adaptive Path help build Google’s Ajax applications?

A. Neither Adaptive Path nor Google invented Ajax. Google’s recent products are simply the highest-profile examples of Ajax applications. Adaptive Path was not involved in the development of Google’s Ajax applications, but we have been doing Ajax work for some of our other clients.

Q. Is Adaptive Path selling Ajax components or trademarking the name? Where can I download it?

A. Ajax isn’t something you can download. It’s an approach — a way of thinking about the architecture of web applications using certain technologies. Neither the Ajax name nor the approach are proprietary to Adaptive Path.

Q. Is Ajax just another name for XMLHttpRequest?

A. No. XMLHttpRequest is only part of the Ajax equation. XMLHttpRequest is the technical component that makes the asynchronous server communication possible; Ajax is our name for the overall approach described in the article, which relies not only on XMLHttpRequest, but on CSS, DOM, and other technologies.

Q. Why did you feel the need to give this a name?

A. I needed something shorter than “Asynchronous JavaScript+CSS+DOM+XMLHttpRequest” to use when discussing this approach with clients.

Q. Techniques for asynchronous server communication have been around for years. What makes Ajax a “new” approach?

A. What’s new is the prominent use of these techniques in real-world applications to change the fundamental interaction model of the Web. Ajax is taking hold now because these technologies and the industry’s understanding of how to deploy them most effectively have taken time to develop.

Q. Is Ajax a technology platform or is it an architectural style?

A. It’s both. Ajax is a set of technologies being used together in a particular way.

Q. What kinds of applications is Ajax best suited for?

A. We don’t know yet. Because this is a relatively new approach, our understanding of where Ajax can best be applied is still in its infancy. Sometimes the traditional web application model is the most appropriate solution to a problem.

Q. Does this mean Adaptive Path is anti-Flash?

A. Not at all. Macromedia is an Adaptive Path client, and we’ve long been supporters of Flash technology. As Ajax matures, we expect that sometimes Ajax will be the better solution to a particular problem, and sometimes Flash will be the better solution. We’re also interested in exploring ways the technologies can be mixed (as in the case of Flickr, which uses both).

Q. Does Ajax have significant accessibility or browser compatibility limitations? Do Ajax applications break the back button? Is Ajax compatible with REST? Are there security considerations with Ajax development? Can Ajax applications be made to work for users who have JavaScript turned off?

A. The answer to all of these questions is “maybe”. Many developers are already working on ways to address these concerns. We think there’s more work to be done to determine all the limitations of Ajax, and we expect the Ajax development community to uncover more issues like these along the way.

Q. Some of the Google examples you cite don’t use XML at all. Do I have to use XML and/or XSLT in an Ajax application?

A. No. XML is the most fully-developed means of getting data in and out of an Ajax client, but there’s no reason you couldn’t accomplish the same effects using a technology like JavaScript Object Notation or any similar means of structuring data for interchange.

Q. Are Ajax applications easier to develop than traditional web applications?

A. Not necessarily. Ajax applications inevitably involve running complex JavaScript code on the client. Making that complex code efficient and bug-free is not a task to be taken lightly, and better development tools and frameworks will be needed to help us meet that challenge.

Q. Do Ajax applications always deliver a better experience than traditional web applications?

A. Not necessarily. Ajax gives interaction designers more flexibility. However, the more power we have, the more caution we must use in exercising it. We must be careful to use Ajax to enhance the user experience of our applications, not degrade it.

Wednesday, October 05, 2005

Supplementary Materials for Distributed Processing

The materials in this page are solely used for teaching purpose and provided as is. Use at your own risk. You're welcomed to use it for non-commercial purposes.

Table of Content

資料處理的演進

In the 1970s, data or file processing with minicomputers. Problems are
- consistency.
- duplication.
In the 1980s, database processing with mainframes. However, this architecture has a serious limitation: centralization (bottleneck and availability). To overcome the problem, and due to the following reasons,
- cheap and fast personal computers,
- friendly of GUI (graphical user interface),
- popularity of the Internet,
  1. ARPANET (sharing thru telnet/ftp/email)
  2. Gopher, WAIS
  3. World Wide Web (HTML by Tim Berners-Lee)
- globalization.
- integration.
Newer database management systems are RDBMS, object-oriented DB, object-relational DB, distributed DB, multimedia DB, parallel DB, etc.

分散式處理 (distributed processing).

distributed object technology (COM/DCOM, CORBA, RMI, etc.)
web services
mobile and distributed processing??

clustering

    -- Abraham Kang, JavaWorld

  A cluster is a group of application servers that transparently run
  your J2EE application as if it were a single entity. To scale, you
  should include additional machines within the cluster. To minimize
  downtime, make sure every component of the cluster is redundant.

grid computing??

    -- Vladimir Silva, IBM developerWorks

  Grids are environments that enable software applications to integrate
  instruments, displays, computational and information resources that are
  managed by diverse organizations in widespread locations. Grid
  computing is all about sharing resources that are located in different
  places based on different architectures and belonging to different
  management domains. Computer Grids will create a powerful pool of
  computing resources capable of running the most demanding scientific
  and engineering applications required by researchers and businesses
  today.

Based on IBM's observation, business utilizing the Internet technology is natually going through three stages: (Alfredo Guitierrez, "e-business on demand: A developer's roadmap", IBM developerWorks, 02/17/2002.)

Access: Enable transactions against core business systems using simple Web publishing and point solutions.
Enterprise integration: Use the Web to integrate business processes across enterprises. Link internal and external systems, both across enterprises and beyond enterprise boundaries.
e-business on demand: Use the Web to adapt dynamically to customer and market requirements. Change business models. Combine people, technologies, and processes in new ways.

分散式處理的特性

多於一部的處理器（可為個人電腦、工作站、mainframe、超級/平行電腦等），並可集中於一處或者分散在各地。
資料是分散的
process logic 是分散的
control 是分散的
共同完成所指定的工作
資源是共享的

分散式處理的目標

The goal is to permit the user, application programmer (and programs), and administrator to deal with a collection of computers very much as if the set of machines were a single computer; the so-called single system image. This objective is usually accomplished through the followings:

move data and processing functions closer to the users that need those services, thereby to improve the system's responsiveness and reliability.
data or processing are to be accompllished by another node is to make transparent to the system user. Transparency 中文翻譯為「通透性」，在理想的分散式系統中， transparency 指的是 Transparency is defined as the concealent of separation from the user and the application programmers (並不一定包含系統管理員), so that the system is perceived as a whole rather than as a collection of independent components.

為了達成這個通透性，目前已經有了部份的結果，如 distributed file systems, single sign-on, etc.

何謂通透性?

ANSA and ISO RM-ODP 定義了八種形式的通透性，而 Coulouris et. al. 改寫成下列定義：

Access transparency enables local and remote files and other objects to be accessed using identical operations.
Location transparency enables objects to be accessed without knowledge of their location.
Concurrency transparency enables several processes to operate concurrently on shared data without interference between them.
Replication transparency enables multiple instances of files and other data to be used to increase reliability and performance without knowledge of the replcas by users or application programs.
Failure transparency enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components. A good example is email failure.
Mobility transparency allows the movement of resources and clients within a system without affecting the operation of users or programs. A good example is mobile phone users.
Performance transparency allows the system to be reconfigured to improve performance as loads vary.
Scaling transparency allows the system and applications to expand in scale without change to the system structure or the application algorithms.

分散式處理的優缺點

reliable: if the large, centralized system fails, the entire system fails. On the other hand, in the distributed approach, only one node fails.
沒有 performance 上的瓶頸
由於 locality，可將系統管理分散（當然整合又是一個問題）
upgrade 比較容易：大系統會整體受到影響，分散式系統卻只有部分受到影響。

multiple-node transactions are slower
contention and deadlock: user A locks resource X and is trying to access resource Y, while user B locks resource Y and is trying to access resource X.
potential for failure: since communication line is slow, this increases the probablity of failure.

分散式處理的架構

常見的分散式處理架構有：

client-server (2 tier；主從式架構): thin or fat client.
3 tier or N tier（多層式架構）: a service is provided by multiple servers.
variations on the client-server model
- mobile code (for exampe: Java applets)
- mobile agent: a running program (including code and data) that travels from one computer to another in a network carrying out a task on someone's behalf, such as collecting information, evetually returning with the results.
- mobile devices and spontaneous networking
  - Spontaneous networking is used to encompass applications that involve the connection of both mobile and non-mobile devices to networks in a more informal manner. (IMHO, it is also called pervasive computing) The key features are (1) easy connection to a local network and (2) easy integration with local services. For example, there is a user with a digital camera and MP3 player. When s/he steps into a conference room, the user can easily pipe the MP3 music to the conference room's speaker and re-play the images in her/his digital camera through a (wireless) network.
P2P. The central component of the P2P application is the resource. A resource can be anything addressable -- a filesystem, a phone book, a database, or a directory.
- Features
  - Direct interactions between peers.
  - The number of peers is large and the number of different roles is small.
  - Discovery: individual peers pop in and out of existence.
    - explicit point-to-point configuration
    - dynamic discovery models:
      1. the directory service model
      2. the network model: No single peer knows the structure of the entire network or the identity of every peer participating in the network. Instead, peers know only of the neighbor peers.
      3. the multicast model: The sender does not need to know how many receivers exist or any exist at all. All clients that are tuned to the proper channel (a combination of special IP address and port number) will receive a copy of the message. Pro: easy; Cons: complicated when routing multicast traffic across the subnets.
- Examples
  - USENET, 1979, by Tom Truscott and Jim Ellis, exchange files.
  - FidoNet, 1984, by Tom Jinnings, exchange messages between BBS systems.
  - ICQ
  - Gnutella Network
- Try out Todd's very simple P2P programs.

Distributed Processing vs. Parallel Processing

為什麼要平行處理 (parallel processing)?
常見的平行處理架構：
- multiprocessors
- multicomputers
parallel computer and virtual parallel computer (PVM, parallel virtual machine; MPI, message passing interfaces; Linda).

Tightly coupled vs. loosely coupled

tightly coupled: integrated a number of processors into an integrated hardware system under the control of a single OS. (一般來說，其特色為 shared memory and high-speed communication links.)
loosely coupled: shared resources are provided by some of the computers and are accessed by system software that run in all of the computers, using network to coordinate their work and to transfer data between them. (一般來說，其特色為 message passing (or non-shared memory) and slower communication links.)
Pros and cons: (source: Bloor Research NA - May, 2002)
- Tight coupling is comparatively cumbersome (but is inherently reliable, secure, and tunable).
- Loose coupling provides benefits such as dynamic lookup and heterogeneous, cross-platform interoperability (but may require an organization find and integrate supplemental software for security, reliability, manageability, and other mission-critical purposes).

Figure. Tightly coupled versus loosely coupled application behavior
Figure 2. Tightly coupled versus loosely coupled application behavior

Source: Bloor Research NA - May, 2002

常見的分散式應用系統

NFS (Network File System): distributed file systems
NIS/NIS+ (or Yellow Page)
DNS (Domain Name System)
Proxy Server

socket and RPC

既然在分散式處理的環境中有多於一部以上的電腦，那麼電腦和電腦之間要如何溝通呢？況且每一部電腦上，大多同時都有一個以上的 process 在執行中，我們要如何決定呢？一般來說，要達到 process 與 process 之間能夠溝通，必須借由 Inter-Process Communication (IPC) 的機制。而要能達到 IPC 的其中一個最基本的方式就是使用 BSD socket（另一個是 System V Transport Layer Interface）。

想要深入了解 IPC 請去修系上開的「網路程式設計」，在此我們只用一個簡單的範例說明。這個範例是從 "Interprocess Communication in the UNIX 4.2BSD Environment" by Chung-Ta King and Lionel M. Ni 的文章修改而來。

在 sun20.im.cyut.edu.tw 上執行 daemon。（原始碼 daemon.c）記錄 socket port number, 假設是 1111.
在 mail.cyut.edu.tw 上執行 client sun20.im.cyut.edu.tw 1111 （原始碼 client.c）
反過來執行也可以。
討論 www, telnet, ftp 等程式是如何達成的。
blocking vs. non-blocking (synchronous vs. asynchronous)

稍微看一下 socket 的寫法，你會發現對於每一次連線，程式開發人員都需要花費不少精力去解決例外情形（例如連不上的時候要等多久、要不要 retry、要 retry 幾次等），非常的不方便。於是在 1984 年的時候， Birrell and Nelson 提出了 remote procedure call (RPC) 的想法，他們希望 RPC "as much like local procedure calls as possible, with no distinction in syntax between a local and a remote procedure call. All the necessary calls to marshalling and message-passing procedures were hidden from the programmer making the call."

Source: George Coulouris, Jean Dollimore, and Tim Kindberg, Distributed Systems: Concepts and Design, 3rd Edition, Addison-Wesley, 2001.

RPC 的基本架構如上圖所示（這個圖以及下列的範例程式取至 George Coulouris, Jean Dollimore, and Tim Kindberg, Distributed Systems: Concepts and Design, 3rd Edition, Addison-Wesley, 2001）。若以 Sun RPC 為例，程式開發人員只需要寫出 client program、service procedure、以及 XDR (eXternal Data Representation) 的介面定義程式。在此我們利用 Coulouris et. al. 書上的範例作說明，假設在 service procedure 提供了 read_2 的服務，這個服務會傳回 "RPC is amazing" 的字串給呼叫他的 client program。而要完成這項需求我們需要

Sun XDR (an interface definition language)：Sun XDR 是以 program number 以及 version number 來判斷呼叫哪一個介面。在這個範例中，program number 是 9999 而 version number 是 2。

/*
* FileReadWrite service interface definition in file FileReadWrite.x
*/

const MAX = 1000;
typedef int FileIdentifier;
typedef int FilePointer;
typedef int Length;

struct Data {
int length;
char buffer[MAX];
};

struct writeargs {
FileIdentifier f;
FilePointer position;
Data data;
};

struct readargs {
FileIdentifier f;
FilePointer position;
Length length;
};

program FILEREADWRITE {
version VERSION {
  void WRITE(writeargs)=1;
  Data READ()=2;
}=2;
} = 9999;

client program

/* File : C.c - Simple client of the ReadWrite service. */

#include 
#include 
#include "FileReadWrite.h"

main(int argc, char ** argv)
{
CLIENT *clientHandle;
char *serverName;
Data *data;

if(argc != 2) {
  printf("usage: %s hostname\n", argv[0]);
  exit(1);
}
serverName = argv[1];

clientHandle= clnt_create(serverName, FILEREADWRITE,
              VERSION, "udp"); /* creates socket and a client handle*/
if (clientHandle==NULL){
  clnt_pcreateerror(serverName); /* unable to contact server */
  exit(1);
}

data = read_2(clientHandle); /* call to remote read procedure */

/* print out the received data */
if(data != NULL)
  printf("%s\n", data->buffer);
else
  printf("Received nothing from RPC\n");

clnt_destroy(clientHandle); /* closes socket */
}

service procedure: 注意，這個程式裡面並沒有 main() 函數。

/* File S.c - server procedures for the FileReadWrite service */

#include 
#include 
#include 
#include "FileReadWrite.h"

void * write_2(writeargs *a)
{
/* do the writing to the file */
printf("write_2 is called\n");
}

Data * read_2()
{
static Data result; /* must be static */
strcpy(result.buffer, "RPC is amazing"); /* do the reading from the file */
result.length = 15;               /* amount read from the file */
return &result;
}

要執行以上的程式請在 Sun 工作站執行，下列執行的過程是以學校 mail 這部機器為準（SunOS 5.8）：

執行 rpcgen FileReadWrite.x 以產生 header file (FileReadWrite.h), client stub procedure (FileReadWrite_clnt.c), server stub procedure (FileReadWrite_svc.c, 包含 main()), 以及 communication module (FileReadWrite_xdr.c).
compile client program gcc -o C C.c FileReadWrite_clnt.c FileReadWrite_xdr.c -lnsl
compile service procedure gcc -o S S.c FileReadWrite_svr.c FileReadWrite_xdr.c -lnsl

Challenges

The challenges arising from the construction of distributed systems are

the heterogeneity of components (both hardware and software)
- standards:
  - may take months or even years to complete
  - will there be one universal standard? Beta vs. VHS, DVD-R/W vs. DVD+R/W vs. DVD-RAM, ebXML vs. BizTalk
- open systems
openness: allows components to be added or replaced.
- the key interfaces are published.
security
- Encryption
- Trust: authentication and authorization.
scalability: the ability to work well when the number of users increases.
failure handling: if the completion of one transaction involves more than two database servers? Techniques for dealing failures:
- detecting failures
- masking failures: some failures that have been detected can be hidden or made less severe. For example, re-transmit packets whenever they were corrupted.
- tolerating failures: for example, connection failure --> re-try automatically or have user to decide.
- recovery from failures: for example, roll back.
- redundancy: for example, backup DNS or database replication.
concurrency of components:
- database: concurrency control.
- problems with caching and replication.
- version control on programs or documents: projects such as CVS or WebDAV.
transparency

Last Updated: Friday, 02-May-2003 09:13:10 CST
Written by: Eric Jui-Lin Lu

Tuesday, June 21, 2005

Password Hashing

Password Hashing
by James McGlinn

In this article I'm going to cover password hashing, a subject which is often poorly understood by newer developers. Recently I've been asked to look at several web applications which all had the same security issue - user profiles stored in a database with plain text passwords. Password hashing is a way of encrypting a password before it's stored so that if your database gets into the wrong hands, the damage is limited. Hashing is nothing new - it's been in use in Unix system password files since long before my time, and quite probably in other systems long before that. In this article I'll explain what a hash is, why you want to use them instead of storing real passwords in your applications, and give you some examples of how to implement password hashing in PHP and MySQL.

Foreword
As you read on you'll see that I advocate the use of a hashing algorithm called Secure Hashing Algorithm 1 (or SHA-1). Since I wrote this article, a team of researchers - Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu - have shown SHA-1 to be weaker than was previously thought. This means that for certain purposes such as digital signatures, stronger algorithms like SHA-256 and SHA-512 are now being recommended. For generating password hashes, SHA-1 still provides a more than adequate level of security for most applications today. You should be aware of this issue however and begin to think about using stronger algorithms in your code as they become more readily available.

For more information please see Bruce Schneier's analysis of the issue at http://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html

What Is A Hash?
A hash (also called a hash code, digest, or message digest) can be thought of as the digital fingerprint of a piece of data. You can easily generate a fixed length hash for any text string using a one-way mathematical process. It is next to impossible to (efficiently) recover the original text from a hash alone. It is also vastly unlikely that any different text string will give you an identical hash - a 'hash collision'. These properties make hashes ideally suited for storing your application's passwords. Why? Because although an attacker may compromise a part of your system and reveal your list of password hashes, they can't determine from the hashes alone what the real passwords are.

So How Do I Authenticate Users?
We've established that it's incredibly difficult to recover the original password from a hash, so how will your application know if a user has entered the correct password or not? Quite simply - by generating a hash of the user-supplied password and comparing this 'fingerprint' with the hash stored in your user profile, you'll know whether or not the passwords match. Let's look at an example:

User Registration And Password Verification
During the registration process our new user will provide their desired password (preferably with verification and through a secure session). Using code similar to the following, we store their username and password hash in our database:

Figure 1. Our user enters their preferred access details

/* Store user details */

$passwordHash = sha1($_POST['password']);

$sql = 'INSERT INTO user (username,passwordHash) VALUES (?,?)';
$result = $db->query($sql, array($_POST['username'], $passwordHash));

?>The next time our user logs in, we check their access credentials using similar code as follows:

Figure 2. Logging back in

/* Check user details */

$passwordHash = sha1($_POST['password']);

$sql = 'SELECT username FROM user WHERE username = ? AND passwordHash = ?';
$result = $db->query($sql, array($_POST['username'], $passwordHash));
if ($result->numRows() < 1)
{
/* Access denied */
echo 'Sorry, your username or password was incorrect!';
}
else
{
/* Log user in */
printf('Welcome back %s!', $_POST['username']);
}

?>Types Of Hashes
There are a number of strong hashing algorithms in use, the most common of which are MD5 and SHA-1. Older systems - including many Linux variants - used Data Encryption Standard (DES) hashes. With only 56 bits this is no longer considered an acceptably strong hashing algorithm and should be avoided.

Examples
In PHP you can generate hashes using the md5() and sha1 functions. md5() returns a 128-bit hash (32 hexadecimal characters), whereas sha1() returns a 160-bit hash (40 hexadecimal characters). For example:

$string = 'PHP & Information Security';
printf("Original string: %s\n", $string);
printf("MD5 hash: %s\n", md5($string));
printf("SHA-1 hash: %s\n", sha1($string));

?>This code will output the following:

Original string: PHP & Information Security
MD5 hash: 88dd8f282721af2c704e238e7f338c41
SHA-1 hash: b47210605096b9aa0129f88695e229ce309dd362In MySQL you can generate hashes internally using the password(), md5(), or sha1 functions. password() is the function used for MySQL's own user authentication system. It returns a 16-byte string for MySQL versions prior to 4.1, and a 41-byte string (based on a double SHA-1 hash) for versions 4.1 and up. md5() is available from MySQL version 3.23.2 and sha1() was added later in 4.0.2.

mysql> select PASSWORD( 'PHP & Information Security' );
+------------------------------------------+
| PASSWORD( 'PHP & Information Security' ) |
+------------------------------------------+
| 379693e271cd3bd6 |
+------------------------------------------+
1 row in set (0.00 sec)

mysql> select MD5( 'PHP & Information Security' );
+-------------------------------------+
| MD5( 'PHP & Information Security' ) |
+-------------------------------------+
| 88dd8f282721af2c704e238e7f338c41 |
+-------------------------------------+
1 row in set (0.01 sec)Note: Using MySQL's password() function in your own applications isn't recommended - the algorithm used has changed over time and prior to 4.1 was particularly weak.

You may decide to use MySQL to calculate your hash rather than PHP. The example of storing our user's registration details from the previous section then becomes:

/* Store user details */

$sql = 'INSERT INTO user (username, passwordHash) VALUES (?, SHA1(?))';
$result = $db->query($sql, array($_POST['username'], $_POST['password']));

?>Weaknesses
As a security measure, storing only hashes of passwords in your database will ensure that an attacker's job is made that much more difficult. Let's look at the steps they'll now take in an effort to compromise your system. Assuming that they've managed to access your user database and list of hashes, there's no way that they can then recover the original passwords to your system. Or is there?

The attacker will be able to look at your hashes and immediately know that any accounts with the same password hash must therefore also have the same password. Not such a problem if neither of the account passwords is known - or is it? A common technique employed to recover the original plain text from a hash is cracking, otherwise known as 'brute forcing'. Using this methodology an attacker will generate hashes for numerous potential passwords (either generated randomly or from a source of potential words, for example a dictionary attack). The hashes generated are compared with those in your user database and any matches will reveal the password for the user in question.

Modern computer hardware can generate MD5 and SHA-1 hashes very quickly - in some cases at rates of thousands per second. Hashes can be generated for every word in an entire dictionary (possibly including alpha-numeric variants) well in advance of an attack. Whilst strong passwords and longer pass phrases provide a reasonable level of protection against such attacks, you cannot always guarantee that your users will be well informed about such practices. It's also less than ideal that the same password used on multiple accounts (or multiple systems for that matter) will reveal itself with an identical hash.

Making It Better
Both of these weaknesses in the hashing strategy can be overcome by making a small addition to our hashing algorithm. Before generating the hash we create a random string of characters of a predetermined length, and prepend this string to our plain text password. Provided the string (called a "salt") is of sufficient length - and of course sufficiently random - the resulting hash will almost certainly be different each time we execute the function. Of course we must also store the salt we've used in the database along with our hash but this is generally no more of an issue than extending the width of the field by a few characters.

When we validate a user's login credentials we follow the same process, only this time we use the salt from our database instead of generating a new random one. We add the user supplied password to it, run our hashing algorithm, then compare the result with the hash stored in that user's profile.

define('SALT_LENGTH', 9);

function generateHash($plainText, $salt = null)
{
if ($salt === null)
{
$salt = substr(md5(uniqid(rand(), true)), 0, SALT_LENGTH);
}
else
{
$salt = substr($salt, 0, SALT_LENGTH);
}

return $salt . sha1($salt . $plainText);
}

?>Note: The function above is limited in that the maximum salt length is 32 characters. You may wish to write your own salt generator to overcome this limit and increase the entropy of the string.

Calling generateHash() with a single argument (the plain text password) will cause a random string to be generated and used for the salt. The resulting string consists of the salt followed by the SHA-1 hash - this is to be stored away in your database. When you're checking a user's login, the situation is slightly different in that you already know the salt you'd like to use. The string stored in your database can be passed to generateHash() as the second argument when generating the hash of a user-supplied password for comparison.

Using a salt overcomes the issue of multiple accounts with the same password revealing themselves with identical hashes in your database. Although two passwords may be the same the salts will almost certainly be different, so the hashes will look nothing alike.

Dictionary attacks with pre-generated lists of hashes will be useless for the same reason - the attacker will now have to recalculate their entire dictionary for every individual account they're attempting to crack.

Summary
We've seen now what hashes are and why you should store them instead of the plain text passwords they represent in your database. The examples above are a starting point and will get you on the right track with using hashes in your PHP applications. A little bit of work now may well mean much less of a headache further down the track!

About The Author
James McGlinn is a developer and project manager for Servers.co.nz where he provides application design, development and auditing services for a range of clients in New Zealand and abroad. PHP has been his language of choice since 1999. He is a Zend Certified Engineer and founded and facilitates the NZ PHP Users Group.

For more information he can be reached through his home page at http://james.mcglinn.org/.

Storing Password in DB

Storing Passwords In The Database

When security is managed within applications there is often a need to store passwords in database tables. This in itself can lead to security issues since people with appropriate privileges can read the contents of the security tables. A common approach to solving this is to encrypt the password before storing it. The problem with encryption is that it implies a possible decryption mechanism that could expose a hole in your security. A safer alternative is to store a hash of the username and password. In this article I'll present a simple example of this process using the DBMS_OBFUSCATION_TOOLKIT package that is available in Oracle8i and Oracle9i:

Security Table
Security Package
Testing
Security Table
First we must build a table to hold the security information:

CREATE TABLE app_users (
id NUMBER(10) NOT NULL,
username VARCHAR2(30) NOT NULL,
password VARCHAR2(16) NOT NULL
)
/

ALTER TABLE app_users ADD (
CONSTRAINT app_users_pk PRIMARY KEY (id)
)
/

ALTER TABLE app_users ADD (
CONSTRAINT app_users_uk UNIQUE (username)
)
/

CREATE SEQUENCE app_users_seq
/
Security Package
Next we create the package that contains the specification of the security code:

CREATE OR REPLACE PACKAGE app_user_security AS

FUNCTION get_hash (p_username IN VARCHAR2,
p_password IN VARCHAR2)
RETURN VARCHAR2;

PROCEDURE add_user (p_username IN VARCHAR2,
p_password IN VARCHAR2);

PROCEDURE change_password (p_username IN VARCHAR2,
p_old_password IN VARCHAR2,
p_new_password IN VARCHAR2);

PROCEDURE valid_user (p_username IN VARCHAR2,
p_password IN VARCHAR2);

FUNCTION valid_user (p_username IN VARCHAR2,
p_password IN VARCHAR2)
RETURN BOOLEAN;

END;
/
We then create the package body to define the actual operations:

CREATE OR REPLACE PACKAGE BODY app_user_security AS

FUNCTION get_hash (p_username IN VARCHAR2,
p_password IN VARCHAR2)
RETURN VARCHAR2 AS
BEGIN
RETURN DBMS_OBFUSCATION_TOOLKIT.MD5(
input_string => UPPER(p_username) || '/' || UPPER(p_password));
END;

PROCEDURE add_user (p_username IN VARCHAR2,
p_password IN VARCHAR2) AS
BEGIN
INSERT INTO app_users (
id,
username,
password
)
VALUES (
app_users_seq.NEXTVAL,
UPPER(p_username),
get_hash(p_username, p_password)
);

COMMIT;
END;

PROCEDURE change_password (p_username IN VARCHAR2,
p_old_password IN VARCHAR2,
p_new_password IN VARCHAR2) AS
v_rowid ROWID;
BEGIN
SELECT rowid
INTO v_rowid
FROM app_users
WHERE username = UPPER(p_username)
AND password = get_hash(p_username, p_old_password)
FOR UPDATE;

UPDATE app_users
SET password = get_hash(p_username, p_new_password)
WHERE rowid = v_rowid;

COMMIT;
EXCEPTION
WHEN NO_DATA_FOUND THEN
RAISE_APPLICATION_ERROR(-20000, 'Invalid username/password.');
END;

PROCEDURE valid_user (p_username IN VARCHAR2,
p_password IN VARCHAR2) AS
v_dummy VARCHAR2(1);
BEGIN
SELECT '1'
INTO v_dummy
FROM app_users
WHERE username = UPPER(p_username)
AND password = get_hash(p_username, p_password);
EXCEPTION
WHEN NO_DATA_FOUND THEN
RAISE_APPLICATION_ERROR(-20000, 'Invalid username/password.');
END;

FUNCTION valid_user (p_username IN VARCHAR2,
p_password IN VARCHAR2)
RETURN BOOLEAN AS
BEGIN
valid_user(p_username, p_password);
RETURN TRUE;
EXCEPTION
WHEN OTHERS THEN
RETURN FALSE;
END;

END;
/
The overloads of VALID_USER allow the security check to be performed in a different manner.

The GET_HASH function is used to hash the combination of the username and password. It always returns a VARCHAR2(16) regardless of the length of the input parameters. This level of compression means that the hash value may not be unique, hence the unique constraint on the USERNAME column.

The DBMS_UTILITY.GET_HASH_VALUE function could be used to replace the DBMS_OBFUSCATION_TOOLKIT.MD5 function, but the hashing algorithm of the former is not garaunteed to stay constant between database versions.

Testing
First we create a new user:

SQL> exec app_user_security.add_user('tim','hall');

PL/SQL procedure successfully completed.

SQL> select * from app_users;

ID USERNAME PASSWORD
---------- ------------------------------ ----------------
1 TIM [w?44Z䪿?8fE??pre>
Next we check the VALID_USER procedure:

SQL> EXEC app_user_security.valid_user('tim','hall');

PL/SQL procedure successfully completed.

SQL> EXEC app_user_security.valid_user('tim','abcd');
BEGIN app_user_security.valid_user('tim','hall1'); END;

*
ERROR at line 1:
ORA-20000: Invalid username/password.
ORA-06512: at "W2K1.APP_USER_SECURITY", line 37
ORA-06512: at line 1
Next we check the VALID_USER function:

SQL> SET SERVEROUTPUT ON
SQL> BEGIN
2 IF app_user_security.valid_user('tim','hall') TH
3 DBMS_OUTPUT.PUT_LINE('TRUE');
4 ELSE
5 DBMS_OUTPUT.PUT_LINE('FALSE');
6 END IF;
7 END;
8 /
TRUE

PL/SQL procedure successfully completed.

SQL> BEGIN
2 IF app_user_security.valid_user('tim','abcd') T
3 DBMS_OUTPUT.PUT_LINE('TRUE');
4 ELSE
5 DBMS_OUTPUT.PUT_LINE('FALSE');
6 END IF;
7 END;
8 /
FALSE

PL/SQL procedure successfully completed.

SQL>
Finally we check the CHANGE_PASSWORD procedure:

SQL> exec app_user_security.change_password('tim','hall','hall1');

PL/SQL procedure successfully completed.

SQL> exec app_user_security.change_password('tim','abcd','abcd1');
BEGIN app_user_security.change_password('tim','abcd','abcd1'); END;

*
ERROR at line 1:
ORA-20000: Invalid username/password.
ORA-06512: at "W2K1.APP_USER_SECURITY", line 47
ORA-06512: at line 1

SQL>
For more information see:

DBMS_OBFUSCATION_TOOLKIT
DBMS_CRYPTO in Oracle 10g

Tuesday, March 15, 2005

Ajax: A New Approach to Web Applications

by Jesse James Garrett
February 18, 2005

If anything about current interaction design can be called “glamorous,” it’s creating Web applications. After all, when was the last time you heard someone rave about the interaction design of a product that wasn’t on the Web? (Okay, besides the iPod.) All the cool, innovative new projects are online.

Despite this, Web interaction designers can’t help but feel a little envious of our colleagues who create desktop software. Desktop applications have a richness and responsiveness that has seemed out of reach on the Web. The same simplicity that enabled the Web’s rapid proliferation also creates a gap between the experiences we can provide and the experiences users can get from a desktop application.

That gap is closing. Take a look at Google Suggest. Watch the way the suggested terms update as you type, almost instantly. Now look at Google Maps. Zoom in. Use your cursor to grab the map and scroll around a bit. Again, everything happens almost instantly, with no waiting for pages to reload.

Google Suggest and Google Maps are two examples of a new approach to web applications that we at Adaptive Path have been calling Ajax. The name is shorthand for Asynchronous JavaScript + XML, and it represents a fundamental shift in what’s possible on the Web.

Defining Ajax

Ajax isn’t a technology. It’s really several technologies, each flourishing in its own right, coming together in powerful new ways. Ajax incorporates:

standards-based presentation using XHTML and CSS;

dynamic display and interaction using the Document Object Model;

data interchange and manipulation using XML and XSLT;

asynchronous data retrieval using XMLHttpRequest;

and JavaScript binding everything together.

Wednesday, March 09, 2005

VC++ Tool for Debugging and Testing

http://www.mindcracker.com/mindcracker/c_cafe/debug/tools.asp

Thursday, March 03, 2005

Good C++ Interview Questions

Link: http://www.techinterviews.com/index.php?p=69
1. How do you decide which integer type to use?
2. What should the 64-bit integer type on new, 64-bit machines be?
3. What’s the best way to declare and define global variables?
4. What does extern mean in a function declaration?
5. What’s the auto keyword good for?
6. I can’t seem to define a linked list node which contains a pointer to itself.
7. How do I declare an array of N pointers to functions returning pointers to functions returning pointers to characters?
8. How can I declare a function that returns a pointer to a function of its own type?
9. My compiler is complaining about an invalid redeclaration of a function, but I only define it once and call it once. What’s happening?
10. What can I safely assume about the initial values of variables which are not explicitly initialized?
11. Why can’t I initialize a local array with a string?
12. What is the difference between char a[] = “string"; and char *p = “string"; ?
13. How do I initialize a pointer to a function?

Tech Blog

Wednesday, February 22, 2006

Identifying unused indexes in Oracle9i

Tuesday, December 13, 2005

Ajax: A New Approach to Web Applications

Defining Ajax

How Ajax is Different

Who’s Using Ajax

Moving Forward

Ajax Q&A

Wednesday, October 05, 2005

Supplementary Materials for Distributed Processing

Supplementary Materials for Distributed Processing

資料處理的演進

分散式處理的特性

分散式處理的目標

何謂通透性?

分散式處理的優缺點

分散式處理的架構

Distributed Processing vs. Parallel Processing

Tightly coupled vs. loosely coupled

常見的分散式應用系統

socket and RPC

Challenges

Tuesday, June 21, 2005

Password Hashing

Storing Password in DB

Tuesday, March 15, 2005

Ajax: A New Approach to Web Applications

Defining Ajax

Wednesday, March 09, 2005

VC++ Tool for Debugging and Testing

Thursday, March 03, 2005

Good C++ Interview Questions

Links

Previous Posts

Archives