allinone

Sunday, October 6, 2013

Java Web Services


What is a web service?

A web service is a service that is available over the Internet, uses a standardized XML messaging system, and is not tied to any operating system.

Main purpose of web service:

The business logic is written once and can then be accessed from user interfaces on different operating systems and devices, such as mobile phones and computers.

Main component of a web service: XML

The main advantage of a web service comes from XML. XML can be read and processed on any platform, so the client and the service do not need to share a platform or operating system.

Ex: A web service written in Java can be accessed by a .NET or Perl client through XML requests and responses, and vice versa.

In Java there are a variety of ways to create, deploy, and access web services.

We will be seeing the below concepts:

I. Basics of Web service:

a. Web Service Architecture

b. Basic flow of Web service

c. SOAP

d. WSDL

e. UDDI

II. Java way of handling web services:

a. JAX-WS (Java API for XML Web Services)

b. JAX-RPC (Java API for XML-based RPC)

c. JAXP (Java API for XML Processing)

d. JAXM (Java API for XML Messaging)

e. JAXB (Java Architecture for XML Binding)

f. JAXR (Java API for XML Registries)

g. JAX-RS (Java API for RESTful Web Services)

The beginning is mostly conceptual and may feel dry, but it is important to understand this terminology.

a. Web Service Architecture :

There are two ways to examine the web service architecture:

- Examining the individual role of each web service actor

- Examining the web service protocol stack

Roles of Web service :

It has three components :

Service provider

This is the provider of the web service. The service provider implements the service and makes it available on the Internet.

Service requestor

This is any consumer of the web service. The requestor utilizes an existing web service by opening a network connection and sending an XML request.

Service registry

This is a logically centralized directory of services. The registry provides a central place where developers can publish new services or find existing ones. It therefore serves as a centralized clearinghouse for companies and their services.
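The publish/find interplay between the three roles can be pictured with a small sketch. This is only an in-memory illustration, not how a real registry works: the class name, the method names, and the example.com address are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for a service registry. A real registry
// is a networked directory, not a local map.
final class ServiceRegistrySketch {
    // Maps a service name to the address of its service description.
    private final Map<String, String> entries = new HashMap<>();

    // Used by the service provider to publish a new service.
    void publish(String serviceName, String descriptionUrl) {
        entries.put(serviceName, descriptionUrl);
    }

    // Used by the service requestor to find an existing service.
    Optional<String> find(String serviceName) {
        return Optional.ofNullable(entries.get(serviceName));
    }

    public static void main(String[] args) {
        ServiceRegistrySketch registry = new ServiceRegistrySketch();
        // Provider side: publish (the URL is a made-up placeholder).
        registry.publish("TempConvert", "http://example.com/tempconvert?wsdl");
        // Requestor side: look the service up before calling it.
        System.out.println(registry.find("TempConvert").orElse("not found"));
        System.out.println(registry.find("Unknown").orElse("not found"));
    }
}
```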

Web Service protocol stack :

The web service protocol stack is still evolving.

It currently has four main layers:

Service Transport :

This layer is responsible for transporting messages between applications. Currently, it includes HTTP, SMTP, FTP, and BEEP.

XML Messaging :

This layer is responsible for encoding messages in a common XML format so that they can be understood at either end. Currently, it includes XML-RPC and SOAP.

Service description

This layer is responsible for describing the public interface to a specific web service. Currently, service description is handled via the Web Service Description Language (WSDL).

Service discovery

This layer is responsible for centralizing services into a common registry, and providing easy publish/find functionality. Currently, service discovery is handled via Universal Description, Discovery, and Integration (UDDI).

As web services evolve, additional layers may be added, and additional technologies may be added to each layer.

b. Basic flow of Web service :

To understand the basic flow, we must know these terms:

SOAP, WSDL, UDDI [We will look at each in detail later; for now we only introduce what each one is.]

SOAP: Simple Object Access Protocol. It is an XML-based protocol for exchanging information. It can be used over a variety of transport protocols, such as HTTP and FTP, and across various operating systems.

Given below is a simple web service created with JAX-WS in Java, followed by the request and response it produces. We will cover JAX-WS in detail in a later part.

package test;

import javax.jws.WebMethod;
import javax.jws.WebParam;
import javax.jws.WebService;

// Minimal JAX-WS endpoint that publishes a single "hello" operation.
@WebService()
public class ss {

    // The operation "hello" accepts a String parameter named "input"
    // and returns it unchanged.
    @WebMethod(operationName = "hello")
    public String hello(@WebParam(name = "input") String input) {
        return input;
    }
}

After deploying this business method, we check the request and response.

EX : SOAP REQUEST

<?xml version="1.0" encoding="UTF-8"?>

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">

<soap:Body>

<ns0:hello xmlns:ns0="http://test/">

<input>Hello World</input>

</ns0:hello>

</soap:Body>

</soap:Envelope>

EX: SOAP RESPONSE

<?xml version="1.0" encoding="UTF-8"?>

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">

<soap:Body>

<ns0:helloResponse xmlns:ns0="http://test/">

<return>Hello World</return>

</ns0:helloResponse>

</soap:Body>

</soap:Envelope>
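To make the response concrete, here is a minimal sketch of how a client could pull the returned value out of that envelope by hand, using only the XML parser bundled with the JDK. The class and method names are made up for illustration; in practice a JAX-WS client does this work for us.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: read the value of <return> from the SOAP response shown above.
final class HelloResponseReader {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // The response from the text, written as a Java string.
    static final String SAMPLE_RESPONSE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<soap:Envelope xmlns:soap=\"" + SOAP_NS + "\">"
        + "<soap:Body>"
        + "<ns0:helloResponse xmlns:ns0=\"http://test/\">"
        + "<return>Hello World</return>"
        + "</ns0:helloResponse>"
        + "</soap:Body>"
        + "</soap:Envelope>";

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // SOAP relies on namespaces
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Returns the text of the first <return> element inside the SOAP Body.
    static String readReturnValue(String soapXml) {
        Document doc = parse(soapXml);
        NodeList bodies = doc.getElementsByTagNameNS(SOAP_NS, "Body");
        if (bodies.getLength() == 0) {
            throw new IllegalArgumentException("No SOAP Body found");
        }
        NodeList returns = ((Element) bodies.item(0)).getElementsByTagName("return");
        if (returns.getLength() == 0) {
            throw new IllegalArgumentException("No <return> element found");
        }
        return returns.item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(readReturnValue(SAMPLE_RESPONSE));
    }
}
```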

WSDL :

- Web Service Description Language.

- It is an XML grammar for specifying the public interface of a web service. This public interface can include information on all publicly available functions, data type information for all XML messages, binding information about the specific transport protocol to be used, and address information for locating the specified service.

- WSDL is not necessarily tied to a specific XML messaging system, but it does include built-in extensions for describing SOAP services.

UDDI :

- UDDI currently represents the discovery layer within the web service protocol stack.

- UDDI was originally created by Microsoft, IBM, and Ariba, and represents a technical specification for publishing and finding businesses and web services.

- UDDI consists of two parts. First, UDDI is a technical specification for building a distributed directory of businesses and web services. Data is stored within a specific XML format. The UDDI specification includes API details for searching existing data and publishing new data.

- Second, the UDDI Business Registry is a fully operational implementation of the UDDI specification.

Accessing a Web Service: Service Requester Perspective

Find a web service via UDDI

Retrieve the WSDL file or the human-readable XML-RPC instructions

Create an XML-RPC or SOAP client

Invoke the remote procedure call

Creating a Web service: Service Developer Perspective

Identify the business functionality (business logic)

Create an XML-RPC or SOAP service wrapper

Create the WSDL file or the XML-RPC instructions

Deploy the service

Register the new service via UDDI

SOAP :

- SOAP is an XML-based protocol for exchanging information between computers.

- It can be used in a variety of messaging systems and can be delivered over a variety of transport protocols.

- It is platform independent.

- SOAP can be implemented in any technology, such as Java, .NET, C++, or Perl.

The SOAP specification has three major parts:

1. SOAP Envelope

2. Data Encoding Rules

3. RPC Conventions

1. SOAP Envelope: The SOAP envelope specification defines rules for encapsulating the data being transferred between computers. This includes application-specific data, such as the method name to invoke, the method parameters, and the return values.

2. Data Encoding Rules:

To exchange data, computers must agree on rules for encoding specific data types. For example, two computers that process stock quotes need an agreed-upon rule for encoding float data types; likewise, two computers that process multiple stock quotes need an agreed-upon rule for encoding arrays. SOAP therefore includes its own set of conventions for encoding data types. Most of these conventions are based on the W3C XML Schema specification.

3. RPC Conventions :

SOAP can be used in a variety of messaging systems, including one-way and two way messaging. For two-way messaging, SOAP defines a simple convention for representing remote procedure calls and responses. This enables a client application to specify a remote method name, include any number of parameters, and receive a response from the server.

Example :

A simple SOAP example that calls the method "getTemp", passing a zip code in the request and getting the temperature back in the response.

Request :

<?xml version='1.0' encoding='UTF-8'?>

<SOAP-ENV:Envelope

xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<SOAP-ENV:Body>

<ns1:getTemp

xmlns:ns1="urn:xmethods-Temperature"

SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">

</ns1:getTemp>

</SOAP-ENV:Body>

</SOAP-ENV:Envelope>

In this example, a couple of things should be noticed:

- The first line is the XML declaration with version and encoding.

- The second line opens the mandatory SOAP Envelope, which wraps all the information we need to pass.

- The Envelope declares the namespaces that are required for making a SOAP request.

- Next comes the SOAP Body tag, which is again a mandatory element.

- Inside the Body we have the method name "getTemp" and the zip code parameter, passed as a string.

Note: We don't need to know everything that happens in SOAP, but a basic understanding is important. When we request a service from any language, we just call the method and pass the value; behind the scenes, the request is formed in this manner and sent to the service.
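As a rough picture of what happens behind the scenes, the sketch below assembles such a request as a string from a method name and one parameter. It is a simplified assumption of what a SOAP toolkit does, not any library's actual code; the parameter name "zipcode" and the value "10016" are placeholders chosen for illustration.

```java
// Sketch: assemble an RPC-style SOAP 1.1 request by hand.
final class SoapRequestBuilder {

    // Escapes characters that are not allowed in XML text or attribute values.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    // methodName and paramName must be valid XML names; they are not escaped.
    static String buildRpcRequest(String serviceNamespace, String methodName,
                                  String paramName, String paramValue) {
        return "<?xml version='1.0' encoding='UTF-8'?>\n"
            + "<SOAP-ENV:Envelope\n"
            + " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\"\n"
            + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
            + " xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">\n"
            + "<SOAP-ENV:Body>\n"
            + "<ns1:" + methodName + "\n"
            + " xmlns:ns1=\"" + escape(serviceNamespace) + "\"\n"
            + " SOAP-ENV:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">\n"
            // The parameter is typed as a string, as described in the text.
            + "<" + paramName + " xsi:type=\"xsd:string\">"
            + escape(paramValue) + "</" + paramName + ">\n"
            + "</ns1:" + methodName + ">\n"
            + "</SOAP-ENV:Body>\n"
            + "</SOAP-ENV:Envelope>";
    }

    public static void main(String[] args) {
        // "zipcode" and "10016" are illustrative placeholders.
        System.out.println(
            buildRpcRequest("urn:xmethods-Temperature", "getTemp", "zipcode", "10016"));
    }
}
```

A real client would then send this text to the service, typically in the body of an HTTP POST.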

Response :

<SOAP-ENV:Envelope

xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<SOAP-ENV:Body>

<ns1:getTempResponse

xmlns:ns1="urn:xmethods-Temperature"

SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">

</ns1:getTempResponse>

</SOAP-ENV:Body>

</SOAP-ENV:Envelope>

As we can see, the response returns the temperature as a float return value. From this response we have to extract the data. This is how the request and response happen in the real world.

But, as I said, when we access a web service from languages such as Java or C++, we don't really see what is happening in the background; we form the request using various web service APIs and frameworks and extract the response in the same way.

Now we will look at the elements of a SOAP message:

A SOAP message basically contains the following elements:

1. Envelope - Mandatory

2. Headers - Optional

3. Body - Mandatory

4. Fault - Optional

1. Envelope :

- It is the root element of every SOAP message.

- SOAP uses XML namespaces to differentiate versions.

Ex:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">

The SOAP 1.1 namespace URI is http://schemas.xmlsoap.org/soap/envelope/, whereas the SOAP 1.2 namespace URI is http://www.w3.org/2003/05/soap-envelope. If the Envelope is in any other namespace, it is considered a versioning error.
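The namespace check can be sketched in a few lines of Java. This is an illustrative helper with made-up names; it assumes the namespace of the final SOAP 1.2 Recommendation, http://www.w3.org/2003/05/soap-envelope, for version 1.2.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: decide the SOAP version from the namespace of the Envelope element.
final class SoapVersionDetector {
    static final String SOAP_11_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    // Assumption: namespace of the final SOAP 1.2 Recommendation.
    static final String SOAP_12_NS = "http://www.w3.org/2003/05/soap-envelope";

    private static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Returns "1.1", "1.2", or "VersionMismatch" for any other namespace.
    static String detectVersion(String soapXml) {
        Element root = parseRoot(soapXml);
        if (!"Envelope".equals(root.getLocalName())) {
            throw new IllegalArgumentException("Root element is not an Envelope");
        }
        String namespace = root.getNamespaceURI();
        if (SOAP_11_NS.equals(namespace)) {
            return "1.1";
        }
        if (SOAP_12_NS.equals(namespace)) {
            return "1.2";
        }
        return "VersionMismatch";
    }

    public static void main(String[] args) {
        String envelope = "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + SOAP_11_NS + "\">"
            + "<SOAP-ENV:Body/></SOAP-ENV:Envelope>";
        System.out.println(detectVersion(envelope));
    }
}
```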

2. Headers :

- It is an optional element in SOAP messaging.

- The optional Header element offers a flexible framework for specifying additional application-level requirements.

- For example, the Header element can be used to specify a digital signature for password-protected services.

- The Header framework provides an open mechanism for authentication, transaction management, and payment authorization.

It has two main attributes:

- Actor

- MustUnderstand

Actor attribute:

The SOAP protocol defines a message path as a list of SOAP service nodes. Each of these intermediate nodes can perform some processing and then forward the message to the next node in the chain. By setting the Actor attribute, the client can specify the recipient of the SOAP header.

MustUnderstand attribute :

Indicates whether a Header element is optional or mandatory. If set to true, the recipient must understand and process the Header element according to its defined semantics, or return a fault.

SOAP 1.1 uses integer values of 1/0 for the MustUnderstand attribute; SOAP 1.2 uses Boolean values of true/1/false/0.

Here is an example Header. It specifies a payment account, which must be understood and processed by the SOAP server:

<SOAP-ENV:Header>
<ns1:PaymentAccount xmlns:ns1="urn:ecerami" SOAP-ENV:mustUnderstand="true">
orsenigo473
</ns1:PaymentAccount>
</SOAP-ENV:Header>
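On the receiving side, a server has to find the header entries it is obliged to process. The sketch below lists the header entries flagged as mandatory, accepting both "1" and "true" as described above. The class name and the second header entry, TraceId, are hypothetical and only there to show an optional entry being skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: list the header entries a recipient must understand.
final class MustUnderstandScanner {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    static final String SAMPLE =
        "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + SOAP_NS + "\">"
        + "<SOAP-ENV:Header>"
        + "<ns1:PaymentAccount xmlns:ns1=\"urn:ecerami\""
        + " SOAP-ENV:mustUnderstand=\"true\">orsenigo473</ns1:PaymentAccount>"
        // TraceId is a made-up optional header entry.
        + "<ns1:TraceId xmlns:ns1=\"urn:ecerami\">abc</ns1:TraceId>"
        + "</SOAP-ENV:Header>"
        + "<SOAP-ENV:Body/>"
        + "</SOAP-ENV:Envelope>";

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Returns the local names of header entries with mustUnderstand set to 1 or true.
    static List<String> mandatoryHeaders(String soapXml) {
        List<String> names = new ArrayList<>();
        NodeList headers = parse(soapXml).getElementsByTagNameNS(SOAP_NS, "Header");
        if (headers.getLength() == 0) {
            return names; // the Header element is optional
        }
        NodeList entries = headers.item(0).getChildNodes();
        for (int i = 0; i < entries.getLength(); i++) {
            Node node = entries.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            Element entry = (Element) node;
            String flag = entry.getAttributeNS(SOAP_NS, "mustUnderstand");
            if ("1".equals(flag) || "true".equals(flag)) {
                names.add(entry.getLocalName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(mandatoryHeaders(SAMPLE));
    }
}
```

A server that finds a name in this list which it does not recognize should answer with a fault instead of ignoring the entry.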

3. Body :

The Body element is mandatory for all SOAP messages. As we have already seen, typical
uses of the Body element include RPC requests and responses.

<SOAP-ENV:Body>

<ns1:getTemp

xmlns:ns1="urn:xmethods-Temperature"

SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">

</ns1:getTemp>

</SOAP-ENV:Body>

4. Fault :

- It is an optional element.

- In the event of an error, the Body element will include a Fault element.

- The following code is a sample Fault. The client has requested a method named
ValidateCreditCard , but the service does not support such a method.

This represents a client request error, and the server returns the following SOAP response:

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/1999/XMLSchema">
<SOAP-ENV:Body>
<SOAP-ENV:Fault>
<faultcode xsi:type="xsd:string">SOAP-ENV:Client</faultcode>
<faultstring xsi:type="xsd:string">
Failed to locate method (ValidateCreditCard) in class
(examplesCreditCard) at /usr/local/ActivePerl-5.6/lib/
site_perl/5.6.0/SOAP/Lite.pm line 1555.
</faultstring>
</SOAP-ENV:Fault></SOAP-ENV:Body>
</SOAP-ENV:Envelope>

SOAP Fault codes :

SOAP-ENV:VersionMismatch
Indicates that the SOAP Envelope element included an invalid namespace, signifying a version mismatch.

SOAP-ENV:MustUnderstand
Indicates that the recipient is unable to properly process a Header element with a mustUnderstand attribute set to true. This ensures that such Header elements are not silently ignored.

SOAP-ENV:Client
Indicates that the client request contained an error. For example,the client has specified a nonexistent method name, or has supplied the incorrect parameters to the method.

SOAP-ENV:Server
Indicates that the server is unable to process the client request. For example, a service providing product data may be unable to connect to the database.
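A client usually checks the response for a Fault before reading any result. The following sketch extracts the fault code from a SOAP 1.1 response; the helper names are made up, and the sample response is a shortened version of the fault shown above.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: detect a SOAP 1.1 Fault and read its fault code.
final class SoapFaultReader {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Shortened version of the sample fault from the text.
    static final String SAMPLE_FAULT =
        "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + SOAP_NS + "\">"
        + "<SOAP-ENV:Body><SOAP-ENV:Fault>"
        + "<faultcode>SOAP-ENV:Client</faultcode>"
        + "<faultstring>Failed to locate method (ValidateCreditCard)</faultstring>"
        + "</SOAP-ENV:Fault></SOAP-ENV:Body>"
        + "</SOAP-ENV:Envelope>";

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Returns the fault code, or an empty Optional when the response has no Fault.
    static Optional<String> faultCode(String soapXml) {
        NodeList faults = parse(soapXml).getElementsByTagNameNS(SOAP_NS, "Fault");
        if (faults.getLength() == 0) {
            return Optional.empty();
        }
        NodeList codes = ((Element) faults.item(0)).getElementsByTagName("faultcode");
        if (codes.getLength() == 0) {
            return Optional.empty();
        }
        return Optional.of(codes.item(0).getTextContent().trim());
    }

    // The fault code is a qualified name, so compare only the part after the prefix.
    static boolean isClientFault(String code) {
        return code.substring(code.indexOf(':') + 1).equals("Client");
    }

    public static void main(String[] args) {
        faultCode(SAMPLE_FAULT).ifPresent(code ->
            System.out.println(code + " (client error: " + isClientFault(code) + ")"));
    }
}
```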

WSDL :

WSDL stands for Web Services Description Language.

WSDL is an XML-based language for describing web services and how to access them.

WSDL is the standard format for describing a web service.

A WSDL document is the interface that the web service publisher provides to clients. Using this interface, a client can understand what types of operations the web service offers.

The WSDL is typically referenced from the service's UDDI entry when the web service is registered in UDDI.

WSDL Elements:

Three major elements of WSDL can be defined separately:

a. Types - Contains the data type specifications, that is, what types of values the methods accept.

b. Operations - Defines the methods provided by the web service, along with their input parameters and results.

c. Binding - Specifies the concrete protocol and data format used to reach the operations; the endpoint URL itself is given by the Port element described below.

Apart from these, WSDL also has other elements.

Definition: This element must be the root element of all WSDL documents. It defines the name of the web service, declares multiple namespaces used throughout the remainder of the document, and contains all the service elements described here.

Data types(xsd): the data types - in the form of XML schemas(xsd) or possibly some other mechanism - to be used in the messages. These can be defined inside the WSDL or referenced from an external location.

Message: an abstract definition of the data, in the form of a message presented either as an entire document or as arguments to be mapped to a method invocation.

Operation: the abstract definition of the operation for a message, such as naming a method, message queue, or business process, that will accept and process the message

Port type : an abstract set of operations mapped to one or more end points, defining the collection of operations for a binding; the collection of operations, because it is abstract, can be mapped to multiple transports through various bindings.

Binding: the concrete protocol and data formats for the operations and messages defined for a particular port type.

Port: a combination of a binding and a network address, providing the target address of the service communication.

Service: a collection of related end points encompassing the service definitions in the file; the services map the binding to the port and include any extensibility definitions.

In addition to these major elements, the WSDL specification also defines the following utility elements:

Documentation: element is used to provide human-readable documentation and can be included inside any other WSDL element.

Import: element is used to import other WSDL documents or XML Schemas.

Ex:

<?xml version="1.0" encoding="utf-8"?>
<wsdl:definitions xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:tm="http://microsoft.com/wsdl/mime/textMatching/" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/" xmlns:tns="http://tempuri.org/" xmlns:s="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://schemas.xmlsoap.org/wsdl/soap12/" xmlns:http="http://schemas.xmlsoap.org/wsdl/http/" targetNamespace="http://tempuri.org/" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
<wsdl:types>
<s:schema elementFormDefault="qualified" targetNamespace="http://tempuri.org/">
<s:element name="FahrenheitToCelsius">
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1" name="Fahrenheit" type="s:string" />
</s:sequence>
</s:complexType>
</s:element>
<s:element name="FahrenheitToCelsiusResponse">
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1" name="FahrenheitToCelsiusResult" type="s:string" />
</s:sequence>
</s:complexType>
</s:element>
<s:element name="CelsiusToFahrenheit">
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1" name="Celsius" type="s:string" />
</s:sequence>
</s:complexType>
</s:element>
<s:element name="CelsiusToFahrenheitResponse">
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1" name="CelsiusToFahrenheitResult" type="s:string" />
</s:sequence>
</s:complexType>
</s:element>
<s:element name="string" nillable="true" type="s:string" />
</s:schema>
</wsdl:types>
<wsdl:message name="FahrenheitToCelsiusSoapIn">
<wsdl:part name="parameters" element="tns:FahrenheitToCelsius" />
</wsdl:message>
<wsdl:message name="FahrenheitToCelsiusSoapOut">
<wsdl:part name="parameters" element="tns:FahrenheitToCelsiusResponse" />
</wsdl:message>
<wsdl:message name="CelsiusToFahrenheitSoapIn">
<wsdl:part name="parameters" element="tns:CelsiusToFahrenheit" />
</wsdl:message>
<wsdl:message name="CelsiusToFahrenheitSoapOut">
<wsdl:part name="parameters" element="tns:CelsiusToFahrenheitResponse" />
</wsdl:message>
<wsdl:message name="FahrenheitToCelsiusHttpPostIn">
<wsdl:part name="Fahrenheit" type="s:string" />
</wsdl:message>
<wsdl:message name="FahrenheitToCelsiusHttpPostOut">
<wsdl:part name="Body" element="tns:string" />
</wsdl:message>
<wsdl:message name="CelsiusToFahrenheitHttpPostIn">
<wsdl:part name="Celsius" type="s:string" />
</wsdl:message>
<wsdl:message name="CelsiusToFahrenheitHttpPostOut">
<wsdl:part name="Body" element="tns:string" />
</wsdl:message>
<wsdl:portType name="TempConvertSoap">
<wsdl:operation name="FahrenheitToCelsius">
<wsdl:input message="tns:FahrenheitToCelsiusSoapIn" />
<wsdl:output message="tns:FahrenheitToCelsiusSoapOut" />
</wsdl:operation>
<wsdl:operation name="CelsiusToFahrenheit">
<wsdl:input message="tns:CelsiusToFahrenheitSoapIn" />
<wsdl:output message="tns:CelsiusToFahrenheitSoapOut" />
</wsdl:operation>
</wsdl:portType>
<wsdl:portType name="TempConvertHttpPost">
<wsdl:operation name="FahrenheitToCelsius">
<wsdl:input message="tns:FahrenheitToCelsiusHttpPostIn" />
<wsdl:output message="tns:FahrenheitToCelsiusHttpPostOut" />
</wsdl:operation>
<wsdl:operation name="CelsiusToFahrenheit">
<wsdl:input message="tns:CelsiusToFahrenheitHttpPostIn" />
<wsdl:output message="tns:CelsiusToFahrenheitHttpPostOut" />
</wsdl:operation>
</wsdl:portType>
<wsdl:binding name="TempConvertSoap" type="tns:TempConvertSoap">
<soap:binding transport="http://schemas.xmlsoap.org/soap/http" />
<wsdl:operation name="FahrenheitToCelsius">
<soap:operation soapAction="http://tempuri.org/FahrenheitToCelsius" style="document" />
<wsdl:input>
<soap:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="CelsiusToFahrenheit">
<soap:operation soapAction="http://tempuri.org/CelsiusToFahrenheit" style="document" />
<wsdl:input>
<soap:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap:body use="literal" />
</wsdl:output>
</wsdl:operation>
</wsdl:binding>
<wsdl:binding name="TempConvertSoap12" type="tns:TempConvertSoap">
<soap12:binding transport="http://schemas.xmlsoap.org/soap/http" />
<wsdl:operation name="FahrenheitToCelsius">
<soap12:operation soapAction="http://tempuri.org/FahrenheitToCelsius" style="document" />
<wsdl:input>
<soap12:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap12:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="CelsiusToFahrenheit">
<soap12:operation soapAction="http://tempuri.org/CelsiusToFahrenheit" style="document" />
<wsdl:input>
<soap12:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap12:body use="literal" />
</wsdl:output>
</wsdl:operation>
</wsdl:binding>
<wsdl:binding name="TempConvertHttpPost" type="tns:TempConvertHttpPost">
<http:binding verb="POST" />
<wsdl:operation name="FahrenheitToCelsius">
<http:operation location="/FahrenheitToCelsius" />
<wsdl:input>
<mime:content type="application/x-www-form-urlencoded" />
</wsdl:input>
<wsdl:output>
<mime:mimeXml part="Body" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="CelsiusToFahrenheit">
<http:operation location="/CelsiusToFahrenheit" />
<wsdl:input>
<mime:content type="application/x-www-form-urlencoded" />
</wsdl:input>
<wsdl:output>
<mime:mimeXml part="Body" />
</wsdl:output>
</wsdl:operation>
</wsdl:binding>
<wsdl:service name="TempConvert">
<wsdl:port name="TempConvertSoap" binding="tns:TempConvertSoap">
<soap:address location="http://www.w3schools.com/webservices/tempconvert.asmx" />
</wsdl:port>
<wsdl:port name="TempConvertSoap12" binding="tns:TempConvertSoap12">
<soap12:address location="http://www.w3schools.com/webservices/tempconvert.asmx" />
</wsdl:port>
<wsdl:port name="TempConvertHttpPost" binding="tns:TempConvertHttpPost">
<http:address location="http://www.w3schools.com/webservices/tempconvert.asmx" />
</wsdl:port>
</wsdl:service>
</wsdl:definitions>

The above WSDL is obtained from the web service TempConvert of w3schools.com
URL : http://www.w3schools.com/webservices/tempconvert.asmx?WSDL
Now, examine this WSDL closely.

wsdl definitions: This contains a lot of namespaces. We don't need to worry about them, since they can all be added or generated by tools. Generally a WSDL is generated by tools, but we need to understand its content, which helps us write the client code.

wsdl types: This contains the inline XSD (XML Schema Definition) for the types the methods accept. We will discuss XSD in detail in the next part.

wsdl message: This defines named messages, which are referred to in various places to specify the types of the input and output parameters of the web service methods.

wsdl operation: This contains the methods provided by the web service.

wsdl input/output: These contain the names and types of each method's input and output parameters. In a big WSDL they are normally defined in external XSDs.

We don't really need to worry about the other elements. Once we have obtained the WSDL, the next thing to do is generate Java classes from it, which can be done by tools. Once we have the Java classes, we can easily write the client code and make a request to the web service.
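To see how a tool might read a WSDL, here is a small sketch that lists the operations of each portType using the JDK's XML parser. It is only an illustration with made-up class and method names, and the embedded WSDL is a heavily trimmed stand-in for the TempConvert document above, not the full file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: list the operations that each portType of a WSDL 1.1 document offers.
final class WsdlOperationLister {
    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";

    // Trimmed stand-in for the TempConvert WSDL (messages and types omitted).
    static final String SAMPLE_WSDL =
        "<wsdl:definitions xmlns:wsdl=\"" + WSDL_NS + "\""
        + " xmlns:tns=\"http://tempuri.org/\" targetNamespace=\"http://tempuri.org/\">"
        + "<wsdl:portType name=\"TempConvertSoap\">"
        + "<wsdl:operation name=\"FahrenheitToCelsius\"/>"
        + "<wsdl:operation name=\"CelsiusToFahrenheit\"/>"
        + "</wsdl:portType>"
        + "<wsdl:binding name=\"TempConvertSoap\" type=\"tns:TempConvertSoap\">"
        + "<wsdl:operation name=\"FahrenheitToCelsius\"/>"
        + "</wsdl:binding>"
        + "</wsdl:definitions>";

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Maps each portType name to the names of its operations, in document order.
    static Map<String, List<String>> operationsByPortType(String wsdlXml) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList portTypes = parse(wsdlXml).getElementsByTagNameNS(WSDL_NS, "portType");
        for (int i = 0; i < portTypes.getLength(); i++) {
            Element portType = (Element) portTypes.item(i);
            List<String> names = new ArrayList<>();
            // Only operations nested in this portType; binding operations are ignored.
            NodeList operations = portType.getElementsByTagNameNS(WSDL_NS, "operation");
            for (int j = 0; j < operations.getLength(); j++) {
                names.add(((Element) operations.item(j)).getAttribute("name"));
            }
            result.put(portType.getAttribute("name"), names);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(operationsByPortType(SAMPLE_WSDL));
    }
}
```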

XSD :

It is used in a WSDL document to define the data types that the methods take, such as input parameter types, output parameter types, attribute types, and return types.

It defines the elements that can appear in a document.

It defines the attributes that can appear in a document.

It defines the order and the number of child elements.

It defines whether an element is empty or can include text.

It defines default and fixed values for elements and attributes.

It can be defined in the WSDL itself, or it can be referenced externally by importing the XSD URL into the WSDL, much like JavaScript and CSS files are referenced from a page.

It is an alternative to DTDs (Document Type Definitions) and is more powerful than a DTD.

Multiple schemas can be referenced in one WSDL, which allows reuse. XSD also supports namespaces.

We can define our own data types derived from the standard types.

Root Element of XSD :

<schema> : This is the root element of every XML Schema document. It can also contain attributes.

Syntax:
<?xml version="1.0"?>
<xs:schema>
------
------
</xs:schema>

Ex:

<?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.ayaz.com" xmlns="http://www.ayaz.com" elementFormDefault="qualified">
...............
...............

</xs:schema>

Here xmlns:xs declares that the elements and data types used in the schema come from the W3C XML Schema namespace. targetNamespace indicates that the elements defined by this schema belong to the http://www.ayaz.com namespace. elementFormDefault="qualified" specifies that every element declared in this schema must be namespace qualified when it is used in an XML document.

Simple Example : (note.xml)
------------------------------------
<?xml version="1.0"?>
<note>
<from>Tom</from>
<to>Jack</to>
<heading>Reminder</heading>
<body>Don't forget</body>
</note>

note.xsd : Defines the elements for note.xml

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.ayaz.com" xmlns="http://www.ayaz.com" elementFormDefault="qualified">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="from" type="xs:string"/>
<xs:element name="to" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

We will look at each of these elements in detail in a later part. Now that I have the XSD for my XML elements, I can reference it in my document.

<?xml version="1.0"?>
<note xmlns="http://www.ayaz.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.ayaz.com note.xsd">
<from>Tom</from>
<to>Jack</to>
<heading>Reminder</heading>
<body>Don't forget</body>
</note>

xmlns="http://www.ayaz.com" - tells the schema validator that all elements used in this XML document are declared in the "http://www.ayaz.com" namespace.

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" - makes the XML Schema instance namespace available, after which we can use the schemaLocation attribute.

xsi:schemaLocation="http://www.ayaz.com note.xsd" - gives the namespace used by the XSD, followed by the location of the XSD.
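Validation of a document against its schema can be tried out with the javax.xml.validation API that ships with the JDK. The sketch below is a minimal illustration: the class and method names are made up, and the schema and the note are embedded as strings instead of being loaded from note.xsd and note.xml.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Sketch: validate a note document against the note schema.
final class NoteSchemaCheck {
    static final String NOTE_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"http://www.ayaz.com\""
        + " xmlns=\"http://www.ayaz.com\" elementFormDefault=\"qualified\">"
        + "<xs:element name=\"note\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"from\" type=\"xs:string\"/>"
        + "<xs:element name=\"to\" type=\"xs:string\"/>"
        + "<xs:element name=\"heading\" type=\"xs:string\"/>"
        + "<xs:element name=\"body\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static final String VALID_NOTE =
        "<note xmlns=\"http://www.ayaz.com\">"
        + "<from>Tom</from><to>Jack</to>"
        + "<heading>Reminder</heading><body>Don't forget</body>"
        + "</note>";

    // Returns true when the XML conforms to the schema. A broken schema or
    // malformed XML also yields false in this simplified sketch.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid note accepted: " + isValid(NOTE_XSD, VALID_NOTE));
        // A note without heading and body breaks the declared sequence.
        String incomplete = "<note xmlns=\"http://www.ayaz.com\">"
            + "<from>Tom</from><to>Jack</to></note>";
        System.out.println("incomplete note accepted: " + isValid(NOTE_XSD, incomplete));
    }
}
```

Because the schema uses xs:sequence, the child elements must appear in the declared order, so swapping two of them in the document also makes validation fail.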

2. XSD Elements:

XSD elements can be classified as "Simple" and "Complex" elements.

a. Simple Element :

A simple element does not have any attributes or child elements.

It can contain only text. The text can be of any built-in type (string, boolean, integer, and so on) or a custom type defined by the user.

We can also add restrictions, such as limiting the content or requiring it to match a specific pattern.

Syntax:

<xs:element name="------ " type="------"/>

Ex: <xs:element name="firstname" type="xs:string"/>

Default & Fixed Value :

Default - This value is used when no value is provided.
Fixed - This value is assigned automatically to the element, and no other value can be assigned.

Ex:
<xs:element name="firstname" type="xs:string" default="John"/>

<xs:element name="color" type="xs:string" fixed="red"/>

b. XSD Attributes:

Simple elements cannot have attributes. If an element has an attribute, it is considered a complex type element. The attribute itself, however, is always declared as a simple type.

Syntax :

<xs:attribute name="-----" type="----"/>

Ex:

<xs:attribute name="lang" type="xs:string" />

<lastname lang="en">Smith</lastname> -- Referring to it in the XML document.

Note: Default and fixed behave the same way as they do for simple elements.

Optional & required

An attribute can be specified as optional or required. Attributes are optional by default; to make an attribute required, declare it with the "use" attribute.

<xs:attribute name="lang" type="xs:string" use="required"/>

c. XSD Restrictions /Facets   :

<xs:restriction> is used to set restrictions.

Restriction on Simple Element

Ex: To accept values between 0-120

<xs:element name="age">
<xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="120"/>
    </xs:restriction>
</xs:simpleType>
</xs:element>

To restrict the content to a set of acceptable values, use the enumeration constraint:

<xs:element name="car">
<xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="Audi"/>
      <xs:enumeration value="Golf"/>
      <xs:enumeration value="BMW"/>
    </xs:restriction>
</xs:simpleType>
</xs:element>

Length Restriction

<xs:element name="password">
<xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:length value="8"/>
    </xs:restriction>
</xs:simpleType>
</xs:element>

Constraint    Description

enumeration	Defines a list of acceptable values
fractionDigits	Specifies the maximum number of decimal places allowed. Must be equal to or greater than zero
length	Specifies the exact number of characters or list items allowed. Must be equal to or greater than zero
maxExclusive	Specifies the upper bounds for numeric values (the value must be less than this value)
maxInclusive	Specifies the upper bounds for numeric values (the value must be less than or equal to this value)
maxLength	Specifies the maximum number of characters or list items allowed. Must be equal to or greater than zero
minExclusive	Specifies the lower bounds for numeric values (the value must be greater than this value)
minInclusive	Specifies the lower bounds for numeric values (the value must be greater than or equal to this value)
minLength	Specifies the minimum number of characters or list items allowed. Must be equal to or greater than zero
pattern	Defines the exact sequence of characters that are acceptable
totalDigits	Specifies the maximum number of digits allowed. Must be greater than zero
whiteSpace	Specifies how white space (line feeds, tabs, spaces, and carriage returns) is handled
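
To see a facet in action, here is a small sketch that checks a few values against the "age" restriction shown earlier (minInclusive 0, maxInclusive 120), again with the JDK's javax.xml.validation API. The class name AgeFacetCheck and the method accepts are made up for this illustration; the schema is a no-namespace copy of the age example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class AgeFacetCheck {

    // The "age" element restricted to integers between 0 and 120 (inclusive).
    static final String AGE_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"age\"><xs:simpleType>"
        + "<xs:restriction base=\"xs:integer\">"
        + "<xs:minInclusive value=\"0\"/>"
        + "<xs:maxInclusive value=\"120\"/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:schema>";

    // Wraps the value in an <age> element and reports whether the schema accepts it.
    static boolean accepts(String value) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(AGE_XSD)));
        String xml = "<age>" + value + "</age>";
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // value violates the base type or one of the facets
        }
    }

    public static void main(String[] args) throws Exception {
        for (String value : new String[] {"0", "30", "120", "121", "-1", "abc"}) {
            System.out.println(value + " accepted: " + accepts(value));
        }
    }
}
```

The boundary values 0 and 120 are accepted because both facets are inclusive; switching to minExclusive and maxExclusive would reject them.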

3. Complex XSD Elements :

A complex element is an XML element that contains other elements and/or attributes.
There are four kinds of complex elements:

empty elements
elements that contain only other elements
elements that contain only text
elements that contain both other elements and text.

empty element :

An empty complex element cannot have contents, only attributes.

An empty XML element

<product prodid="1345" />

Ex:
<xs:element name="product">
<xs:complexType>
<xs:attribute name="prodid" type="xs:positiveInteger"/>
</xs:complexType>
</xs:element>

Element which have elements only :

An "elements-only" complex type contains an element that contains only other elements.

An XML element, "person", that contains only other elements:

<person>
<firstname>John</firstname>
<lastname>Smith</lastname>
</person>

You can define the "person" element in a schema, like this:

<xs:element name="person">
<xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
</xs:complexType>
</xs:element>

Text- Only :

A complex text-only element can contain text and attributes.

This type contains only simple content (text and attributes), therefore we add a simpleContent element around the content. When using simple content, you must define an extension OR a restriction within the simpleContent element, like this:

<xs:element name="somename">
<xs:complexType>
    <xs:simpleContent>
      <xs:extension base="basetype">
        ....
        ....
      </xs:extension>
    </xs:simpleContent>
</xs:complexType>
</xs:element>

OR

<xs:element name="somename">
<xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="basetype">
        ....
        ....
      </xs:restriction>
    </xs:simpleContent>
</xs:complexType>
</xs:element>

Mixed Complex Elements:

A mixed complex type element can contain attributes, elements, and text.

An XML element, "letter", that contains both text and other elements:

<letter>
Dear Mr.<name>John Smith</name>.
Your order <orderid>1032</orderid>
will be shipped on <shipdate>2001-07-13</shipdate>.
</letter>

The following schema declares the "letter" element:

<xs:element name="letter">
<xs:complexType mixed="true">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="orderid" type="xs:positiveInteger"/>
      <xs:element name="shipdate" type="xs:date"/>
    </xs:sequence>
</xs:complexType>
</xs:element>

4 . Indicators
There are seven indicators:
Order indicators:

All
Choice
Sequence

Occurrence indicators:

maxOccurs
minOccurs

Group indicators:

Group name
attributeGroup name

Mostly Used indicators are discussed below :

Sequence Indicator

The <sequence> indicator specifies that the child elements must appear in a specific order:

<xs:element name="person">
   <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
</xs:complexType>
</xs:element>

Occurrence Indicator

Occurrence indicators are used to define how often an element can occur.

maxOccurs Indicator
The <maxOccurs> indicator specifies the maximum number of times an element can occur:

<xs:element name="person">
<xs:complexType>
    <xs:sequence>
      <xs:element name="full_name" type="xs:string"/>
      <xs:element name="child_name" type="xs:string" maxOccurs="10"/>
    </xs:sequence>
</xs:complexType>
</xs:element>

The example above indicates that the "child_name" element can occur a minimum of one time (the default value for minOccurs is 1) and a maximum of ten times in the "person" element.

minOccurs Indicator

The <minOccurs> indicator specifies the minimum number of times an element can occur:

<xs:element name="person">
<xs:complexType>
    <xs:sequence>
      <xs:element name="full_name" type="xs:string"/>
      <xs:element name="child_name" type="xs:string"
      maxOccurs="10" minOccurs="0"/>
    </xs:sequence>
</xs:complexType>
</xs:element>

The example above indicates that the "child_name" element can occur a minimum of zero times and a maximum of ten times in the "person" element.

To allow an element to appear an unlimited number of times, use maxOccurs="unbounded".
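
The effect of the two occurrence indicators can be checked with a short sketch. The class OccurrenceCheck and its helper methods are hypothetical names for this illustration; the schema is the "person" example from above with the occurrence attributes of "child_name" passed in as a parameter, and the child names are made-up sample data.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class OccurrenceCheck {

    // Builds the "person" schema; occurrence holds the attributes under test,
    // for example: maxOccurs="10" minOccurs="0"
    static String personXsd(String occurrence) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xs:element name=\"person\"><xs:complexType><xs:sequence>"
             + "<xs:element name=\"full_name\" type=\"xs:string\"/>"
             + "<xs:element name=\"child_name\" type=\"xs:string\" " + occurrence + "/>"
             + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    }

    // Builds a <person> document with the given number of <child_name> elements.
    static String person(int children) {
        StringBuilder xml = new StringBuilder("<person><full_name>Tom Smith</full_name>");
        for (int i = 0; i < children; i++) {
            xml.append("<child_name>Child ").append(i + 1).append("</child_name>");
        }
        return xml.append("</person>").toString();
    }

    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String maxOnly = personXsd("maxOccurs=\"10\"");
        String minZero = personXsd("maxOccurs=\"10\" minOccurs=\"0\"");
        // With maxOccurs alone, minOccurs keeps its default of 1.
        System.out.println("maxOccurs only, 0 children: " + isValid(maxOnly, person(0)));
        System.out.println("minOccurs=0,    0 children: " + isValid(minZero, person(0)));
        System.out.println("minOccurs=0,   11 children: " + isValid(minZero, person(11)));
    }
}
```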

wsimport :

This tool comes along with the JDK. If you have installed the JDK properly and the JDK's bin directory is on your PATH, you can see the list of options provided by this command by typing wsimport at the command prompt.

To generate Java classes with wsimport, the syntax is :

wsimport -d <directory> -keep <wsdl-location-path>

wsdl-location-path : The location (file path or URL) of the WSDL file.

-d : Specifies the directory where all the generated classes should be placed.

-keep : Keeps the Java source code of the generated classes.

-extension : Allows vendor extensions, that is, functionality not covered by the specification.

Ex:

wsimport -d src -keep hello.wsdl

Now we will take the sample web service of w3schools, which has the following WSDL location:

http://www.w3schools.com/webservices/tempconvert.asmx?WSDL

Execute the command in DOS Prompt to generate classes from wsdl :

C:\webservice> wsimport http://www.w3schools.com/webservices/tempconvert.asmx?WSDL -extension -keep

After execution, it will create a folder called org inside the webservice folder.

Now Create a Main class to access these webservice :

Code:

import org.tempuri.*;

public class Main {

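    public static void main(String[] args)
    {
        // TempConvert is the service class generated by wsimport from the WSDL.
        TempConvert convert = new TempConvert();
        // Get the SOAP port (the client-side proxy) and call the remote operation.
        System.out.println(convert.getTempConvertSoap().celsiusToFahrenheit("122"));

    }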
}

Save this class inside the "webservice" folder. Compile and run it, and you will see the output "251.6" returned from the web service.


There are many ways to write a web service; the right one depends on the kind of business logic and functionality your web service has to provide. Two types of web service are in common use today:

1. SOAP
2. REST

Here we will be creating a SOAP-based web service. We have already seen what SOAP is and how it works; you can refer to the links on our index page. We will look at REST in a later part of these tutorials.

In this section we will concentrate on writing a Java web service based on a Java class that has one business method. From this class we are going to create a web service that can be accessed by many clients for their needs.

We are going to follow the Eclipse bottom-up approach to create this web service.

Bottom-Up Approach : It is used to create a web service and its WSDL from a Java class.

Before we begin, we need the following.

Requirements:

1. Eclipse IDE with J2EE plugins, or the Eclipse Indigo J2EE version [http://www.eclipse.org]

2. A web server or application server (here I am using the Tomcat web server)

Once we are done with our first web service, we will access it through SoapUI. We have already discussed SoapUI's functionality in a previous tutorial, which you can refer to here :

http://ayazroomy-java.blogspot.in/2013/07/java-web-service-tutorial-with-soap-ui.html

Once you have the Eclipse IDE, configure the Tomcat server in Eclipse by adding it in the Servers tab. Now we have the server ready.

Now create a new web project :

Step 1:

File -> New -> Dynamic Web Project

Give the project a name, say MyService, and click the "Finish" button.

Step 2:

Create a package called "com.test" under the "src" folder of the "MyService" web application.

Create a new class called "Display" under the "com.test" package.

Display class :

package com.test;

public class Display {

    // Business method that will be exposed as the web service operation.
    public String convert(String name)
    {
        return name.toUpperCase();
    }

}

It is a straightforward class with a single business method that accepts a String and returns it in upper case.

Step 3:

Now, right-click on the project node "MyService" and select the following :

New -> Other -> Web Services -> Web Service : It will show the Web Service window.

In this window we have a couple of options:

1. Select the web service type: Bottom up Java bean Web Service
2. Select the class you have just created, namely "Display", by clicking on the Browse button.

There are 3 configuration links available in this window:

- Server Runtime : Select your server; in this case we use Tomcat
- Service Runtime : Apache Axis, which is selected by default
- Service Project : Specify your project name here, "MyService", and click the Next button >>

Step 4:

Follow instructions in screen shot below. Click Next Button ...

Step 5 :

After successful deployment, the publishing window will appear. Now click the Finish button.

You can now see that a WSDL named Display.wsdl has been generated inside the wsdl folder under the WebContent directory of the project "MyService".

Display.wsdl :

<?xml version="1.0" encoding="UTF-8"?>
<wsdl:definitions targetNamespace="http://test.com" xmlns:apachesoap="http://xml.apache.org/xml-soap" xmlns:impl="http://test.com" xmlns:intf="http://test.com" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:wsdlsoap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<wsdl:types>
<schema elementFormDefault="qualified" targetNamespace="http://test.com" xmlns="http://www.w3.org/2001/XMLSchema">
<element name="convert">
<complexType>
<sequence>
<element name="name" type="xsd:string"/>
</sequence>
</complexType>
</element>
<element name="convertResponse">
<complexType>
<sequence>
<element name="convertReturn" type="xsd:string"/>
</sequence>
</complexType>
</element>
</schema>
</wsdl:types>

<wsdl:message name="convertRequest">

<wsdl:part element="impl:convert" name="parameters">

</wsdl:part>

</wsdl:message>

<wsdl:message name="convertResponse">

<wsdl:part element="impl:convertResponse" name="parameters">

</wsdl:part>

</wsdl:message>

<wsdl:portType name="Display">

<wsdl:operation name="convert">

<wsdl:input message="impl:convertRequest" name="convertRequest">

</wsdl:input>

<wsdl:output message="impl:convertResponse" name="convertResponse">

</wsdl:output>

</wsdl:operation>

</wsdl:portType>

<wsdl:binding name="DisplaySoapBinding" type="impl:Display">

<wsdlsoap:binding style="document" transport="http://schemas.xmlsoap.org/soap/http"/>

<wsdl:operation name="convert">

<wsdlsoap:operation soapAction=""/>

<wsdl:input name="convertRequest">

<wsdlsoap:body use="literal"/>

</wsdl:input>

<wsdl:output name="convertResponse">

<wsdlsoap:body use="literal"/>

</wsdl:output>

</wsdl:operation>

</wsdl:binding>

<wsdl:service name="DisplayService">

<wsdl:port binding="impl:DisplaySoapBinding" name="Display">

<wsdlsoap:address location="http://localhost:8080/MyService/services/Display"/>

</wsdl:port>

</wsdl:service>

</wsdl:definitions>

Now, if you type the following URL in the browser, you can see the list of services deployed on the server:

http://localhost:8080/MyService/services

The service is now ready, and we can test it through SoapUI.

Step 6:

- Open SoapUI.

- Right-click on the Projects node -> New SoapUI Project -> provide the WSDL location path.

- In this case: http://localhost:8080/MyService/services/Display?wsdl

Open the SOAP request, enter a name in lower case in the request XML, and the response will contain the given name in upper case.
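
For readers who want to see what SoapUI sends, here is a minimal sketch in plain Java that builds the same kind of SOAP 1.1 request for the convert operation and can post it to the endpoint from Display.wsdl. The class name DisplaySoapClient, the namespace prefix tes and the helper methods are my own choices for this illustration; the namespace http://test.com, the element names and the empty SOAPAction come from the WSDL above. The post method assumes the service is deployed and running on localhost:8080, so main only prints the request.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class DisplaySoapClient {

    // Escapes the characters that are significant inside XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds the request envelope for the "convert" operation.
    static String buildRequest(String name) {
        return "<soapenv:Envelope"
             + " xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
             + " xmlns:tes=\"http://test.com\">"
             + "<soapenv:Body><tes:convert>"
             + "<tes:name>" + escape(name) + "</tes:name>"
             + "</tes:convert></soapenv:Body>"
             + "</soapenv:Envelope>";
    }

    // Posts the envelope and returns the raw response XML (needs a running server).
    static String post(String endpoint, String envelope) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(endpoint).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        con.setRequestProperty("SOAPAction", "\"\""); // the WSDL declares soapAction=""
        try (OutputStream out = con.getOutputStream()) {
            out.write(envelope.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = con.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // Print the request only; call post(...) once the service is deployed, e.g.
        // post("http://localhost:8080/MyService/services/Display", buildRequest("ayaz"))
        System.out.println(buildRequest("ayaz"));
    }
}
```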

We can create a stub for this web service by using the wsimport tool. Please refer to http://ayazroomy-java.blogspot.in/2013/07/using-wsimport-tool-to-generate-client.html for creating a stub and a client from the WSDL.

Thursday, October 3, 2013

Struts MVC Architecture Flow

The model contains the business logic and interacts with the persistent storage to store, retrieve and manipulate data.
The view is responsible for displaying the results back to the user. In Struts the view layer is implemented using JSP.
The controller handles all the requests from the user and selects the appropriate view to return. In Struts the controller's job is done by the ActionServlet.

The following events happen when the Client browser issues an HTTP request.

The ActionServlet receives the request.
The struts-config.xml file contains the details regarding the Actions, ActionForms, ActionMappings and ActionForwards.
During startup the ActionServlet reads the struts-config.xml file and creates a database of configuration objects. Later, while processing a request, the ActionServlet makes its decisions by referring to these objects.

When the ActionServlet receives the request it does the following tasks.

Bundles all the request values into a JavaBean class which extends Struts ActionForm class.
Decides which action class to invoke to process the request.
Validate the data entered by the user.
The Action class processes the request with the help of the model component. The model interacts with the database and processes the request.
After completing the request processing the Action class returns an ActionForward to the controller.
Based on the ActionForward the controller will invoke the appropriate view.
The HTTP response is rendered back to the user by the view component.
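
The flow above can be sketched in a few lines of plain Java. This is a deliberately simplified stand-in and not the Struts API: the class MiniFrontController, the Action interface, the "/login" path and the JSP names are all hypothetical, and the two maps play the role that struts-config.xml plays in a real application.

```java
import java.util.HashMap;
import java.util.Map;

public class MiniFrontController {

    // Stand-in for a Struts Action: processes the form data and names a forward.
    interface Action {
        String execute(Map<String, String> form);
    }

    // Stand-ins for struts-config.xml: request path -> action, forward name -> view.
    private final Map<String, Action> actionMappings = new HashMap<>();
    private final Map<String, String> forwards = new HashMap<>();

    MiniFrontController() {
        actionMappings.put("/login", form ->
            "secret".equals(form.get("password")) ? "success" : "failure");
        forwards.put("success", "/welcome.jsp");
        forwards.put("failure", "/login.jsp");
    }

    // Controller steps: pick the action for the path, run it with the form data,
    // then resolve the returned forward name to the view that renders the response.
    String handle(String path, Map<String, String> form) {
        Action action = actionMappings.get(path);
        if (action == null) {
            return "/error.jsp"; // no mapping configured for this path
        }
        String forward = action.execute(form);
        return forwards.get(forward);
    }

    public static void main(String[] args) {
        Map<String, String> form = new HashMap<>();
        form.put("password", "secret");
        System.out.println(new MiniFrontController().handle("/login", form));
    }
}
```

The point of the design is that the controller never hard-codes which view follows which action; it only follows the configured mappings.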

Monday, August 26, 2013

Top 10 Mistakes Made by New Agile Teams

The following is a collection of the 10 most common mistakes that a new Agile team can make. You’ll find suggested remedies that come directly from our support, user learning, and coaching teams, who have years of experience guiding teams through Agile transitions.

Tuesday, July 23, 2013

HOW TO Become an Efficient Support Engineer in IT industry?

I have been a Production Support Engineer for over 7 years now. Over the years I have used several tools to help me work efficiently and be productive. In this post I have tried to list down the few things that I have learned being a Support Engineer.

Use a good text editor and master it (Sublime Text 2)

A text editor that is easy to use is a very good tool in the hands of a Support Engineer. No matter what product you support, you will always need a workspace to copy things quickly, make notes, etc. There have been several instances wherein I have had to transform text using simple actions such as find and replace. I also use my text editor to read/write code.
The text editor I recommend and use is Sublime Text 2. This is easily the best text editor I have used over the years. It has a fantastic set of nifty shortcuts, very good syntax highlighting that helps in reading code and also has a very simple word completion feature (really useful when writing code). The one feature I really like about Sublime Text 2 is that the unsaved files are not lost in case you accidentally close the editor and will be available when you open Sublime the next time. The default theme is very soothing too.
Some of the other good text editors I have used are :

Using the various utilities that come with these powerful text editors will speed up your work drastically.

Manage passwords using a Password Manager (Keepass)

As a support engineer you will have access to several systems to get your work done. Most of these systems are password protected, and managing the access credentials can be really difficult.
Keepass is a free tool that can be used to manage your passwords and log in information.

Use a Desktop Automation Software (AutoHotKey)

My work involves a lot of emailing. There are a few sentences that I use in almost all my emails. Writing the same line repeatedly in every email is not an efficient way to work. I use AutoHotKey to auto-complete the sentences that I use repeatedly.

For example, I usually end my emails with, “Please do let me know if you need any information.”

The tool is easy to use and allows you to set a combination of keys which when typed, expands to the required sentence. In the above example “Please do let me know if you need any information.” is expanded when I type “pdl”.

Use Microsoft Excel or a similar spreadsheet application

Microsoft Excel is something that I use for almost all kinds of data transformation and analysis. This is a must-have tool whether you are a Support Engineer or not. For me it just helps to put my initial analysis on a spreadsheet and look at it when working on an issue. You could use other spreadsheet solutions that come as part of LibreOffice or Google Docs, but I personally prefer Microsoft Excel.

Use a Cloud Storage (Dropbox)

As a Support Engineer you may have to work on something that is critical in nature even when you don’t have access to your own laptop/PC. It is good to have some storage on the cloud to keep your documents and reference notes so that they are easily accessible from another PC over the internet. I personally recommend Dropbox, but these days there are several other good solutions like Box, SkyDrive and Google Drive.

Track your time

Make it a point to track your time correctly. You can use any tool to track your time that works for you. I use a simple time tracking tool that I built for myself, however you can use Microsoft Excel, any text editor, etc. to track your time. Tracking your time will give you a good feedback on how much time you are taking for a particular task and this will help you give better effort estimations for tasks assigned to you.

Learn a scripting language

Try and learn a scripting language, for e.g. Shell scripting, PowerShell scripting for Windows, Ruby or Perl. I personally prefer Shell Scripting and Ruby. Knowing a scripting language will help you automate or work on adhoc requirements more efficiently. In my opinion learning to program or code in any one language helps one think in a more structured manner.

Develop basic SQL Skills

Learn the basics of SQL like SELECT, UPDATE, INSERT, DELETE and TRUNCATE.

Use a Screen Shot Tool (Greenshot)

I use Greenshot to take screenshots of selected portions of the screen. This avoids taking a screenshot of the complete screen and then using an image editor like Microsoft Paint to cut out portions of the image before sharing it.

Ask “How can it be done?”

This is something that I picked up from my father. He always says that if you are presented with a problem, the first obvious question should be: “How can it be done?”. This instills a problem-solving attitude in you, and the only way you can go is forward. This is simple but very powerful advice.
Revisit your routine and try and simplify your work. You need to find all possible ways to identify monotonous work and try and simplify them and if possible automate them.
Doing the above will help you be efficient which will give you ample time and resources to help your customers solve their issues and problems.
Let me know your comments and also let me know if you are using any good tool that has helped you be more efficient and productive.

Tuesday, July 16, 2013

Hadoop Developer interview questions and answers

What are the supported programming languages for MapReduce?

The most common programming language is Java, but scripting languages are also supported via Hadoop streaming.

The original language supported is Java. However, as Hadoop became more and more popular, various alternative scripting languages were incorporated.

How does Hadoop process large volumes of data?

Hadoop ships the code to the data instead of sending the data to the code.

The basic design principle of Hadoop is to eliminate data copying between different datanodes.

What are sequence files and why are they important?

Sequence files are a type of the file in the Hadoop framework that allow data to be sorted

Sequence files are intermediate files that are created by Hadoop after the map step

Hadoop is able to split data between different nodes gracefully while keeping data compressed. The sequence files have special markers that allow data to be split across entire cluster

What are map files and why are they important?

Map files are sorted sequence files that also have an index. The index allows fast data look up.

The Hadoop map file is a variation of the sequence file. They are very important for map-side join design pattern.

How can you use binary data in MapReduce?

Binary data can be used directly by a map-reduce job. Often binary data is added to a sequence file.

Binary data can be packaged in sequence files. Hadoop cluster does not work very well with large numbers of small files. Therefore, small files should be combined into bigger ones

What is map - side join?

Map-side join is done in the map phase and done in memory

The map-side join is a technique that allows for splitting a map file between different data nodes. The data will be loaded into memory. This technique allows very fast performance for the join.
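
The in-memory idea behind a map-side join can be illustrated without a cluster. The sketch below is plain Java, not Hadoop code: it assumes the common variant in which one side of the join is small enough to be held in a hash map, and it joins each "userId,amount" record against that map the way a mapper would. The class name, record layout and sample data are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {

    // Joins each "userId,amount" record against a small userId -> name table
    // that is held entirely in memory. Records without a match are dropped.
    static List<String> join(Map<String, String> smallTable, List<String> bigRecords) {
        List<String> joined = new ArrayList<>();
        for (String record : bigRecords) {
            String[] fields = record.split(",", 2);
            if (fields.length < 2) {
                continue; // skip malformed records
            }
            String name = smallTable.get(fields[0]); // in-memory lookup, no disk I/O
            if (name != null) {
                joined.add(fields[0] + "," + name + "," + fields[1]);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> users = new HashMap<>();
        users.put("1", "alice");
        users.put("2", "bob");
        List<String> orders = Arrays.asList("1,9.50", "3,4.00", "2,12.25");
        System.out.println(join(users, orders));
    }
}
```

Because every lookup is served from memory, the join is fast, but the small table has to fit into the memory of each node that runs the map step.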

What is reduce - side join?

Reduce-side join is a technique for merging data from different sources based on a specific key. There are no memory restrictions

The reduce-side join is a technique for joining data of any size in the reduce step. The technique is much slower than the map-side join. However, this technique does not have any requirements on data size.

What is HIVE?

Hive is a part of the Apache Hadoop project that provides SQL like interface for data processing

Hive is a project initially developed by facebook specifically for people with very strong SQL skills and not very strong Java skills who want to query data in Hadoop

What is PIG?

Pig is a part of the Apache Hadoop project that provides a C-like scripting language interface for data processing.

Pig is a project that was developed by Yahoo for people with very strong skills in scripting languages. From a script, it creates MapReduce jobs automatically.

How can you disable the reduce step?

A developer can always set the number of the reducers to zero. That will completely disable the reduce step.

If developer uses MapReduce API he has full access to any number of mappers and reducers for job execution

Why would a developer create a map-reduce without the reduce step?

There is a CPU intensive step that occurs between the map and reduce steps. Disabling the reduce step speeds up data processing

Map-only MapReduce jobs are very common. They are normally used to perform transformations on data without sorting and aggregation.

What is the default input format?

The default input format is TextInputFormat with byte offset as a key and entire line as a value.

Hadoop permits a large range of input formats. The default is text input format. This format is the simplest way to access data as text lines
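
To make the "byte offset as key, line as value" idea concrete, here is a small plain-Java sketch (not Hadoop code) that produces the same kind of key/value pairs for a piece of text. The class name LineOffsetSketch is hypothetical, and the sketch assumes UTF-8 text with "\n" line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class LineOffsetSketch {

    // Returns byte offset -> line, the (key, value) shape that a text input
    // format hands to the mapper for every line of the input.
    static Map<Long, String> records(String text) {
        Map<Long, String> out = new LinkedHashMap<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            out.put(offset, line);
            // Advance by the encoded length of the line plus the newline byte.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(records("hello\nhadoop\nworld\n"));
    }
}
```

The key is rarely interesting to the mapper itself; most jobs only use the line (the value) and ignore the offset.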

How can you overwrite the default input format?

In order to overwrite default input format, a developer has to set new input format on job config before submitting the job to a cluster

Developer can always set different input formats on job configuration (e.g sequence files, binary files, compressed format)

What are the common problems with map-side join?

The most common problems with map-side joins are out of memory exceptions on slave nodes.

Map-side join uses memory for joining the data based on a key. As a result the data size is limited to the size of the available memory. If this exceeds available memory an out of memory error will occur

Which is faster: Map-side join or Reduce-side join? Why?

Map-side join is faster because join operation is done in memory.

The map-side join is faster. This is primarily due to usage of memory. Memory operations are always faster since there is no disk I/O involved.

Will settings using Java API overwrite values in configuration files?

Yes. The configuration settings using Java API take precedence

Developer has full control over the setting on Hadoop cluster. All configurations can be changed via Java API

What is AVRO?

Avro is a java serialization library

AVRO is an Apache project that is bridging the gap between unstructured data and structured data. The Avro file format is highly optimized for network transmissions and is splittable between different datanodes.

Can you run Map - Reduce jobs directly on Avro data?

Yes, Avro was specifically designed for data processing via Map-Reduce

AVRO implements all necessary interfaces for MapReduce processing and avro data can be processed directly via Hadoop cluster

What is distributed cache?

The distributed cache is a component that allows developers to deploy jars for Map-Reduce processing.

Distributed cache is the Hadoop answer to the problem of deploying third-party libraries. Distributed cache will allow libraries to be deployed to all datanodes

What is the best performance one can expect from a Hadoop cluster?

The best performance expectation one can have is measured in seconds. This is because Hadoop can only be used for batch processing.

Hadoop specifically was designed for batch processing. There are a few additional components that will allow better performance. Near real-time and real-time Hadoop performance are not currently possible but are in the works.

What is writable?

Writable is a java interface that needs to be implemented for MapReduce processing.

Hadoop performs a lot of data transmissions between different datanodes. Writable is needed for mapreduce processing in order to improve performance of the data transmissions.

The Hadoop API uses basic Java types such as LongWritable, Text, IntWritable. They have almost the same features as default java classes. What are these writable data types optimized for?

Writable data types are specifically optimized for network transmissions

Data needs to be represented in a format optimized for network transmission. Hadoop is based on the ability to send data between datanodes very quickly. Writable data types are used for this purpose.

Can a custom type for data Map-Reduce processing be implemented?

Yes, custom data types can be implemented as long as they implement writable interface.

Developers can easily implement new data types for any objects. It is common practice to use existing classes and extend them with writable interface.
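
A custom type boils down to writing its fields to a DataOutput and reading them back in the same order from a DataInput. The sketch below shows that round trip in plain Java. To stay runnable without the Hadoop jars it declares a local stand-in interface with the same two methods as Hadoop's Writable; in a real job you would implement org.apache.hadoop.io.Writable instead. The PageHits type and its fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableSketch {

    // Local stand-in with the same two methods as Hadoop's Writable interface.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical custom type: a page URL together with its hit count.
    static class PageHits implements Writable {
        String url = "";
        int hits;

        PageHits() { } // no-argument constructor, so an empty instance can be filled

        PageHits(String url, int hits) {
            this.url = url;
            this.hits = hits;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(url);
            out.writeInt(hits);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in exactly the order they were written.
            url = in.readUTF();
            hits = in.readInt();
        }
    }

    // Serializes a value to bytes and reads it back into a fresh instance.
    static PageHits roundTrip(PageHits original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        PageHits copy = new PageHits();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        PageHits copy = roundTrip(new PageHits("/index.html", 42));
        System.out.println(copy.url + " " + copy.hits);
    }
}
```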

What happens if mapper output does not match reducer input?

A run-time exception will be thrown and the map-reduce job will fail.

Reducers are based on the mappers' output and Java is a strongly typed language. Therefore, an exception will be thrown at run-time if the types do not match.

Can you provide multiple input paths to a map-reduce job?

Yes, developers can add any number of input paths.

The Hadoop framework is capable of taking different input paths and assigning different mappers for each one. This is a very convenient way of writing different mappers to handle various datasets.

Can you assign different mappers to different input paths?

Yes, different mappers can be assigned to different directories

Assigning different mappers to different data sources is the way to quickly and efficiently create code for processing multiple formats.

Can you suppress reducer output?

Yes, there is a special data type that will suppress job output.

There are a number of scenarios where output is not required from reducers. For instance, web crawling or image processing does not require external fetch or data processing.

Is there a map input format?

No, but sequence file input format can read map files

Map files are just a variation of sequence files. They store data in sorted order

What is the most important feature of map-reduce?

Ability to process data on the cluster of the machines without copying all the data over.

The fundamental difference of the Hadoop framework is that multiple machines will be used to process the same data and data is readily available for processing in distributed file system.

What is HBASE?

Hbase is a part of the Apache Hadoop project that provides interface for scanning large amount of data using Hadoop infrastructure

Hbase is one of the Hadoop framework projects that allow real time data scans across big data volumes. This is very often used to serve data from a cluster

Hadoop admin interview question and answers

Which operating system(s) are supported for production Hadoop deployment?

The main supported operating system is Linux. However, with some additional software Hadoop can be deployed on Windows.

What is the role of the namenode?

The namenode is the "brain" of the Hadoop cluster and responsible for managing the distribution blocks on the system based on the replication policy. The namenode also supplies the specific addresses for the data based on the client requests.

What happens on the namenode when a client tries to read a data file?

The namenode will look up the information about the file in the edit file and then retrieve the remaining information from the filesystem memory snapshot.

Since the namenode needs to support a large number of clients, the primary namenode will only send back information about the data location. The datanode itself is responsible for the retrieval.

What are the hardware requirements for a Hadoop cluster (primary and secondary namenodes and datanodes)?

There are no requirements for datanodes. However, the namenodes require a specified amount of RAM to store filesystem image in memory

Based on the design of the primary namenode and secondary namenode, entire filesystem information will be stored in memory. Therefore, both namenodes need to have enough memory to contain the entire filesystem image.

What mode(s) can Hadoop code be run in?

Hadoop can be deployed in standalone mode, pseudo-distributed mode or fully-distributed mode.

Hadoop was specifically designed to be deployed on a multi-node cluster. However, it can also be deployed on a single machine, and as a single process for testing purposes.

How would a Hadoop administrator deploy various components of Hadoop in production?

Deploy the namenode and jobtracker on the master node, and deploy datanodes and tasktrackers on multiple slave nodes.

Only one namenode and one jobtracker are needed on the system. The number of datanodes depends on the available hardware.

What is the best practice to deploy the secondary namenode?

Deploy the secondary namenode on a separate standalone machine.

The secondary namenode needs to be deployed on a separate machine so that it does not interfere with primary namenode operations. The secondary namenode has the same memory requirements as the main namenode.

Is there a standard procedure to deploy Hadoop?

No, there are some differences between various distributions. However, they all require that the Hadoop jars be installed on the machine.

There are some common requirements for all Hadoop distributions, but the specific procedures differ between vendors since they all have some degree of proprietary software.

What is the role of the secondary namenode?

The secondary namenode performs the CPU-intensive operation of combining the edit logs with the current filesystem snapshot.

The secondary namenode was separated out as its own process because of its CPU-intensive operations and the additional requirements for metadata backup.
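
The combining step can be pictured as replaying logged operations on top of the last snapshot. The following is a minimal sketch under stated assumptions: the dictionary layout, the `("create" | "delete", path, blocks)` tuples and the function name `apply_edits` are hypothetical, and the real filesystem image and edit log are files with many more operation types.

```python
def apply_edits(fsimage, edit_log):
    """Replay logged operations on top of a snapshot and return a new snapshot."""
    image = dict(fsimage)  # copy so the old snapshot stays untouched
    for op, path, value in edit_log:
        if op == "create":
            image[path] = value
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return image


fsimage = {"/a.txt": ["blk_1"], "/b.txt": ["blk_2"]}
edit_log = [
    ("create", "/c.txt", ["blk_3"]),
    ("delete", "/a.txt", None),
]

checkpoint = apply_edits(fsimage, edit_log)
edit_log = []  # after a checkpoint the log can start over empty
```

The longer the log, the longer the replay takes, which is why doing this work regularly and away from the primary namenode matters.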

What are the side effects of not running a secondary namenode?

The cluster performance will degrade over time since the edit log will grow bigger and bigger.

If the secondary namenode is not running at all, the edit log will grow significantly and slow the system down. Also, after a restart the system will stay in safemode for an extended time, since the namenode needs to combine the edit log with the current filesystem checkpoint image.

What happens if a datanode loses its network connection for a few minutes?

The namenode will detect that a datanode is not responsive and will start replication of the data from the remaining replicas. When the datanode comes back online, the extra replicas will be deleted.

The replication factor is actively maintained by the namenode. The namenode monitors the status of all datanodes and keeps track of which blocks are located on each node. Once a datanode is no longer available, the namenode triggers replication of the data from the existing replicas. However, if the datanode comes back up, the over-replicated data will be deleted. Note: the data might be deleted from the original datanode.
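
The bookkeeping described in this answer can be sketched as a small simulation. Everything here is an assumption made for illustration: the block map is a plain dictionary, the node names are hypothetical, and the rule for which surplus replica gets dropped (the last name in sorted order) is arbitrary, whereas in a real cluster that choice is made by the namenode.

```python
REPLICATION = 3


def re_replicate(block_map, live_nodes):
    """Copy under-replicated blocks to live nodes that do not hold them yet."""
    for holders in block_map.values():
        live_holders = holders & live_nodes
        candidates = sorted(live_nodes - holders)
        while len(live_holders) < REPLICATION and candidates:
            target = candidates.pop(0)
            holders.add(target)
            live_holders.add(target)


def trim_excess(block_map, live_nodes):
    """Delete replicas beyond the target count (arbitrary choice of victim)."""
    for holders in block_map.values():
        live_holders = sorted(holders & live_nodes)
        for node in live_holders[REPLICATION:]:
            holders.discard(node)


# Each block maps to the set of nodes that physically store a copy.
block_map = {"blk_1": {"dn1", "dn2", "dn3"}}

# dn1 drops off the network: only two live replicas remain, so one is added.
live = {"dn2", "dn3", "dn4"}
re_replicate(block_map, live)
after_failure = set(block_map["blk_1"])

# dn1 comes back: four live replicas exist, so one is removed again.
live = {"dn1", "dn2", "dn3", "dn4"}
trim_excess(block_map, live)
after_return = set(block_map["blk_1"])
```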

What happens if one of the datanodes has a much slower CPU?

The task execution will be only as fast as the slowest worker. However, if speculative execution is enabled, the slowest worker will not have such a big impact.

Hadoop was specifically designed to work with commodity hardware. Speculative execution helps to offset slow workers: multiple instances of the same task are created, the jobtracker takes the first result into consideration, and the remaining instance of the task is killed.

What is speculative execution?

If speculative execution is enabled, the job tracker will issue multiple instances of the same task on multiple nodes and it will take the result of the task that finished first. The other instances of the task will be killed.

Speculative execution is used to offset the impact of slow workers in the cluster. The jobtracker creates multiple instances of the same task and takes the result of the first successful one. The rest of the tasks are discarded.
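
The "first result wins" rule can be illustrated with a small simulation. This is only a sketch, not how the jobtracker is implemented: the function `run_speculatively`, the node names and the simulated run times are hypothetical, and no real processes are started or killed.

```python
def run_speculatively(task, task_input, attempts):
    """Simulate speculative execution of one task.

    attempts is a list of (node_name, simulated_seconds) pairs. The attempt
    with the shortest simulated time wins; the others are reported as killed.
    """
    ordered = sorted(attempts, key=lambda pair: pair[1])
    winner = ordered[0][0]
    killed = [node for node, _ in ordered[1:]]
    # Every attempt would compute the same result, so it is computed once here.
    return winner, task(task_input), killed


def word_count(line):
    return len(line.split())


winner, result, killed = run_speculatively(
    word_count,
    "the quick brown fox",
    [("slow-node", 42.0), ("fast-node", 3.5)],
)
```

The trade-off is that the duplicate attempts consume extra task slots in exchange for a shorter overall job time.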

How many racks do you need to create a Hadoop cluster in order to make sure that the cluster operates reliably?

To ensure reliable operation, it is recommended to have at least 2 racks with rack placement configured.

Hadoop has a built-in rack awareness mechanism that allows data distribution between different racks based on the configuration.

Are there any special requirements for the namenode?

Yes, the namenode holds information about all files in the system and needs to be extra reliable.

The namenode is a single point of failure. It needs to be extra reliable, and its metadata needs to be replicated in multiple places. Note that the community is working on solving the single point of failure issue with the namenode.

If you have a file of 128M in size and the replication factor is set to 3, how many blocks can you find on the cluster that correspond to that file (assuming the default Apache and Cloudera configuration)?

6

Based on the configuration settings, the file will be divided into multiple blocks according to the default block size of 64M: 128M / 64M = 2. Each block will be replicated according to the replication factor setting (default 3): 2 * 3 = 6.
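
The same arithmetic as a small helper, in case you want to try other file sizes. The function name `total_block_replicas` is made up for this example, and the defaults simply mirror the 64M block size and replication factor of 3 used in the answer; pass other values for a cluster configured differently.

```python
import math


def total_block_replicas(file_size_mb, block_size_mb=64, replication=3):
    """Number of block copies stored on the cluster for one file."""
    # A partially filled last block still counts as a block, hence the ceiling.
    blocks = math.ceil(file_size_mb / block_size_mb)
    return blocks * replication


replicas_128 = total_block_replicas(128)  # 2 blocks * 3 replicas
replicas_130 = total_block_replicas(130)  # 3 blocks * 3 replicas
```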

What is distributed copy (distcp)?

Distcp is a Hadoop utility that launches MapReduce jobs to copy data. Its primary use is copying large amounts of data.

One of the major challenges in the Hadoop environment is copying data across multiple clusters. Distcp allows multiple datanodes to be leveraged for parallel copying of the data.

What is replication factor?

The replication factor controls how many times each individual block is replicated.

Data is replicated in the Hadoop cluster based on the replication factor. A high replication factor helps keep data available in the event of a failure.

What daemons run on Master nodes?

NameNode, Secondary NameNode and JobTracker

Hadoop is comprised of five separate daemons, and each of these daemons runs in its own JVM. NameNode, Secondary NameNode and JobTracker run on master nodes. DataNode and TaskTracker run on each slave node.

What is rack awareness?

Rack awareness is the way in which the namenode decides how to place blocks based on the rack definitions.

Hadoop tries to keep network traffic between datanodes within the same rack and contacts remote racks only if it has to. The namenode is able to control this thanks to rack awareness.
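
One way to picture rack-aware placement is the sketch below. It is a simplified version of the commonly documented default policy for three replicas (first copy on the writer's node, second on a node in another rack, third on a different node in that same remote rack). The topology dictionary, the node names and the function `place_replicas` are hypothetical, and the sketch assumes at least two racks with at least two nodes in the remote rack.

```python
def place_replicas(topology, writer_node):
    """Pick three nodes for a block. topology maps rack name -> list of nodes."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer_node]

    # First replica: the node that is writing the data.
    first = writer_node
    # Second replica: a node on a different rack, to survive a rack failure.
    remote_rack = next(r for r in sorted(topology) if r != local_rack)
    second = topology[remote_rack][0]
    # Third replica: another node on that same remote rack, which keeps
    # the copy between the second and third replica inside one rack.
    third = next(n for n in topology[remote_rack] if n != second)
    return [first, second, third]


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
placement = place_replicas(topology, "n1")

rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
racks_used = {rack_of[node] for node in placement}
```

With this layout a whole rack can fail without losing the block, while only one of the two extra copies has to cross racks.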

What is the role of the jobtracker in a Hadoop cluster?

The jobtracker is responsible for scheduling tasks on slave nodes, collecting results and retrying failed tasks.

The jobtracker is the main component of map-reduce execution. It controls the division of the job into smaller tasks, submits tasks to individual tasktrackers, tracks the progress of the jobs and reports results back to the calling code.

How does the Hadoop cluster tolerate datanode failures?

Since Hadoop is designed to run on commodity hardware, datanode failures are expected. The namenode keeps track of all available datanodes and actively maintains the replication factor on all data.

The namenode actively tracks the status of all datanodes and acts immediately if the datanodes become non-responsive. The namenode is the central "brain" of the HDFS and starts replication of the data the moment a disconnect is detected.

What is the procedure for namenode recovery?

A namenode can be recovered in two ways: by starting a new namenode from backup metadata or by promoting the secondary namenode to primary namenode.

The namenode recovery procedure is very important to ensure the reliability of the data. It can be accomplished by starting a new namenode using backup data or by promoting the secondary namenode to primary.

Web-UI shows that half of the datanodes are in decommissioning mode. What does that mean? Is it safe to remove those nodes from the network?

This means that the namenode is trying to retrieve data from those datanodes by moving replicas to the remaining datanodes. There is a possibility that data can be lost if the administrator removes those datanodes before decommissioning has finished.

Because of the replication strategy, it is possible to lose some data if datanodes are removed en masse before the decommissioning process completes. Decommissioning refers to the namenode retrieving data from those datanodes by moving replicas to the remaining datanodes.

What does the Hadoop administrator have to do after adding new datanodes to the Hadoop cluster?

Since the new nodes will not have any data on them, the administrator needs to start the balancer to redistribute data evenly between all nodes.

The Hadoop cluster will detect new datanodes automatically. However, in order to optimize cluster performance, it is recommended to start the rebalancer to redistribute the data evenly between datanodes.

If the Hadoop administrator needs to make a change, which configuration file does he need to change?

It depends on the nature of the change. Each node has its own set of configuration files, and they are not always the same on each node.

Each node in the Hadoop cluster has its own configuration files, and the change needs to be made in every file. One of the reasons for this is that the configuration can be different for every node.

Map Reduce jobs are failing on a cluster that was just restarted. They worked before restart. What could be wrong?

The cluster is in safe mode. The administrator needs to wait for the namenode to exit safe mode before restarting the jobs.

This is a very common mistake by Hadoop administrators when there is no secondary namenode on the cluster and the cluster has not been restarted in a long time. The namenode will go into safemode and combine the edit log with the current filesystem checkpoint image.

Map Reduce jobs take too long. What can be done to improve the performance of the cluster?

One of the most common reasons for performance problems on a Hadoop cluster is uneven distribution of the tasks. The number of tasks has to match the number of available slots on the cluster.
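
A rough way to see why the match matters is to count how many "waves" of tasks the cluster has to run. This is a back-of-the-envelope sketch that assumes all tasks take about the same time; the function names and the slot and task counts are made up for illustration.

```python
import math


def waves(num_tasks, slots):
    """How many rounds are needed if every slot runs one task per round."""
    return math.ceil(num_tasks / slots)


def last_wave_utilization(num_tasks, slots):
    """Fraction of slots that are busy during the final round."""
    remainder = num_tasks % slots
    return 1.0 if remainder == 0 else remainder / slots


# 40 slots and 40 tasks: one full wave.
full = (waves(40, 40), last_wave_utilization(40, 40))

# 40 slots and 41 tasks: a second wave runs with a single busy slot.
uneven = (waves(41, 40), last_wave_utilization(41, 40))
```

In the second case one extra task roughly doubles the run time while almost the whole cluster sits idle.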

Hadoop is not a hardware-aware system. It is the responsibility of the developers and the administrators to make sure that resource supply and demand match.

How often do you need to reformat the namenode?

Never. The namenode needs to be formatted only once, in the beginning. Reformatting the namenode will lead to loss of the data on the entire file system.

The namenode is the only system that needs to be formatted, and only once. Formatting creates the directory structure for the file system metadata and creates the namespaceID for the entire file system.

After increasing the replication level, I still see that data is under replicated. What could be wrong?

Data replication takes time due to the large quantities of data. The Hadoop administrator should allow sufficient time for data replication.

Depending on the data size, replication will take some time. The Hadoop cluster still needs to copy the data around, and if the data is big enough it is not uncommon for replication to take from a few minutes to a few hours.
