Tuesday, July 3, 2012

How to Check URL using shell script?

We had an issue on production environment. Suddenly one of the webserver got crashed that no one aware of it. That was happen in the Weekend there were no users accessing at that time. When the business hour started Monday morning customer reported to service desk that one of the URL is not working for them. We middleware support team check the URL and identified that the HTTP server not working. Next action items as you know taking the logs backup and starting the HTTP server done.

As part of issue management preventive action to this there could be a monitoring of the URLs. All the web servers hosted application, load balancer that is serving the application or Application server that hosted the web application must be checked randomly or every day before business hours starts.
To do this monitoring team started working on listing out all loadbalancer URL, primary, secondary webserver URLs and also Application server hosted URL lists and checking on the browser each URL.
Here is the pain-area doing this in manually there could be
JHuman errors, Re-testing same URL
• Each URL checking takes almost 1 min
• Business have 10 domains then each domain have load balancer URL, primary, secondary webserver URL approx 30 min
Thought of the day! Why don’t we automate this task using UNIX shell scripting. We came up with idea using wget command.  Started brainstorming searching on internet for a scripts that could monitor the URL. Many forums, blogs, most of them are using curl command whereas Solaris box is not having this command.
Alternate to curl command we have powerful wget command on Solaris.

1. While using wget it is downloading junk
2. We have URL list some of them are SSO enabled
The first challenge, can be fix with –spider option will avoid the downloading anything from the website.
The second challenge overcome by using --no-check-certificate option
Redirect output of command into a variable
Test the test url from the command line, that is giving output from HTTP header of that URL. From that we need only single line that shows HTTP status. The grep command will do this, but it cannot be applied directly to that command. Because the wget execution is unable to sending the response to STDOUT, rather it will send to STDERR. To fix this, redirect the STDERR to STDOUT and do this in background ( 2> &1).  You can do this by storing that into a file, but it is not optimum solution. To store it into a variable, So that grep command will be enabled to find the pattern ‘HTTP’.

Create a file with desired URL list - urls.txt
Prepared a bash script that will work as web crawler for your business to check the URL and tells you it’s host is OK or not.
Effective automation script made the script as follows:

# This script will check for the appliction url working or not

# Function that check the URL returns HTTP Status Code
       # keep below 2 lines in one
       s=`(/usr/sfw/bin/wget -t 0 --spider --no-check-certificate $1) 2>&1 \
       grep HTTP| tail -1|cut -c 41-43`
       echo $s

#========== = = = main script = = = ===========
for url in `cat urls.txt`
         HTTPCode=$(checkURL $url) # function call
         if [ $HTTPCode -ge 500 ]
             # mail if this have issue
         TIME=`date +%d-%m-%Y_%H.%M.%S`
         echo At $TIME the URL $url $HTTPCode $status

This script can be associated with schedule job
 1. crontab old way
2. autosys job scheduler
3. at command for trials

No comments:

Post a Comment