
11 July, 2008

Sources of com.sun.net.httpserver.HttpServer

The Sun JRE contains an implementation of a simple HTTP server
(in the com.sun.net.httpserver and sun.net.httpserver packages).


However, this "HttpServer":

  • Is not part of the Java API.

  • Is Sun's proprietary component.

  • Works only under Sun JRE 1.6 or later.

  • Has sources available only as part of the Standard Edition Development Kit Source Release, under the Java Research License
    (see Java™ Platform, Standard Edition 6u3 Source Snapshot Releases jdk-6u3-fcs-src-b05-jrl-24_sep_2007).


Still, this implementation is sometimes pretty useful (for example here).

27 April, 2008

HowTo simulate Googlebot

Googlebot is Google's web crawling robot, which finds and retrieves pages on the web and hands them off to the Google indexer.
Googlebot identifies itself to the sites it visits with a special value in its HTTP request headers.

It uses a special User-Agent string:
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

It is possible to simulate Googlebot from a shell script using the wget program.

Like this:

#!/bin/bash

TEST_URL="http://digg.com/"

FIREFOX_USERAGENT_STRING="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14"
GOOGLEBOT_USERAGENT_STRING="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

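# Fetch the page as the Firefox browser would see it.
# (No -c here: combined with --output-document, "continue" would try to
# resume into a file left over from a previous run instead of re-downloading.)
wget --user-agent="$FIREFOX_USERAGENT_STRING" --output-document=firefox.html "$TEST_URL"

# Fetch the same page while identifying as Googlebot.
wget --user-agent="$GOOGLEBOT_USERAGENT_STRING" --output-document=googlebot.html "$TEST_URL"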


This script may be useful for testing a site's search engine optimization.
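
Once both files are saved, a quick way to see whether the site serves different content to Googlebot is to compare them. The following is only a rough sketch: compare_pages is a hypothetical helper name, and it assumes the firefox.html and googlebot.html files produced by the script above are in the current directory.

```bash
#!/bin/bash

# Hypothetical helper (not part of the original script):
# prints "identical" if the two saved pages are byte-for-byte equal,
# otherwise prints "different".
compare_pages() {
    local browser_page="$1"
    local bot_page="$2"

    # cmp -s compares silently and reports the result via its exit status.
    if cmp -s "$browser_page" "$bot_page"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Compare the two downloads only if both files are present.
if [ -f firefox.html ] && [ -f googlebot.html ]; then
    compare_pages firefox.html googlebot.html
fi
```

Keep in mind that dynamic pages often change between two requests anyway (timestamps, ads, session tokens), so a "different" result is only a hint. Running diff firefox.html googlebot.html afterwards shows what actually changed.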

...May the Force be with you...