The voodoo of web metrics

December 19, 2007 · Posted in Tech tidbit · 5 Comments 

The emergence of Web 2.0 and Internet startups has given rise to a whole new industry in itself: web metrics. If you follow the Web 2.0 world, you have probably heard of at least one of the following companies: Comscore, Hitwise, Compete, Alexa, Quantcast. All of them provide metrics for websites, such as the number of unique visitors, pageviews and hits.

The metrics released by these companies often become the subject of interesting conversations in the tech blogosphere. While many of the top tech bloggers frequently cite Comscore and other metrics, very little is said about the process and methodology these companies use to compile the data.

Ever wondered how exactly these companies compute their metrics? What exactly is the voodoo behind these numbers?

Today, when web metrics play such a huge role in website valuations, I thought it would be a good idea to get an overview of the companies in this segment and the methods they use for data collection.

In this first installment of a multi-part post, I’ll give an overview of the data collection methods used in the industry. In a follow-up post, I’ll review the companies and the specific methods each one uses for data collection and metrics calculation.

There are three primary methods of collecting data for web metrics. Each method taps the data at a different physical collection point.

1) End-user-based data collection:

In this method, end users are generally required to install a piece of software on their computers that tracks their Internet usage. The software reports the results to the metrics company, which then compiles them. There are two variations of the end-user software:

(a) Browser-based toolbar - Users install a toolbar in their browser, which collects the usage data. Alexa uses this method of data collection. The drawback of this approach is that usage outside the browser is not reported. Moreover, if the user switches to a different browser that lacks the toolbar, that usage goes unreported as well.

(b) Panel-based approach - Users install a standalone (non-browser) application, which tracks both browser and non-browser Internet usage. The users who agree to this form of data collection make up a sample ‘panel’, hence the name panel-based data collection. Users are often given small incentives for participating in the panel. Once the results are obtained from the panel, the data is sampled and normalized to compute the final metrics. Comscore uses this method for data collection.
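To make the "sampled and normalized" step more concrete, here is a minimal sketch of how panel data might be projected onto a whole population. It assumes a simple post-stratification scheme: each panelist is weighted by the number of Internet users in their demographic group divided by the number of panelists in that group. The groups, population figures and site names are hypothetical, and this is not a description of Comscore's actual methodology.

```python
from collections import defaultdict

# Hypothetical Internet-user population per demographic group (the "universe").
POPULATION = {"18-34": 60_000_000, "35-54": 50_000_000, "55+": 20_000_000}

# Hypothetical panel: panelist id -> demographic group.
PANEL = {"p1": "18-34", "p2": "18-34", "p3": "35-54", "p4": "55+"}

# Hypothetical visit log reported by the panel software: (panelist, site).
VISITS = [("p1", "example-news.com"), ("p1", "example-news.com"),
          ("p2", "example-video.com"), ("p3", "example-news.com"),
          ("p4", "example-video.com")]


def panelist_weights(panel, population):
    """Weight = users in the group / panelists in the group."""
    group_sizes = defaultdict(int)
    for group in panel.values():
        group_sizes[group] += 1
    return {pid: population[group] / group_sizes[group]
            for pid, group in panel.items()}


def projected_unique_visitors(visits, weights):
    """Each panelist who visited a site counts once, scaled by their weight."""
    visitors = defaultdict(set)
    for pid, site in visits:
        visitors[site].add(pid)
    return {site: sum(weights[pid] for pid in pids)
            for site, pids in visitors.items()}


weights = panelist_weights(PANEL, POPULATION)
print(projected_unique_visitors(VISITS, weights))
```

With only four panelists, each person stands in for tens of millions of users, so one panelist more or less visiting a site swings the projected figure enormously. That is why panel size and composition matter so much for this approach.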

2) ISP-based data collection:

The idea is to intercept and track the data at the network level, which in practice means at the ISP. Data is gathered from the ISPs and then sampled and analyzed to compute the final results. Note that in this case, a lot depends on which ISPs are chosen: if the subset of ISPs used for data collection is not representative, the data can get skewed.
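The skew is easy to see with a toy calculation. The sketch below assumes each participating ISP reports how many of its subscribers visited a given site, that every user belongs to exactly one ISP, and that the metrics company scales the observed count up by the participating ISPs' combined share of subscribers. All ISP names, shares and counts are hypothetical.

```python
# Hypothetical share of national subscribers served by each ISP.
MARKET_SHARE = {"isp_a": 0.40, "isp_b": 0.35, "isp_c": 0.25}

# Hypothetical unique visitors to one site, as observed in each ISP's traffic.
OBSERVED = {"isp_a": 400_000, "isp_b": 100_000, "isp_c": 50_000}


def extrapolate(observed, market_share, isps):
    """Scale visitors seen at the chosen ISPs up to the whole market."""
    seen = sum(observed[isp] for isp in isps)
    covered = sum(market_share[isp] for isp in isps)
    return seen / covered


# The same site, measured through three different ISP subsets.
print(round(extrapolate(OBSERVED, MARKET_SHARE, ["isp_a", "isp_b", "isp_c"])))
print(round(extrapolate(OBSERVED, MARKET_SHARE, ["isp_a"])))
print(round(extrapolate(OBSERVED, MARKET_SHARE, ["isp_b", "isp_c"])))
```

Depending on which ISPs supply the data, the same site comes out at roughly 550,000, 1,000,000 or 250,000 unique visitors, simply because this site's audience is concentrated at one ISP.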

3) Website-based data collection:

This involves tracking the metrics at the website itself. The website adds a piece of tracking code to its pages, and the metrics are collected and reported to the metrics company. A limitation of this approach is that website owners have to agree to add the tracking code to their sites, and many website owners are wary of putting third-party metrics code on their sites.
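Website-based collection often comes down to a snippet in each page that requests a tiny image (a "tracking pixel" or beacon) from the metrics company's server. The sketch below is a hypothetical beacon endpoint written as a Python WSGI application. It assumes the page passes its own path in a `page` query parameter and that visitors are recognised by a `vid` cookie; both names are illustrative, not any vendor's actual scheme.

```python
import uuid
from http.cookies import SimpleCookie
from urllib.parse import parse_qs

# 1x1 transparent GIF returned for every beacon request.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00"
         b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
         b"\x00\x02\x02D\x01\x00;")

HITS = []  # (visitor_id, page) tuples; a real service would use a database


def beacon_app(environ, start_response):
    """Record one pageview and hand back the pixel."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    page = query.get("page", ["unknown"])[0]

    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    headers = [("Content-Type", "image/gif"), ("Cache-Control", "no-cache")]
    if "vid" in cookie:
        visitor = cookie["vid"].value
    else:
        visitor = uuid.uuid4().hex  # first visit: issue a new visitor id
        headers.append(("Set-Cookie", f"vid={visitor}; Max-Age=31536000; Path=/"))

    HITS.append((visitor, page))
    start_response("200 OK", headers)
    return [PIXEL]


def summarize(hits):
    """Pageviews = every hit; unique visitors = distinct visitor ids."""
    return {"pageviews": len(hits), "unique_visitors": len({v for v, _ in hits})}
```

For a local test, `wsgiref.simple_server.make_server("", 8000, beacon_app).serve_forever()` serves the app, and a page would embed something like `<img src="http://localhost:8000/?page=/home">`. Note that the unique-visitor count depends entirely on cookies: a visitor who clears them is counted again as a new visitor.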

As we’ll see in a follow-up post, the web metrics companies use one of these methods, or a combination of them, for data collection.

The Internet is a medium that prides itself on being measurable and quantifiable, and chides TV and radio for falling short on exactly that count. Quite ironic then, isn’t it? Web metrics such as page views and unique visitors are still, at best, a ‘guesstimate’.

Microsoft to acquire 20 companies a year

October 19, 2007 · Posted in Tech tidbit · 1 Comment 

Microsoft’s own Web 2.0 efforts have yet to strike a chord with users, so it looks like the company intends to take the acquisition route to get new, interesting Web 2.0 companies and their technology. Speaking at the Web 2.0 conference in San Francisco, Microsoft CEO Steve Ballmer said the company will acquire 20 companies every year for the next five years, at prices ranging from $50 million to $1 billion.

A bit late to the acquisition game, don’t you think?

Tech tidbit

October 15, 2007 · Posted in Tech tidbit · 2 Comments 

Reebok India launches cricket website

Close on the heels of the Twenty20 craze, Reebok India has launched a cricket website, www.rbkcricket.in. The site offers trivia, games, player profiles, articles and live match simulation (I think courtesy of Krishcricket).

via DNA

WatchIndia announces partnership with Yash Raj Films

WatchIndia has entered into a strategic partnership with Yash Raj Films under which YRF movies will be available on WatchIndia on a pay-per-view basis. Later on, all YRF content will also be available for download. WatchIndia claims to get 1 million visitors per month.

via release

Web18 launches Hindi Portal Josh18

Web18, the web division of Network18, has launched Josh18, a Hindi portal for Indian youth. The portal offers news, forums, blogs, movies, sports and market updates.

via TS

Design and develop API for your website

October 12, 2007 · Posted in India Web 2.0, Tech tidbit · 2 Comments 

APIs have been around for quite some time now: Amazon, eBay, Yahoo and Google all provide one. APIs fueled the ‘mashup revolution’ a few years back. Lately, however, it’s all been about the Facebook platform. Since Facebook released its API, thousands of applications have popped up, generating traffic and buzz like never before.

In this multi-part post, I’ll try to address some of the details of designing and implementing your own API and the options available to you.

What are APIs?

For the benefit of the non-techies, let’s begin by defining what APIs are and the concept behind them. API stands for Application Programming Interface. An API is a set of functions that a program makes available to other programs, so that they can talk to it directly without access to its source code. To the application developer, the API provider is a black box: the developer has no insight into how the API is implemented, and knows only which functions or queries can be invoked and what results to expect.

[Image: diagram of an API (image from xml-rpc.com)]

So, what are some of the options for implementing your API?

There are 3 common options for implementing your API - SOAP, XML-RPC and REST.

1) SOAP

Simple Object Access Protocol (SOAP) is a protocol for sending XML messages over HTTP, HTTPS, JMS, SMTP and other transports. Its specification is maintained by the W3C. SOAP wraps each message in a SOAP ‘envelope’ and expects messages to be sent and received in a particular format. As a result, a SOAP toolkit is generally required to create the outgoing messages from the client (the application developer) to the API provider.

The pros of using SOAP are: (i) SOAP is not tied to HTTP (although SOAP over HTTP is the most popular combination), and (ii) it is easier to pass complex data types using SOAP.

However, detractors of SOAP argue that it adds unnecessary complexity to the API implementation. This complexity also adds overhead and increases message size.

Google has one of the most famous API implementations using SOAP.
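To show what the SOAP ‘envelope’ looks like, the following sketch builds a SOAP 1.1 request with Python's standard `xml.etree.ElementTree`. The envelope namespace is the standard SOAP 1.1 one; the `searchItems` method, the `urn:example-search` namespace and the parameters are hypothetical and do not correspond to Google's or any other real API. In practice a SOAP toolkit would generate this XML for you.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
API_NS = "urn:example-search"  # hypothetical namespace for the API's own elements


def build_soap_request(method, params):
    """Wrap a method call and its parameters in a SOAP Envelope/Body."""
    ET.register_namespace("soap", SOAP_ENV)
    ET.register_namespace("api", API_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{API_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{API_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


print(build_soap_request("searchItems", {"query": "web metrics", "maxResults": 10}))
```

Even this trivial call needs an envelope, a body and two namespaces, which is the overhead SOAP's detractors point to; in return, the structure leaves room for nested, typed parameters.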

2) XML-RPC

XML Remote Procedure Call (XML-RPC) is one of the oldest ways of invoking methods on remote servers. The specification is maintained at xmlrpc.com, and implementations are available for most popular languages. XML-RPC lets you invoke a remote procedure to change and/or retrieve data. At its core, an XML-RPC call is an HTTP POST with an XML request in the body; the response also comes back as an XML message in the body.

While XML-RPC is very simple to learn and implement, some developers find this simplicity severely limiting at times. For example, XML-RPC doesn’t provide a way to pass an object as an argument to a function.

The most famous XML-RPC API implementations are the blog ping services.
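Python's standard library includes an XML-RPC implementation, which makes it easy to see how little is involved. The sketch below uses `xmlrpc.client.dumps` and `xmlrpc.client.loads` to build and parse the XML that a blog ping carries in its HTTP POST body. The `weblogUpdates.ping` method, taking a blog name and a blog URL, follows the Weblogs.com ping convention; the blog name, blog URL and ping endpoint are placeholders.

```python
import xmlrpc.client

# Build the XML body of a ping request (placeholder blog name and URL).
request_xml = xmlrpc.client.dumps(
    ("My Example Blog", "http://blog.example.com/"),
    methodname="weblogUpdates.ping",
)
print(request_xml)

# The server side parses the same XML back into arguments and a method name.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)

# Sending it for real is a single call (hypothetical endpoint URL):
# xmlrpc.client.ServerProxy("http://ping.example.com/RPC2").weblogUpdates.ping(
#     "My Example Blog", "http://blog.example.com/")
```

The whole request is a `<methodCall>` element holding a method name and a list of typed parameters. The fixed set of value types (strings, numbers, booleans, dates, binary data, arrays and structs) is what keeps XML-RPC simple, and also what makes it feel limiting.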

3) REST

Representational State Transfer (REST) has emerged as one of the most popular mechanisms for implementing APIs. Among its benefits, REST is simple and closely mirrors the architecture of the Web itself. REST is not a protocol or a specification, but rather an architectural style. The concept revolves around resources, each of which is uniquely addressable, and all resources share the same constrained interface for state transfer. The Web itself is the classic example: URIs uniquely address each resource, while HTTP methods (GET, POST, PUT etc.) provide a common, constrained interface for transferring state to the client (the application developer).

In addition to these benefits, REST responses can be cached, REST is fast, and it requires no client-side toolkit.
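The resource-plus-uniform-interface idea can be illustrated without any web framework. The sketch below is a hypothetical in-memory ‘bookmarks’ API: every bookmark is a resource with its own URI, and the same HTTP methods act on all of them. The URIs, fields and status codes are illustrative choices, representations are JSON only for brevity (XML would work the same way), and in a real service the `handle` function would sit behind an HTTP server.

```python
import json

# In-memory store: resource URI -> resource state. Stands in for a database.
BOOKMARKS = {}
NEXT_ID = [1]  # mutable counter for new resource ids


def handle(method, uri, body=None):
    """Apply one HTTP-style request to the store; returns (status, JSON text)."""
    if uri == "/bookmarks":  # the collection resource
        if method == "GET":
            return 200, json.dumps(sorted(BOOKMARKS))  # URIs of all bookmarks
        if method == "POST":
            new_uri = f"/bookmarks/{NEXT_ID[0]}"
            NEXT_ID[0] += 1
            BOOKMARKS[new_uri] = json.loads(body)
            return 201, json.dumps({"uri": new_uri})
        return 405, json.dumps({"error": "method not allowed"})
    if uri not in BOOKMARKS:
        return 404, json.dumps({"error": "not found"})
    if method == "GET":
        return 200, json.dumps(BOOKMARKS[uri])
    if method == "PUT":
        BOOKMARKS[uri] = json.loads(body)  # replace the whole representation
        return 200, json.dumps(BOOKMARKS[uri])
    if method == "DELETE":
        del BOOKMARKS[uri]
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})


# Example: create a bookmark, then fetch it by its URI.
status, body = handle("POST", "/bookmarks", json.dumps({"url": "http://example.com/"}))
print(status, body)
print(handle("GET", json.loads(body)["uri"]))
```

Because GET never changes state here, a GET response for a given URI can safely be cached by browsers and proxies, which is where REST's cacheability comes from.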

Any non-RPC interface that returns XML over HTTP in response to HTTP GET requests is referred to as ‘Relaxed REST’. Most PHP applications implement this kind of ‘Relaxed REST’ API.

So that was a brief summary of the three options you have for implementing your API. Next, we’ll look into designing and implementing it. In the meantime, I’d like to hear your feedback on this post; it will give me a good idea of how to structure the next one.

HP enters retail photo printing in India

October 9, 2007 · Posted in India Web 2.0, Tech tidbit · Comment 

HP is entering the retail photo printing market in India. The company is tying up with channel partners and existing players to set up HP-branded outlets and tap into this market, which is estimated at around 3 billion prints per year. About 3000 outlets are planned over the next four years, according to Varadarajan Krishnan, GM of consumer sales for HP India’s imaging and printing group.

The retail presence will take the form of self-service and operated kiosks that will print on canvas, mugs, t-shirts etc. and accept credit card payments.

HP will complement the retail presence by launching the Snapfish service in India. Snapfish India will be available at www.snapfish.co.in, where users will be able to order prints online.

It’ll be interesting to see how existing players such as iTasveer and Picsquare, and recent entrants such as ZoomIn, react to this news. HP’s entry definitely heats up the market.

via ET

IAMAI pegs Indian e-commerce market at Rs 9000 crores by 2007-2008

October 8, 2007 · Posted in Stats and Numbers, Tech tidbit · 1 Comment 

In its latest research report, the Internet and Mobile Association of India (IAMAI) estimates that the e-commerce market in India will reach Rs 9210 crores by the end of 2007-2008. The estimate covers the following segments:

* Online travel portals and aggregators

* Online retailers and auctions

* Classifieds - comprising jobs, matrimony, real estate, autos

* Paid content subscription

* Digital downloads - for mobiles and PCs

Online travel was the leader, with a market size of Rs 7000 crores, followed by online retail (Rs 1105 crores), classifieds (Rs 820 crores), digital downloads (Rs 255 crores) and content subscription (Rs 30 crores).
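As a quick sanity check, the five segment figures do add up to the Rs 9210 crore total, and computing each segment's share shows just how dominant travel is:

```python
# Segment sizes from the IAMAI estimate, in Rs crores.
segments = {
    "Online travel": 7000,
    "Online retail and auctions": 1105,
    "Classifieds": 820,
    "Digital downloads": 255,
    "Paid content subscription": 30,
}

total = sum(segments.values())
print(f"Total: Rs {total} crores")
for name, size in segments.items():
    print(f"{name:28} {size:>5}  {100 * size / total:5.1f}%")
```

This prints a total of Rs 9210 crores, with travel at about 76.0% of the market, retail 12.0%, classifieds 8.9%, digital downloads 2.8% and content subscription 0.3%.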

Major drivers of e-commerce, in decreasing order of priority:

1) Saves time and effort

2) Convenience

3) Access to wide variety

4) Good deals

5) Detailed product info / research

6) Easy comparison between products

Leading barriers to e-commerce are:

1) Unreliable product quality

2) No bargaining (in other words, the classic Indian mentality: you have to get your money’s worth)

3) Online security apprehension

4) Not a tangible experience

5) Not enough discounts (wow, I didn’t see this coming)

6) Wait time to delivery

The entire report is available as a PDF. I’ll dissect it more closely later this week.

Indian Apple store opening ahead of Diwali

October 8, 2007 · Posted in Tech tidbit · 2 Comments 

News that Reliance Retail would bring the Apple store to India had been in the air for a while.

The deal is now done, and the first store, to be called iStore, will launch in Bangalore by the end of October, just ahead of Diwali. By year end, 10 iStores are likely to be launched across India.

Any ideas where this store is coming up in Bangalore?

via ET

CBI wants to monitor Internet gateways in India

October 7, 2007 · Posted in Tech tidbit · Comment 

First keyloggers, then the liability of service providers for obscene content, and now the CBI wants to monitor the Internet and Internet gateways. The goal is to monitor VoIP data, which the CBI sees as a threat to national security.

The great Indian telecom license frenzy

September 27, 2007 · Posted in India broadband, Tech tidbit · Comment 

Back in August, TRAI decided not to put a cap on the number of telecom operators in India. And this has triggered a frenzy in getting telecom licenses.

Day by day, it looks like every big Indian firm that is sitting on a pile of extra cash is applying for a telecom license. There are the real estate companies (DLF, Parsvnath, IndiaBulls Real Estate Ltd., Unitech Ltd.), and now, with Russian company Sistema acquiring Shyam Telelink, even foreign players are jumping in.

While some of the license applicants may be serious about their telecom foray, I think the majority are aiming to get a piece of the pie and then sell out at a premium later.

DoT has already announced that it will not accept new applications for mobile licenses starting Oct 1; at this rate, a similar announcement should soon follow for the rest of the telecom sector.

Content sab ka baap hai (content is king)

August 14, 2007 · Posted in Tech tidbit · 2 Comments 

Well, that’s exactly what a study released by the Online Publishers Association reveals: online content is king. Internet users spend almost half their time online viewing news, entertainment and other content, overshadowing other online activities like email, search and shopping.

The four-year study, conducted by Nielsen/NetRatings, recorded a 37% increase in time spent on content sites, which include online video.

Online content accounted for 47% of the time spent online in 2007 (34% in 2003). Web search accounted for 5% of the time spent online in 2007 (3% in 2003). Commerce and shopping sites accounted for 15% in 2007 (20% in 2003), while email and other communications accounted for 33% in 2007 (46% in 2003).
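Laying the reported shares side by side makes the shift easier to read. The snippet below simply tabulates the figures quoted above and the change in percentage points for each category:

```python
# Share of time spent online, in percent, as reported for 2003 and 2007.
share = {
    "Content": (34, 47),
    "Search": (3, 5),
    "Commerce and shopping": (20, 15),
    "Email and communications": (46, 33),
}

for name, (y2003, y2007) in share.items():
    change = y2007 - y2003
    print(f"{name:25} {y2003:>3}% -> {y2007:>3}%  ({change:+d} points)")

print("Totals:", sum(a for a, _ in share.values()), "/", sum(b for _, b in share.values()))
```

Content gained 13 percentage points, almost exactly the 13 points that email lost. These are changes in share; the 37% figure refers to growth in the time spent on content sites, which is a different measure. Note also that the 2003 shares as reported add up to 103% rather than 100%, so the older figures are best read as approximate.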

via Reuters 
