DecisionStats
http://www.decisionstats.com
Better Decisions = Faster Stats

Happy $100 Billion to Mark Zuckerberg Productions!
http://www.decisionstats.com/happy-100-billion-to-mark-zuckerberg-productions/
Tue, 15 May 2012 08:00:14 +0000, Ajay Ohri

Here’s to an expected $100 billion market valuation for the latest Silicon Valley legend, Facebook- a Mark Zuckerberg Production.

Some milestones that made FB what it is-

1) Beating up MySpace, Ibibo, Google Orkut combined

2) Smart, timely acquisitions, from FriendFeed to Instagram

3) Superb infrastructure for 900 million accounts, fast interface rollouts, and a policy of never deleting data. Some of this involved creating new technology like Cassandra. There have been no anti-trust complaints against FB’s behavior, particularly as it simply stuck to being the cleanest interface offering a social network.

4) Much envied and copied features like Newsfeed, App development on the FB platform, Social Gaming as revenue streams

5) Replacing Google as the hot techie employer, just like Google did to Microsoft.

6) An uncanny focus, including walking away from a billion dollars from Yahoo, resisting Google and Apple’s Ping, imposing design changes unilaterally, and implementing data sharing only with flexible partners and strategic investors (like Bing)

FB has made more money for more people than any other company in the past ten years. Here’s wishing it an even more interesting next ten years! With 900 million users, if they could integrate a PayPal-like system or create an alternative to Adsense for content creators, they could create an all-new internet economy, one which is more open than the Google-dominated internet.

 

BigML meets R #rstats
http://www.decisionstats.com/bigml-meets-r-rstats/
Fri, 11 May 2012 10:53:35 +0000, Ajay Ohri

I am just checking out the nice new R package created by BigML.com co-founder Justin Donaldson. The package is named bigml, which can be a bit confusing since R already has many packages with "big" in the name (including biglm).

The bigml package is available at CRAN http://cran.r-project.org/web/packages/bigml/index.html

I just tweaked the code given at http://blog.bigml.com/2012/05/10/r-you-ready-for-bigml/ to include the ssl authentication code at http://www.brocktibert.com/blog/2012/01/19/358/

So it goes:

> library(bigml)
Loading required package: RJSONIO
Loading required package: RCurl
Loading required package: bitops
Loading required package: plyr
> setCredentials("bigml_username", "API_key")

# download the CA certificate bundle used for SSL authentication
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

# set the curl options
# cainfo (not capath) is the option that points at a CA bundle file;
# here it uses the bundle shipped with RCurl
curl <- getCurlHandle()
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem",
package = "RCurl"),
# FALSE skips certificate verification; set to TRUE to actually use the bundle
ssl.verifypeer = FALSE))
# only needed behind a proxy; replace 'proxyserver:port' with your own
curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)

> iris.model = quickModel(iris, objective_field = "Species")

Of course there are lots of goodies added here, so read the post yourself at http://blog.bigml.com/2012/05/10/r-you-ready-for-bigml/

Incidentally, the author of this R package (bigml), Justin Donaldson, who goes by the name sudojudo at http://twitter.com/#!/sudojudo, has also recently authored two other R packages: tsne at http://cran.r-project.org/web/packages/tsne/index.html (tsne: T-distributed Stochastic Neighbor Embedding for R (t-SNE), a “pure R” implementation of the t-SNE algorithm) and a GUI toolbar, sculpt3d, at http://cran.r-project.org/web/packages/sculpt3d/index.html (sculpt3d is a GTK+ toolbar that allows for more interactive control of a dataset inside the RGL plot window. Controls for simple brushing, highlighting, labeling, and mouseMode changes are provided by point-and-click rather than through the R terminal interface).

This, along with the fact that their recently released Python bindings for bigml.com were among the top news items at Hacker News, shows BigML.com is gaining some traction in bringing cloud computing, better software interfaces and data mining together!

Interview BigML.com
http://www.decisionstats.com/interview-bigml-com/
Tue, 01 May 2012 05:11:52 +0000, Ajay Ohri

Here is an interview with Charlie Parker, head of large scale online algorithms at http://bigml.com

Ajay- Describe your own personal background in scientific computing, and how you came to be involved with machine learning, cloud computing and BigML.com.

Charlie- I am a machine learning Ph.D. from Oregon State University. Francisco Martin (our founder and CEO), Adam Ashenfelter (the lead developer on the tree algorithm), and I were all studying machine learning at OSU around the same time. We all went our separate ways after that.

Francisco started Strands and turned it into a 100+ million dollar company building recommender systems. Adam worked for CleverSet, a probabilistic modeling company that was eventually sold to Cisco, I believe. I worked for several years in the research labs at Eastman Kodak on data mining, text analysis, and computer vision.

When Francisco left Strands to start BigML, he brought in Justin Donaldson, who is a brilliant visualization guy from Indiana, and an ex-Googler named Jose Ortega, who is responsible for most of our data infrastructure. They pulled in Adam and me a few months later. We also have Poul Petersen, a former Strands employee, who manages our herd of servers. He is a wizard and makes everyone else’s life much easier.

Ajay- You use Clojure for the back end of BigML.com. Are there any other languages and packages you are considering? What makes Clojure such a good fit for cloud computing?

Charlie- Clojure is a great language because it offers you all of the benefits of Java (extensive libraries, cross-platform compatibility, easy integration with things like Hadoop, etc.) but has the syntactical elegance of a functional language. This makes our code base small and easy to read as well as powerful.

We’ve had occasional issues with speed, but that just means writing the occasional function or library in Java. As we build towards processing data at the Terabyte level, we’re hoping to create a framework that is language-agnostic to some extent. So if we have some great machine learning code in C, for example, we’ll use Clojure to tie everything together, but the code that does the heavy lifting will still be in C. For the API and Web layers, we use Python and Django, and Justin is a huge fan of HaXe for our visualizations.

Ajay- Current support is for Decision Trees. When can we see SVM, K-Means Clustering and Logit Regression?

Charlie- Right now we’re focused on perfecting our infrastructure and giving you new ways to put data in the system, but expect to see more algorithms appearing in the next few months. We want to make sure they are as beautiful and easy to use as the trees are. Without giving too much away, the first new thing we will probably introduce is an ensemble method of some sort (such as Boosting or Bagging). Clustering is a little further away but we’ll get there soon!

Ajay- How can we use the BigML.com API using R and Python?

Charlie- We have a public github repo for the language bindings: https://github.com/bigmlcom/io Right now, there are only bash scripts, but that should change very soon. The Python bindings should be there in a matter of days, and the R bindings in probably a week or two. Clojure and Java bindings should follow shortly after that. We’ll have a blog post about it each time we release a new language binding. http://blog.bigml.com/

Ajay- How can we predict large numbers of observations using a model that has been built and pruned (model scoring)?

Charlie- We are in the process of refactoring our backend right now for better support for batch prediction and model evaluation. This is something that is probably only a few weeks away. Keep your eye on our blog for updates!

Ajay- How can we export models built in BigML.com for scoring data locally?

Charlie- This is as simple as a call to our API. https://bigml.com/developers/models The call gives you a JSON object representing the tree that is roughly equivalent to a PMML-style representation.
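As a rough illustration of what local scoring from such a JSON tree could look like, here is a minimal Python sketch. The JSON layout (the keys field, threshold, left, right and prediction) and the split values are assumptions made up for this example, not the actual BigML model format, so check https://bigml.com/developers/models for the real structure.

```python
import json

# Hypothetical tree layout, invented for illustration only.
# The real BigML model JSON may use different keys and nesting.
TREE_JSON = """
{"field": "petal length", "threshold": 2.45,
 "left": {"prediction": "setosa"},
 "right": {"field": "petal width", "threshold": 1.75,
           "left": {"prediction": "versicolor"},
           "right": {"prediction": "virginica"}}}
"""


def predict(node, row):
    """Walk the tree from the root until a leaf with a prediction is reached."""
    while "prediction" not in node:
        # Go left when the row's value is at or below the split threshold.
        branch = "left" if row[node["field"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["prediction"]


def score_batch(tree, rows):
    """Score many observations locally, one prediction per row."""
    return [predict(tree, row) for row in rows]


tree = json.loads(TREE_JSON)
rows = [
    {"petal length": 1.4, "petal width": 0.2},
    {"petal length": 4.5, "petal width": 1.3},
    {"petal length": 5.8, "petal width": 2.2},
]
predictions = score_batch(tree, rows)
```

Once the tree is downloaded, scoring is just a dictionary walk, so large batches can be scored without any further API calls.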

About-

You can read about Charlie Parker at http://www.linkedin.com/pub/charles-parker/11/85b/4b5 and the rest of the BigML team at https://bigml.com/team

 

Converting SAS language code to Java
http://www.decisionstats.com/converting-sas-language-code-to-jave/
Fri, 27 Apr 2012 10:48:46 +0000, Ajay Ohri

http://dullesopen.com/ has a nifty application to demonstrate its product Carolina, which is said to run much faster because it converts SAS language code to Java code. That would make it well suited to Big Data applications and to migrating legacy SAS code to any in-database method that can invoke Java.

While the website has mostly not been updated in 3-4 years, Dulles Research and Dulles Open were ahead of even WPS in offering an alternative compiler for the SAS language.

From http://dullesopen.com/company/history

DullesOpen.com presents enterprise IT managers with Carolina™, the first software utility to convert Base SAS® language programs to industry standard Java. By parsing Base SAS code and building an alternative Base SAS syntax tree, DullesOpen.com created, in effect, the industry’s first compiler for the dominant enterprise data scoring programming language.

Carolina replaces the need for customers to license, in full or in part, many of the following products:  Base SAS, SAS Access®, SAS Connect®, SAS Integration Technologies®, SAS BI Server, and other SAS technologies.

Carolina enables SAS users to achieve levels of integration and performance not possible with native SAS.  Examples include: 100% in-database processing support for SAS (not possible with SAS); web application integration; Business Rules Management Systems integration; third-party application integration; MXG® support with no underlying SAS license; SAS program development, via the forthcoming Carolina IDE™ desktop client.

And finally, a look at the online SAS code to Java code converter at http://demo.dullesopen.com/carolina/servlet/

 

 

SAS Institute missed out on the big money in web analytics, and if they want to stay pertinent in big data analytics, they need to do more than rely on Teradata, and maybe acquire smaller companies with interesting technologies.

(Or at least we need to have more online converters from one language to another!)

Avengers Review
http://www.decisionstats.com/avengers-review/
Fri, 27 Apr 2012 09:27:20 +0000, Ajay Ohri

Avengers is the big-ticket blockbuster that heralds summer just like the groundhog denotes spring. With an ensemble cast (of superheroes and okay actors), it stars Hulk (angry green man aka Dr Bruce Banner / Mark Ruffalo), Iron Man (genius billionaire philanthropist playboy aka Tony Stark / Robert Downey Jr), Thor (an Australian-looking Chris Hemsworth), Loki (God of Mischief, played by German-looking Tom Hiddleston), Captain America, Scarlett Johansson, Jeremy “Hurt Locker” Renner and Samuel L Jackson. You know something’s gotta give if the list of A-list stars(?) in the cast is going to be longer than the plot summary.

Well, Loki the bad guy strikes a deal with some other bad guys from a funnily named world called Assguard (parallel universe!) and tries to find a cube (which is all-powerful energy, like the Transformers 1 Cube) and in return gets an army from the dark side (who look just like Cybertrons and Lord of the Rings orcs combined).

The Avengers, after much dilly-dallying, trying to emote, creating bromances and building up tension, in the end decide to give you what you came looking for- a visual feast of credible-looking CGI to counter the bad guys. The scene stealer is the Hulk. He is kind of cute for a big green guy; if you don’t know what I mean, see the movie!

This is American cinema at its most profoundly intellectual since the Die Hard series, and it’s quite entertaining, especially if you are a geeky comic book fan-boy (like me).

Summer is here and so are the super-heroes!! Unleash the popcorn.

 

Software Review- Google Drive versus Dropbox
http://www.decisionstats.com/software-review-google-drive-versus-dropbox/
Wed, 25 Apr 2012 08:05:15 +0000, Ajay Ohri

Here are some notes from reviewing Google Drive https://drive.google.com/ vs Dropbox https://www.dropbox.com/.

1) Google Drive gives more free space upfront than Dropbox: 5 GB versus 2 GB.

2) Dropbox has a referral system (500 MB per referral), while there is no referral system for Google Drive.

3) The sync facility with Google Docs makes Google Drive especially useful for prior users of Google Docs.

4) API access to Google Drive is only for Chrome apps which is intriguing!

https://developers.google.com/drive/apps_overview

Apps will not have any API access to files unless users have first installed the app in Chrome Web Store.

You can use the Dropbox API much more easily -

See the platforms at

https://www.dropbox.com/developers/start/core

Choose your platform:

iOS Android Python Ruby

But I wonder: if you set the R working directory to the local shared folder for Google Drive, it should sync up as well, though of course more slowly (see http://scrogster.wordpress.com/2011/01/29/using-dropbox-with-r-2/ for the Dropbox equivalent).

5) The Google Drive icon is ugly (seriously, dude!), but the features in the Windows app are just the same as in the Dropbox app. Too similar ;)

 

6) Upgrade space is much cheaper on Google Drive than on Dropbox (Google Drive prices are exactly a quarter of Dropbox prices, and the maximum storage is 16 times as much). This will affect power storage users. I expect to see some slowdown in Dropbox’s new business unless Google Drive has an outage (like Gmail). Existing Dropbox users probably won’t shift for the small dollar amount, though it is quite easy to do so:

Install Google Drive on your local workstation and cut and paste your Dropbox local folder to the Google Drive local folder!

7) Dropbox deserves credit for being first (like Hotmail and AOL), but Google Drive is better in almost all respects!

Google Drive

Free: 5 GB of Drive, 10 GB of Gmail, 1 GB of Picasa

Upgrade:

25 GB: $2.49/month (+25 GB for Drive and Picasa; Gmail storage upgraded to 25 GB)
100 GB: $4.99/month (+100 GB for Drive and Picasa; Gmail storage upgraded to 25 GB)

Need more storage? Up to 16 TB available.

Dropbox

Free: up to 18 GB (2 GB + 500 MB per referral)
Pro 50: 50 GB, $9.99/month or $99.00/year (+1 GB per referral, up to +32 GB)
Pro 100: 100 GB, $19.99/month or $199.00/year (+1 GB per referral, up to +32 GB)
Teams: plans starting at 1 TB (large shared quota, centralized admin and billing, and more)

 

 

 

Software Review- BigML.com – Machine Learning meets the Cloud
http://www.decisionstats.com/software-review-bigml-com-machine-learning-meets-the-cloud/
Mon, 23 Apr 2012 15:22:26 +0000, Ajay Ohri

I had a chance to dekko the new startup BigML https://bigml.com/ and was suitably impressed by the briefing and my own puttering around the site. Here is my review-

1) The website is very intuitively designed. You can create a dataset from an uploaded file in one click, and you can create a Decision Tree model in one click as well. I wish other cloud computing websites like Google Prediction API made their design so intuitive and easy to understand. Also, unlike Google Prediction API, the models are not black-box models, but have a description which can be understood.

2) It includes some well-known data sources for people trying it out. They were kind enough to offer 5 invite codes for readers of Decisionstats (if you want to check it out yourself, use the codes below the post; note they are one-time only, so the first five get the invites).

BigML is still invite-only but plans to get into open release soon.

3) Data sources can only be created by uploading files (csv), but they plan to change this, hopefully to get data from buckets (S3? or Google?) and from URLs.

4) The one-click operation to convert a data source into a dataset shows a histogram (distribution) of individual variables. The back end is Clojure because, as the team explained, it made the most sense and fit well with Java. The good news (?) is you would never see the Clojure code at the back end. You can read about it at http://clojure.org/

As cloud computing takes off (someday) I expect Clojure’s popularity to take off as well.

Clojure is a dynamic programming language that targets the Java Virtual Machine (and the CLR, and JavaScript). It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming. Clojure is a compiled language – it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection.

Clojure is a dialect of Lisp

 

5) As of now, decision trees are the only algorithm offered, but they expect to roll out other machine learning stuff soon. Hopefully this includes regression (logit and linear) and k-means clustering. The trees are created and pruned in real time, which gives a slightly animated (and impressive) effect. And yes, model building is a one-click operation.

The real-time live pruning is really impressive, and I wonder whether and how it could ever be replicated in desktop software, because of its sheer interactive nature.

Making the model is just half the work. Creating predictions and scoring the model is what is really the money-earner. It is one click, and customization is quite intuitive. It is not quite PMML compliant yet, so I hope some Zementis-like functionality can be added so that large numbers of models can be applied to make predictions or score data in real time.

 

If you are a developer/data hacker, you should check out this section too- it is quite impressive that the designers of BigML have planned for API access so early.

https://bigml.com/developers

BigML.io gives you:

  • Secure programmatic access to all your BigML resources.
  • Fully white-box access to your datasets and models.
  • Asynchronous creation of datasets and models.
  • Near real-time predictions.

 

Note: For your convenience, some of the snippets below include your real username and API key.

Please keep them secret.

REST API

BigML.io conforms to the design principles of Representational State Transfer (REST). BigML.io is entirely HTTP-based.

BigML.io gives you access to four basic resources: Source, Dataset, Model and Prediction. You can create, read, update, and delete resources using the respective standard HTTP methods: POST, GET, PUT and DELETE.

All communication with BigML.io is JSON formatted except for source creation. Source creation is handled with a HTTP PUT using the “multipart/form-data” content-type

HTTPS

All access to BigML.io must be performed over HTTPS

and https://bigml.com/developers/quick_start (I think an R package which uses JSON and RCurl would further help in enhancing ease of use).
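To make the create/read/update/delete mapping quoted above concrete, here is a small Python sketch that only builds the requests and never sends them. The base URL, the path layout, the example ids and the build_request helper are all assumptions for illustration; authentication is left out, and source creation (which the docs say is not JSON) is not covered. The real endpoints are documented at https://bigml.com/developers

```python
import json
import urllib.request

# Assumption: base URL and path layout are illustrative only;
# the real endpoints and authentication scheme are in the BigML docs.
BASE_URL = "https://bigml.io"

# CRUD operations mapped to the standard HTTP methods named in the docs.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}

# The four basic resources named in the docs.
RESOURCES = ("source", "dataset", "model", "prediction")


def build_request(operation, resource, resource_id=None, payload=None):
    """Build (but do not send) an HTTPS request for one CRUD operation."""
    if resource not in RESOURCES:
        raise ValueError("unknown resource: " + resource)
    method = CRUD_TO_HTTP[operation]
    url = BASE_URL + "/" + resource
    if resource_id is not None:
        url += "/" + resource_id
    data = None
    headers = {}
    if payload is not None:
        # JSON-formatted body, as the docs describe for most calls.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Hypothetical ids, used only to show the shape of the calls.
create_req = build_request("create", "dataset", payload={"source": "source/abc123"})
read_req = build_request("read", "model", resource_id="xyz789")
```

Keeping the request construction separate from sending makes it easy to inspect the method and URL before any network traffic happens.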

 

Summary-

Overall, a welcome addition that makes software in the realm of cloud computing and statistical computation/business analytics both easy to use and easy to deploy, with fail-safe mechanisms built in.

Check out https://bigml.com/ for yourself to see.

The invite codes are here- one-time use only, first five get the invites- so click and try your luck at machine learning on the cloud.

If you don’t get an invite (or it is already used), just leave your email there and wait a couple of days to get approval.

  1. https://bigml.com/accounts/register/?code=E1FE7
  2. https://bigml.com/accounts/register/?code=09991
  3. https://bigml.com/accounts/register/?code=5367D
  4. https://bigml.com/accounts/register/?code=76EEF
  5. https://bigml.com/accounts/register/?code=742FD

Oracle R Updated!
http://www.decisionstats.com/oracle-r-updated/
Fri, 20 Apr 2012 17:27:34 +0000, Ajay Ohri

Interesting message from https://blogs.oracle.com/R/, the latest R blog:

 


Oracle just released the latest update to Oracle R Enterprise, version 1.1. This release includes the Oracle R Distribution (based on open source R, version 2.13.2), an improved server installation, and much more.  The key new features include:

  • Extended Server Support: New support for Windows 32 and 64-bit server components, as well as continuing support for Linux 64-bit server components
  • Improved Installation: Linux 64-bit server installation now provides robust status updates and prerequisite checks
  • Performance Improvements: Improved performance for embedded R script execution calculations

In addition, the updated ROracle package, which is used with Oracle R Enterprise, now reads date data by conversion to character strings.

We encourage you to download Oracle software for evaluation from the Oracle Technology Network. See these links for R-related software: Oracle R Distribution, Oracle R Enterprise, ROracle, Oracle R Connector for Hadoop. As always, we welcome comments and questions on the Oracle R Forum.

 

 

Oracle R Distribution 2-13.2 Update Available

Oracle has released an update to the Oracle R Distribution, an Oracle-supported distribution of open source R. Oracle R Distribution 2-13.2 now contains the ability to dynamically link the following libraries on both Windows and Linux:

  • The Intel Math Kernel Library (MKL) on Intel chips
  • The AMD Core Math Library (ACML) on AMD chips

 

To take advantage of the performance enhancements provided by Intel MKL or AMD ACML in Oracle R Distribution, simply add the MKL or ACML shared library directory to the LD_LIBRARY_PATH system environment variable. This automatically enables MKL or ACML to make use of all available processors, vastly speeding up linear algebra computations and eliminating the need to recompile R.  Even on a single core, the optimized algorithms in the Intel MKL libraries are faster than using R’s standard BLAS library.

Open-source R is linked to NetLib’s BLAS libraries, but they are not multi-threaded and only use one core. While R’s internal BLAS are efficient for most computations, it’s possible to recompile R to link to a different, multi-threaded BLAS library to improve performance on eligible calculations. Compiling and linking to R yourself can be involved, but for many, the significantly improved calculation speed justifies the effort. Oracle R Distribution notably simplifies the process of using external math libraries by enabling R to auto-load MKL or ACML. For R commands that don’t link to BLAS code, taking advantage of database parallelism using embedded R execution in Oracle R Enterprise is the route to improved performance.

For more information about rebuilding R with different BLAS libraries, see the linear algebra section in the R Installation and Administration manual. As always, the Oracle R Distribution is available as a free download to anyone. Questions and comments are welcome on the Oracle R Forum.

New Economics Theories for the new Tech World
http://www.decisionstats.com/new-economics-theories-for-the-new-tech-world/
Tue, 17 Apr 2012 09:30:26 +0000, Ajay Ohri

When I was doing my MBA (a decade ago), one of the principal theories on why corporations exist was 1) Shareholder Value creation (grow wealth for investors) and a notable second was 2) Stakeholder Value creation- creating jobs for societies, providing tax to countries, providing employees with stable employment and incentives, and of course creating monetary value for shareholders.

There were two ways you could raise money- debt or equity. Debt had the advantage of interest payments being tax deductible. Debt payments had to be met regularly. Equity had the advantage that equity holders were the last ones to be paid in case of closing the company down, which justified that rate of return on equity is generally higher than cost of debt.  Dividend payouts to stockholders could be deferred in a low revenue year or due to planning reasons.

Or in plain English, over the long term, raising money from shareholders in exchange for stock was more expensive than selling bonds or borrowing from the banks.
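The debt-versus-equity trade-off above is usually summarized in textbooks as the weighted average cost of capital (WACC), where the tax deductibility of interest lowers the effective cost of debt. Here is a minimal Python sketch of that standard formula; the capital amounts, rates and tax rate are made-up illustrative numbers, not figures from any company.

```python
def wacc(equity, debt, cost_of_equity, cost_of_debt, tax_rate):
    """Weighted average cost of capital.

    WACC = E/V * Re + D/V * Rd * (1 - T), where V = E + D.
    Interest is tax deductible, so debt is weighted at its after-tax cost.
    """
    total = equity + debt
    after_tax_cost_of_debt = cost_of_debt * (1 - tax_rate)
    return (equity / total) * cost_of_equity + (debt / total) * after_tax_cost_of_debt


# Illustrative assumptions: 600 of equity at 12%, 400 of debt at 6%, 30% tax.
blended = wacc(equity=600, debt=400, cost_of_equity=0.12, cost_of_debt=0.06, tax_rate=0.30)

# The same firm financed entirely by equity pays the full cost of equity.
all_equity = wacc(equity=1000, debt=0, cost_of_equity=0.12, cost_of_debt=0.06, tax_rate=0.30)
```

With these inputs the blended cost comes out below the all-equity cost, which is the point of the paragraph above: cheaper, tax-shielded debt pulls the average down.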

Hybrid combinations of debt and equity were warrants and debentures that started off as one form of instrument and over a period of time gave much more flexibility and risk safety nets to both issuers and subscribers of capital. Another hybrid was stock options (now considered as a default option of rewarding employees in technology companies, but this was not always the case).

The use of call and put options in debentures, and the idea of a vesting period in stock options, was meant to promote long-term stability and minimize fluctuations in stock prices and employee attrition, besides of course minimizing the weighted average cost of capital. Venture capital was another class of capital known for both huge rates of return and risk taking (?)

But in today’s world, where a Google has three classes of shares, companies trade shares before IPOs, and valuations of technology companies sink and rise by huge percentages over weeks (especially as they near IPO dates), I wonder if traditional theories in finance need a much stronger overhaul.

Or do markets need a regulatory overhaul that would enable stock exchanges to once more have the credibility they had as the primary sources of raising capital?

 

Who will guard the guardians? Their conscience- the regulators or the news media?

There are ways of raising money that are not evil.

But they are not perfectly fair either.

Easter Eggs in #Rstats
http://www.decisionstats.com/easter-eggs-in-rstats/
Fri, 13 Apr 2012 17:52:45 +0000, Ajay Ohri

Yes.

Cite-http://en.wikipedia.org/wiki/Easter_egg_(media)

A virtual Easter egg is an intentional hidden message, in-joke, or feature in a work such as a computer program, web page, video game, movie, book, or crossword. The term was coined — according to Warren Robinett — by Atari after they were pointed to the secret message left by Robinett in the game Adventure.[1] It draws a parallel with the custom of the Easter egg hunt observed in many Western nations as well as the last Russian imperial family’s tradition of giving elaborately jeweled egg-shaped creations by Carl Fabergé which contained hidden surprises.

In R.

Cite-http://stackoverflow.com/questions/7910270/are-there-any-easter-eggs-in-base-r-or-in-major-packages

I like this

just type

example(readline)

and these two

on 32 bit R type

memory.limit(4096)

and on any version try four question marks

Perhaps the prettiest eggs are the demos in animation package.

But there is magic in asking for help on internal functions in R

Just type-

?.Internal

and you get the sobering thought that you probably are an R Muggle

Call an Internal Function

Description

.Internal performs a call to an internal code which is built in to the R interpreter.

Only true R wizards should even consider using this function, and only R developers can add to the list of internal functions.

Usage

 .Internal(call)

Arguments

call a call expression

See Also

.Primitive, .External (the nearest equivalent available to users).

I liked that I could see the actual internal functions in svn at http://svn.r-project.org/R/trunk/src/main/names.c

The opening of the internals document floored me.

It must have been a curious year in 2003-4 when the copyright of R was held (briefly it seems) by the R Foundation and also by the R Development Core Team. (which sounds better?)

*  R : A Computer Language for Statistical Data Analysis
 *  Copyright (C) 1995, 1996  Robert Gentleman and Ross Ihaka
 *  Copyright (C) 1997--2012  The R Development Core Team
 *  Copyright (C) 2003, 2004  The R Foundation

My contribution

R help discourages for loops

Try ??for or ?for

and you go into a loop till you hit Escape

If you want more, just write
 .Internal(inspect(ls())) at the end of your R program.

 

 

 

 

 

 

Facebook Search- The fall of the machines
http://www.decisionstats.com/facebook-search-the-fall-of-the-machines/
Thu, 12 Apr 2012 15:23:24 +0000, Ajay Ohri

I am increasingly searching on Facebook, for the following reasons-

1) Facebook is walled off to Google (mostly). Within Facebook, I get both people results and content results (from Bing).

Bing is an okay alternative, though not as fast as Google Instant.

2) Cleaner web results. When Facebook increases the number of results from the top 3 links to, say, the top 10 links, there should be more outbound traffic from FB search to websites. For some reason Google continues to show 14 pages of results. Why? Why not limit it to just one page?

3) Better people search than Pipl and Google. But not much (or any) image search. This is curious, and I am hoping that Instagram results will be added to search results.

4) I am hoping for some company, Facebook or Microsoft, to challenge Adsense. Adwords already has rivals. Adsense is a de facto monopoly, and my experience in advertising shows that content creators could make much more money from a better Adsense, especially if Adsense and Adwords did not have a conflict of interest from serving the same advertisers.

Adwords should have been a special case of Adsense for Google.com but it is not.

5) Machine learning can only get you from tau to delta tau, when ad click behavior is inherently dependent on humans, who behave according to chaotic or genetic models more than linear CPC models. I find FB has an inherent advantage in the quantity and quality of data collected on people’s behavior rather than click behavior. They are also more aggressive and less apologetic about behaviorally targeted ads.

Additional point- Google Analytics is not as rich as the analytics from Facebook pages in terms of demographic variables. This can be tested by anyone.

 


]]>
http://www.decisionstats.com/facebook-search-the-fall-of-the-machines/feed/ 0
Play Color Cipher and Visual Cryptography http://www.decisionstats.com/play-color-cipher-and-visual-cryptography/ http://www.decisionstats.com/play-color-cipher-and-visual-cryptography/#comments Mon, 09 Apr 2012 07:03:18 +0000 Ajay Ohri http://www.decisionstats.com/?p=10006 I was just reading up on my weekly to-read list and came across this interesting method. It is called Play Color Cipher-

 

Each Character ( Capital, Small letters, Numbers (0-9), Symbols on the keyboard ) in the plain text is substituted with a color block from the available 18 Decillions of colors in the world [11][12][13] and at the receiving end the cipher text block (in color) is decrypted in to plain text block. It overcomes the problems like “Meet in the middle attack, Birthday attack and Brute force attacks [1]”.
It also reduces the size of the plain text when it is encrypted in to cipher text by 4 times, with out any loss of content. Cipher text occupies very less buffer space; hence transmitting through channel is very fast. With this the transportation cost through channel comes down.

Reference-

http://www.ijcaonline.org/journal/number28/pxc387832.pdf
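
To make the substitution idea concrete, here is a toy sketch in R. It is not the scheme from the paper- it only shows one character being swapped for one color and back again. The function names (make_color_table, encrypt_colors, decrypt_colors) and the use of an integer seed as the shared key are my own assumptions for illustration, and shuffling with set.seed() is not cryptographically secure.

```r
# Toy color substitution cipher (illustrative only, NOT the published scheme).
# Assumption: sender and receiver share an integer key used as an RNG seed.
make_color_table <- function(key) {
  set.seed(key)
  # The 95 printable ASCII characters (codes 32 to 126)
  chars <- strsplit(rawToChar(as.raw(32:126)), "")[[1]]
  # Draw 95 distinct 24-bit colors and write them as hex strings
  colors <- sprintf("#%06X", sample.int(16777216L, length(chars)) - 1L)
  setNames(colors, chars)
}

encrypt_colors <- function(text, key) {
  color_table <- make_color_table(key)
  # Look up the color assigned to each character of the plain text
  unname(color_table[strsplit(text, "")[[1]]])
}

decrypt_colors <- function(colors, key) {
  color_table <- make_color_table(key)
  # Reverse lookup: find which character owns each color
  paste(names(color_table)[match(colors, color_table)], collapse = "")
}

cipher <- encrypt_colors("Hello, R 2012!", key = 42)
cipher
decrypt_colors(cipher, key = 42)
```

Each character becomes one hex color, so the cipher text could be drawn as a strip of color blocks with image() or rect().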

Visual Cryptography is indeed an interesting topic-

Visual cryptography, an emerging cryptography technology, uses the characteristics of human vision to decrypt encrypted
images. It needs neither cryptography knowledge nor complex computation. For security concerns, it also ensures that hackers
cannot perceive any clues about a secret image from individual cover images. Since Naor and Shamir proposed the basic
model of visual cryptography, researchers have published many related studies.

Visual cryptography (VC) schemes hide the secret image into two or more images which are called
shares. The secret image can be recovered simply by stacking the shares together without any complex
computation involved. The shares are very safe because separately they reveal nothing about the secret image.

Visual Cryptography provides one of the secure ways to transfer images on the Internet. The advantage
of visual cryptography is that it exploits human eyes to decrypt secret images .
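
The share-and-stack idea described above can be sketched in a few lines of base R. This is a minimal (2,2) scheme in the spirit of Naor and Shamir, assuming a black and white secret stored as a 0/1 matrix (1 = black) and each pixel expanded into two side-by-side sub-pixels. The function names make_shares and stack_shares are hypothetical.

```r
# Minimal (2,2) visual cryptography sketch: 1 = black, 0 = white.
make_shares <- function(secret) {
  n_row <- nrow(secret)
  n_col <- ncol(secret)
  # Random left sub-pixel for every secret pixel; the right one is its complement
  left <- matrix(sample(0:1, n_row * n_col, replace = TRUE), n_row, n_col)
  share1 <- matrix(0L, n_row, 2 * n_col)
  share1[, seq(1, 2 * n_col, by = 2)] <- left
  share1[, seq(2, 2 * n_col, by = 2)] <- 1L - left
  # Repeat each secret column twice so it lines up with the sub-pixels
  mask <- secret[, rep(seq_len(n_col), each = 2), drop = FALSE]
  # Share 2 copies share 1 on white pixels and inverts it on black pixels
  share2 <- ifelse(mask == 1, 1L - share1, share1)
  list(share1 = share1, share2 = share2)
}

# Stacking transparencies: a sub-pixel is black if it is black on either share
stack_shares <- function(a, b) pmax(a, b)

secret <- matrix(c(1, 0, 0, 1, 1, 0), nrow = 2)
shares <- make_shares(secret)
stacked <- stack_shares(shares$share1, shares$share2)
stacked
```

In the stacked result a black secret pixel shows two black sub-pixels and a white one shows exactly one, so the secret appears as a contrast difference. Each share on its own always has exactly one black sub-pixel per pixel, which is why a single share reveals nothing.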

References-

Color Visual Cryptography Scheme Using Meaningful Shares

http://csis.bits-pilani.ac.in/faculty/murali/netsec-10/seminar/refs/muralikrishna4.pdf

Visual cryptography for color images

http://csis.bits-pilani.ac.in/faculty/murali/netsec-10/seminar/refs/muralikrishna3.pdf

Other Resources

  1. http://users.telenet.be/d.rijmenants/en/visualcrypto.htm
  2. Visual Crypto – One-time Image Create two secure images from one by Robert Hansen
  3. Visual Crypto Java Applet at the University of Regensburg
  4. Visual Cryptography Kit Software to create image layers
  5. On-line Visual Crypto Applet by Leemon Baird
  6. Extended Visual Cryptography (pdf) by Mizuho Nakajima and Yasushi Yamaguchi
  7. Visual Cryptography Paper by Moni Naor and Adi Shamir
  8. Visual Crypto Talk (pdf) by Frederik Vercauteren ESAT Leuven
  9. http://cacr.uwaterloo.ca/~dstinson/visual.html
  10. University of Salerno web page on visual cryptography.
  11. Visual Crypto Page by Doug Stinson
  12. Simple implementation of the visual cryptography scheme based on Moni Naor and Adi Shamir, Visual Cryptography, EUROCRYPT 1994, pp1–12. This technique allows visual information like pictures to be encrypted so that decryption can be done visually. The code outputs two files. Try printing them on two separate transparencies and putting them one on top of the other to see the hidden message. http://algorito.com/algorithm/visual-cryptography

Visual Cryptography 

Ajay- I think a combination of sharing and color ciphers would prove more helpful to secure Internet Communication than existing algorithms. It also levels the playing field from computationally rich players to creative coders.


]]>
http://www.decisionstats.com/play-color-cipher-and-visual-cryptography/feed/ 0
Color Palettes in R using RColorBrewer #rstats http://www.decisionstats.com/color-palettes-in-r-using-rcolorbrewer-rstats/ http://www.decisionstats.com/color-palettes-in-r-using-rcolorbrewer-rstats/#comments Sun, 08 Apr 2012 07:05:45 +0000 Ajay Ohri http://www.decisionstats.com/?p=10001 The lovely colors at http://ColorBrewer.org can be used for much better color palettes in R.

library(RColorBrewer)

display.brewer.all()

and we use the function

brewer.pal(N,"Name") as the col parameter for the new color palettes,

where N is the number of colors and Name is one of the palette names shown in the display above.

 

data(VADeaths)
par(mfrow=c(2,3))
 hist(VADeaths,col=brewer.pal(3,"Set3"),main="Set3 3 colors")
 hist(VADeaths,col=brewer.pal(3,"Set2"),main="Set2 3 colors")
 hist(VADeaths,col=brewer.pal(3,"Set1"),main="Set1 3 colors")
 hist(VADeaths,col=brewer.pal(8,"Set3"),main="Set3 8 colors")
 hist(VADeaths,col=brewer.pal(8,"Greys"),main="Greys 8 colors")
 hist(VADeaths,col=brewer.pal(8,"Greens"),main="Greens 8 colors")
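
A palette is easier to judge when each color stands for one category. Here is a small sketch of that, using a grouped barplot of the same VADeaths data with one Set2 color per age group. The choice of barplot and of Set2 is only my suggestion, and brewer.pal.info is used to check how many colors each palette supports.

```r
library(RColorBrewer)

data(VADeaths)

# One qualitative color per age group (VADeaths has 5 rows)
age_colors <- brewer.pal(nrow(VADeaths), "Set2")

barplot(VADeaths, beside = TRUE, col = age_colors,
        legend.text = rownames(VADeaths),
        main = "Set2, one color per age group")

# brewer.pal.info lists the maximum number of colors for each palette
brewer.pal.info[c("Set2", "Greys", "Greens"), ]
```

Asking brewer.pal for more colors than a palette's maxcolors gives a warning and returns only the maximum, so it helps to check brewer.pal.info first.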

Colors from [http://www.ColorBrewer.org] by Cynthia A. Brewer, Geography, Pennsylvania State University
• Erich Neuwirth (2011). RColorBrewer: ColorBrewer palettes. R package version 1.0-5. [http://CRAN.R-project.org/package=RColorBrewer]
Note-ColorBrewer is Copyright (c) 2002 Cynthia Brewer, Mark Harrower, and The Pennsylvania State University. All rights reserved. The ColorBrewer palettes have been included in the R package with permission of the copyright holder.


]]>
http://www.decisionstats.com/color-palettes-in-r-using-rcolorbrewer-rstats/feed/ 0
Cricinfo StatsGuru Database for Statistical and Graphical Analysis http://www.decisionstats.com/cricinfo-statsguru-database-for-statistical-and-graphical-analysis/ http://www.decisionstats.com/cricinfo-statsguru-database-for-statistical-and-graphical-analysis/#comments Sat, 07 Apr 2012 19:49:13 +0000 Ajay Ohri http://www.decisionstats.com/?p=9996 Data from the ESPN Cricinfo website is available from the STATSGURU website.

The url is of the form-

http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=6;template=results;type=batting

http://stats.espncricinfo.com/ci/engine/stats/index.html?

class=1;team=6;template=results;type=batting

If you break down this URL to get more statistics on cricket, you can choose the following parameters.
class
1=Test
2=ODI
3=T20I
11=Test+ODI+T20I
team
1=England
2=Australia
3=South Africa
4=West Indies
5=New Zealand
6=India
7=Pakistan
8=Sri Lanka

type
batting
bowling
fielding
allround
fow
official
team
aggregate
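
The parameters above can be assembled into a StatsGuru URL with a small helper. This is a minimal sketch- the function name build_statsguru_url is hypothetical, and it only builds the address; it does not download anything.

```r
# Build a StatsGuru URL from the class, team and type parameters listed above.
# The defaults reproduce the example URL (Test matches, India, batting).
build_statsguru_url <- function(class = 1, team = 6, type = "batting",
                                template = "results") {
  base <- "http://stats.espncricinfo.com/ci/engine/stats/index.html"
  # StatsGuru separates query parameters with semicolons rather than ampersands
  query <- paste0("class=", class, ";team=", team,
                  ";template=", template, ";type=", type)
  paste0(base, "?", query)
}

build_statsguru_url()

# paste0 is vectorised, so one call gives the ODI bowling pages for teams 1 to 8
build_statsguru_url(class = 2, team = 1:8, type = "bowling")
```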

 

The ESPN Terms of Use are here- you should check them before trying any web scraping.

http://www.espncricinfo.com/ci/content/site/company/terms_use.html

 

However, ESPN has released an API (with both free and premium tiers) for developers at http://developer.espn.com/docs,

and especially for these sports- http://developer.espn.com/docs/headlines#parameters

/sports News across all sports/sections
/sports/baseball/mlb Major League Baseball (MLB)
/sports/basketball/mens-college-basketball NCAA Men’s College Basketball
/sports/basketball/nba National Basketball Association (NBA)
/sports/basketball/wnba Women’s National Basketball Association (WNBA)
/sports/basketball/womens-college-basketball NCAA Women’s College Basketball
/sports/boxing Boxing
/sports/football/college-football NCAA College Football
/sports/football/nfl National Football League (NFL)
/sports/golf Golf
/sports/hockey/nhl National Hockey League (NHL)
/sports/horse-racing Horse Racing
/sports/mma Mixed Martial Arts
/sports/racing Auto Racing
/sports/racing/nascar NASCAR Racing
/sports/soccer Professional soccer (US focus)
/sports/tennis Tennis

 

I wonder when this can be enabled for Cricket as well (including free, academic, premium and partner APIs).

(Note- you can use R packages such as XML, RCurl and rjson, among others, to get data from the web.)

Plotting is best done using ggplot2 http://had.co.nz/ggplot2/ or d3.js at http://mbostock.github.com/d3/, and the current state of cricket graphics could surely use a change- they are mostly a single radial plot of shots played/runs scored or a combined barplot/line graph.


]]>
http://www.decisionstats.com/cricinfo-statsguru-database-for-statistical-and-graphical-analysis/feed/ 0
April Fool’s Day- Catblock! http://www.decisionstats.com/april-fools-day-catblock/ http://www.decisionstats.com/april-fools-day-catblock/#comments Sun, 01 Apr 2012 05:24:08 +0000 Ajay Ohri http://www.decisionstats.com/?p=9989 Since Anonymous didn’t disrupt the internet on April Fool’s Day by overloading the DNS servers, the best April Fool’s Day prank imho goes to Adblock- that nifty extension that allows you to block ads.

Well, for today it replaced ads with funny cats- and you can even buy the cats-for-ads extension permanently. That’s right, cats take over the Internet!

Only 2% of Chrome and Firefox users block ads! So what are you waiting for- this is how the NYTimes looks for me!!

 

Replace ads with cats-

for chrome here-

https://chrome.google.com/webstore/detail/gighmmpiobklfepjocnamgkkbiglidom

for firefox here-

https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/

read more on catblock here-

http://adblockforchrome.blogspot.in/2012/03/inturdusing-catblock.html

but if you want to buy catblock—

see this

https://chromeadblock.com/pay/?source=catblock

 


]]>
http://www.decisionstats.com/april-fools-day-catblock/feed/ 0
Doing RFM Analysis in R http://www.decisionstats.com/doing-rfm-analysis-in-r/ http://www.decisionstats.com/doing-rfm-analysis-in-r/#comments Tue, 27 Mar 2012 20:46:33 +0000 Ajay Ohri http://www.decisionstats.com/?p=9979

RFM is a method used for analyzing customer behavior and defining market segments. It is commonly used in database marketing and direct marketing and has received particular attention in retail.


RFM stands for


  • Recency - How recently did the customer purchase?
  • Frequency - How often do they purchase?
  • Monetary Value - How much do they spend?

To create an RFM analysis, one creates categories for each attribute. For instance, the Recency attribute might be broken into three categories: customers with purchases within the last 90 days; between 91 and 365 days; and longer than 365 days. Such categories may be arrived at by applying business rules, or using a data mining technique, such as CHAID, to find meaningful breaks.

from-http://en.wikipedia.org/wiki/RFM
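
The three Recency categories in that example can be sketched in base R with cut(). The days_since_purchase values below are made up for illustration, and the 90 and 365 day break points are simply the ones quoted above.

```r
# Made-up days since last purchase for seven customers
days_since_purchase <- c(12, 90, 91, 200, 365, 366, 900)

# Intervals are closed on the right: (-Inf,90], (90,365], (365,Inf)
recency_band <- cut(days_since_purchase,
                    breaks = c(-Inf, 90, 365, Inf),
                    labels = c("0-90 days", "91-365 days", "over 365 days"))

table(recency_band)
```

Break points chosen by business rules go straight into the breaks argument; break points found by a technique like CHAID would be plugged in the same way.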

and here is the R code- note that for direct marketing you also need to compute Monetization based on response rates (based on offer date)



##Creating Random Sales Data with the columns CustomerId (unique to each customer), Purchase.Value and sales.dates

sales=data.frame(sample(1000:1999,replace=TRUE,size=10000),abs(round(rnorm(10000,28,13))))

names(sales)=c("CustomerId","Sales Value")

##Generating random dates spread over 700 days from 1 Jan 2010

sales.dates <- as.Date("2010/1/1") + 700*sort(stats::runif(10000))

sales=cbind(sales,sales.dates)

str(sales)

##Recency is the number of days between each sale and today

sales$recency=round(as.numeric(difftime(Sys.Date(),sales[,3],units="days")))

##if you have existing sales data you just need to shape it in this format

##Renaming the sales column in base R (no extra package needed) and keeping the result

names(sales)[names(sales)=="Sales Value"]="Purchase.Value"

## Creating Total Sales(Monetization),Frequency, Last Purchase date for each customer

salesM=aggregate(sales[,2],list(sales$CustomerId),sum)

names(salesM)=c("CustomerId","Monetization")

salesF=aggregate(sales[,2],list(sales$CustomerId),length) 

names(salesF)=c("CustomerId","Frequency")

salesR=aggregate(sales[,4],list(sales$CustomerId),min) 

names(salesR)=c("CustomerId","Recency")

##Merging R,F,M

test1=merge(salesF,salesR,"CustomerId") 

salesRFM=merge(salesM,test1,"CustomerId") 

##Creating R,F,M levels 

salesRFM$rankR=cut(salesRFM$Recency, 5,labels=F) #rankR 1 is very recent while rankR 5 is least recent

salesRFM$rankF=cut(salesRFM$Frequency, 5,labels=F)#rankF 1 is least frequent while rankF 5 is most frequent

salesRFM$rankM=cut(salesRFM$Monetization, 5,labels=F)#rankM 1 is lowest sales while rankM 5 is highest sales

##Looking at RFM tables


table(salesRFM[,5:6])
table(salesRFM[,6:7])
table(salesRFM[,5:7])
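
One thing to keep in mind- cut(x, 5) splits the range of x into five equal-width intervals, so skewed data can leave most customers in one rank. A common alternative is to cut at the quintiles so that each rank holds roughly the same number of customers. Here is a small sketch of the difference, assuming some made-up spend totals (not the data generated above).

```r
# Made-up, skewed spend totals for ten customers
spend <- c(5, 8, 10, 12, 15, 20, 25, 40, 80, 500)

# Equal-width bins, as used in the ranking code above
equal_width <- cut(spend, 5, labels = FALSE)

# Quintile bins: break points at the 0%, 20%, ..., 100% quantiles
quintile_breaks <- quantile(spend, probs = seq(0, 1, by = 0.2))
quintiles <- cut(spend, breaks = quintile_breaks,
                 include.lowest = TRUE, labels = FALSE)

table(equal_width)
table(quintiles)
```

With many tied values the quantile break points may not be unique and cut() will stop with an error, so this works best on fairly continuous measures like total spend.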

 


 


]]>
http://www.decisionstats.com/doing-rfm-analysis-in-r/feed/ 0