Predictive analytics in the cloud: Angoss

I interviewed Angoss in depth here at http://www.decisionstats.com/interview-eberhard-miethke-and-dr-mamdouh-refaat-angoss-software/

Well, they just announced predictive analytics in the cloud.

 

http://www.angoss.com/predictive-analytics-solutions/cloud-solutions/

Solutions

Overview

KnowledgeCLOUD™ solutions deliver predictive analytics in the Cloud to help businesses gain competitive advantage in the areas of sales, marketing and risk management by unlocking the predictive power of their customer data.

KnowledgeCLOUD clients experience rapid time to value and reduced IT investment, and enjoy the benefits of Angoss’ industry leading predictive analytics – without the need for highly specialized human capital and technology.

KnowledgeCLOUD solutions serve clients in the asset management, insurance, banking, high tech, healthcare and retail industries. Industry solutions consist of a choice of analytical modules:

KnowledgeCLOUD for Sales/Marketing

KnowledgeCLOUD solutions are delivered via KnowledgeHUB™, a secure, scalable cloud-based analytical platform together with supporting deployment processes and professional services that deliver predictive analytics to clients in a hosted environment. Angoss industry leading predictive analytics technology is employed for the development of models and deployment of solutions.

Angoss’ deep analytics and domain expertise guarantees effectiveness – all solutions are back-tested for accuracy against historical data prior to deployment. Best practices are shared throughout the service to optimize your processes and success. Finely tuned client engagement and professional services ensure effective change management and program adoption throughout your organization.

For businesses looking to gain a competitive edge and put their data to work, Angoss is the ideal partner.

—-

Hmm. Analytics in the cloud. Reduce hardware costs. Reduce software costs. Increase profit margins.

Hmmmmm

My favorite professor in North Carolina, who calls the cloud time sharing: are you listening, Professor?

Jill Dyche on 2012

In part 3 of the series for predictions for 2012, here is Jill Dyche, Baseline Consulting/DataFlux.

Part 2 was Timo Elliott, SAP, at http://www.decisionstats.com/timo-elliott-on-2012/ and Part 1 was Jim Kobielus, Forrester, at http://www.decisionstats.com/jim-kobielus-on-2012/

Ajay: What are the top trends you saw happening in 2011?

 

Well, I hate to say I saw them coming, but I did. A lot of managers committed some pretty predictable mistakes in 2011. Here are a few we witnessed in 2011 live and up close:

 

1. In the spirit of “size matters,” data warehouse teams continued to trumpet the volumes of stored data on their enterprise data warehouses. But a peek under the covers of these warehouses reveals that the data isn’t integrated. Essentially this means a variety of heterogeneous virtual data marts co-located on a single server. Neat. Big. Maybe even worthy of a magazine article about how many petabytes you’ve got. But it’s not efficient, and hardly the example of data standardization and re-use that everyone expects from analytical platforms these days.

 

2. Development teams still didn’t factor data integration and provisioning into their project plans in 2011. So we saw multiple projects spawn duplicate efforts around data profiling, cleansing, and standardization, not to mention conflicting policies and business rules for the same information. Bummer, since IT managers should know better by now. The problem is that no one owns the problem. Which brings me to the next mistake…

 

3. No one’s accountable for data governance. Yeah, there’s a council. And they meet. And they talk. Sometimes there’s lunch. And then nothing happens because no one’s really rewarded—or penalized for that matter—on data quality improvements or new policies. And so the reports spewing from the data mart are still fraught and no one trusts the resulting decisions.

 

But all is not lost since we’re seeing some encouraging signs already in 2012. And yes, I’d classify some of them as bona-fide trends.

 

Ajay: What are some of those trends?

 

Job descriptions for data stewards, data architects, Chief Data Officers, and other information-enabling roles are becoming crisper, and the KPIs for these roles are becoming more specific. Data management organizations are being divorced from specific lines of business and from IT, becoming specialty organizations—okay, COEs if you must—in their own rights. The value proposition for master data management now includes not just the reconciliation of heterogeneous data elements but the support of key business strategies. And C-level executives are holding the data people accountable for improving speed to market and driving down costs—not just delivering cleaner data. In short, data is becoming a business enabler. Which, I have to just say editorially, is better late than never!

 

Ajay: Anything surprise you, Jill?

 

I have to say that Obama mentioning data management in his State of the Union speech was an unexpected but pretty powerful endorsement of the importance of information in both the private and public sector.

 

I’m also sort of surprised that data governance isn’t being driven more frequently by the need for internal and external privacy policies. Our clients are constantly asking us about how to tightly couple privacy policies into their applications and data sources. The need to protect PCI data and other highly sensitive data elements has made executives twitchy. But they’re still not linking that need to data governance.

 

I should also mention that I’ve been impressed with the people who call me who’ve had their “aha!” moment and realize that data transcends analytic systems. It’s operational, it’s pervasive, and it’s dynamic. I figured this epiphany would happen in a few years once data quality tools became a commodity (they’re far from it). But it’s happening now. And that’s good for all types of businesses.

 

About-

Jill Dyché has written three books and numerous articles on the business value of information technology. She advises clients and executive teams on leveraging technology and information to enable strategic business initiatives. Last year her company Baseline Consulting was acquired by DataFlux Corporation, where she is currently Vice President of Thought Leadership. Find her blog posts on www.dataroundtable.com.

Interview Markus Schmidberger, Cloudnumbers.com

Here is an interview with Markus Schmidberger, Senior Community Manager for cloudnumbers.com. Cloudnumbers.com is an exciting new cloud startup for scientific computing. It basically lets you move R and other platforms from the traditional desktop/server model of operation to the cloud, and makes that transition very easy and secure.

Ajay- Describe the startup story for setting up Cloudnumbers.com

Markus- In 2010 the company founders Erik Muttersbach (TU München), Markus Fensterer (TU München) and Moritz v. Petersdorff-Campen (WHU Vallendar) started with the development of the cloud computing environment. [Read more...]

Page Mathematics

I was looking at the site http://www.google.com/adplanner/static/top1000/index.html

and I saw this list (below), which I worked through using a Google Doc at https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AtYMMvghK2ytdE9ybmVQeUxMeXdjWlVKYzRlMkxjX0E&output=html.

I then decided to divide page views by users to check the maths.

Facebook is AAAAAmazing! and the Russian social network is amazing too!

or

The maths is wrong! (maybe sampling, maybe virtual pageviews caused by friendstream refresh)

but the average of 1,136 page views per unique visitor per month means roughly 37 page views per visitor a day!

Rank  Site              Category         Unique Visitors (users)     Page Views  Page Views/Visitor
   1  facebook.com      Social Networks                880000000  1000000000000               1,136
  29  linkedin.com      Social Networks                 80000000     2500000000                  31
  38  orkut.com         Social Networks                 66000000     4000000000                  61
  40  orkut.com.br      Social Networks                 62000000    43000000000                 694
  65  weibo.com         Social Networks                 42000000     2800000000                  67
  66  renren.com        Social Networks                 42000000     3300000000                  79
  84  odnoklassniki.ru  Social Networks                 37000000    13000000000                 351
  90  scribd.com        Social Networks                 34000000      140000000                   4
  95  vkontakte.ru      Social Networks                 34000000    48000000000               1,412
and
Rank  Site              Category         Unique Visitors (users)     Page Views  Page Views/Visitor
   1  facebook.com      Social Networks                880000000  1000000000000               1,136
   2  youtube.com       Online Video                   800000000   100000000000                 125
   3  yahoo.com         Web Portals                    590000000    77000000000                 131
   4  live.com          Search Engines                 490000000    84000000000                 171
   5  msn.com           Web Portals                    440000000    20000000000                  45
   6  wikipedia.org     Dict                           410000000     6000000000                  15
   7  blogspot.com      Blogging                       340000000     4900000000                  14
   8  baidu.com         Search Engines                 300000000   110000000000                 367
   9  microsoft.com     Software                       250000000     2500000000                  10
  10  qq.com            Web Portals                    250000000    39000000000                 156
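
If you would rather check the division yourself than trust my spreadsheet, here is a minimal Python sketch. The figures are copied from the tables above for a handful of sites; the 30-day month and the names `sites`, `views_per_visitor` and `DAYS_PER_MONTH` are my own assumptions for illustration, not anything Google publishes.

```python
# Recompute page views per unique visitor from the two raw columns.
# Values are (unique visitors, page views) as listed in the tables above.
sites = {
    "facebook.com": (880_000_000, 1_000_000_000_000),
    "youtube.com": (800_000_000, 100_000_000_000),
    "baidu.com": (300_000_000, 110_000_000_000),
    "vkontakte.ru": (34_000_000, 48_000_000_000),
}

DAYS_PER_MONTH = 30  # assumption: the list reports monthly totals


def views_per_visitor(users, page_views):
    """Monthly page views divided by monthly unique visitors."""
    return page_views / users


ratios = {site: views_per_visitor(u, pv) for site, (u, pv) in sites.items()}

for site, monthly in ratios.items():
    daily = monthly / DAYS_PER_MONTH
    print(f"{site:<14} {monthly:>8,.0f} per month  {daily:>5.1f} per day")
```

Whether you divide by 30 or 31 days, Facebook comes out somewhere between 36 and 38 page views per visitor per day, so the surprise is in the source numbers and not in the arithmetic.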

see complete list at http://www.google.com/adplanner/static/top1000/index.html [Read more...]

Interview Dan Steinberg, Founder, Salford Systems

Here is an interview with Dan Steinberg, Founder and President of Salford Systems (http://www.salford-systems.com/).

Ajay- Describe your journey from academia to technology entrepreneurship. What are the key milestones or turning points that you remember?

Dan- When I was in graduate school studying econometrics at Harvard, a number of distinguished professors at Harvard (and MIT) were actively involved in substantial real world activities. Professors that I interacted with, or studied with, or whose software I used became involved in the creation of such companies as Sun Microsystems and Data Resources, Inc., or were heavily involved in business consulting through their own companies or other influential consultants. Some not involved in private sector consulting took on substantial roles in government such as membership on the President’s Council of Economic Advisers. The atmosphere was one that encouraged free movement between academia and the private sector so the idea of forming a consulting and software company was quite natural and did not seem in any way inconsistent with being devoted to the advancement of science.

Ajay- What are the latest products by Salford Systems? Any future product plans or modifications to work on Big Data analytics, mobile computing and cloud computing?

 Dan- Our central set of data mining technologies are CART, MARS, TreeNet, RandomForests, and PRIM, and we have always maintained feature rich logistic regression and linear regression modules. In our latest release scheduled for January 2012 we will be including a new data mining approach to linear and logistic regression allowing for the rapid processing of massive numbers of predictors (e.g., one million columns), with powerful predictor selection and coefficient shrinkage. The new methods allow not only classic techniques such as ridge and lasso regression, but also sub-lasso model sizes. Clear tradeoff diagrams between model complexity (number of predictors) and predictive accuracy allow the modeler to select an ideal balance suitable for their requirements.

The new version of our data mining suite, Salford Predictive Modeler (SPM), also includes two important extensions to the boosted tree technology at the heart of TreeNet.  The first, Importance Sampled learning Ensembles (ISLE), is used for the compression of TreeNet tree ensembles. Starting with, say, a 1,000 tree ensemble, the ISLE compression might well reduce this down to 200 reweighted trees. Such compression will be valuable when models need to be executed in real time. The compression rate is always under the modeler’s control, meaning that if a deployed model may only contain, say, 30 trees, then the compression will deliver an optimal 30-tree weighted ensemble. Needless to say, compression of tree ensembles should be expected to be lossy and how much accuracy is lost when extreme compression is desired will vary from case to case. Prior to ISLE, practitioners have simply truncated the ensemble to the maximum allowable size.  The new methodology will substantially outperform truncation.

The second major advance is RULEFIT, a rule extraction engine that starts with a TreeNet model and decomposes it into the most interesting and predictive rules. RULEFIT is also a tree ensemble post-processor and offers the possibility of improving on the original TreeNet predictive performance. One can think of the rule extraction as an alternative way to explain and interpret an otherwise complex multi-tree model. The rules extracted are similar conceptually to the terminal nodes of a CART tree but the various rules will not refer to mutually exclusive regions of the data.

Ajay- You have led teams that have won multiple data mining competitions. What are some of your favorite techniques or approaches to a data mining problem?

 Dan- We only enter competitions involving problems for which our technology is suitable, generally, classification and regression. In these areas, we are  partial to TreeNet because it is such a capable and robust learning machine. However, we always find great value in analyzing many aspects of a data set with CART, especially when we require a compact and easy to understand story about the data. CART is exceptionally well suited to the discovery of errors in data, often revealing errors created by the competition organizers themselves. More than once, our reports of data problems have been responsible for the competition organizer’s decision to issue a corrected version of the data and we have been the only group to discover the problem.

In general, tackling a data mining competition is no different than tackling any analytical challenge. You must start with a solid conceptual grasp of the problem and the actual objectives, and the nature and limitations of the data. Following that comes feature extraction, the selection of a modeling strategy (or strategies), and then extensive experimentation to learn what works best.

Ajay- I know you have created your own software. But is there other software that you use or like to use?

 Dan- For analytics we frequently test open source software to make sure that our tools will in fact deliver the superior performance we advertise. In general, if a problem clearly requires technology other than that offered by Salford, we advise clients to seek other consultants expert in that other technology.

Ajay- Your software is installed at 3500 sites including 400 universities, as per http://www.salford-systems.com/company/aboutus/index.html. What is the key to managing and keeping so many customers happy?

Dan- First, we have taken great pains to make our software reliable and we make every effort to avoid problems related to bugs. Our testing procedures are extensive and we have experts dedicated to stress-testing software. Second, our interface is designed to be natural, intuitive, and easy to use, so the challenges to the new user are minimized. Also, clear documentation, help files, and training videos round out how we allow the user to look after themselves. Should a client need to contact us we try to achieve 24-hour turnaround on tech support issues and monitor all tech support activity to ensure timeliness, accuracy, and helpfulness of our responses. WebEx/GoToMeeting and other internet-based contact permit real-time interaction.

 Ajay- What do you do to relax and unwind?

 Dan- I am in the gym almost every day combining weight and cardio training. No matter how tired I am before the workout I always come out energized so locating a good gym during my extensive travels is a must. I am also actively learning Portuguese so I look to watch a Brazilian TV show or Portuguese dubbed movie when I have time; I almost never watch any form of video unless it is available in Portuguese.

 Biography-

http://www.salford-systems.com/blog/dan-steinberg.html

Dan Steinberg, President and Founder of Salford Systems, is a well-respected member of the statistics and econometrics communities. In 1992, he developed the first PC-based implementation of the original CART procedure, working in concert with Leo Breiman, Richard Olshen, Charles Stone and Jerome Friedman. In addition, he has provided consulting services on a number of biomedical and market research projects, which have sparked further innovations in the CART program and methodology.

Dr. Steinberg received his Ph.D. in Economics from Harvard University, and has given full day presentations on data mining for the American Marketing Association, the Direct Marketing Association and the American Statistical Association. After Harvard, Steinberg began his professional career as a Member of the Technical Staff at Bell Labs, Murray Hill, and then as Assistant Professor of Economics at the University of California, San Diego. A book he co-authored on Classification and Regression Trees was awarded the 1999 Nikkei Quality Control Literature Prize in Japan for excellence in statistical literature promoting the improvement of industrial quality control and management.

His consulting experience at Salford Systems has included complex modeling projects for major banks worldwide, including Citibank, Chase, American Express, and Credit Suisse, and has included projects in Europe, Australia, New Zealand, Malaysia, Korea, Japan and Brazil. Steinberg led the teams that won first place awards in the KDDCup 2000 and the 2002 Duke/TeraData Churn modeling competition, and the teams that won awards in the PAKDD competitions of 2006 and 2007. He has published papers in economics, econometrics, and computer science journals, and contributes actively to the ongoing research and development at Salford.