July 3, 2009 – DECISION STATS

Interview John Moore CTO, Swimfish

Here is an interview with John F Moore, VP Engineering and Chief Technology Officer at Swimfish, a provider of business solutions and CRM. A well-known figure in technology and CRM circles, John talks about Social CRM, technology offshoring, community initiatives and his own career.

Too many CRM systems are not usable. They are built by engineers that think of the system as a large database and the systems often look like a database making it difficult to use by the sales, support, and marketing people.

-John F Moore

Ajay – Describe your career journey from college to CTO. What changes in mindset did you undergo along the journey? What advice would you give to young students to take up science careers?

John- First, I wanted to take time to thank you for the interview offer. I graduated from Boston University in 1988 with a degree in Electrical Engineering. At the time of my graduation I found myself to be very interested in the advances taking place on the personal computing front by companies like Lotus with their 1-2-3 product. I knew that I wanted to be involved with these efforts and landed my first job in the software space as a Software Quality Engineer working on 1-2-3 for DOS.

I spent the first few years of my career working at Lotus as a developer, a quality engineer, and manager, on products such as Lotus 1-2-3 and Lotus Notes. Throughout those early career years I learned a lot and focused on taking as many classes as possible.

From Lotus I sought out the start-up environment and in early 2000 joined a startup named Brainshark (http://www.brainshark.com). Brainshark was, and is, focused on delivering an asynchronous communication platform on the web and was one of the early providers of SAAS. In my seven years at Brainshark I learned a lot about delivering an Enterprise class SAAS solution on top of the Microsoft technology stack. The requirements to pass security audits for Fortune 500 companies, and the need to match the performance of in-house solutions, resulted in all of us learning a great deal. These were very fun times.

I now work as the VP of Engineering and CTO at Swimfish, a services and software provider of business solutions. We focus on the financial marketplace, where our founder has a very deep background, but also work within other verticals. Our products are focused on the CRM, document management, and mobile product space and are built on the Microsoft technology stack. Our customers leverage both our SAAS and on-premise solutions, which requires us to build our products to be more flexible than is generally required for a SAAS-only solution.

The exciting thing for me is the sheer amount of opportunities I see available for science/engineering students graduating in the near future. To be prepared for these opportunities, however, it will be important to not just be technically savvy.

Engineering students should also be looking at:

* Business classes. If you want to build cool products they must deliver business value.

* Writing and speaking classes. You must be able to articulate your ideas or no one will be willing to invest in them.

I would also encourage people to take chances and get in over their heads as often as possible. You may fail, you may succeed. Either way you will gain experiences that make it all worthwhile.

Ajay- How do you think social media can help with CRM? What are the basic do’s and don’ts for social media CRM in your opinion?

John- You touch upon a subject that I am very passionate about. When I think of Social CRM I think about a system of processes and products that enable businesses to actively engage with customers in a manner that delivers maximum value to all. Customers should be able to find answers to their questions with minimal friction or effort; companies should find the right customers for their products.

Social CRM should deliver on some of these fronts:

* Analyze the web of relationships that exists to define optimal pathways. These pathways will define relationships that businesses can leverage for finding their customers. These pathways will enable customers to quickly find answers to their questions. For example, I needed an answer to a question about SharePoint and project management. I asked the question on Twitter and within 3 minutes had answers from two different people. Not only did I get the answer I needed but I made two new friends who I still talk to today.

* Monitor conversations to gauge brand awareness, identify customers having problems or asking questions. This monitoring should not be stalking; however, it should be used to provide quick responses to customers to benefit the greater community.

* Usability. Too many CRM systems are not usable. They are built by engineers that think of the system as a large database and the systems often look like a database making it difficult to use by the sales, support, and marketing people.

Finally, when I think of social media I think of these properties:

* Social is about relationship building.

* You should always add more value to the community than you take in return.

* Be transparent and honest. People can tell when you’re not.

Ajay- You are involved in some noble causes, like using blog space for out-of-work techies and separately for Alzheimer’s disease. How important do you think it is for people, especially younger people, to be dedicated to community causes?

John- My mother-in-law was diagnosed with Alzheimer’s disease at the age of 57. My wife and I moved into their two-family house to help her through the final years of her life. It is a horrible disease and one that it is easy to be passionate about if you have seen it in action.

My motivation on the job front is very similar. I have seen too many people suffer through these poor economic times and I simply want to do what I can to help people get back to work.

It probably sounds corny, but I firmly believe that we must all do what we can for each other. Business is competitive, but it does not mean that we cannot, or should not, help each other out. I think it’s important for everyone to have causes they believe in. You have to find your passions in life and follow them. Be a whole person and help change the world for the better.

Ajay- Describe your daily challenges as head of Engineering at Swimfish, Inc. How important is it for the tech team to be integrated with the business and understand it as well?

John- The engineering team at Swimfish works very closely with the business teams. It is important for the team to understand the challenges our customers are encountering and to build products that help the customer succeed. I am not satisfied with the lack of success that many companies encounter when deploying a CRM solution.

We go as deep as possible to understand the business, the processes currently in use, the disparate systems being utilized, and then the underlying technologies currently in use. Only then do we focus on the solutions and deliver the right solution for that company.

On the product front it is the same. We work closely with customers on the features we are planning to add, trying to ensure that the solutions meet their needs as well as the needs of the other customers in the market that we are hoping to serve.

I do expect my engineers to be great at their core job; that goes without question. However, if they cannot understand the business needs they will not work for me very long.

My weeks at Swimfish always provide me with interesting challenges and opportunities.

My typical day involves:

* Checking in with our support team to understand if there are any major issues being encountered by any of our customers.

* Challenging the sales team to hit their targets. I love sales as without them I cannot deliver products.

* Checking in with my developers and test teams to determine how each of our projects is doing. We have a daily standup as well, but I try to personally check in with as many people as possible.

* Most days I spend some time developing, mostly in C#. My current focus area is on our next release of our Milestone Tracking Matrix where I have made major revisions to our user interface.

I also spend time interacting on various social platforms, such as Twitter, as it is critical for me to understand the challenges that people are encountering in their businesses, to keep up with the rapid pace of technology, and just to check in with friends. Keep it real.

Ajay- What are your views on offshoring work, especially science jobs, which ultimately made science careers less attractive in the US? At the same time, outsourcing companies (in India) generally pay only one-third of billing fees as salaries. Do you think concepts like ODesk can help change the paradigm of tech outsourcing?

John- I have mixed opinions on off-shoring. You should not offshore because of perceived cost savings only. On net you will generally break even; you will not save as much as you might originally think.

I am, however, close to starting a relationship with a good development provider in Costa Rica. The reason for this relationship is not cost based, it is knowledge based. This company has a lot of experience with the primary CRM system that we sell to customers and I have not been successful in finding this experience locally. I will save a lot of money in upfront training on this skill-set; they have done a lot of work in this area already (and have great references). There is real value to our business, and theirs.

Note that Swimfish is already working with a geographically dispersed team as part of the engineering team is in California and part is in Massachusetts. This arrangement has already helped us to better prepare for an offshore relationship and I know we will be successful when we begin.

Ajay- What does John Moore do to have fun when he is not in front of his computer or with a cause?

John- As the father of two teenage daughters I spend a lot of time going to soccer, basketball, and softball games. I also enjoy spending time running, having completed a couple of marathons, and relaxing with a good book. My next challenge will be skydiving, as my 17-year-old daughter and I are going skydiving when she turns 18.

Brief Bio:

For the last decade I have worked as a senior engineering manager for SAAS applications built upon the Microsoft technology stack. I have established the processes, and hired the teams that delivered hundreds of updates ranging from weekly patches to longer running full feature releases. My background as a hands-on developer combined with my strong QA background has enabled me to deliver high quality software on-time.

You can learn more about me, and my opinions, by reading my blog at http://johnfmoore.wordpress.com/ or joining me on Twitter at http://twitter.com/JohnFMoore

R and Hadoop

Here is an exciting project for using R in a cloud computing environment (two of my favorite things). It is called RHIPE.

R and Hadoop Integrated Processing Environment v.0.38


The website source is http://ml.stat.purdue.edu/rhipe/

RHIPE (phonetic spelling: hree-pay’) is a Java package that integrates the R environment with Hadoop, the open source implementation of Google’s MapReduce. Using RHIPE it is possible to code map-reduce algorithms in R, e.g.

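# Map: split one input line into words and emit a (word, count) pair
# for each distinct word on that line.
m <- function(key, val) {
  words <- strsplit(val, " +")[[1]]
  wc <- table(words)
  # seq_along() also copes with a line that yields no words.
  lapply(seq_along(wc), function(i) list(key = names(wc)[i], value = wc[[i]]))
}
# Reduce (also used as the combiner): sum the counts collected for one word.
r <- function(key, value) {
  value <- do.call("rbind", value)
  list(list(key = key, value = sum(value)))
}
rhmr(map = m, reduce = r, combiner = TRUE, input.folder = "X", output.folder = "Y")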
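To see what the map and reduce functions above do without a Hadoop cluster, the two steps can be simulated locally in base R. This is only a minimal sketch: `simulate_wordcount` is a hypothetical helper (not part of RHIPE), and its grouping step is a simple stand-in for Hadoop's shuffle phase.

```r
# Map step: one (word, count) pair per distinct word in a line.
wc_map <- function(key, val) {
  words <- strsplit(val, " +")[[1]]
  wc <- table(words)
  lapply(seq_along(wc), function(i) list(key = names(wc)[i], value = wc[[i]]))
}

# Reduce step: sum all counts collected for one word.
wc_reduce <- function(key, value) {
  value <- do.call("rbind", value)
  list(list(key = key, value = sum(value)))
}

# Hypothetical local driver (not part of RHIPE): groups the map output by
# key, as Hadoop's shuffle would, and then calls the reducer once per key.
simulate_wordcount <- function(lines) {
  pairs <- unlist(lapply(seq_along(lines), function(i) wc_map(i, lines[i])),
                  recursive = FALSE)
  keys <- vapply(pairs, function(p) p$key, character(1))
  values <- lapply(pairs, function(p) p$value)
  grouped <- split(values, keys)
  out <- lapply(names(grouped), function(k) wc_reduce(k, grouped[[k]])[[1]])
  counts <- vapply(out, function(p) p$value, numeric(1))
  names(counts) <- vapply(out, function(p) p$key, character(1))
  counts
}

counts <- simulate_wordcount(c("the cat sat", "the dog sat on the mat"))
print(counts)
```

The second line contributes a count of 2 for "the" from a single map call, which is why the reducer has to sum values rather than count them.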

rhlapply packages the user's request into an R vector object. This is serialized and sent to the RHIPE server, which picks apart the object and creates a job request that Hadoop can understand. Each element of the provided list is processed by the user's function during the Map stage of mapreduce. The results are returned and, if the output is to a file, they are serialized and written to a Hadoop sequence file. The values can be read back into R using the rhsq* functions.

2 rhlapply

rhlapply <- function( list.object,
                    func,
                    configure=expression(),
                    output.folder='',
                    shared.files=c(),
                    hadoop.mapreduce=list(),
                    verbose=T,
                    takeAll=T)

list.object
 This can either be a list or a single scalar. In case of the former, the function given by func will be applied to each element of list.object. In case of a scalar, the function will be applied to the list 1:n where n is the value of the scalar 
func
 A function that takes one parameter: an element of the list. 
configure
 A configuration expression to run before func is executed. It is executed once for every JVM. If you need variables or data frames, save them with rhsave or rhsave.image, use rhput to copy the file to the DFS, and then list it in shared.files.
config = expression({
              library(lattice)
              load("mydataset.Rdata")
})



output.folder
 Any file that is created by the function is stored in the output.folder. This is deleted first. If not given, the files created will not be copied. For side-effect files to be copied, create them in tmp, e.g. pdf("tmp/x.pdf"); note no leading slash. The directory will contain a slew of part* files, as many as there are maps. These contain the binary key-value pairs.

shared.files
 The function or the preload expression might require the presence of resource files, e.g. *.Rdata files. The user could copy them from the HDFS in the R code, or just load them from the local work directory if the files were present there. This is the role of shared.files. It is a vector of paths to files on the HDFS, each of which will be copied to the work directory where the R code is run, e.g. c('/tmp/x.Rdata','/foo.tgz'); the first file can then be loaded via load("x.Rdata"). For those familiar with Hadoop terminology, this is implemented via DistributedCache.
hadoop.mapreduce
 a list of Hadoop specific options e.g
list(mapreduce.map.tasks=10,mapreduce.reduce.tasks=3)

takeAll
 If takeAll is TRUE, the value returned is a list in which each entry is the return value of the function. The entries are not in order, so element 1 of the returned list is not necessarily the result of applying func to the first element of list.object.
verbose
 If TRUE, the user will see the job progress in the R console. If FALSE, the web URL of the jobtracker will be displayed. Cancelling the command with CTRL-C will not cancel the job; use rhkill for that.
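The list.object, func and configure semantics described above can be tried out locally with a small stand-in. This is a sketch under assumptions: `local_rhlapply` is a hypothetical helper, not RHIPE's implementation. It runs everything in the current R session, evaluates configure once instead of once per JVM, and (unlike the real function with takeAll) returns results in order.

```r
# Hypothetical local stand-in for rhlapply (not part of RHIPE). It mirrors
# only the list.object, func and configure behaviour described in the text.
local_rhlapply <- function(list.object, func, configure = expression()) {
  # A single numeric scalar n is expanded to the list 1:n.
  if (!is.list(list.object) && is.numeric(list.object) &&
      length(list.object) == 1) {
    list.object <- as.list(seq_len(list.object))
  }
  # Evaluate the configure expression once, in an environment that func
  # can see, before func is applied to any element.
  env <- new.env(parent = environment(func))
  eval(configure, envir = env)
  environment(func) <- env
  lapply(list.object, func)
}

# Scalar input: func is applied to 1, 2 and 3.
squares <- local_rhlapply(3, function(i) i^2)

# List input, with a variable set up by the configure expression.
scaled <- local_rhlapply(list(1, 2), function(x) x * scale,
                         configure = expression({ scale <- 10 }))

print(unlist(squares))
print(unlist(scaled))
```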




Mapreduce in R.
rhmr <- function(map,reduce,input.folder,configure=list(map=expression(),reduce=expression()),
                 close=list(map=expression(),reduce=expression()),
                 output.folder='',combiner=F,step=F,
                 shared.files=c(),inputformat="TextInputFormat",
                 outputformat="TextOutputFormat",
                 hadoop.mapreduce=list(),verbose=T,libjars=c())
Execute map reduce algorithms from within R. A discussion of the parameters follows.

input.folder
 A folder on the DFS containing the files to process. Can be a vector. 
output.folder
 A folder on the DFS where output will go to. 
inputformat
 Either TextInputFormat or SequenceFileInputFormat. Use the former for text files and the latter for sequence files created from within R or as outputs from RHIPE (e.g. rhlapply or rhmr). Note that one can't use just any sequence file; it must have been created via a RHIPE function. Custom input formats are also possible. Download the source and look at code/java/RXTextInputFormat.java
outputformat
 Either TextOutputFormat or SequenceFileOutputFormat. In case of the former, the return value from the mapper or reducer is converted to character and written to disk. The following code is used to convert to character.
paste(key,sep='',collapse=field_separator)
Custom output formats are also possible. Download the source and look at code/java/RXTextOutputFormat.java
If custom formats implement their own writables, they must subclass RXWritable or use one of the writables present in RHIPE.

shared.files
 same as in rhlapply, see that for documentation. 
verbose
 If T, the job progress is displayed. If false, then the job URL is displayed. 
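As a small illustration of the character conversion described for TextOutputFormat above, the same paste call can be run on a key vector. The variable `field_separator` here is an assumed stand-in for the value of rhipejob.textoutput.fieldsep, set to a single space for the example.

```r
# Assumed stand-in for the rhipejob.textoutput.fieldsep option.
field_separator <- " "

# A key made of two elements, as in a return value list(key = c(1, 2), ...).
key <- c(1, 2)

# The conversion quoted in the text: the elements are joined into one string.
line <- paste(key, sep = '', collapse = field_separator)
print(line)
```

Because collapse joins the elements, a multi-element key or value becomes a single separated field list on the output line.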

At any time in the configure, close, map or reduce function/expression, the variable mapred.task.is.map will be equal to "true" if it is a map task and "false" otherwise (both strings). Also, mapred.iswhat is mapper, reducer or combiner in the respective environments.

configure
 A list with either one element (an expression) or two elements, map and reduce, both of which must be expressions. These expressions are called in their respective environments, i.e. the map expression is called during the map configure and similarly for the reduce expression. The reduce expression is also called for the combiner configure method. If there is only one list element, the expression is used for both the map and the reduce.

close
 Same as configure . 
map
 A function that takes two values, key and value. It should return a list of lists. Each list entry must contain two elements, key and value, e.g.
...
ret <- list()
ret[[1]] <-  list(key=c(1,2), value=c('x','b'))
return(ret)
If either key or value is missing, the output is not collected, e.g. return NULL to skip this record. If the input format is TextInputFormat, the value is the entire line and the key is probably useless to the user (it is a number indicating bytes into the file). If the input format is SequenceFileInputFormat, the key and value are taken from the sequence file.

reduce
 Not needed if mapred.reduce.tasks is 0. Takes a key and a list of values (all values emitted from the maps that share the same map output key). If step is TRUE, the value is not a list. Must return a list of lists, each element of which must have two elements, key and value. This collects all the values and sends them to the function. If NULL is returned or the return value does not conform to the above, nothing is collected by the Hadoop collector.
step
 If step is TRUE, then the reduce function is called once for every value corresponding to a key.

 The variable red.status is equal to 1 on the first call.
 red.status is equal to 0 for every subsequent call, including the last value.
 The reducer function is called one last time with red.status equal to -1. The value is NULL. Anything returned at any of these stages is written to disk. The close function is called once every value for a given key has been processed, but returning anything has no effect. To assign to the global environment use the <<- operator.
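The step protocol above (red.status of 1, then 0, then a final call with -1 and a NULL value) can be mimicked locally to see how a streaming reducer might be written. This is a sketch under assumptions: `run_step_reduce` is a hypothetical driver, not RHIPE code, and it exposes red.status as a global variable, which is a guess at how the reducer sees it.

```r
# Hypothetical local driver (not part of RHIPE) that mimics the step
# protocol: red.status is 1 for the first value, 0 for later values, and
# -1 on a final call whose value is NULL.
run_step_reduce <- function(reduce, key, values) {
  out <- list()
  for (i in seq_along(values)) {
    red.status <<- if (i == 1) 1 else 0
    out <- c(out, reduce(key, values[[i]]))
  }
  red.status <<- -1
  c(out, reduce(key, NULL))
}

# A streaming sum: the running total lives in the global environment (set
# with <<-) and a single key/value pair is emitted only on the final call.
step_sum <- function(key, value) {
  if (red.status == 1) running_total <<- 0
  if (red.status == -1) {
    return(list(list(key = key, value = running_total)))
  }
  running_total <<- running_total + value
  NULL
}

result <- run_step_reduce(step_sum, "the", list(1, 2, 3))
print(result[[1]]$value)
```

Written this way the reducer never holds more than one value for the key in memory at a time, which is the point of step mode for keys with very many values.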


combiner
 T or F, to use the reducer as a combiner. Using a combiner makes computation more efficient. If combiner is TRUE, the reduce function will be called as a combiner (0 or more times; it may never be called during the combine stage even if combiner is T). The value of mapred.task.is.map is 'true' or 'false' (both strings) if the combiner is being executed as part of the map stage or reduce stage respectively.
Whether knowledge of this is useful or not is something I'm not sure of. However, if combiner is T, keep in mind that your reduce function must be able to handle inputs sent from the map as well as inputs sent from the reduce function (itself).

libjars
 If specifying a custom input/output format, the user might need to specify jar files here. 
hadoop.mapreduce
 set RHIPE and Hadoop options via this. 
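The combiner requirement described above (the reduce function must accept its own output as input) is easiest to see by comparing a reducer that satisfies it with one that does not. The sketch below is plain R with hypothetical helper names; `with_combiner` simply applies the reducer to each chunk of map output and then again to the partial results, which is an assumed simplification of what Hadoop does.

```r
# A sum can be applied to partial sums, so it is safe as a combiner.
sum_reduce <- function(key, value) {
  list(list(key = key, value = sum(unlist(value))))
}

# A mean of partial means is generally not the overall mean.
mean_reduce <- function(key, value) {
  list(list(key = key, value = mean(unlist(value))))
}

# Hypothetical helper: run the reducer as a combiner on each chunk of map
# output, then run it once more on the combined partial results.
with_combiner <- function(reduce, key, chunks) {
  partial <- lapply(chunks, function(ch) reduce(key, ch)[[1]]$value)
  reduce(key, partial)[[1]]$value
}

chunks <- list(list(1, 2, 3), list(10))  # map output from two map tasks
all_values <- as.list(unlist(chunks))

sum_direct    <- sum_reduce("k", all_values)[[1]]$value
sum_combined  <- with_combiner(sum_reduce, "k", chunks)
mean_direct   <- mean_reduce("k", all_values)[[1]]$value
mean_combined <- with_combiner(mean_reduce, "k", chunks)

print(c(sum_direct, sum_combined))
print(c(mean_direct, mean_combined))
```

The two sums agree while the two means do not, so a mean would need to carry a (sum, count) pair through the combiner rather than the mean itself.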


1.1 RHIPE Options for mapreduce

| Option | Default | Explanation |
|---|---|---|
| rhipejob.rport | 8888 | The port on which Rserve runs; should be the same across all machines |
| rhipejob.charsxp.short | 0 | If 1, RHIPE optimizes serialization for character vectors. This reduces the length of the serialization |
| rhipejob.getmapbatches | 100 | If the reducer/mapper emits several key/value pairs, how many to get from Rserve at a time. A higher number reduces the number of network reads (the network reads are to localhost) |
| rhipejob.outfmt.is.text | 1 if TextInputFormat | Must be 1 if the output is textual |
| rhipejob.textoutput.fieldsep | ' ' | The field separator for any text-based output format |
| rhipejob.textinput.comment | '#' | In the TextInputFormat, lines beginning with this are skipped |
| rhipejob.combinerspill | 100,000 | The combiner is run after collecting at most this many items |
| rhipejob.tor.batch | 200,000 | Number of values for the same key to collate before sending to the Reducer. If you have dollops of memory, set this larger; however, too large and you hit Java's heap space limit |
| rhipejob.max.count.reduce | Java's INT_MAX (about 2 billion) | The total number of values for a given key to be collected. Note that the values are not ordered by any variable |
| rhipejob.inputformat.keyclass | Chosen depending on TextInputFormat or SequenceFileInputFormat | The fully qualified Java name of the key class, e.g. `org.saptarshiguha.rhipe.hadoop.RXWritableText`. When using a custom InputFormat, implement RXWritable and implement the methods |
| rhipejob.inputformat.valueclass | Chosen depending on TextInputFormat or SequenceFileInputFormat | The fully qualified Java name of the value class, e.g. `org.saptarshiguha.rhipe.hadoop.RXWritableText`. When using a custom InputFormat, implement RXWritable and implement the methods |
| mapred.input.format.class | As above, either `org.saptarshiguha.rhipe.hadoop.RXTextInputFormat` or `org.apache.hadoop.mapred.SequenceFileInputFormat` | Specify yours here |
| rhipejob.outputformat.keyclass | Chosen depending on TextInputFormat or SequenceFileInputFormat | The fully qualified Java name of the key class, e.g. `org.saptarshiguha.rhipe.hadoop.RXWritableText`. The key class must also implement `RXWritable` |
| rhipejob.outputformat.valueclass | Chosen depending on TextInputFormat or SequenceFileInputFormat | The fully qualified Java name of the value class, e.g. `org.saptarshiguha.rhipe.hadoop.RXWritableText`. The value class must also implement `RXWritable` |
| mapred.output.format.class | As above, either `org.saptarshiguha.rhipe.hadoop.RXTextOutputFormat` or `org.apache.hadoop.mapred.SequenceFileOutputFormat` | Specify yours here; provide libjars if required |
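Since the hadoop.mapreduce argument is described as a list of options, a few of the options from the table could be collected like this before being passed to rhmr or rhlapply. This is only a sketch: the option names come from the table, the values are illustrative, and whether RHIPE expects numbers or strings for each option is an assumption not confirmed by the text.

```r
# Option names are taken from the options table; the values are examples
# only, not recommended settings.
rhipe_options <- list(
  rhipejob.rport = 8888,               # port on which Rserve runs
  rhipejob.textoutput.fieldsep = ",",  # write comma-separated text output
  rhipejob.textinput.comment = "#",    # skip input lines starting with '#'
  rhipejob.combinerspill = 50000       # run the combiner after 50,000 items
)

str(rhipe_options)
```

The list would then be supplied as `hadoop.mapreduce = rhipe_options` in the call.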




Citation: http://ml.stat.purdue.edu/rhipe/index.html
Great exciting news for the world of computing remotely.

