Interview with Mark Maunder, Co-Founder of WorkZoo
WorkZoo was founded by Mark Maunder and Kerry Boyte, who are husband and wife. I was very impressed by the job search technology they have put together and am excited that Mark had enough time to do this interview with me (Nathan Enns) for The Search Industry Blog. For permission to copy this interview, please contact FyberSearch.
Nathan Enns: What motivated you and Kerry to start your own job search engine, and what is your background in the search industry?
Mark Maunder: Kerry and I both have backgrounds in software consulting. While we were both consulting in London we would keep an eye on most of the job boards and it became time-consuming. So in our spare time we set up a meta-search that searched the top 10 UK job boards and combined the results. The site back then was very basic, but we built up a small user base. In 2003 we moved to California and started looking for business ideas. The USA has more than 100 times the number of job boards in the UK and WorkZoo seemed like an obvious fit. So in 2004 I started full-time work on a US version and changed WorkZoo from a meta-search to a full-blown search engine with its own crawler and index.
Nathan Enns: When was WorkZoo officially launched?
Mark Maunder: The original UK version of WorkZoo was launched in late 2001. The US version has been around as a meta-search since early 2004 and we officially launched the new search engine with its own index on 24 February this year.
Nathan Enns: You recently announced that WorkZoo is able to index about 45,000 new job listings a day. Without revealing any secrets, can you tell us about how it is able to process that much data every day?
Mark Maunder: Just to clarify, that is the number of jobs we are currently indexing per day. We can handle 5 times that with our existing server capacity.
The first stage is our crawler that actually fetches each job from the job boards. The crawler is called ZooBot and is multi-threaded so it fetches multiple jobs simultaneously without overloading any one web site. We respect the robots.txt exclusion standard on all the sites we crawl and provide detailed instructions on our web site for webmasters to limit the amount of data we display for their site or to limit the rate at which we crawl them.
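To illustrate the two politeness rules described here (honoring robots.txt and not overloading any one site), the following is a minimal Python sketch. It is not ZooBot's code: the `PoliteScheduler` class, the `allowed_by_robots` helper, and the fixed per-host delay are all assumptions made for the example. Only `urllib.robotparser` and `urllib.parse` from the Python standard library are used.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PoliteScheduler:
    """Hypothetical helper: tracks, per host, when the next fetch may happen."""

    def __init__(self, min_delay_seconds=1.0, clock=time.monotonic):
        self.min_delay = min_delay_seconds  # assumed fixed delay per host
        self.clock = clock
        self.next_allowed = {}  # host -> earliest time of the next fetch

    def wait_time(self, url):
        """Return how many seconds to wait before fetching url (0.0 if none)."""
        host = urlparse(url).netloc
        now = self.clock()
        return max(0.0, self.next_allowed.get(host, now) - now)

    def record_fetch(self, url):
        """Note that url was just fetched, pushing back its host's next slot."""
        host = urlparse(url).netloc
        self.next_allowed[host] = self.clock() + self.min_delay


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against the text of a site's robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

In a multi-threaded crawler, each worker thread would ask the scheduler for the wait time before fetching, so that many sites are crawled in parallel while each individual site sees only a slow, steady stream of requests. A shared scheduler like this one would also need a lock around its dictionary.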
As each job is fetched, we do some initial processing on the HTML, then compress it and store both the compressed HTML and a plain-text version of the page, ready for the indexer. The compressed HTML is what we use to provide cached versions of each job.
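The store-both-forms step can be sketched in a few lines of Python. This is only an illustration under assumptions: the interview does not say which compression format or text extraction method WorkZoo uses, so zlib and the standard library's `HTMLParser` are stand-ins, and the names `prepare_page` and `cached_copy` are hypothetical.

```python
import zlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def prepare_page(html):
    """Return (compressed_html, plain_text) for one fetched job page."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse runs of whitespace so the indexer sees clean, single-spaced text.
    text = " ".join(" ".join(extractor.parts).split())
    compressed = zlib.compress(html.encode("utf-8"))
    return compressed, text


def cached_copy(compressed):
    """Recover the original HTML, e.g. to serve a cached version of a job."""
    return zlib.decompress(compressed).decode("utf-8")
```

The plain text feeds the indexer, while the compressed bytes can be decompressed on demand whenever a cached copy is requested.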
The next stage is an indexer that crunches through the job data created by the crawler and creates the final index we use for searching. The indexer determines the geographic location (longitude and latitude) of each job and indexes all the words in the job title and body.
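A common way to index "all the words" in a document is an inverted index, which maps each word to the set of documents containing it. The sketch below assumes that technique; the interview does not describe WorkZoo's actual index format. The tiny `CITY_COORDS` lookup table, with approximate coordinates, is a hypothetical stand-in for whatever geographic data the real indexer uses.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer with approximate (latitude, longitude) values.
CITY_COORDS = {
    "san diego, ca": (32.72, -117.16),
    "seattle, wa": (47.61, -122.33),
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(jobs):
    """Index jobs given as dicts with 'id', 'title', 'body' and 'location'."""
    postings = defaultdict(set)  # word -> ids of jobs containing that word
    locations = {}               # job id -> (latitude, longitude)
    for job in jobs:
        for word in tokenize(job["title"] + " " + job["body"]):
            postings[word].add(job["id"])
        coords = CITY_COORDS.get(job["location"].strip().lower())
        if coords is not None:
            locations[job["id"]] = coords
    return postings, locations


def search(postings, query):
    """Return the ids of jobs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(postings.get(words[0], set()))
    for word in words[1:]:
        result &= postings.get(word, set())
    return result
```

With the coordinates stored per job, a search could then be narrowed to jobs within some distance of the user, which is one plausible use of the longitude and latitude mentioned above.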
We continually check job boards for new jobs throughout the day and have made our crawler intelligent enough that it only fetches a small number of pages if there are no new jobs. In other words, we don't crawl an entire job site just to find that there is only one new job.
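One simple way to get this behavior is to walk a site's listing pages and stop at the first page that contains a job already seen. The sketch below assumes listings are ordered newest first, which the interview does not state. The function name `find_new_jobs` and the `fetch_listing_page` callback are hypothetical.

```python
def find_new_jobs(fetch_listing_page, seen_urls, max_pages=50):
    """Collect unseen job URLs, stopping early once known jobs appear.

    fetch_listing_page(n) is assumed to return the job URLs on listing
    page n, with the newest jobs on page 1.
    """
    new_urls = []
    pages_fetched = 0
    for page_number in range(1, max_pages + 1):
        urls = fetch_listing_page(page_number)
        pages_fetched += 1
        unseen = [u for u in urls if u not in seen_urls]
        new_urls.extend(unseen)
        # Stop on an empty page, or as soon as a page holds a known job:
        # with newest-first ordering, every later page is already indexed.
        if not urls or len(unseen) < len(urls):
            break
    return new_urls, pages_fetched
```

When nothing has changed since the last visit, this fetches exactly one page per site, which matches the goal of not crawling an entire job board to discover a single new posting.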
All the above runs on a single server. Every hour we copy the latest index to our search server (or servers if we are clustering) so that our users get the latest jobs with the minimum of lag time.
Nathan Enns: How many computers do you have set up to power WorkZoo? Can you tell us anything about the computers such as processor speed and the amount of RAM?
Mark Maunder: We have one dedicated crawl server and another dedicated search server. We had a third server generating the map that you see in the search results, but then I rewrote that code and made it a lot faster. So now the search server generates the map and search results. So the live site right now runs on 2 machines.
I designed WorkZoo to be clustered so that as the load increases we simply add another server to the search cluster and include it in our data replication cycle.
The search server is a Dual Athlon MP 2600 with 2 Gigs of RAM and two 80GB hard drives. The server runs RedHat ES3 Linux. I've grown to prefer Fedora and will be using that in future, mostly because if I need some exotic RPM, ES usually doesn't have it and Fedora does.
I keep some large data structures in memory to speed searching up, and also rely on the filesystem cache, which is why we have that extra RAM. The server's load average sits at around 1.5 during peak hours, so we're still doing OK for now and it's really easy to add another server if we need more power.
The crawl server is a single AMD Athlon XP 3000+ with 1 Gig of RAM. It runs Fedora 2. During peak job posting times the load will spike up to 3.5 to 4, but then it drops off when we're not crawling or indexing. And because its performance isn't as time-critical as the search server's, the hardware we have right now does a great job.
Nathan Enns: Can you share any index statistics? How many total jobs do you have cataloged? How much disk space does it take to store the jobs you have indexed?
Mark Maunder: We have 1,430,414 jobs in our index as I'm writing this. We limit our index to 7 days so at any one time our users can search about a quarter of a million jobs in our index. The advantage of limiting our index to 7 days is that we minimize the number of dead links as job sites remove jobs or they expire - and most of our users visit us daily anyway. Our job index including compressed HTML is just under 10 Gigabytes.
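The 7-day limit amounts to a rolling retention window. As a rough illustration, assuming each job is stored with the time it was first indexed (the interview does not describe the actual bookkeeping), pruning could look like this in Python. The `prune_expired` function and the `RETENTION` constant are hypothetical names.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # the 7-day window mentioned in the interview


def prune_expired(jobs, now):
    """Keep only jobs first indexed within the retention window.

    jobs maps a job id to the datetime at which it was first indexed.
    """
    cutoff = now - RETENTION
    return {job_id: seen for job_id, seen in jobs.items() if seen >= cutoff}
```

Running a step like this before each index build keeps the searchable set bounded, whatever the total number of jobs ever cataloged.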
Nathan Enns: Do you have any plans to hire or partner with anyone to assist with the development of WorkZoo?
Mark Maunder: I do all the development on WorkZoo and I handle the workload OK. Hey, I even find time to do the odd interview. ;)
I have hired a few consultants to help out in various areas of the business. Our traffic has shown a nice increase lately and I expect to do some hiring in the next few months.
Nathan Enns: How is WorkZoo funded? Are you accepting investments, did you take out a loan or did you finance it all yourself?
Mark Maunder: Kerry and I have financed the company out of our savings. We have been approached by VCs since our latest launch but have not yet found a fit that we like. If we find an investor that shares our vision then we may consider outside funding.
Nathan Enns: Besides the job search engine, what other resources and tools can be found at WorkZoo?
Mark Maunder: We release research on our WorkZoology page at: http://www.workzoo.com/e/zoology
We have an animated map up there showing the changes in job distribution over the last 28 days across the USA, and a few top 10 lists.
We also provide tools for webmasters who want to give their users the latest jobs in a geographic area or in their field of interest. These are at: http://www.workzoo.com/e/jobjar
WorkZoo also provides RSS feeds. There is an orange RSS link at the top right of any search results that you can subscribe to. We have built up a sizable user base of RSS users lately and may have to set up a separate server for them in the near future.
Nathan Enns: You have a wide variety of advanced features on WorkZoo which I consider a very good thing in any search engine. Do you have any new features in the works?
Mark Maunder: We have two exciting projects we're working on right now that will be launched in the coming weeks. Unfortunately I can't give any more information than that, but keep an eye on our site for details.
Nathan Enns: Please tell us what you think about the future of the job search industry and what role WorkZoo will play.
Mark Maunder: The next 18 months are going to be very exciting for search in general as you see vertical search engines like WorkZoo make their appearance. Google has shown that search is much more important than everyone thought. It is in fact the only sane way to navigate the web.
Most job boards don't realize that a large part of their business is providing a search engine to their users. When you visit a job board you are 'searching' for a job. Anything that gets between you and that job you are searching for is a distraction.
Once you've found the right job then the job sites can add value by connecting the prospective employee with the employer in an efficient and confidential way - and of course they add value with the various tools they provide. But until then it's purely a search function any job seeker is looking for.
WorkZoo provides that search function. We are your starting point when you are looking for a job and we take you directly to the best job based on your search criteria with the minimum of page views and distractions.