Building a low cost virtualized computing grid for scale
By Bradley D. Brown, TUSC
Oracle Certified Advantage Partner
The Oracle Experts

Abstract
Have you considered using Postgres, MySQL or Oracle Express (the free Oracle DB version) to reduce costs? Have MySQL’s index optimization (or lack thereof) or Oracle Express’s 4GB database size limit caused issues for you? Are you considering creating a low-cost grid with no license fees? Do you have your "master" database in a costly DB license (e.g. Oracle, DB2, etc.)? In this presentation, you'll learn how IntelliReal stores its "master" database in an Oracle Standard Edition One database instance and uses a MySQL grid to provide a low-cost, scalable solution. The Oracle database is about 4TB. The grid effectively stores a materialized view of the key data from the Oracle database, roughly 400GB, in a series of physically partitioned MySQL databases. The data is US residential property data, partitioned by US zip code into about 10,000 databases (or schemas in MySQL). The grid architecture will be discussed in detail.

Bradley D. Brown
http://bradleydbrown.blogspot.com
  • TUSC: Founder, Author, Chief Technology Officer
  • University of Denver: Adjunct Professor, Graduate Class – New Venture Creation
  • Clients: IntelliReal, OAB, EventConnex, Jepp, Sun, …
  • Groups: YPO, OOW, IOUG, ODTUG, LAOUG, RMOUG, etc.
  • Oracle Fusion Middleware Director/Ace
  • IOUC Fusion Council

Agenda
  • Startup 101
  • N-tier grid architecture
  • Why build a grid?
  • Scaling considerations
  • Then and Now
  • Lessons Learned
  • Moving to Production
  • Competitive Benefit
  • Cloud computing and other reference material

Startup 101 for Startup #10
  • Failure 101
      • Purchased software of failed entity
      • Denver MSA only, MySQL and Delphi, Russia
      • Focused on small market (appraisers)
  • Goal / Vision
      • Best valuation and real estate intelligence in the USA
      • MUST be nationwide
  • Prototype / POC $20k
      • All Oracle and ApEx
      • Denver MSA only
      • People loved what we were about to deliver to the market
  • B2B is key, $ while sleep
  • Primarily “fixed” costs, low variable costs
  • Conserve cash, retain equity, build a scalable platform

Beyond the Prototype into Beta
  • Next stages - $1M+
      • Procure data
      • Differentiator - MLS
      • Full development
  • Oracle SE One
  • Numerous MySQL databases for scaling services, etc.
  • Purchased 25 servers, 30TB of storage for about $50K
  • Light weight SOA
      • Web Services
      • Business to Business
  • Used iPerspective to expose data and business logic as Web Services
      • Write it once
      • Use it everywhere: Customers, UI, Internally

The Concept and Reference Arch.
  • Building an N-Tier Grid Architecture
  • Approach and architecture for UI layer and for B2B
  • Robust, scalable computing environment
  • Limitless scaling and redundancy in this architecture

Grid Architecture

Why Build a Grid?
  • Virtualization
      • Call one service
      • Loose coupling
  • Scale
  • Redundancy
      • Failover
      • Backup, slower, lower priority data access
  • Cost
      • Open Source

Grid and Scaling Considerations
  • Oracle SE One
      • $5000/socket
      • 2 sockets max
      • No DB size limits
      • 4TB database
      • 100M rows
      • “Wide” rows, many tables
      • 400GB de-normalized
      • No partitioning
      • No clustering (RAC)
      • Expensive to move to EE
  • Oracle Express
      • Free
      • Very Limiting
      • Limited to 1 processor
      • 4 GB of DB space
      • 1 GB of memory
  • MySQL
      • Free
      • No limits
      • Oracle version 5
      • Index optimization?
      • Physical partitioning
      • Partitioning better now

The Grid – then and now
  • Physical Partitioning
      • 800 partitions initially
      • 10,000+ partitions today
      • Could be different drives or different boxes
  • Partitioner moves the data from Oracle to MySQL
  • MySQL replication moves it to all 10 boxes
  • 10 small Dell 850s - $2k, 4GB, 1TB
  • Initially partitioned 5 ways by County (800 counties) with 2 deep for failover and performance
  • Later 1 way, 10 deep and by zip codes
  • Queries get directed based on partition and prioritized round robin technique through boxes
  • Tiers (best data - Oracle, older - MySQL, etc)
  • Prioritized per box
  • Upward scalability

Lessons Learned – “The Ugly”
  • Initially boxes were crashing regularly
      • Took time to rebuild
      • Virtualization (VMs) solves this to a large degree
  • Can be a lot to support on your own if you don’t purchase software to manage
  • Extra processing was required
      • Constantly reading the master database with 10 Partitioner threads
      • Extra “master” processing
  • Network bandwidth
      • Replication chews up network bandwidth too
  • Recommend Open Source ETL approaches such as Pentaho

From Concept Into Production
  • Took $1M more…
      • Gave up some equity
  • Focused on core
      • Better values, data
      • More services
      • Benchmarks
      • MLS data
  • Quick changes based on customer demands
  • Extensible API for customers

Blowing Away The Competition
  • Big companies
      • Slower to move to new technology
      • Slower to integrate new data sources
      • They have a flywheel
  • $20k today is valued at $30M – 2 more years, $100M
  • Why?
      • Quick response to market needs
      • Focused on core, not context
      • Displaced the market
  • iPerspective
      • Write in language you know, takes years to master a language

Cloud Computing and Virtualization
  • Amazon EC2
      • Powerful Grid
  • Oracle Cloud
      • Uses EC2 and S3
  • SaaS
      • Deploy and scale your app in hours without changing code
      • Provision, monitor and manage operations with just a browser
      • Scale from a fraction of a server to hundreds of CPUs in days
      • Get your life back -- no more late night rushes to replace failed equipment

Periscope Virtualizes Google’s Web Service
    SELECT *
    FROM   google
    WHERE  searchstring = 'RMOUG Brad Brown'
  • RMOUG Brad Brown – 8 records
  • RMOUG – 374 records

Good Papers, Presentations, …
  • Step-by-step Web Service Creation
  • Web 2.0 & Apex Presentations
  • Watch my Blog: http://bradleydbrown.blogspot.com
  • IntelliReal Case Studies
  • Java-based Oracle Web Development
Java Server Pages
JavaMail
Java for the PL/SQL Developer

  • 20. Web Cache – achieving 150 the performance
  • 28. iFS
  • 30. Top DBA scripts for Web Developers
  • 31. Security
    Summary
      • Startup 101
      • N-tier grid architecture
      • Why build a grid?
      • Scaling considerations
      • Then and Now
      • Lessons Learned
      • Moving to Production
      • Competitive Benefit
      • Cloud computing and other reference material
  • 33. Copyright Information: Neither TUSC nor the author guarantee this document to be error-free. Please provide comments/questions to [email protected] © 2008. This document cannot be reproduced without expressed written consent from an officer of TUSC.

Editor's Notes

  • #4: I wanted to give you a little background on myself. I’ve been in consulting for almost 30 years now. Early in my career a good friend of mine told me that I should never forget where I came from, i.e. always stay technical. I’ve been fortunate in my 20 years with TUSC in that I’ve been able to remain technical and be involved in a number of customer accounts. I’ve also been involved with a number of startups as a founder, acting CIO/CTO, board member, etc. The last couple of years I was fortunate enough to teach a graduate-level class at the University of Denver called “New Venture Creation.” It’s all about the process of taking something from an idea to fruition, whether it’s within an existing company or a brand new company. My point is that I understand the pain that businesses experience when it comes to technology. I’ve been speaking about the Web since 1994. Why? I was asked to debate that client/server was here to stay. I quickly knew it was not when I did my research on the Web. Back then I started talking about dynamic content generation. Not long after I talked about XML, then Web Services, then grid computing… I can tell you that these technologies are here to stay. Today I’m going to share with you some information about a startup that I was involved in that started about 3 years ago. This is when we implemented our first grid technology. That startup is alive and well today and extremely successful. Why? A lot of the success is due to use of the grid; another part is the result of Web Services, a lightweight SOA implementation. Both provide for virtualization or loose coupling, as we’re going to talk about here today. Web Services allowed the business to write it once and use it everywhere. The grid allowed us to scale quickly and to provide redundancy throughout. Enough about me, let’s talk about the agenda.
  • #6: Whether you’re working for yourself, in a startup, in a large company or wherever, I believe you will have a lot of the same goals that we had at this startup. A friend of mine had invested in a startup company. They had spent about $3M developing an application that wasn’t working out. The last investment made used the IP to secure the capital, a final stage of death for a startup. My friend asked if I would review their IP. We quickly realized that the IP wasn’t worth anything. It was much too limiting: too small a scale, too small a market (Denver only), undocumented code, etc. However, the concept was good. The relationships were good. So we worked out a purchase agreement and came up with a goal to be the best valuation engine and provide the best real estate intelligence in the US. Originally we thought we would do that one market at a time, but that didn’t work out… it just wasn’t possible to sell anyone values for “Denver.” People wanted valuation data for the entire US. We invested $20K to do a proof of concept. We bought some hardware and software and built a demo based on Oracle software and Oracle’s Application Express development environment. People quickly loved what we were building. We focused on the default mortgage world because it was “growing rapidly.” This was all prior to the huge subprime fallout, but we saw it coming. The beauty of the business model is that it’s much like software. You can build the data for a property and sell it to 10 different people (or more), and there are 100M properties in the US. That’s quite an inventory of product at $10-20 per valuation. We needed to scale quickly. We used Web Services so that our customers would have access to the data and we could just charge them “by the drink.” The less cash you need to build a startup, the more equity you’ll retain, so cash conservation is key! But… if you screw up and have an outage, you’ll lose your customer for good.
  • #7: Once we proved the concept, we learned more about the industry and knew we needed to raise some cash to get the business going. We also had to purchase the data needed for the US. The upside is that it’s available. The downside is that it’s expensive, and everyone else has the same data, which is low quality. So we needed a differentiator… which turned out to be MLS data. We secured exclusivity to that data. We purchased a bunch of hardware, put everything in a data center and started building a full-scale application. As I mentioned, Web Services were a big part of the virtualization. The Web Services use the grid to deliver to the customer. Most everything is serviced business-to-business. However, the services are also consumed by ApEx (the UI). More on iPerspective soon. Let’s take a look at the architecture we implemented for the business-to-business transactions and for the UI layer.
  • #8: This is a great reference architecture. We’ll take a look at the grid virtualization here in a minute. This is the architecture that is still in place today. This architecture delivers a very robust and scalable environment. Why isn’t ApEx robust out of the box? Everything that gets painted in ApEx is generated from the DB. What I’m suggesting is that you use ApEx to do page painting and that you use iPerspective (or hand-written services) to talk to your databases. Not just Oracle either, but any data source! Our first “ApEx” application (actually, it was Oracle’s App Server with the Web Toolkit, but that’s what ApEx is built on) had 30,000 concurrent users. We had to have BIG Oracle boxes to support this application, but it did scale up to 30,000 users. Using this architecture, the scalability is virtually limitless. In fact, you’ll see that beyond scaling, there are other advantages to this architecture, such as failover, virtualization, easier migration and so on. Let’s talk about the architecture from the bottom up. First, the data sources: Oracle, MySQL, whatever. Then the App Servers and iPerspective servers, whose Web Services connect to the DB only when needed. Then your ApEx servers, which could be running on small boxes: Oracle Express, Oracle SE One, etc. At this layer, customers can also access these services. Write it once and use it everywhere! At the top are your clients, of course.
  • #9: We have 2 grids in our configuration. One is for data that’s sitting in the Oracle and MySQL environments. The other is for address processing and geocoding. Address normalization, correction, and geocoding can be a very expensive process for a business. We figured out a way to do this for about $1000 for 100M properties, of which about 10M get normalized each month. The failed company had spent $100k for the Denver MSA alone! You’ll notice the DMZ, which is a key point in this graphic. Nothing is exposed! The Web Services virtualize the low cost computing grid! We’re going to talk more about the costs of the grid in just a minute. Let’s talk about why a grid can be of value to you.
  • #10: There are many motivations for building a grid computing environment. Our primary driver was value or cost, but also scalability and redundancy/failover. Virtualization is a side benefit in my view. In other words, customers simply call a service and it returns a whole set of information, as if that information were coming from one server. The reality is that the coarse-grained services that our customers use call a number of finer-grained services to get the job done. For example, let’s talk about the Intelligent property report on the right side of the page. This is a service that can be called by a customer or the UI. One operation in the service returns a PDF and another operation returns the raw data behind this report. The data in this report comes from a number of data sources. But it all starts with an address from the customer (or through the UI). The Web Service virtualizes all of the work that is done behind the scenes. The fine-grained services do the work. For example, the getIdFromAddress service calls the address servers (in a round robin fashion) to normalize the address. Based on the normalized address, we then look up the property ID that matches the address. By normalize, I mean that you might type “street” and someone else might abbreviate it, but we always store it normalized. Each of the subsequent fine-grained services uses the unique property ID to return the additional data. Another fine-grained service calculates the value of the property and establishes comps, or comparable properties, to back up the valuation. Calculating a value is not an easy process: we calculate the value about 5 different ways, then triangulate on the “right” value for the property. You’ll also notice graphs, sliders, maps, and images in this report. Initially Google Maps was used, but later we moved the processing to a mapping service from ESRI. The images might come directly from the county, from the MLS service or from a picture provider. Keep in mind that this entire report needs to be generated in about 5 seconds or less… and we need to be able to generate MANY such reports concurrently. Processing speed is important when customers provide batches of 100k properties that they want valued today. In fact, the industry does benchmarking of valuation companies. The benchmarks give you 48 hours to return your results. In other words, if we couldn’t provide coarse granularity and virtualization, we would never be able to get the job done. If we couldn’t scale the application, we would never land deals like the Realtor.com deal, where we had to have pre-established values for every property in the US. If you don’t have redundancy, customers will drop you in a heartbeat. Customers monitor our services 24x7. One blip on the radar and an entire escalation process goes into motion! Sure, you can do something like this with unlimited computing power, but we didn’t have that ability. We had a limited budget. Zillow raised $87M to do much of this! Let’s talk about some of the specifics as to why we wanted a grid…
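The coarse-grained/fine-grained split described in this note can be sketched roughly as follows. This is a minimal illustration rather than IntelliReal's code: the server names, the abbreviation table, and the in-memory lookups are all assumptions made for the example, and only the getIdFromAddress idea (round-robin address servers, normalize, then look up the property ID) comes from the talk.

```python
from itertools import cycle

# Hypothetical address-server names; the talk does not name the real hosts.
ADDRESS_SERVERS = cycle(["addr-1", "addr-2", "addr-3"])

# Illustrative abbreviation table; a real normalizer would be far larger.
ABBREVIATIONS = {"street": "ST", "avenue": "AVE", "drive": "DR", "road": "RD"}


def normalize_address(raw: str) -> str:
    """Upper-case the address and abbreviate common suffix words."""
    words = raw.replace(",", " ").replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(w.lower(), w.upper()) for w in words)


def get_id_from_address(raw: str, property_index: dict) -> tuple:
    """Fine-grained service: take the next address server in round-robin
    order, normalize the address, and look up the matching property ID."""
    server = next(ADDRESS_SERVERS)
    return server, property_index.get(normalize_address(raw))


def property_report(raw: str, property_index: dict, details: dict) -> dict:
    """Coarse-grained service: one call that hides the fine-grained steps."""
    server, property_id = get_id_from_address(raw, property_index)
    if property_id is None:
        return {"found": False, "served_by": server}
    return {"found": True, "served_by": server, "id": property_id,
            **details[property_id]}
```

A caller only ever sees `property_report`; "123 Main Street" and "123 main st." resolve to the same property because both normalize to the same stored form.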
  • #11: We were able to fit our entire database onto Oracle SE One. That wasn’t an issue, but there are limitations, i.e. 2 sockets max, which was a real limiter. Also, there are no grid or clustering capabilities for $10k per box. There’s also no partitioning in the SE license. Our database is about 4TB total. We considered using Oracle Express (the free version of Oracle), but quickly realized that wasn’t going to work. There are far too many limitations with Express. Although we could get the country down to about 400GB, the 4GB limit would require that we precisely partition our database into 100 partitions or instances. That wasn’t practical. MySQL is free and there are no limits, but we quickly realized that it’s about where Oracle was back at version 5 (or maybe 4). The indexing schemes are severely lacking. Many queries that we ran were complex and caused full table scans, which brought our MySQL boxes to a screeching halt in a hurry. So we opted to physically partition our 400GB of data. Initially we partitioned by counties. There are more than 3000 counties in the US, but the top 1000 counties have 90% of the population… and counties like Los Angeles County, CA and Maricopa County, AZ are HUGE. So we moved to a partitioning scheme that’s a bit more even: zip codes. There are about 10,000 zip codes in our database, so we have 10,000 physical databases, or schemas in MySQL. Queries are executed against a schema, so full table scans tend to hit a maximum of about 10,000 records, which never kills our machines. Let’s take a closer look at the grid itself.
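To make the zip-code partitioning concrete, here is a small sketch of how a query might be routed to one schema per zip code, plus the arithmetic behind the "100 instances" figure for Oracle Express. The `zip_NNNNN` schema naming, the `property` table, and the `street_name` column are hypothetical; the talk only says that each zip code gets its own MySQL schema.

```python
import math
import re


def schema_for_zip(zip_code: str) -> str:
    """Map a 5-digit zip (ZIP+4 accepted) to a hypothetical schema name."""
    match = re.fullmatch(r"(\d{5})(-\d{4})?", zip_code.strip())
    if match is None:
        raise ValueError(f"not a US zip code: {zip_code!r}")
    return f"zip_{match.group(1)}"


def property_query(zip_code: str, table: str = "property") -> str:
    """Build a schema-qualified query so any scan stays inside one partition.
    The %s is a DB-API style placeholder to be bound by the driver."""
    schema = schema_for_zip(zip_code)
    return f"SELECT * FROM `{schema}`.`{table}` WHERE street_name = %s"


def express_instances_needed(total_gb: int, limit_gb: int = 4) -> int:
    """How many size-capped instances a data set of total_gb would need."""
    return math.ceil(total_gb / limit_gb)
```

With the figures from the note, `express_instances_needed(400)` gives the 100 instances that made Oracle Express impractical, while the router keeps every full table scan confined to a single zip-code schema.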
  • #13: Some lessons learned. Not everything was wonderful, of course. We learned from our mistakes. Initially we had a lot of hardware failures. Rebuilding boxes was killing us. Virtualization to configure new boxes became key to us. Our replication schemes chewed up a lot of network bandwidth, which became the limiter to replication. We had to change our strategy from cycling constantly to replicate to more intelligent replication, i.e. data gets pushed out to the grid only when it changes in Oracle. We wrote everything from the ground up initially, primarily because there weren’t any proven packages available at the time. This included our data loading / ETL processing. We later moved our ETL processing to Pentaho, which provides a lot of virtualization and grid capabilities out of the box. The bottom line is we learned a lot!
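The move from constant cycling to change-driven replication might look something like the sketch below. The talk does not say how changes were detected, so the per-partition version counter here is purely an assumption (a last-modified timestamp would work the same way), and `push` stands in for whatever actually copies a partition out to the grid.

```python
def partitions_to_push(master_versions: dict, pushed_versions: dict) -> list:
    """Return partitions whose master version is newer than the copy
    last pushed to the grid (never-pushed partitions count as stale)."""
    return sorted(
        name for name, version in master_versions.items()
        if pushed_versions.get(name, -1) < version
    )


def push_changes(master_versions: dict, pushed_versions: dict, push) -> list:
    """Push only the stale partitions and record what was sent."""
    stale = partitions_to_push(master_versions, pushed_versions)
    for name in stale:
        push(name)  # hypothetical callback that copies one partition
        pushed_versions[name] = master_versions[name]
    return stale
```

Run in a loop, this touches the master and the network only for partitions that actually changed, instead of re-reading all of them on every cycle.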
  • #14: Nothing is as easy as it appears, even when it appears really easy. Nearly out of the gates, we had an offer of $10M (real money) for what we built. Roll that acquisition forward: that company ceased all business recently, so it’s good we didn’t sell! It took another $1M of investment, but the good news is that by the time we gave up equity, our valuation was higher, so we gave up very little equity in exchange for the money. I’m a big believer in not doing what I call “drinking your own Kool-Aid”: don’t invest in the early/next stages of your own idea. Prototyping: yes, invest, but minimally. Everyone will tell you that your idea sucks or that it’s great… but you’ll know if it’s great or not based on how easily someone writes you a check. Using virtualization allowed us to focus on what was core to the business. Better values for properties. Better data. More data, i.e. MLS data. More services than the market offered, i.e. auditing your appraiser’s valuation, iMVI (which you see to the right), etc. We were able to change very quickly based on demands from our customers. We provided an extensible API for customers, so they called the Web Service and we did all of the heavy lifting, virtualized, in seconds or even subseconds…
  • #15: With all of this virtualization, as I mentioned, this company could focus on the essence of what makes them better than the competition. Large companies are slow to respond to market needs and demands, which changed rapidly, especially when the entire subprime market was crashing down at their feet. With virtualization, the customers’ experience was simple, but what was going on behind the scenes was quite complex. I’m happy to tell you that what was once a $20,000 investment is valued at about $30M today. Give it another 2 years and the value is likely around $100M. In fact, due to the virtualization and scaling ability, the business has primarily fixed costs, which means that after you pass the breakeven point, nearly every dollar in is profit. This would not be possible without virtualization. It would not be possible without a low cost virtualized computing grid! Surely this is true in your business too! If you can respond quickly, you’re ahead of your competition. If you can do this inexpensively… again, it results in higher profits. We turned what we learned into the iPerspective technology. It provides for virtualization of Web Services. It allows you to access any data source (Oracle, MySQL, etc.) as a service. It allows you to access a grid of data in a prioritized fashion. In other words, we greatly simplified this with our technology. This is done not only for data, but also for technical services (i.e. business logic you’ve written in Oracle PL/SQL). We also integrated other data sources and APIs into the product, so you can virtualize email, other services, etc. You can build your own grid or you can rely on someone else’s… so let’s talk about Cloud computing.
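Accessing a grid "in a prioritized fashion" can be illustrated with a small router that tries boxes tier by tier, rotates round-robin within a tier, and falls through to the next box on failure. This is a sketch under assumptions, not iPerspective's implementation: the box names, the choice of MySQL boxes as tier 1 with the Oracle master as tier 2, and the use of `ConnectionError` to signal a dead box are all made up for the example.

```python
from collections import defaultdict


class GridRouter:
    """Prioritized round robin: lower priority numbers are tried first,
    and the starting box rotates within each tier on every request."""

    def __init__(self, boxes):
        # boxes: iterable of (priority, name) pairs
        self._tiers = defaultdict(list)
        for priority, name in boxes:
            self._tiers[priority].append(name)
        self._offset = defaultdict(int)  # per-tier round-robin position

    def candidates(self):
        """Return every box in the order it should be tried."""
        order = []
        for priority in sorted(self._tiers):
            names = self._tiers[priority]
            start = self._offset[priority] % len(names)
            order.extend(names[start:] + names[:start])
            self._offset[priority] += 1
        return order

    def query(self, run):
        """Call run(box_name) on each candidate until one succeeds."""
        failed = []
        for name in self.candidates():
            try:
                return name, run(name)
            except ConnectionError:
                failed.append(name)  # box is down; fail over to the next
        raise ConnectionError(f"all boxes failed: {failed}")
```

Because the rotation spreads load across a tier and the loop keeps going after a failure, a crashed box costs one failed attempt rather than an outage, which is the failover behaviour the grid was built for.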
  • #16: As I mentioned, you can build your own virtualized computing grid, or you can rely on someone else to do it for you. Amazon has built an incredibly powerful virtualized grid of computing power: the EC2 cloud. If you want to spin up a virtualized image, you can do this with EC2 for 10 cents an hour per machine. Build your business around this model and you’ll have a robust and powerful business model! There are many advantages to cloud computing and paying for software as a service. This is something to consider and look into!
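As a rough back-of-the-envelope check on the build-versus-rent question, the arithmetic can be sketched as below. It uses the 10 cents per machine-hour quoted in this note and assumes, as one reading of the grid slide, that the 10 Dell boxes cost $2k each; it deliberately ignores power, hosting, bandwidth, storage and staff time, so treat it as an illustration of the calculation and not as a cost model.

```python
def monthly_cloud_cost_cents(machines: int, cents_per_hour: int = 10,
                             hours: int = 24 * 30) -> int:
    """Cost of running `machines` instances around the clock for 30 days.
    Integer cents avoid floating-point rounding in the multiplication."""
    return machines * cents_per_hour * hours


def months_to_match_purchase(purchase_dollars: int, machines: int) -> float:
    """Months of cloud rental that cost as much as buying the hardware."""
    return purchase_dollars * 100 / monthly_cloud_cost_cents(machines)
```

Under those assumptions, ten always-on machines come to 72,000 cents ($720) a month, so about 28 months of rental equals the assumed $20,000 hardware purchase.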
  • #18: I have written a number of white papers and presentations on virtualization topics, from Web Services to Web 2.0 to XML. I have a blog where you can find current information as well. You might be interested in case studies too, of which I’ve written plenty. I’ve authored 5 books for Oracle Press, all in the Web space. Let’s wrap this presentation up now.