Amazon’s SimpleDB: Instantly Scalable Database Delivered as a Service

You’ve been hearing about my love of Amazon for years now, including the awesome web services platforms they are making available to developers everywhere. They started off with a cheap, easy-to-use message queuing service (SQS), later added pay-as-you-go remote file storage (S3), then on-demand computing capacity (EC2), and most recently, flexible payment services (FPS). These are all great, and together they let startups build web applications that start small and scale easily as demand increases. Until today though, there was one important piece missing from Amazon’s suite of web services.

Amazon’s new SimpleDB service, released in beta today, fills a very big gap in the company’s web services strategy. Their Elastic Compute Cloud allows you to run Linux apps and the Simple Storage Service gives you a place to reliably store files, but neither of these provided a good way to do structured data storage. You can run MySQL and PostgreSQL within EC2 instances, but those instances weren’t meant to be used for long-term storage. Instead, they are made to be set up and destroyed as demand for computing power fluctuates. As a workaround, you either had to connect to a database hosted somewhere else on the Internet, or use a hack that lets MySQL store data in Amazon S3 instead of on a local file system. SimpleDB solves this problem by offering a simple database stored and replicated across Amazon’s various data centers.

There are some important things about SimpleDB to consider, however. This is not a drop-in replacement for relational database systems like MySQL, Oracle, SQL Server, etc. It is non-relational and schemaless, if that means anything to you. It boils down to a big mental shift for guys like me who have spent our careers thinking in terms of relationships between database tables. This also means that developers are going to have a more difficult time programming and testing on local machines before deploying to production servers, as there aren’t any installable equivalents to SimpleDB (yet).
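To make "non-relational and schemaless" a bit more concrete, here is a rough sketch in Python. It is a toy in-memory model, not the SimpleDB API: the class name ToyDomain and its put/get methods are made up for illustration. The only ideas it tries to show are that items in the same domain need not share the same attributes, that an attribute can hold more than one value, and that values are treated as plain strings.

```python
from collections import defaultdict


class ToyDomain:
    """Toy in-memory stand-in for a schemaless domain.

    Hypothetical illustration only; this is NOT the SimpleDB API.
    """

    def __init__(self):
        # item name -> attribute name -> set of string values
        self._items = defaultdict(lambda: defaultdict(set))

    def put(self, item_name, **attributes):
        """Add attribute values to an item; no schema is declared anywhere."""
        for name, value in attributes.items():
            # Accept a single value or a collection of values per attribute.
            values = value if isinstance(value, (list, tuple, set)) else [value]
            for v in values:
                self._items[item_name][name].add(str(v))

    def get(self, item_name):
        """Return the item's attributes as a plain dict of sorted value lists."""
        item = self._items.get(item_name, {})
        return {name: sorted(values) for name, values in item.items()}


products = ToyDomain()
# Two items in the same domain with completely different attributes.
products.put("item1", category="book", author="Knuth")
products.put("item2", category="shirt", color=["red", "blue"], size="M")

print(products.get("item1"))
print(products.get("item2"))
```

In a relational table, the shirt's two colors would normally push you toward a second table and a join. Here the extra value simply lives on the item, which is the mental shift described above.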

SimpleDB has some other big cons too. It is missing many of the “enterprise” features now common in relational databases, such as transactions, stored procedures, security policies, etc. Amazon also currently limits storage space for each domain (table in the relational database world) to 10 GB and query execution time to 5 seconds. Needless to say, this isn’t going to be the solution that everyone is looking for.

For its part, Amazon is making it clear that SimpleDB is for a particular use case:

Today, many developers correlate the word “database” with Relational Database Management Systems (RDBMS). While RDBMS offerings provide deep functionality, for many use cases, they introduce more complexity (and more cost) than is necessary. Many developers simply want to store, process, and query their data without worrying about managing schemas, maintaining indexes, tuning performance or scaling access to their data. Amazon SimpleDB removes the need to maintain a schema, while your attributes are automatically indexed to provide fast real-time lookup and querying capabilities. This flexibility minimizes the performance tuning required as the demands for your data increase.
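The "automatically indexed" part of that description can be pictured with another toy sketch in Python. This is an assumption about the general idea, not a description of how Amazon implements it: the class ToyIndexedDomain and its put/find methods are hypothetical. Every write updates an index keyed by attribute and value, so an equality lookup never has to scan all items and the developer never declares an index.

```python
from collections import defaultdict


class ToyIndexedDomain:
    """Toy model of 'every attribute is indexed on write'.

    Hypothetical illustration only; this is NOT the SimpleDB API,
    and says nothing about Amazon's real implementation.
    """

    def __init__(self):
        self._items = {}                # item name -> attr -> set of values
        self._index = defaultdict(set)  # (attr, value) -> set of item names

    def put(self, item_name, attributes):
        """Store attributes and update the index in the same step."""
        item = self._items.setdefault(item_name, {})
        for attr, value in attributes.items():
            value = str(value)  # treat every value as a string
            item.setdefault(attr, set()).add(value)
            self._index[(attr, value)].add(item_name)

    def find(self, attr, value):
        """Return names of items whose attribute has the given value."""
        return sorted(self._index.get((attr, str(value)), set()))


domain = ToyIndexedDomain()
domain.put("item1", {"category": "book", "year": 1968})
domain.put("item2", {"category": "shirt", "color": "red"})
domain.put("item3", {"category": "book", "color": "red"})

books = domain.find("category", "book")
red_things = domain.find("color", "red")
print(books, red_things)
```

The trade-off in a design like this is that writes do a little extra work so that reads on any attribute stay cheap, which fits the "no index maintenance" pitch in the quote.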

For certain applications, I can see SimpleDB working quite well. There is still plenty of software that falls outside its scope though, so I don’t see any of the current database systems going away anytime soon. Plus, you have to factor in the people side of it. Programmers have been learning and using relational databases for decades now, and in many cases, you’ll have to pry those databases from their cold, dead fingers before they switch to something new and different.

In any case, I believe SimpleDB is very important and should not be ignored. Like most disruptive innovations, it will start out small in scope, but eventually lead to something game-changing. It will be fun to see where this technology goes in the near future…