Is it time to move from SQL Azure to Amazon RDS SQL Server?

I am aware that Microsoft appears to be dropping the SQL Azure name, rebranding it as SQL Database, but for the purposes of this post I will stick with the old names.

In the middle of 2011 we shipped a brand new system to a customer. The system required hosting for a couple of databases, a web server and some Windows services. The processing demand wasn’t high, so it could easily fit on a small dedicated server.

Our client had no IT support staff and we did not want to support the infrastructure. Budgets were tight, but some downtime could be tolerated provided that no human intervention was required. As we were already using Amazon SQS (Simple Queue Service), we immediately thought of an Amazon EC2 instance running SQL Express with scripts (for example using Amazon CloudFormation to restart the instance if it were to fail). It quickly became obvious that writing scripts to ensure the database came back reliably in a consistent state was going to be neither easy nor quick, and deadlines were pressing. The quotes for fully managed dedicated servers ran to several thousand dollars per year plus a set-up fee, and the customer would have had to sign up for between one and three years.

That left us looking at Windows Azure with SQL Azure. SQL Azure gave us the zero-support option we were looking for at a much lower price than the fully managed dedicated servers, and it didn’t tie the customer in, so we could move away if things didn’t work out. Data is also copied to a replica as part of each transaction, so we even got redundancy thrown in, which we hadn’t expected at the rates the customer was prepared to pay.

We ended up with a couple of databases, a web role and a couple of worker roles, all working fine on extra small instances. Moving from a single-server solution did require changes to some parts of the application, and effort was needed to reduce the chattiness of communications with the database (as with any hosted database). At that point we had made the best choice for the client.

Competition is generally a good thing, so I was very pleased to hear that Amazon has extended its RDS (Relational Database Service) to include SQL Server in addition to MySQL and Oracle. We have used Amazon RDS for MySQL, so I was keen to see what it could offer with SQL Server, and naturally I wanted to see whether we were still in the right place for our Azure-based customer.

It doesn’t take long to realise that Amazon RDS SQL Server is a very different beast from SQL Azure.

Database Features

Amazon RDS SQL Server gives you a full SQL Server installation, and it supports all the core database engine features. Using SSMS you would be hard-pushed to spot that you are connected to a hosted database. You have access to all the dynamic management views and functions, and if you really want to, you can even run SQL Profiler remotely (be very tight on your filtering!). The full list of features is here.

If you look closely, there are a few unsupported features that may send you running for the hills. A big one for us is encryption. With SQL Azure we are able to connect with SSMS over an encrypted connection and administer the database without any worries about prying eyes, and we configured the web site and the worker roles to use encrypted connections too. The recommended way of connecting to databases for administration purposes in the AWS world is to use a bastion host (a machine that is only booted when access is required and the only machine from which administrative access is allowed). While this is a very sensible approach to production access, in a project of this size it is an overhead I would prefer to do without, and I still like the added comfort of knowing that all the traffic is encrypted.

With SQL Azure you aren’t running on a full SQL Server installation. You can’t get under the covers: only a handful of the dynamic management views and functions are exposed, and even old faithful stored procedures like sp_spaceused don’t work, so you have to resort to querying the system views that are available. READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION are set ON and you can’t change them, and every table must have a clustered index. The full list of the limitations is here.
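As an illustration, here is the kind of query we fall back on instead of sp_spaceused. It is a minimal sketch, assuming the database-scoped sys.dm_db_partition_stats view (one of the views SQL Azure does expose), and it gives only a rough equivalent of what sp_spaceused would report.

-- Approximate space reserved in the database (pages are 8 KB each).
SELECT SUM(reserved_page_count) * 8.0 / 1024 AS ReservedMB
FROM   sys.dm_db_partition_stats;

-- The same view broken down per table.
SELECT OBJECT_NAME(object_id)                AS TableName,
       SUM(reserved_page_count) * 8.0 / 1024 AS ReservedMB
FROM   sys.dm_db_partition_stats
GROUP BY object_id
ORDER BY ReservedMB DESC;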

Processing power and database size

With Amazon RDS SQL Server you can pick the storage size you want, from 20GB up to 1TB. This is data and log space combined. You can also pick the amount of memory in your server from 630MB (Micro) all the way up to 68GB (High-Memory Quadruple Extra Large), with similar choices in processing capacity.

With SQL Azure you don’t get the option of picking the hardware; it is what it is. Database size is worked out differently: you only pay for the data you use, starting at just 100MB but only going up to 150GB. Log file space is not charged for, but you are billed for each database.

Backup and availability

With Amazon RDS SQL Server you are not able to do multiple Availability Zone deployments, so no real-time replication is built in. Transaction logs do get backed up every 15 minutes, you control when backups and maintenance can happen, and you are able to create a snapshot of the whole database instance. Backups can be kept for up to 31 days, and snapshots are kept until you explicitly delete them.

In SQL Azure each database is replicated and there are multiple redundant copies, hence the need for every table to have a clustered index. In theory you don’t need to back up your databases, but not being comfortable without the option of a point-in-time restore, we have been using the Redgate tools to take copies for us.

Administration

With the AWS management console you are confronted with 20+ products that you may or may not be using. Finding the status of your environment takes quite a few clicks, and you need to know where you are going. The interface is fine for techies, but our customer glazed over when we tried to show them the SQS management tab; at that point we didn’t bother showing them the EC2 instances. On the plus side, you can create alarms with email alerts based on a wide range of criteria, which helps you monitor your database without having to log in to the console.

On the other hand, the Windows Azure Platform management console is much simpler. It is written in Silverlight (plugin required), is sometimes buggy and is painful to use on a slow connection. Even though it is simpler, you still need to know where you are going, but when you get there the status information is easy to understand. Our non-technical customer can easily see what is happening and reboot the web and worker roles if they see an issue. The downside is that, other than very high-level monitoring, you don’t get monitoring out of the box with Azure. There is the System Center Monitoring Pack for Windows Azure Applications here, but it most definitely isn’t simple to set up.

Conclusion

Comparing these two offerings:

Amazon RDS SQL Server

Pros

  • Full database support
  • You get an instance so you can have up to 30 databases
  • Database backups can be controlled and kept for up to 31 days or indefinitely with DB Snapshots
  • Good monitoring framework built-in
  • Database sizes up to 1TB
  • Can select the instance size according to workload

Cons

  • Multiple Availability Zone deployments not supported
  • No SSL connection
  • Alphabet soup administration interface
  • Smallest disk size 20GB

SQL Azure

Pros

  • Small, cheap databases
  • Simple administration interface
  • Multiple redundant copies of the database as standard
  • Encrypted connections

Cons

  • You can’t choose the hardware you are running on
  • Only supports databases up to 150GB
  • Limited SQL Server features
  • Only very high level monitoring by default

For the time being we are happy to stay on the Azure platform for this application, but the ability to choose the level of hardware and the much larger database sizes make Amazon RDS SQL Server a very interesting option for the future. Which one should you choose? It depends…

Over the next few weeks I’ll be posting more to help you decide, looking at costs and performance. This article will be updated with links to the later articles – or subscribe to our RSS feed. Stay tuned!

Contract – ASP.NET MVC, Azure, Amazon Web Services and More

It’s been a busy year for Catalyst Computing. We’ve been involved in some great projects and worked with some wonderful people.
An opportunity has arisen to work on one of those projects with some of those wonderful people…
The original system was deployed in July and has been progressively enhanced since. Our clients are looking for someone to undertake a 3-month contract to work on the system full-time to deliver new functionality.
It’s a fun project, and there’s opportunity to be hands-on with all sorts of things:

  • A web front-end written using ASP.NET MVC 3 (and jQuery and jQuery UI) deployed on Windows Azure
  • SQL Azure database
  • Entity Framework
  • Server processes written in C#
  • Winforms-based administration application (using DevExpress components) with OData web services to the database
  • Amazon Simple Queue Service
  • Python running on industrial controllers
  • And tools and techniques: ReSharper, Red Gate tools, dependency injection, xUnit and Moq

C# skills are essential, but other than that the ability to quickly learn and use technologies is more important than past experience. You’ll be able to work either at their offices, which are near us in South Lincolnshire (near Peterborough), or from home.

Guerrilla Warfare? In Brum

Guerrilla Tactics – Performance Testing MS SQL Server Applications

Tomorrow (Tuesday 12th July) I am at NxtGenUG in good old Birmingham, giving a talk on a Guerrilla Tactics approach to performance testing: not how you would like your projects to be, but how to cope when they are not.

Below are the links to the slides, demos and other resources.
I used Apache JMeter to do the multi-threaded demos and there is a link to that as well.

Have fun fighting those performance fires.

If you have any problem with the demos feel free to leave a comment, email or tweet. (I have to confess I am not great at using twitter these days.)

Links

SQL Server Profiler Impact: GUI vs Server-side

After a recent talk at Cambridge NxtGenUG I was asked what effect gathering a trace in batch mode has on SQL Server performance, as opposed to using the GUI.

The obvious answer, as always with these things, is “it depends”. But I was intrigued to see if I could quantify that a little further.

The batch profiling is done by setting up a server-side Profiler trace. There is a post on how to do this here, although ‘Export – Script Trace Definition’ from the SQL Profiler GUI will do the job for you.
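For reference, a server-side trace boils down to a handful of system stored procedures. The sketch below is illustrative only (the file path, events and columns are my assumptions, not the trace used in this test): sp_trace_create creates the trace, sp_trace_setevent adds each event/column pair, and sp_trace_setstatus starts it.

-- Minimal server-side trace sketch: writes to a local file on the server.
-- The path is an assumption; SQL Server appends the .trc extension itself.
DECLARE @TraceId INT;
DECLARE @MaxFileSize BIGINT;
SET @MaxFileSize = 100;   -- MB per rollover file

EXEC sp_trace_create @TraceId OUTPUT, 2 /* TRACE_FILE_ROLLOVER */,
     N'C:\Traces\LoadTest', @MaxFileSize, NULL;

-- Capture RPC:Completed (event 10) and SQL:BatchCompleted (event 12)
-- with TextData (1), Duration (13), Reads (16), Writes (17) and CPU (18).
DECLARE @On BIT;
SET @On = 1;
EXEC sp_trace_setevent @TraceId, 10, 1,  @On;
EXEC sp_trace_setevent @TraceId, 10, 13, @On;
EXEC sp_trace_setevent @TraceId, 10, 16, @On;
EXEC sp_trace_setevent @TraceId, 10, 17, @On;
EXEC sp_trace_setevent @TraceId, 10, 18, @On;
EXEC sp_trace_setevent @TraceId, 12, 1,  @On;
EXEC sp_trace_setevent @TraceId, 12, 13, @On;
EXEC sp_trace_setevent @TraceId, 12, 16, @On;
EXEC sp_trace_setevent @TraceId, 12, 17, @On;
EXEC sp_trace_setevent @TraceId, 12, 18, @On;

-- 1 = start; use 0 to stop and 2 to close and delete the trace definition.
EXEC sp_trace_setstatus @TraceId, 1;

Remember to stop the trace (status 0, then 2) when the test run finishes, otherwise it will keep writing rollover files on the server.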

I have a little test rig here running SQL Server 2008 R2 and a load testing machine. I ran a simple load test creating orders; it had 20 threads and produced 4,000 orders in just over 11 minutes.

I ran the whole test several times in three different scenarios:

  • No tracing
  • A GUI-based Profiler trace logging data to screen and file on a remote machine
  • A Server-side Profiler trace logging data to a local file on the server

The two traces logged the same 190,000+ events and captured the same columns.

My findings aren’t very surprising but here they are:

  • No tracing: 11 min 25 sec
  • GUI Profiler trace: 12 min 02 sec
  • Server-side Profiler trace: 11 min 44 sec

So GUI profiling slowed the run down by 5.4%, and the server-side trace by only 2.8%.

The one thing that did surprise me was that the server-side trace file, while holding all the same information, was 80% of the size of the file produced by the GUI.

The GUI does have some advantages over a server-side trace:

  • it is easier to use
  • it is simpler to adjust the information being gathered
  • traces can be written directly to a database table
  • trace files are saved on the client PC, so no access to the server file system is required

Also, in many cases the load SQL Profiler puts on the server will be low enough not to raise concerns.

However, we can see that server-side profiling, logging to a local file on the server, can give you a real performance advantage over using the GUI.

Generating Customer Records

For the demos I created for the Guerrilla Tactics session I am doing at NxtGen, I decided I was rather bored of the usual test data of Customer 1 …. 99999, so instead I adapted a script we have used in the past to anonymise client data.

It is quite a simple script; the only parameter is @TotalCustomersPerSex. This can be set anywhere between 1 and 50,000 to give unique names. If you want some duplication or more names, you can increase it or mess about with the source data.

The script has been written to target MS SQL Server 2008, although it could be altered to work with earlier versions of SQL Server.

The script does the following:

  • Creates a customer table
  • Populates a temporary table with the 500 most popular surnames in England, Wales and the Isle of Man (from the very helpful site http://surnames.behindthename.com)
  • Populates another temporary table with the 200 most popular boys’ names from England & Wales, randomly assigning a number I later use to populate the title
  • The same is then done for 200 girls’ names; the source for both of these lists was http://www.statistics.gov.uk
  • I then create and populate a temporary table of male titles, and repeat the process for female titles
  • I then have an insert statement selecting from forenames and titles and cross joining the surnames table, which gives a Cartesian product
  • To populate information such as customer reference, initials, postcode, age, email address and password, I extract characters from the forename and surname

The reason I use the RAND function to populate the title field on insert is that RAND is only evaluated once per query, so if it were used in the final select, every row would get the same random value. Not what I wanted.
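As an aside, here is a minimal illustration of the cross-join idea (this is not the downloadable script; the temporary tables and the title logic are simplified assumptions of mine). It also shows ABS(CHECKSUM(NEWID())), a common per-row alternative to RAND() when you do want a different random value on every row of a single select.

-- Illustrative only: tiny source tables instead of the real 200/500-name lists.
CREATE TABLE #Forename (Forename VARCHAR(50), Sex CHAR(1));
CREATE TABLE #Surname  (Surname  VARCHAR(50));

INSERT INTO #Forename VALUES ('John', 'M'), ('Mary', 'F');
INSERT INTO #Surname  VALUES ('Smith'), ('Jones');

-- CROSS JOIN gives the Cartesian product: every forename paired with every surname.
SELECT  f.Forename,
        s.Surname,
        -- NEWID() is evaluated per row, unlike RAND(), which is evaluated once per query.
        ABS(CHECKSUM(NEWID())) % 4 AS TitleNumber
FROM    #Forename f
CROSS JOIN #Surname s;

DROP TABLE #Forename;
DROP TABLE #Surname;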

I hope you find this useful.

Customer Generate script

Guerrilla Warfare?

Guerrilla Tactics – Performance Testing MS SQL Server Applications

Tonight (Tuesday 26th October) I am giving a talk on a Guerrilla Tactics approach to performance testing: not how you would like your projects to be, but how to cope when they are not.

Below are the links to the slides, demos and other resources.
I used Apache JMeter to do the multi-threaded demos and there is a link to that as well.

Have fun fighting those performance fires.

If you have any problem with the demos feel free to leave a comment, email or tweet.

Links

A Grand Performance

Guerrilla Tactics – Performance Testing MS SQL Server Applications

Last night (Monday 13th September) I gave a talk on a Guerrilla Tactics approach to performance testing: not how you would like your projects to be, but how to cope when they are not.

Below are the links to the slides, demos and other resources.
I used Apache JMeter to do the multi-threaded demos and there is a link to that as well.

Have fun fighting those performance fires.

If you have any problem with the demos feel free to leave a comment, email or tweet.

Links

SQLBits Speaker Training

Firstly, thank you to the SQLBits team and Microsoft for putting on the speaker training yesterday, and particularly to Simon Sabin and Guy Smith-Ferrier. It was a great day, only marred by the awful traffic conditions getting there and back. Even though I spent six hours in a car, there are still some things I can remember, so it must have been good.

There were two highlights for me:

  • The critiques of our own presentations
  • The ‘How to Give Great Demos’ session

In the critiques of our own presentations there was a good balance between not crushing our egos and giving us something to work on.

The ‘How to Give Great Demos’ session

I had to laugh at this because about the first thing Guy said was “don’t, whatever you do, use a laser pointer”. Earlier this year I was presenting at Microsoft Cambridge and it was suggested that I needed a laser pointer, as the screen was huge and I am used to training rooms where I can point at the screen.

He had good reasons, and now that I have been told, I’ll start playing with ZoomIt, as suggested.

Other pearls of wisdom were:

  • Introduce the context of the demo
  • Demo code is not production code, just the code to make the point
  • Don’t apologise that it is not production code and get on with the demo
  • Show the finished product so they know where you are going (unless you want to surprise them)
  • Keep the audience informed when you are typing
  • Slow the mouse down so it is clear what steps you are taking
  • Don’t use tools that aren’t part of the standard product, e.g. ReSharper
  • After the demo have a slide with the one thing you want to get across in the demo
  • Record the demo as a fall back, perhaps you need an internet connection and that isn’t guaranteed
  • Make code Lucida Console and set the size to 14 or 16
  • Increase the DPI (maybe)
  • Set the default highlight to Black text and yellow background
  • Set the screen resolution to the projector resolution you are going to use 2 weeks before the presentation
  • Create a user on your laptop with all these settings
  • Dave McMahon radically suggested developing all the demos on the laptop so you know they will fit

This and a lot more is available on Guy’s site.

http://www.guysmithferrier.com/downloads/HowToGiveGreatPresentations.pdf

How to steal a Terabyte of Data by Floppy disk

“Why are they banning USB sticks?” a colleague of mine at a client complained. New company policy was that any USB drives had to be encrypted and would only work on company machines. “After all,” he continued, “you could still steal data on a floppy disk.”

This reminded me of a chat I had with a friend whose company database had just hit 1 Terabyte, around the time portable 1 Terabyte USB disks came onto the market. He was the security manager there and was locking down USB devices in the same way.
The other day I bought a 1 Terabyte USB2 hard disk and I thought I would see how this, now ubiquitous, device stacked up against the old-school floppy.

How long to copy a Terabyte?

It took 9 hours to fill the disk. Not quick, but it could easily be accomplished within a working day. But how long would it take to copy a Terabyte to floppy disk?

1 Terabyte is 1,000 x 1,000 x 1,000 x 1,000 = 1,000,000,000,000 bytes.

1 floppy disk can hold 1,457,664 bytes.

So to hold a Terabyte you would need 686,030 floppy disks. 16 floppy disks stack 5 cm (2 inches) tall, so your stack would be 214,384 cm high – over 2 kilometres (1.33 miles). That’s quite a stack of disks!

Carrying the Disks

People carry sports bags to work all the time, so one wouldn’t look out of place. 75 litres is a standard size in the UK, so let’s use one of those. Each disk is 8.89 x 9.40 x 0.32 cm = 26.74 cubic centimetres, so we can fit 37.4 disks per litre, which means we need 18,345 litres to carry all our disks.

So a mere 245 trips using our sports bag will do the trick.

How Long To Copy The Data?

By experiment, I managed to fill 7 floppy disks in just under 6 minutes. So, not allowing for any comfort breaks, you should be able to fill one sports bag of disks in 6 hours 40 minutes. If you take a sports bag of disks into work each day, you’ll be able to walk out with a Terabyte of data in 49 weeks.

Of course, after nearly a year of spending your working days swapping floppy disks, somebody will probably have noticed and rumbled you. Although after carrying over 46kg (100lbs) a day, you may be able to hold your own against a security guard for a bit :).

Just a bit of fun, please don’t try this at home. Or the office.

NxtGenUG Nugget – T4 Coventry

Avoiding Boredom Using T4 (the Text Template Transformation Toolkit)

I am doing a Nugget on T4 at the NxtGenUG meeting at Coventry on 12th July 2010. The main event talk is Dave Sussman on How Clean is your ASP.NET?

The Nugget is a brief introduction to T4: what it is, what you can do with it and how it can make boilerplate coding far less dull and error-prone. There have been a few minor changes since the nugget I gave in April: the demo project is now VS2010 and uses an EF4 model rather than a WCF Data Services model.

Links

01780 480740
enquiries@catalystcomputing.co.uk