You Can Change It Later

The Blog Of David Marks

Should I get a SAN to scale my site architecture?

Posted on October 23, 2008 | 7 Comments

A friend who runs a popular web photo hosting service called me today with a scaling question:

“What do you think about getting a SAN to help me scale my storage needs as my site scales?”

After a bit of discussion, I told him what I knew about the technology: I knew people who had good experiences running large sites with SANs for just about all their storage needs, and I knew people who suffered long outages when their SAN had a controller failure or another fundamental problem. Those people were stuck waiting for their vendor to fly someone out to fix the problem or bring a replacement system.

Think about this: your site is completely down, you have no access to your data, and you have to wait for someone to get on a plane to come fix it while you’re sitting around with a bunch of engineers who could fix most other problems in a few minutes.

For a typical web company, hours or days of downtime are absolutely unacceptable and a risk to the entire business. I think the lesson here is that using commodity hardware instead of proprietary solutions is about more than saving a few dollars on hardware; it’s also about minimizing the risks to your company.

Depending on a single vendor like this creates a single point of failure from both a technology and a business perspective; it really is putting all your eggs in one basket. It’s really nice to be able to walk down the street to Fry’s to pick up emergency equipment, and it’s also nice to be able to switch vendors when something isn’t working for you. In the land of startups, that kind of agility and freedom is really, really important, because the unexpected happens all the time when you’re scaling the business.

My friend is skipping the SAN for now in favor of a bigger server with fast drives, and plans to partition his data across a couple of systems when things really get big.
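
To make “partitioning his data across a couple of systems” concrete, here is a minimal sketch of one common approach: hashing each photo’s ID to choose which storage server holds it. The server names and the server_for helper are hypothetical, and this is an illustration under those assumptions, not what my friend actually built.

```python
import hashlib

# Hypothetical storage hosts; the post does not name any real servers.
SERVERS = ["photos-a.example.internal", "photos-b.example.internal"]


def server_for(photo_id: str, servers: list[str] = SERVERS) -> str:
    """Pick the storage server for a photo by hashing its ID.

    MD5 is used only as a stable hash (Python's built-in hash() is
    randomized per process), so every web server computes the same
    mapping across processes and restarts.
    """
    digest = hashlib.md5(photo_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and take it modulo
    # the number of servers to get an index into the list.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

One design caveat: with plain modulo hashing, adding a third server changes the mapping for most existing photos, so growing the cluster means moving a lot of data. Consistent hashing or a lookup table of photo-to-server assignments avoids most of that reshuffling.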

Comments

7 Responses to “Should I get a SAN to scale my site architecture?”

  1. Adrian Cockcroft
    October 26th, 2008 @ 11:04 am

    As flash-based SSDs start to go mainstream, the high spindle counts and large caches needed to get high performance out of a SAN can be replaced by locally connected SSDs. The cost of connecting to SAN-based storage is high: when you factor in the HBAs, SAN switch infrastructure, storage virtualization controllers and RAID controllers, you will find that they cost much more than the disks you are storing data on. So even though individual SSDs are still expensive, there are large offsetting savings in simply putting SSDs directly into the hosts that need storage.

    From an operational cost point of view, you save on administrative complexity, power consumption and rack space, and, as mentioned, you avoid failures that can only be resolved through complex, vendor-specific debugging.

  2. IT Blog
    October 26th, 2008 @ 11:35 am

    […] Web 2.0 installations, where scalability and availability naturally have top priority: Should I get a SAN to scale my site architecture (You Can Change It Later Blog). Posted by Bernd Eckenfels in Infrastructure | Comments (0) […]

  3. Dan C
    October 26th, 2008 @ 12:11 pm

    Hi David,

    Your comments are interesting. There certainly are pros and cons for each, which will dovetail into the specific requirements of specific startups. However on the face of it, unless your Real Life conversation was much broader, I think you may have missed a trick.

    You cite vendor hardware support as the biggest single stumbling block. What you don’t mention is what kind of support contracts the anecdotal cases were covered by. Were the issues solved within the SLA period that the company was sold? Were they previously led to believe that spares could be sourced locally, faster or would be customer field installable?

    On the flip side, what if the aforementioned people had all purchased motherboards, controllers and disks from Fry’s? Six months down the line a controller goes up in smoke. You have no relationship with Fry’s to ensure that they still carry the controller you previously bought from them, and they may not be able to tell you whether the products currently on their shelves will serve as a drop-in replacement. Where do you turn that will beat a two- or four-hour brand-name SLA?

    Before you say “but SLAs cost money”: I know. But these are, as you say, the things that minimize risks to your company. When it comes to storage vendors, you are ploughing your money into two tangible items: software and support.

    Support is obvious. It is all of the above, plus, in a lot of cases, a resource to verify “If I have X and do Y, will Z definitely happen?” Software is all the clever stuff that is sometimes indistinguishable from the word “product”: high availability, a good mechanism for snapshots, a simple path to expansion, and so on. Maybe your commodity hardware and OSS software can do some or all of these things. Can it do them all as effectively?

    Don’t get me wrong. I am not employed by or sponsoring a brand-name vendor. I am an OSS creature at heart, and it is because of this that I’ve experienced the growing pains of commodity storage and bleeding-edge open source solutions. Step changes are not the same as scaling. Sometimes you have to pick best of breed from the outset.

  4. Aleksander Podkopaev aka 'sure'
    October 28th, 2008 @ 4:37 am

    BTW, a single-path SAN is not a properly designed SAN.

    To Adrian:
    A high spindle count isn’t only for lowering access time; it raises transfer speed too. Look at today’s SSD specs: a single drive manages only 60-80 MB/s.

    To David:
    A friend of mine is a certified storage engineer running a huge SAN. His developers call a 500 GB database “small” and 12 TB of storage “tiny”.

    So, it all depends… There are a number of ways to achieve similar availability for a website, and a SAN is one of them.

  5. David
    October 28th, 2008 @ 10:22 am

    Thanks for the great comments, guys. Here are a couple of brief responses:

    Dan: Very good questions. And I agree: the right solution depends on the startup’s situation.

    To answer your SLA question:

    “Were the issues solved within the SLA period that the company was sold? Were they previously led to believe that spares could be sourced locally, faster or would be customer field installable?”

    My understanding was that YES, there was an SLA in place, but NO, the hardware couldn’t be fixed within that time frame. I don’t know what compensation, if any, was paid as a result.

    As for replacing a RAID controller: commodity controllers are more widely available, and they’re cheap enough that you can keep a couple of spares on hand in case of failure.

    Adrian and Aleksander: You’ve hit on something I’m hoping will be viable soon: SSD storage. From what I hear from others who have been testing these drives, they’re still unreliable so far, and their write endurance is far too low for commercial use. Many online apps don’t really have a gigantic working data set but do need raw throughput, which is why so many people use distributed caches like memcached. Why not just keep everything in memory if your cost/MB for SAN storage is low (compared with RAM)? Hopefully, soon this will be a real option.
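
    The usual way a distributed cache like memcached sits in front of slower storage is the cache-aside (lazy loading) pattern. Here is a minimal sketch of that read path, assuming a plain Python dict as a stand-in for a memcached client and a hypothetical load_from_storage callable for the slow disk or SAN read; it is an illustration, not a production cache.

```python
from typing import Callable, Dict


class CacheAside:
    """Check the cache first; on a miss, read from storage and cache it."""

    def __init__(self, load_from_storage: Callable[[str], bytes]) -> None:
        # A dict stands in for a memcached client here (assumption):
        # no TTLs, no eviction, and no network hop.
        self._cache: Dict[str, bytes] = {}
        self._load = load_from_storage  # hypothetical slow-tier reader
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._cache:
            return self._cache[key]  # hit: served straight from memory
        self.misses += 1
        value = self._load(key)      # miss: go to the slow storage tier
        self._cache[key] = value     # populate so the next read is a hit
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing to storage so readers don't get stale data.
        self._cache.pop(key, None)
```

    The point of the pattern is that only the first read of a hot item touches storage; repeated reads are served from RAM, so throughput depends far more on the cache than on the disks behind it.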

  6. Chris Kottom
    November 24th, 2008 @ 5:01 pm

    Hi David,

    There’s no question that your comments are valid for a certain class of requirements. For 98% of the applications out there, locally attached storage using some sort of RAID configuration to provide greater throughput and data protection is more than adequate. So for your friend’s photo sharing service, a simple setup (even if he decides to go with a roll-your-own file server solution) is likely to be sufficient. For something like Flickr, though, probably not so much, either from a capacity or a performance standpoint. There comes a point in the scaling of the operation (if you’re lucky) where agility is trumped by other considerations.

    I completely agree with Aleksander above: if you’re going to shell out the kind of money it takes to make the jump even from local disks to mid-range network-attached storage, plan to start buying two of everything, unless it’s strictly about capacity and you don’t care about availability or about introducing (or carrying over) a single point of failure (SPOF) in the storage tier.

  7. Нефтянник Масимка
    May 20th, 2009 @ 6:48 am

    Heh, why is it like that? I’m looking into how we can expand on this topic.
