Sep 21, 2018

Difference between EBS and S3

  1. EBS: Consider this to be a hard disk. Your server image must be booted off a disk, and this is it. EBS is essentially fault-tolerant block storage that you format with a file system.
  2. S3: A REST-friendly place to store your files and then retrieve them over HTTP. For performance and scalability reasons you may want to place data and images on S3 rather than serving them directly off your server. For example, if all you have is static content, you could put the whole thing in an S3 bucket and not pay for any compute power, just storage and bandwidth, which are cheap.
  3. S3 is completely pay-as-you-use, i.e. you only pay for the storage and bandwidth that you actually use. With EBS, you need to decide up front how big you want your volume to be, and you pay for the entire amount regardless of how much data you are actually storing.
  4. A single EBS volume can only be attached to one EC2 instance at a time - you cannot share it simultaneously between multiple instances. This may not be an issue in some situations, but it's something to be aware of. It's not a shared storage space. On the other hand, resources on S3 are accessible from anywhere using a simple URL.
  5. S3 buckets can be used with CloudFront (CDN) to speed up delivery around the world. You cannot do this with EBS volumes.
  6. If you are using S3 to store data from user uploads, especially in a distributed environment, one big consideration is that S3 is 'eventually consistent' (although some regions are read-after-write consistent). As a consequence, you may upload a file successfully but find that it does not exist if you check for it immediately afterwards. This problem is more pronounced for updates and deletes, where even read-after-write consistency will not help.
  7. The above applies to your uploads to S3 regardless of the approach you take. In fact, this is true of most problems you might expect with S3: the limitations of S3 itself, rather than the approach used to store the data, are likely to be the most problematic.
  8. S3fs uses the S3 API, just like the PHP (or any other) SDK does. Moreover, S3 is designed to handle fairly high levels of concurrency, so (other than the consistency issues) there shouldn't be a problem mounting it on multiple instances (keeping in mind that it isn't a traditional file system; problems like locking, etc. are handled on the S3 side).
  9. EBS means you need to manage a volume plus the machines to attach it to. You need to add space as it fills up and perform backups (not that you shouldn't back up your S3 data, just that it's not as critical). It also makes it harder to scale: when you want to add machines, you either need to move the files to a separate machine or clone them across all machines. This also adds a bottleneck: you have to manage your own upload process, which either uploads to all machines or has a single machine handling uploads.
  10. S3: it's set and forget. Any number of machines can be performing uploads in parallel and you don't really need to notify other machines about the upload.
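
To make the consistency point in item 6 concrete, here is a minimal sketch of one common workaround: polling for a freshly uploaded object with exponential backoff instead of assuming it is visible right away. Everything named here is an assumption for illustration. `wait_until_visible` and `LaggyStore` are hypothetical helpers, and `LaggyStore` only simulates a store whose new objects show up after a few existence checks; it does not talk to S3.

```python
import time
from typing import Callable


def wait_until_visible(
    exists: Callable[[], bool],
    attempts: int = 5,
    initial_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `exists` with exponential backoff until it returns True.

    Returns True as soon as the object is visible, or False if it never
    appeared within `attempts` checks.
    """
    delay = initial_delay
    for attempt in range(attempts):
        if exists():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next check
            delay *= 2    # double the wait each time
    return False


class LaggyStore:
    """Hypothetical stand-in for an eventually consistent store.

    An uploaded key only becomes visible after `lag` existence checks.
    """

    def __init__(self, lag: int) -> None:
        self.lag = lag
        self.checks = 0

    def exists(self) -> bool:
        self.checks += 1
        return self.checks > self.lag


if __name__ == "__main__":
    store = LaggyStore(lag=2)
    visible = wait_until_visible(store.exists, attempts=5, initial_delay=0.1)
    print("visible:", visible, "after", store.checks, "checks")
```

In a real application, the `exists` callable would wrap whatever existence check your SDK offers (for example a HEAD request on the object). Note that this only helps with newly created objects. For the update and delete cases mentioned in item 6, an existence check can still succeed against a stale copy, so polling alone is not enough there.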