Working with Object Stores: S3CMD
In my previous article we looked at How to get started with Riak-CS, an elastic, open source Object Store. In this section of the tutorial, I'll walk you through the simple steps to set up S3CMD, a tool that can be used to query, add, and delete objects in the object store.
A little overview of S3CMD from its website:
"S3cmd is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage or DreamHost DreamObjects. It is best suited for power users who are familiar with command line programs. It is also ideal for batch scripts and automated backup to S3, triggered from cron, etc.S3cmd is written in Python. It's an open source project available under GNU Public License v2 (GPLv2) and is free for both commercial and private use. You will only have to pay Amazon for using their storage."
We will be using S3CMD to perform simple tasks on our Object Store, such as creating buckets, adding objects to them, listing objects, and finally deleting objects and buckets.
To get started, first we need to download the latest version of S3CMD (here, I'm using v1.5.0-rc1). Do note that this version supports multi-part upload, a technique in which large objects are broken up into smaller, more manageable chunks and then uploaded to the bucket.
Installation
The installation is a fairly straightforward process:
# wget http://sourceforge.net/projects/s3tools/files/s3cmd/1.5.0-rc1/s3cmd-1.5.0-rc1.tar.gz
Un-tar the S3CMD tar file:
# tar -xvzf s3cmd-1.5.0-rc1.tar.gz
Change into the extracted directory and run the following command to install the S3CMD tool:
# cd s3cmd-1.5.0-rc1
# python setup.py install
We now need to configure S3CMD to use our Riak-CS server as the Object Store. To do that interactively, type the following command:
# s3cmd --configure
You may get the following error at this point: "Import Error: Trying to import dateutil.parser"
To solve this error, simply install the python-dateutil package as shown below:
# yum install python-dateutil
You may optionally install the python-magic package as well. This package is used by S3CMD to guess the MIME type of the files that you upload to the Object Store.
There's no harm in skipping this package, although S3CMD will then print a warning each time: "WARNING: Module python-magic is not available. Guessing MIME types based on file extensions"
# yum install python-magic
Once the prerequisites are met, you can try to configure S3CMD again:
# s3cmd --configure
There are 4 default settings you should change:
- Access Key — Use the Riak-CS user access key you generated above.
- Secret Key — Use the Riak-CS user secret key you generated above.
- Proxy Server — Use your Riak-CS Server IP.
- Proxy Port — The default Riak-CS port is 8080
Test the configuration when prompted and make sure the test reports success. Finally, save these settings.
NOTE: When run as root, these settings are saved to /root/.s3cfg
Now comes the fun part, where we actually test out Riak-CS using S3CMD:
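To double-check what the configuration step saved, a small helper along these lines could print the four settings changed above. This is only a sketch: the function name show_s3cfg_settings is made up for this example, and it assumes the config file stores these settings under the keys access_key, secret_key, proxy_host and proxy_port.

```shell
# Hypothetical helper (not part of S3CMD): print the connection settings
# from an s3cmd config file, masking the secret key.
# Usage: show_s3cfg_settings [path]   (path defaults to ~/.s3cfg)
show_s3cfg_settings() {
    cfg="${1:-$HOME/.s3cfg}"
    # Keep only the four settings changed during "s3cmd --configure",
    # then replace the value of secret_key with asterisks.
    grep -E '^(access_key|secret_key|proxy_host|proxy_port)[[:space:]]*=' "$cfg" |
        sed -E 's/^(secret_key[[:space:]]*=).*/\1 ********/'
}
```

Calling show_s3cfg_settings with no argument reads the .s3cfg file in the current user's home directory, so run it as the same user that ran the configuration step.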
1) List S3CMD version:
# s3cmd --version
2) Create a Bucket:
Syntax: s3cmd mb s3://<unique_bucket_name>
# s3cmd mb s3://yoyo-01
3) List Buckets:
Syntax: s3cmd ls
# s3cmd ls
4) Add objects to a Bucket:
Syntax: s3cmd put <filename> s3://<unique_bucket_name>
# s3cmd put /opt/testfile s3://yoyo-01
NOTE: Optionally, you can use "get" to download an object from the Bucket as well. The arguments are simply reversed: s3cmd get s3://<unique_bucket_name>/<filename> <local_filename>
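If you want to convince yourself that put and get really are mirror images, a round trip like the one below is one way to do it. Treat it as a sketch under assumptions: roundtrip_check is a name invented for this example, and it relies on the --force option of "s3cmd get" to overwrite the empty temporary file it downloads into.

```shell
# Hypothetical helper (not part of S3CMD): upload a file, download it again,
# and compare the two copies byte for byte.
# Usage: roundtrip_check <local_file> <bucket_name>
roundtrip_check() {
    src="$1"
    bucket="$2"
    name=$(basename "$src")
    copy=$(mktemp) || return 1
    # put uploads, get --force downloads over the temp file, cmp -s compares quietly.
    s3cmd put "$src" "s3://$bucket/$name" &&
        s3cmd get --force "s3://$bucket/$name" "$copy" &&
        cmp -s "$src" "$copy"
    status=$?
    rm -f "$copy"
    return $status
}
```

For the file used above, that would be: roundtrip_check /opt/testfile yoyo-01. A zero exit status means the downloaded copy matches the original.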
You can verify whether the file was successfully uploaded or not by listing the contents of the bucket using the ls command followed by the Bucket Name:
# s3cmd ls s3://yoyo-01
5) Check size of Bucket:
Syntax: s3cmd du s3://<unique_bucket_name>
# s3cmd du s3://yoyo-01
6) Syncing Objects to your Bucket:
Here, I created a 1GB dummy file using the dd command. The purpose of this large file is to show you how multi-part upload works with S3CMD.
Sync is a little different from the put command: it is used to "sync" a whole local directory containing several files (and possibly nested sub-directories) to the bucket.
IMP: Each time a sync is executed, it only uploads files that don't exist at the destination or that differ from the version stored there. If the files in the bucket already exist and are the same, there will be no transfer.
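The exact dd invocation isn't shown in this post, so here is one way such a file could be created. It's a sketch: the helper name make_dummy_file and the path /opt/bigfile are placeholders of my own choosing, not the names used in the original run.

```shell
# Hypothetical helper: create a zero-filled file of the given size in MiB.
# Usage: make_dummy_file <path> <size_in_mib>
make_dummy_file() {
    # bs=1048576 is 1 MiB per block; count is the number of blocks to write.
    dd if=/dev/zero of="$1" bs=1048576 count="$2" 2>/dev/null
}

# Example (not run here): a 1GB file under /opt, a made-up path.
# make_dummy_file /opt/bigfile 1024
```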
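To make the "only upload what changed" rule concrete, the check can be pictured roughly like this. By default S3CMD compares file size and MD5 checksum when syncing; the function below only illustrates that idea locally, it is not S3CMD's actual code, and the name needs_upload is invented for this sketch. It assumes the md5sum tool is available.

```shell
# Illustrative only: decide whether a local file would need to be transferred,
# given the size and MD5 of the copy already at the destination
# (an empty size means the file is not there yet).
# Usage: needs_upload <local_file> <remote_size_bytes> <remote_md5>
# Returns 0 if an upload is needed, 1 if the file can be skipped.
needs_upload() {
    file="$1"; remote_size="$2"; remote_md5="$3"
    [ -z "$remote_size" ] && return 0                  # missing at destination
    local_size=$(wc -c < "$file" | tr -d ' ')
    [ "$local_size" -ne "$remote_size" ] && return 0   # size differs
    local_md5=$(md5sum "$file" | cut -d ' ' -f 1)
    [ "$local_md5" != "$remote_md5" ] && return 0      # content differs
    return 1                                           # identical: skip
}
```

The cheap size comparison runs first, so the checksum is only computed when the sizes match.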
Syntax: s3cmd sync <folder> s3://<unique_bucket_name>
# s3cmd sync -r /opt s3://yoyo-01
NOTE: -r means recursive. In this example, it recursively scans the /opt directory and transfers the objects within it to Riak-CS. Also note that the 1GB test file we created in the earlier steps is not uploaded in one piece: it's broken up into 15 MB chunks (this is the default setting and can be changed in the /root/.s3cfg file) and then uploaded to the Riak-CS storage.
The recursive option (-r) can be used with other S3CMD commands as well, such as list (ls), to list folders recursively.
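As a quick back-of-the-envelope check, you can work out how many chunks to expect for a given file. The helper name multipart_parts is made up for this sketch, and it assumes 1GB means 1024 MB.

```shell
# Hypothetical helper: number of parts needed for a file of a given size,
# rounding up so that a final partial chunk still counts as a part.
# Usage: multipart_parts <file_size_mb> <chunk_size_mb>
multipart_parts() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

multipart_parts 1024 15   # prints 69 (68 full 15 MB chunks plus one 4 MB chunk)
```

The chunk size itself is controlled by the multipart_chunk_size_mb setting in the .s3cfg file, or per command with the --multipart-chunk-size-mb option.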
# s3cmd ls -r s3://yoyo-01
7) Deleting Objects:
Syntax: s3cmd del s3://<unique_bucket_name>/<filename>
# s3cmd del s3://yoyo-01/testfile
8) Deleting Buckets:
Syntax: s3cmd rb s3://<unique_bucket_name>
# s3cmd rb s3://yoyo-01
NOTE: You cannot delete a bucket that has any content present in it. Make sure you delete the objects in the Bucket first before you try to remove the Bucket itself.
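The two steps from the note above can be chained so the bucket is only removed once it has been emptied. This is a sketch rather than a recommended procedure: empty_and_remove_bucket is a name invented here, and it assumes your S3CMD version accepts --recursive together with --force on the del command to wipe a bucket's entire contents. Be careful, since there is no undo.

```shell
# Hypothetical helper (not part of S3CMD): delete every object in a bucket,
# then remove the bucket itself.
# Usage: empty_and_remove_bucket <bucket_name>
empty_and_remove_bucket() {
    bucket="$1"
    # --recursive deletes all objects; --force confirms wiping the whole bucket.
    # rb only runs if the delete succeeded.
    s3cmd del --recursive --force "s3://$bucket" &&
        s3cmd rb "s3://$bucket"
}
```

For the bucket used in this post, that would be: empty_and_remove_bucket yoyo-01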
Well, that's all there is to this post folks! Stay tuned for much more coming your way soon!!