
Using MinIO as on-premises object storage with .NET and the S3 SDK

Ever tried to find a blob store that can work on-premises as well as in the cloud, support metadata, scale well and have .NET client libraries?

I did, and I settled on MinIO. Honestly, to my surprise I was quite limited in my choice. It's free, it's open-source, it can run on-premises and it has Helm charts for k8s. The best thing is that it's S3-compatible, so if one day you move to the cloud, the only thing you'll need to change in your code is a connection string.

The easiest way to get started is by running the Docker image. Pull the image:
docker pull minio/minio

Start it for testing (data will be part of the container, so after a restart all files will be gone):
docker run -p 9000:9000 minio/minio server /data

Or start with a mapped volume on Windows:
docker run -p 9000:9000 --name minio1 \
  -v C:\data:/data \
  minio/minio server /data

When the server is up you can access it at http://127.0.0.1:9000/minio/login
The default user/password is:
minioadmin/minioadmin

Working with .NET

MinIO has its own .NET client
https://docs.min.io/docs/dotnet-client-quickstart-guide.html
but it looked quite raw to me. For example, I'm not sure why GetObjectAsync does not return blob info even though internally it loads it every time, so I have to make one extra call for each file. Or why the stream operation is an Action, and why some of the operations with streams and files are not async internally. Anyway, it's open-source (https://github.com/minio/minio-dotnet), so you can take a look yourself or even contribute :)
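
To illustrate that extra call, here is a rough sketch of how reading a file plus its metadata looks with the MinIO client (API as of the version referenced above; details may differ in newer releases, and bucketName, objectName and targetStream are placeholders):

// Requires the Minio NuGet package (namespace Minio).
// Sketch of the two-call pattern with the MinIO .NET client:
var minio = new MinioClient("127.0.0.1:9000", "minioadmin", "minioadmin");

// Call 1: object info (size, content type, user metadata).
var stat = await minio.StatObjectAsync(bucketName, objectName);

// Call 2: the content itself, exposed only through an Action<Stream> callback.
await minio.GetObjectAsync(bucketName, objectName,
    stream => stream.CopyTo(targetStream));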

Using S3 .NET SDK with MinIO

So, after reviewing the MinIO SDK I decided to give the native Amazon S3 SDK a try:
https://www.nuget.org/packages/AWSSDK.S3/ (remember, MinIO has an S3-compatible API).

I had to play a bit with the connection params, but quickly found a combination that worked and allowed me to connect to MinIO on-premises:
// Requires the AWSSDK.S3 package (namespaces Amazon.S3 and Amazon.Runtime).
var awsConfig = new AmazonS3Config()
{
    ServiceURL = "http://127.0.0.1:9000",  // point the SDK at the local MinIO endpoint
    ForcePathStyle = true,                 // MinIO expects path-style bucket addressing
    UseHttp = true
};

AWSCredentials creds = new BasicAWSCredentials(config.AccessKey, config.SecretKey);
_s3Client = new AmazonS3Client(creds, awsConfig);
Now we can store the first object:
var request = new PutObjectRequest();
request.BucketName = bucketName;
request.Key = $"{Guid.NewGuid()}{Path.GetExtension(incomingFile.FileName)}";
request.ContentType = incomingFile.ContentType;
request.InputStream = incomingFile.Data;

await _s3Client.PutObjectAsync(request, cancellationToken).ConfigureAwait(false);

Some things to pay attention to: 

1) When you work with metadata you need to prefix every key with "X-Amz-Meta-", otherwise the prefix will be added for you during persistence.
const string S3Prefix = "X-Amz-Meta-"; 
const string FileNameField = S3Prefix + "Filename";
Metadata[FileNameField] = fileName;

2) Bucket names have limitations, e.g. they must start with a lowercase letter. In some cases the SDK will throw an exception, but in others you might just receive an empty object as a response, so it's good to conform your bucket name for both create and get.
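
As a rough example, a small helper like the hypothetical NormalizeBucketName below lower-cases the name and strips characters that are not allowed; it is only a sketch covering the most common rules, not the full S3 naming spec:

// Requires System.Linq. Hypothetical helper: conform a name to the most common
// S3 bucket-name rules (lowercase letters, digits and dashes). A sketch only;
// S3 also limits names to 3-63 characters, forbids leading/trailing dashes, etc.
static string NormalizeBucketName(string name) =>
    new string(name.ToLowerInvariant()
                   .Where(c => char.IsLetterOrDigit(c) || c == '-')
                   .ToArray())
        .Trim('-');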

3) There is no built-in method to check if a bucket exists. I decided to use the GetBucketTaggingAsync method, as it doesn't throw when you try to access a nonexistent bucket.

async ValueTask<bool> CreateBucketIfNotExist(string bucketName, CancellationToken cancellationToken = default(CancellationToken))
{
    var request = new GetBucketTaggingRequest { BucketName = bucketName };
    var result = await _s3Client.GetBucketTaggingAsync(request, cancellationToken);
    if (result?.HttpStatusCode == System.Net.HttpStatusCode.NotFound)
    {
        await _s3Client.PutBucketAsync(bucketName, cancellationToken);
        return true;
    }
    return false;
}
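
A possible way to wire this into the upload path (a sketch, reusing the names from the snippets above):

// Sketch: make sure the bucket exists before writing the object.
await CreateBucketIfNotExist(bucketName, cancellationToken);
await _s3Client.PutObjectAsync(request, cancellationToken).ConfigureAwait(false);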

Getting and processing the data is quite easy:
var request = new GetObjectRequest { Key = objectId, BucketName = bucketName };
GetObjectResponse result = await _s3Client.GetObjectAsync(request, cancellationToken);
await result.ResponseStream.CopyToAsync(fileStream);

As you can see, we now receive all object headers (metadata) together with a reference to the stream in a single call.
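
For example, if the original file name was stored under the prefixed key from point 1, it can be read straight from the same response (a sketch; the response metadata keys are typically the lowercased header names):

// Read the user metadata returned by the same GetObject call
// (assuming the file name was stored under "X-Amz-Meta-Filename" as in point 1).
var originalFileName = result.Metadata["x-amz-meta-filename"];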

That's it, hope that was useful.
