# Karaoke tracks from normal songs/videos
# Welcome to your CDK Python project!

I'm going to play with [spleeter](https://github.com/deezer/spleeter)
to explore ideas around a home/portable karaoke station for modern times.

It works! I can obtain an mp3 file with only the music, ready to sing over.
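
For the record, a minimal separation run (a sketch assuming the pre-trained `spleeter:2stems` model and an input file called `song.mp3`) looks roughly like:

    pip install spleeter
    spleeter separate -i song.mp3 -p spleeter:2stems -o output

which leaves `vocals.wav` and `accompaniment.wav` under `output/song/`; the accompaniment is the karaoke track.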
## Putting it on the cloud

To turn this into anything useful it would be nice to have the following:

- a simple domain (like deepkaraoke.me)
- an intuitive interface: you drop a file or a link to a video and the process starts.
  When it's done, it turns into a link, ready to share.
- the conversion process should run on a serverless engine (possibly AWS)
- the frontend should be distributed via a CDN

After a failed attempt at packaging spleeter for AWS Lambda, I looked for viable alternatives.

The most promising solution for running a custom Docker container on demand (scaling from and to 0 instances) is AWS Batch.

1. You need to *define a job* (see the boto3 sketch after this list), specifying:

   - a name
   - a role
   - a container image
   - a command
   - vcpus (1 vCPU is equivalent to 1024 CPU shares)
   - memory (a hard limit)
   - job attempts
   - parameters (if used in the command definition)
   - environment variables

2. Configure the *compute environment* and *job queue*.

   You submit your jobs to a job queue, which stores them until the AWS Batch scheduler runs them on a compute resource within your compute environment.

   - compute environment name: specify a unique name for the compute environment
   - for the service role, choose to create a new role that allows the AWS Batch service to make calls to the required AWS APIs on your behalf (`AWSBatchServiceRole` is required)
   - a job queue name
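
For reference, a job definition along those lines could be registered with boto3 roughly like this (the image, role ARN, bucket and names below are placeholders, not the real project values):

    import boto3

    batch = boto3.client("batch")

    # Register the job definition described above; vcpus/memory are just examples.
    batch.register_job_definition(
        jobDefinitionName="spleeter-separation",
        type="container",
        parameters={"s3_key": "placeholder"},
        retryStrategy={"attempts": 2},
        containerProperties={
            "image": "docker.io/someuser/spleeter-worker:latest",
            "vcpus": 1,          # 1 vCPU == 1024 CPU shares
            "memory": 4096,      # hard limit, in MiB
            "command": ["python", "separate.py", "Ref::s3_key"],
            "jobRoleArn": "arn:aws:iam::123456789012:role/spleeter-job-role",
            "environment": [{"name": "OUTPUT_BUCKET", "value": "karaokeme-output"}],
        },
    )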

I've managed to build this simple collaboration of components for this kind of task using AWS Batch.
The service lets you run batch jobs from a custom Docker image, submitting job instances with a simple API call.
I'm using Docker Hub, since my project will be open source. A realistic example should include ECR (a private Docker registry) for hosting images.

Now I need a Lambda function to submit a new job for each new audio file created in the input S3 bucket:

    S3 bucket -> event -> trigger_function -> AWS Batch job submitted
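
A minimal sketch of that trigger function (the queue and job definition names are the same assumed placeholders as above):

    import re

    import boto3

    batch = boto3.client("batch")


    def handler(event, context):
        """Submit one AWS Batch job per object created in the input bucket."""
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            batch.submit_job(
                # Job names only allow letters, numbers, hyphens and underscores.
                jobName=re.sub(r"[^A-Za-z0-9_-]", "-", f"separate-{key}")[:128],
                jobQueue="spleeter-job-queue",        # placeholder queue name
                jobDefinition="spleeter-separation",  # registered above
                parameters={"s3_key": f"s3://{bucket}/{key}"},
            )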
### What does not work with this approach

AWS Batch imposes *strict* requirements on memory allocation: if your process exceeds
the allotted memory, the job gets instantly killed.
You can solve this by allocating more resources in the compute environment, but I've noticed that in doing so the job becomes less likely to be selected for execution in a
timely fashion (my experiments easily took more than half an hour to start working).

While AWS Batch is a valid solution for pure batch workloads, I was picturing
the conversion as something that starts as soon as the user uploads a new
file and runs in the background, like the many lengthy tasks that web applications
offload out of process to something else.

This could be an AWS Fargate service, which again will run our spleeter-based
process within a Docker container. The container this time will run continuously based
on a Fargate task definition, which will automatically scale the number of instances
when required.

Provisioning the computing resources required to properly run a
container on Fargate is not trivial.
That would certainly be another job for Terraform, but I got caught by a great
live demo of building the same type of application using the AWS Cloud Development Kit.

Think of it as a typed *Python* wrapper on top of CloudFormation (in reality the Python support is achieved through a language-level binding over the TypeScript implementation), which is able to generate CloudFormation templates, resolving all references and reusing stacks as you would do with your code libraries.
### CDK based solution

#### The worker stack

A `QueueProcessingFargateService` could be used to implement the separator worker, which will get items
(an S3 URL to a file) from a job queue.

Files will be read from an input S3 bucket and the output written back to an output bucket.
The worker pool will automatically scale when needed.
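
A rough CDK (v1, Python) sketch of that stack; the container image and environment values are placeholders:

    from aws_cdk import core, aws_ecs as ecs, aws_ecs_patterns as ecs_patterns


    class WorkerStack(core.Stack):
        def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
            super().__init__(scope, id, **kwargs)

            # Fargate workers fed by an SQS queue; the construct creates the
            # queue and the autoscaling rules on our behalf.
            self.service = ecs_patterns.QueueProcessingFargateService(
                self, "SeparatorWorker",
                image=ecs.ContainerImage.from_registry("someuser/spleeter-worker"),
                cpu=1024,
                memory_limit_mib=4096,
                environment={"OUTPUT_BUCKET": "karaokeme-output"},
            )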
#### The HTTP API

As a frontend, we need an HTTP API that performs at least the following operations:

- push a new job and return an id
- get the status of a job, given its id

In my mind this is more of a job for a Lambda function (or a group of Lambda functions), especially in 2020, but in
order to get a first working version sooner, I'll stick to a daemon-style container, exposed to the internet
through an Application Load Balancer, which happens to be directly supported by the high-level patterns library
included in CDK.

An `ApplicationLoadBalancedFargateService` stack would do the trick then.
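
Something along these lines (again CDK v1 Python, with a placeholder image and port):

    from aws_cdk import core, aws_ecs as ecs, aws_ecs_patterns as ecs_patterns


    class ApiStack(core.Stack):
        def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
            super().__init__(scope, id, **kwargs)

            # HTTP API running on Fargate, fronted by a public Application Load Balancer.
            self.service = ecs_patterns.ApplicationLoadBalancedFargateService(
                self, "HttpApi",
                cpu=256,
                memory_limit_mib=512,
                public_load_balancer=True,
                task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                    image=ecs.ContainerImage.from_registry("someuser/karaokeme-api"),
                    container_port=5000,
                ),
            )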
##### Installing CDK

The CDK command line is needed during local development and later will be needed on CircleCI.
I installed the CDK command line (which is based on Node.js) using:

    brew install aws-cdk

To test that it worked, I ran:

    cdk --version
    1.32.2 (build e19e206)
##### AWS credentials

I will use the same credentials from my personal AWS account to use CDK.
I created a new user `aws-cdk` within my personal account and gave it the following permissions:

- SystemAdministrator
- AWSCloudFormationFullAccess

As a personal memo, I will use the new `personal-cdk` profile (in `~/.aws/credentials`).

It's *important* to export the following environment variable:

    export AWS_PROFILE=personal-cdk
##### Creating the CDK project

I had to create a new empty directory to successfully run:

    cdk init app --language python

which conveniently created a blank project, including a local virtualenv.
To activate the virtual environment for the project, issue the following command:

    source .env/bin/activate

and then, to install the required dependencies:

    pip install -r requirements.txt
##### Bootstrapping the CDK toolkit

The CDK toolkit needs some resources to run its lifecycle commands.
An S3 bucket and a CloudFormation stack will be created by running the following
command:

    cdk bootstrap

Ideally this command should be run only once.
This is how it went for me:

    (.env) AWS: personal-cdk domenico@domecbook ~/dev/experiments/karaokeme-cdk master ● cdk bootstrap
    <aws:personal-cdk>
    ⏳ Bootstrapping environment aws://521097367553/eu-west-1...
    CDKToolkit: creating CloudFormation changeset...
    0/3 | 12:28:08 PM | CREATE_IN_PROGRESS | AWS::S3::Bucket | StagingBucket
    0/3 | 12:28:09 PM | CREATE_IN_PROGRESS | AWS::S3::Bucket | StagingBucket Resource creation Initiated
    1/3 | 12:28:31 PM | CREATE_COMPLETE | AWS::S3::Bucket | StagingBucket
    1/3 | 12:28:33 PM | CREATE_IN_PROGRESS | AWS::S3::BucketPolicy | StagingBucketPolicy
    1/3 | 12:28:34 PM | CREATE_IN_PROGRESS | AWS::S3::BucketPolicy | StagingBucketPolicy Resource creation Initiated
    2/3 | 12:28:34 PM | CREATE_COMPLETE | AWS::S3::BucketPolicy | StagingBucketPolicy
    3/3 | 12:28:35 PM | CREATE_COMPLETE | AWS::CloudFormation::Stack | CDKToolkit

After that I've configured Visual Studio Code with the AWS Toolkit
extension, which is really useful and easy to use.
#### Coding the infrastructure with CDK

I've started by refactoring the application code as suggested
in the demo video.

There will be shared resources, and possibly references between the
stacks. For this reason the Application object can hold a reference
to the other component stacks.
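
A sketch of how the application object could hold those references (module paths and stack names are placeholders of mine):

    #!/usr/bin/env python3
    from aws_cdk import core

    from karaokeme.worker_stack import WorkerStack
    from karaokeme.api_stack import ApiStack


    class KaraokemeApp(core.App):
        """Application object holding references to the component stacks."""

        def __init__(self) -> None:
            super().__init__()
            self.worker_stack = WorkerStack(self, "karaokeme-worker")
            self.api_stack = ApiStack(self, "karaokeme-api")


    app = KaraokemeApp()
    app.synth()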

After some coding (pretty easy, to be honest), it was time to deploy, which I did with the following command:

    cdk deploy '*'

Using two simple placeholder containers, and after a couple of iterations
where I had to destroy the stacks (some missing permissions for the AWS
user), it worked as expected.

A very nice feeling.

During the next iteration I will need to perform the "actual" work that my
project needs to do.

The real API container must be able to enqueue a new job, both writing the
entry to a DynamoDB table and pushing it to the processing queue monitored
by the worker pool.

The worker is also able to update the DynamoDB table to reflect the job
status in the long-term storage.
#### Refining the architecture

The *HTTP API* will expose the following operations:

    POST /jobs -> {job_id: 4j6h5k6e5h6jke5hjhj5, status: "uploading"}

It will perform the following things:

- generate a unique ID and a unique `s3_key`
- return a pre-signed URL to upload a file to `s3_key`
- add a row to the DynamoDB table including the `job_id`, the `s3_key` and the status `uploading`

At this point it's possible for the caller to upload a file using the pre-signed URL
returned by the first request.

Once the upload has completed, the client will call:

    POST /jobs/:id/process

This will trigger the processing, performing the following:

- read the object from the DynamoDB table
- add an object to the processing queue, including the `s3_key` and the `job_id`
- update the DynamoDB row with the status `processing`

`GET /jobs/:id` will return the job status or 404.

`GET /jobs` will return the full list of jobs.

I could quickly implement it with Flask and boto3.
Of course I've omitted a number of things needed for a production-ready
setup, like running Flask on a WSGI container, and I didn't write
any tests yet (which I'm confident can be arranged with pytest and moto/localstack).
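
A rough sketch of the `POST /jobs` handler with Flask and boto3 (bucket and table names are placeholders):

    import os
    import uuid

    import boto3
    from flask import Flask, jsonify

    app = Flask(__name__)
    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("karaokeme-jobs")  # placeholder table name

    INPUT_BUCKET = os.environ.get("INPUT_BUCKET", "karaokeme-input")  # placeholder bucket


    @app.route("/jobs", methods=["POST"])
    def create_job():
        job_id = uuid.uuid4().hex
        s3_key = f"input/{job_id}.mp3"

        # Pre-signed URL the client can use to PUT the original audio file.
        upload_url = s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": INPUT_BUCKET, "Key": s3_key},
            ExpiresIn=3600,
        )

        table.put_item(Item={"job_id": job_id, "s3_key": s3_key, "status": "uploading"})
        return jsonify(job_id=job_id, status="uploading", upload_url=upload_url), 201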

The *worker cluster* will be a slight adjustment of the one
I wrote for the Terraform version.
Some remarks:

- I had to add a polling mechanism to pick up messages from
  the associated processing queue (see the sketch below)
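
That polling loop could look roughly like this (queue URL, table name and the `separate` step are placeholders):

    import json

    import boto3

    sqs = boto3.client("sqs")
    table = boto3.resource("dynamodb").Table("karaokeme-jobs")  # placeholder table name
    QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/karaokeme-jobs"  # placeholder


    def separate(s3_key):
        """Placeholder for the spleeter-based separation step."""


    def poll_forever():
        while True:
            response = sqs.receive_message(
                QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
            )
            for message in response.get("Messages", []):
                job = json.loads(message["Body"])
                separate(job["s3_key"])
                # Reflect the finished job in the long-term storage.
                table.update_item(
                    Key={"job_id": job["job_id"]},
                    UpdateExpression="SET #s = :s",
                    ExpressionAttributeNames={"#s": "status"},
                    ExpressionAttributeValues={":s": "done"},
                )
                sqs.delete_message(
                    QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"]
                )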
### CDK Python instructions

This is a blank project for Python development with CDK.