Smart Music Helps Musicians Practice More Efficiently

02/10/20 • 55 min

In this episode of Running in Production, Julien Blanchard goes over building Smart Music, which uses a combination of Rails, Phoenix and .NET Core. It has roughly half a million users and is hosted entirely on AWS with EKS, ECS and Elastic Beanstalk. It’s been up and running since 2016.

There are around 20 developers working on the project. We talked about managing git repos with a few apps, TDD, using GraphQL with Phoenix, contexts, multiple databases with Rails, InfluxDB, GitHub Actions and tons more.

Topics Include

2:41 – Roughly half a million users are on the platform (~1.5k requests a minute at times)
3:27 – What Rails, Phoenix and .NET Core are being used for
5:38 – End users of the site interact with the Rails app and .NET Core is for authentication
6:10 – It’s an API back-end driven app and React / EmberJS is used on the front-end
6:35 – Motivation for using Phoenix and Elixir for the data ingesting app
9:28 – About 20 developers work on all of the different parts of the site
9:55 – Organizing the git repos for each of the apps
10:34 – The back-end has many tens of thousands of lines of code
11:04 – TDD is something their company practices and they like it a lot
12:24 – A JS front-end makes sense for this app since the UI is live and dynamic
13:17 – Trying to visualize a live sheet music application that helps you learn notes
14:02 – Maybe Phoenix LiveView could have been used, but they prefer what they chose
14:33 – The TL;DR on GraphQL and why in this case it works better than a RESTful API
17:55 – Docker isn’t being used in dev, but Kubernetes is being used in production
18:29 – PostgreSQL, InfluxDB and Redis are used to manage the data and for caching
19:32 – They knew from the start that InfluxDB would be needed to store the time data
20:33 – Redis is being used as a cache through AWS ElastiCache
21:49 – nginx is sitting in front of the Rails application with Elastic Beanstalk
22:44 – Motivation for picking AWS over Google Cloud and other providers
23:40 – AWS Aurora is being used to manage PostgreSQL
24:51 – They are using the Rails 6.x feature to select multiple databases
25:33 – Rails is very nice when it comes to getting community driven features merged in
26:08 – Julien also really likes Phoenix and here’s how they use contexts
28:50 – File uploads are sent directly to S3 using the ex_aws Elixir library
30:02 – For Kubernetes, they are using the managed EKS service from AWS
31:07 – (2) pretty beefy boxes with 8 GB of memory power the EKS cluster (overkill)
31:36 – They are still feeling out the resource usage of their services
32:18 – (20)’ish EC2 instances power the Elastic Beanstalk set up for the Rails app
32:54 – CloudFront is being used as a CDN for book covers but not much else
33:38 – Walking us through a code deploy from development to production
34:46 – Getting rid of Jenkins is the next step but GitHub Actions is a bit insecure currently
35:49 – GitHub Actions is a great tool and it’s being used for more than just CI
36:44 – You can use GitHub Actions to run tasks periodically (separate from git pushes)
37:27 – Dealing with big database migrations with scheduled down time
38:54 – New Relic and the ELK stack take care of metrics and logging
39:18 – Sentry.io (self hosted version) is being used to track exceptions
39:42 – The time series data doesn’t end up getting logged by these tools
40:20 – Braintree is used as a payment gateway to handle credit card and PayPal payments
41:44 – Transactional emails are being sent through AWS SES
42:24 – In app notifications are coming soon to SmartMusic (websockets, etc.)
44:05 – Another use case for websockets and events will be for collaboration features
44:49 – The databases are backed up daily and S3 is very redundant in itself for user files
45:31 – VictorOps handles alerting if a service happens to go down
45:58 – New Relic is being used in a few of the applications
46:55 – Handling bot related issues with nginx’s allow / deny IP address feature
48:46 – Best tips? Make a solid proof of concept in your new tech before switching to it fully
50:36 – Biggest mistake? Trying to use your old coding habits in a different language
51:27 – Dealing with N + 1 queries with GraphQL using DataLoader
52:58 – Ecto Multi is awesome for ensuring multiple things happen successfully
54:10 – Check out Julien’s blog, @julienXX on GitHub and he’s on
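
To make the N + 1 topic at 51:27 a bit more concrete, here is a minimal sketch of the batching idea behind DataLoader-style tools. It is plain Python, not the Elixir library discussed in the episode, and every name in it (`FakeDB`, `BatchLoader`, the student and assignment data) is a hypothetical stand-in invented for illustration.

```python
class FakeDB:
    """Hypothetical in-memory stand-in for a database that counts queries."""

    def __init__(self, assignments_by_student):
        self.assignments_by_student = assignments_by_student
        self.query_count = 0

    def assignments_for(self, student_id):
        # One query per student: the N + 1 pattern when called in a loop.
        self.query_count += 1
        return list(self.assignments_by_student.get(student_id, []))

    def assignments_for_many(self, student_ids):
        # One query for all students, similar to WHERE student_id IN (...).
        self.query_count += 1
        return {
            sid: list(self.assignments_by_student.get(sid, []))
            for sid in student_ids
        }


class BatchLoader:
    """Collects keys first, then resolves all of them with a single query."""

    def __init__(self, db):
        self.db = db
        self.pending = []
        self.cache = {}

    def load(self, student_id):
        # Queue the key; nothing hits the database yet.
        if student_id not in self.cache and student_id not in self.pending:
            self.pending.append(student_id)

    def run(self):
        # Resolve every queued key in one batched query.
        if self.pending:
            self.cache.update(self.db.assignments_for_many(self.pending))
            self.pending = []

    def get(self, student_id):
        return self.cache[student_id]


def naive(db, student_ids):
    # Issues one query per student.
    return {sid: db.assignments_for(sid) for sid in student_ids}


def batched(db, student_ids):
    # Issues one query in total, regardless of how many students there are.
    loader = BatchLoader(db)
    for sid in student_ids:
        loader.load(sid)
    loader.run()
    return {sid: loader.get(sid) for sid in student_ids}
```

Both functions return the same data, but the naive version's query count grows with the number of students while the batched version stays at one. A GraphQL resolver tree tends to produce the naive pattern by default, which is why a batching layer helps there.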

Previous Episode

VA.gov Provides an API to Get Information about Veterans

In this episode of Running in Production, Charley Stran goes over building the developer.va.gov API with Ruby on Rails and React. It’s running on 10+ auto scaling EC2 instances on AWS GovCloud and has been since mid-2018.

There are 140,000+ lines of code and ~20 developers. We covered what it’s like working on government contracts, how AWS GovCloud differs from the regular AWS platform, the code base being open source, code reviews and a whole lot more.

Topics Include

2:17 – 20 developers (~50 people total) run just the developer.va.gov site
3:10 – The platform has been up and running for 18+ months
4:28 – Motivation for using Ruby on Rails
5:55 – The application is running Rails 5.2, but they want to upgrade to 6.x
6:14 – It’s currently a single Rails monolith but it may get broken up at some point
8:13 – What’s it like working on a government contract?
9:13 – The app is roughly 140,000+ lines of code which is API driven and uses React
10:25 – The entire application is open source on GitHub (to my surprise)
11:32 – What makes React a good fit for this application? Complicated forms mostly
13:56 – The VA has their own UI design specifications publicly posted
15:09 – Tailwind CSS isn’t being used but Charley likes it
16:07 – Docker is being used in production and it runs on AWS GovCloud
17:59 – PostgreSQL and Redis are used but there’s not a ton of data in the DB
18:45 – How AWS GovCloud is different than the regular AWS platform
20:32 – It’s all on EC2 instances that are managed by Terraform and Ansible
21:15 – They use Auto Scaling Groups, CloudWatch, SNS, Elasticsearch and more
22:45 – Sentry.io is being used for error reporting
23:03 – Getting external services approved for usage on AWS GovCloud
23:56 – On average 10-15 t3.large instances power the web servers, but it fluctuates a lot
25:41 – The EC2 instances are running the Amazon Linux 2 AMI
26:35 – Each deploy takes about 20 minutes to run from start to finish
27:28 – Charley walks us through deploying from development to production
29:24 – So far he hasn’t had to get woken up at 3am (except from his 2 year old)
30:07 – Jenkins controls their CI pipeline, which is kicked off from git pushing code
30:54 – With multiple instances and an ELB, there are zero downtime deploys
31:16 – Database migrations can sometimes get complicated
32:14 – They aim for 90%+ test coverage
33:10 – Between 2 and 5 developers typically review code before it gets merged
33:52 – Their team works remotely and waiting for builds can get interesting
35:08 – Rubocop analyzes the code base along with Code Climate
35:50 – A “development” environment exists on AWS but developers run the code locally
36:45 – VCR is used to help cache remote API calls to other VA systems
38:27 – Each API has its own version
39:47 – Attempting to get rid of the need for fax machines
40:41 – All of the data is backed up and recovery would be quick if something went wrong
42:18 – How is Terraform being used?
43:03 – Best tips? With undocumented APIs, write tests and pry into the details
44:10 – Biggest mistakes that were corrected? The mocking layer
45:17 – Every developer is accountable for their work and will help to resolve issues
46:27 – Charley’s consulting company Oddball is hiring and you can also find him on Twitter
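
The VCR topic at 36:45 comes down to a record and replay pattern: the first test run makes the real remote call and saves the response, and later runs read the saved copy. Here is a rough sketch of that idea in Python. It does not use the VCR gem's actual API, and `Cassette`, `fetch` and `real_fetch` are hypothetical names chosen for illustration.

```python
import json
import os


class Cassette:
    """Hypothetical record/replay cache for remote calls, stored as JSON."""

    def __init__(self, path):
        self.path = path
        self.recorded = {}
        # Load earlier recordings if the cassette file already exists.
        if os.path.exists(path):
            with open(path) as f:
                self.recorded = json.load(f)

    def fetch(self, url, real_fetch):
        # Replay: return the saved response without touching the network.
        if url in self.recorded:
            return self.recorded[url]
        # Record: make the real call once and persist the response.
        response = real_fetch(url)
        self.recorded[url] = response
        with open(self.path, "w") as f:
            json.dump(self.recorded, f)
        return response
```

In a test suite, `real_fetch` would be whatever function performs the HTTP request. Once a cassette file exists, the tests no longer depend on the remote system being reachable, which is handy when the upstream APIs are slow or hard to access.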

Next Episode

6DOS Helps You Explore Your Personal Network

In this episode of Running in Production, Henry Popp goes over building a platform that helps you explore your personal network, built with Phoenix and Elixir. It’s hosted on Google Cloud using a self managed Kubernetes cluster. It’s been up and running since September 2019.

Henry went into great detail about the value of using a service oriented architecture, DDD, event driven design and running a self managed Kubernetes cluster. There are a lot of great insights in this episode around general code design and scaling that apply to any web framework.

Topics Include

2:11 – 4 developers are actively working on the project
2:50 – It’s been running in production since September 2019
3:13 – Motivation for using Phoenix and Elixir
4:26 – Henry started using Elixir back in late 2014
5:48 – Ditching Umbrella apps for dedicated services
7:35 – 6DOS is built on 6 independent git repos with a service oriented architecture
8:20 – A break down of what those 6 repos are and what they do
10:37 – Each service has its own independent database (Postgres, Neo4j, Elasticsearch)
11:21 – Neo4j is a graph database which is a great fit for their main data model
12:55 – How is Elixir support for Neo4j?
13:46 – The services talk to each other through RabbitMQ events / notifications
15:43 – Walking through the request / response cycle when a visitor hits the site
17:04 – How did you arrive at this service oriented architecture?
18:33 – It’s easy to get Domain-driven Design (DDD) wrong initially
19:42 – Are Phoenix contexts being used? Nope
20:20 – Monoliths vs micro-services vs something in between and industry trends
20:56 – “Instantaneous complexity”
21:39 – Using an app skeleton project to help spin up new services quickly
23:23 – Using VueJS on the front-end with Webpack, but not through Phoenix
24:43 – Currently 6DOS doesn’t need websockets but that could change later
26:47 – Quite a lot of work happens in the background
27:37 – RabbitMQ handles queueing up all of the jobs
29:10 – Docker is being used in production, but not in development (yet)
29:38 – The work flow for starting everything up locally in development
30:52 – Everything is hosted on a self managed Kubernetes cluster on GCP
31:19 – (3) 2 core master nodes, (3) 2 core worker nodes and extra servers for the databases
32:24 – The self managed Kubernetes cluster was terrifying to set up initially
34:00 – Kubernetes is not a magic button you press to scale your application
35:15 – Auto-recovering from a CrashLoopBackOff error with Kubernetes
37:45 – Those 2 CPU core servers have 8 GB of RAM but the app isn’t using all of that
38:47 – Handling an interesting auto-scaling problem with Kubernetes
40:20 – Performing rolling restarts so there’s no down time for each deploy
40:41 – Dealing with restarting containers while an important action is happening
43:23 – Walking through the deploy process from development to production
43:34 – It starts with a self hosted Gitlab instance with automated CI
44:15 – On the other side, Keel takes over to automate deploying any services
45:12 – Helm is being used for a few things, but not everything
46:17 – Humans needing to accept the release happens within Keel’s UI
47:51 – Secrets are stored directly in the self hosted config repo with strict access rights
49:09 – Balancing your time between low level infrastructure vs app features
49:58 – Handling SSL certificates on the cluster with cert-manager
51:06 – Everything is behind a Cloudflare proxy too
51:20 – Dealing with database migrations when you have automated deployments
52:40 – Migrations get run as part of the app boot up process
54:24 – Design your software like a space ship
55:16 – Diagnosing errors with custom tasks and 3rd party tools
56:23 – No one can agree on how to format API JSON errors
57:32 – Elixir best practices are still being discovered, the future is bright
58:19 – An example of one of Henry’s open source Elixir tools (Pigeon) taking off
59:14 – All of the databases get backed up hourly
1:00:26 – Kubernetes really needs to be configured
1:01:16 – Rate limiting is currently being added to all of the services
1:03:07 – What about alerts if something goes down? It’s a digital notification bomb
1:03:36 – Using UptimeRobot as a second sanity check to make sure things are up
1:04:12 – Hung over at 6am out in the middle of the woods and your server goes down
1:04:55 – Using an external tool like UptimeRobot is worth it<...