Generating reports and KPIs with throw-away databases on AWS

All of us love metrics. All of us want numbers. And completely different stakeholders want completely different numbers. Numbers that may drive key selections inside your group and to your clients. Turning into a knowledge pushed group requires having dependable knowledge within the first place.

No matter is day by day, weekly or month-to-month, there may be at all times a brand new report back to be generated. In Funambol most of those reviews are generated operating a proprietary Java utility that queries the manufacturing MySQL databases. As soon as created, they’re encrypted and delivered to the completely different inside and exterior stakeholders.

If it ain’t broke don’t repair it?

However how will you generate correct reviews, optimize sources, enhance automation with out querying and impacting the dwell deployment? Allow us to first see what points we confronted whereas operating the reviews on the manufacturing database:

  • the execution time of those reviews may differ between a couple of minutes and plenty of hours in keeping with the dimensions of the deployment. And may be affected by the prevailing load on the manufacturing environments. Ought to we dimension the database for the peak load, dealing with site visitors and producing reviews on the identical time? Ought to we dedicate a learn duplicate to reporting exercise solely? Your cloud supplier shall be glad, you’ll be paying however gifting away CPUs and storage.

A drop

  • the lengthy operating queries on the database make it very onerous to attain elasticity. Both you scale your database otherwise you maintain the question operating. You can not obtain each of them and a failure within the report may be expensive.

  • reporting and customers are completely different eventualities. And require very completely different configuration and indexes on the database layer. Compromising to cowl each suggest suboptimal performances of the database. Working reporting on an remoted deployment goes to assist the manufacturing database too as you’ll be able to do away with indexes you do not want anymore. And you’ll save storage and some write IOPS.

  • the outcomes may not be correct as knowledge can change in the course of the execution. It’s a dwell system. If the variety of customers is 1000, you can not report that there are 991 lively and 10 idle customers, as somebody signed up when you generated the report. The distinction may not be vital however what concerning the confidence degree of your stakeholders?

What’s a throw-away database?

We’re all aware of the “Pets vs. Cattle” analogy in software program growth. However right here I outline a throw-away database, our cattle database, as one which we’ve got operating for the minimal time we have to deal with our reporting.

A cow

Allow us to see the way it helps us in decoupling KPI technology and the customers, automating the method as effectively. Funambol deploys its infrastructure on AWS so going ahead on this submit I’ll draw an AWS Structure Diagram and use AWS names and examples. However a lot of the steps may be achieved utilizing the infrastructure and the companies offered by any of the foremost cloud suppliers.

AWS deployment

Allow us to assume you could have an online utility that connects to an Amazon RDS Multi-AZ Deployment. How can we decouple the report technology? We will draw first a diagram with the brand new elements on the appropriate facet.

A throw-away workflow for our KPIs

Allow us to describe the primary steps that we’ve got to place in place in our new workflow, beginning the method from a administration server. In all these we’re utilizing snippets that depend on the AWS CLI, focusing solely on the important thing steps, however you’ll be able to obtain the identical utilizing the AWS SDK within the language you want most.

  • snapshotting the database: one of many first steps is to have the info frozen on the time once we wish to generate the report. We obtain that by making a storage quantity snapshot of the manufacturing DB occasion. For instance, if we wish to generate the KPIs for a system at midnight, that’s the time once we set off the creation of the snapshot:

aws rds create-db-snapshot --db-snapshot-identifier "<my-kpi-snapshot>" --db-instance-identifier "<my-production-db>"

  • we will now create the cattle database from the snapshot: as soon as the snapshot is out there, we create a brand new RDS occasion:

aws rds restore-db-instance-from-db-snapshot --db-snapshot-identifier"<my-kpi-snapshot>" --db-instance-identifier "<my-kpi-db>" (...)

  • we will now configure a distinct parameter group that shall be used for the KPIs solely and can enable to tune the database to the particular queries we’re going to run. We introduce new indexes as effectively and we will optimize the occasion class and kind.

aws rds modify-db-instance --db-instance-identifier "<my-kpi-db>" --vpc-security-group-ids "my-kpi-environment" --backup-retention-period 0 --apply-immediately

  • we will now set off our course of to generate the KPIs on our new non permanent surroundings.

Flow on AWS

  • As soon as the outcomes can be found and delivered to the stakeholders we will **flip off the database **and cease paying for it. We are going to nonetheless have the snapshot we used and will simply reprocess the info if we’ve got any problem.

aws rds delete-db-instance --db-instance-identifier "<my-kpi-db>" --skip-final-snapshot


As any impression to the supply of the reviews might severely impression key stakeholders, we have to have a course of in place to watch and react promptly if the method fail. We will as effectively push the outcomes to CloudWatch Logs and even depend on pattern evaluation / forecasting, implementing our personal algorithms or utilizing CloudWatch Anomaly Detection.

CloudWatch Anomaly Detection


Allow us to see the advantages of including a throw-away database to our deployment:

  • elasticity and value optimization: the manufacturing database doesn’t have to deal with sporadic peaks to generate the reviews and indexes, occasion sizes and configuration may be optimized each on the manufacturing and KPI databases to raised deal with the particular requests. The additional non permanent value of the second (single zone) RDS was considerably lower than the typical value saving on the manufacturing databases because of the cutting down doable in a lot of them.

  • dependable and on-time outcomes: all of the reviews and KPIs at the moment are generated on a database that represents the frozen standing on the particular time: no altering knowledge anymore in the course of the execution of the instruments. The supply time of the reviews is extra predictable as it’s not affected by the variable load of the manufacturing database. Failures in producing the reviews are noticed shortly and might simply and promptly be rectified as we’ve got the frozen standing of the database (the snapshot).

  • automation: a standard automated course of is used throughout all of the dwell deployments of Funambol to generate completely different reviews and KPIs.

  • decoupling: we will optimize database configuration and indexes in keeping with the decoupled use case. As the brand new resolution is now unbiased of the manufacturing database, it makes simpler to plan additional enhancements: as the dimensions of the biggest deployment enhance, we might course of particular knowledge and transfer and consolidate them in a standard warehousing resolution.

Thanks for making it this far! I’m at all times searching for suggestions to make it higher, so please be happy to achieve out to me by way of LinkedIn or email.


This submit initially appeared on the Funambol Tech Blog

Add a Comment

Your email address will not be published. Required fields are marked *