I’ve talked a lot about MITCNC’s migration to Stripe and the underlying software for the WordPress site. This post wraps up the MITCNC series by detailing how we handle data and reporting. Ironically, despite being last in the series, this was my first project with MITCNC. I started it because we didn’t have a good, automated method of combining data from our disparate systems. We were using iModules for membership management and Eventbrite for event registrations, so answering basic questions such as “How many members vs. non-members attended an event?” or “What is the breakdown of attendees by class year?” required a lot of manual effort.
At almost every company I’ve worked at—VistaPrint, edX, Stripe—we had a data warehouse and reporting tools that made answering questions about our users and usage patterns as simple as writing a SQL query. I wanted something similar for MITCNC, along with the ability to re-run saved reports to generate charts and graphs for our leadership meetings.
The first place I looked was Amazon Redshift, which I had used at edX and Stripe. Unfortunately, the cost—$180 per month—was prohibitive given our very small scale; our data only takes up about 8MB! My second (and final) stop was Google Cloud’s BigQuery offering. The pricing—free—is great, and we can generate charts and graphs using Data Studio. Now that I had a place to store data, I needed to figure out how to extract and transform it.
It turns out Google Cloud offers low-priced tooling for these tasks as well, in the form of Cloud Functions and Cloud Scheduler. Every data source we rely upon has a Cloud Function that runs once every 4 to 24 hours, depending on the nature of the data. These functions are responsible for downloading data from a source (e.g., Stripe, the WordPress database), transforming it, and loading it into a BigQuery table. We currently have functions for the following data sources:
The iModules memberships function is actually two functions: one downloads data to a storage bucket, while the other loads that CSV into BigQuery. Why the split? iModules does have an API, but it grants access to all MIT alumni data rather than only MITCNC data, so we cannot use it. Instead, we run a headless browser that logs into the iModules admin interface, clicks buttons to run a saved report, downloads the report, and uploads it to the storage bucket. I am simultaneously proud of and horrified by this script, but it gets the job done.
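To give a feel for the second function’s job, here is a minimal sketch of the transform step that turns a downloaded report CSV into newline-delimited JSON, the row format BigQuery accepts for loads. The column names (`Email`, `Class Year`, `Membership Level`) are hypothetical stand-ins; the real iModules report uses its own headers.

```python
import csv
import io
import json

def transform_memberships_csv(raw_csv: str) -> str:
    """Turn a membership report CSV into newline-delimited JSON rows.

    Column names below are illustrative, not the real report's headers.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({
            # Normalize emails so joins against other sources match.
            "email": record["Email"].strip().lower(),
            # Empty class years become NULL rather than a bogus 0.
            "class_year": int(record["Class Year"]) if record["Class Year"] else None,
            "membership_level": record["Membership Level"],
        })
    return "\n".join(json.dumps(r) for r in rows)

raw = """Email,Class Year,Membership Level
Alum@Example.com,2009,Annual
other@example.com,,Lifetime"""
print(transform_memberships_csv(raw))
```

The normalization step matters more than it looks: once emails are lowercased and class years are typed consistently, joining memberships against Eventbrite attendees in SQL becomes trivial.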
Consolidating all of our data into a single location helps MITAA as well as MITCNC. After our move to Stripe, I wrote a script that lets an MITAA process connect to BigQuery and download membership data in a format that is easily imported into MITAA systems. This ensures that reporting for both organizations remains intact and up to date, regardless of the underlying system that manages memberships or events.
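The export direction is the mirror image of the load: query results come back as rows, and the script renders them as a CSV in whatever layout the downstream import expects. A sketch of that rendering step, with illustrative column names (the real MITAA import format is not something I can reproduce here):

```python
import csv
import io

# Hypothetical column layout; the real export matches MITAA's import spec.
MITAA_COLUMNS = ["email", "member_since", "expires"]

def memberships_to_mitaa_csv(rows):
    """Render membership rows (e.g., from a BigQuery query) as a CSV string."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=MITAA_COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return out.getvalue()

sample = [
    {"email": "alum@example.com", "member_since": "2021-07-01", "expires": "2025-07-01"},
]
print(memberships_to_mitaa_csv(sample))
```

Keeping this as a standalone script means the MITAA side never needs to know that Stripe replaced iModules underneath; only the export format is the contract between the two organizations.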
All of our scripts are available for others to use and expand at https://gitlab.com/mitcnc/reporting. The scripts make use of the following Google Cloud components:
Our usage of BigQuery from a data and query perspective is quite low, so it’s essentially free. I pay less than $1 per month, with the majority of that going to Secret Manager and Cloud Scheduler, followed by Storage (because I haven’t cleaned up old CSV files for the past 1.5 years).
I wish I could share some of our data and visualizations, but they aren’t mine to share. However, if you have questions, don’t hesitate to ask. If you have ideas for improving MITCNC, or the alumni experience in general, please share!