Amplitude is a fantastic analytics tool. At Penny, we use Amplitude for quick-and-dirty funnel tracking through our onboarding flow, referrer tracking to our website, and usage tracking of our app. It’s versatile, easy to use, and free for us since we’re so small!
Amplitude’s enterprise plan gives you access to all the data you ship to Amplitude, but if you’re on the free plan, there’s no fast solution to get your data out if you want to run a more complex custom query. Owning your data for freedom of querying is incredibly useful, but it usually comes at a cost: you have to either pipe your data to both your systems and Amplitude’s, or rebuild Amplitude. Neither is a great solution, especially at our current stage.
Luckily, Amplitude has an export API that allows you to download your raw event data. This would normally be a gigantic endeavor if your app has gigabytes of information in Amplitude, but we’re so small that it took just two minutes to download the data and insert it into a MySQL database. From there, we were able to join Amplitude’s event data with our own data for valuable insights into how we could make Penny better.
Importer Script
You’ll need the httparty
, multiple_files_gzip_reader
, and zipruby
gems to
download and extract the data from Amplitude, which comes in as a zip of gzipped
JSON files. You’ll also need the API key and secret from the settings page of
your Amplitude page. Finally, assuming you’re using ActiveRecord, you’ll need to
run the migration and require
the AmplitudeEvent
model.
This importer script does everything in memory for simplicity. To handle larger datasets that exceed the size of your memory, you should modify the script to stream the downloaded zip file to disk. Once on disk, you should be able decompress it as a stream as well. Alternatively, because Amplitude buckets data by hour, you can import a smaller time window so that the dataset will fit in memory.
Happy Analytics-ing!
With this script in hand, you can get your Amplitude data into your own system and query it to your content. It’s not advisable to do this at scale, but as a tiny team, we’re just looking for quick and effective solutions. At scale, you’d probably want the Enterprise plan to directly access Amplitude’s Redshift stores, and if you wanted to bring that data in-house, you’d do it through Redshift instead of through an API.
Cheers!