Ignored By Dinosaurs 🦕

finops

I've been hacking on a side project lately to try and open source some of the bones of a FinOps visibility tool. You can find the FinOpsPod episode I recorded on the topic recently here. Now that that's out, I've been properly motivated to ship, and while AWS is done enough for now, I have been wrangling the Azure side of things over the weekend. This is what I learned in the last 72 hours.

Azure Blob Storage download progress with TQDM

I searched the internet high and low for how to handle this. AWS makes it fairly easy with the Callback argument you can pass when downloading an object from S3. I guess Azure's version is more recent and it goes like this:

# assumes `import os` and `from tqdm import tqdm` at module level
def download_object(self, blob_name, destination_path):
    blob_client = self.container_client.get_blob_client(blob_name)
    filesize = blob_client.get_blob_properties().size
    # just in case the destination path doesn't exist
    os.makedirs(os.path.dirname(destination_path), exist_ok=True)
    with open(destination_path, "wb") as file, tqdm(
        total=filesize,
        unit="B",
        unit_scale=True,
        desc='whatever helpful file description you wish',
        colour="green",
    ) as t:
        bytes_read = 0
        # the progress_hook is called with 2 args - the bytes downloaded so far
        # and the total bytes of the object.  t.update wants the bytes read in
        # that iteration, so we keep track of the running total from the
        # previous call and pass the difference.
        def update_progress(bytes_amount, *args):
            nonlocal bytes_read
            t.update(bytes_amount - bytes_read)
            bytes_read = bytes_amount

        blob_data = blob_client.download_blob(progress_hook=update_progress)
        file.write(blob_data.readall())
        # no explicit t.close() needed - the with block handles it

In the API docs for the download_blob function you can find the progress_hook kwarg. It isn't called as often as its AWS counterpart so the progress bar isn't nearly as fine-grained, but it's better than nothing in my opinion. The whole thing in general requires more wrangling than the AWS version, but I learned quite a lot in the process.

DuckDB, the ultimate dataeng swiss army knife?

One helpful thing that AWS does in their billing data export is to include a metadata file with each night's export that tells us facts about the export in general. Things like

  • the timestamp that the export was generated
  • where you can find the associated billing data files
  • a unique ID for that particular version of the export, and most helpfully
  • a list of columns and their datatypes in that export.
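To make that concrete, here's a sketch of pulling the column list out of such a metadata file. The JSON below is modeled on the classic CUR manifest shape; the exact key names (assemblyId, reportKeys, columns) vary between export versions, so treat the structure as an assumption:

```python
import json

# Hypothetical manifest, shaped like a classic AWS CUR manifest.
manifest = json.loads("""
{
  "assemblyId": "20240101T000000Z-example",
  "reportKeys": ["my-export/20240101-20240201/data-00001.csv.gz"],
  "columns": [
    {"category": "lineItem", "name": "UnblendedCost", "type": "OptionalBigDecimal"},
    {"category": "lineItem", "name": "UsageStartDate", "type": "DateTime"}
  ]
}
""")

# The flat column names used in the CSV files are category/name pairs
columns = [f"{c['category']}/{c['name']}" for c in manifest["columns"]]
print(columns)  # ['lineItem/UnblendedCost', 'lineItem/UsageStartDate']
```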

For this side project I'm using Clickhouse as the backend warehouse. It's really fun to run huge queries on a huge dataset and have them come back in what feels like 100ms, so I'm a rather big fan of Clickhouse at this point though I'm only just getting to know it. There are fussy things, too. Things like its CSV importing, which is ... not super friendly. Here's an example:

Azure's billing exports come with a Date field that tells you the date of the charge/line item. For some reason, even though my employer is a French company and our bill is in euro, all of the date fields in this bill arrive in the US date format – MM/DD/YYYY. After exhaustive searching, I did find a clue in the Clickhouse documentation that it could parse US-style datetime strings, but I cannot find that piece of documentation again AND the parsing was only available after you'd gotten the data into the warehouse (presumably as a String). I want the thing stored as a date to begin with, so I started to wonder if I could grab DuckDB and use it to parse this stupid Date column for me correctly.

The answer is yes. DuckDB is also a pretty cool piece of gear, so I'm playing with both of these open-source columnar things at the moment. One thing the DuckDB folks have gone out of their way on is making it super easy to ingest data, and to specify all the little weird things that can go wrong, right in their extremely generous default CSV importer – things like “hey, the dateformat should look like this – {strptime string}”. Super cool and works like a charm, so now I have this CSV in memory as a DuckDB table. What else can I do with it?

Well, why spit it back out as CSV, how about spit it back out as Parquet? Clickhouse will have a much easier time reading a Parquet file as it comes along with all the column names and datatypes, so that's what I'm doing. So, I have this function that downloads all the data_files for a given billing export and for the sake of brevity I'll put it here in its current, non-optimized form:

# assumes `import os`, `import duckdb`, and `from typing import List` at module level
def download_datafiles(self, data_files: List[str]):
    local_files = []
    # this downloads each of the CSV files and puts them in a local
    # tmp directory
    for data_file in data_files:
        destination_path = f"tmp/{data_file}"
        print(f"Downloading to {destination_path}")
        self.storage_client.download_object(data_file, destination_path)
        local_files.append(destination_path)
    dirname = os.path.dirname(local_files[0])
    con = duckdb.connect()
    # Here we convert the CSV to Parquet, because DuckDB is excellent at
    # parsing CSV and Clickhouse is a bit fussy in this regard.  The Azure
    # files come over with dates in the format MM/DD/YYYY, which DuckDB
    # can be made to deal with, but Clickhouse cannot.

    # Moreover, Duck can grab any number of CSV files in the target directory and
    # merge them all together for me.  This allows me to generate a single Parquet
    # file from all the CSV files in the directory.  Given that Azure doesn't even
    # gzip these files, this turns 500MB of CSV into 20MB of Parquet.  Not bad.
    con.sql(
        f"""CREATE TABLE azure_tmp AS SELECT * FROM read_csv('{dirname}/*.csv',
            header = true,
            dateformat = '%m/%d/%Y'
        )"""
    )
    con.sql(
        "COPY (SELECT * FROM azure_tmp) TO 'tmp/azure-tmp.parquet' (FORMAT 'parquet')"
    )
    con.close()
    # yes, we do two things in the function right now, it's ok.  We'll refactor
    # and use the "parse the columns and datatypes out of this parquet table" bit
    # probably all over the place.
    return "tmp/azure-tmp.parquet"

#data #data-engineering #azure #finops

First day of my job at my current employer (almost 7 years ago now) I cracked open the company handbook to start onboarding. I remember it saying something like “Reservations are the lifeblood of this company” and thinking “wow, what does that mean?”

I'd had a job prior to this one that had some Stuff on AWS so I was familiar with the concept – something something pay upfront get reduced prices – but it was far from the lifeblood of the company. It was something the IT head took care of, and he didn't seem that pleased to do it either. So, six years after that I find myself in charge of FinOps here and it has a lot to do with reservations. Indeed, reservations are the main lever that you have to pull on the job. Let's talk about it…

What are reservations?

I recently heard reservations described as a “forward contract”, which Investopedia (one of the most useful resources in this leg of my career) describes thusly:

A forward contract is a customized contract between two parties to buy or sell an asset at a specified price on a future date.

The promise of the cloud is that you, developer, can push a button and spin up a VM or any number of other network-connected computing resources that you didn't have to ask IT for. It's why the default pricing model for these resources is called “On Demand”. AWS introduced the idea several years ago that if you committed to running that resource for a year or more, you could receive a reduced price. You could pay for the year either entirely upfront to receive the deepest discount, entirely over time as a monthly payment for a shallower discount, or somewhere in between – the “partial upfront”, which mixes an upfront payment with a monthly payment for those resources.

At this point, you might be starting to learn concepts like “amortization” and “the time value of money” to understand why you would choose one of these; if not, I suggest looking them up.
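To make the amortization idea concrete, here's a sketch with entirely made-up prices: spread a partial-upfront reservation's total cost over the term to get an effective hourly rate you can compare against on demand.

```python
# All numbers below are assumptions for illustration, not real cloud prices.
on_demand_hourly = 0.10   # $/hr, the undiscounted rate
upfront = 300.00          # one-time payment for a partial-upfront reservation
monthly = 20.00           # recurring monthly payment on the same reservation
term_hours = 24 * 365     # 1-year term

# Amortize: total commitment spread evenly over every hour of the term
effective_hourly = (upfront + monthly * 12) / term_hours
print(f"effective: ${effective_hourly:.4f}/hr vs on demand: ${on_demand_hourly:.2f}/hr")
```

With these numbers the reservation works out to roughly $0.06/hr, a ~38% discount – provided the resource actually runs for the whole term.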

Uh, ok

There is a balancing act in place with reservations. Obviously you want to receive those lower rates wherever possible, but your resource usage might be, indeed probably is, rather variable. “Elastic”. Suppose you overpurchase a reservation, all upfront so that the entire year is paid for, but then the resource is turned off after 6 months – you've now paid for half a year of capacity that nobody used. So my latest analogy for what FinOps is largely about is a double-ended inventory job. You have an inventory of resources to cover with reservations and you have an inventory of the reservations themselves. One can be created or turned off in an instant, the other lives for 1 or 3 years.
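Here's that overpurchase scenario with made-up numbers – a year-long all-upfront reservation at roughly 40% off, on a resource that only lived 6 months:

```python
# Assumed prices for illustration only.
on_demand_hourly = 0.10             # $/hr undiscounted
reserved_upfront = 0.06 * 24 * 365  # whole year paid upfront at the discounted rate
hours_actually_used = 24 * 365 / 2  # resource turned off at the 6-month mark

# On demand you'd only have paid for the hours that actually ran
on_demand_cost = on_demand_hourly * hours_actually_used
print(f"reserved: ${reserved_upfront:.2f}, on demand would have been: ${on_demand_cost:.2f}")
# The "discount" ends up costing more than just paying on demand.
```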

#business #finops

Last night I had a really organized map in my head of things I'd like to tell my younger self about FinOps. This morning it is gone. Let this be a lesson to me – jot some notes down. It was a primer course, from the point of view of a data person who was placed in charge of a FinOps practice – how to think about FinOps, what data you are going to need, what the terms and costs mean, etc.

So what is FinOps?

Well, it's the driest sounding topic that I've ever found incredibly interesting (so far). Essentially, the cloud has upended what used to be an agreeably distant relationship between Engineering teams and Finance teams.

If an Eng team needed to launch a new thing to the young internet in the year 1999, they went through a procurement process with their employer's Finance team. A server was purchased and placed in a rack somewhere and the interaction was largely done – Finance depreciated the hardware as they saw fit and Engineering optimized the workloads on that hardware as they saw fit. It was paid for, who cared after that?

Well, The Cloud screwed all that up. The cloud allows engineers to directly spend company money by pressing a button. Pressing buttons and launching resources without asking anybody is fun af, so Eng did it, lots. Some time later the bill comes to the old IT team or to Finance and friction entered the chat.

Finance could no longer control IT outflows. Engineering could no longer be totally ignorant of the company money they were spending. Both sides needed more information to do their jobs and make better decisions and into that dysfunctional gap grew the practice of FinOps.

How does FinOps Op?

“Financial Operations” is, I guess, what it stands for. See, cloud vendors – AWS, Google Cloud Platform aka GCP, and Azure (Microsoft's cloud) – don't make their money by making it easy for an Engineering team to understand the impact of their hardware decisions. They don't make their money by making it easy for Finance teams to surface anomalies in spending. They don't make their money by generating understandable reporting and forecasting tools. They make their money by selling Moar Cloud. And it turns out one of the easiest ways to sell Moar Cloud is by making all of the above as difficult as possible!

I'm being cheeky and slightly humorous, or so I tend to think over my morning coffee. Truth is, these are huge suites of very complex products, upon which the largest companies in the world are:

  • running their enormous, heterogeneous workloads
  • across dozens or hundreds of products within the cloud vendor's catalog and
  • asking to be able to report on any one of these workloads in a manner that fits their organization.

So what pops out of these requirements is typically a very granular bill with millions (or billions, so I hear) of line items. Those line items were generated by the various teams that built the products within the suite, so they tend to be pretty heterogeneous themselves in terms of data points and consistency.

This is where FinOps finally steps in. It's basically a heavily data-backed job of informing both sides of the equation in as close to realtime as possible about the workloads and the financial impact of the workloads.

I intend next chapter to talk about “reservations”, which are part of the bread and butter of cost management and therefore of the FinOps domain.

#business #finops

I am on my way to San Diego for the first Platformer meetup thing that I've been to since the Beforetimes, at least the first that's not just my team. On my way to San Diego I stopped by Austin for the FinOpsX conference, an amazing little thing put on by ... maybe the Linux Foundation through some other community, idk.

Anyway, it was really amazing. Open source is just so much fun, I'm really glad that I found it all those years ago. This conference was smallish, like 400 people, but had all the open source vibes that the Drupal scene had back in the day, or the Python scene had the only time I dipped my toes in a few years back.

I'm really rather enjoying my gig these days after a long dark winter of feeling pretty hollowed out. Been doing some work and am feeling much healthier now. One of the lovely things about going to a conference or just getting outside your bubble in general is learning a little about how other companies operate.

See, here inside the bubble at Platform it seems sometimes like everything is going to take so long, and it's so difficult coordinating all these people and their worklives. Sometimes it's hard to see the bigger picture. One of my main takeaways is that we actually have our act together in many, many ways at Platform, and one of the ways in which I am most proud is our data setup, where pretty much everything we need is in a place where you can find it. Most companies don't even have this much.

I am about to fully inherit the FinOps function here when one of my people moves on to a new gig in a few weeks, and I'm mostly pretty excited about the opportunity to remake a little part of the world here that seems to cause a couple people some stress.

So yeah, somehow data and finance are the things that are interesting to me now.

#life #finops

Just thought I'd drop myself a line here and remind me about that time that I was getting FinOps certified, because it's so much more interesting than I would've thought.

Basically, back in the old days, there were data centers, and if you wanted a new resource in one of those data centers you had to go through a procurement cycle involving Finance and probably a procurement team. You'd buy the resource and that would count as CapEx in your P&L or whatever. It'd get installed and then you could use it. That CapEx would be depreciated and the world would keep turning, pretty predictably, just like the Finance team likes it.

This meant much longer planning and procurement loops for most technology teams, loops that are gone now in the era of “Cloud” and “devops” generally. This is mostly great. It also meant that the old methods of controlling costs are gone and that the ability to spend company money has been handed directly to development teams. This is potentially bad.

This should require much more feedback between the two teams – Eng and Finance – and much greater visibility into the company's resource usage for the Eng teams spending the money.

This is FinOps. A continual process of building, monitoring, and optimizing that allows companies to move SO much faster than they used to be able to.

#business #finops

Hello, me. This post is to jot down what I think I know about the AWS CUR bill at this point in time. There are entire companies built around helping other companies understand this bill, so this should be considered a primer.

There are several columns that matter in this giant bill, and many dozens that do not matter, and about a hundred or so that fall somewhere in between. The ones that I know about so far that really matter when you're trying to decode a bill AT SCALE are:

Line Item unblended cost

Basically everything that is not an EC2 instance covered by some sort of reservation or Savings Plan is going to show up in this column – storage, network costs, etc.

Confusingly, line items that have a Savings Plan applied to them will also show up in this column, making it necessary to check the “Line Item Type” column to see if the item falls under plain old “Usage”, in which case you're paying the on-demand price. If the Line Item Type column says anything other than “Usage”, then it's not what you're actually paying for that resource.

For resource usage line items that have a Reservation applied to them this column should be $0, but for resource usage line items that have a Savings Plan applied to them this column will contain the full, undiscounted price and you need to look elsewhere to find out what you're really paying.

Reservation Effective Cost

Any resource usage line items that have some sort of reservation applied to them will show up in this column. This column and the Unblended Cost column are, afaict, mutually exclusive. A line item can have a cost in one or the other, but not both. From the documentation:

The sum of both the upfront and hourly rate of your RI, averaged into an effective hourly rate.

These line items have a Line Item Type of “DiscountedUsage”.

Public on demand cost

This column is mostly useful for sanity checking as you do your math against the other columns. If you correctly sum up your Savings Plan Effective Costs, your Reservation Effective Costs, and your on-demand Unblended Costs, then you should end up with a number that is lower than the sum of this column.

Ideally it should be much lower, and I have not yet figured out if it is possible for the sum of all your discounted effective costs to be higher than this number.

If you incorrectly forecast your future usage and purchase discounts in excess of what you actually use, then you're potentially throwing away more money than what you'd be paying on demand. This is the nightmare scenario, of course, and the FinOps practitioner makes their money by avoiding this for their liege.

Savings Plan Effective Cost

The proportion of the Savings Plan monthly commitment amount (Upfront and recurring) that is allocated to each usage line.

This cost column is mutually exclusive with the Reservation Effective Cost, but not with the Unblended Cost. The Unblended column will contain the full, on-demand price for the resource in question, but for line items with a Line Item Type of “SavingsPlanCoveredUsage” the Savings Plan Effective Cost should be used.

I haven't tried this yet, but off the top of my head it would look something like

CASE
    WHEN line_item_type = 'DiscountedUsage' THEN reservation_effective_cost
    WHEN line_item_type = 'SavingsPlanCoveredUsage' THEN savings_plan_effective_cost
    ELSE unblended_cost
END AS actual_cost,

CASE
    WHEN line_item_type = 'DiscountedUsage' THEN 'Reservation Effective Cost'
    WHEN line_item_type = 'SavingsPlanCoveredUsage' THEN 'Savings Plan Effective Cost'
    WHEN line_item_type = 'Usage' THEN 'Unblended Cost'
    ELSE CONCAT('Unblended Cost: ', line_item_type)
END AS actual_cost_source
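For poking at the same decision outside the warehouse, here's that logic as a tiny Python helper (the snake_case column names are assumed to match however you've landed the CUR data):

```python
def actual_cost(row: dict) -> tuple:
    """Pick the cost column to trust for a CUR line item, mirroring the CASE above."""
    lit = row["line_item_type"]
    if lit == "DiscountedUsage":
        return row["reservation_effective_cost"], "Reservation Effective Cost"
    if lit == "SavingsPlanCoveredUsage":
        return row["savings_plan_effective_cost"], "Savings Plan Effective Cost"
    if lit == "Usage":
        return row["unblended_cost"], "Unblended Cost"
    # anything else (Tax, Fee, Credit, ...) falls back to unblended,
    # but we flag the type so it's visible in reporting
    return row["unblended_cost"], f"Unblended Cost: {lit}"
```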

Good luck!

#business #finops