Home / Intelligence Log / Data Architecture Data Architecture

Automated Client Reporting with Python and Google Sheets API

The Reporting Problem at Scale

Client reporting is deceptively time-consuming. Building one report manually takes an hour. Building twelve takes a day — and that is before accounting for the inevitable data corrections, formatting inconsistencies, and version control confusion when the report gets emailed back with comments. For consultants managing more than five active clients, manual reporting is a meaningful drag on capacity that compounds every month.

The solution is a reporting engine that treats each client as a parameterized configuration: the same extraction, transformation, and loading logic running against each client's data source, writing structured output to a dedicated Google Sheets workbook, and flagging anomalies automatically. Once built, the marginal cost of adding a new client report is near zero.

Pipeline Architecture

The engine follows a standard ETL pattern with a client-scoped run loop: extract from the source database, transform into the reporting schema, and load into the target Google Sheet. Each run creates or updates a tab named by convention — Report_CLIENTID_YYYY_MM — which keeps the workbook organized and makes historical tabs easy to find by date.

Google Sheets API v4 with a service account is the right authentication approach for server-side automation. Unlike OAuth flows that require a user to authorize each session, a service account holds its credentials in a JSON keyfile that the engine loads at startup — no browser interaction, no token refresh failures at 3 AM.

The Reporting Engine

Python reporting_engine.py
from google.oauth2 import service_account
from googleapiclient.discovery import build
import psycopg2, pandas as pd
from datetime import date

SCOPES = ['https://www.googleapis.com/auth/spreadsheets']

class ClientReportingEngine:
    def __init__(self, keyfile: str, db_conn: str):
        creds = service_account.Credentials.from_service_account_file(
            keyfile, scopes=SCOPES
        )
        self.sheets = build('sheets', 'v4', credentials=creds).spreadsheets()
        self.db     = psycopg2.connect(db_conn)

    def run(self, client_id: str, spreadsheet_id: str):
        raw  = self._extract(client_id)
        data = self._transform(raw)
        self._load(spreadsheet_id, client_id, data)

    def _extract(self, client_id: str) -> pd.DataFrame:
        query = """
            SELECT date_trunc('day', created_at) AS day,
                   transaction_type,
                   SUM(amount)                  AS revenue,
                   COUNT(*)                     AS txn_count
            FROM transactions
            WHERE client_id = %(client_id)s
              AND created_at >= NOW() - INTERVAL '90 days'
            GROUP BY 1, 2
            ORDER BY 1"""
        return pd.read_sql(query, self.db, params={'client_id': client_id})

    def _transform(self, df: pd.DataFrame) -> pd.DataFrame:
        pivot = df.pivot_table(
            index='day', columns='transaction_type',
            values='revenue', aggfunc='sum', fill_value=0
        )
        pivot['total'] = pivot.sum(axis=1)
        pivot['wow_change'] = pivot['total'].pct_change(7) * 100
        return pivot.reset_index().round(2)

    def _load(self, spreadsheet_id: str, client_id: str, df: pd.DataFrame):
        tab = f"Report_{client_id}_{date.today().strftime('%Y_%m')}"
        self._ensure_sheet_exists(spreadsheet_id, tab)

        values = [df.columns.tolist()] + df.values.tolist()
        self.sheets.values().update(
            spreadsheetId=spreadsheet_id,
            range=f"{tab}!A1",
            valueInputOption='USER_ENTERED',
            body={'values': values}
        ).execute()

    def _ensure_sheet_exists(self, spreadsheet_id: str, title: str):
        meta = self.sheets.get(spreadsheetId=spreadsheet_id).execute()
        existing = [s['properties']['title'] for s in meta['sheets']]
        if title not in existing:
            self.sheets.batchUpdate(
                spreadsheetId=spreadsheet_id,
                body={'requests': [{'addSheet': {'properties': {'title': title}}}]}
            ).execute()

WoW Metrics and Pivot Tables

The week-over-week change calculation — pct_change(7) on the daily total revenue column — gives each report day's revenue as a percentage change from the same day seven days prior. This is more informative than a rolling average for operational reporting because it surfaces directional momentum without smoothing out the day-of-week patterns that dominate most transaction data.

The pivot table transformation turns a normalized transaction log into a report-ready format: one row per day, one column per transaction type, with a total and WoW change column appended. This format loads cleanly into Google Sheets and can be sorted, filtered, and charted without any manual manipulation on the client's side.

Scheduling and Slack Alerting

Python scheduler.py
import requests, schedule, time

CLIENT_CONFIGS = [
    {'client_id': 'ACME',  'spreadsheet_id': '1BxiM...'},
    {'client_id': 'CORP2', 'spreadsheet_id': '9KzpQ...'},
]
SLACK_WEBHOOK = "https://hooks.slack.com/services/..."

def alert(message: str):
    requests.post(SLACK_WEBHOOK, json={'text': message})

def run_all_clients():
    engine = ClientReportingEngine('service_account.json', DB_CONN)
    for cfg in CLIENT_CONFIGS:
        try:
            engine.run(cfg['client_id'], cfg['spreadsheet_id'])
            alert(f":white_check_mark: Report complete: {cfg['client_id']}")
        except Exception as e:
            alert(f":x: Report FAILED for {cfg['client_id']}: {e}")

schedule.every().day.at("06:00").do(run_all_clients)
while True:
    schedule.run_pending()
    time.sleep(60)

Output Sanity Checks

Automated reports that produce wrong numbers without anyone noticing are worse than no report at all. An output sanity check runs before the Slack success message fires and verifies that the report meets basic validity criteria: total revenue is greater than zero, the row count falls within an expected range, and no column contains entirely null values.

When a sanity check fails, the alert fires a failure message rather than a success message, with enough detail for a human to diagnose the problem. This pattern means that a failed report surfaces immediately and visibly rather than quietly producing a zero-revenue row that a client might not notice until their next weekly review.

Tired of Building Reports by Hand?

We build automated reporting pipelines that deliver accurate, formatted client reports on schedule — without your team touching a spreadsheet.

Talk to Our Team