{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introductory example\n", "\n", "In this notebook, we compute income taxes and social security contributions for example\n", "data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "from gettsim import InputData, MainTarget, TTTargets, copy_environment, main, tt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating the Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step in GETTSIM's workflow is to define the targets of the taxes and transfers\n", "system you are interested in. Each sequence of nested keys in the dictionary below is a\n", "path that GETTSIM will use as a target. For instance, via the path made up of\n", "`einkommensteuer` and `betrag_m_sn`, we request the amount of income tax to be paid\n", "monthly at the Steuernummer level. *Note: Of course, the income tax is paid annually and\n", "calculated at that level, but GETTSIM will do the conversion for you.*\n", "\n", "The values at the lowest level of the dictionaries (called leaves) will be used as the\n", "column names of the resulting DataFrame. Here, `income_tax_m` will be the name of the\n", "column containing the income tax results.\n", "\n", "In this example, we are interested in the income tax and the social insurance\n", "contributions that people in regular employment pay." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TT_TARGETS = {\n", " \"einkommensteuer\": {\"betrag_m_sn\": \"income_tax_m\"},\n", " \"sozialversicherung\": {\n", " \"pflege\": {\n", " \"beitrag\": {\n", " \"betrag_versicherter_m\": \"long_term_care_insurance_contribution_m\"\n", " }\n", " },\n", " \"kranken\": {\n", " \"beitrag\": {\"betrag_versicherter_m\": \"health_insurance_contribution_m\"}\n", " },\n", " \"rente\": {\n", " \"beitrag\": {\"betrag_versicherter_m\": \"pension_insurance_contribution_m\"}\n", " },\n", " \"arbeitslosen\": {\n", " \"beitrag\": {\n", " \"betrag_versicherter_m\": \"unemployment_insurance_contribution_m\"\n", " }\n", " },\n", " },\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we need to find out which input data we actually need to calculate the targets we\n", "are interested in. We can do this by specifying a template as the `main_target` of\n", "`gettsim.main`. The template returns the input variables needed to compute the specified\n", "`tt_targets`.\n", "\n", "Some of these inputs are computed from other inputs. If you already know the value of\n", "such a computed input, you can provide it directly in the template call. GETTSIM will\n", "then exclude its upstream dependencies from the template, giving you a shorter list of\n", "remaining inputs to fill. For example, the old-age pension benefit\n", "(`sozialversicherung__rente__altersrente__betrag_m`) depends on many pension-related\n", "inputs (entitlement points, contribution months, etc.). Since nobody in our scenario is\n", "retired, we provide it as 0, which removes all of those upstream inputs from the\n", "template." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "main(\n", " main_target=MainTarget.templates.input_data_dtypes.tree,\n", " policy_date_str=\"2025-01-01\",\n", " tt_targets=TTTargets.tree(TT_TARGETS),\n", " input_data=InputData.tree(\n", " {\n", " \"p_id\": pd.Series([0]),\n", " \"sozialversicherung\": {\n", " \"rente\": {\n", " \"altersrente\": {\"betrag_m\": pd.Series([0])},\n", " },\n", " },\n", " }\n", " ),\n", " include_warn_nodes=False,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output above is a nested dictionary whose leaves are dtype hints. Each leaf\n", "corresponds to an input variable that GETTSIM needs. To build the mapper (below), we\n", "replace each dtype hint with a column name from our input DataFrame." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we create some example data. Our example household consists of a married couple\n", "(both 30 years old, both employed) with a 10-year-old child. Here, we use a pandas\n", "DataFrame with column names that are different from the ones GETTSIM expects." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DATA = pd.DataFrame(\n", " {\n", " \"age\": [30, 30, 10],\n", " \"working_hours\": [35, 35, 0],\n", " \"disability_grade\": [0, 0, 0],\n", " \"birth_year\": [1995, 1995, 2015],\n", " \"hh_id\": [0, 0, 0],\n", " \"p_id\": [0, 1, 2],\n", " \"self_employed\": [False, False, False],\n", " \"income_from_self_employment\": [0, 0, 0],\n", " \"income_from_rent\": [0, 0, 0],\n", " \"income_from_employment\": [5000, 4000, 0],\n", " \"income_from_forest_and_agriculture\": [0, 0, 0],\n", " \"income_from_capital\": [500, 0, 0],\n", " \"income_from_other_sources\": [0, 0, 0],\n", " \"contribution_to_private_pension_insurance\": [0, 0, 0],\n", " \"childcare_expenses\": [0, 0, 0],\n", " \"person_that_pays_childcare_expenses\": [-1, -1, 0],\n", " \"joint_taxation\": [True, True, False],\n", " \"contribution_private_health_insurance\": [0, 0, 0],\n", " \"has_children\": [True, True, False],\n", " \"single_parent\": [False, False, False],\n", " \"is_child\": [False, False, True],\n", " \"spouse_id\": [1, 0, -1],\n", " \"parent_id_1\": [-1, -1, 0],\n", " \"parent_id_2\": [-1, -1, 1],\n", " \"in_training\": [False, False, False],\n", " \"id_recipient_child_allowance\": [-1, -1, 0],\n", " }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we define a mapping from GETTSIM's expected input structure to our data. At each\n", "leaf, we either put a column name from `DATA` or a constant value." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MAPPER = {\n", " \"alter\": \"age\",\n", " \"arbeitsstunden_w\": \"working_hours\",\n", " \"behinderungsgrad\": \"disability_grade\",\n", " \"geburtsjahr\": \"birth_year\",\n", " \"hh_id\": \"hh_id\",\n", " \"p_id\": \"p_id\",\n", " \"einnahmen\": {\n", " \"bruttolohn_m\": \"income_from_employment\",\n", " \"kapitalerträge_m\": \"income_from_capital\",\n", " \"renten\": {\n", " \"betriebliche_altersvorsorge_m\": 0.0,\n", " \"geförderte_private_vorsorge_m\": 0.0,\n", " \"sonstige_private_vorsorge_m\": 0.0,\n", " \"aus_berufsständischen_versicherungen_m\": 0.0,\n", " },\n", " },\n", " \"einkommensteuer\": {\n", " \"einkünfte\": {\n", " \"ist_hauptberuflich_selbstständig\": \"self_employed\",\n", " \"aus_gewerbebetrieb\": {\"betrag_m\": \"income_from_self_employment\"},\n", " \"aus_vermietung_und_verpachtung\": {\"betrag_m\": \"income_from_rent\"},\n", " \"aus_forst_und_landwirtschaft\": {\n", " \"betrag_m\": \"income_from_forest_and_agriculture\"\n", " },\n", " \"aus_selbstständiger_arbeit\": {\"betrag_m\": \"income_from_self_employment\"},\n", " \"sonstige\": {\n", " \"alle_weiteren_m\": \"income_from_other_sources\",\n", " \"rente\": {\n", " \"alter_beginn_leistungsbezug_sonstige_private_vorsorge\": 65,\n", " },\n", " },\n", " },\n", " \"abzüge\": {\n", " \"beitrag_private_rentenversicherung_m\": (\n", " \"contribution_to_private_pension_insurance\"\n", " ),\n", " \"kinderbetreuungskosten_m\": \"childcare_expenses\",\n", " \"p_id_kinderbetreuungskostenträger\": \"person_that_pays_childcare_expenses\",\n", " },\n", " \"gemeinsam_veranlagt\": \"joint_taxation\",\n", " },\n", " \"sozialversicherung\": {\n", " \"rente\": {\n", " \"jahr_renteneintritt\": 2080,\n", " \"altersrente\": {\n", " \"betrag_m\": 0.0,\n", " },\n", " \"erwerbsminderung\": {\n", " \"betrag_m\": 0.0,\n", " },\n", " },\n", " \"kranken\": {\n", " \"beitrag\": {\"privat_versichert\": 
\"contribution_private_health_insurance\"}\n", " },\n", " \"pflege\": {\"beitrag\": {\"hat_kinder\": \"has_children\"}},\n", " },\n", " \"familie\": {\n", " \"alleinerziehend\": \"single_parent\",\n", " \"kind\": \"is_child\",\n", " \"p_id_ehepartner\": \"spouse_id\",\n", " \"p_id_elternteil_1\": \"parent_id_1\",\n", " \"p_id_elternteil_2\": \"parent_id_2\",\n", " },\n", " \"kindergeld\": {\n", " \"in_ausbildung\": \"in_training\",\n", " \"p_id_empfänger\": \"id_recipient_child_allowance\",\n", " },\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In practice, you would probably save the template to disk (e.g. as a YAML file), edit\n", "the leaves there, and read it back in as the mapper. Remember to allow for Unicode\n", "characters and to read and write the file as UTF-8, since many variable names contain\n", "umlauts.\n", "\n", "```python\n", "import yaml\n", "\n", "# PATH_FOR_TEMPLATE (a pathlib.Path) and TEMPLATE (the template output from above)\n", "# are placeholders that you need to define yourself.\n", "with PATH_FOR_TEMPLATE.open(\"w\", encoding=\"utf-8\") as f:\n", "    yaml.dump(TEMPLATE, f, allow_unicode=True)\n", "\n", "# Edit the leaves in the template, then read it back in\n", "with PATH_FOR_TEMPLATE.open(\"r\", encoding=\"utf-8\") as f:\n", "    MAPPER = yaml.safe_load(f)\n", "```\n", "\n", "Some inputs may not be directly relevant to the scenario at hand. For example,\n", "`jahr_renteneintritt` and `alter_beginn_leistungsbezug_sonstige_private_vorsorge` only\n", "matter for people who actually receive pensions. Because GETTSIM's DAG is static, these\n", "inputs are still required even when the corresponding benefit is zero. In such cases,\n", "assign a reasonable default value. The exact number does not matter (as long as the\n", "resulting benefit itself is zero), but it must be a valid input (e.g. a plausible year,\n", "not 0 or `None`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculating taxes and transfers\n", "\n", "GETTSIM's `main` function is powered by a directed acyclic graph (DAG). 
This has several advantages:\n", "- You can select any part of the DAG as a target, giving access to intermediate results.\n", "- You can feed any part of the DAG as input, overwriting specific parts (e.g. the\n", " policy environment).\n", "- You can skip parts of the DAG (e.g. safety checks on input data) to speed up\n", " computation, at the expense of less informative error messages.\n", "\n", "First, we compute the targets defined above using the input data. In a second example,\n", "we manipulate the policy environment to see why the interface DAG is useful.\n", "\n", "### Simple computation\n", "\n", "Let's calculate taxes and transfers first:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = main(\n", " policy_date_str=\"2025-01-01\",\n", " input_data=InputData.df_and_mapper(\n", " df=DATA,\n", " mapper=MAPPER,\n", " ),\n", " main_target=MainTarget.results.df_with_mapper,\n", " tt_targets=TTTargets.tree(TT_TARGETS),\n", " include_warn_nodes=False,\n", ")\n", "result.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Manipulating the policy environment\n", "\n", "First, we obtain the policy environment for the policy date we're interested in. As\n", "above, we call the `main` function, this time with the policy environment as the main\n", "target." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status_quo = main(\n", " policy_date_str=\"2025-01-01\",\n", " main_target=MainTarget.policy_environment,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us modify the policy environment by increasing the contribution rate of the public\n", "pension insurance by 1 percentage point.\n", "\n", "The first step is to create a copy, so that `status_quo` itself stays unchanged." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "increased_rate = copy_environment(status_quo)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The contribution rate is a `ScalarParam` object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(status_quo[\"sozialversicherung\"][\"rente\"][\"beitrag\"][\"beitragssatz\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We read the current `value` of the `ScalarParam` and then place a new `ScalarParam`\n", "object with the increased value at the same position in the copied environment,\n", "`increased_rate`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "old_beitragssatz = status_quo[\"sozialversicherung\"][\"rente\"][\"beitrag\"][\"beitragssatz\"]\n", "increased_rate[\"sozialversicherung\"][\"rente\"][\"beitrag\"][\"beitragssatz\"] = (\n", " tt.ScalarParam(value=old_beitragssatz.value + 0.01)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can compute taxes and transfers with the increased contribution rate:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = main(\n", " main_target=MainTarget.results.df_with_mapper,\n", " policy_date_str=\"2025-01-01\",\n", " policy_environment=increased_rate,\n", " input_data=InputData.df_and_mapper(\n", " df=DATA,\n", " mapper=MAPPER,\n", " ),\n", " tt_targets=TTTargets.tree(TT_TARGETS),\n", " include_warn_nodes=False,\n", ")\n", "result.T" ] } ], "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 }