{ "cells": [ { "cell_type": "markdown", "id": "6b089476", "metadata": {}, "source": [ "# Understanding Leaspy's Data Containers: `Data` and `Dataset`\n", "\n", "In `leaspy`, transforming raw data (like a CSV) into a model-ready format involves two key classes: `Data` and `Dataset`. Understanding their distinct roles gives you full control over your analysis.\n", "\n", "---\n", "\n", "## 1. The `Data` Class: The User Interface\n", "\n", "The `Data` class is your **primary tool** for loading, organizing, and inspecting data. It acts as a flexible, patient-centric container that bridges the gap between raw spreadsheets and the model.\n", "\n", "## Key Features & Methods\n", "\n", "### **Loading Data**\n", "Use the factory method to load from a pandas DataFrame. Note that joint models need an extra `data_type='joint'` argument." ] }, { "cell_type": "code", "execution_count": 7, "id": "c6de245e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " ID TIME EVENT_TIME EVENT_BOOL Y0 Y1 Y2 Y3\n", "0 116 78.461 85.5 1 0.44444 0.04 0.0 0.0\n", "1 116 78.936 85.5 1 0.60000 0.00 0.0 0.2\n", "2 116 79.482 85.5 1 0.39267 0.04 0.0 0.2\n" ] } ], "source": [ "import os\n", "import pandas as pd\n", "import leaspy\n", "from leaspy.io.data import Data\n", "\n", "leaspy_root = os.path.dirname(leaspy.__file__)\n", "data_path = os.path.join(leaspy_root, \"datasets/data/simulated_data_for_joint.csv\")\n", "df = pd.read_csv(data_path, dtype={\"ID\": str}, sep=\";\")\n", "\n", "data = Data.from_dataframe(df) \t\t\t\t\t\t\t# <-\n", "# For joint models (longitudinal + time-to-event):\n", "data_joint = Data.from_dataframe(df, data_type='joint')\t# <-\n", "print(df.head(3))" ] }, { "cell_type": "markdown", "id": "73706687", "metadata": {}, "source": [ "### **Inspection**\n", "Access data naturally by patient ID or index. This works because `Data` supports indexing and iteration, so you can also loop over individuals with a `for` loop. 
So you can:\n", "* select an individual with brackets (`data['116']`)\n", "* check attributes such as the number of individuals (`data.n_individuals`)\n", "* convert the whole dataset, or a subset of individuals, back into a DataFrame (`data[['116']].to_dataframe()`)\n", "* iterate over individuals (`for individual in data:`)\n", "* iterate with a running index (`for i, individual in enumerate(data):`)" ] }, { "cell_type": "code", "execution_count": 16, "id": "6e92c403", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of patients: 17\n", "patient data (observations shape): (9, 6)\n", "patient data (dataframe):\n", " ID TIME EVENT_TIME EVENT_BOOL Y0 Y1 Y2 Y3\n", "0 116 78.461 85.5 1.0 0.44444 0.04 0.0 0.0\n", "1 116 78.936 85.5 1.0 0.60000 0.00 0.0 0.2\n", "2 116 79.482 85.5 1.0 0.39267 0.04 0.0 0.2\n", "3 116 79.939 85.5 1.0 0.58511 0.00 0.0 0.0\n", "4 116 80.491 85.5 1.0 0.57044 0.00 0.0 0.0\n", "5 116 81.455 85.5 1.0 0.55556 0.20 0.1 0.2\n", "6 116 82.491 85.5 1.0 0.71844 0.20 0.1 0.6\n", "7 116 83.463 85.5 1.0 0.71111 0.32 0.2 0.6\n", "8 116 84.439 85.5 1.0 0.91111 0.52 0.6 1.0\n", "\n", "Iterating over first 3 patients:\n", " - Patient 116: 9 visits\n", " - Patient 142: 11 visits\n", " - Patient 169: 7 visits\n", "Patient ID: 116\n", "Patient ID: 142\n", "Patient ID: 169\n" ] } ], "source": [ "from itertools import islice\n", "\n", "patient_data = data['116'] # Get a specific individual\n", "n_patients = data.n_individuals # Get total count\n", "print(f\"Number of patients: {n_patients}\")\n", "print(f\"patient data (observations shape): {patient_data.observations.shape}\")\n", "print(f\"patient data (dataframe):\\n{data[['116']].to_dataframe()}\")\n", "print(\"\\nIterating over first 3 patients:\")\n", "for individual in islice(data, 3): # Take only the first 3 individuals\n", "    print(f\" - Patient {individual.idx}: {len(individual.timepoints)} visits\")\n", "for i, individual in enumerate(data):\n", "    if i >= 3: break # Stop after 3 iterations using the index 'i'\n", "    print(f\"Patient ID: 
{individual.idx}\")" ] }, { "cell_type": "markdown", "id": "12eb4a50", "metadata": {}, "source": [ "### **Managing Cofactors**\n", "Easily attach patient characteristics (e.g., genetics, demographics) as cofactors. Cofactors are used to group the population in `Plotter.plot_distribution`, which colors each cofactor value separately. Let's generate a synthetic dataset and random individual parameters to show how it works." ] }, { "cell_type": "code", "execution_count": 50, "id": "8867dfe4", "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA4wAAAH5CAYAAADKurD5AAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjEsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvc2/+5QAAAAlwSFlzAAAPYQAAD2EBqD+naQAALH1JREFUeJzt3Qu41VWdN/AfcADxAsQdRiC0FLznZZQ0x5TE63ih0jQDJU1HmBQ1pZBEK9J8ssm8jD0m+iRlNqSJpSleSkVFZsjUInEoaORSOoCogMJ+n7Xe95yXowsNOLDP5nw+z/N387/svdc5y33+53vWrVWlUqkEAAAAvEPrdx4AAACARGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgqC5q0Jo1a+Lll1+O7bbbLlq1alXt4gAAANSUSqUSr732WvTp0ydat269ZQXGFBb79u1b7WIAAADUtPnz58f222+/ZQXG1LJY/8V17Nix2sUBAACoKcuWLcuNcPXZaosKjPXdUFNYFBgBAAA2zPsN8TPpDQAAAEUCIwAAAEUCIwAAAFvOGEYAAKD5Wr16dbz11lvVLkaL1rZt22jTps1Gv47ACAAANNnafgsXLowlS5ZUuyhEROfOnaNXr14btXa9wAgAADSJ+rDYo0eP2HrrrTcqqLDhUnB/4403YvHixXm/d+/eG/xaAiMAANAk3VDrw2LXrl2rXZwWr0OHDvkxhcZUJxvaPdWkNwAAwEarH7OYWhZpHurrYmPGkwqMAABAk9ENdcuqC4ERAACAIoERAACAjZ/05oYbbsjbn/70p7y/6667xvjx4+PII4/M+ytWrIgLLrggfvzjH8fKlStj6NChcf3110fPnj0bXmPevHlxzjnnxMMPPxzbbrttDB8+PCZOnBh1debfAQCALdHISTM26/vdPGK/qHUf/OAH47zzzstbzbQwbr/99vHNb34zZs6cGc8880wceuihcdxxx8Xzzz+fz59//vlxzz33xJ133hmPPvpovPzyy3HiiSc2mjnp6KOPjlWrVsUTTzwRt956a0yaNCmHTgAAgGoYMWJEHu/3zm3OnDnR0q1Xs96xxx7baP/rX/96bnF88sknc5i8+eabY/LkyTlIJrfccksMGjQonz/ggAPiV7/6Vbzwwgvx4IMP5lbHvfbaK6644oq4+OKL47LLLot27do17VcHAADwdzjiiCNyfllb9+7do6Xb4DGMqbUwdT19/fXXY/DgwbnVMU3XOmTIkIZrBg4cGP369Yvp06fn/fS4++67N+qimrqtLlu2rKGVsiR1b03XrL0BAAA0lfbt20evXr0abW3atIm777479t5779hqq61ihx12iAkTJsTbb7/d8LzUEvnv//7vccwxx+RlLFKDWco9qXXykEMOiW222SY++tGPxksvvdTwnPTv1FMz5aI0TG+//fbLjWrvJa1x+fnPfz6H2I4dO+ZGut/+9reb9HuyQYHxd7
/7Xf6i0jf07LPPjp/97Gexyy67xMKFC3MLYefOnRtdn74J6VySHtcOi/Xn68+tSxrj2KlTp4atb9++61tsAACA9fKb3/wmPve5z8UXv/jF3FMyBcM0pC71tFxb6jWZrps1a1ZuNDvllFPiC1/4QowdOzYP5atUKjFq1KiG65cvXx5HHXVUTJs2Lf7rv/4rt26m3pxpvpd1+dSnPhWLFy+OX/7yl7mxLoXYww47LF599dXmFRh33nnn/I146qmn8uQ1adKa9M3blNI3eunSpQ3b/PnzN+n7AQAALcvUqVNzw1j9lgLahAkT4pJLLsmZJ7UufuITn8jhMAXHtZ1++unx6U9/Onbaaac83C5NEnrqqafm3pSpxTEFzkceeaTh+j333DMHyt122y0+/OEP59fccccd4+c//3mxbI899lg8/fTTea6YfffdNz/n6quvzo11P/3pTzfp92W9pyZNrYgf+tCH8r/32WefmDFjRvzbv/1bnHTSSXkym9RUunYr46JFi3JzbpIe0xe6tnS+/ty6pNbMtAEAAGwKH//4x/P8LPVSV9I99tgjHn/88UYtimloXlod4o033shdUJN03Tt7UKaheGsfS89JQ+tSd9LUwpjmcLn33ntjwYIFuYvrm2++uc4WxtT1ND2na9eujY6n56zd1XVT2Oi1LNasWZPHGKbw2LZt29ysOmzYsHxu9uzZ+YtOYxyT9Ji+2akptUePHvnYAw88kL9pqVsrAABANaSAWN8wVm/58uW5lXHtlR/qpTGN9VIOWntM47qOpeyUXHjhhTkHpVbC9J4dOnSIT37yk7kBriSVo3fv3o1aKeu9c0hgVQNj6hqa1lxME9m89tpreUbUVOj7778/jy0cOXJkjBkzJrp06ZJD4OjRo3NITDOkJocffngOhqeddlpcddVVedziuHHj4txzz9WCCMAWZXOvObal2BLWTgO2HHvvvXduBHtnkNxYqdUyLeVxwgknNATC+rXu11WOlJ3S2vVpfcbNab0CY2oZTIM5U7NpCoip6TWFxdSXN7nmmmuidevWuYUxtTqmPrvXX399w/PTLEOpb3Aa+5iCZErxqT/w5Zdf3vRfGQAAwEYYP358nv00NZilFsCUdVL30Oeeey6+9rWvbfDrpjGIU6ZMyRPdpNbHSy+9tKH1sSStRJHy0/HHH58b3tJYybTmferSmkJnGtfYLAJjWmfxvaRm2euuuy5v69K/f//4xS9+sT5vCwAA1LBa7T0wdOjQ3OCVGriuvPLK3M00zYKalrfYGN/+9rfjjDPOyMttdOvWLU+U815LB6ZQmTLUV77ylTzBzl//+tc8B8zBBx/8rlUomlqrSprjtcakb2Zq4UwzpqaurwDQ3OiS2rJ+qQQiT+oyd+7cGDBgQKPxfTTPOvl7M9V6L6sBAABAyyAwAgAAUCQwAgAAUCQwAgAAUCQwAgAAUCQwAgAAUCQwAgAAUCQwAgAAUCQwAgAANJE//elP0apVq5g1a1ZsCeqqXQAAAGALN/mkzft+p9yxXpePGDEibr311vjCF74QN954Y6Nz5557blx//fUxfPjwmDRpUrQ0WhgBAIAWr2/fvvHjH/843nzzzYZjK1asiMmTJ0e/fv2ipRIYAQCAFm/vvffOoXHKlCkNx6ZMmZLD4kc+8pGGY/fdd18cdNBB0blz5+jatWscc8wx8dJLL73naz/33HNx5JFHxrbbbhs9e/aM0047Lf72t79FLRAYAQAAIuKMM86IW265pWH/Bz/4QZx++umNrnn99ddjzJgx8cwzz8S0adOidevWccIJJ8SaNWuKr7lkyZI49NBDc+hMz0mBc9GiRfHpT386aoExjAAAABHx2c9+NsaOHRt//vOf8/7jjz+eu6k+8sgjDdcMGzas0XNSqOzevXu88MILsdtuu73rNb/3ve/lsPiNb3yj0XNSa+Yf//jH2GmnnaI5ExgBAAAicvA7+uij8+Q2lUol/7tbt26NrnnxxRdj/Pjx8d
RTT+VupfUti/PmzSsGxt/+9rfx8MMP5+6o75S6sgqMAAAANdQtddSoUfnf11133bvOH3vssdG/f//4/ve/H3369MmBMQXFVatWFV9v+fLl+TlXXnnlu8717t07mjuBEQAA4P854ogjcvhLaykOHTq00blXXnklZs+encPixz72sXzssccee9/JdP7jP/4jPvjBD0ZdXe3FL5PeAAAA/D9t2rSJ3//+93lMYps2bRqd+8AHPpBnRr3ppptizpw58dBDD+UJcN5LWsfx1Vdfjc985jMxY8aM3A31/vvvz5PprF69Opo7gREAAGAtHTt2zNs7pRlR0yQ4M2fOzN1Qzz///PjWt74V7yV1W02T56RwePjhh8fuu+8e5513Xl6WI71ec9eqkkZz1phly5ZFp06dYunSpcWKBIBqGzlpRrWLUJNuHrFftYsAbKC0yP3cuXNjwIABsdVWW1W7OMR718nfm6maf6QFAACgKgRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAACgyaxZs6baRaAJ66Juo18BAABo8dq1a5fXFXz55Zeje/fueb9Vq1bVLlaLVKlUYtWqVfHXv/4110mqiw0lMAIAABstBZO03t+CBQtyaKT6tt566+jXr1+umw0lMAIAAE0itWSlgPL222/H6tWrq12cFq1NmzZRV1e30a28AiMAANBkUkBp27Zt3qh9Jr0BAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgqK58GABaoMknNdlLjV60JLYU1/b8WrWLAECVaGEEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAABg4wPjxIkTY7/99ovtttsuevToEccff3zMnj270TWHHHJItGrVqtF29tlnN7pm3rx5cfTRR8fWW2+dX+eiiy6Kt99+e32KAgAAwCZWtz4XP/roo3Huuefm0JgC3pe//OU4/PDD44UXXohtttmm4bozzzwzLr/88ob9FAzrrV69OofFXr16xRNPPBELFiyIz33uc9G2bdv4xje+0VRfFwAAAJszMN53332N9idNmpRbCGfOnBkHH3xwo4CYAmHJr371qxwwH3zwwejZs2fstddeccUVV8TFF18cl112WbRr125DvxYAAACayxjGpUuX5scuXbo0On777bdHt27dYrfddouxY8fGG2+80XBu+vTpsfvuu+ewWG/o0KGxbNmyeP7554vvs3Llynx+7Q0AAIBm1MK4tjVr1sR5550XBx54YA6G9U455ZTo379/9OnTJ5599tnccpjGOU6ZMiWfX7hwYaOwmNTvp3PrGjs5YcKEDS0qAAAAmzMwprGMzz33XDz22GONjp911lkN/04tib17947DDjssXnrppdhxxx036L1SK+WYMWMa9lMLY9++fTe06AAAAGyqLqmjRo2KqVOnxsMPPxzbb7/9e167//7758c5c+bkxzS2cdGiRY2uqd9f17jH9u3bR8eOHRttAAAANKPAWKlUclj82c9+Fg899FAMGDDgfZ8za9as/JhaGpPBgwfH7373u1i8eHHDNQ888EAOgbvsssv6fwUAAABUv0tq6oY6efLkuPvuu/NajPVjDjt16hQdOnTI3U7T+aOOOiq6du2axzCef/75eQbVPfbYI1+bluFIwfC0006Lq666Kr/GuHHj8munlkQAAABqsIXxhhtuyDOjHnLIIbnFsH6744478vm0JEZaLiOFwoEDB8YFF1wQw4YNi3vuuafhNdq0aZO7s6bH1Nr42c9+Nq/DuPa6jQAAANRYC2Pqkvpe0kQ0jz766Pu+TppF9Re/+MX6vDUAAAC1tA4jAAAAWy6BEQAAgC
KBEQAAgCKBEQAAgCKBEQAAgCKBEQAAgCKBEQAAgCKBEQAAgCKBEQAAgKK68mEA+P9GTpoRLcHoRUuqXQQAaFa0MAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFBUF+th4sSJMWXKlPjDH/4QHTp0iI9+9KNx5ZVXxs4779xwzYoVK+KCCy6IH//4x7Fy5coYOnRoXH/99dGzZ8+Ga+bNmxfnnHNOPPzww7HtttvG8OHD82vX1a1XcQCAzWD0onGb780md46acsod1S4BQPNpYXz00Ufj3HPPjSeffDIeeOCBeOutt+Lwww+P119/veGa888/P+65556488478/Uvv/xynHjiiQ3nV69eHUcffXSsWrUqnnjiibj11ltj0qRJMX78+Kb9ygAAANgorSqVSmVDn/zXv/41evTokYPhwQcfHEuXLo3u3bvH5MmT45Of/GS+JrVGDho0KKZPnx4HHHBA/PKXv4xjjjkmB8n6Vscbb7wxLr744vx67dq1e9/3XbZsWXTq1Cm/X8eOHTe0+AD8nUZOmhEtwWZtSaNor75aGAE2h783U23UGMb04kmXLl3y48yZM3Or45AhQxquGThwYPTr1y8HxiQ97r777o26qKZuq6nAzz//fPF9UtfWdH7tDQAAgE1rgwPjmjVr4rzzzosDDzwwdtttt3xs4cKFuYWwc+fGfx1M4TCdq79m7bBYf77+XEka35jSb/3Wt2/fDS02AAAAmzowprGMzz33XJ7cZlMbO3Zsbs2s3+bPn7/J3xMAAKCl26BpSUeNGhVTp06NX//617H99ts3HO/Vq1eezGbJkiWNWhkXLVqUz9Vf8/TTTzd6vXS+/lxJ+/bt8wYAAEAzbWFM8+OksPizn/0sHnrooRgwYECj8/vss0+0bds2pk2b1nBs9uzZeRmNwYMH5/30+Lvf/S4WL17ccE2acTUNtNxll102/isCAABg87cwpm6oaQbUu+++O7bbbruGMYdpXGFalzE9jhw5MsaMGZMnwkkhcPTo0TkkphlSk7QMRwqGp512Wlx11VX5NcaNG5dfWysiALRss+YviVpybTOYQfjmEftVuwjAFmy9AuMNN9yQHw855JBGx2+55ZYYMWJE/vc111wTrVu3jmHDhuXZTdMMqNdff33DtW3atMndWc8555wcJLfZZpsYPnx4XH755U3zFQEAALD5A+Pfs2TjVlttFdddd13e1qV///7xi1/8Yn3eGgAAgFqY9AaAGjb5pPV+yuhFtdVNEACo8rIaAAAAbNkERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERg
AAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAJomMP7617+OY489Nvr06ROtWrWKu+66q9H5ESNG5ONrb0cccUSja1599dU49dRTo2PHjtG5c+cYOXJkLF++fH2LAgAAQHMKjK+//nrsueeecd11163zmhQQFyxY0LD96Ec/anQ+hcXnn38+HnjggZg6dWoOoWedddaGfQUAAABsEnXr+4Qjjzwyb++lffv20atXr+K53//+93HffffFjBkzYt99983Hrr322jjqqKPi6quvzi2XAAAAbKFjGB955JHo0aNH7LzzznHOOefEK6+80nBu+vTpuRtqfVhMhgwZEq1bt46nnnqq+HorV66MZcuWNdoAAACoscCYuqPedtttMW3atLjyyivj0UcfzS2Sq1evzucXLlyYw+Ta6urqokuXLvlcycSJE6NTp04NW9++fZu62AAAAGxsl9T3c/LJJzf8e/fdd4899tgjdtxxx9zqeNhhh23Qa44dOzbGjBnTsJ9aGIVGAACAGl9WY4cddohu3brFnDlz8n4a27h48eJG17z99tt55tR1jXtMYyLTjKprbwAAANR4YPzLX/6SxzD27t077w8ePDiWLFkSM2fObLjmoYceijVr1sT++++/qYsDAADApuqSmtZLrG8tTObOnRuzZs3KYxDTNmHChBg2bFhuLXzppZfiS1/6UnzoQx+KoUOH5usHDRqUxzmeeeaZceONN8Zbb70Vo0aNyl1ZzZAKAABQwy2MzzzzTHzkIx/JW5LGFqZ/jx8/Ptq0aRPPPvts/PM//3PstNNOMXLkyNhnn33iN7/5Te5WWu/222+PgQMH5jGNaTmNgw46KG666aam/coAAADYvC2MhxxySFQqlXWev//++9/3NVJL5OTJk9f3rQEAANiSxjACAABQmwRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAigRGAAAAiurKhwEAeD+jF42rdhEiJneOZueUO6pdAqCJaGEEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgqK58GGDLNHLSjGjpRi9aUu0iAAA1QgsjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAATRMYf/3rX8exxx4bffr0iVatWsVdd93V6HylUonx48dH7969o0OHDjFkyJB48cUXG13z6quvxqmnnhodO3aMzp07x8iRI2P58uXrWxQAAACaU2B8/fXXY88994zrrruueP6qq66K7373u3HjjTfGU089Fdtss00MHTo0VqxY0XBNCovPP/98PPDAAzF16tQcQs8666yN+0oAAABoUnXr+4QjjzwybyWpdfE73/lOjBs3Lo477rh87LbbbouePXvmlsiTTz45fv/738d9990XM2bMiH333Tdfc+2118ZRRx0VV199dW65BAAAYAsbwzh37txYuHBh7oZar1OnTrH//vvH9OnT8356TN1Q68Nikq5v3bp1bpEsWblyZSxbtqzRBgAAQA0FxhQWk9SiuLa0X38uPfbo0aPR+bq6uujSpUvDNe
80ceLEHDzrt759+zZlsQEAAKjVWVLHjh0bS5cubdjmz59f7SIBAABs8Zo0MPbq1Ss/Llq0qNHxtF9/Lj0uXry40fm33347z5xaf807tW/fPs+ouvYGAABADQXGAQMG5NA3bdq0hmNpvGEamzh48OC8nx6XLFkSM2fObLjmoYceijVr1uSxjgAAANToLKlpvcQ5c+Y0muhm1qxZeQxiv3794rzzzouvfe1r8eEPfzgHyEsvvTTPfHr88cfn6wcNGhRHHHFEnHnmmXnpjbfeeitGjRqVZ1A1QyoAAEANB8ZnnnkmPv7xjzfsjxkzJj8OHz48Jk2aFF/60pfyWo1pXcXUknjQQQflZTS22mqrhufcfvvtOSQedthheXbUYcOG5bUbAQAAaD5aVdLiiTUmdXNNs6WmCXCMZwTWx8hJM6KlG71oXLWLADShvfp2jmbnlDuqXQKgiTJVTcySCgAAwOYnMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFAkMAIAAFBUVz4MsAWYfNK7Do1etKQqRQEAqEVaGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACgSGAEAACiyDiMAQA2bNb/5rS977aQZ0dzdPGK/ahcBaoIWRgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIoERgAAAIrqyocBAGDDjF40Lpq9yZ03/3uecsfmf0/YSFoYAQAAKBIYAQAAKBIYAQAAKBIYAQAAKBIYAQAAKBIYAQAAKBIYAQAAKBIYAQAA2DyB8bLLLotWrVo12gYOHNhwfsWKFXHuuedG165dY9ttt41hw4bFokWLmroYAAAANMcWxl133TUWLFjQsD322GMN584///y455574s4774xHH300Xn755TjxxBM3RTEAAADYCHWb5EXr6qJXr17vOr506dK4+eabY/LkyXHooYfmY7fccksMGjQonnzyyTjggAM2RXEAAABoLi2ML774YvTp0yd22GGHOPXUU2PevHn5+MyZM+Ott96KIUOGNFybuqv269cvpk+fvs7XW7lyZSxbtqzRBgAAQI0Fxv333z8mTZoU9913X9xwww0xd+7c+NjHPhavvfZaLFy4MNq1axedO3du9JyePXvmc+syceLE6NSpU8PWt2/fpi42AAAAm7pL6pFHHtnw7z322CMHyP79+8dPfvKT6NChwwa95tixY2PMmDEN+6mFUWgEAACo8WU1UmviTjvtFHPmzMnjGletWhVLlixpdE2aJbU05rFe+/bto2PHjo02AAAAajwwLl++PF566aXo3bt37LPPPtG2bduYNm1aw/nZs2fnMY6DBw/e1EUBAACgml1SL7zwwjj22GNzN9S0ZMZXv/rVaNOmTXzmM5/J4w9HjhyZu5d26dIltxSOHj06h0UzpAIAAGzhgfEvf/lLDoevvPJKdO/ePQ466KC8ZEb6d3LNNddE69atY9iwYXn206FDh8b111/f1MUAAABgI7WqVCqVqDFp0pvUWpnWdTSeEVinySe969Cs+Y3HUAPQMu3Vt/Gs/ZvFKXds/veEjcxUm3wMIwAAALWpybukApvPyEkzql2EZm30Iq2JAAAbQwsjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARXXlw0AtGb1oXLWLAA
DAFkgLIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEUCIwAAAEV15cMAAECTmnxStUvQfJ1yR7VLwDpoYQQAAKBIYAQAAKBIYAQAAKDIGEYAAFqcWfOXVLsINWevvp2rXQSqQAsjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARQIjAAAARXXlw9AMTT6p2iVodkYvWlLtIgAAsAXTwggAAECRFkYAAOB9zZq/6Xo2XTtpRmypbh6xX9QyLYwAAAAUCYwAAAAUCYwAAAAUGcPYHJkNFAAAaAa0MAIAAFCkhZEtftYtAACgBgPjddddF9/61rdi4cKFseeee8a1114b//iP/xi1buRGTgtsMXYAAKBFd0m94447YsyYMfHVr341/vM//zMHxqFDh8bixYurVSQAAACaQwvjt7/97TjzzDPj9NNPz/s33nhj3HvvvfGDH/wgLrnkkkbXrly5Mm/1li5dmh+XLVsWzdGqN5dv1POXr3i7ycoCAADN3cb+/tycLWummaW+XJVK5T2va1V5vys2gVWrVsXWW28dP/3pT+P4449vOD58+PBYsmRJ3H333Y2uv+yyy2LChAmbu5gAAABbtPnz58f222/fvFoY//a3v8Xq1aujZ8+ejY6n/T/84Q/vun7s2LG5+2q9NWvWxKuvvhpdu3aNVq1aNVnC7tu3b/6GdezYsUlek01HfdUOdVVb1FftUFe1Q13VFvVVO9TVxknthq+99lr06dOn9mdJbd++fd7W1rlz503yXul/Nv/D1Q71VTvUVW1RX7VDXdUOdVVb1FftUFcbrlOnTs1z0ptu3bpFmzZtYtGiRY2Op/1evXpVo0gAAAA0h8DYrl272GeffWLatGmNupmm/cGDB1ejSAAAADSXLqlpTGKa5GbffffNay9+5zvfiddff71h1tTNLXV5TUt8vLPrK82T+qod6qq2qK/aoa5qh7qqLeqrdqirzaMqs6TW+973vhff+ta3YuHChbHXXnvFd7/73dh///2rVRwAAACaS2AEAACg+arKGEYAAACaP4ERAACAIoERAACAIoERAACAohYXGC+77LJo1apVo23gwIEN51esWBHnnntudO3aNbbddtsYNmxYLFq0qKplbqner64OOeSQd50/++yzq1rmlux//ud/4rOf/Wz+7HTo0CF23333eOaZZxrOp/m1xo8fH717987nhwwZEi+++GJVy9ySvV99jRgx4l2fryOOOKKqZW6JPvjBD76rHtKW7lOJe1Zt1Zf7VvOxevXquPTSS2PAgAH5Z+COO+4YV1xxRb5X1XPfqp26cs/aQtdhrKZdd901HnzwwYb9urr//204//zz4957740777wzOnXqFKNGjYoTTzwxHn/88SqVtmV7r7pKzjzzzLj88ssb9rfeeuvNWj7+r//93/+NAw88MD7+8Y/HL3/5y+jevXu+qX7gAx9ouOaqq67KS+fceuut+Yd++uE/dOjQeOGFF2Krrbaqavlbmr+nvpJ0s73lllsa9q1ztfnNmDEj/7JU77nnnotPfOIT8alPfSrvu2fVVn0l7lvNw5VXXhk33HBDviel3zXSH8zSWuDpc/Sv//qv+Rr3rdqpq8Q9a9NpkYExhY5evXq96/jSpUvj5ptvjsmTJ8ehhx6aj6X/8QYNGhRPPvlkHHDAAVUobcu2rrpa+0b7XufZfD/M+/bt2+gHdbq51kt/BfzOd74T48aNi+OOOy4fu+2226Jnz55x1113xcknn1yVcrdU71dfa99sfb6qK4X5tX3zm9/Mf13/p3/6J/esGquveu5bzcMTTzyR70dHH310Q+vwj370o3j66afzvv
tW7dRVPfesTafFdUlN0l/S+/TpEzvssEOceuqpMW/evHx85syZ8dZbb+UuB/VSF8h+/frF9OnTq1jilmtddVXv9ttvj27dusVuu+0WY8eOjTfeeKNqZW3Jfv7zn8e+++6b/4reo0eP+MhHPhLf//73G87PnTs3Fi5c2Oizlf4yuP/++/tsNcP6qvfII4/k8zvvvHOcc8458corr1SlvPxfq1atih/+8Idxxhln5O5W7lm1VV/13Leah49+9KMxbdq0+OMf/5j3f/vb38Zjjz0WRx55ZN5336qduqrnnrXptLgWxvRBnzRpUv6facGCBTFhwoT42Mc+lruNpB8M7dq1i86dOzd6TvprUjpH86mr7bbbLk455ZTo379/DpTPPvtsXHzxxTF79uyYMmVKtYve4vz3f/937i4yZsyY+PKXv5y7ZaVuIunzNHz48IbPT/osrc1nq3nWV33XntS1MbU8vvTSS/m6dHNOvyi1adOm2l9Ci5RaNZYsWZLH6iTuWbVVX4n7VvNxySWXxLJly/IfWdLPtNSV+Otf/3r+43TivlU7dZW4Z21aLS4wrv3XiD322COHkvTD+yc/+UkeSEtt1NXIkSPjrLPOajifJuxIg9IPO+yw/IMidQFi81mzZk1usfrGN76R91OLVQr2N954Y0MAobbqa+3uVunzlT6D6XOV/oKbPmdsfqn7afq5mMIGtVlf7lvNR/pdIrX2pi7daVzcrFmz4rzzzsv15b5Ve3XlnrVptcguqWtLf5ndaaedYs6cObnfc+pCkv4iuLY045w+0c2rrkpSoEzWdZ5NJ/3Ss8suuzQ6lsZR1Xchrv/8vHP2Rp+t5llfJalbeOpG5/NVHX/+85/zBGCf//znG465Z9VWfZW4b1XPRRddlFuuUtBIAeO0007Lk0hNnDgxn3ffqp26KnHPalotPjAuX748/2Uv/QK1zz77RNu2bXM/6Xqpq0j6JWrw4MFVLSeN66ok/cUpWdd5Np0042b6rKwtjTVILcJJ6iKSbrBrf7ZS95KnnnrKZ6sZ1lfJX/7ylzwexOerOtJkNmlsTv2kD4l7Vm3VV4n7VvWksaOtWzf+NTh1XUw9MBL3rdqpqxL3rCZWaWEuuOCCyiOPPFKZO3du5fHHH68MGTKk0q1bt8rixYvz+bPPPrvSr1+/ykMPPVR55plnKoMHD84bzauu5syZU7n88stzHaXzd999d2WHHXaoHHzwwdUudov09NNPV+rq6ipf//rXKy+++GLl9ttvr2y99daVH/7whw3XfPOb36x07tw519Wzzz5bOe644yoDBgyovPnmm1Ute0v0fvX12muvVS688MLK9OnT8+frwQcfrOy9996VD3/4w5UVK1ZUu/gtzurVq/N96eKLL37XOfes2qkv963mZfjw4ZV/+Id/qEydOjXXx5QpU/LvGF/60pcarnHfqo26cs/a9FpcYDzppJMqvXv3rrRr1y7/z5f20w/xeumHwL/8y79UPvCBD+RfoE444YTKggULqlrmluq96mrevHn5JtulS5dK+/btKx/60IcqF110UWXp0qXVLnaLdc8991R22223XB8DBw6s3HTTTY3Or1mzpnLppZdWevbsma857LDDKrNnz65aeVu696qvN954o3L44YdXunfvXmnbtm2lf//+lTPPPLOycOHCqpa5pbr//vvT6tTFz4t7Vu3Ul/tW87Js2bLKF7/4xRzut9pqqxzev/KVr1RWrlzZcI37Vm3UlXvWptcq/aepWy0BAACofS1+DCMAAABlAiMAAABFAiMAAABFAiMAAABFAiMAAABFAiMAAABFAiMAAABFAiMAAABFAiMAAABFAiMAAABFAiMAAABR8n8AH/e2vUe9KYgAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import torch\n", "from leaspy.io.logs.visualization import Plotter\n", "from leaspy.io.outputs import Result\n", "\n", "n_individuals = 2000\n", "patient_ids = [str(i) for i in range(n_individuals)]\n", "\n", "# Synthetic longitudinal data: 3 visits per individual, one random feature\n", "df_longitudinal = pd.DataFrame({'ID': np.repeat(patient_ids, 3), 'TIME': np.tile([60, 70, 80], n_individuals), 'Y0': np.random.rand(n_individuals * 3)})\n", "data = Data.from_dataframe(df_longitudinal)\n", "\n", "# One categorical cofactor per individual, indexed by patient ID\n", "df_cofactors = pd.DataFrame({'gender': np.random.choice(['Male', 'Female'], size=n_individuals)}, index=patient_ids)\n", "df_cofactors.index.name = 'ID'\n", "data.load_cofactors(df_cofactors, cofactors=['gender'])\n", "\n", "# Random individual parameters, for illustration only (no model is fitted here)\n", "individual_parameters = {'tau': torch.tensor(np.random.normal(70, 5, (n_individuals, 1))), 'xi': torch.tensor(np.random.normal(0, 0.5, (n_individuals, 1)))}\n", "result_obj = Result(data, individual_parameters)\n", "\n", "Plotter().plot_distribution(result_obj, parameter='tau', cofactor='gender')" ] }, { "cell_type": "markdown", "id": "d35263b9", "metadata": {}, "source": [ "Note that attaching external data to a `Data` object through `data.load_cofactors` is different from passing covariates to the model through `factory_kws`:\n", "\n", "| Feature | **Covariates** (via `factory_kws`) | **Cofactors** (via `load_cofactors`) |\n", "| :--- | :--- | :--- |\n", "| **Purpose** | Used **inside the model** to modulate parameters (e.g., in `CovariateLogisticModel`). | Used for **analysis/metadata** (e.g., plotting, stratification) but ignored by the model's math. |\n", "| **Loading** | Loaded **during** `Data` creation. | Loaded **after** `Data` creation. |\n", "| **Storage** | Stored as a `numpy.ndarray` in `individual.covariates`. | Stored as a `dict` in `individual.cofactors`. |\n", "| **Constraints** | Must be **integers**, constant per individual, and have no missing values. | Can be any type (strings, floats, etc.). 
|\n" ] }, { "cell_type": "markdown", "id": "689f4849", "metadata": {}, "source": [ "### **Best Practice:**\n", "Always create a `Data` object first. It validates your input and handles irregularities (missing visits, different timelines) gracefully.\n", "\n", "---\n", "\n", "## 2. The `Dataset` Class: The Internal Engine\n", "\n", "The `Dataset` class is the **high-performance numerical engine**. It converts the flexible `Data` object into rigid PyTorch Tensors optimized for mathematical computation.\n", "\n", "### What it does\n", "* **Tensorization**: Converts all values to PyTorch tensors.\n", "* **Padding**: Standardizes patient timelines by padding them to the maximum number of visits (creating a rectangular data block).\n", "* **Masking**: Creates a binary mask to distinguish real data from padding.\n", "\n", "### When to use it explicitly?\n", "You rarely need to instantiate `Dataset` yourself. However, it is useful for **optimization**:\n", "1. **Memory Efficiency**: For massive datasets, convert `Data` $\\to$ `Dataset` and delete the original `Data`/`DataFrame` to free up RAM.\n", "2. **Performance**: If you are running multiple models on the same data, creating a `Dataset` once prevents `leaspy` from repeating the conversion process for every `fit()` call.\n", "\n", "---\n", "\n", "## 3. Workflow & Best Practices\n", "\n", "### The Standard Workflow\n", "The most common and recommended workflow is straightforward:\n", "\n", "```\n", "[CSV / DataFrame] -> Data.from_dataframe() -> [Data Object] -> model.fit(data)\n", "```\n", "\n", "Inside `model.fit(data)`, Leaspy automatically converts your `Data` object into a `Dataset` for computation.\n", "\n", "### Guidelines for `model.fit()`\n", "\n", "| Input Type | Verdict | Reason |\n", "| :--- | :--- | :--- |\n", "| **`Data` Object** | ✅ **Recommended** | **Safe & Standard.** Handles all model types (including Joint models) correctly. Easy to inspect before fitting. 
|\n", "| **`Dataset` Object** | ⚡ **Optimization** | **Fast.** Use for heavy datasets or repeated experiments to skip internal conversion steps. |\n", "| **`pd.DataFrame`** | ❌ **Avoid** | **Risky.** Fails for complex models (e.g., `JointModel`) that require specific loading parameters. Leads to inconsistent code. |" ] } ], "metadata": { "kernelspec": { "display_name": "leaspy", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.19" } }, "nbformat": 4, "nbformat_minor": 5 }