## Automating the Daily Grind of Data Science
According to a [data science report by Anaconda](https://www.anaconda.com/resources/whitepaper/state-of-data-science-report-2022?utm_source=imaginepro.ai), data scientists spend a staggering 60% of their time just cleaning and organizing data. These routine, time-consuming tasks are perfect candidates for automation with an AI assistant like ChatGPT.
This article provides a practical guide on how to offload five common data science tasks to ChatGPT using effective prompts. We'll use a real-world data project from Gett, a London-based taxi app, to demonstrate how these steps work in practice.

*Image by Author | Canva*
## Case Study: Analyzing Failed Ride Orders from Gett
In [this data project](https://platform.stratascratch.com/data-projects/insights-failed-orders?utm_source=blog&utm_medium=click&utm_campaign=kdn+routine+tasks+that+chatgpt+can+handle&utm_source=imaginepro.ai), the challenge is to analyze failed ride orders for Gett to understand why some customers did not successfully get a car. The dataset description is available on the project page.

We will now walk through a five-step process to show how ChatGPT can handle the routine tasks involved in this data project.

### Step 1: Data Exploration and Analysis
Every data exploration starts with the same commands: `.head()`, `.info()`, and `.describe()`. We can instruct ChatGPT to run these for us by providing the project description and the dataset.

Use the following prompt, pasting the project description found [here](https://platform.stratascratch.com/data-projects/insights-failed-orders?utm_source=blog&utm_medium=click&utm_campaign=kdn+routine+tasks+that+chatgpt+can+handle&utm_source=imaginepro.ai):

> Here is the data project description: [paste here]
> Perform basic EDA, show head, info, and summary stats, missing values, and correlation heatmap.

ChatGPT quickly provides a summary, highlights key columns, identifies missing values, and generates a correlation heatmap.
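
For readers who want to check ChatGPT's output themselves, the same exploration can be reproduced with a few lines of pandas. This is a minimal sketch on a tiny made-up sample, not the project's actual data: apart from `m_order_eta` and `order_status_key`, the column names and all values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for the orders file. Apart from
# m_order_eta and order_status_key, the column names are assumptions.
orders = pd.DataFrame(
    {
        "order_datetime": ["18:08:07", "20:57:32", "12:07:50", "13:50:20", "21:24:45"],
        "m_order_eta": [60.0, np.nan, 477.0, 658.0, np.nan],
        "order_status_key": [4, 4, 4, 9, 9],
        "is_driver_assigned_key": [1, 0, 1, 1, 0],
        "cancellations_time_in_seconds": [198.0, 128.0, 46.0, np.nan, np.nan],
    }
)

print(orders.head())      # first rows
orders.info()             # dtypes and non-null counts
print(orders.describe())  # summary statistics for the numeric columns

# Number of missing values in each column
missing_counts = orders.isna().sum()
print(missing_counts)

# Pairwise correlations between numeric columns; this matrix is what
# a correlation heatmap visualizes.
correlations = orders.select_dtypes(include="number").corr()
print(correlations.round(2))
```

On the real files, you would replace the hand-built DataFrame with `pd.read_csv(...)` on the downloaded data.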

### Step 2: Data Cleaning
Our initial exploration revealed missing values in both datasets.

Let's ask ChatGPT to handle this with a clear prompt:

> Clean this dataset: identify and handle missing values appropriately (e.g., drop or impute based on context). Provide a summary of the cleaning steps.

ChatGPT then provides a summary of its actions, which include converting date columns, dropping invalid orders, and imputing missing values for `m_order_eta`.
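
The three cleaning actions in that summary can be sketched in pandas as follows. This is an illustration under stated assumptions, not the code ChatGPT actually ran: the sample data is made up, the `order_gk` and `order_datetime` column names are assumed, "invalid" is taken to mean a missing order ID or an unparseable time, and median imputation is one reasonable choice among several.

```python
import numpy as np
import pandas as pd

# Made-up sample; order_gk and order_datetime are assumed column names.
raw = pd.DataFrame(
    {
        "order_gk": [3001, 3002, None, 3004, 3005],
        "order_datetime": ["18:08:07", "20:57:32", "12:07:50", "not a time", "21:24:45"],
        "m_order_eta": [60.0, np.nan, 477.0, 658.0, np.nan],
        "order_status_key": [4, 4, 9, 9, 9],
    }
)


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Convert the time column, drop invalid rows, impute m_order_eta."""
    cleaned = df.copy()

    # 1. Convert the time column; unparseable values become NaT.
    cleaned["order_datetime"] = pd.to_datetime(
        cleaned["order_datetime"], format="%H:%M:%S", errors="coerce"
    )

    # 2. Drop rows treated as invalid here (assumption: no order ID
    #    or no usable timestamp).
    cleaned = cleaned.dropna(subset=["order_gk", "order_datetime"])

    # 3. Impute the remaining missing ETAs with the median ETA.
    median_eta = cleaned["m_order_eta"].median()
    cleaned = cleaned.assign(m_order_eta=cleaned["m_order_eta"].fillna(median_eta))

    return cleaned.reset_index(drop=True)


cleaned_orders = clean_orders(raw)
print(cleaned_orders)
```

Whether a missing ETA should be imputed at all depends on why it is missing, so it is worth reading ChatGPT's cleaning summary critically rather than accepting it as is.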

### Step 3: Generate Visualizations
To create effective visualizations, we can guide ChatGPT with an approach in the spirit of [Retrieval-Augmented Generation](https://arxiv.org/abs/2005.11401?utm_source=imaginepro.ai): rather than relying only on what the model already knows, we point it to an external resource on choosing the right plots, like [this article](https://www.stratascratch.com/blog/using-visualizations-for-your-exploratory-data-analysis/?utm_source=blog&utm_medium=click&utm_campaign=kdn+routine+tasks+that+chatgpt+can+handle&utm_source=imaginepro.ai), and ask it to apply that knowledge.

> Before generating visualizations, read this article on choosing the right plots for different data types and distributions: [LINK]. Then, show most suitable visualizations for this dataset and explain why each was selected and produce the plots in this chat by running code on the dataset.

ChatGPT generated six different graphs, each with a justification for its selection and an explanation of the insights.
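
As a rough idea of what matching plot type to data type looks like in code, here is a minimal matplotlib sketch with two plots: a histogram for a continuous column and a bar chart for a categorical one. The data is made up, the output file name is hypothetical, and these are not the six graphs ChatGPT produced.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up sample values for illustration only.
orders = pd.DataFrame(
    {
        "m_order_eta": [60.0, np.nan, 477.0, 658.0, 300.0],
        "order_status_key": [4, 4, 4, 9, 9],
    }
)

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Continuous variable: a histogram shows the shape of the distribution.
ax_hist.hist(orders["m_order_eta"].dropna(), bins=5)
ax_hist.set_title("Distribution of m_order_eta")
ax_hist.set_xlabel("m_order_eta")
ax_hist.set_ylabel("Number of orders")

# Categorical variable: a bar chart compares counts per category.
status_counts = orders["order_status_key"].value_counts().sort_index()
ax_bar.bar(status_counts.index.astype(str), status_counts.values)
ax_bar.set_title("Orders per order_status_key")
ax_bar.set_xlabel("order_status_key")
ax_bar.set_ylabel("Number of orders")

fig.tight_layout()
fig.savefig("failed_orders_plots.png")  # hypothetical output file name
```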


### Step 4: Prepare Data for Machine Learning
With our data cleaned and explored, it's time for ML preparation. This involves tasks like [encoding categorical variables](https://medium.com/aiskunks/categorical-data-encoding-techniques-d6296697a40f?utm_source=imaginepro.ai) and [scaling numerical features](https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/?utm_source=imaginepro.ai).
Here is the prompt we use:
> Prepare this dataset for machine learning: encode categorical variables, scale numerical features, and return a clean DataFrame ready for modeling. Briefly explain each step.

ChatGPT processes the data and confirms that the features have been scaled and encoded, making the dataset ready for modeling.
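
The encoding and scaling steps might look like the sketch below, which uses pandas one-hot encoding and scikit-learn's `StandardScaler`. The data is made up, and `hour_bucket` is a hypothetical categorical column invented for the example; ChatGPT may well choose different encoders or scalers.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up sample; hour_bucket is a hypothetical categorical feature.
orders = pd.DataFrame(
    {
        "m_order_eta": [60.0, 120.0, 477.0, 658.0, 300.0],
        "cancellations_time_in_seconds": [198.0, 128.0, 46.0, 90.0, 60.0],
        "is_driver_assigned_key": [1, 0, 1, 1, 0],
        "hour_bucket": ["evening", "evening", "midday", "midday", "night"],
        "order_status_key": [4, 4, 4, 9, 9],
    }
)

# Keep the target out of the feature transformations.
target = orders["order_status_key"]
features = orders.drop(columns=["order_status_key"])

numeric_cols = ["m_order_eta", "cancellations_time_in_seconds"]
categorical_cols = ["hour_bucket"]

# One-hot encode the categorical column into 0/1 indicator columns.
features = pd.get_dummies(features, columns=categorical_cols, dtype=int)

# Standardize numeric columns to zero mean and unit variance.
# In a real project, fit the scaler on the training split only.
scaler = StandardScaler()
features[numeric_cols] = scaler.fit_transform(features[numeric_cols])

print(features.head())
```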

### Step 5: Apply a Machine Learning Model
For the final step, [machine learning modeling](https://www.stratascratch.com/blog/machine-learning-modeling/?utm_source=blog&utm_medium=click&utm_campaign=kdn+routine+tasks+that+chatgpt+can+handle&utm_source=imaginepro.ai), we can use a structured prompt to guide the AI.
> Use this dataset to predict order_status_key. Apply a multiclass classification model (e.g., Random Forest), and report evaluation metrics like accuracy, precision, recall, and F1-score. Use only the 5 most relevant features and explain your modeling steps.

After running the prompt, ChatGPT delivers the results, including feature selection, model explanation, and performance metrics.
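
The workflow that prompt describes can be sketched with scikit-learn as below. To stay self-contained, it uses a synthetic three-class dataset from `make_classification` rather than the Gett data, and ranking features by Random Forest importance is one assumed way of picking the "5 most relevant"; the metrics it prints say nothing about the real project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared dataset: 8 features, 3 classes.
X, y = make_classification(
    n_samples=600,
    n_features=8,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Rank features with a first forest, then keep the five most important.
ranker = RandomForestClassifier(n_estimators=200, random_state=0)
ranker.fit(X_train, y_train)
top5 = np.argsort(ranker.feature_importances_)[::-1][:5]

# Refit on the selected features only and evaluate on the held-out split.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train[:, top5], y_train)
y_pred = model.predict(X_test[:, top5])

accuracy = accuracy_score(y_test, y_pred)
print(f"Selected feature indices: {sorted(top5.tolist())}")
print(f"Accuracy: {accuracy:.3f}")
# Per-class precision, recall, and F1-score
print(classification_report(y_test, y_pred))
```

Selecting features using only the training split, as done here, avoids leaking information from the test set into the feature choice.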

## Bonus: Automating the Workflow with Gemini CLI
Google's Gemini has an [open-source agent](https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/?utm_source=imaginepro.ai) that you can interact with from your terminal. It offers a generous free tier for running commands.
First, install the CLI:

    sudo npm install -g @google/gemini-cli

Then, start it with:

    gemini

We can use Gemini CLI to build a [Streamlit](https://streamlit.io/?utm_source=imaginepro.ai) app that automates all five steps we just covered. By feeding it a detailed prompt outlining the entire workflow, Gemini will write the code and run the app for you.

After a few approvals, a complete Streamlit app is ready to go.


## Final Thoughts
In this walkthrough, we used ChatGPT to handle routine data science tasks from cleaning and exploration to modeling. We then took it a step further, using Gemini CLI to build a dashboard that automates the entire process.
By leveraging AI for these repetitive steps in a real data [project from Gett](https://platform.stratascratch.com/data-projects/insights-failed-orders?utm_source=blog&utm_medium=click&utm_campaign=kdn+routine+tasks+that+chatgpt+can+handle&utm_source=imaginepro.ai), you can save significant time and focus on more strategic analysis. While AI isn't perfect, it's an invaluable tool for streamlining your workflow.

---
**Nate Rosidi** is a data scientist who also works in product strategy. He's also an adjunct professor teaching analytics and the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL. You can follow him on [Twitter](https://twitter.com/StrataScratch?utm_source=imaginepro.ai).