Line Chart with Custom Confidence Interval in Altair

Altair is a Python library for declarative statistical visualization, built on top of Vega-Lite. In this article, we’ll explore how to add custom confidence bands to line charts using Altair.

Confidence Intervals in Data Visualization

A confidence interval (CI) is a range of values, computed from sample data, that is expected to contain an unknown population parameter at a stated confidence level (commonly 95%). In visualization, CIs convey the uncertainty associated with data points or trends.
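As a minimal sketch of the idea, the snippet below computes an approximate 95% interval for a mean using only the standard library. The sample values and the helper name mean_confidence_interval are hypothetical, and the z-score of 1.96 assumes a normal approximation, which is rough for a sample this small.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return (lower, upper) bounds for the mean using a normal approximation."""
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * sem, mean + z * sem


# Hypothetical sample, chosen only to illustrate the calculation
sample = [12, 15, 11, 14, 13]
lower, upper = mean_confidence_interval(sample)
print(f"95% CI for the mean: [{lower:.2f}, {upper:.2f}]")
```

The interval narrows as the sample grows, because the standard error shrinks with the square root of the sample size.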

Why Use Confidence Intervals?

  • Show Data Variability: CIs highlight the potential range of values for the true population mean.
  • Improve Data Interpretation: They provide a more complete understanding of the data by incorporating uncertainty.
  • Enable Informed Decisions: CIs aid in making more informed decisions based on the data’s inherent variability.

Creating Line Charts with Confidence Intervals in Altair

Let’s dive into a practical example using Altair to generate a line chart with customized confidence intervals.

Example Dataset

We’ll use a hypothetical dataset representing monthly sales data for two products (A and B):

Month   Product   Sales
1       A         100
2       A         120
3       A         110
1       B         80
2       B         90
3       B         105

Code Implementation

Here’s the Altair code to create a line chart with confidence intervals:

    import altair as alt
    import pandas as pd

    # Sample data
    data = pd.DataFrame({
        "Month": [1, 2, 3] * 2,
        "Product": ["A"] * 3 + ["B"] * 3,
        "Sales": [100, 120, 110, 80, 90, 105],
    })

    # Per-product mean and sample standard deviation.
    # reset_index() keeps "Product" as a column so it can serve as the lookup key.
    confidence_intervals = (
        data.groupby("Product")["Sales"].agg(["mean", "std"]).reset_index()
    )
    confidence_intervals["upper"] = confidence_intervals["mean"] + 1.96 * confidence_intervals["std"]
    confidence_intervals["lower"] = confidence_intervals["mean"] - 1.96 * confidence_intervals["std"]

    # Encodings shared by the line and the band
    base = alt.Chart(data).encode(x="Month:O", color="Product:N")

    # Line layer: the observed monthly sales
    line = base.mark_line().encode(
        y="Sales:Q",
        tooltip=["Product", "Month", "Sales"],
    )

    # Band layer: join each row to its product's bounds, then shade between them
    band = base.transform_lookup(
        lookup="Product",
        from_=alt.LookupData(
            data=confidence_intervals, key="Product", fields=["upper", "lower"]
        ),
    ).mark_area(opacity=0.2).encode(
        y=alt.Y("lower:Q", title="Sales"),
        y2="upper",
    )

    # Layer the band under the line and enable zooming and panning
    chart = (band + line).interactive()
    chart

Explanation

  1. Import Libraries: Import altair and pandas.
  2. Data Preparation: Create a pandas DataFrame to store the sample sales data.
  3. Calculate the Bounds: Group the data by product and compute the mean and sample standard deviation of Sales. The upper and lower bounds are the mean plus and minus 1.96 standard deviations. Under a normality assumption this range covers about 95% of individual values; it is wider than a 95% confidence interval for the mean, which would use the standard error (the standard deviation divided by the square root of the sample size). Calling reset_index() keeps Product as a regular column so it can serve as the lookup key.
  4. Chart Creation:
    • Use alt.Chart(data).encode() to define the encodings shared by both layers (x-axis and color).
    • Use mark_line() with encode() to draw the observed sales and attach the tooltip.
    • Use transform_lookup() to join the per-product bounds onto each row of the main data.
    • Use mark_area() with the y and y2 channels to shade the region between the lower and upper bounds.
    • Use the + operator to layer the band beneath the line.
  5. Interactivity: Add .interactive() to enable interactive features like zooming and panning.
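The calculation in step 3 multiplies the z-score by the standard deviation. A confidence interval for the mean conventionally uses the standard error instead. The sketch below shows that variant on the same sample data; the summary and sem names are introduced here for illustration, and keeping the z-score of 1.96 is an assumption that is optimistic with only three observations per product (a t-based critical value would give a wider interval).

```python
import pandas as pd

# Same sample data as in the article
data = pd.DataFrame({
    "Month": [1, 2, 3] * 2,
    "Product": ["A"] * 3 + ["B"] * 3,
    "Sales": [100, 120, 110, 80, 90, 105],
})

# Mean, sample standard deviation, and number of observations per product
summary = (
    data.groupby("Product")["Sales"].agg(["mean", "std", "count"]).reset_index()
)

# Standard error of the mean: std / sqrt(n)
summary["sem"] = summary["std"] / summary["count"] ** 0.5

# Approximate 95% confidence interval for each product's mean
summary["lower"] = summary["mean"] - 1.96 * summary["sem"]
summary["upper"] = summary["mean"] + 1.96 * summary["sem"]

print(summary[["Product", "mean", "lower", "upper"]])
```

Because summary keeps Product as a column and exposes lower and upper fields, it can be passed to alt.LookupData in the same way as the bounds table in the main example.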

Output

The code generates a line chart of monthly sales for each product, with a shaded band between that product’s lower and upper bounds. Because the bounds are computed once per product, each band is horizontal: it shows the spread of that product’s sales around its mean rather than month-by-month uncertainty.

[Interactive chart: monthly sales lines for products A and B, each with a shaded band between its lower and upper bounds.]

Conclusion

By incorporating confidence intervals into line charts using Altair, we can provide a more nuanced and insightful visualization of data trends. This enhanced visual representation helps us understand data variability and make more informed decisions.
