Credit:¶

Lyft First Data Exploration: https://www.kaggle.com/gpreda/lyft-first-data-exploration

1. Introduction¶

This is an Exploratory Data Analysis (EDA) kernel for the Lyft Motion Prediction for Autonomous Vehicles competition dataset.

We start with the analysis preparation, which for this competition means installing and loading the packages needed to load and manage the l5kit dataset.
We then explore the data, reviewing the agents, the scenes and the frames, and follow with an inspection of the animated scenes.

2. Analysis preparation¶

2.1. Install & load packages¶

We will have to install l5kit to access the data.

import os
import numpy as np
import pandas as pd
from l5kit.data import ChunkedDataset, LocalDataManager
from l5kit.dataset import EgoDataset, AgentDataset
from l5kit.rasterization import build_rasterizer
from l5kit.configs import load_config_data
from l5kit.visualization import draw_trajectory, TARGET_POINTS_COLOR
from l5kit.geometry import transform_points
from l5kit.data import PERCEPTION_LABELS
from tqdm import tqdm
from collections import Counter
import matplotlib.pyplot as plt
import seaborn as sns 
from matplotlib import animation, rc
from matplotlib.ticker import MultipleLocator
from IPython.display import display, clear_output
import PIL
from IPython.display import HTML

rc('animation', html='jshtml')

2.2. Configuration¶

Before accessing the dataset, we set its local configuration. We point l5kit to the data folder through the environment variable L5KIT_DATA_FOLDER, and we load the Lyft configuration from a YAML file stored in an external dataset.

os.environ["L5KIT_DATA_FOLDER"] = "/kaggle/input/lyft-motion-prediction-autonomous-vehicles"
cfg = load_config_data("/kaggle/input/lyft-config-files/visualisation_config.yaml")
print(cfg)

{'format_version': 4, 'model_params': {'model_architecture': 'resnet50', 'history_num_frames': 0, 'history_step_size': 1, 'history_delta_time': 0.1, 'future_num_frames': 50, 'future_step_size': 1, 'future_delta_time': 0.1}, 'raster_params': {'raster_size': [224, 224], 'pixel_size': [0.5, 0.5], 'ego_center': [0.25, 0.5], 'map_type': 'py_semantic', 'satellite_map_key': 'aerial_map/aerial_map.png', 'semantic_map_key': 'semantic_map/semantic_map.pb', 'dataset_meta_key': 'meta.json', 'filter_agents_threshold': 0.5}, 'val_data_loader': {'key': 'scenes/sample.zarr', 'batch_size': 12, 'shuffle': False, 'num_workers': 16}}
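
A few of the printed settings are easier to read once converted to physical units. The sketch below is only an illustration: it assumes that pixel_size is expressed in metres per pixel and that ego_center is a fraction of the raster size, which is how these parameters are commonly interpreted but is not stated in the output above.

```python
# Values copied from the printed configuration above.
raster_size = (224, 224)   # raster width and height, in pixels
pixel_size = (0.5, 0.5)    # assumption: metres per pixel
ego_center = (0.25, 0.5)   # assumption: fraction of the raster size
future_num_frames = 50
future_delta_time = 0.1    # seconds between consecutive future frames

# Ground area covered by one raster image, in metres.
extent_m = tuple(n * s for n, s in zip(raster_size, pixel_size))

# Pixel at which the ego vehicle is placed inside the raster.
center_px = tuple(n * c for n, c in zip(raster_size, ego_center))

# Length of the prediction horizon, in seconds.
horizon_s = future_num_frames * future_delta_time

print(extent_m, center_px, horizon_s)
```

Under these assumptions each raster covers 112 m by 112 m, and the 50 future frames span about 5 seconds.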

3. Load data¶

We load the dataset using the l5kit local data manager. L5kit uses the zarr data format, in which the arrays are divided into chunks and compressed.

# local data manager
dm = LocalDataManager()
# set dataset path
dataset_path = dm.require(cfg["val_data_loader"]["key"])
# load the dataset; this is a zarr format, chunked dataset
chunked_dataset = ChunkedDataset(dataset_path)
# open the dataset
chunked_dataset.open()
print(chunked_dataset)

+------------+------------+------------+---------------+-----------------+----------------------+----------------------+----------------------+---------------------+
| Num Scenes | Num Frames | Num Agents | Num TR lights | Total Time (hr) | Avg Frames per Scene | Avg Agents per Frame | Avg Scene Time (sec) | Avg Frame frequency |
+------------+------------+------------+---------------+-----------------+----------------------+----------------------+----------------------+---------------------+
|    100     |   24838    |  1893736   |     316008    |       0.69      |        248.38        |        76.24         |        24.83         |        10.00        |
+------------+------------+------------+---------------+-----------------+----------------------+----------------------+----------------------+---------------------+
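
As a quick sanity check, some of the averages in this summary can be recomputed from the raw counts. The snippet below is a small sketch that uses only the numbers printed in the table; the variable names are illustrative.

```python
# Counts and frame frequency copied from the summary table above.
num_scenes = 100
num_frames = 24838
num_agents = 1893736
frame_rate_hz = 10.0

frames_per_scene = num_frames / num_scenes
agents_per_frame = num_agents / num_frames
total_hours = num_frames / frame_rate_hz / 3600  # frames -> seconds -> hours

print(f"{frames_per_scene:.2f} {agents_per_frame:.2f} {total_hours:.2f}")
```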

4. Data exploration¶

4.1. Explore the dataset¶

We load and show entities in the dataset.

4.1.1. Agents¶

We start with the agents.

agents = chunked_dataset.agents
agents_df = pd.DataFrame(agents)
agents_df.columns = ["data"]; features = ['centroid', 'extent', 'yaw', 'velocity', 'track_id', 'label_probabilities']

for i, feature in enumerate(features):
    agents_df[feature] = agents_df['data'].apply(lambda x: x[i])
agents_df.drop(columns=["data"],inplace=True)
print(f"agents dataset: {agents_df.shape}")
agents_df.head()

agents dataset: (1893736, 6)

The fields in the agents dataset are the following:

centroid - the agent position (in the plane, two dimensions);
extent - the agent dimensions (three dimensions; let's call them length, width, height);
yaw - the agent rotation about the vertical axis;
velocity - the velocity of the agent (a two-component vector in Euclidean space);
track_id - index of the track associated with the agent;
label_probabilities - the probability that the agent belongs to each of 17 different agent types; we will explore these labels in a moment.
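
To make these fields more concrete, here is a minimal sketch of a few quantities that can be derived from them. The helper names are hypothetical, and the code assumes that yaw is given in radians and that the first two extent components are the length and the width.

```python
import math

def heading_vector(yaw):
    """Unit vector pointing in the agent's heading direction (yaw assumed in radians)."""
    return (math.cos(yaw), math.sin(yaw))

def speed(velocity):
    """Scalar speed from the two velocity components."""
    return math.hypot(velocity[0], velocity[1])

def footprint_area(extent):
    """Ground footprint, assuming extent is ordered (length, width, height)."""
    return extent[0] * extent[1]

# Values of the first agent shown in the agents_df.head() output.
extent = [4.3913283, 1.8138304, 1.5909758]
yaw = 1.016675
velocity = [0.0, 0.0]

print(heading_vector(yaw), speed(velocity), footprint_area(extent))
```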

Let's look at the distribution of a few of these values.

Centroid distribution¶

agents_df['cx'] = agents_df['centroid'].apply(lambda x: x[0])
agents_df['cy'] = agents_df['centroid'].apply(lambda x: x[1])

fig, ax = plt.subplots(1,1,figsize=(8,8))
plt.scatter(agents_df['cx'], agents_df['cy'], marker='+')
plt.xlabel('x', fontsize=11); plt.ylabel('y', fontsize=11)
plt.title("Centroids distribution")
plt.show()

Extent distribution¶

agents_df['ex'] = agents_df['extent'].apply(lambda x: x[0])
agents_df['ey'] = agents_df['extent'].apply(lambda x: x[1])
agents_df['ez'] = agents_df['extent'].apply(lambda x: x[2])

sns.set_style('whitegrid')

fig, ax = plt.subplots(1,3,figsize=(16,5))
plt.subplot(1,3,1)
plt.scatter(agents_df['ex'], agents_df['ey'], marker='+')
plt.xlabel('ex', fontsize=11); plt.ylabel('ey', fontsize=11)
plt.title("Extent: ex-ey")
plt.subplot(1,3,2)
plt.scatter(agents_df['ey'], agents_df['ez'], marker='+', color="red")
plt.xlabel('ey', fontsize=11); plt.ylabel('ez', fontsize=11)
plt.title("Extent: ey-ez")
plt.subplot(1,3,3)
plt.scatter(agents_df['ez'], agents_df['ex'], marker='+', color="green")
plt.xlabel('ez', fontsize=11); plt.ylabel('ex', fontsize=11)
plt.title("Extent: ez-ex")
plt.show();

Yaw¶

Let's see the yaw distribution.

fig, ax = plt.subplots(1,1,figsize=(8,8))
sns.distplot(agents_df['yaw'],color="magenta")
plt.title("Yaw distribution")
plt.show()

Velocity¶

Let's look at the velocity distribution.

agents_df['vx'] = agents_df['velocity'].apply(lambda x: x[0])
agents_df['vy'] = agents_df['velocity'].apply(lambda x: x[1])

fig, ax = plt.subplots(1,1,figsize=(8,8))
plt.title("Velocity distribution")
plt.scatter(agents_df['vx'], agents_df['vy'], marker='.', color="red")
plt.xlabel('vx', fontsize=11); plt.ylabel('vy', fontsize=11)
plt.show();

Track id¶

print("Number of tracks: ", agents_df.track_id.nunique())
print("Entries per track id (first 10): \n", agents_df.track_id.value_counts()[0:10])

Number of tracks:  2547
Entries per track id (first 10): 
 1     14922
2     12377
3     10179
5      9108
6      8605
4      8224
9      7927
7      7371
10     7345
8      7050
Name: track_id, dtype: int64

Let's look at the distribution of the label probabilities.

probabilities = agents["label_probabilities"]
labels_indexes = np.argmax(probabilities, axis=1)
counts = []
for idx_label, label in enumerate(PERCEPTION_LABELS):
    counts.append(np.sum(labels_indexes == idx_label))

# build the label/count table in one step (DataFrame.append was removed in pandas 2.0)
agents_df = pd.DataFrame({'label': PERCEPTION_LABELS, 'count': counts})

print(f"agents probabilities dataset: {agents_df.shape}")
agents_df

agents probabilities dataset: (17, 2)

Only 4 of the 17 perception labels are actually assigned to agents in this dataset, as follows:

PERCEPTION_LABEL_UNKNOWN - majority;
PERCEPTION_LABEL_CAR;
PERCEPTION_LABEL_CYCLIST;
PERCEPTION_LABEL_PEDESTRIAN.
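
The counts above come from assigning each agent the label with the highest probability (an argmax over its probability vector). The following sketch reproduces that idea in plain Python on a tiny made-up example; the shortened label list and the probability rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical, shortened label list (the real PERCEPTION_LABELS has 17 entries).
labels = ["UNKNOWN", "CAR", "CYCLIST", "PEDESTRIAN"]

# Hypothetical probability vectors, one row per agent.
probabilities = [
    [0.0, 1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.2, 0.7, 0.1, 0.0],
]

def argmax(row):
    """Index of the largest value in a row."""
    return max(range(len(row)), key=row.__getitem__)

# Count how many agents fall under each most-likely label.
counts = Counter(labels[argmax(row)] for row in probabilities)

# Report every label, including those that never win the argmax.
label_counts = {label: counts.get(label, 0) for label in labels}
print(label_counts)
```

Labels that never have the highest probability get a count of zero, which is why most of the 17 labels do not appear among the active agents.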

Let's look at their distribution:

f, ax = plt.subplots(1,1, figsize=(10,4))
plt.scatter(agents_df['label'], agents_df['count']+1, marker='*')
plt.xticks(rotation=90, size=8)
plt.xlabel('Perception label')
plt.ylabel(f'Agents count')
plt.title("Agents perception label values count distribution")
plt.grid(True)
ax.set(yscale="log")
plt.show()

4.1.2. Scenes¶

Let's now look at the scenes.

scenes = chunked_dataset.scenes
scenes_df = pd.DataFrame(scenes)
scenes_df.columns = ["data"]; features = ['frame_index_interval', 'host', 'start_time', 'end_time']
for i, feature in enumerate(features):
    scenes_df[feature] = scenes_df['data'].apply(lambda x: x[i])
scenes_df.drop(columns=["data"],inplace=True)
print(f"scenes dataset: {scenes_df.shape}")
scenes_df.head()

scenes dataset: (100, 4)
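
The start_time and end_time values are large integers. The sketch below assumes they are nanoseconds since the Unix epoch (an assumption based on their magnitude) and uses the values of the first scene from the scenes_df.head() output to derive a duration and an approximate frame rate.

```python
from datetime import datetime, timezone

# Values of the first scene in the scenes_df.head() output.
start_time = 1572643684617362176
end_time = 1572643709617362176
frame_index_interval = (0, 248)

NS_PER_S = 1_000_000_000  # assumption: timestamps are in nanoseconds

# Scene duration in seconds.
duration_s = (end_time - start_time) / NS_PER_S

# Number of frames, treating the interval as [start, end).
num_frames = frame_index_interval[1] - frame_index_interval[0]

# Approximate frame rate of the scene.
frame_rate_hz = num_frames / duration_s

# Human-readable start of the scene (UTC).
start_utc = datetime.fromtimestamp(start_time // NS_PER_S, tz=timezone.utc)

print(duration_s, num_frames, frame_rate_hz, start_utc.isoformat())
```

With these values the scene lasts 25 seconds and contains 248 frames, which is close to the 10 Hz frame frequency reported in the dataset summary.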

f, ax = plt.subplots(1,1, figsize=(6,4))
sns.countplot(x=scenes_df.host)  # keyword argument; recent seaborn versions reject positional data
plt.xlabel('Host')
plt.ylabel(f'Count')
plt.title("Scenes host count distribution")
plt.show()

Let's show the succession of scene frame indices, on the same graph as the host.

scenes_df['frame_index_start'] = scenes_df['frame_index_interval'].apply(lambda x: x[0])
scenes_df['frame_index_end'] = scenes_df['frame_index_interval'].apply(lambda x: x[1])
scenes_df.head()

f, ax = plt.subplots(1,1, figsize=(8,8))
spacing = 498
minorLocator = MultipleLocator(spacing)
ax.yaxis.set_minor_locator(minorLocator)
ax.xaxis.set_minor_locator(minorLocator)
plt.xlabel('Start frame index')
plt.ylabel(f'End frame index')
plt.grid(which = 'minor')
plt.title("Frames scenes start and end index (grouped per host)")
sns.scatterplot(x=scenes_df['frame_index_start'], y=scenes_df['frame_index_end'], marker='|',  hue=scenes_df['host'])
plt.show()

4.1.3. Frames¶

We now look at the frames.

frames_df = pd.DataFrame(chunked_dataset.frames)
frames_df.columns = ["data"]; features = ['timestamp', 'agent_index_interval', 'traffic_light_faces_index_interval', 
                                          'ego_translation','ego_rotation']
for i, feature in enumerate(features):
    frames_df[feature] = frames_df['data'].apply(lambda x: x[i])
frames_df.drop(columns=["data"],inplace=True)
print(f"frames dataset: {frames_df.shape}")
frames_df.head()

frames dataset: (24838, 5)

The frames are described by:

timestamp;
agent index interval;
traffic light faces index interval;
ego translation;
ego rotation;
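
The index intervals link each frame to its rows in the other arrays. As a rough sketch, assuming each interval is half-open, [start, end), the agents of a frame can be obtained by slicing the agents array. The intervals below are the five shown in the frames_df.head() output, and the agents list is only a stand-in for the real array.

```python
# agent_index_interval values of the first five frames.
agent_index_intervals = [(0, 38), (38, 85), (85, 142), (142, 200), (200, 254)]

# Stand-in for the agents array: one placeholder entry per agent row.
agents = list(range(254))

def agents_in_frame(agents, interval):
    """Rows of the agents array that belong to one frame (interval assumed half-open)."""
    start, end = interval
    return agents[start:end]

# Number of agents observed in each of the five frames.
agents_per_frame = [len(agents_in_frame(agents, iv)) for iv in agent_index_intervals]
print(agents_per_frame)
```

Consecutive intervals share their boundary value, so under this reading every agent row belongs to exactly one frame. The traffic light faces index interval can be read in the same way.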

Let's look at the ego translations.

Ego translations¶

frames_df['dx'] = frames_df['ego_translation'].apply(lambda x: x[0])
frames_df['dy'] = frames_df['ego_translation'].apply(lambda x: x[1])
frames_df['dz'] = frames_df['ego_translation'].apply(lambda x: x[2])

sns.set_style('whitegrid')

fig, ax = plt.subplots(1,3,figsize=(16,5))

plt.subplot(1,3,1)
plt.scatter(frames_df['dx'], frames_df['dy'], marker='+')
plt.xlabel('dx', fontsize=11); plt.ylabel('dy', fontsize=11)
plt.title("Translations: dx-dy")
plt.subplot(1,3,2)
plt.scatter(frames_df['dy'], frames_df['dz'], marker='+', color="red")
plt.xlabel('dy', fontsize=11); plt.ylabel('dz', fontsize=11)
plt.title("Translations: dy-dz")
plt.subplot(1,3,3)
plt.scatter(frames_df['dz'], frames_df['dx'], marker='+', color="green")
plt.xlabel('dz', fontsize=11); plt.ylabel('dx', fontsize=11)
plt.title("Translations: dz-dx")

fig.suptitle("Ego translations in 2D planes of the 3 components (dx,dy,dz)", size=14)
plt.show();

fig, ax = plt.subplots(1,3,figsize=(16,5))
colors = ['magenta', 'orange', 'darkblue']; labels= ["dx", "dy", "dz"]
for i in range(0,3):
    df = frames_df['ego_translation'].apply(lambda x: x[i])
    plt.subplot(1,3,i + 1)
    sns.distplot(df, hist=False, color = colors[ i ])
    plt.xlabel(labels[i])
fig.suptitle("Ego translations distribution", size=14)
plt.show()

Ego rotations¶

Let's also plot the distributions of the ego rotation components. The rotation matrix is 3 x 3, so there are nine components.

fig, ax = plt.subplots(3,3,figsize=(16,16))
colors = ['red', 'blue', 'green', 'magenta', 'orange', 'darkblue', 'black', 'cyan', 'darkgreen']
for i in range(0,3):
    for j in range(0,3):
        df = frames_df['ego_rotation'].apply(lambda x: x[i][j])
        plt.subplot(3,3,i * 3 + j + 1)
        sns.distplot(df, hist=False, color = colors[ i * 3 + j  ])
        plt.xlabel(f'r[ {i + 1} ][ {j + 1} ]')
fig.suptitle("Ego rotation angles distribution", size=14)
plt.show()
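
The nine components plotted above are entries of a rotation matrix rather than angles. As an illustrative sketch, assuming the ego rotation is dominated by a rotation about the vertical (z) axis, a yaw angle can be recovered from the matrix with atan2. The helper names are hypothetical and the example uses a synthetic matrix, not one from the dataset.

```python
import math

def yaw_rotation(yaw):
    """3 x 3 rotation matrix for a rotation of `yaw` radians about the z axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def yaw_from_rotation(r):
    """Yaw angle (radians) read from the first column of a rotation matrix."""
    return math.atan2(r[1][0], r[0][0])

def is_orthonormal(r, tol=1e-9):
    """Check that the rows of r are mutually orthogonal unit vectors."""
    for i in range(3):
        for j in range(3):
            dot = sum(r[i][k] * r[j][k] for k in range(3))
            expected = 1.0 if i == j else 0.0
            if abs(dot - expected) > tol:
                return False
    return True

# Round trip on a synthetic example.
rotation = yaw_rotation(0.75)
print(is_orthonormal(rotation), yaw_from_rotation(rotation))
```

Because every entry of a rotation matrix is a product of sines and cosines, all nine distributions are bounded between -1 and 1.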

Traffic light faces index interval¶

frames_df['tlfii0'] = frames_df['traffic_light_faces_index_interval'].apply(lambda x: x[0])
frames_df['tlfii1'] = frames_df['traffic_light_faces_index_interval'].apply(lambda x: x[1])
sns.set_style('whitegrid')
fig, ax = plt.subplots(1,1,figsize=(8,8))
plt.scatter(frames_df['tlfii0'], frames_df['tlfii1'], marker='+')
plt.xlabel('Traffic light faces index interval [0]', fontsize=11); plt.ylabel('Traffic light faces index interval [1]', fontsize=11)
plt.title("Traffic light faces index interval")
plt.show()

Agents index interval¶

fig, ax = plt.subplots(1, 2, figsize=(12, 5))
colors = ['cyan', 'darkgreen']
for i in range(0,2):
    df = frames_df['agent_index_interval'].apply(lambda x: x[i])
    plt.subplot(1, 2, i + 1)
    sns.distplot(df, hist=False, color = colors[ i ])
    plt.xlabel(f'agent index interval [ {i} ]')
fig.suptitle("Agent index interval", size=14)
plt.show()

5. References¶

[1] Lyft Understanding the data and EDA, https://www.kaggle.com/nxrprime/lyft-understanding-the-data-and-eda
[2] Lyft Scenes Visualizations, https://www.kaggle.com/jpbremer/lyft-scene-visualisations/
[3] Lyft l5kit, https://github.com/lyft/l5kit
[4] Lyft l5kit data visualization, https://github.com/lyft/l5kit/blob/master/examples/visualisation/visualise_data.ipynb
[5] Lyft l5kit agents motion prediction, https://github.com/lyft/l5kit/blob/master/examples/agent_motion_prediction/agent_motion_prediction.ipynb

Output of agents_df.head() (section 4.1.1):
	centroid	extent	yaw	velocity	track_id	label_probabilities
0	[665.0342407226562, -2207.51220703125]	[4.3913283, 1.8138304, 1.5909758]	1.016675	[0.0, 0.0]	1	[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
1	[717.6612548828125, -2173.760009765625]	[5.150925, 1.9530917, 2.04021]	-0.783224	[0.0, 0.0]	2	[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
2	[730.681396484375, -2180.678955078125]	[2.9482825, 1.4842174, 1.1125067]	-0.321747	[0.0, 0.0]	3	[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
3	[671.2536010742188, -2204.745361328125]	[1.7067024, 0.9287868, 0.6282158]	0.785501	[0.0, 0.0]	4	[0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
4	[669.7763061523438, -2213.004638671875]	[0.25109944, 0.6343781, 1.654377]	1.492359	[0.0, 0.0]	5	[0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...

Output of agents_df with the perception label counts (section 4.1.1):
	label	count
0	PERCEPTION_LABEL_NOT_SET	0
1	PERCEPTION_LABEL_UNKNOWN	1324481
2	PERCEPTION_LABEL_DONTCARE	0
3	PERCEPTION_LABEL_CAR	519385
4	PERCEPTION_LABEL_VAN	0
5	PERCEPTION_LABEL_TRAM	0
6	PERCEPTION_LABEL_BUS	0
7	PERCEPTION_LABEL_TRUCK	0
8	PERCEPTION_LABEL_EMERGENCY_VEHICLE	0
9	PERCEPTION_LABEL_OTHER_VEHICLE	0
10	PERCEPTION_LABEL_BICYCLE	0
11	PERCEPTION_LABEL_MOTORCYCLE	0
12	PERCEPTION_LABEL_CYCLIST	6688
13	PERCEPTION_LABEL_MOTORCYCLIST	0
14	PERCEPTION_LABEL_PEDESTRIAN	43182
15	PERCEPTION_LABEL_ANIMAL	0
16	AVRESEARCH_LABEL_DONTCARE	0

Output of scenes_df.head() (section 4.1.2):
	frame_index_interval	host	start_time	end_time
0	[0, 248]	host-a013	1572643684617362176	1572643709617362176
1	[248, 497]	host-a013	1572643749559148288	1572643774559148288
2	[497, 746]	host-a013	1572643774559148288	1572643799559148288
3	[746, 995]	host-a013	1572643799559148288	1572643824559148288
4	[995, 1244]	host-a013	1572643824559148288	1572643849559148288

Output of scenes_df.head() after adding the frame index start and end columns (section 4.1.2):
	frame_index_interval	host	start_time	end_time	frame_index_start	frame_index_end
0	[0, 248]	host-a013	1572643684617362176	1572643709617362176	0	248
1	[248, 497]	host-a013	1572643749559148288	1572643774559148288	248	497
2	[497, 746]	host-a013	1572643774559148288	1572643799559148288	497	746
3	[746, 995]	host-a013	1572643799559148288	1572643824559148288	746	995
4	[995, 1244]	host-a013	1572643824559148288	1572643849559148288	995	1244

Output of frames_df.head() (section 4.1.3):
	timestamp	agent_index_interval	traffic_light_faces_index_interval	ego_translation	ego_rotation
0	1572643684801892606	[0, 38]	[0, 0]	[680.6197509765625, -2183.32763671875, 288.541...	[[0.5467331409454346, -0.837294340133667, 0.00...
1	1572643684901714926	[38, 85]	[0, 0]	[681.1856079101562, -2182.42236328125, 288.608...	[[0.5470812916755676, -0.837059736251831, 0.00...
2	1572643685001499246	[85, 142]	[0, 0]	[681.7647094726562, -2181.522705078125, 288.68...	[[0.5479603409767151, -0.8364874720573425, 0.0...
3	1572643685101394026	[142, 200]	[0, 0]	[682.3414306640625, -2180.624267578125, 288.75...	[[0.5491225123405457, -0.8357341885566711, 0.0...
4	1572643685201412346	[200, 254]	[0, 0]	[682.9197998046875, -2179.73046875, 288.827392...	[[0.5504215955734253, -0.8348868489265442, -7....