Building a Real-Time Data Visualization of CitiBike NYC

by @jehiah on 2013-07-02 13:00UTC
Filed under: All, Data, Citibike, d3

New York City’s bike share program, CitiBike, launched on May 27th. I was one of the first few hundred people to sign up for an annual membership (they now have 50k members), but things got off to a rough start and my bike key arrived a week late. Since then, though, I have enjoyed using CitiBike to commute to work, run errands, and ride along the West Side Greenway.

As with the launch of any large program, there have been rough patches. I have my own complaints about searching for an empty bike station, and I have been curious how common empty or full stations really are. Having run into empty stations myself, I wanted a way to conceptualize patterns in bikeshare usage. NYC has set daily ridership records twice as high as DC’s, which gives some context, but that doesn’t explain how bike usage varies with the time of day. Are all the bikes out at any given point in time? Are bikes used more in the morning or in the evening? These are some of the questions I wanted to answer visually.

CitiBike says on their website that they “look forward to sharing Citi Bike system data,” but so far they have released only summary blog posts. In the meantime, I have put together a visualization based on the data that powers their bike station map.

Click through on the image below for the live-updating status page, which may help in understanding New York’s bikeshare system.

[Image: citibikenyc status]

Methodology

The dataset is currently updated every 6 minutes from https://citibikenyc.com/stations/json/, which lists every station, its status, and the current status of the docks at that station.
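A minimal sketch of reading one snapshot of that feed is below. The field names (`stationBeanList`, `availableBikes`, `availableDocks`, `statusValue`) and the "In Service" label are assumptions about the feed's shape, not a documented schema, and the sample snapshot is invented for illustration. A scheduler (not shown) would call the hypothetical `fetch_snapshot` every 6 minutes and store the parsed rows.

```python
import json
import urllib.request

FEED_URL = "https://citibikenyc.com/stations/json/"
POLL_SECONDS = 6 * 60  # sampling interval used for the dataset

# Invented snapshot; the field names are assumed, not documented.
SAMPLE_SNAPSHOT = """
{"stationBeanList": [
  {"id": 1, "availableBikes": 10, "availableDocks": 29, "statusValue": "In Service"},
  {"id": 2, "availableBikes": 0, "availableDocks": 33, "statusValue": "In Service"},
  {"id": 3, "availableBikes": 4, "availableDocks": 0, "statusValue": "Not In Service"}
]}
"""


def fetch_snapshot(url=FEED_URL):
    """Download one raw JSON snapshot of the station feed (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def parse_snapshot(raw):
    """Return (station_id, docked_bikes, open_docks) for in-service stations."""
    data = json.loads(raw)
    rows = []
    for station in data["stationBeanList"]:
        # Skip stations not marked "In Service" (assumed status label).
        if station.get("statusValue") != "In Service":
            continue
        rows.append((station["id"], station["availableBikes"], station["availableDocks"]))
    return rows


def total_docked(rows):
    """Total number of bikes currently sitting in docks across all stations."""
    return sum(bikes for _, bikes, _ in rows)


rows = parse_snapshot(SAMPLE_SNAPSHOT)
print(rows)
print(total_docked(rows))
```

Summing the docked bikes per snapshot gives the single number per time step that the fleet estimate in the next step works from.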

Since there is no published count of active bikes (the data covers only stations and docks), I currently estimate the total number of bikes by taking the maximum number of docked bikes over the past week. I expect this maximum to occur in the middle of the night on one of those nights, when most bikes are docked. I then estimate the number of active bikes at any given time by subtracting the number docked at that time from this total.
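The estimate can be sketched as follows, assuming the docked total from each snapshot is stored as a `(timestamp, docked)` pair. The function name, input format, and sample values are hypothetical. The fleet size is the maximum docked count within the past week, and the active count at each sample is that maximum minus the docked count.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)  # look back one week


def estimate_active_bikes(samples, now=None):
    """Estimate fleet size and active bikes from (timestamp, docked) samples.

    The fleet size is taken to be the maximum docked count seen in the last
    week; active bikes at each sample are that maximum minus the docked count.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if now is None:
        now = max(t for t, _ in samples)
    recent = [(t, docked) for t, docked in samples if now - WINDOW <= t <= now]
    if not recent:
        raise ValueError("no samples in the past week")
    fleet = max(docked for _, docked in recent)
    return fleet, [(t, fleet - docked) for t, docked in recent]


samples = [
    (datetime(2013, 6, 21, 3, 0), 6000),  # older than one week: ignored
    (datetime(2013, 7, 1, 3, 0), 5000),   # overnight peak
    (datetime(2013, 7, 1, 9, 0), 4200),   # morning
]
fleet, active = estimate_active_bikes(samples)
print(fleet, active[-1][1])
```

Taking the maximum rather than an average assumes that at some point during the week nearly every bike is docked at once; any bikes that stay out overnight on every night are missing from the estimate.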

Unfortunately, this doesn’t properly adjust the bike count as stations come online or go offline, but it should be close enough to give a visual reference for bike activity.

Stations are then ranked according to utilization. Both empty and full stations are bad: empty stations strand people who need a bike, and full stations force riders to make unplanned detours to return one. As many people have noticed, some stations are effectively full or empty but are not reflected properly in CitiBike’s systems, so they show as available or full on CitiBike’s website when they are not. That error carries over to this dataset.
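The exact utilization score isn't spelled out here, so the following is only one hypothetical way to rank stations: count the fraction of snapshots in which a station had no bikes or no open docks, and sort the worst stations first. Each snapshot is assumed to be a list of `(station_id, docked_bikes, open_docks)` tuples; `rank_stations` and the sample data are illustrative.

```python
from collections import defaultdict


def rank_stations(snapshots):
    """Rank stations by how often they were empty or full.

    snapshots: iterable of snapshots, each a list of
    (station_id, docked_bikes, open_docks) tuples.
    Returns (station_id, problem_fraction) pairs, worst station first.
    """
    seen = defaultdict(int)      # snapshots in which the station appeared
    problems = defaultdict(int)  # snapshots in which it was empty or full
    for rows in snapshots:
        for station_id, bikes, docks in rows:
            seen[station_id] += 1
            if bikes == 0 or docks == 0:  # empty (no bikes) or full (no docks)
                problems[station_id] += 1
    scores = [(sid, problems[sid] / seen[sid]) for sid in seen]
    # Highest problem fraction first; ties broken by station id.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores


snapshots = [
    [(1, 0, 10), (2, 5, 5), (3, 10, 0)],
    [(1, 0, 10), (2, 5, 5), (3, 8, 2)],
]
print(rank_stations(snapshots))
```

Because this score trusts the reported counts, it inherits the misreporting problem described above: a station that is effectively full but reported as available is not counted as a problem.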

Other CitiBike Data

After working on this visualization, I was pointed to some other awesome projects showing the geographic dispersion of bikes throughout the day. Both are worth checking out.

Jehiah Czebotar