
Part 2: Using Big Data and Data Science to Solve Traffic Congestion Woes in Developing Countries – Overview of Design Architecture & Resolving Typical Data Acquisition Problems

In my last blog post, I introduced the overall inspiration and objectives of the Intelligent Transportation System (ITS) that is being designed for Ho Chi Minh City, Vietnam.

In this follow-up post, I will share some high-level design aspects of the ITS framework and briefly highlight some of the data acquisition issues we encountered, such as noise, along with the methods we utilized to address them.

In that previous post, I mentioned three main business & operational goals of this undertaking:

  1. Ease of Deployment
  2. Cost Efficient System
  3. Effectiveness

Associated with the above are two broader technical objectives:

  1. An efficient procedure for collecting and processing traffic data
  2. Effective traffic regulation and availability of traffic information to online end-users

ITS Framework

The diagram below depicts the high-level ITS framework being considered:

High-Level ITS Framework Diagram

The framework has three main components:

  • Vehicles and GPS devices (Block A)
  • Control Center hosting all collected data and algorithms (Block B)
  • Regulators (Block C)

Data acquisition from multiple data sources (Block A)

The ITS collects traffic data from multiple types of devices: GPS units on cars, buses, and taxis, as well as traffic cameras and sensor systems. This data reaches the Control Center over widely used wired and wireless communication technologies. GPS-fitted mopeds are also used to collect traffic data, given the popularity of that mode of transportation in Vietnam. The ITS also includes a GPS mobile application that will serve as an additional data source, subject to local privacy laws.

Addressing data issues and propagating traffic information to End-Users and Regulators: role of the Control Center (Block B)

Since traffic data is collected from multiple data sources, the data must be preprocessed and standardized so that unified and clean (i.e., less noisy) data is available for downstream processing. We leveraged some existing algorithms and techniques for data acquisition and mapping, such as the Map-Matching Algorithm (MMA), which has an online variant for live traffic data and an offline variant for stored traffic data.
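To make the standardization step concrete, here is a minimal sketch in Python of mapping records from two sources onto one common schema. The feed formats, field names (id, ts, speed_ms, user, kmh and so on) and the GpsFix type are all hypothetical and chosen only for illustration; they are not the actual ITS data formats.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class GpsFix:
    """Unified record (hypothetical schema): UTC time, degrees, km/h."""
    device_id: str
    timestamp: datetime
    lat: float
    lon: float
    speed_kmh: float


def from_bus_feed(rec: dict) -> GpsFix:
    # Assumed bus feed: epoch seconds, speed in metres per second.
    return GpsFix(
        device_id=f"bus-{rec['id']}",
        timestamp=datetime.fromtimestamp(rec["ts"], tz=timezone.utc),
        lat=rec["lat"],
        lon=rec["lng"],
        speed_kmh=rec["speed_ms"] * 3.6,
    )


def from_mobile_app(rec: dict) -> GpsFix:
    # Assumed app payload: ISO 8601 time with a UTC offset, speed in km/h.
    return GpsFix(
        device_id=f"app-{rec['user']}",
        timestamp=datetime.fromisoformat(rec["time"]).astimezone(timezone.utc),
        lat=rec["latitude"],
        lon=rec["longitude"],
        speed_kmh=rec["kmh"],
    )
```

Once every source has an adapter like these, downstream steps such as cleaning and map-matching only need to understand one record type, with one time zone and one set of units.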

Raw GPS data is cleaned before being mapped onto a digital map extracted from an OSM dataset (OSM stands for OpenStreetMap, an open-source project that aims to create a free, editable map of the world). We considered using MMA because raw GPS data has several noise issues, such as delayed or lost data (due to satellite latency) and data that deviates off the map. The latter is a frequent problem as vehicles navigate through a city where many tall buildings obstruct GPS signals (check out the right map image below depicting this deviation).
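As a rough illustration of what such cleaning can look like, the Python sketch below reorders delayed fixes by timestamp, drops duplicates, and discards fixes that would imply an implausible speed, which is one simple way to catch large off-map jumps. The function names and the 40 m/s threshold are assumptions made for this example, not the rules used in the ITS.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def clean_trace(fixes, max_speed_ms=40.0):
    """Clean one device's trace, given as (t_seconds, lat, lon) tuples."""
    cleaned = []
    for t, lat, lon in sorted(fixes):  # sorting puts delayed fixes back in order
        if cleaned:
            t0, lat0, lon0 = cleaned[-1]
            dt = t - t0
            if dt <= 0:
                continue  # duplicate timestamp
            if haversine_m(lat0, lon0, lat, lon) / dt > max_speed_ms:
                continue  # implied speed is implausible: likely a GPS jump
        cleaned.append((t, lat, lon))
    return cleaned
```

This greedy filter trusts the first fix of a trace and cannot recover data that was lost outright, so it only removes the crudest noise. The smaller deviations that remain are what map-matching is meant to handle.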

Comparison of data using MMA

The picture on the left shows the data after applying MMA; the picture on the right shows it before.

To transform raw GPS data into actual coordinates on the road and resolve such issues, we used a hidden Markov model (HMM) for offline map-matching (check out the left map above for the transformation). For online MMA, we simply used a "k-nearest distance algorithm".
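The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the models used in the ITS: coordinates are assumed to be already projected onto a flat plane in metres, roads are plain line segments, the transition score compares straight-line distances rather than distances along the road network, and the function names and the sigma and beta parameters are hypothetical.

```python
import math


def snap(p, seg):
    """Return (distance, closest point) from point p to line segment seg."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp so the result stays on the segment
    q = (ax + t * dx, ay + t * dy)
    return math.dist(p, q), q


def k_nearest(p, segments, k=2):
    """The k closest segments as (distance, segment index, snapped point)."""
    scored = []
    for i, seg in enumerate(segments):
        d, q = snap(p, seg)
        scored.append((d, i, q))
    return sorted(scored)[:k]


def match_online(p, segments):
    """Online: pick the nearest segment for a single fix, with no history."""
    return k_nearest(p, segments, k=1)[0][1]


def match_offline(points, segments, k=2, sigma=10.0, beta=5.0):
    """Offline: Viterbi decoding of an HMM whose states are candidate segments."""
    if not points:
        return []
    cands = [k_nearest(p, segments, k) for p in points]
    # Emission: Gaussian log-score on the distance between fix and road.
    score = [-0.5 * (d / sigma) ** 2 for d, _, _ in cands[0]]
    back = []
    for t in range(1, len(points)):
        moved = math.dist(points[t - 1], points[t])
        new_score, pointers = [], []
        for d, _, q in cands[t]:
            best_j, best = 0, -math.inf
            for j, (_, _, q_prev) in enumerate(cands[t - 1]):
                # Transition: penalise snapped moves that disagree with the raw move.
                trans = -abs(moved - math.dist(q_prev, q)) / beta
                if score[j] + trans > best:
                    best_j, best = j, score[j] + trans
            new_score.append(best - 0.5 * (d / sigma) ** 2)
            pointers.append(best_j)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final candidate to recover the segment sequence.
    j = max(range(len(score)), key=score.__getitem__)
    path = [cands[-1][j][1]]
    for t in range(len(points) - 1, 0, -1):
        j = back[t - 1][j]
        path.append(cands[t - 1][j][1])
    return path[::-1]


# Two parallel roads 20 m apart; the third fix is noisy and drifts toward road 1.
roads = [((0, 0), (100, 0)), ((0, 20), (100, 20))]
trace = [(10, 4), (20, 5), (30, 12), (40, 5), (50, 4)]
print([match_online(p, roads) for p in trace])  # [0, 0, 1, 0, 0]
print(match_offline(trace, roads))              # [0, 0, 0, 0, 0]
```

In this toy example the nearest-segment rule briefly jumps to the wrong road on the noisy fix, while the HMM keeps the vehicle on road 0 because switching roads and back is an unlikely pair of transitions. The nearest-segment rule needs no future fixes, which is what makes it usable on live data.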

I will cover more details on these models in my next blog post.

Aside from data and algorithm processing, the Control Center is also responsible for supporting self-regulation by sending traffic information to travellers (end-users) via multiple channels, such as a website and mobile applications, allowing them to make better-informed decisions.

Decentralized traffic control system – the Regulators (Block C)

Part of the ITS framework is a decentralized traffic control set-up that would coordinate traffic management across the city. This is a more suitable, efficient, and cost-effective method for Southeast Asian cities than some of the centralized or legacy manual methods currently deployed. The goal of the ITS is to facilitate traffic regulation by transferring traffic information to multiple regulators, e.g., traffic lights and other traffic systems, in order to coordinate traffic flows and reduce congestion across the city.

To achieve this, predicting traffic flows is a critical and necessary output of the ITS framework. After a reasonable amount of de-noising is completed on the acquired data, Machine Learning algorithms (currently being designed) take over to analyze and predict traffic flows.

Since data cleansing is such a universal issue for data projects at scale, I will share more details in my next blog post on how we used a hidden Markov model and kNN to address some of the typical data issues mentioned earlier. Stay tuned.

An Mai, PhD (Data Scientist & Researcher, An@TenPoint7.com)



TenPoint7