Prepared for the Intelligent Transportation Systems (ITS) Joint Program Office (JPO)
Research and Innovative Technology Administration
United States Department of Transportation (USDOT)
Federal Highway Administration (FHWA)
Federal Transit Administration (FTA)
Management and control of transportation systems are dependent on data sources describing the performance of the system (e.g., measured average facility speed) and the state of system controls (e.g., the currently active signal timing plan). Because managers cannot simply look out of a window and understand how well a large and complex system is performing, the nature and quality of data streaming from available sources to a large degree dictate how transportation systems can be understood and managed. Traditional data streams for transportation system management are based on the deployment of infrastructure-based sensors passively detecting vehicles as they pass selected locations in the system. This infrastructure-based, passive-acquisition data paradigm has enabled a range of familiar infrastructure-centric control systems for functions like signal control, incident management, and congestion monitoring.
Several concurrent technological trends have the potential to reshape the traditional infrastructure-based, passive acquisition data paradigm. First, vehicles and hand-held devices are increasingly capable of systematically collecting and communicating a broad range of probe data. Probe data inherently describe the motion and state of mobile entities in the system (as opposed to data describing entities passing fixed locations). Vehicles capable of acting as probes span the full range of light vehicles, roadway and rail transit vehicles, and freight carriers. Second, modern wireless communication technology permits an active exchange of data with and between vehicles, travelers, roadside devices, and system operators. An active paradigm allows for a systematic yet dynamic and selective exchange of vehicle status and traveler behavior data. This is inherently different than the uniform capture of vehicle location and speed data generated through passive detection. Integrating probe data sources with traditional infrastructure-based data sources into multi-source data streams may enhance the capability of current forms of system control or significantly reduce their costs. Multi-source data streams may also enable nontraditional, transformative forms of system control and management. These transformative forms of management may have the potential to increase system productivity and traveler mobility significantly while concurrently reducing environmental and safety impacts.
The U.S. DOT ITS JPO is engaged in assessing the potential of the multi-source, active-acquisition data paradigm to enhance current operational practices and transform future surface transportation systems management. This research is a collaborative initiative spanning the Intelligent Transportation Systems Joint Program Office, the Federal Highway Administration, the Federal Transit Administration, and the Federal Motor Carrier Safety Administration (FMCSA). One foundational element of connected vehicle research is the Real-Time Data Capture and Management Program. Program objectives include:
- Enable systematic data capture from vehicles, mobile devices, and infrastructure;
- Develop data environments that enable the integration of data from multiple sources for use in transportation management and performance measurement;
- Reduce costs of data management and eliminate technical and institutional barriers to the capture, management, and sharing of data.
The Real-Time Data Capture and Management Program plays a key role supporting other connected vehicle initiatives identified in strategic plans in the areas of Safety, Mobility, and Environment. Many of these initiatives will require systematic capture and management of data over time to realize their objectives. The cross-cutting Real-Time Data Capture and Management Program is chartered to coordinate across these initiatives to identify comprehensive data needs. Further, the Real-Time Data Capture and Management Program is responsible for designing laboratory experiments and field tests that meet these identified data needs in the most cost-effective way. Data collected in these experiments and tests will be systematically structured and documented. The resulting well-documented, distributed data resource will allow data captured from diverse sources to be integrated, shared, and leveraged by a broad range of researchers, private sector partners, and system operators. Without a cross-cutting data capture and management program, the alternative is ad hoc data collection and management by each initiative. Such an approach will result in the funding of redundant data collection activities, supporting isolated analysts working with individual data sets targeting a single issue. Further, a cross-cutting data capture and management program could serve a key role in motivating and defining emerging standards or data-related rule-making.
The vision of the connected vehicle Real-Time Data Capture and Management program is to enhance current operational practices and transform future transportation systems management through the active acquisition and systematic provision of integrated data from fixed sensors, vehicles and travelers to researchers, application developers, and system operators.
The purpose of this document is to provide a vision for the Real-Time Data Capture and Management Program, including a description of the program objectives, key issues, and projected outcomes. In addition, the document serves to place the proposed program in context with other ongoing or planned federal initiatives. The document outlines a series of three phases envisioned for the program and how activities in these phases support the set of desired program outcomes.
One core concept in the Real-Time Data Capture and Management Program is the data environment. A data environment is defined as:
- a well-organized collection of data of specific type and quality,
- captured and stored at regular intervals from one or more sources,
- systematically shared in support of one or more applications
The data environment concept is shown in Figure 1a. In Figure 1a, raw data are captured from the transportation system and managed within the data environment (represented by the orange sphere). These data are transformed into information supporting one or more applications. Note that all data need not be stored in the same location to be a well-organized collection. If data can be readily assembled from multiple sources, then distributed forms of data management can also be considered within the data environment core concept.
Figure 1a. The Data Environment Concept
Figure 1b illustrates key attributes that characterize a specific data environment and differentiate it from other data environments. Along the longitudinal “y” axis, the data environment is defined by the sources of the raw data entering the data environment and the applications identified for the data environment. On the latitudinal “x” axis, the data environment is defined by what data will be stored and how these data will be structured. Defining a data environment requires a clear and documented understanding of these key considerations. Let’s consider the attributes of the data environment by examining the interior of the data environment sphere from the perspective of each of the four key considerations.
Figure 1b. Key Attributes Defining a Data Environment
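The four considerations in Figure 1b (sources, applications, what is stored, and how it is structured) can be sketched as a simple record. This is only an illustrative sketch: the class name `DataEnvironment`, its field names, and the example values are assumptions made here and are not defined by the program.

```python
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    """Primary data sources named in the text."""
    TRAVELER = "traveler"
    VEHICLE = "vehicle"
    INFRASTRUCTURE = "infrastructure"


class ApplicationArea(Enum):
    """Application areas named in the text."""
    SAFETY = "safety"
    MOBILITY = "mobility"
    ENVIRONMENT = "environment"


@dataclass
class DataEnvironment:
    """Hypothetical record describing one data environment."""
    name: str
    sources: set            # Source categories feeding raw data in
    applications: set       # ApplicationArea categories supported
    aggregation_levels: list = field(default_factory=lambda: ["raw"])
    # Free-form notes on privacy, intellectual property, storage, standards
    structure_notes: dict = field(default_factory=dict)

    def is_multi_source(self) -> bool:
        """True when more than one source category contributes data."""
        return len(self.sources) > 1


# Example values are invented for illustration only.
env = DataEnvironment(
    name="example environment",
    sources={Source.VEHICLE, Source.INFRASTRUCTURE},
    applications={ApplicationArea.MOBILITY},
    aggregation_levels=["raw", "5-minute"],
    structure_notes={"privacy": "vehicle identifiers removed"},
)
print(env.is_multi_source())
```

Writing the attributes down in this explicit form is one way to produce the clear and documented understanding that defining a data environment requires.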
Sources and Uses
Imagine that we slice the data environment along the longitudinal “y” axis and open up the sphere like a book to examine the related issues of sources and uses (Figure 2). Our data environment is now broken into two hemispheres representing these two issues. On the flat interior face of one hemisphere we can identify the primary sources of all data available within the data environment: traveler, vehicle, and infrastructure. Traveler data may include a multi-modal trace of an individual traveler through the transportation system, as well as traveler behavior data describing the motivations, decisions, and outcomes of travel. Mobile devices may play a key role in obtaining this traveler-focused data, particularly as travelers make mode and route choices on a complex multi-modal trip. Light vehicles may report location and speed within the system, as well as internal vehicle status data such as fuel consumption rate or externally measured data such as recorded external temperature. Roadway and rail transit vehicles may contribute similar location, speed and status data as well as passenger counts and schedule adherence data. Roadway freight carriers, rail cars transporting freight, or cargo containers might supplement a standard location and position report with gross weight data or data regarding the type and time-critical nature of goods carried. Public sector fleet vehicles may be able to contribute other key data related to their primary functions, e.g., snowplows reporting blade position or estimates of roadway snow depth. Infrastructure sensors describe facility performance conditions as well as the state of system controls. This includes sensors that measure the speed and volume of vehicles that pass a particular location on the infrastructure.
Figure 2. Data Sources and Supported Applications within a Data Environment
On the interior face of the other hemisphere we can examine the applications that the data environment has been designed to support: safety, mobility or environmental. In this illustration, these considerations are shown as general areas, but this aspect of the data environment is best understood as specific data requirements for particular applications. For example, a data environment might be envisioned to support a multi-modal predictive traveler information application that would require a combination of traffic count data from infrastructure sensors, travel time information from vehicles acting as probes, transit schedule adherence and passenger count data, and data describing route choice decisions from travelers. These requirements may have other attributes beyond just data type and sources – e.g., frequency of update, quality, or quantity. This specific data environment may also turn out to be close to what is needed to support a particular speed harmonization application for improved freeway safety under congested conditions. In this case, the Real-Time Data Capture and Management Program might propose a single data environment be created to support the development of both the predictive traveler information and speed harmonization applications.
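The idea of one data environment serving two applications with similar needs can be illustrated by comparing requirement sets. In the sketch below, the traveler information requirements follow the example in the text, while the speed harmonization requirements and the function names are hypothetical placeholders chosen only to make the comparison concrete.

```python
# Requirements for the predictive traveler information example in the text.
predictive_traveler_info = {
    "infrastructure traffic counts",
    "probe vehicle travel times",
    "transit schedule adherence",
    "transit passenger counts",
    "traveler route choice decisions",
}

# Hypothetical requirements: the text says only that these needs may be "close".
speed_harmonization = {
    "infrastructure traffic counts",
    "probe vehicle travel times",
    "infrastructure spot speeds",
}


def combined_requirements(*applications):
    """Union of data requirements a shared data environment would have to meet."""
    combined = set()
    for requirements in applications:
        combined |= requirements
    return combined


def overlap_fraction(a, b):
    """Share of the combined requirements that both applications need."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


shared_env = combined_requirements(predictive_traveler_info, speed_harmonization)
print(sorted(shared_env))
print(round(overlap_fraction(predictive_traveler_info, speed_harmonization), 2))
```

A real comparison would also need to account for the other attributes mentioned above, such as update frequency and quality, rather than data type alone.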
Structure and Aggregation
Now imagine that we examine the interior of the data environment by splitting it open like a clam along the equator of the sphere. We again create two hemispheres, this time related to the issues of data aggregation and data structure (Figure 3). On the interior face of the top hemisphere we consider whether data will be stored only in a pure raw state or at some level of aggregation (or both). In general, the inclination of the Real-Time Data Capture and Management Program is to store and provide data in both a raw state and in various states of aggregation in order to serve the needs of different potential data users.
Figure 3. Data Aggregation and Structure within a Data Environment
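Keeping data in both raw and aggregated states might look like the following minimal sketch, which rolls raw speed reports up into fixed time intervals while leaving the raw records untouched. The record layout `(timestamp_s, segment_id, speed)`, the 300-second interval, and the function name are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_speeds(records, interval_s=300):
    """Group raw (timestamp_s, segment_id, speed) reports into fixed intervals.

    Returns {(segment_id, interval_start_s): {"mean_speed": ..., "count": ...}}.
    The input records are not modified, so the raw state remains available.
    """
    buckets = defaultdict(list)
    for timestamp_s, segment_id, speed in records:
        interval_start = int(timestamp_s // interval_s) * interval_s
        buckets[(segment_id, interval_start)].append(speed)
    return {
        key: {"mean_speed": mean(speeds), "count": len(speeds)}
        for key, speeds in buckets.items()
    }


# Invented sample reports: two segments, timestamps in seconds.
raw = [(0, "A", 50.0), (120, "A", 60.0), (310, "A", 40.0), (30, "B", 30.0)]
aggregated = aggregate_speeds(raw)
print(aggregated[("A", 0)])
```

Storing the report count alongside each mean lets a data user judge how much raw evidence sits behind an aggregated value.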
On the interior face of the southern hemisphere a hexagonal grid illustrates how the data has been structured with respect to other key considerations for the data environment. These include intellectual property issues, privacy concerns, how the data will be physically stored and accessed, and how the data conform to standards, regulation, and quality requirements. Quality may be described in terms of accuracy, reliability, frequency and/or other data attributes. Note that all data elements within the environment need not conform to a single uniform policy with respect to each issue. For example, access to individual vehicle drive train status data or hazardous cargo location may be restricted while vehicle speed and location data may have fewer restrictions. Some data may be more easily shared in aggregate form rather than in raw form. One of the challenges in creating a viable and useful data environment will be managing the benefits of broadly sharing data while still protecting intellectual property rights and privacy. The Real-Time Data Capture and Management Program will adapt concepts and proven processes from open source software development and collaborative research practices to address this challenge.
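The point that data elements need not share one uniform policy can be sketched as a per-element lookup that distinguishes raw from aggregate access. The element names follow the examples in the text, but the specific policy values, the `POLICY` table, and the `can_access` function are hypothetical.

```python
# Hypothetical per-element access rules; values are illustrative, not program policy.
POLICY = {
    "vehicle_speed": {"raw": "open", "aggregate": "open"},
    "vehicle_location": {"raw": "open", "aggregate": "open"},
    "drive_train_status": {"raw": "restricted", "aggregate": "open"},
    "hazardous_cargo_location": {"raw": "restricted", "aggregate": "restricted"},
}


def can_access(element, form, user_is_authorized):
    """Return True if a user may read `element` in the given form.

    `form` is "raw" or "aggregate". Unknown elements or forms raise KeyError
    so that missing policy entries are noticed rather than silently allowed.
    """
    rule = POLICY[element][form]
    return rule == "open" or user_is_authorized


print(can_access("drive_train_status", "raw", user_is_authorized=False))
print(can_access("drive_train_status", "aggregate", user_is_authorized=False))
```

In this sketch, drive train status is shareable with any user only in aggregate form, mirroring the observation that some data may be more easily shared in aggregate than in raw form.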
Note that the data environment concept does not necessarily imply a single centralized federal repository. If the rules for participation within a data environment are clear, it is possible that disparate data sources can be housed in a distributed form and integrated on-the-fly. Data may stream into the virtual data environment from private or public sources, be integrated by private or public sector entities, and then applied by the same or different array of private and public sector entities. Clearly, a reliance on well-formed standards and strong coordination with the USDOT standards program will be required to realize this notion of on-the-fly data integration.
Figure 4. Sharing Framework for a Data Environment
Access and Sharing
A final set of key considerations addresses how the data environment will be accessed, shared, and managed. These considerations are illustrated as rings bracing the data environment (Figure 4) and include:
- meta-data, a high-level description of the data environment, what data types it contains, and the general conditions under which data were captured;
- virtual warehousing, web-based tools or other mechanisms to support ready access to the data environment and a forum for collaboration;
- history/context; the organizational context under which the data were assembled, including the objectives and intent of sharing data; and
- governance, the rules under which the data environment can be accessed by data contributors or content users, and procedures for resolving disputes
Given the objective of the program to leverage the investment in data capture and management by sharing, prospective partners must be able to clearly understand not only what data are available within the environment, but also what the rules of engagement are with respect to their use of data drawn from the environment. Further, participants supplying data elements into a data environment should clearly understand what that participation implies with respect to the sharing and re-use of contributed data.
Figure 5. Evolution of Data Environments
Current and End-State Data Environments
It is clear that the Real-Time Data Capture and Management Program does not begin from a zero-data state. Current technologies and archiving platforms already provide researchers and system operators with organized and sizeable data environments. However, the current state can be characterized as being highly dependent on passive infrastructure-based sensors. Further, many of these data resources are single-source. For example, there may be one database for freeway loop data, another for arterial occupancy data, and a third for periodic “floating car” travel time runs. On the left side of Figure 5, a current state-of-the-art integrated operational data environment is illustrated considering typical data sources. Traveler data are limited to behavioral data collected from small samples of infrequently conducted surveys. Location and speed data from a small sample of vehicles are obtained from one of several competing probe technologies. Some transit vehicles report passenger counts. Speed and traffic count data are available from infrastructure-based sensors deployed on selected high-volume freeway segments and some key arterial locations. The relative contribution of all possible travelers, vehicles, and facility segments contributing to the data environment in a geographic region is illustrated for each source through a partial shading of each of the three corresponding slices of the data environment cross-section. The capability of such a data environment merging current sources to support mobility and other applications remains a relevant research question.
In addition to understanding the potential of integrating current data sources, the Real-Time Data Capture and Management Program must also consider a more forward-thinking question: What should the composition and capability of some projected high-value, end-state data environments be?
One conjecture about such a high-value end state is shown on the right side of Figure 5. Note that this is just one possible end-state we might consider. In this case, we capture and manage data from a sample of travelers, nearly all vehicles, and some key subset of the infrastructure. For such a data environment, one can conjecture various applications not possible using current data sources. A proposed data environment can encourage the development of new and potentially transformative applications. In turn, data needs associated with new applications can motivate the creation or evolution of a data environment. Later in this section, a sample interaction between data environments and supported applications is presented.
Although a thoughtful consideration of the potential of projected end-state environments to support transformative applications is needed, the identification and testing of promising data environments (representing interim steps between the current and desired end-state) is an equally important and practical near-term goal. The Real-Time Data Capture and Management Program enables the field testing and trial deployment of promising candidate data environments. The nature and benefits of these candidate test environments, when examined and tested, will shed light on the promise and desirability of potential end-states.
Data Environments and Supported Applications – An Illustrative Example
To explore the potential of moving beyond the current data environment, however, the program will identify a collection of candidate data environments (center of Figure 5) specifically designed to support one or more applications drawn from mobility, safety and environmental applications. Consider the current-state data environment presented in Figure 5. Combining vehicle flow data with the variation of segment or trip-level travel times, one might conjecture developing a performance measurement application that characterizes the overall economic productivity generated by a regional transportation system. In Figure 6, this relationship is shown between “Data Environment I” and this productivity-oriented performance measurement application.
Other useful applications may also be supported by Data Environment I. For example, one might develop an application that identifies point-to-point travel times for display on variable message signs (VMS) and that implicitly accounts for the impact the provision of the VMS messages will have on future network flows and performance.
In testing of the predictive VMS application, one might find that the application has significantly improved impact if particular traveler behavior data were systematically generated and captured (e.g., a profile of decisions made at diversion points downstream of the VMS). This enhancement of Data Environment I results in the larger and more complex Data Environment II. The inclusion of behavioral data may or may not improve the Performance Measurement application, but if Data Environment II were made available, this potential value could be explored.
Now consider the end-state data environment presented in Figure 5. We might conjecture that such a data environment might support transformational notions of transportation system management. For example, one might propose operating a freeway system with dynamic optimized speed and flow targets for each lane. To meet these targets, the system manager might consider lane-level pricing strategies with discounts for automated adherence to the changing optimal speed targets tailored by vehicle weight. In order to maximize overall economic productivity from the system, pricing policy may favor high-occupancy vehicles or transit vehicles with large passenger counts. Likewise, heavy vehicles might be provided with incentives to travel on particular lanes to improve flow and productivity. Dynamically grouping vehicles by size and weight may also reduce fuel consumption by reducing individual heavy vehicle wind resistance. Since the management system is highly complex and changes dynamically, roadway users of all types (from multi-modal travelers to transit agencies to shippers) would require frequently updated, highly accurate lane pricing and target speed data tailored to individual vehicles. This form of freeway management may have significant benefits in improving mobility and system productivity without increasing roadway right of way.
Figure 6. Joint Evolution of Data Environments and Supported Applications
One key observation one might make is that these data environments logically build from one to the next. For example, we may find that Data Environment III lies on an evolutionary path building on Data Environment II – requiring only enhanced behavioral data and a natural growth of probe vehicle market penetration.
Program Phasing and Projected Outcomes
There is a clear federal role in leading a data capture and management program to facilitate cost-effective and coordinated research on a variety of data-driven ITS applications of national significance. These applications will require current, accurate, objective, consistent, and standardized data sets, gathered in a manner that respects the privacy of individuals and the intellectual property rights of data providers. Success in this area requires sharing of data among a variety of stakeholders in the public and private sectors. U.S. DOT is uniquely qualified to act as a convener of these stakeholders to facilitate the establishment of data-sharing environments.
Relationship to Other Connected Vehicle and ITS Program Areas
Connected Vehicle Applications: The Real-Time Data Capture and Management Program will coordinate with other U.S. DOT initiatives to identify promising data environments that have the potential to support applications. Enhanced or transformative applications will be identified within the Dynamic Mobility Applications (DMA) and Electronic Payment programs as well as in efforts identified within the Safety and Environmental areas. The data capture and applications efforts will be best served by an open and ongoing collaboration refining the data environments established in the Real-Time Data Capture Program as the data needs of applications are also developed and refined.
Other Connected Vehicle Program Areas: The Real-Time Data Capture and Management Program will have strong connections with other connected vehicle program activities. For example, the Michigan Test Bed is likely to be a key generator of data captured and managed by the Real-Time Data Capture and Management Program. These data will be of particular value in candidate data environment assessment. The Systems Engineering effort will update the connected vehicle system architecture to accommodate new communications technologies and a range of in-vehicle devices. These enhancements will accelerate growth in the number of connected probe vehicles contributing to data environments developed under the Real-Time Data Capture and Management Program.
Road Weather Management: The FHWA Road Weather Management Program has several initiatives demonstrating successful weather-related data capture and management practices. The Clarus initiative is a six-year effort to develop and demonstrate an integrated surface transportation weather observation data management system. These ongoing efforts, along with upcoming planned initiatives, can be utilized to integrate weather data into prospective data environments.
NGSIM: The Next Generation Simulation (NGSIM) program has successfully supported sharing of detailed data sets tracing the tactical movement of all vehicles observed traversing freeway bottlenecks and congested arterial networks. The effort was initiated nearly a decade ago, and its lessons learned in the early engagement of stakeholders and the development of a high-quality, well-documented data archive may prove useful to the Real-Time Data Capture and Management Program.
ICM: The USDOT will be supporting one or more deployments of Integrated Corridor Management systems as a part of the demonstration phase of the ICM program. Participating sites may have a role in the provision of test data sets.
Data.gov: The purpose of the Data.gov effort is to increase public access to high-value, machine-readable datasets generated by the executive branch of the federal government. Data environments established in the data capture program may be suitable to be included in the Data.gov resource. Data.gov can serve as a useful portal for academic researchers and private sector partners interested in developing mobility applications.
Figure 7. Data Capture and Management High-Level Program Phasing Plan.
A high-level plan for the phasing of the Real-Time Data Capture and Management Program is shown in Figure 7. This figure shows the three primary phases of the program over time along the x-axis and program activity tracks along the y-axis. Of note is the fact that although the three phases have some overlap, the entire program is expected to conclude within five years.
The program begins with a Foundational Analysis phase lasting 18 months. In this phase, the program will consider the question: What data environments do we need to support a connected transportation system? The Real-Time Data Capture and Management Program will assemble and analyze data needs from candidate applications identified in multiple ITS program areas, drawing from Mobility Applications and other strategic initiatives. The number, type and attributes of candidate data environments and their supported applications will be identified within the Real-Time Data Capture and Management Program and vetted through stakeholder interaction in a series of workshops. Prototype data environments and test data sets will be released to provide tangible and broadly accessible examples of the core data environment concept. Data sets will also be available to support applications research and development. These prototypes and test data sets will also serve as precursors to resolve potentially thorny problems across a range of cross-cutting issues such as hosting, aggregation, access, and intellectual property rights. The test data sets will elicit feedback on data needs from early mobility applications development. This phase includes the tasks necessary to further refine program activities and to develop a concept definition, a set of scenario-based concepts of operations, and high-level requirements. This includes an assessment of the state-of-the-practice and the state-of-the-art with respect to data capture and data management.
Phase 2 (Research, Development and Testing) begins with a refinement of the program plan based on the outputs and outcomes of the Foundational Analysis phase. Prior to launching into any field testing, there is a decision point where the program must justify that the identified data environments and their supported applications are both relevant to the broader ITS program and that substantive research can be feasibly conducted within the phase. In Figure 7, these two elements of the Phase 2 decision gate are shown as two critical questions:
- “Is the program well-defined and connected to the ITS Program?”
- “Is there substantive research to be conducted in a proof-of-concept test?”
If these questions can be satisfactorily answered, then the primary concern of Phase 2 will be: Can these promising data environments be realized? Testing of promising data environments begins with the technical feasibility as demonstrated by the provision of data from proof-of-concept field tests. These field tests may be supplemented by analytical methods to estimate what fraction of traveler or vehicle participation and/or infrastructure sensor deployment density is required to realize desired impacts from supported applications. Cross-cutting assessment of how the data environments are structured and administered will also continue in this phase. This phase includes activities to fill technology gaps and to address the needs identified in the Foundational Analysis phase. For example, research may be needed to develop better data fusion techniques for data from multiple sources or to develop automated data scrubbing techniques to ensure data quality. Development of data definitions, data frameworks, institutional arrangements, and data collection, storage, and dissemination techniques will be necessary. It is expected that data environments developed in this phase will permit re-use by researchers and operations staff either as a static archive of stored data or as a dynamic repository of current transportation conditions. The data environments realized in this phase will be critical resources to applications development and testing. As shown in Figure 6, multiple applications from different strategic initiatives (e.g., mobility, safety, or environment) may be supported by a single data environment.
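As one illustration of the kind of analytical method that could estimate the required fraction of vehicle participation, the sketch below finds the smallest probe penetration rate that yields a minimum number of probe reports on a segment per interval with a given probability. It assumes, purely for illustration, that the number of probe vehicles in an interval is Poisson-distributed with mean equal to the vehicle count times the penetration rate; the function names and parameters are hypothetical and do not represent the program's methodology.

```python
import math


def prob_at_least(k, lam):
    """P(N >= k) for a Poisson-distributed count N with mean lam."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


def required_penetration(flow_vph, interval_min, min_reports, confidence, tol=1e-6):
    """Smallest probe fraction giving at least `min_reports` probe vehicles
    per interval with probability `confidence`, under the Poisson assumption.

    Returns None when even full participation cannot meet the target.
    """
    vehicles_per_interval = flow_vph * interval_min / 60.0
    if prob_at_least(min_reports, vehicles_per_interval) < confidence:
        return None
    lo, hi = 0.0, 1.0
    # Bisection works because the probability rises with the penetration rate.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if prob_at_least(min_reports, vehicles_per_interval * mid) >= confidence:
            hi = mid
        else:
            lo = mid
    return hi


# Invented scenario: 1,200 vehicles/hour, 5-minute intervals, at least one report.
fraction = required_penetration(flow_vph=1200, interval_min=5,
                                min_reports=1, confidence=0.95)
print(round(fraction, 4))
```

Under these assumptions the answer depends strongly on traffic volume, so low-volume facilities may need much higher participation, or supplemental infrastructure sensors, to reach the same reporting reliability.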
If the results of the Phase 2 effort are encouraging enough to spark interest from deployment partners (drawn from both public and private sector) then the most promising data environments and their supported applications will be considered for pilot deployment in Phase 3. Figure 7 shows the Phase 3 decision gate dependent on the following question:
- “Do the results from the POC test(s) support the need to conduct pilot deployments?”
In Phase 3, the program will be concerned with the question: Can these data environments be demonstrated operationally? This phase will demonstrate that the collection, storage, and dissemination of real transportation data can occur in an operational environment. The standards, procedures, protocols, and tools developed in Phase 2 will be further tested and refined. This phase will support specific applications developed in other connected vehicle program areas and will demonstrate both inter-state and regional data sharing. The ultimate goal is to deploy and demonstrate data environments that will remain operational beyond the life of the program. Lessons learned from Phase 2 will be critical to the success of these demonstrations. Phase 3 also includes an evaluation of the performance of the data environments and the impact of the supported applications. For example, did the data environment successfully support the targeted applications? Was the promise of the data environment identified in testing realized in operational deployment? Was the impact of the supported applications consistent with estimates made in Phase 2?
Summary of Projected Outcomes
A successful Real-Time Data Capture and Management Program will be characterized by a number of outcomes. Unlike the applications that depend on the data environments developed in the Data Capture program, the data capture program itself does not directly improve mobility or safety, or reduce environmental impacts. However, the success of the cross-cutting data capture program can be characterized by the establishment and use of the data environments created by the program.
- Establishment of One or More Multi-Source Data Environments. A successful Real-Time Data Capture and Management Program will create multiple data environments. Data will be captured and managed through a data warehouse or distributed network. Large-scale data sets derived from connected vehicles and devices and other sources are available for research and for the development and testing of safety, mobility, and environment applications.
- Broad Collaboration Surrounding Data Environment Utilization. Data environments of recognized value are being actively used by internal (USDOT) stakeholders and researchers as well as external stakeholders and researchers. The provision of well-organized data and clear rules about participation has attracted a broad range of users and contributors for the supported data environments. These stakeholders represent a robust and diverse collection of private sector, academic, and public sector entities. Data are captured and managed consistently according to the governance established for each data environment. In addition, data accessed from the data environments are used appropriately according to established rules of engagement. The community of stakeholders around each data environment is motivated to self-police issues related to data access, quality, integrity, and utilization.
- Implementation of Data Management Processes Representing Best Practices. Data are collected and shared using national and international standard message sets and interfaces such as SAE J2735 and relevant NTCIP standards. Multi-state and regional demonstrations have been conducted to motivate and test the use of these standards. Unambiguous metrics and a systematic methodology for measuring data accuracy and reliability have been established and adopted by stakeholders. Public sector transportation managers and their representatives are actively engaged in the appropriate data standards development processes.
- Successful Testing and Pilot Deployment Results in Enhanced Operational Practices. At least one multi-state and one regional demonstration integrating data from multiple sources have been successfully conducted. Deployment partners drawn from both the public and private sectors have been cultivated early in the program and are stakeholders in both testing and evaluation. These partners are motivated to participate in a pilot data environment deployment to enable new mobility, safety, and environmental applications. The data capture and management processes are successful enough that deployment partners choose to integrate them into ongoing operational practice beyond the duration of the Real-Time Data Capture and Management Program.
Key near-term steps in the development of the Real-Time Data Capture and Management Program include the continued refinement of the core concepts of the program. This document is one part of a broader vision that will include a program charter and other materials that relate the objectives of the program as well as define roles and responsibilities for prospective program partners and stakeholders. Further, this vision must be paired with a practical set of logically connected projects and program activities that provide a path to realizing the goals of the program.
To be successful, the Real-Time Data Capture and Management Program requires a high degree of coordination with other ITS programs. This is not a one-time engagement but the beginning of an ongoing collaboration to refine data needs and structure relevant and feasible data environment development efforts. The success of the program also requires active interaction with stakeholders outside of the portfolio of federal research and development efforts. To this end, the program must take advantage of opportunities in workshops or other venues to engage these stakeholders and motivate their participation and collaboration.
Within Phase 1, the Real-Time Data Capture and Management Program will also assess the feasibility of providing test data sets and setting up prototype data environments. These activities will have great practical benefit in surfacing and addressing cross-cutting institutional issues such as intellectual property and the use of standards. They will also help to demonstrate tangibly to stakeholders both inside and outside of USDOT how the core data environment concept can be realized and used to cost-effectively advance multiple research and development efforts.