|
1
|
- Mobility Program Summer Webinar Series
- Mohammed Yousuf
- FHWA Office of Operations R&D
- August 17, 2011
|
|
2
|
- Brief introduction to the ITS Research and Mobility Program
- Purpose of the project
- Issues and Innovations
- Fundamental challenges and best practices
- Recommendations on technologies and methods with the most promise for
data capture and management in a multi-source data environment
- Next steps
- Getting involved
- Discussion
|
|
3
|
- To improve safety, mobility, and the environment
- Research into technologies and applications that use wireless
  communications to provide connectivity:
  - Among vehicles of all types
  - Between vehicles and roadway infrastructure
  - Among vehicles, infrastructure, and wireless consumer devices
- FCC-allocated spectrum at 5.9 GHz for transportation safety (known as
  DSRC)
|
|
7
|
- Assess industry best practices in data capture and management methods
and technologies that are applicable to the DCM Program
- Industries: Aviation; Freight Logistics; Internet Search Engines; Rail
Transit; Transportation Systems Management
- Focus areas: quality assurance; access, security, and privacy; storage
and backup; operations and maintenance; critical failures
- Identify emerging concepts and technologies that might address issues
  related to the new paradigm for data capture and management
- Industries: Information Technology; Aviation; Freight and Transit;
Government; Defense; Smart Home and Infrastructure Monitoring; Banking,
Finance, and E-Commerce; Health Care and Bioinformatics
- Recommend methods that have the most value for capturing and
managing/reporting data in a multi-source data environment
|
|
10
|
- Issue: A potential data explosion due to new forms of data will likely
  overburden computational and communication systems
  - Connected vehicles, infrastructure, and travelers generate large
    volumes of data
  - Approximately 1.2 MB of accelerometer data is generated per vehicle per
    mile (source: Cooperative Transportation Systems Pooled Fund Study on
    Pavement Assessment by Auburn University)
  - This translates to roughly 2 TB of data per day just for pavement
    assessment in Washington, DC (see the back-of-envelope sketch below)
  - Capturing, transmitting, cleaning, and storing such volumes can
    overburden the system and be cost-prohibitive
- Innovation: Dynamic Interrogative Data Capture (DIDC)
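As a rough cross-check of those figures, a back-of-envelope sketch in
Python; the daily vehicle-miles-traveled value is an assumed illustrative
input, not a number from this presentation:

```python
# Back-of-envelope estimate of raw accelerometer data volume.
# The 1.2 MB/vehicle-mile rate comes from the slide above; the daily
# vehicle-miles-traveled (VMT) figure is an assumed, illustrative input.
MB_PER_VEHICLE_MILE = 1.2
ASSUMED_DAILY_VMT = 1_700_000  # vehicle-miles/day (hypothetical, DC-scale)

daily_mb = MB_PER_VEHICLE_MILE * ASSUMED_DAILY_VMT
daily_tb = daily_mb / 1_000_000  # decimal units: 1 TB = 1,000,000 MB
print(f"~{daily_tb:.1f} TB/day")  # ~2.0 TB/day, consistent with the slide
```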
|
|
19
|
- Increased efficiency
- Identify critical data elements
- Collect, clean, transmit, analyze, and store only the required amount
of data
- Energy and cost savings
- Increased availability of critical data sets
|
|
20
|
- Identify critical data elements to query
  - Every data element that is stored but never used still incurs capture,
    cleaning, transmission, and storage costs
- Determine when to query
  - If a data element is deemed unimportant and not captured, it will not
    be available for potential future applications
- Intelligence needs to be built into each device
  - Do the bandwidth cost savings outweigh the cost of putting intelligence
    into each device? (A minimal DIDC sketch follows this list.)
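A minimal sketch of the DIDC idea as described here; all class names,
fields, and values below are hypothetical, not part of any FHWA design:

```python
# Hypothetical sketch of Dynamic Interrogative Data Capture (DIDC):
# the back office issues queries for specific data elements, and each
# device transmits only what is requested instead of streaming everything.

class DidcDevice:
    def __init__(self):
        self.buffer = []                 # recent readings held locally
        self.active_queries = set()      # data elements currently requested

    def receive_query(self, element):
        """Back office dynamically asks for a data element (e.g., 'speed')."""
        self.active_queries.add(element)

    def record(self, reading):
        """Buffer every reading, but transmit only queried elements."""
        self.buffer.append(reading)
        return {k: v for k, v in reading.items() if k in self.active_queries}

device = DidcDevice()
device.receive_query("speed")            # only speed is of current interest
sent = device.record({"speed": 54.2, "accel_z": 0.31, "heading": 88})
print(sent)                              # {'speed': 54.2} -- accel stays local
```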
|
|
21
|
- Issue: Envisioned transformative applications require new forms of
  real-time and archived data that are extremely costly to obtain, or that
  create possible privacy conflicts if required from all vehicles or
  travelers
- Innovation: Crowdsourcing
|
|
23
|
- Examples:
  - Inrix: provides traffic information using crowdsourced traffic data,
    traditional sensor data, and other relevant data (e.g., incidents,
    weather, construction, special events)
    - Crowdsources data from 3 million GPS-enabled vehicles and devices
      covering 450,000 miles of roadways
  - Waze: provides 100% crowdsourced, free real-time traffic information on
    mobile devices
    - Crowdsources data from volunteers' GPS-enabled vehicles for real-time
      traffic information and maps (passive participation is sufficient;
      see the aggregation sketch after this list)
    - Crowdsources data for map correction (requires active participation)
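To make the passive-participation model concrete, a toy aggregation
sketch; this is not Inrix's or Waze's actual algorithm, and the segment
IDs and speeds are made up:

```python
# Hypothetical aggregation of crowdsourced GPS probe reports into
# per-road-segment average speeds (not any vendor's actual method).
from collections import defaultdict
from statistics import mean

# Each report: (road_segment_id, speed_mph) from an anonymous probe vehicle.
reports = [("I-395_N_mm3", 22.0), ("I-395_N_mm3", 18.5),
           ("US-50_E_mm12", 54.0), ("I-395_N_mm3", 25.0)]

by_segment = defaultdict(list)
for segment, speed in reports:
    by_segment[segment].append(speed)

for segment, speeds in by_segment.items():
    print(f"{segment}: {mean(speeds):.1f} mph from {len(speeds)} probes")
```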
|
|
25
|
- No control over crowds
  - Some problems may not get solved within the time frame of interest, or,
    in some instances, may not get solved at all
- Little control over the quality of the crowdsourced product (data)
- Perception of privacy intrusion
  - Can hinder participation in crowdsourced projects
- Expectation of in-kind compensation for participation
  - Possible in-kind compensation: recognition, transparency
  - Crowdsourced data is almost always given back to users at no cost
|
|
27
|
- What is Virtual Data Warehousing?
  - The functional, virtual equivalent of a conventional data warehouse
    (e.g., CPU time, storage space, operating systems, database)
  - Allows data to be integrated dynamically from heterogeneous data
    sources housed in different locations
  - Allows rapid sharing of large amounts of data
  - Minimizes data integrity issues
  - Requires less time and expense to develop
- Promising innovations in VDW technology:
  - Cloud computing
  - Data federation
|
|
31
|
- Data security
  - Use encryption to protect against snooping while data is in transit
    (see the sketch after this list)
  - Use intrusion detection and prevention mechanisms
  - Be aware of the service provider's security policies
- Reliability and availability
  - Perform periodic off-line data backups
  - Google successfully used tape backups to recover data inadvertently
    deleted during a software rollout in February 2011
- Data transfer bottlenecks
  - A private cloud physically close to the customer can reduce the
    problem, although at high cost
- Legal compliance
  - Use service providers with strong security controls
- Data consistency
  - In practice, users perceive eventual consistency as strong consistency
  - The Google Apps platform and Amazon's S3 (Simple Storage Service),
    SimpleDB, and EC2 (Elastic Compute Cloud) are successful implementers
    of eventual consistency
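As an illustration of the first mitigation (encrypting data before it
leaves your control), a minimal sketch using the Python `cryptography`
package; key management is out of scope and the payload is hypothetical:

```python
# Minimal sketch: encrypt data client-side before sending it to a cloud
# service, so it is protected against snooping in transit and at rest.
# Requires: pip install cryptography. Key management is omitted here.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice, store/manage this securely
cipher = Fernet(key)

payload = b"probe_id=anon-42,segment=I-395_N_mm3,speed=22.0"
token = cipher.encrypt(payload)    # ciphertext safe to transmit/store
assert cipher.decrypt(token) == payload
```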
|
|
33
|
- Transparency of underlying heterogeneity
  - The consumer sees a single uniform interface (see the sketch after this
    list)
  - The consumer doesn't need to know where the data is stored or how it is
    stored
- Time-to-market advantage
  - Significantly reduces development time when multiple sources have to be
    integrated
- Reduced development and maintenance costs
  - Develop the integrated view once, and leverage it multiple times
- Integrate disparate data sources without consolidating them in a single
  location
  - No consistency issues
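A toy sketch of the "single uniform interface" idea; all class and source
names are hypothetical, and a real federation layer would wrap actual
database backends rather than in-memory lists:

```python
# Hypothetical data-federation sketch: one query interface over
# heterogeneous sources, each wrapped by an adapter. The consumer never
# sees where or how the underlying data is stored.

class SourceAdapter:
    """Wraps one heterogeneous source behind a common query method."""
    def __init__(self, name, records):
        self.name, self.records = name, records  # stand-in for a real backend

    def query(self, segment):
        return [r for r in self.records if r["segment"] == segment]

class FederatedView:
    """Single uniform interface; merges results from all sources on demand."""
    def __init__(self, adapters):
        self.adapters = adapters

    def query(self, segment):
        return [row for a in self.adapters for row in a.query(segment)]

view = FederatedView([
    SourceAdapter("dot_sensors", [{"segment": "A1", "speed": 51}]),
    SourceAdapter("probe_feed",  [{"segment": "A1", "speed": 47}]),
])
print(view.query("A1"))  # rows from both sources, one call
```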
|
|
34
|
- Assumes data is already in storage
- Does not scale well
  - High management and maintenance effort and cost
  - Data transfer bottlenecks
- Does not address reliability and availability issues
  - No data replication
  - Be aware of storage and backup practices at the original location
- Data security
  - Protect against snooping during data transit (e.g., use encryption)
  - Be aware of security procedures at the original location
|
|
35
|
- Quality Assurance
- Access, Security, Privacy
- Storage and Backups
- Operations and Maintenance
- Critical Failures
|
|
36
|
- Collect redundant data from multiple sensors
  - Data can be combined so that false positives are filtered out (see the
    voting sketch after this list)
- Use standard industry reference files when possible
- Reduces erroneous information
- Data quality is highly industry dependent
- Choose between high-output, real-time data and scrubbed, pseudo-real-time
  data
  - The greater the overall veracity required, the more process-intensive
    it is to enforce that veracity, and the higher the delay in
    disseminating the data
  - Typically: fast, general data-quality analysis for real-time systems,
    and thoroughly scrubbed and sanitized data for historical and
    post-real-time analysis and display
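A minimal sketch of the redundant-sensor idea in the first bullet; the
voting threshold and sensor names are hypothetical:

```python
# Hypothetical k-of-n voting across redundant sensors: an event is
# reported only if a majority of sensors detect it, filtering out
# single-sensor false positives.
def majority_vote(detections, threshold=0.5):
    """detections: list of booleans, one per independent sensor."""
    return sum(detections) / len(detections) > threshold

loop_a, loop_b, probe = True, False, False     # one sensor false-positives
print(majority_vote([loop_a, loop_b, probe]))  # False -- filtered out
```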
|
|
37
|
- Access is most often controlled by the holder of the data
  - Systems are designed so that the right people have access only to the
    data they need
  - Access to the data is usually password-protected
- The source of the data is highly protected
  - Within the internet search industry, it is so highly protected that
    there is no concrete evidence of exactly how it is protected
- Access and security controls on the ability to search are essentially
  absent, since the ability to perform a search is of paramount importance
  in the internet search industry
|
|
38
|
- Determine what data needs to be stored and for how long
  - Once the system is out of test mode, is there any need to retain all of
    that information?
  - Most industries do not have a hard rule on how this should be done
  - In the aviation industry, data are typically kept for only a brief
    period before being discarded; if an incident occurs, the data are
    spooled off for review before being destroyed
- Frequent backups and off-site storage are typical
  - Google (search engines) performs real-time streaming backups across
    multiple data centers to ensure that searches are always available
- Perform preventative maintenance regularly
- Consider allowing a third party to handle data storage needs
  - The cost of keeping all traffic data may be prohibitive for the
    government, but profitable for a third party
|
|
39
|
- Start small, with an implementation that addresses the most critical
  needs, defined either geographically or by category of information
  - Focus first on known critical data elements, to ensure that you have
    the capacity, ability, and availability for those items before
    expanding
  - Take a lesson from the search engine industry: great systems are built
    slowly over time from small, hardened implementations
  - Ensure that core competencies are covered before trying to do
    everything
- Build for scalability
  - Avoid the situation where a system performs very well in a test setup
    but does not scale well in the real world
  - Leverage technologies such as clustered databases, virtual warehousing,
    virtual servers, etc.
  - Use multiple servers to distribute load for real-time databases (see
    the sharding sketch after this list)
  - Databases can grow quite large very quickly
  - The problem is easier to solve once approximate data sizes and elements
    have been defined
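One common way to distribute load across multiple database servers is
hash-based sharding; a minimal sketch, with hypothetical server names
(real deployments typically use consistent hashing so that adding servers
moves only a fraction of keys):

```python
# Minimal hash-sharding sketch: route each record to one of several
# database servers by hashing its key.
import hashlib

SERVERS = ["db-1", "db-2", "db-3"]   # hypothetical server pool

def shard_for(key: str) -> str:
    digest = hashlib.sha256(key.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

for seg in ["I-395_N_mm3", "US-50_E_mm12", "I-66_W_mm8"]:
    print(seg, "->", shard_for(seg))
```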
|
|
40
|
- Determine what level (granularity), amount, and transmission frequency of
  data are needed
  - Data overload will negate the usefulness of the data
  - Data overload can cause critical messages to be overlooked
  - Avoid an overreaction or premature reaction based on small sample sizes
- Determine what is critical to communicate
  - In the airline industry, alert systems do not collect all continuously
    streamed data, only the data needed to alert an operator to a problem
    (see the sketch after this list)
- Make data available as soon as feasible
  - Even if more processing needs to occur in the background, providing a
    real-time feed of current data is an attractive option for users
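A toy sketch of that airline-style alerting approach; the threshold,
consecutive-reading rule, and sample values are hypothetical:

```python
# Hypothetical alert filter: the full stream is examined, but an operator
# is alerted only after N consecutive out-of-range readings, so a single
# noisy sample (small sample size) does not trigger an overreaction.
def alerts(stream, limit=100.0, consecutive=3):
    run = 0
    for i, value in enumerate(stream):
        run = run + 1 if value > limit else 0
        if run >= consecutive:
            yield i, value          # only now is the operator notified

readings = [95, 104, 96, 103, 106, 108, 99]  # one spike, then a real trend
print(list(alerts(readings)))  # [(5, 108)] -- sustained breach only
```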
|
|
41
|
- Do not be dependent on a single person to rectify a critical failure
  - The 2009 flight-plan system crash was caused by the failure of a
    single, easily replaceable part; but only one technician was qualified
    to replace it, and it took over six hours for him to arrive and make
    the repair
- Systems that need to be highly available will necessitate elevated labor
  costs
  - If any failure of the system can be catastrophic, round-the-clock staff
    must be kept on hand to fix issues
|
|
42
|
- Explore the DIDC concept
  - Develop one or more prototype DIDC applications to capture data from
    mobile sources (vehicles, travelers, etc.)
- Examine crowdsourcing
  - For development of transformative mobility applications (e.g.,
    multi-modal traffic signal systems, transit signal priority, queue
    warning, speed harmonization)
  - For data collection (e.g., travel times, queues)
- Investigate cloud computing
  - Examine the strengths/benefits of cloud types (public, private, etc.)
|
|
43
|
- Assess recommendations for innovative concepts and methods, and
downselect promising ideas
- Issue solicitations for developing and testing selected innovations
|
|
45
|
- Got an interesting concept or method for data capture and management?
  - Respond to upcoming procurements for:
    - Further research and development of innovative data capture and
      management concepts
    - Building Phase 2 data environments to enable development of mobility
      applications
  - Participate in future stakeholder engagement activities (e.g., user
    needs meetings) and provide feedback on the direction of the Mobility
    Program
|