Fun with tables.

table from mysql -H html output:

InjurySeverity IncidentID IncidentDateTime Onroad CrossingFeature
5 2371350 2010-04-17 01:18:00 99th Ave Camelback Rd
5 2401688 2010-08-13 00:00:00 99th Ave Union Hills Dr
5 2416950 2010-10-31 11:03:00 99th Ave Lindgren Ave

table from myphpadmin:

IncidentID IncidentDateTime Onroad CrossingFeature
2446148 2010-09-23 06:41:00 Via Linda 117th St
2453234 2010-09-23 07:45:00 Missouri Ave 63rd Ave
2445966 2010-09-23 08:22:00 University Dr Hardy Dr
2451975 2010-09-23 11:10:00 Ina Rd La Cholla Blvd
2451005 2010-09-23 13:49:00 Colombo Ave Charleston Rd
2454504 2010-09-23 15:10:00 Glenn St Orlando Ave
2454482 2010-09-23 15:31:00 8th St Tyndall Ave
2453231 2010-09-23 15:44:00 Camelback Rd 79th Ave
2454471 2010-09-23 15:45:00 Oklahoma St 12th Ave
2448706 2010-09-23 16:13:00 3rd Ave Encanto Blvd
2448069 2010-09-23 17:00:00 Delaware Dr Apache Trl

Arizona Bicyclist and Ped Crash Map

Volunteer Needed

crashmap

URGENT: Google has announced the tool used to present the crash maps, below, called “Fusion Tables” is being discontinued Dec 3, 2019.
Azbikelaw is looking for a volunteer familiar with GIS and web-presentation to help effect transition to something else.

Please see this article for more info. Below is the historical data and procedures.

Continue reading “Arizona Bicyclist and Ped Crash Map”

ADOT Traffic Collision Database

It turns out (who knew?) that ADOT sells their crash database for a nominal sum. I purchased the 2010 version, the latest full-year available (2011 is supposed to be ready in July). This data is either similar to (or synonymous with) something referred to as the Arizona (or ADOT?) Safety Data Mart — thus the acronym asdm sprinkled throughout.

The data is delivered on a DVD (or CD) which contains three large text files; corresponding to Incident, Person, and Crash -level data. It is also accompanied with 5 photocopied pages of “Column Headings”, and about two-dozen pages of photocopied “Definitions”. [it is really strange that they would distribute this information on paper!?]. I was surprised to find out that very little of the data aligns with FARS/GES, which seems quite strange to me. [UPDATE: as of 2018 I have online access and can download the files directly so no longer need to order the DVD/CD]

Anyway, I pieced together most of the info using those photocopies and assembled it for ready-reference in a spreadsheet adsm.xls , there is one worksheet for columns (fields) and another for defintions (ENUM). I don’t have all the definitions there, some i didn’t care about and were very lenghy, like vehicle color, state abbreviations, and so forth; nearly all of that is avaible at the referece material

Reference Material

UPDATE: adot re-organized their site and now has a lot of this infor organized on one page at Arizona Crash Report Forms, Resources and Training Events including as of this writing, 2021 crash forms, manuals, and training presentations.

Below is some historical info collected back as far as 2010:

  • Background info on the ALISS database,  and related terms: AIDW (Adot Information Data Warehouse), and Safety Data Mart.
  • My spreadsheet adsm.xls; provides a unifying list of fields, and enumerations

If the document links, above, go dead; there are local copies in asdm/.

The Database

azbikelaw.org is making this data available publicly via a MySQL database accessible via the internet. Special thanks to Justin Pryzby for completing this work.

Years 2001-2003, and 2009-Newest
hostname:     mysql.azbikelaw.org
databasename: asdm
username:     asdmuser
password:     Contact us
access via myPhpAdmin currently not available

Crash Map 
See crashmap-data

The tables were loaded from raw text files into tables and then re-processed with to create 3 tables: 2010_incident, 2010_person, and 2010_unit.

Interactive (e.g. myPhpAdmin, or MySQL Query Browser) users will want to refer to views: pretty_2010_incident, pretty_2010_person, and pretty_2010_unit which substitute enumerated fields for the original data fields, and leaves out the description-only fields. For example there are 3 fields all carrying the same information: TravelDirection = 1 (int), TravelDirectionDesc = “NORTH” (text), and a synthetic field eTravelDirection=’NORTH’ (enum). The first two are not in the pretty_unit view, while the unit2 table has all three. The table unit should not be used, and is likely to be dropped to save space.

Furthermore, all database users should rely on the enumerated field, eTravelDirection in the above example, and not on the integer values, as they are liable to change in subsequent years, whereas the enumerations can be kept consistent when subsequent data from a new year is imported.

There are some additional helper tables, such as LOVCity, LOVCounty and county which are handy for looking up things like city codes (e.g. Phoenix has a CityID of 241).

Data Completeness

A word about data completeness: As explained in more detail in this comment below. The data received from ADOT represents data frozen in time mid-year (currently the cutoff date appears to be May 31 of the following year). Data does continue to trickle in, however, and for unknown reasons (why shouldn’t 6 months be enough time for PD’s to send all their corrected data from the preceding year?). ADOT initially releases a version of Crash Facts based on the May 31 data; and then subsequently re-publishes an update a few months later. My dataset is frozen in time at the May 31st (or June 1st) version; which doesn’t make me happy but tends to be statistically insignificant.

Trickling example: The number of pedalcyclist crashes in 2012: 2121 / 2134 / 2141 as of May 31 2013 / Oct 28 2013 / Late Nov 2013. I.e. 2012 crash reports continue to trickle in in Late November of the next year!

GIS issues: a widely varying (from year-to-year) number of incidents have a lat/long of zero. More about that at crashmap-data.

I have noted in a few cases (only fatals?) where the data in asdm seems to be massaged relative to the acr, see comment below about “fiddling” (also called “fudgery”) with the CollisionManner and PostedSpeed.

2010

The first dataset available is 2010, which is the most recent full year available from ADOT.

2009

I went backwards and purchased the 2009 data; I believe this is the oldest dataset available under the current schema — which kindof makes sense as the ACR form was re-vamped for 2009.

2011

2011 data became available in mid-July 2012 ( I received it in the mail from ADOT Risk Management 7/27); and is now ready for querying. This year the data was distributed all on a CD, so thankfully no photocopies of headings and definitions hanging around.

As might be supposed, the tables are named 2011_incident, 2011_person, and 2011_unit ; along with the pretty views. The table structure is identical to 2010.

2012

2012 data became available June 10, 2013, according to an email report. It once again was $15, and a CD was delivered very promptly upon mailing ADOT Risk Management a check. This year the contact was Sarah Greener, Litigation & Public Records Supervisor. The data fields were identical to last (and in fact, all previous) year. I only needed to edit the .sql’s to substitute in 2012_ instead of 2011_ or whatever.

2013

2013 became available in mid-June as usual and the Adot risk mgmt people got it promptly mailed out to me for $17 (2 charge for mail, which apparently was waived or forgotten in previous years. See also arizona-crash-facts-2013 and

One area where data quality has degraded noticeably is with the number of incidents listing a geo-location of 0,0. Many of these also have no OnRoad nor CrossingFeature. This year there are almost 3,000 such incidents; last year there were only 663; and prior years were 2,679, 2,017, and 9,048(! 2009 was a bad year for much inconsistency) respectively. I don’t really see a pattern, e.g. as to particular agencies, in other words they’re sprinkled around the state. The 3,000 include about 50 bicyclist incidents including some with incapacitating injury. I am also pretty shocked to find among the 3,000 forty fatalities and over a hundred incapacitating injuries — this just seems horribly lax.

select count(*) from 2013_incident where Latitude=0 OR Longitude=0;

2014

Everything seemed fine until discovering there’s something wrong with UnitID this year; Up until this year, UnitID (as well as PersonID and IncidentID) have all been unique, even year-over-year. This year, however, there are many (~ 30,000?) “reused” UnitIDs. It’s not clear if this is a bug, or just what… Is the data ok otherwise? who knows. What’s clear is ADOT traffic records people prefer to toil in anonymity and don’t deign to bless us (the general public) with any explanations of their work.

e.g. UnitID=4976526 appears in both 2012 and 2014. Boo. You can test for this condition by running the following (long, it takes over a minute) query; the result should be 0 rows, but it returns some 30,000!:

SELECT UnitID, count(1) c FROM (SELECT * FROM 2014_unit UNION SELECT * FROM 2013_unit UNION SELECT * FROM 2012_unit UNION SELECT * FROM 2011_unit UNION SELECT * FROM 2010_unit UNION SELECT * FROM 2009_unit) x GROUP BY 1 HAVING c>1 ORDER BY 2;

This seems rather odd; I expect to be able to use any of those IDs as a primary key, regardless of year, so that can no longer work. In any event to cover up adot’s short-comings; I’ve added XX0,000,000 where XX is the last two digits of the year to UnitID. For now, just did it to the 2014 tables…

UPDATE 2014_unit SET UnitID = UnitID + 140000000;
UPDATE 2014_person SET UnitID = UnitID + 140000000;

Also, there is an error incident=2935635, is incorrectly listed as 2 bicycle crash; it should be bike-MV (there is no such thing a crash not involving any motor vehicle).

2015

I had failed to notice at the time, but in 2014 four fields were added to the tables (this means, by inference, the Arizona Crash Form was updated, also):

  • incident table: Offset Direction , Secondary Crash Flag (The latter seems to not have a def’n?)
  • unit table: Distracted Driving and Distracted Driving Desc

These fields are added to the “end” (i.e. the far right-hand side) of the respective .csv files. As such, they can be safely ignored, as is presently done for 2014.

I would have thought this would break the “ADOTALL” tables, but it still seems to work, albeit without the new fields…

2019

Beginning 2019 there were some changes —  in the incident table, they changed how the Totals were reported in an odd way, the columns:

TotalMotoristsInjuries,TotalNonMotoristsInjuries
became:
TotalMotorists1,TotalNonMotorists1

the latter (non motorists) seems to be exactly as it was. However, for motorist injuries it’s not. The field now called TotalMotorists1 appears to be simply the number of motorists(?i haven’t verified that).
Can (i’m guessing) compute Total motorist injuries by subtracting non-motorist injuries from TotalInjuries.

All Years, 2009-2020

I found it was becoming a drag to re-run queries across year; so in Oct 2014 I created three tables, incident, person, unit that hold all contiguous year in 3 big tables. As new years come by I will add them as well. Complicated queries to these tables need to be somewhat judicious as the amount of data gets larger and larger; mysql.azbikelaw.org is hosted on my home server and is an old laptop!

There is a shell script to build all year’s tables at once, build-asdm-all.sh which is in the ADOTALL folder (analogous to ADOT20xx folders). SEE ABOVE, 2014 FOR A PROBLEM with this approach — unfortunately UnitID is no long unique, so it can’t be a primary key.

See note, just above, about year 2015 data and new fields.

This is current at least through data year 2020.

2001-2003

Justin got these discs from Adot in September 2013 (i.e. these are way out of chronological order)… in any event they still have exactly the same columns/tables layout. The only exception was the values STOP and YIELD in ControlType in the Unit table are reversed; so the definition of eControlType had to be adjusted. (so they are the reverse of 2009-2012 values).

FARS and ASDM

[warning: the table referred to below, farsxref, has fallen out of date] : As of now, FARS (federal, fatality-only data) for years 2010 and 2011 is available with full PBCAT crash typing, see 2010 FARS contains PBCAT data for more info on PBCAT. There’s a somewhat hand-crafted table called farsxref which contains an adot IncidentID, Year, and Fars identifier (the adot IncidentID is unique, for fars, the year and identifier have to be combined to uniquely identify an incident). Anyway, in the table is row for pedalcyclists only (at the moment; there’s not particular reason it shouldn’t have all fatals). Anyways, FARS data will eventually be added to the incident dump query.

Database improvements?

See comment below for a list of suggestions.

Shell Scripts

There are shell scripts to build the databases from raw text input files; build-asdm.sh build-fars.sh and there are also classifying scripts which produce sometimes interesting snapshot views of the tables; classify-asdm.sh and classify-fars.sh

See azbikelaw-gets-its-own-server (the slug is now just azbikelaw) which is password protected; actually it’s a “private” post, it’s invisible (not found) unless logged in.

Examples

In no particular order, here are some examples for routine lookups.

By Street Name(s), use OnRoad and/or CrossingFeature.  This finds incidents near Catalina Hwy and Houghton Rd.

select * from 2013_incident where (OnRoad LIKE "Catalina%" and CrossingFeature LIKE "Hough%") OR (OnRoad LIKE "Hough%" and CrossingFeature LIKE "Catalina%");
or to get a feel for dangerousness level, instead select things like:
SUM(TotalMotoristsInjuries), SUM(TotalNonMotoristsInjuries), SUM(TotalMotoristsFatalities), SUM(TotalNonMotoristsFatalities)

By Street Name and Latitude/Longitude:

This example was to find any crashes along Park Ave, in Tucson, between Park Ave and Irvington (on the North) and Valencia (to the south), where a road diet was done sometime in 2014.  Since it’s a north/south road, use Latitude to discriminate. To find Latitudes/Longitudes use a tool such as latlong.net. You can, by the way, form a google maps url if you know lat/long like so:  maps.google.com/?q=32.16334,-110.95618 . In any event, the Latitudes in the query are “opened up” a bit to catch more crashes near the begin/endpoints:

SELECT eCollisionManner, count(*)
FROM 2012_incident AS i WHERE 
(OnRoad LIKE "Park Ave%" OR CrossingFeature LIKE "Park Ave%") AND 
(Latitude > 32.13410 AND Latitude < 32.16334 ) AND
(IncidentYear > 2008 AND IncidentYear < 2013) AND
CityId = 310 GROUP BY 1;

An example of an east/west road, University Drive in Tempe, between Priest Drive and ~ Ash Ave can be found here; (which is more complicated because it only selects MV-Bike crashes, by the way).

Other States

CA: The Statewide Integrated Traffic Records System (SWITRS) is collected and maintained by the California Highway Patrol (CHP). UC Berkeley maintains an elaborate geo-coded front-end to the database they call TIMS (Traffic Injury Managment system) and great in-depth learning tools for the database itself.

OH:  Ohio Traffic Crash Reports as reported to the Ohio Department of Public Safety for the current year and the past five years. I think they are full reports(?)

FARS and PBcat

Commencing with the recently-released 2010 data FARS (The USDOT’s Fatality Analysis and Reporting System) will have far more specialized detail on Pedestrian and Bicyclists crashes.

“Motorist Failure to Yield — signed intersection” One of several dozen crash types defined by PBCAT

618 cyclists (person type 6 bicyclist, and 7 other pedalcyclist) were killed in 2010 in traffic collisions — and as noted at the link above, only collisions with motor vehicles in-transport are tracked by FARS. So for example, a bicyclist who lost control and died as a result of crashing into a tree would not be tracked here, nor would a bicyclist who strikes a parked motor vehicle. Continue reading “FARS and PBcat”