It turns out (who knew?) that ADOT sells their crash database for a nominal sum. I purchased the 2010 version, the latest full-year available (2011 is supposed to be ready in July). This data is either similar to (or synonmyous with) something referred to as the Arizona (or ADOT?) Safety Data Mart — thus the acronym asdm sprinkled throughout.
The data is delivered on a DVD which contains three large text files; corresponding to Incident, Person, and Crash -level data. It is also accompanied with 5 photocopied pages of “Column Headings”, and about two-dozen pages of photocopied “Definitions”. [it is really strange that they would distribute this information on paper!?]. I was surprised to find out that very little of the data aligns with FARS/GES, which seems quite strange to me.
Anyway, I pieced together most of the info using those photocopies and assembled it for ready-reference in a spreadsheet adsm.xls , there is one worksheet for columns (fields) and another for defintions (ENUM). I don’t have all the definitions there, some i didn’t care about and were very lenghy, like vehicle color, state abbreviations, and so forth; nearly all of that is avaible at the referece material
- ADOT Crash Facts: handy for correlating and double-checking data
- NHTSA Traffic Records Team Home page, AZ Page hosts these useful documents — if links go dead, i have an [archived copy] of most of these documents:
- AZ_Tablesfor2010_Crash_Reports.pdf, [archived copy] : this almost matches the “definitions” photocopies, mentioned above.
- Arizona Crash Report 2010 : sample/blank ACR form
- Arizona Crash Form Manual, Rev. 8/2010 , [archived copy]: detailed instructions on how the ACR is supposed to be filled out.
- Old document: Arizona Crash Data Dictionary Attributes, 2008, [archived copy] , old but the only place we’ve found LOVCity (defines CityID column). (LOV apparently means List of Values — aren’t you glad you know that now?).
- Old document: Arizona Crash Form Manual, Rev. 12/2000, really old, the only place we’ve found with list of NCIC numbers.
- Background info on the ALISS database, and related terms: AIDW (Adot Information Data Warehouse), and Safety Data Mart.
- My spreadsheet adsm.xls; provides a unifying list of fields, and enumerations
If the document links, above, go dead; there are local copies in asdm/.
azbikelaw.org is making this data available publicly via a MySQL database accessible via the internet. Special thanks to Justin Pryzby for completing this work.
Years 2001-2003, and 2009-Newest hostname: mysql.azbikelaw.org databasename: asdm username: asdmuser password: Contact us access via myPhpAdmin currently not available Crash Map See crashmap-data
The tables were loaded from raw text files into tables and then re-processed with to create 3 tables: 2010_incident, 2010_person, and 2010_unit.
Interactive (e.g. myPhpAdmin, or MySQL Query Browser) users will want to refer to views: pretty_2010_incident, pretty_2010_person, and pretty_2010_unit which substitute enumerated fields for the original data fields, and leaves out the description-only fields. For example there are 3 fields all carrying the same information: TravelDirection = 1 (int), TravelDirectionDesc = “NORTH” (text), and a synthetic field eTravelDirection=’NORTH’ (enum). The first two are not in the pretty_unit view, while the unit2 table has all three. The table unit should not be used, and is likely to be dropped to save space.
Furthermore, all database users should rely on the enumerated field, eTravelDirection in the above example, and not on the integer values, as they are liable to change in subsequent years, whereas the enumerations can be kept consistent when subsequent data from a new year is imported.
There are some additional helper tables, such as LOVCity, LOVCounty and county which are handy for looking up things like city codes (e.g. Phoenix has a CityID of 241).
A word about data completeness: As explained in more detail in this comment below. The data received from ADOT represents data frozen in time mid-year (currently the cutoff date appears to be May 31 of the following year). Data does continue to trickle in, however, and for unknown reasons (why shouldn’t 6 months be enough time for PD’s to send all their corrected data from the preceding year?). ADOT initially releases a version of Crash Facts based on the May 31 data; and then subsequently re-publishes an update a few months later. My dataset is frozen in time at the May 31st (or June 1st) version; which doesn’t make me happy but tends to be statistically insignificant.
Trickling example: The number of pedalcyclist crashes in 2012: 2121 / 2134 / 2141 as of May 31 2013 / Oct 28 2013 / Late Nov 2013. I.e. 2012 crash reports continue to trickle in in Late November of the next year!
GIS issues: a widely varying (from year-to-year) number of incidents have a lat/long of zero. More about that at crashmap-data.
I have noted in a few cases (only fatals?) where the data in asdm seems to be massaged relative to the acr, see comment below about “fiddling” (also called “fudgery”) with the CollisionManner and PostedSpeed.
The first dataset available is 2010, which is the most recent full year available from ADOT.
I went backwards and purchased the 2009 data; I believe this is the oldest dataset available under the current schema — which kindof makes sense as the ACR form was re-vamped for 2009.
2011 data became available in mid-July 2012 ( I received it in the mail from ADOT Risk Management 7/27); and is now ready for querying. This year the data was distributed all on a CD, so thankfully no photocopies of headings and definitions hanging around.
As might be supposed, the tables are named 2011_incident, 2011_person, and 2011_unit ; along with the pretty views. The table structure is identical to 2010.
2012 data became available June 10, 2013, according to an email report. It once again was $15, and a CD was delivered very promptly upon mailing ADOT Risk Management a check. This year the contact was Sarah Greener, Litigation & Public Records Supervisor. The data fields were identical to last (and in fact, all previous) year. I only needed to edit the .sql’s to substitute in 2012_ instead of 2011_ or whatever.
2013 became available in mid-June as usual and the Adot risk mgmt people got it promptly mailed out to me for $17 (2 charge for mail, which apparently was waived or forgotten in previous years. See also arizona-crash-facts-2013 and
One area where data quality has degraded noticeably is with the number of incidents listing a geo-location of 0,0. Many of these also have no OnRoad nor CrossingFeature. This year there are almost 3,000 such incidents; last year there were only 663; and prior years were 2,679, 2,017, and 9,048(! 2009 was a bad year for much inconsistency) respectively. I don’t really see a pattern, e.g. as to particular agencies, in other words they’re sprinkled around the state. The 3,000 include about 50 bicyclist incidents including some with incapacitating injury. I am also pretty shocked to find among the 3,000 forty fatalities and over a hundred incapacitating injuries — this just seems horribly lax.
select count(*) from 2013_incident where Latitude=0 OR Longitude=0;
Everything seemed fine until discovering there’s something wrong with UnitID this year; Up until this year, UnitID (as well as PersonID and IncidentID) have all been unique, even year-over-year. This year, however, there are many (~ 30,000?) “reused” UnitIDs. It’s not clear if this is a bug, or just what… Is the data ok otherwise? who knows. What’s clear is ADOT traffic records people prefer to toil in anonymity and don’t deign to bless us (the general public) with any explanations of their work.
e.g. UnitID=4976526 appears in both 2012 and 2014. Boo. You can test for this condition by running the following (long, it takes over a minute) query; the result should be 0 rows, but it returns some 30,000!:
SELECT UnitID, count(1) c FROM (SELECT * FROM 2014_unit UNION SELECT * FROM 2013_unit UNION SELECT * FROM 2012_unit UNION SELECT * FROM 2011_unit UNION SELECT * FROM 2010_unit UNION SELECT * FROM 2009_unit) x GROUP BY 1 HAVING c>1 ORDER BY 2;
This seems rather odd; I expect to be able to use any of those IDs as a primary key, regardless of year, so that can no longer work. In any event to cover up adot’s short-comings; I’ve added XX0,000,000 where XX is the last two digits of the year to UnitID. For now, just did it to the 2014 tables…
UPDATE 2014_unit SET UnitID = UnitID + 140000000; UPDATE 2014_person SET UnitID = UnitID + 140000000;
Also, there is an error incident=2935635, is incorrectly listed as 2 bicycle crash; it should be bike-MV (there is no such thing a crash not involving any motor vehicle).
I had failed to notice at the time, but in 2014 four fields were added to the tables (this means, by inference, the Arizona Crash Form was updated, also):
- incident table: Offset Direction , Secondary Crash Flag (The latter seems to not have a def’n?)
- unit table: Distracted Driving and Distracted Driving Desc
These fields are added to the “end” (i.e. the far right-hand side) of the respective .csv files. As such, they can be safely ignored, as is presently done for 2014.
I would have thought this would break the “ADOTALL” tables, but it still seems to work, albeit without the new fields…
All Years, 2009-2015
I found it was becoming a drag to re-run queries across year; so in Oct 2014 I created three tables, incident, person, unit that hold all contiguous year in 3 big tables. As new years come by I will add them as well. Complicated queries to these tables need to be somewhat judicious as the amount of data gets larger and larger; mysql.azbikelaw.org is hosted on my home server and is an old laptop!
There is a shell script to build all year’s tables at once, build-asdm-all.sh which is in the ADOTALL folder (analogous to ADOT20xx folders). SEE ABOVE, 2014 FOR A PROBLEM with this approach — unfortuantely UnitID is no long unique, so it can’t be a primary key.
See note, just above, about year 2015 data and new fields.
Justin got these discs from Adot in September 2013 (i.e. these are way out of chronological order)… in any event they still have exactly the same columns/tables layout. The only exception was the values STOP and YIELD in ControlType in the Unit table are reversed; so the definition of eControlType had to be adjusted. (so they are the reverse of 2009-2012 values).
FARS and ASDM
[warning: the table referred to below, farsxref, has fallen out of date] : As of now, FARS (federal, fatality-only data) for years 2010 and 2011 is available with full PBCAT crash typing, see 2010 FARS contains PBCAT data for more info on PBCAT. There’s a somewhat hand-crafted table called farsxref which contains an adot IncidentID, Year, and Fars identifier (the adot IncidentID is unique, for fars, the year and identifier have to be combined to uniquely identify an incident). Anyway, in the table is row for pedalcyclists only (at the moment; there’s not particular reason it shouldn’t have all fatals). Anyways, FARS data will eventually be added to the incident dump query.
See comment below for a list of suggestions.
There are shell scripts to build the databases from raw text input files; build-asdm.sh build-fars.sh and there are also classifying scripts which produce sometimes interesting snapshot views of the tables; classify-asdm.sh and classify-fars.sh
See azbikelaw-gets-its-own-server (the slug is now just azbikelaw) which is password protected; actually it’s a “private” post, it’s invisible (not found) unless logged in.
In no particular order, here are some examples for routine lookups.
By Street Name(s), use OnRoad and/or CrossingFeature. This finds incidents near Catalina Hwy and Houghton Rd.
select * from 2013_incident where (OnRoad LIKE "Catalina%" and CrossingFeature LIKE "Hough%") OR (OnRoad LIKE "Hough%" and CrossingFeature LIKE "Catalina%"); or to get a feel for dangerousness level, instead select things like: SUM(TotalMotoristsInjuries), SUM(TotalNonMotoristsInjuries), SUM(TotalMotoristsFatalities), SUM(TotalNonMotoristsFatalities)
By Street Name and Latitude/Longitude:
This example was to find any crashes along Park Ave, in Tucson, between Park Ave and Irvington (on the North) and Valencia (to the south), where a road diet was done sometime in 2014. Since it’s a north/south road, use Latitude to discriminate. To find Latitudes/Longitudes use a tool such as latlong.net. You can, by the way, form a google maps url if you know lat/long like so: maps.google.com/?q=32.16334,-110.95618 . In any event, the Latitudes in the query are “opened up” a bit to catch more crashes near the begin/endpoints:
SELECT eCollisionManner, count(*) FROM 2012_incident AS i WHERE (OnRoad LIKE "Park Ave%" OR CrossingFeature LIKE "Park Ave%") AND (Latitude > 32.13410 AND Latitude < 32.16334 ) AND (IncidentYear > 2008 AND IncidentYear < 2013) AND CityId = 310 GROUP BY 1;
An example of an east/west road, University Drive in Tempe, between Priest Drive and ~ Ash Ave can be found here; (which is more complicated because it only selects MV-Bike crashes, by the way).