Tag Archives: mysql

Fun with tables.

table from mysql -H html output:

InjurySeverity IncidentID IncidentDateTime Onroad CrossingFeature
5 2371350 2010-04-17 01:18:00 99th Ave Camelback Rd
5 2401688 2010-08-13 00:00:00 99th Ave Union Hills Dr
5 2416950 2010-10-31 11:03:00 99th Ave Lindgren Ave

table from myphpadmin:

IncidentID IncidentDateTime Onroad CrossingFeature
2446148 2010-09-23 06:41:00 Via Linda 117th St
2453234 2010-09-23 07:45:00 Missouri Ave 63rd Ave
2445966 2010-09-23 08:22:00 University Dr Hardy Dr
2451975 2010-09-23 11:10:00 Ina Rd La Cholla Blvd
2451005 2010-09-23 13:49:00 Colombo Ave Charleston Rd
2454504 2010-09-23 15:10:00 Glenn St Orlando Ave
2454482 2010-09-23 15:31:00 8th St Tyndall Ave
2453231 2010-09-23 15:44:00 Camelback Rd 79th Ave
2454471 2010-09-23 15:45:00 Oklahoma St 12th Ave
2448706 2010-09-23 16:13:00 3rd Ave Encanto Blvd
2448069 2010-09-23 17:00:00 Delaware Dr Apache Trl

Arizona Bicyclist and Ped Crash Map

2009-2015 Arizona Traffic Crash Map:

Caveats:

  • The map covers reported traffic collisions; involving a bicyclist, or pedestrian, respectively. The general reporting requirements are any collision w/any injury or > $1,000 worth of property damage occurring on a public road involving a motor vehicle (so for example, a bike-bike, or a bike-ped collision isn’t going to be reported here. It is well established that the lesser the severity, the more under-reporting there is. [for expediency/laziness sake, the peds, but not the bicyclists, only include incidents with exactly one MV and exactly one ped — this limits the amount of “hand” editing that needed to be done. As a result ~ 4% of ped incidents are not on the map]
  • Motorized bicycles are generally categorized as a motorcycle and as such don’t appear here.
  • There is no “exposure”; each crash is represented by a dot on the map, end of story. There are, of course, many more dots where more people live and bike/walk and vice versa.
  • The key to the dot color for injury severity is as  follows: green-No Injury, blue-Possible Injury, yellow-Non Incapacitating Injury, red-Incapacitating Injury, b/w larger target-Fatality. There is a second map keyed by whether a crash was a hit and run (red means it was).
  • A small number of crashes have no GIS data, and are not on the map. For example;  2009-2013 there are 9,982 bike-MV crashes of which 324 (about 3%) have no latitude/longitude. [I think somebody ought to find out why the number of zeros is so high again in 2013!? 2012 saw a total of 663 (out of something over 100,000), less than 1% and seems acceptable. But 2013 shot up to 2,905!! Why so many?]

crashmapAbout the Data

The underlying data comes from ADOT; each year they release a public dataset with codified information for each of the ~ 100,000 reportable Motor Vehicle crashes. This data is sometimes/also referred to as the ASDM (Adot Safety Data Mart). Currently, 2014 is the most recent year available; and 2009 is the oldest (the oldest data I have, older data is available from Adot).

Of these, about 2,000/yr crashes involve a bicyclist, and for comparison about 1,500 involve a pedestrian. The pedestrians suffer far more serious injuries and fatalities, however.

For presentation purposes, many, but not all,  of the data fields are extracted and stored in a google fusion table. Because of technical constraints on the way fusion tables are implemented; crashes involving more than one bicyclist or more than one motorist are not completely described. This is a small minority of crashes.

Continue reading Arizona Bicyclist and Ped Crash Map

ADOT Traffic Collision Database

It turns out (who knew?) that ADOT sells their crash database for a nominal sum. I purchased the 2010 version, the latest full-year available (2011 is supposed to be ready in July). This data is either similar to (or synonmyous with) something referred to as the Arizona (or ADOT?) Safety Data Mart — thus the acronym asdm sprinkled throughout.

The data is delivered on a DVD which contains three large text files; corresponding to Incident, Person, and Crash -level data. It is also accompanied with 5 photocopied pages of “Column Headings”, and about two-dozen pages of photocopied “Definitions”. [it is really strange that they would distribute this information on paper!?]. I was surprised to find out that very little of the data aligns with FARS/GES, which seems quite strange to me.

Anyway, I pieced together most of the info using those photocopies and assembled it for ready-reference in a spreadsheet adsm.xls , there is one worksheet for columns (fields) and another for defintions (ENUM). I don’t have all the definitions there, some i didn’t care about and were very lenghy, like vehicle color, state abbreviations, and so forth; nearly all of that is avaible at the referece material

Reference Material

  • Background info on the ALISS database,  and related terms: AIDW (Adot Information Data Warehouse), and Safety Data Mart.
  • My spreadsheet adsm.xls; provides a unifying list of fields, and enumerations

If the document links, above, go dead; there are local copies in asdm/.

The Database

azbikelaw.org is making this data available publicly via a MySQL database accessible via the internet. Special thanks to Justin Pryzby for completing this work.

Years 2001-2003, and 2009-Newest
hostname:     mysql.azbikelaw.org
databasename: asdm
username:     asdmuser
password:     Contact us
access via myPhpAdmin currently not available

Crash Map 
See crashmap-data

The tables were loaded from raw text files into tables and then re-processed with to create 3 tables: 2010_incident, 2010_person, and 2010_unit.

Interactive (e.g. myPhpAdmin, or MySQL Query Browser) users will want to refer to views: pretty_2010_incident, pretty_2010_person, and pretty_2010_unit which substitute enumerated fields for the original data fields, and leaves out the description-only fields. For example there are 3 fields all carrying the same information: TravelDirection = 1 (int), TravelDirectionDesc = “NORTH” (text), and a synthetic field eTravelDirection=’NORTH’ (enum). The first two are not in the pretty_unit view, while the unit2 table has all three. The table unit should not be used, and is likely to be dropped to save space.

Furthermore, all database users should rely on the enumerated field, eTravelDirection in the above example, and not on the integer values, as they are liable to change in subsequent years, whereas the enumerations can be kept consistent when subsequent data from a new year is imported.

There are some additional helper tables, such as LOVCity, LOVCounty and county which are handy for looking up things like city codes (e.g. Phoenix has a CityID of 241).

Data Completeness

A word about data completeness: As explained in more detail in this comment below. The data received from ADOT represents data frozen in time mid-year (currently the cutoff date appears to be May 31 of the following year). Data does continue to trickle in, however, and for unknown reasons (why shouldn’t 6 months be enough time for PD’s to send all their corrected data from the preceding year?). ADOT initially releases a version of Crash Facts based on the May 31 data; and then subsequently re-publishes an update a few months later. My dataset is frozen in time at the May 31st (or June 1st) version; which doesn’t make me happy but tends to be statistically insignificant.

Trickling example: The number of pedalcyclist crashes in 2012: 2121 / 2134 / 2141 as of May 31 2013 / Oct 28 2013 / Late Nov 2013. I.e. 2012 crash reports continue to trickle in in Late November of the next year!

GIS issues: a widely varying (from year-to-year) number of incidents have a lat/long of zero. More about that at crashmap-data.

I have noted in a few cases (only fatals?) where the data in asdm seems to be massaged relative to the acr, see comment below about “fiddling” (also called “fudgery”) with the CollisionManner and PostedSpeed.

2010

The first dataset available is 2010, which is the most recent full year available from ADOT.

2009

I went backwards and purchased the 2009 data; I believe this is the oldest dataset available under the current schema — which kindof makes sense as the ACR form was re-vamped for 2009.

2011

2011 data became available in mid-July 2012 ( I received it in the mail from ADOT Risk Management 7/27); and is now ready for querying. This year the data was distributed all on a CD, so thankfully no photocopies of headings and definitions hanging around.

As might be supposed, the tables are named 2011_incident, 2011_person, and 2011_unit ; along with the pretty views. The table structure is identical to 2010.

2012

2012 data became available June 10, 2013, according to an email report. It once again was $15, and a CD was delivered very promptly upon mailing ADOT Risk Management a check. This year the contact was Sarah Greener, Litigation & Public Records Supervisor. The data fields were identical to last (and in fact, all previous) year. I only needed to edit the .sql’s to substitute in 2012_ instead of 2011_ or whatever.

2013

2013 became available in mid-June as usual and the Adot risk mgmt people got it promptly mailed out to me for $17 (2 charge for mail, which apparently was waived or forgotten in previous years. See also arizona-crash-facts-2013 and

One area where data quality has degraded noticeably is with the number of incidents listing a geo-location of 0,0. Many of these also have no OnRoad nor CrossingFeature. This year there are almost 3,000 such incidents; last year there were only 663; and prior years were 2,679, 2,017, and 9,048(! 2009 was a bad year for much inconsistency) respectively. I don’t really see a pattern, e.g. as to particular agencies, in other words they’re sprinkled around the state. The 3,000 include about 50 bicyclist incidents including some with incapacitating injury. I am also pretty shocked to find among the 3,000 forty fatalities and over a hundred incapacitating injuries — this just seems horribly lax.

select count(*) from 2013_incident where Latitude=0 OR Longitude=0;

2014

Everything seemed fine until discovering there’s something wrong with UnitID this year; Up until this year, UnitID (as well as PersonID and IncidentID) have all been unique, even year-over-year. This year, however, there are many (~ 30,000?) “reused” UnitIDs. It’s not clear if this is a bug, or just what… Is the data ok otherwise? who knows. What’s clear is ADOT traffic records people prefer to toil in anonymity and don’t deign to bless us (the general public) with any explanations of their work.

e.g. UnitID=4976526 appears in both 2012 and 2014. Boo. You can test for this condition by running the following (long, it takes over a minute) query; the result should be 0 rows, but it returns some 30,000!:

SELECT UnitID, count(1) c FROM (SELECT * FROM 2014_unit UNION SELECT * FROM 2013_unit UNION SELECT * FROM 2012_unit UNION SELECT * FROM 2011_unit UNION SELECT * FROM 2010_unit UNION SELECT * FROM 2009_unit) x GROUP BY 1 HAVING c>1 ORDER BY 2;

This seems rather odd; I expect to be able to use any of those IDs as a primary key, regardless of year, so that can no longer work. In any event to cover up adot’s short-comings; I’ve added XX0,000,000 where XX is the last two digits of the year to UnitID. For now, just did it to the 2014 tables…

UPDATE 2014_unit SET UnitID = UnitID + 140000000;
UPDATE 2014_person SET UnitID = UnitID + 140000000;

Also, there is an error incident=2935635, is incorrectly listed as 2 bicycle crash; it should be bike-MV (there is no such thing a crash not involving any motor vehicle).

2015

I had failed to notice at the time, but in 2014 four fields were added to the tables (this means, by inference, the Arizona Crash Form was updated, also):

  • incident table: Offset Direction , Secondary Crash Flag (The latter seems to not have a def’n?)
  • unit table: Distracted Driving and Distracted Driving Desc

These fields are added to the “end” (i.e. the far right-hand side) of the respective .csv files. As such, they can be safely ignored, as is presently done for 2014.

I would have thought this would break the “ADOTALL” tables, but it still seems to work, albeit without the new fields…

All Years, 2009-2015

I found it was becoming a drag to re-run queries across year; so in Oct 2014 I created three tables, incident, person, unit that hold all contiguous year in 3 big tables. As new years come by I will add them as well. Complicated queries to these tables need to be somewhat judicious as the amount of data gets larger and larger; mysql.azbikelaw.org is hosted on my home server and is an old laptop!

There is a shell script to build all year’s tables at once, build-asdm-all.sh which is in the ADOTALL folder (analogous to ADOT20xx folders). SEE ABOVE, 2014 FOR A PROBLEM with this approach — unfortuantely UnitID is no long unique, so it can’t be a primary key.

See note, just above, about year 2015 data and new fields.

2001-2003

Justin got these discs from Adot in September 2013 (i.e. these are way out of chronological order)… in any event they still have exactly the same columns/tables layout. The only exception was the values STOP and YIELD in ControlType in the Unit table are reversed; so the definition of eControlType had to be adjusted. (so they are the reverse of 2009-2012 values).

FARS and ASDM

[warning: the table referred to below, farsxref, has fallen out of date] : As of now, FARS (federal, fatality-only data) for years 2010 and 2011 is available with full PBCAT crash typing, see 2010 FARS contains PBCAT data for more info on PBCAT. There’s a somewhat hand-crafted table called farsxref which contains an adot IncidentID, Year, and Fars identifier (the adot IncidentID is unique, for fars, the year and identifier have to be combined to uniquely identify an incident). Anyway, in the table is row for pedalcyclists only (at the moment; there’s not particular reason it shouldn’t have all fatals). Anyways, FARS data will eventually be added to the incident dump query.

Database improvements?

See comment below for a list of suggestions.

Shell Scripts

There are shell scripts to build the databases from raw text input files; build-asdm.sh build-fars.sh and there are also classifying scripts which produce sometimes interesting snapshot views of the tables; classify-asdm.sh and classify-fars.sh

See azbikelaw-gets-its-own-server (the slug is now just azbikelaw) which is password protected; actually it’s a “private” post, it’s invisible (not found) unless logged in.

Examples

In no particular order, here are some examples for routine lookups.

By Street Name(s), use OnRoad and/or CrossingFeature.  This finds incidents near Catalina Hwy and Houghton Rd.

select * from 2013_incident where (OnRoad LIKE "Catalina%" and CrossingFeature LIKE "Hough%") OR (OnRoad LIKE "Hough%" and CrossingFeature LIKE "Catalina%");
or to get a feel for dangerousness level, instead select things like:
SUM(TotalMotoristsInjuries), SUM(TotalNonMotoristsInjuries), SUM(TotalMotoristsFatalities), SUM(TotalNonMotoristsFatalities)

By Street Name and Latitude/Longitude:

This example was to find any crashes along Park Ave, in Tucson, between Park Ave and Irvington (on the North) and Valencia (to the south), where a road diet was done sometime in 2014.  Since it’s a north/south road, use Latitude to discriminate. To find Latitudes/Longitudes use a tool such as latlong.net. You can, by the way, form a google maps url if you know lat/long like so:  maps.google.com/?q=32.16334,-110.95618 . In any event, the Latitudes in the query are “opened up” a bit to catch more crashes near the begin/endpoints:

SELECT eCollisionManner, count(*)
FROM 2012_incident AS i WHERE 
(OnRoad LIKE "Park Ave%" OR CrossingFeature LIKE "Park Ave%") AND 
(Latitude > 32.13410 AND Latitude < 32.16334 ) AND
(IncidentYear > 2008 AND IncidentYear < 2013) AND
CityId = 310 GROUP BY 1;

An example of an east/west road, University Drive in Tempe, between Priest Drive and ~ Ash Ave can be found here; (which is more complicated because it only selects MV-Bike crashes, by the way).

FARS and PBcat

Commencing with the recently-released 2010 data FARS (The USDOT’s Fatality Analysis and Reporting System) will have far more specialized detail on Pedestrian and Bicyclists crashes.

“Motorist Failure to Yield — signed intersection” One of several dozen crash types defined by PBCAT

618 cyclists (person type 6 bicyclist, and 7 other pedalcyclist) were killed in 2010 in traffic collisions — and as noted at the link above, only collisions with motor vehicles in-transport are tracked by FARS. So for example, a bicyclist who lost control and died as a result of crashing into a tree would not be tracked here, nor would a bicyclist who strikes a parked motor vehicle. Continue reading FARS and PBcat