Arizona Bicyclist and Ped Crash Map

Volunteer Needed

crashmap

URGENT: Google has announced the tool used to present the crash maps, below, called “Fusion Tables” is being discontinued Dec 3, 2019.
Azbikelaw is looking for a volunteer familiar with GIS and web-presentation to help effect transition to something else.

Please see this article for more info. Below is the historical data and procedures.


2009-2018 Arizona Traffic Crash Map:

note: because of the aforementioned fusion-table end-of-life, the following are maps only, no filtering capability

For posterity, here are the fusion table links (will redirect to a page explaining that fusion tables are no longer supported:

 

Caveats:

  • The map covers reported traffic collisions; involving a bicyclist, or pedestrian, respectively. The general reporting requirements are any collision w/any injury or > $1,000 worth of property damage occurring on a public road involving a motor vehicle (so for example, a bike-bike, or a bike-ped collision isn’t going to be reported here. It is well established that the lesser the severity, the more under-reporting there is. [for expediency/laziness sake, the peds, but not the bicyclists, only include incidents with exactly one MV and exactly one ped — this limits the amount of “hand” editing that needed to be done. As a result ~ 4% of ped incidents are not on the map]
  • Motorized bicycles are generally categorized as a motorcycle and as such don’t appear here.
  • There is no “exposure“; each crash is represented by a dot on the map, end of story. There are, of course, many more dots where more people live and bike/walk and vice versa. (see e.g. “Is cycling dangerous?“)
  • The key to the dot color for injury severity is as  follows: green-No Injury, blue-Possible Injury, yellow-Non Incapacitating Injury, red-Incapacitating Injury, b/w larger target-Fatality. There is a second map keyed by whether a crash was a hit and run (red means it was).
  • A small number of crashes have no GIS data, and are not on the map. For example;  2009-2013 there are 9,982 bike-MV crashes of which 324 (about 3%) have no latitude/longitude. [I think somebody ought to find out why the number of zeros is so high again in 2013!? 2012 saw a total of 663 (out of something over 100,000), less than 1% and seems acceptable. But 2013 shot up to 2,905!! Why so many?]

About the Data

The underlying data comes from ADOT; each year they release a public dataset (on CD!?) with codified information for each of the ~ 120,000 reportable Motor Vehicle crashes including MV crashes which involve a bicyclists, or a pedestrian. This data is sometimes/also referred to as the ASDM (Adot Safety Data Mart). Currently, 2017 is the most recent year available; and 2009 is the oldest (the oldest data I have, older data is available from Adot).

Of these, about 2,000/yr crashes involve a bicyclist (this number has fallen in recent years to  about 1,400; suspiciously low as detailed here) , and for comparison about 1,500 involve a pedestrian. The pedestrians suffer far more serious injuries and fatalities, however.

For presentation purposes, many, but not all,  of the data fields are extracted and stored in a google fusion table. Because of technical constraints on the way fusion tables are implemented; crashes involving more than one bicyclist or more than one motorist are not completely described. This is a small minority of crashes.

All Motor Vehicle Crash Map — Maricopa County

For illustration purposes, I made a couple of quick maps of ALL motor vehicle crashes — but since fusion tables limits the number of rows to 100,000, I couldn’t fit even a single year for the entire state(!); So here’s a couple of samples — the first one 2013 only and limited it to Maricopa county (there were about 75,000 crashes) the second is occurring in City of Tempe only but for years 2010-2016:

More Technical Minutia

Note that there are 2164 cyclists in total in the 2012_person table -> select count(1) FROM 2012_person where ePersonType IN (‘PEDALCYCLIST’). While most cyclist crashes involve one MV and one cyclist, it is of course possible to involve other combinations

whereas there are 2121 UNIQUE Incidents involving (any number of) cyclists  (the first two queries), versus e.g. 2060 were one MV and one bicycle units (second two queries; note each pair of queries is equivalent, the ‘EXITS’ versus the sF_Bicycle flag) ->

SELECT count(*) FROM 2012_incident i WHERE sF_Bicycle;
SELECT count(*) FROM 2012_incident i WHERE EXISTS 
(SELECT 1 FROM 2012_unit u WHERE u.IncidentID=i.IncidentID AND 
u.eUnitType IN ('PEDALCYCLIST'));

SELECT TotalUnits, TotalNonMotorists,count(*) FROM 2012_incident i 
WHERE sF_Bicycle GROUP BY 1,2 ;
SELECT TotalUnits, TotalNonMotorists,count(*) FROM 2012_incident i WHERE EXISTS 
(SELECT 1 FROM 2012_unit u WHERE u.IncidentID=i.IncidentID AND 
u.eUnitType IN ('PEDALCYCLIST')) GROUP BY 1,2 ;

total: 2021
num units / num non-motorists / count:
2 1 2060
3 1 26
3 2 29
4 3 5
9 8 1

So here is the big query, classified by the number of rows of results it returns by TotalUnits 2190 results (note there are only 2121 unique incidents, so 69 are “duplicate” IncidentIDs; arising from the “extra” units)

SELECT TotalUnits, TotalNonMotorists,count(*)
FROM ((((
2012_incident AS i JOIN 2012_person AS p_bike ON i.IncidentID = p_bike.IncidentID)
 JOIN 2012_unit AS u_bike ON p_bike.UnitID = u_bike.UnitID) 
 JOIN 2012_person AS p_car ON i.IncidentID = p_car.IncidentID)
 JOIN 2012_unit AS u_car ON p_car.UnitID = u_car.UnitID )
 LEFT OUTER JOIN LOVCounty ON i.CountyId = LOVCounty.id
 LEFT OUTER JOIN LOVCity ON i.CityId = LOVCity.id
 LEFT OUTER JOIN LOVNcic AS OffNcic ON i.OfficerNcic = OffNcic.id
 LEFT OUTER JOIN LOVNcic AS ExtNcic ON i.ExtendedNcic = ExtNcic.id
WHERE p_bike.ePersonType = 'PEDALCYCLIST' AND p_car.ePersonType='DRIVER' GROUP BY 1,2;

The Query

Here is the current query. It first joins person (the bicyclist) to the incident table, then joins the unit (of the bicyclist) to the person (of the bicyclist). This person and unit is then repeated for the motorist. Note that this can result in problems because more than one row of results will occur anytime there are more than one bicyclist or motorist involved (any time TotalUnits is > 2). This is relatively unusual, but just be aware. These “extra” rows are dealt with, later, by just deleting them — this results in exactly one row if info for each incident that involves any number of bicyclists [see this comment below on how to DO IT RIGHT! I REALLY NEED TO DO THIS FOR NEXT TIME TO eliminate this manual step of getting rid of the extra rows]: [NOTE u_bike and u_car.eDistractedDriving  are only valid for 2015 onward]

SELECT
i.IncidentID, i.IncidentDateTime, i.IncidentYear,
LOVCounty.name County,LOVCity.name City, 
OffNcic.name Oncic,ExtNcic.name Encic,
i.OfficerId, i.OfficerNcic, i.ExtendedNcic,
i.eCollisionManner,
i.eLightCondition,
i.TotalUnits,
i.TotalNonMotorists,
i.eInjurySeverity,
i.InjurySeverity,
i.HitAndRunFlag,
i.OnRoad, i.CrossingFeature,
i.Latitude, i.Longitude,
i.eTrafficWayType,
i.eIntersectionType,
i.eJunctionRelation,
i.eWeather,
p_bike.UnitNumber b_UnitNumber,
p_bike.eNonMotoristLocation,
p_bike.eSex b_Sex,
p_bike.Age b_Age, 
p_bike.eViolation1 b_Violation1,
p_bike.eSafetyDevice b_SafetyDevice,
p_bike.ePersonType b_PersonType,
u_bike.eLane b_Lane, 
u_bike.eTravelDirection b_Direction, 
u_bike.eUnitAction b_Action,
p_car.UnitNumber c_UnitNumber,
p_car.eSex c_Sex,
p_car.Age c_Age,
p_car.eViolation1 c_Violation1,
u_car.eLane c_Lane, 
u_car.eTravelDirection c_Direction, 
u_car.eUnitAction c_Action,
u_car.PostedSpeed c_PostSpeed,
u_car.EstimatedSpeed c_EstSpeed,
u_car.eDistractedDriving c_DistDr,
u_bike.eDistractedDriving b_DistDr 
FROM ((((
2015_incident AS i JOIN 2015_person AS p_bike ON i.IncidentID = p_bike.IncidentID)
 JOIN 2015_unit AS u_bike ON p_bike.UnitID = u_bike.UnitID) 
 JOIN 2015_person AS p_car ON i.IncidentID = p_car.IncidentID)
 JOIN 2015_unit AS u_car ON p_car.UnitID = u_car.UnitID )
 LEFT OUTER JOIN LOVCounty ON i.CountyId = LOVCounty.id
 LEFT OUTER JOIN LOVCity ON i.CityId = LOVCity.id
 LEFT OUTER JOIN LOVNcic AS OffNcic ON i.OfficerNcic = OffNcic.id
 LEFT OUTER JOIN LOVNcic AS ExtNcic ON i.ExtendedNcic = ExtNcic.id
WHERE p_bike.ePersonType = 'PEDALCYCLIST' AND p_car.ePersonType='DRIVER' AND Latitude>0 ORDER BY i.IncidentID;

Workflow

Here are the steps to create a google Fusion Table:

  1. run the query, select “raw” (-B) results; capture the output file. (one way to do that is to use my php front-end query tool but a password is required; then after running the query click on the download link, that will open the results file in a browser window; then just cntl-A, cntl-C it).
  2. cut/paste entire output file with header row into OO (a spreadsheet) with paste-special / text. select tab delimited.
  3. add a column next to IncidentID. sort entire sheet by  IncidentID, b_unitnumber and c_unitnumber ascending, this will insure whever is unit1 in a multi-vehicle crash is on top. Calculate somethign like =IF(A3=A2;”SAME”;””) and then go and delete (cntl-<minus>) each row marked SAME.
  4. Verify that the correct number of rows is left get the count of cyclist crashes, it should match the number of rows of data left in the file [e.g. the count for 2014 is 1699 crashes (that have a valid Latitude):
  5. Lately I’ve saved as .ods in folder crashReports\(dot)rpts as something like mapBike20xx.ods. Google fusion tables can read .ods files
  6. Add a column next to injurySeverity, call it Icon; paste in the following formula into entire column (adjust column as necessary):
    =IF(P2=1;"small_green";IF(P2=2;"small_blue";IF(P2=3;"small_yellow";IF(P2=4;"small_red";IF(P2=5;"placemark_circle";"null")))))
  7. Open up the existing fusion table in google docs, download the existing table for backup purposed. Then “import more rows” into an existing fusion table (used to import .csv, .ods works too). Note if the columns change, can use fusion table’s Edit/Columns function to make them match up; e.g. added columns in 2015)
  8. (if making a new fusion table) ; in “map of Latitude”, update the info-card style as below; and update the map styles to use a column (icon) for icons.

Sample Info window layout (careful when cutting/pasting and the quote marks!):

<div class='googft-info-window'>
<b>IncidentID:</b> <a target="_blank" href="http://mysql.azbikelaw.org:1080/asdmphp/queryAsdmIncident.php?incident={IncidentID}">{IncidentID}</a><br>
<b>IncidentDateTime:</b> {IncidentDateTime}<br>
<b>City:</b> {City}<b> County:</b> {County}<br>
<b>OfficerNcic:</b> {OfficerNcic}
<b> ExtendedNcic:</b> {ExtendedNcic}<br>
<b>CollisionManner:</b> {eCollisionManner}<br>
<b>LightCond:</b> {eLightCondition}
<b>TotalNumUnits:</b> {TotalUnits}<br>
<b>InjurySeverity:</b> {eInjurySeverity}<br>
<b>HitAndRunFlag:</b> {HitAndRunFlag}
<b>Weather:</b> {eWeather}<br>
<b>OnRoad:</b> {OnRoad}
<b> Cross St:</b> {CrossingFeature}<br>
<b>JuncRelation:</b> {eJunctionRelation}<br>
<hr>
<center><b>Cyclist Info</b></center>
<b>NonMotLoc:</b> {eNonMotoristLocation}<br>
<b> UnitNum:</b> {b_UnitNumber}
<b>Age:</b> {b_Age}/{b_Sex}<br>
<b>Violation1:</b> {b_Violation1}<br>
<b>SafetyDevice:</b> {b_SafetyDevice}<br>
<b>Lane:</b> {b_Lane}
<b> TravelDir:</b> {b_Direction}<br>
<b>UnitAction:</b> {b_Action}
<hr>
<center><b>Driver Info</b></center>
<b> UnitNum:</b> {c_UnitNumber}
<b>Age:</b> {c_Age}/{c_Sex}<br>
<b>Violation1:</b> {c_Violation1}<br>
<b>Lane:</b> {c_Lane}
<b> TravelDir:</b> {c_Direction}<br>
<b>UnitAction:</b> {c_Action}
</div>

2014 Comments

incident=2935635, according to the data is a bike-bike crash(?!). I noticed this only because there were supposed to be 1699 bicyclist incidents where Lat>0 but the big query, and after getting rid of the duplicated rows, only yielded 1698; so i went hunting for the 1699th incident. This is Phx file number 14002064311 so the ACR is readily available — as suspected, unit #2 is a motor vehicle… The officer 07219 assigned fault to the bicyclist apparently for no reason other than it was a counter-flow bicyclist on sidewalk (the motorist was emerging from private drive turning right). In any event, that one is “missing” from crashmap data.

Missing Latitude/longitude: There were 1742 / 43 : total cyclist crashes / crashes with missing Latitude. For pedestrians the numbers were 1561 / 49. This is roughly the same as previous years.

2015

Once again followed the “manual” process, using above query means multiple-unit crashes yield extra rows that I manually delete before adding them to the google fusion table.

Also, added these fields:  u_car.PostedSpeed, u_car.EstimatedSpeed,
u_car.eDistractedDriving (this last one is a new field added in 2014 that I didn’t notice until now). These fields for 2009-2014 will have “no data” in them.

Received 2015 dataset in early June 2016. The numbers of incidents where latitude=0 was relatively consistent at about 3% for both overall and bicyclist incidents (3267/116609 and 44/1434). Though, troublingly, 3 (of 28) fatal bicyclist incidents did not include any location data. I know where one of them occurred (3019899 was at 37th and Grand, 33.498486,-112.138556), so i can insert correct data, but not sure what to do for the other two. So the table for fusion table insertion contains 1391 rows of data (= 1434 – 44 + 1)

SELECT count(*) from 2015_incident;
SELECT count(*) from 2015_incident where Latitude=0;
SELECT count(*) FROM 2015_incident i WHERE EXISTS 
(SELECT 1 FROM 2015_unit u WHERE u.IncidentID=i.IncidentID AND 
u.eUnitType IN ('PEDALCYCLIST'));
SELECT count(*) FROM 2015_incident i WHERE EXISTS 
(SELECT 1 FROM 2015_unit u WHERE u.IncidentID=i.IncidentID AND 
u.eUnitType IN ('PEDALCYCLIST') AND i.Latitude=0) ;

2016 Data / Placeholder

see this comment below on how to DO IT RIGHT! though I still hadn’t figured it out 🙁

2017 Data / Placeholder

Still haven’t figure out how to do it right. This year there were a total of 1496 incidents with the following TotalUnits breakdown:

TotalUnits TotalNonMotorists count(*)
2 1 1472
3 1 12
3 2 7
4 1 4
4 3 1

Running the big query 1519/ 1993 (bikes/peds)  rows of data, and after deleting the “SAME” rows netted 1490 / 1694.  Why not 1496 for bikes? Because there were six with “0” lat/long, including one fatal, I manually added the lat/long for that one, so there are 1,491 rows of bike data in 2017. There were more, 12,  zero lat/long pedestrian incidents (including a couple of fatals, so they are missing from the map; but I don’t have time to correct that.

SELECT count(*) FROM 2017_incident i WHERE sF_Bicycle;

SELECT count(*) FROM 2017_incident i WHERE EXISTS 
(SELECT 1 FROM 2017_unit u WHERE u.IncidentID=i.IncidentID AND 
u.eUnitType IN ('PEDALCYCLIST')) ;

SELECT TotalUnits, TotalNonMotorists,count(*) FROM 2017_incident i WHERE EXISTS 
(SELECT 1 FROM 2017_unit u WHERE u.IncidentID=i.IncidentID AND 
u.eUnitType IN ('PEDALCYCLIST')) GROUP BY 1,2 ;

.

2018 Minutia

Once again, haven’t figured out how to do this right; in summary there were 1282 (1766) bike-MV (ped-MV) collisions, with 29 (252) duplications. Here’s the unit breakdown for bike-MV:

TotalUnits TotalNonMotorists count(*)
2 1 1251
3 1 19
3 2 7
4 1 4
5 4 1

As a curiosity note: one of the 3/2 incidents was one bicyclist and one pedestrian.

An annoyance: I quickly pushed the ped data through. Annoyingly there were FOUR single person ped fatalities that had no location info and as such were excluded from the crash map as I didn’t have time to try and find anything out about them. ALL FOUR had very sparse info indicating they were probably hit and runs yet none had the hit/run flag set; indicating to me they were miscoded. Poor police work. Three were in Navajo county and one was Coconino. Two of them involved an officer ncic affiliated with Navajo reservation, and one was whitemountain-apache reservation. The fourth was Winslow PD.

Google Fusion Table Minutia

You (anyone) can download the table, in .csv format; there is also an option to download metadata which describes the columns as to name and  data-type. When uploaded, google makes some non-obvious choices, e.g. it would be better if discrete numerical values were treated as text, which results in a drop-down filter, instead of numbers, e.g. HitAndRunFlag would be better treated as text. Another problem is Latitude which automatically makes the map (good!) but it ends up being treated as text (bad); a workaround is to make another column be CopyOfLatitude (currently would reside in Column AV if added to the end of the SELECT statement).

uploading data — file size limit on .ods is 1MByte. Otherwise must use .csv (which is sortof annoying because they get really large). I haven’t tried .xlsx (?)

uploading more rows — have to first ensure columns are exactly consistent

Data Consistency Issues

Everything below here is just some miscellaneous random things I was checking out but never completed… Certain choices on the ACR are either always or highly likely to be incorrect

Overuse of UNKNOWN

Overuse of NOT_SPECIFIED

Overuse of OTHER

UnitAction and NonMotoristLocation problems

Problem with this block are manifold.

so e.g. beginning with looking at cyclist’s UnitAction; certain choice ought to be prevented, there were, e.g. a couple of dozen WALKING… actions[1]. If they were really walking, they probably can’t be a cyclist. There was one DRIVERLESS_MOVING_VEHICLE. Other popular categories have more troublesome problems because they are more pervasive, there were 547 CROSSING ROAD, but if you break them down by NonMotoristLocation you see most of them were at intersections[2]. Compare this to the cyclist GOING STRAIGHT AHEAD and break down by NonMotoristLocation [3] you also see many of those are at intersections *in crosswalks*; so it seems that CROSSING ROAD and GOING STRAIGHT AHEAD are both often used to describe a cyclist’s UnitAction when struck in a crosswalk.

There is generally a lot of squishyness in the choosing a NonMotoristLocation in the case of a cyclist that ought to be addressed. There is, by the way, no clear correct choice when a “vehicular” (that is to say, cycling in the roadway; and not on the sidewalk) cyclist has a collision in or near an intersection. The correct answer should be IN ROADWAY, but there is no choice for that! The closest thing is  IN_ROADWAY_NOT_IN_A_CROSSWALK_OR_INTERSECTION, or sometimes AT_INTERSECTION_BUT_NO_CROSSWALK but neither of these is quite correct.

Here is another unlikely scenario: NonMotoristLocation SIDEWALK but UnitAction GOING_STRAIGHT_AHEAD, there were 221 of these (about 11% of all cyclist crashes).  This is definitely possible, but it very unlikely there are that many; rather the location is probably wrong and should usually be either driveway or crosswalk. I randomly stumbled onto one, where the cyclist was indeed on the sidewalk (and not at a driveway, or intersection) and two MV’s collided, one of them striking the cyclist from behind as she rode on the sidewalk. Incident 2655307. Incapacitating injury. shees. In fact most of the other SIDEWALKS are most likely bogus, though far less numerous — e.g. WALKING WITH/AGAINST TRAFFIC, and so forth. See them all at once with this query[4].

[1] SELECT 2012_unit.eUnitAction, count(*) FROM ((2012_incident JOIN 2012_person ON 2012_incident.IncidentID = 2012_person.IncidentID) JOIN 2012_unit ON 2012_person.UnitID = 2012_unit.UnitID) LEFT OUTER JOIN LOVCounty ON 2012_incident.CountyId = LOVCounty.id LEFT OUTER JOIN LOVCity ON 2012_incident.CityId = LOVCity.id WHERE 2012_person.ePersonType = 'PEDALCYCLIST' GROUP BY 1
[2] SELECT 2012_person.eNonMotoristLocation, count(*) FROM ((2012_incident JOIN 2012_person ON 2012_incident.IncidentID = 2012_person.IncidentID) JOIN 2012_unit ON 2012_person.UnitID = 2012_unit.UnitID) LEFT OUTER JOIN LOVCounty ON 2012_incident.CountyId = LOVCounty.id LEFT OUTER JOIN LOVCity ON 2012_incident.CityId = LOVCity.id WHERE 2012_unit.eUnitAction LIKE 'CROSSING_ROAD' AND 2012_person.ePersonType = 'PEDALCYCLIST' GROUP BY 1;
[3] SELECT 2012_person.eNonMotoristLocation, count(*) FROM ((2012_incident JOIN 2012_person ON 2012_incident.IncidentID = 2012_person.IncidentID) JOIN 2012_unit ON 2012_person.UnitID = 2012_unit.UnitID) LEFT OUTER JOIN LOVCounty ON 2012_incident.CountyId = LOVCounty.id LEFT OUTER JOIN LOVCity ON 2012_incident.CityId = LOVCity.id WHERE 2012_unit.eUnitAction LIKE 'GOING_STRA%' AND 2012_person.ePersonType = 'PEDALCYCLIST' GROUP BY 1;
[4] SELECT 2012_person.eNonMotoristLocation,2012_unit.eUnitAction, count(*) FROM ((2012_incident JOIN 2012_person ON 2012_incident.IncidentID = 2012_person.IncidentID) JOIN 2012_unit ON 2012_person.UnitID = 2012_unit.UnitID) LEFT OUTER JOIN LOVCounty ON 2012_incident.CountyId = LOVCounty.id LEFT OUTER JOIN LOVCity ON 2012_incident.CityId = LOVCity.id WHERE 2012_person.ePersonType = 'PEDALCYCLIST' GROUP BY 1,2;

13 thoughts on “Arizona Bicyclist and Ped Crash Map”

  1. I just realized, the field OfficerId is missing from the crash map data; I should probably go ahead and include it; as it would make it easier to search out individual officers.
    According to this f.b. post; a cyclist Henri Benard was unhappy with this: phx 06070 / incident 2014 – 00338989 apparently Feb 26.
    He has a photo of a piece of paper that seems to suggest there was no crash report; but then, if there is no crash report, why is there an incident number? (maybe that’s normal?).

  2. Scott W/TBAG pointed this out to me (it is related to the Tempe Bike count)… It usescorrelated subqueries; a subquery that contains a reference to a table that also appears in the outer query. I note that this solves my “duplication” problem, that is to say it produces one row of data per incident, and is good for pulling out flat files that can be readily used in google fusion tables. I REALLY NEED TO GET ON THE STICK WITH THIS FOR NEXT TIME (2015) AS THE QUERY BELOW RETURNS THE “CORRECT” ROWS, E.G. FOR 2012 I PROPERLY RETURNS 2121 ROWS OF DATA!!!

    select
            2012_incident.IncidentID as IncidentID,
            num.num as num_vehicles_involved,
            eCollisionManner, Longitude, Latitude, InjurySeverity,
            eIntersectionType,
            eCollisionManner,
            LOVCity.name as city_name,         IncidentDateTime,
            (
                select group_concat(person1.eViolation1 separator '+')
                from 2012_unit as unit1
                join 2012_person as person1 on unit1.UnitID = person1.UnitID and unit1.IncidentID = person1.IncidentID
                where unit1.IncidentID = 2012_incident.IncidentID
                and unit1.eUnitType = 'PEDALCYCLIST'
            ) as cyclist_citations,
            (
                select group_concat(person2.eViolation1 separator '+')
                from 2012_unit as unit2
                join 2012_person as person2 on unit2.UnitID = person2.UnitID and unit2.IncidentID = person2.IncidentID
                where unit2.IncidentID = 2012_incident.IncidentID
                and unit2.eUnitType = 'DRIVER'
            ) as driver_citations
        from 2012_incident
        LEFT join LOVCity on 2012_incident.CityId = LOVCity.id
        join (
            select count(*) as num, IncidentID from 2012_unit
            group by IncidentID
        ) as num on num.IncidentID = 2012_incident.IncidentID
        where 2012_incident.IncidentID in (
            select IncidentID from 2012_unit where 2012_unit.eUnitType = 'PEDALCYCLIST'
        )   
        and 2012_incident.IncidentID in (
            select IncidentID from 2012_unit where 2012_unit.eUnitType = 'DRIVER'
        )   
        group by 2012_incident.IncidentID ;

    So I’m struggling to understand these fancy queries! Here is Scott’s explanation:
    The two subqueries near the top execute another query each for each row the main outter query returns. Given an IncidentID, it fiends all of the eViolation1s for first cyclists then motorists.
    If the main query joined on to those, then we’d get an extra result record for every possible permutations of joins across all of the tables. If there were two cyclists and one motorist, there would be four results. So doing that part as a subquery lets me always get only one result for each incident.
    Here is Justin’s explanation to me:
    The first subquery (group by IncidentID) shows how many units were in each incident.
    And the next two subqueries have the same effect as EXISTS in the first query, the one you’re used to. (The second one with eUnitType=’DRIVER’ is probably not necessarily, since only MV crashes are present in the dataset, perhaps a result of ADOT munging, but that component of the query makes it explicit).

  3. [This is/was just another of my stupid ideas. really need to be using correlated subqueries as explained above. I’ll leave this here as a reminder how dumb I am]This is just more attempts at working around my limited understanding/knowledge of how to create queries. Make the following “helper” table to contain exactly one row per incident, and fill it one column at a time; (note if less than 2 units are in an incident U2 and P2 will remain NULL).

    DROP table 2013_helper;
    CREATE TABLE 2013_helper ( IncidentID int PRIMARY KEY, isBike int, isPed int,U1 int, U2 int, P1 int, P2 int) engine=MyISAM ;
    INSERT into 2013_helper select IncidentID, 0, 0, NULL,NULL,NULL,NULL from 2013_incident;
    UPDATE 2013_helper h SET U1=( SELECT UnitID from 2013_unit u WHERE h.IncidentID=u.IncidentID AND u.UnitNumber=1);
    UPDATE 2013_helper h SET U2=( SELECT UnitID from 2013_unit u WHERE h.IncidentID=u.IncidentID AND u.UnitNumber=2);
    # PersonType 4 is passenger; there is only ever 1 driver or ped or pedal person associated w/each unit
    UPDATE 2013_helper h SET P1=( SELECT PersonID from 2013_person p WHERE h.U1=p.UnitID AND p.PersonType <> 4);
    UPDATE 2013_helper h SET P2=( SELECT PersonID from 2013_person p WHERE h.U2=p.UnitID AND p.PersonType <> 4);
    UPDATE 2013_helper h SET isBike=1 WHERE EXISTS ( SELECT 1 from 2013_person p WHERE h.IncidentID=p.IncidentID AND p.ePersonType LIKE 'PEDAL%');
    UPDATE 2013_helper h SET isPed=1 WHERE EXISTS ( SELECT 1 from 2013_person p WHERE h.IncidentID=p.IncidentID AND p.ePersonType LIKE 'PEDES%');
    # make a view
    CREATE OR REPLACE VIEW 2013_incidentHelper AS SELECT i.*,h.isBike, h.isPed, h.U1, h.U2, h.P1, h.P2 FROM 2013_incident i JOIN 2013_helper h ON (i.IncidentID=h.IncidentID);
    

    Now it is (should be) easy to get “flat files” where there is exactly 1 row of results per incident with the proviso that it is specifically limited to whatever was declared unit1 and unit2. For example in a 3-unit crash where units were assigned 1 and 2 to the MVs and 3 to the pedestrian, the pedestrian’s results don’t show up at all(!). Here is a sample query

    SELECT isBike,isPed,count(*)
    FROM (((( 2013_incidentHelper AS i 
     LEFT OUTER JOIN 2013_person AS p_unit1 ON i.P1 = p_unit1.PersonID)
     LEFT OUTER JOIN 2013_unit AS u_unit1   ON i.U1 = u_unit1.UnitID) 
     LEFT OUTER JOIN 2013_person AS p_unit2 ON i.P2 = p_unit2.PersonID)
     LEFT OUTER JOIN 2013_unit AS u_unit2   ON i.U2 = u_unit2.UnitID )
     LEFT OUTER JOIN LOVCounty ON i.CountyId = LOVCounty.id
     LEFT OUTER JOIN LOVCity ON i.CityId = LOVCity.id
     LEFT OUTER JOIN LOVNcic AS OffNcic ON i.OfficerNcic = OffNcic.id
     LEFT OUTER JOIN LOVNcic AS ExtNcic ON i.ExtendedNcic = ExtNcic.id
    GROUP BY 1,2;
  4. I need to check out http://leafletjs.com/
    This may be a better deal than the google maps fusion table.
    There was a very cool map of bicyclist crashes in Berkeley, CA
    http://www.dailycal.org/2015/02/19/berkeley-streets-prove-dangerous-cyclists-2000-collisions-since-2001/
    One of their reports was kind enough to tell me about leaflet…

    From: Sahil Chinoy
    Sorry it took me so long to get back to you. I pulled data from CHP’s SWITRS system in the Berkeley Police Department Jurisdiction (some 20,000 incidents) as a csv file. Because the cross streets aren’t always recorded in the same way (Shattuck Avenue vs. Shattuck Road) I pulled the .csv into Excel, sorted the streets alphabetically, and manually made sure the names were consistent (this took a few hours). I then sorted through the data with a python script, essentially making the cross streets of each incident an alphabetically-sorted tuple, then created a dictionary of incident counts indexing on that tuple. From there, I exported the most dangerous intersections to a .json file, which I used to create the map using leaflet.js.

  5. 2016 Bike-MV crash Data:
    1476 incidents. 1434 were single MV- Single bike, the other 42 were various other permutations. There will be 46 “extra” rows in “the big query” that were deleted manually before entering them into the fusion table.
    There were 25 incidents where there was no GIS data, this is roughly in line with other years as well as overall (2,612 incidents, about 2% had no gis) but ADOT really should not accept such reports. All 25 pedalcyclist incidents, and nearly all of the 2,612 overall incidents had “No Data” for OnRoad; as such they are pretty much completely devoid of location information.
    The queries below select 1) bicyclist incidents, 2)those incidents broken down by # of traffic units, 3) the number of rows in the “big query”, and 4) the number of bicyclist incidents without gis data:

    SELECT count(*) FROM 2016_incident i WHERE EXISTS 
    (SELECT 1 FROM 2016_unit u WHERE u.IncidentID=i.IncidentID AND 
    u.eUnitType IN ('PEDALCYCLIST')) ;
    
    SELECT TotalUnits, TotalNonMotorists,count(*) FROM 2016_incident i WHERE EXISTS 
    (SELECT 1 FROM 2016_unit u WHERE u.IncidentID=i.IncidentID AND 
    u.eUnitType IN ('PEDALCYCLIST')) GROUP BY 1,2 ;
    
    SELECT TotalUnits, TotalNonMotorists,count(*)
    FROM ((((
    2016_incident AS i JOIN 2016_person AS p_bike ON i.IncidentID = p_bike.IncidentID)
     JOIN 2016_unit AS u_bike ON p_bike.UnitID = u_bike.UnitID) 
     JOIN 2016_person AS p_car ON i.IncidentID = p_car.IncidentID)
     JOIN 2016_unit AS u_car ON p_car.UnitID = u_car.UnitID )
     LEFT OUTER JOIN LOVCounty ON i.CountyId = LOVCounty.id
     LEFT OUTER JOIN LOVCity ON i.CityId = LOVCity.id
     LEFT OUTER JOIN LOVNcic AS OffNcic ON i.OfficerNcic = OffNcic.id
     LEFT OUTER JOIN LOVNcic AS ExtNcic ON i.ExtendedNcic = ExtNcic.id
    WHERE p_bike.ePersonType = 'PEDALCYCLIST' AND p_car.ePersonType='DRIVER' GROUP BY 1,2;
    
    SELECT count(*) FROM 2016_incident i WHERE EXISTS 
    (SELECT 1 FROM 2016_unit u WHERE u.IncidentID=i.IncidentID 
     AND u.eUnitType IN ('PEDALCYCLIST') AND  Latitude=0 );
  6. Here are the 2018 queries for bicyclists (for peds just change PEDALCYCLIST to PEDESTRIAN, and sF_Bicycle to sF_Pedestrian). The gross count for 2018 is 1282 incidents, 3 will be excluded due to missing Latitude; the file mapBike2018.ods has the expected number of rows of data: 1279
    After doing that I re-included two of the missing location ones because they were FATAL and I wanted them to be mapped (2/2/2018 Phoenix and 9/4 El Mirage incidents; i guessed or created a location)

    SELECT count(*) FROM 2018_incident i WHERE sF_Bicycle AND sF_Pedestrian;
    SELECT TotalUnits, TotalNonMotorists,count(*) FROM 2018_incident i 
    WHERE sF_Bicycle AND sF_Pedestrian GROUP BY 1,2 ;
    
    SELECT count(*) FROM 2018_incident i WHERE sF_Bicycle;
    SELECT count(*) FROM 2018_incident i WHERE sF_Bicycle AND Latitude=0;
    
    SELECT count(*) FROM 2018_incident i WHERE EXISTS 
    (SELECT 1 FROM 2018_unit u WHERE u.IncidentID=i.IncidentID AND 
    u.eUnitType IN ('PEDALCYCLIST'));
    
    SELECT TotalUnits, TotalNonMotorists,count(*) FROM 2018_incident i 
    WHERE sF_Bicycle GROUP BY 1,2 ;
    
    SELECT TotalUnits, TotalNonMotorists,count(*) FROM 2018_incident i WHERE EXISTS 
    (SELECT 1 FROM 2018_unit u WHERE u.IncidentID=i.IncidentID AND 
    u.eUnitType IN ('PEDALCYCLIST')) GROUP BY 1,2 ;
    
    
    SELECT count(*)
    FROM ((((
    2018_incident AS i JOIN 2018_person AS p_bike ON i.IncidentID = p_bike.IncidentID)
     JOIN 2018_unit AS u_bike ON p_bike.UnitID = u_bike.UnitID) 
     JOIN 2018_person AS p_car ON i.IncidentID = p_car.IncidentID)
     JOIN 2018_unit AS u_car ON p_car.UnitID = u_car.UnitID )
     LEFT OUTER JOIN LOVCounty ON i.CountyId = LOVCounty.id
     LEFT OUTER JOIN LOVCity ON i.CityId = LOVCity.id
     LEFT OUTER JOIN LOVNcic AS OffNcic ON i.OfficerNcic = OffNcic.id
     LEFT OUTER JOIN LOVNcic AS ExtNcic ON i.ExtendedNcic = ExtNcic.id
    WHERE p_bike.ePersonType = 'PEDALCYCLIST' AND p_car.ePersonType='DRIVER';
    
    SELECT TotalUnits, TotalNonMotorists,count(*)
    FROM ((((
    2018_incident AS i JOIN 2018_person AS p_bike ON i.IncidentID = p_bike.IncidentID)
     JOIN 2018_unit AS u_bike ON p_bike.UnitID = u_bike.UnitID) 
     JOIN 2018_person AS p_car ON i.IncidentID = p_car.IncidentID)
     JOIN 2018_unit AS u_car ON p_car.UnitID = u_car.UnitID )
     LEFT OUTER JOIN LOVCounty ON i.CountyId = LOVCounty.id
     LEFT OUTER JOIN LOVCity ON i.CityId = LOVCity.id
     LEFT OUTER JOIN LOVNcic AS OffNcic ON i.OfficerNcic = OffNcic.id
     LEFT OUTER JOIN LOVNcic AS ExtNcic ON i.ExtendedNcic = ExtNcic.id
    WHERE p_bike.ePersonType = 'PEDALCYCLIST' AND p_car.ePersonType='DRIVER' GROUP BY 1,2;
    
    
    
    SELECT
    i.IncidentID, i.IncidentDateTime, i.IncidentYear,
    LOVCounty.name County,LOVCity.name City, 
    OffNcic.name Oncic,ExtNcic.name Encic,
    i.OfficerId, i.OfficerNcic, i.ExtendedNcic,
    i.eCollisionManner,
    i.eLightCondition,
    i.TotalUnits,
    i.TotalNonMotorists,
    i.eInjurySeverity,
    i.InjurySeverity,
    i.HitAndRunFlag,
    i.OnRoad, i.CrossingFeature,
    i.Latitude, i.Longitude,
    i.eTrafficWayType,
    i.eIntersectionType,
    i.eJunctionRelation,
    i.eWeather,
    p_bike.UnitNumber b_UnitNumber,
    p_bike.eNonMotoristLocation,
    p_bike.eSex b_Sex,
    p_bike.Age b_Age, 
    p_bike.eViolation1 b_Violation1,
    p_bike.eSafetyDevice b_SafetyDevice,
    p_bike.ePersonType b_PersonType,
    u_bike.eLane b_Lane, 
    u_bike.eTravelDirection b_Direction, 
    u_bike.eUnitAction b_Action,
    p_car.UnitNumber c_UnitNumber,
    p_car.eSex c_Sex,
    p_car.Age c_Age,
    p_car.eViolation1 c_Violation1,
    u_car.eLane c_Lane, 
    u_car.eTravelDirection c_Direction, 
    u_car.eUnitAction c_Action,
    u_car.PostedSpeed c_PostSpeed,
    u_car.EstimatedSpeed c_EstSpeed,
    u_car.eDistractedDriving c_DistDr,
    u_bike.eDistractedDriving b_DistDr 
    FROM ((((
    2018_incident AS i JOIN 2018_person AS p_bike ON i.IncidentID = p_bike.IncidentID)
     JOIN 2018_unit AS u_bike ON p_bike.UnitID = u_bike.UnitID) 
     JOIN 2018_person AS p_car ON i.IncidentID = p_car.IncidentID)
     JOIN 2018_unit AS u_car ON p_car.UnitID = u_car.UnitID )
     LEFT OUTER JOIN LOVCounty ON i.CountyId = LOVCounty.id
     LEFT OUTER JOIN LOVCity ON i.CityId = LOVCity.id
     LEFT OUTER JOIN LOVNcic AS OffNcic ON i.OfficerNcic = OffNcic.id
     LEFT OUTER JOIN LOVNcic AS ExtNcic ON i.ExtendedNcic = ExtNcic.id
    WHERE p_bike.ePersonType = 'PEDALCYCLIST' AND p_car.ePersonType='DRIVER' AND Latitude>0 ORDER BY i.IncidentID;

Leave a Reply

Your email address will not be published. Required fields are marked *