FARS and PBcat

Commencing with the recently-released 2010 data FARS (The USDOT’s Fatality Analysis and Reporting System) will have far more specialized detail on Pedestrian and Bicyclists crashes.

“Motorist Failure to Yield — signed intersection” One of several dozen crash types defined by PBCAT

618 cyclists (person type 6 bicyclist, and 7 other pedalcyclist) were killed in 2010 in traffic collisions — and as noted at the link above, only collisions with motor vehicles in-transport are tracked by FARS. So for example, a bicyclist who lost control and died as a result of crashing into a tree would not be tracked here, nor would a bicyclist who strikes a parked motor vehicle.

The added information becomes a new “table” (in the parlance of databases), if you download the raw data files, it will be all found in the file PBtype.dbf

The information follows more-or-less exactly the PBcat, probably no coincidence. PBcat is the Pedestrian and Bicyclist Crash Analysis Tool, and contains among other things the bicyclists direction, and a detailed crash type (or for the complete reference, see PBcat’s manual, appendix C). [If these links go dead — Looks like it can all be gotten from the archive.org wayback machine; here is the reference manual; here are images of crash types] [Local archives: PBCAT_Manual.pdf and the (now very old, ver 2.1.1 windows xp-only) installation of PBCAT is available but not presently online. Along with the original/old PBCAT manuals, there is an updated coding manual 2014 FARS NASS GES Pedestrian BicyclistsManual .

So, say you were interested in bicyclists running stop signs; you would do a query and then a univariate split and take a look at Types 142, 144, and 147; all subtypes of “Bicyclist Failed to yield — Sign-Controlled Intersection”. There were 17+41+1 = 59 such fatalities recorded.

Since this is the first and only year where these national stats are available, they are of limited usefulness — that will change over time as the dataset grows and wil become a very useful comprehensive source of understanding bicyclist (and pedestrian) traffic fatalities. (update: see this comment below; all pbcat data is now gone, and they hope to have it back for 2015. sad.)

PBcat in the wild

It’s not clear to me why, but the state of North Carolina had PBcatted all their ped and bike crashes statewide for many years; leading to a very rich database of statistical information. E.g. 12,000 bike crashes covering the period 1997-2008!

In Arizona, ADOT as part of the BSAP has PBcatted 746 bike crashes in the “concentration” areas but it only covers the state-highway system. See e.g. Table 1 in working paper 3 for the full breakout of crash types.

Here is a presentation from the Pro Walk/Pro Bike 2006 conference by Steiner, Henderson, Libby Thomas that has some interesting details and background on PBCAT. I still don’t understand why pbcat won’t run on Windows 7(! requires XP!? I got it installed in an XP virtual machine). Here are a few places that used pbcat: . Orlando (no crash groups. They have some interesting breakdowns),  Wisconsin (they present crash types, didn’t use groups), Miami-Dade County (they seem to have rolled-their-own crash groups; see section 2.4 of the plan)

The MySQL Database

I got so frustrated trying to use NHTSHA’s front-end to query the FARS database, I set out to build my own mysql database; anyone wanting to query can do so by requesting the password and using the following details  Sorry, external/internet access no longer available:

hostname (internal only): 10.1.1.10
databasename: fars
username:     asdmuser
password:     (internal use only)
access via phpmyadmin.

The raw data is available via ftp from ftp://ftp.nhtsa.dot.gov , anonymous/anonymous — using an ftp client works far better than navigating it in a browser. The zip file for 2010 is: FARS2010.zip

The dbf formatted files were downloaded and unzipped; there are over a dozen .dbf file, each represents a “table” of data. The main tables are accident.dbf, person.dbf, and vehicle.dbf (and PBType.dbf of special interest).

Each dbf of interest was opened in OpenOffice Calc (this can take several minutes for the large tables!) — the first row was extracted and saved as text for later processing to become the CREATE TABLE command. Each field name has the form FIELD_NAME,[N | C],x,0. N becomes datatype int (or float in the case of latitude and longitude) and C becomes char(x). The header row is deleted, and the data is saved as text (csv) file [see the dbf2csv converter mentioned below for and alternative] — OpenOffice insists on surrounding text fields with either a single or double quote. These need to be subsequently eliminated with any text processor… this linux command seems to do the trick (the updated version of OO I now use has an option to not put in the quotes, so this is no longer needed. Yay), anyways if need be here is a sed command to do the trick (it’s tricky because of the escaping necessary):

sed -i 's/"//g' filename.txt

These text files are then LOADED into MySql tables, see e.g. 2010_PBType.sql

There are 16(!) tables in the 2010 data; several of them are new. Currently the following tables are in the MySQL database:

  • 2010_incident (from accident.dbf)
  • 2010_person (person.dbf)
  • 2010_vehicle (vehicle.dbf)
  • 2010_PBType (PBType.dbf)

The data is also available in SAS format. SAS is some sort of statistics package that I am unfamilar with; however there’s a very useful-looking file called format10.sas which has data definitions in it, and is easier than using the 811530.pdf, the FARS  Coding and Validation Manual. There’s a second manual, the FARS Analytical Reference Manual  811529.pdf 

Anyways, the correlation between the fields in the .dbf files can be looked up in the Analytical Ref Manual:  for example in the Manual p.79 there is a field called Special Jurisdiction, FARS C16 (i.e. the 16th element of the crash level data) that is in the accident dataset (which I’ve taken the initiative to properly name incident rather than accident) — with a “SAS name” of SP_JUR… which corresponds directly to the column heading name in accident.dbf. This field also appears to correspond to SPJUR10F in the format10.sas file, but I am unclear as to how someone is supposed to know that.

Another oddity — there is much strangely redundant stuff in  person.dbf like e.g. make and model of vehicle, which is also and obviously in the vehicle.dbf ?? The idea of duplicating data across tables is a database design no-no!

2009

The order of data in 2009 (and earlier, i would guess) is SIGNIFICANTLY different compared to 2010 and later. So I didn’t grind through it. Another reason is 2009 didn’t have PBCAT data, an likely never will.

2010

2010 was the oldest year for which I loaded up data. See above…

2011

The 2011 tables were loaded sometime around fall 2012. The structures were identical; or at least identical for the four tables of interest.

2012

The big (bad) news, there is not PBCAT info this year; so there is no 2012_PBTable, and furthermore it’s been “removed” from the published 2010 and 11 data; see this comment for details. Other than that, not much to report; several fields got slightly switched around; which is annoying. See the .sql files for specifics. For example, the field IMPACT2 was removed from both the Person and Vehicle tables.

Re-downloaded FARS2012.zip Feb 22, 2015 (the dates for most of the .dbf’s were 1/28/2014 or newer) in order to get “final” files. Reprocessed everything with dbf2csv script; and rebuilt the databases (added an empty PBType table so multi-year queries won’t fail).

An annoying note about 10000000000000000, 9999999999999999, and 2147483647

Ok, so I looked at a hexdump of the original nhtsa .dbf files — the value  9999999999999999 (so, sixteen ascii 9’s) often show up seemingly to mean “not coded” or “not available”. For whatever reason when I open the .dbf in OpenOffice it interprets this as 10000000000000000 (that’s 1 followed by 16 zeros); and subsequently saves the text file with the ascii representation of this. When I read it into mysql; I had (and still have) spec’ed the fields as int, and not bigint. The largest int value is 2147483647 — thus you will see that value a lot. This doesn’t seem to cause any trouble — but it does seem to violate the documentation set forth in the FARS Analytic Reference Guide. For example the field Curb Weight, VIN_WGT is supposed to be either 0 meaning not available, 9999 meaning Value not coded, or a number between 1 and 9998 meaning weight in pounds. Why is the value 9999999999999999 be in the data? Other fields are similar; that is to say the sixteen-nines is notwhere to be found in the documentation. In any event in my mysql tables anything above, in this instance, 9998 will turn out to be 2147483647 and that simply obviously means not coded. [update: the dbf2csv python script maintains the 9999999999999999 values]

2013

Surprisingly, the 2013 dataset I grabbed in January 2015 has PBCAT data, i.e. has a PBType.dbf file. The analytic reference manual for this year 812092.pdf says it should not be there, and was not expected to be there(?); “Appendix E: Pedestrian and Bicyclist Data Availability Change”. Weird. Anyway, i loaded it. This appears to be a mistake/leak, the .zip file as of 2/22/2015 has that file removed. The data appears to be reasonable.

Inserted the 2013 data into my private server’s database Jan 2015. There were quite a few fields removed from person and vehicle tables (the VIN stuff, got moved to a new table VINDecode), as compared to 2012, see the .sql files for details. Also see 20xx_postproc.sql for definitions of some synthetic fields (so-called because they are derived from other fields), including person type tags I elevated to the incident (“accident”) table so that you can select incidents involving, e.g. bicyclists, or ped, e.g. to check the split between urban/rural for bicyclist incidents you can do this without having to join another table:

select sUrbanRural,count(*) from 2013_incident where sF_Bicycle GROUP BY 1;

Not much changed in the data definitions: a new harmful event was added #73 ‘Object fell from Motor Vehicle In Transport’; a bunch of typo corrections, and oddly, updated descriptions that only apply to pbcat values, like BKPOS (bicyclist position), and as such aren’t even in the dataset this year, e.g. #01 was “Travel Lane” and is now “On a roadway, in a shared travel lane”.

Here’s a query that joins the incident, person (specifically, the bicyclist), and the vehicle tables. Note that it returns more than one result in the few instances where there is more than one fatal in an an incident, e.g. in 2013 there were 740 fatal bicyclist incidents and the query below returns 743 results (note: peds and cyclists are always VEH_NO=0. I have to go back and verify if there is always a number 1, I seem to remember a handful with only a number 2 or something).

SELECT p_bike.eLOCATION,sUrbanRural,count(1) FROM 
(( 2013_incident as i JOIN 2013_person AS p_bike ON (i.ST_CASE = p_bike.ST_CASE AND p_bike.eINJ_SEV LIKE ('Fatal%') AND p_bike.ePER_TYP IN ('Bicyclist', 'Other Cyclist') )) 
JOIN 2013_vehicle as vehicle ON (i.ST_CASE = vehicle.ST_CASE AND vehicle.VEH_NO=1 )) 
 GROUP BY 2,1;

So, there are 743 bicyclist fatalities; i.e. three of the 740 incidents involved two persons killed; and other 737 involved one each.  Slightly different results for total number killed can be gotten either from the incident table, or the person table:

select SUM(FATALS) from 2013_incident where sF_Bicycle; --> 744, i.e. includes one not-cyclist
SELECT count(*) FROM 2013_person WHERE eINJ_SEV LIKE ('Fatal%') AND ePER_TYP IN ('Bicyclist', 'Other Cyclist') ; --> 743

Created a VIEW called person (i.e. like 2013_person w/o the year prefix) that was just most fields from each year UNIONed together. See the readme in folder FARSALL. I should/will then create an incidents, and vehicles view to get multi-year data. [as things turn out, this was horrendously slow for anything interesting; so created tables without the year_ prefix, see the FARSALL folder]

I did a bunch of work with the alcohol fields in person table this year, and reflected back to 2010; eALC_DET, eALC_STATUS, eATST_TYP, sALC_RES, see use-of-alcohol-as-a-risk-factor-for-bicycling-injury

Used the python script dbf2csv to re-process FARS data files; see more about how to use it at this private post.  As of this writing, 2010-2012 have been re-done with finalized data, and 2013 is still preliminary, final is due out in approx December 2015.

File Versions

This link www-fars.nhtsa.dot.gov/Common/FileVersions.html gives the current version and release dates on a year-by-year basis. As of this writing, December 2015, the most recent year’s final files are available for 2012 and earlier. So I have final files for years 2010-2012, and preliminary for 2013. Based on the most recent file release, 2012 did not become final until July 24, 2015.

2014 and 2015

With the sharp increase in traffic fatalities in 2015 (35,092) — this is getting some extra attention, e.g. the Fastlane blog said on Aug 29, 2016: “2015 Traffic Fatalities Data Has Just Been Released: A Call to Action to Download and Analyze”. The big data news here is that PBCAT data has reappeared(!), and the 2014 and 2015 datasets released today both have a PBtable.

2014: replaced existing 2014 data with this new dataset, from the datestamps, it appears I had downloaded 2014 just a few days before the update was release. Oh well; the update went completely smoothly and followed the same file procedures: that is there is one zip file with national dbf’s in it. I used dbf2csv to convert, and all columns matched what was previously released in 2014 preliminaries. See “New in 2014” in the 2014 FARS Analytic Reference Manual for all the changes. There are several non-motorist tables that contain tiny tidbits of info each: nmprior, nmcrash, nmimpair — I never understood why this info isn’t in the person table; ie. would be undefined when the person is a motorist.

2015: Was a different story. There were several changes in the tables I import (the accident/incident, person, vehicle, and PBType tables). See “New in 2015” in the 2015 FARS Analytic Reference Manual. Some columns got deleted (person death CERT_NO), the functional classification in the accident/incident table got completely changed, with ROAD_FNC deleted, and simliar info into three new columns: RUR_URB (which is essentially the same as my synthetic sUrbanRural flag), FUNC_SYS, and RD_OWNER. And ALC_RES in person table has been re-worked from 2 to 3 digits, which if i’m lucky can be consistent for prior years by re-working the synthetic sALC_RES.

I discovered an error in this 2015 fatality (bicyclist direct incorrectly said “facing traffic”) in january 2017; I have an email thread thru some USDOT folks back to ADOT (Kerry Wilcoxson, formerly of CoP, now ADOT State Safety Engineer), the answer was it was a typo; but unfortunately they said the cutoff for corrections was the end of the year (2016).

Current status: 2014 and 2015 both rolled into my mysql database including PBCAT.

2016

I downloaded the 2016 FARS dataset on 1/9/2018, though it apparently has been available since 10/6/2017, see e.g. this USDOT press release, the top-line stats being deaths were worse, even when adjusted for an increase in VMT:

37,461 lives were lost on U.S. roads in 2016, an increase of 5.6 percent from calendar year 2015. The number of vehicle miles traveled on U.S. roads in 2016 increased by 2.2 percent, and resulted in a fatality rate of 1.18 deaths per 100 million VMT – a 2.6-percent increase from the previous year.

Review of the changed info indicates no new fields. Also, there is now a direct .csv version of the data; and release notes indicate they will be phasing out .dbf version of FARS data sometime in the near future — this is good news, as it’s one less step in the import process (no need for dbf2csv tool/converter).

2018 / 2019

Some year’s raw csv files have a tiny number of weird charachters in them, in them. For Example 2018 Vehicle has several occurances of xA0 (hex A0, which is a non-breaking space, nbsp).

Fixed by simply specifying character set as latin1 instead of default. Updated by putting that into the LOAD DATA commands in the sql files, e.g. 2018_vehicle.sql

other ideas are to use iconv command to eliminate errant charachters, e.g. either would work:

iconv -c --from-code ascii --to-code ascii in.csv --output out.csv
iconv --from-code latin1 --to-code ascii//TRANSLIT in.csv --output out.csv

.

Query to Validate Correct Counts

visit main page to correlate between datafile and FARS official data; tallies of total crashes and total deaths should match exactly:

SELECT ePER_TYP, eINJ_SEV, YEAR, count(*) FROM 2017_person WHERE eINJ_SEV LIKE 'Fatal%' GROUP BY 1,2,3;
SELECT SUM(FATALS), count(*) FROM 2017_incident;
SELECT sUrbanRural,count(1), YEAR FROM 2017_incident GROUP BY 1;

The last query is a sanity check; (see more queries here ) and look at the number of Unknowns for Urban/Rural — in preliminary (“annual”) datafiles, there will be a relatively large number of unknowns; many hundreds. In the final file the number is typically less than 100.

 

23 thoughts on “FARS and PBcat”

  1. Mainly out of frustration with the web interface to the FARS database…

    The entire raw FARS data is distributed and freely downloadable as a zip file containing a bunch of files, all in .dbf format — each representing a table. nhtsa ftp site or My Local copy.
    These can be read in to a spreadsheet type program (I used OpenOffice) and outputted as (e.g. comma separated) text files.

    I put the 2010 FARS PBType table into a mysql database.
    So this would be nationwide ped and bicyclists MV fatals.
    This table has a row for each ped or bicyclist involved in a fatal collision (the large majority, but not every row is associated with one fatality).
    There are a bunch of other tables, i also loaded up the (entire, not just bike/ped) incident table (which they annoyingly still call accident.dbf, but i will have none of that!); there’s a field called ST_CASE which acts as a unique case identifier across all tables.

    I would guess the most interesting feature would be the breakdown of BikeCrashType/Grp;
    I made a spreadsheet 🙂 !!
    Listing the attribute definitions: http://azbikelaw.org/crashReports/asdm/FARS.xls

    The database is named farsaz, and the password scheme is analogous to asdmadmin setup:
    mysql -h farsaz.db.4318490.hostedresource.com -u farsaz -ppassword -D farsaz
    mysql -h farsaz.db.4318490.hostedresource.com -u farsazuser -puserpassword -D farsaz


    After i did the above, it came to my attention that in addition to the .dbf format data files there are ALSO something called SAS (Statistical Analysis Software?) format files. In particular, this file format10.sas in file 2010\SAS\FSAS2010.zip contains a plain text definitions file that looks like it might come in handy (i had been cutting/pasting 811530FARS2010CodingandValidationManual.pdf which was definitely a p.i.t.a ). The table files end with a .sas7bdat extension; here is a reader for that file format along with some explanatory notes on the files structure.

  2. Lighting (LGT_COND); 2010 Data

    Daylight = 332/617 = 54%
    Darkness = (121+132+1)/617 = 41%
    The remaining 4% was dusk or dawn

    Note in the table below, the terminology refers to the condition of the roadway, so e.g. “Dark – Not Lighted” is nothing to do with whether or not the cyclist was lighted.

    SELECT count(*) FROM 2010_incident i WHERE EXISTS (SELECT 1 FROM 2010_person p WHERE p.ST_CASE=i.ST_CASE AND (p.PER_TYP=6 OR p.PER_TYP=7) AND p.INJ_SEV=4 ) ;
    
    SELECT LGT_COND,count(1) FROM 2010_incident i WHERE EXISTS (SELECT 1 FROM 2010_person p WHERE p.ST_CASE=i.ST_CASE AND (p.PER_TYP=6 OR p.PER_TYP=7) AND p.INJ_SEV=4 ) GROUP BY 1 ORDER BY 1 ;
    Light Condition arizona Only US
    Daylight 9 332
    Dark – Not Lighted 2 121
    Dark – Lighted 6 132
    Dawn 0 9
    Dusk 2 21
    Dark – Unknown Lighting 0 1
    Other 0 0
    Not Reported 0 0
    Unknown 0 1
    TOTAL 19 617

    The table was extracted using the FARS web front-end, while the SELECT statements above are from my own mysql database; the results do match — and represent an *incident* count (slightly different than a fatality count) where a cyclist was killed (and not just a count of all cyclists in the FARS person database). To get a particular state add on an AND STATE=4 (for for example, arizona) to the query.

  3. New Table: farsxref

    Note to self: should also put peds into this table (since the PBType covers both cyclists and peds…

    I have all the FARS plus ASDM data tables loaded up on my own local server in databases named asdm and fars. A new table will be added to asdm.farsxref just three fields: IncidentID, Year, and FarsCase.
    The year is slightly redundant; it is possible to lookup by IncidentID number in all the asdm.YYYY_incident tables until it is found but that seems like a waste.
    CREATE TABLE asdm.farsxref ( IncidentID int PRIMARY KEY, Year int, FarsCase int) engine=MyISAM ;

    So here’s a query to select the data we want from the ASDM tables:

    SELECT IncidentID, IncidentYear FROM asdm.2013_incident i WHERE EXISTS  
    (SELECT 1 FROM asdm.2013_unit u WHERE u.IncidentID=i.IncidentID AND u.eUnitType IN ('PEDALCYCLIST') AND i.InjurySeverity=5 ) 
    ORDER BY IncidentDateTime ;
     -- OR THIS OTHER QUERY SHOULD PRODUCE SAME RESULTS --
    SELECT IncidentID, IncidentYear FROM asdm.2013_incident i WHERE sF_Bicycle AND i.InjurySeverity=5
    ORDER BY IncidentDateTime ;
    

    And here’s what that would look like plugged into an INSERT command (note, though that after this the FarsCase would be NULL:

    INSERT INTO farsxref ( IncidentID, Year ) SELECT IncidentID, IncidentYear FROM 2012_incident i WHERE EXISTS (SELECT 1 FROM 2012_unit u WHERE u.IncidentID=i.IncidentID AND u.eUnitType IN ('PEDALCYCLIST') AND i.InjurySeverity=5 ) ;

    Re-run this command for all the years. Note since IncidentID is a primary key, if you re-run it for the same year, it will get an error (and that’s probably good since we want exactly one row for each incident).

    So I tried to figure out a way to join them together, e.g. i THINK something like this should work (but doesn’t!):

    SELECT az_i.IncidentDateTime FROM asdm.2011_incident AS az_i JOIN fars.2011_incident AS fars_i WHERE
    az_i.IncidentDayOfWeek = fars_i.DAY_WEEK AND fars_i.STATE = 4 AND az_i.InjurySeverity=5;


    But I gave up… I didn’t know how to rectify the different date/time formats, e.g. in asdm there an entire datetime, and in fars there are various subsets of that, e.g. day, month, hour, minute and so forth.

    So i chickened out and used these two queries, and produced a spreadsheet I eyeballed for correctness and output as the following csv, from whence they can be LOADed asdm.farsxref this will be incorporated into build-asdm.sql :


    SELECT IncidentID, IncidentYear, IncidentDateTime, sF_Bicycle, sF_Pedestrian FROM asdm.2016_incident i
    WHERE (sF_Bicycle OR sF_Pedestrian) AND i.InjurySeverity=5 ORDER BY IncidentDateTime ;
    SELECT i.ST_CASE, i.YEAR, i.MONTH, i.DAY, i.HOUR, i.MINUTE, sF_Bicycle, sF_Pedestrian FROM fars.2016_incident i
    WHERE (sF_Bicycle OR sF_Pedestrian) AND i.STATE=4 ORDER BY i.MONTH, i.DAY, i.HOUR, i.MINUTE;

    Note this will produce a few false-positives — that is fatal collisions where a bike or ped is only incidentally involved (not killed); these will need to be manually deleted from the .csv during eye-balling.

    An older, now abandoned, scheme involved turning them into INSERT statements:
    “IncidentID”,”IncidentYear”,”ST_CASE”


    anomolies — 2015 was a nightmare:
    http://10.1.1.10/asdmphp/queryAsdmIncident.php?incident=2959511 is coded as a moped in asdm but in fars, it’s a “personal conveyance”
    http://10.1.1.10/asdmphp/queryFarsIncident.php?year=2015&st_case=40108

    http://10.1.1.10/asdmphp/queryFarsIncident.php?year=2015&st_case=40840 says pedestrian acoma/pima but cannot find in asdm at all?!
    AND this one!!
    http://10.1.1.10/asdmphp/queryFarsIncident.php?year=2015&st_case=40841 says ped on “BIA 12” cannot find in asdm at all. one with that date/time was on I40 but that one involved two drivers.
    AND this one!!
    http://10.1.1.10/asdmphp/queryFarsIncident.php?year=2015&st_case=40801 says ped on US-191 cannot find in asdm. There is one at another time that day on us191 but that’s written as a single-vehicle where passenger killed.

    Oh, this is weird, for 2012 adot incident 2624035 is missing from Fars!? According to asdm, this is a ped fatality occuring on 5/30/2012.

  4. UPDATE: as of at least Aug 29, 2016 PBCAT is back… “Starting from 2014, Pedestrian and Bicyclist file is included in the file release”

    Well this stinks: on 07/03/2013 NHTSA quietly removed pbcat data entirely, citing unspecified “inconsistencies…”. ug. Here is the text of the notification:

    PBTYPE In 2010, NHTSA added new pre-crash data elements for pedestrians and bicyclists (non-motorist) to the data collected in the Fatality Analysis Reporting System (FARS) and the National Automotive Sampling System (NASS) General Estimates System (GES). These data are intended for countermeasure research and development. However, NHTSA’s National Center for Statistics and Analysis (NCSA) quality control team has identified inconsistencies between the new non-motorist data elements and comparable pre-existing FARS elements that can lead to different results. Consequently, NHTSA has removed the pbtype table from the 2010 and 2011 FARS and NASS GES files while research is conducted on how improvements can be made. This pbtype table includes some non-motorist data (see below). The following data elements have been removed: (all of them, i think)

    Here is an email that explains some more detail; Ug, note timeframe for getting data back isn’t for 2 YEARS (the latter part of 2015).

    ———- Forwarded message ———-
    Date: Wed, Oct 23, 2013 at 5:14 PM
    Subject: RE: FW: NASS redesign and bicyclist injuries

    Sorry for the delayed response, I hope this is clarifying.
    We are working to correct the new Ped/Bike typing application that we introduced in 2010 and used on a trial basis since then (all new data elements).
    Our goal is to have revisions to the typing application in place in time to have confidence to release data it generates during the 2014 data collection year. However, the following Pedestrian Bicycle Elements continue to be available for public access:

    NON-MOTORIST LOCATION AT TIME OF CRASH
    NON-MOTORIST ACTION/CIRCUMSTANCES PRIOR TO CRASH
    NON-MOTORIST ACTION/CIRCUMSTANCES AT TIME OF CRASH
    NON-MOTORIST SAFETY EQUIPMENT
    CONDITION (IMPAIRMENT) AT TIME OF CRASH
    RELATED FACTORS – PERSON (NOT A MOTOR VEHICLE OCCUPANT) LEVEL

    We do not have plans to correct data already generated or already being generated by the typing application (2010-2013). These will serve as internal test data for us.

    We estimate that the 2014 data generated from the typing application (new data elements) should be available the latter part of 2015.

    In reviewing the 2010 -2013 data generated by the typing application, some of the anomalies we noted included the following:

    •We observed apparent inconsistencies, for example, between data coded in the elements above and data generated by the typing application.
    •We observed apparent unsatisfactory inter-coder reliability in the data generated by the typing application.
    •Assumptions and definitions used within the application differ from those understood for the rest of our traditional data elements.

  5. sample query with incident joined to person and PBType for cyclists… Note that this “overcounts” in the sense that it yields 676 rows, wheres there were only 640 cyclists in the entire database. There is a solution to this in another comment…

    SELECT
    i.sLighting,pbtype.eBIKECTYPE,count(*)
    FROM ((
    2010_incident as i JOIN 2010_person AS p_bike ON i.ST_CASE = p_bike.ST_CASE)
      JOIN 2010_PBType AS pbtype ON i.ST_CASE = pbtype.ST_CASE)
    WHERE p_bike.ePER_TYP IN ('Bicyclist', 'Other Cyclist')
    GROUP BY 1,2;

    model type breakdown (auto, LTV light truck van, heavy truck, etc):
    SELECT sModel,count(*) FROM 2011_vehicle GROUP BY 1;

    mysql -u root -ppassword -D fars

  6. To avoid “overcounting”, add conditions to the join’s ‘on’ clause; for example the following 3 queries all yield 618; one-and-only-one row for each fatality. In the first example, the incident is joined with the (bicyclist)person. Next the vehicle (note there is always a vehicle#1, note that sometimes there are #2, etc but those are ignored. Also of note: cyclists are assigned vehicle#0). And thirdly, the PBType table is joined…

    SELECT count(1)
    FROM (
    2010_incident as i JOIN 2010_person AS p_bike ON (i.ST_CASE = p_bike.ST_CASE AND p_bike.eINJ_SEV LIKE ('Fatal%') AND  p_bike.ePER_TYP IN ('Bicyclist', 'Other Cyclist')  ));
    
    SELECT count(1)
    FROM ((
    2010_incident as i JOIN 2010_person AS p_bike ON (i.ST_CASE = p_bike.ST_CASE AND p_bike.eINJ_SEV LIKE ('Fatal%') AND  p_bike.ePER_TYP IN ('Bicyclist', 'Other Cyclist')  ))
    JOIN 2010_vehicle as vehicle ON (i.ST_CASE = vehicle.ST_CASE AND vehicle.VEH_NO=1 ));
    
    SELECT count(1)
    FROM (((
    2010_incident as i JOIN 2010_person AS p_bike ON (i.ST_CASE = p_bike.ST_CASE AND p_bike.eINJ_SEV LIKE ('Fatal%') AND  p_bike.ePER_TYP IN ('Bicyclist', 'Other Cyclist')  ))
      JOIN 2010_vehicle as vehicle ON (i.ST_CASE = vehicle.ST_CASE AND vehicle.VEH_NO=1 ))
      JOIN 2010_PBType AS pbtype ON (i.ST_CASE = pbtype.ST_CASE AND p_bike.PER_NO=pbtype.PER_NO));

    This next one, to self-join the person table with the driver’s info; doesn’t work as i would expect it, it yields 640 rows but I don’t understand why (640 is, possibly coincidentally, the total number of cyclists in the database including those not killed)… I also noted the count(distinct(i.ST_CASE)) was 615 so I don’t understand that at all, should be 617…

    SELECT count(1)
    FROM ((((
    2010_incident as i JOIN 2010_person AS p_bike ON (i.ST_CASE = p_bike.ST_CASE AND p_bike.eINJ_SEV LIKE ('Fatal%') AND  p_bike.ePER_TYP IN ('Bicyclist', 'Other Cyclist')  ))
      JOIN 2010_vehicle as vehicle ON (i.ST_CASE = vehicle.ST_CASE AND vehicle.VEH_NO=1 ))
      JOIN 2010_PBType AS pbtype ON (i.ST_CASE = pbtype.ST_CASE AND p_bike.PER_NO=pbtype.PER_NO))
      JOIN 2010_person AS p_driver ON (i.ST_CASE = p_driver.ST_CASE AND vehicle.VEH_NO = 1 AND p_driver.PER_TYP=1 ));
    

    Another issue is counting crashes versus counting fatalities; there were 618 cyclist fatalities, but only 617 incidents — in other words, there was one double-fatality and all the others were single fatalities. Check this by selecting count(distinct(i.ST_CASE)) instead of count(1).

  7. Fatalities of Pedestrians, Bicycle Riders, and Motorists Due to Distracted Driving Motor Vehicle Crashes in the U.S., 2005–2010, Jim P. Stimpson, et al

    Used FARS data to count “distracted driving” deaths by looking at driver factors. John Allen has a blog post raising some questions… Also, unrelated note: they use something called the USDA continuum codes to classify between rural and urban (“metro”). Not sure how this might vary from the more obvious (to me) ROAD_FNC, the Roadway Function Class. As the author noted to me in a private email, and I would completely agree; the categorization is quite subjective (that was noted, e.g. many years ago in the Cross study where each crash was completely re-categorized on a case-by-case basis).

  8. Queries used to gather lists of info to link FARS to ASDM (arizona specific) databases. These queries use the EXISTS scheme to produce lists of incidents; exactly one row per fatal incident regardless of number of fatalities (i.e. they count fatal collisions, not the number of people killed; for cyclists these numbers are almost always equal)…

    Get list of FARS bicyclists killed in AZ sorted chronologically; joins incident and person tables
    SELECT i.ST_CASE,i.MONTH, i.DAY, i.HOUR, i.MINUTE FROM fars.2012_incident i WHERE EXISTS (SELECT 1 FROM 2012_person p WHERE p.ST_CASE=i.ST_CASE AND p.PER_TYP IN ('6', '7') AND i.STATE=4) ORDER BY i.MONTH, i.DAY, i.HOUR, i.MINUTE;

    Get corresponding list of ASDM incidents (1st example uses unit table):
    SELECT IncidentID, IncidentYear, IncidentDateTime FROM 2012_incident i WHERE EXISTS (SELECT 1 FROM 2012_unit u WHERE u.IncidentID=i.IncidentID AND u.eUnitType IN ('PEDALCYCLIST') AND i.InjurySeverity=5 ) ORDER BY IncidentDateTime ;
    Or, a 2nd example yields same result using person table
    SELECT IncidentID, IncidentYear, IncidentDateTime FROM 2012_incident i WHERE EXISTS (SELECT 1 FROM 2012_person p WHERE p.IncidentID=i.IncidentID AND p.ePersonType IN ('PEDALCYCLIST') AND p.InjuryStatus=5 ) ORDER BY IncidentDateTime ;

    To capture pedestrian results, in Fars specify PER_TYP of both 5 and 8 (personal conveyances, typically a powered personal assistive device, arizona codes these as pedestrians).
    Fars: HIT_RUN (is only in the vehicle table)
    asdm: HitAndRunFlag

    This info is normally spread-sheeted in crashStats/FARS.xls ; and also gets put into the official AZ cyclist fatality grid kept public on Google Docs.

  9. Good explanation of why the intersection data element in crash data isn’t useful, from Serge Issakov on bicycledriving group:
    There are two widely used definitions for “intersection”. I call them the narrow and broad interpretations. Neither is right or wrong – they’re just different. You do have to understand which is being used in whatever context you’re reading, because they’re meanings are very different.

    The narrow interpretation strictly limits “intersection” to refer to the space that is common to any two intersecting roadways.
    The broad interpretation of “intersection” refers to the vicinity of any place where vehicular travel from different directions can coincide.

    All of the following are intersections per the broad interpretation, but not the narrow interpretation:
    Vicinity of a driveway and a street.
    Vicinity of an alley and a street.
    The area immediately before a cross street.
    The FARS data is based on the narrow definition of an intersection. It ultimately depends on whether the officer at the scene checks the “intersection” box on the form, which the are trained to do only if the collision occurs within the intersecting space of two roadways.

    So if a cyclist riding near a curb is right hooked by a right turning vehicle before the end of the block has been reached, it’s counted as a non-intersection crash. If a cyclist is left-crossed by an oncoming motorist turning left into a mid-block driveway, it’s counted as a non-intersection crash.…more

  10. Raw counts for sanity check purposes (counts people killed and not incidents):

      SELECT PER_TYP,ePER_TYP,count(1) FROM 2012_person WHERE INJ_SEV=4 GROUP BY 1 ORDER BY 1  ;
    +---------+---------------------------+----------+
    | PER_TYP | ePER_TYP                  | count(1) |
    +---------+---------------------------+----------+
    |       1 | Driver of a Motor Vehicle |    21394 |
    |       2 | Passenger of a Motor Vehic|     6389 |
    |       3 | Occupant of a Motor Vehicl|       45 |
    |       4 | Occupant of a Non-Motor Ve|       17 |
    |       5 | Pedestrian                |     4743 |
    |       6 | Bicyclist                 |      722 |
    |       7 | Other Cyclist             |        4 |
    |       8 | Pers Convey               |      149 |
    |       9 | Unknown Occupant Type in a|       86 |
    |      10 | Persons In/On Buildings   |       11 |
    |      19 | Unknown                   |        1 |
    +---------+---------------------------+----------+
    can just count up pedalcyclists and peds, respectively with:
    SELECT count(*) FROM 2012_person p WHERE PER_TYP IN ('6', '7') AND INJ_SEV=4  ; ==>> 726
    SELECT count(*) FROM 2012_person p WHERE PER_TYP IN ('5', '8') AND INJ_SEV=4  ; ==>> 4892

    But counting incidents instead will be slightly smaller numbers, since a few of these crashes involve more than one fatality…:

    SELECT count(*) FROM 2012_incident i WHERE EXISTS (SELECT 1 FROM 2012_person p WHERE p.ST_CASE=i.ST_CASE AND p.PER_TYP IN ('6', '7') AND INJ_SEV=4); ==>> 722; and 4804 peds.
    

    So I created a series of flags that split by PER_TYP and reflected them into the incident table:
    sF_Pedestrian, sF_Bicycle, sF_Motorist, and sF_OtherUnk (see 20xx_postproc.sql for definitions. As its name suggests, postproc is run after all the tables are generated; some of these flags rely on other tables).
    There are some others, sF_Motorcyclist is a subset of Motorist. And sModel is reflected up from the vehicle table into the the incident table for convenience (mine!). Here’s a possibly interesting query that reveals counts of fatality types:

    SELECT sF_OtherUnk,sF_Motorist,sF_Bicycle,sF_Pedestrian,sF_Motorcycle,count(1) from 2012_incident group by 1,2,3,4,5;

    Unfortunately, this still doesn’t solve my problem, e.g. when i join the incident and vehicle tables, i *still* get more than one result per incident — in the query below, there are 760 results; one for each motor vehicle involved, rather than one result for each of 724 incidents.

    SELECT i.VE_FORMS,count(1) FROM ( 2012_incident AS i
     JOIN 2012_vehicle as v ON i.ST_CASE =v.ST_CASE) WHERE sF_Bicycle=1 GROUP BY 1;

    Here is a sample modeled after the query used to create the AZ crash map… unfortunately i never figured out how to eliminate the duplicate cases; that’s why the query below yields, 726 results; one for each cyclist fatality rather than 722 results. Note that the “AND i.sF_Bicycle” is redundant:

    SELECT i.ST_CASE, i.sUrbanRural, p_bikeped.eLocation, p_bikeped.ePER_TYP FROM ( 2012_incident AS i  JOIN 2012_person as p_bikeped ON i.ST_CASE =p_bikeped.ST_CASE) WHERE p_bikeped.PER_TYP IN ('6', '7') AND p_bikeped.INJ_SEV = 4 AND i.sF_Bicycle;
    

    Here are the ped/cyclist by Vehicle Model type queries:

    SELECT  i.sModel,count(1)  FROM 2012_incident AS i WHERE  i.sF_Bicycle GROUP BY 1;
    +---------------------+----------+
    | sModel              | count(1) |
    +---------------------+----------+
    | Automobiles         |      279 |
    | Light Trucks        |      323 |
    | Motorcycles         |        6 |
    | Med and Heavy Truck |       61 |
    | Bus                 |       12 |
    | Unknown             |       41 |
    +---------------------+----------+
    SELECT  i.sModel,count(1)  FROM 2012_incident AS i WHERE  i.sF_Pedestrian GROUP BY 1;
    +---------------------+----------+
    | sModel              | count(1) |
    +---------------------+----------+
    | NULL                |       13 |
    | Automobiles         |     2108 |
    | Light Trucks        |     1936 |
    | Motorcycles         |       28 |
    | ATV                 |        1 |
    | Med and Heavy Truck |      284 |
    | Motor Home          |        2 |
    | Bus                 |       64 |
    | Not Reported        |        2 |
    | Other(Vehicle)      |        3 |
    | Unknown             |      372 |
    +---------------------+----------+
  11. http://bikeleague.org/content/new-report-every-bicyclist-counts#comment-2946
    full EveryBicyclistCounts .pdf report.
    There are some irregularities; they seem to want to overstate the fraction of strike-from-behind fatalities, claiming they found much higher (40%) vs. FARS 2011 (27%).
    There are a couple of problems, 1) their 40% is really more like 34% (194/567) when computed in the normal way, and 2) their data is a subset and it’s not known what the characteristics of the “missing” crashes are; they gathered their list from, e.g. news stories. This means the characteristics of their sample is potentially different than the true full set in unknowable ways. “We captured 628 fatalities overall and 552 in 2012 alone. In 2012, FARS reported 726 bicyclist deaths”. P.s. in FARS 2010, the figure for rear-end was 25% (154/681). I just so happen to have that factoid in fars.xls

    I left this comment on their blog site:
    It’s customary to total tabular data, i.e. “n=”; it’s also customary when representing percentages that they total to 100% (give or take rounding error) I’m confused about the numbers in Table 1 — i totaled up the ‘#’ column and got 567. (which would make the normal way to report percentages 194/567 = 34%). Did i total it wrong? what is 567 supposed to represent?

  12. This is the query i used to pull splits for rear-ends; other splits were eLGT_COND, sLighting, sUrbanRural, eROAD_FNC
    SELECT eBIKECGP, count(1)
    FROM (((
    2011_incident as i JOIN 2011_person AS p_bike ON (i.ST_CASE = p_bike.ST_CASE AND p_bike.eINJ_SEV LIKE ('Fatal%') AND p_bike.ePER_TYP IN ('Bicyclist', 'Other Cyclist') ))
    JOIN 2011_vehicle as vehicle ON (i.ST_CASE = vehicle.ST_CASE AND vehicle.VEH_NO=1 ))
    JOIN 2011_PBType AS pbtype ON (i.ST_CASE = pbtype.ST_CASE AND p_bike.PER_NO=pbtype.PER_NO))
    GROUP BY 1;

    This all had to do with the LAB paper every bicyclist counts and how it was their number for rear-end was 40% (no one seems to know; lab won’t release data?!)

    You can use that query above to find that there were 154+164=318 (2010+2011) Motorist Overtaking crash group of a total of 618+677=1295. Equals 25%
    (for the curious, that includes 2+2 “other” pedalcyclists, typically trikes)

    Note/reminder on that query: the bit where it joins to the vehicle table isn’t necessarily consistent; it turns out that for 2010 and 11 bicyclists it works fine; but there were some odd cases where there was no veh #1 in pedestrian cases. In other words, there usually, but not always a vehicle #1. I’m not sure why there isn’t always a #1. E.g. in AZ, there is ALWAYS a veh #1. In this slight re-spin of that query, to get driver’s gender needed to also join the person table again, decided arbitrarily to force it to veh #1 so that the quantites would remain the same; i.e. ignore the gender of other drivers when more than one veh is involved:

    SELECT eBIKECGP, p_car.SEX, count(1)
    FROM ((((
    2011_incident as i
    JOIN 2011_person AS p_bike ON (i.ST_CASE = p_bike.ST_CASE AND p_bike.eINJ_SEV LIKE ('Fatal%') AND p_bike.ePER_TYP IN ('Bicyclist', 'Other Cyclist') ))
    JOIN 2011_vehicle as vehicle ON (i.ST_CASE = vehicle.ST_CASE AND vehicle.VEH_NO=1 ))
    JOIN 2011_PBType AS pbtype ON (i.ST_CASE = pbtype.ST_CASE AND p_bike.PER_NO=pbtype.PER_NO))
    JOIN 2011_person AS p_car ON (i.ST_CASE = p_car.ST_CASE AND p_car.ePER_TYP LIKE 'Driver%' AND p_car.VEH_NO=1 ))
    GROUP BY 1,2;

    Factoids: males Drivers are ~ 2.5X more often involved with a bicyclist fatality compared to females. Also if you split by sHit_Run, of the drivers *identified*, male drivers are 3-4X more frequent than females. (higher incidence of risky behaviors like impaired driving?)

  13. pre-cursor to PBCAT? Pedestrian and Bicyclist Crash Types of the Early 1990’s “A total of 45 distinct bicycle-motor vehicle crash types are identified in the NHTSA Manual Accident Typing (MAT) for Bicyclist Accidents Coder’s Handbook. Each type is characterized by a specific sequence of causal events or bicyclist/driver actions preceding the crash occurrence….” This paper is an update to cross/fisher and other than lumping turns (e.g. right hook) with overtaking into what they called “parallel paths” seems like good stuff.

  14. Steve Magas asked, with reference to IIHS page
    “It is estimated that 21 percent of bicyclists age 16 and older who were killed in 2013 had blood alcohol concentrations (BACs) at or above 0.08 percent. Imputations for missing BACs were provided by the U.S. Department of Transportation’s multiple imputation model beginning in 1982”

    content that was here has been moved to use-of-alcohol-as-a-risk-factor-for-bicycling-injury

  15. There is a handy document
    http://www-nrd.nhtsa.dot.gov/Pubs/F10CH.pdf
    “Below is a list of FARS elements that have substantial changes for 2010… If you have any questions concerning this Coding Manual, please contact NISR, Inc. at: 717-751-2823; or by e-mail at: john@nisrinc.com NEED HELP IN CODING? CONTACT NISR AT: codingquestions@nisrinc.com
    Anyways, i never heard of the National Institute for Safety Research nisrinc.com which appears to be a private contractor involved with coding and training for coding and so forth.

  16. From AAA Agressive Driving Research Update 2009:
    The following driver-related contributing factors in FARS were taken as indications that crashes may have involved aggressive driving.
    • Following improperly
    • Improper or erratic lane changing
    • Illegal driving on road shoulder, in ditch, or on sidewalk or median
    • Passing where prohibited by …
    • Operating the vehicle in an erratic, reckless, careless, or negligent manner or suddenly
    changing speeds
    • Failure to yield right of way
    • Failure to obey traffic signs, traffic control devices,…
    • Failure to observe warnings or instructions on vehicle displaying them
    • Failure to signal
    • Driving too fast for conditions or in excess of posted speed limit
    • Racing
    • Making an improper turn

    The factors summarized in this list represent FARS driver-related contributing factors 26, 27, 29, 33, 34, 35, 36, 38, 39, 41, 42, 44, 46, 47, and 48.

    Along with the contributing factors analyzed here, FARS also includes a code for Road Rage /
    Aggressive Driving, which was added in 2004. T

Leave a Reply

Your email address will not be published. Required fields are marked *