Safety Analytics



The objective of this course is to give students at both UG and PG levels a holistic view of safety analytics across an organization through advanced analytic and reporting technologies. Upon completion of this course, students will know (i) the types, sources and characteristics of safety data and their integration into an organization-wide, safety-centric data model, (ii) safety data visualization and exploration, (iii) safety performance evaluation and monitoring, (iv) safety predictive models, and (v) safety-related decision making. The methodologies, mathematics, techniques and algorithms needed for this course are drawn from statistics (both frequentist and Bayesian approaches), machine learning and data mining. The primary focus of this course is to learn from past data, predict the future and make data-driven decisions.

Class lectures: 4 hours per week

Instructor:

Teaching Assistants:




Prerequisites: Probability & statistics, basic knowledge of programming

Text & reference Books:

  • Primary textbook: Pattern Recognition and Machine Learning by Christopher M Bishop, Springer.
  • Recommended textbooks:
      • Machine Learning by Tom Mitchell, McGraw Hill.
      • Generalised Linear Models by McCullagh P and Nelder JA, Chapman & Hall.
  • Reference books:
      • Probabilistic Risk Assessment and Management for Engineers and Scientists by H Kumamoto and E J Henley, IEEE Press.
      • Productive Safety Management by Tania Mol, Butterworth Heinemann.
      • Predictive Analytics by Siegel E, Wiley India Pvt. Ltd.
      • Big Data at Work by Davenport TH, HBR Press.
      • Applied Predictive Analytics by D Abbott, Wiley.

Grading:

  • Final grades will be assessed based on three components: internal assessment (IA) consisting of home assignments, term projects and attendance (20%), mid-term examination (30%), and end-term examination (50%).
  • It is mandatory to submit home assignments (HA) and term projects (TP) by their due dates. Failure to do so will lead to non-evaluation of the mid-term or end-term answer scripts.
  • Late submission will reduce the IA marks. If the marks earned on an assignment (HA or TP) are x, the marks awarded will be x - d, where d is the delay in days. Zero marks will be awarded for d ≥ x. For example, an assignment earning 8 marks and submitted 3 days late is awarded 5 marks.

Home assignments & term project:

 

  • HW1: Basics of safety analytics
  • HW2: Data cleaning and exploration
  • HW3: Control charts
  • HW4: Regression - GLM
  • HW5: Classification I (LDA & SVM)
  • HW6: Classification II (Decision Tree & Ensemble Methods)
  • HW7: Cluster analysis
  • HW8: Text mining
  • HW9: Reinforcement learning
  • TP: Capstone project

 


    Module Topics (with tutorials) Duration (Total=54 hrs.)
    Module 1
    Introduction to safety analytics
    (i) Introduction & scope of the subject (1 hour)
    (ii) Concept review: basic probability & statistics (2 hours)
    (iii) Concept review: decision theory (2 hours)
    (iv) Concept review: basics of machine learning (2 hours)
    7 hours
    Module 2
    Safety data: Getting & cleaning data, data visualization & exploration
    (i) Types, sources and collection of safety data (1 hour)
    (ii) Pre-processing (normalization, scaling, binning, feature extraction) (3 hours)
    (iii) Dimensionality reduction (feature selection, PCA, SVD) (4 hours)
    8 hours
    Module 3
    Safety performance evaluation and monitoring
    (i) Key performance indicators and their measurements (1 hour)
    (ii) Control charts and safety capability analysis (2 hours)
    (iii) Multivariate charts (2 hours)
    5 hours
    Module 4
    Safety predictive models (predictive analytics): Supervised & unsupervised learning algorithms
    (i) Generalized linear models (log-linear, logistic regression and multinomial logit models) (4 hours)
    (ii) Linear discriminant analysis (3 hours)
    (iii) Support vector machines (3 hours)
    (iv) Association rule mining (1 hour)
    (v) Decision trees including CART (4 hours)
    (vi) Ensemble & boosting methods (4 hours)
    (vii) Cross-validation (2 hours)
    (viii) Cluster analysis (hierarchical and k-means) (3 hours)
    24 hours
    Module 5
    Text mining
    (i) Tokenization, lemmatization, part-of-speech tagging, parsing (1 hour)
    (ii) Information retrieval using nearest neighbourhood method and similarity measures (2 hours)
    (iii) Finding structure using k-means cluster analysis (1 hour)
    4 hours
    Module 6
    Safety related decision making: Prescriptive analytics
    (i) Statistical measures of safety programme effectiveness (1 hour)
    (ii) Decision trees revisited (1 hour)
    (iii) Reinforcement learning (4 hours)
    6 hours
    Module 7
    Capstone project
    All students must choose a term project, working in groups of 5, immediately after completing Module 2.
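To give a flavour of the tutorial work on control charts in Module 3, the following is a minimal Python sketch of a c-chart for incident counts. It assumes equal-sized reporting periods and Poisson-distributed counts, and uses the conventional 3-sigma limits. The function names `c_chart_limits` and `out_of_control` and the monthly counts are hypothetical, chosen only for illustration; they are not course material or real data.

```python
import math


def c_chart_limits(counts):
    """Return (LCL, CL, UCL) for a c-chart of event counts per period."""
    if not counts:
        raise ValueError("counts must not be empty")
    c_bar = sum(counts) / len(counts)    # centre line: mean count per period
    sigma = math.sqrt(c_bar)             # Poisson assumption: variance equals mean
    lcl = max(0.0, c_bar - 3 * sigma)    # a count cannot fall below zero
    ucl = c_bar + 3 * sigma
    return lcl, c_bar, ucl


def out_of_control(counts):
    """Return the indices of periods whose count lies outside the limits."""
    lcl, _, ucl = c_chart_limits(counts)
    return [i for i, c in enumerate(counts) if c < lcl or c > ucl]


# Hypothetical monthly incident counts (illustrative only, not real data).
monthly_incidents = [3, 5, 4, 6, 2, 5, 4, 3, 6, 4, 2, 16]

if __name__ == "__main__":
    lcl, cl, ucl = c_chart_limits(monthly_incidents)
    print(f"LCL={lcl:.2f}  CL={cl:.2f}  UCL={ucl:.2f}")
    print("Out-of-control periods:", out_of_control(monthly_incidents))
```

In this sketch the limits are estimated from the same data that are then checked. In practice one might instead estimate them from a separate baseline period.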

    Instructions for tutorials, assignments & term project:

    (i) Students must program the tutorials and assignments in either R (RStudio) or Python.
    (ii) The data & instructions for the tutorials and assignments will be provided beforehand. A tutorial class may also be converted into a home assignment.
    (iii) Students must submit the problem statement & data sets for the capstone project by the end of the second module. The project should apply the machine learning algorithms taught throughout the semester to a problem in the area of safety.
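As an example of what a Python tutorial solution might look like, here is a minimal sketch for the Module 5 topic of information retrieval using nearest-neighbour search with a similarity measure. It assumes a plain bag-of-words representation and cosine similarity, with no stop-word removal or lemmatization. The helper names (`tokenize`, `cosine_similarity`, `nearest_report`) and the incident narratives are hypothetical and serve only as an illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0                       # an empty document matches nothing
    return dot / (norm_a * norm_b)


def nearest_report(query, reports):
    """Return (index, similarity) of the report most similar to the query."""
    q = Counter(tokenize(query))
    scores = [cosine_similarity(q, Counter(tokenize(r))) for r in reports]
    best = max(range(len(reports)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical incident narratives (illustrative only, not real data).
reports = [
    "worker slipped on wet floor near the conveyor",
    "scaffold collapsed while working at height",
    "chemical spill in the storage area",
]

if __name__ == "__main__":
    idx, sim = nearest_report("fall from height while working on scaffold", reports)
    print(f"Most similar report: {reports[idx]!r} (similarity {sim:.2f})")
```

Raw term counts are used here to keep the sketch short. A weighting scheme such as TF-IDF would reduce the influence of common words like "on" and "the".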





    1. Development of automated hazard triangle for working at height.

    2. Data quality dimension and InfoQ.