Introduction to Data Science

1. Understand basic concepts in data science and machine learning.

2. Be able to apply these methods, using programming, to scientific problems.

This course is at a basic level. It assumes some knowledge of statistics and programming. We will generally keep the theory and math to a minimum.

Presentation

Visualization practice files

EDA practice files

What is a time series?

A complex dataset changing through time: daily cycle of gene expression

How do changing environmental conditions affect a bacterial community structure?

Identifying breakpoints in a time-series

Presentation

TSA practice files

Time-Series-tutorial

Welcome to the first NVIDIA-endorsed DLI course, “Fundamentals of Deep Learning”, hosted by the Data Science Research Center (DSRC) of Haifa University.

This course is hosted by NVIDIA, and uses a specialized platform for the lectures and hands-on exercises. To prepare for the workshop, please do the following:

  1. Create an NVIDIA account at (http://courses.nvidia.com/join)
  2. Test your laptop (http://websocketstest.courses.nvidia.com/)
  3. Review the course datasheet to become familiar with Prerequisites, Learning Objectives, and Agenda Outline. These can be found here: Fundamentals of Deep Learning

* These instructions can also be found at https://developer.nvidia.com/dli/getready.

Note:

  1. You will get an access code to enter the course lectures and exercises.
     To log in to the system, go to https://courses.nvidia.com/dli-event and enter the access code.

 

In each 1:1 meeting, 5 of the 55 terms will be chosen at random, and you will need to explain their meaning in the context of what we studied in class. For example, entropy is a general term, but we studied it specifically in the context of decision trees; neural networks exist in brains, but the term means something else in the context of machine learning.

EDA

mean square error

confusion matrix

support vector machine

one hot encoding

supervised learning

singular value decomposition

ROC curve

elbow method

 

autocorrelation (time series)

unsupervised learning

log likelihood

decision tree

inertia

 

de-trending (time series)

classification

batch learning

random forest

dendrogram

differencing (time series)

regression

L1/L2 regularization

logistic regression

agglomerative clustering

hidden layers (neural networks)

deep learning

k-fold cross validation

k-means

silhouette scores

pooling (neural networks)

dimensionality reduction

gradient descent

k-nearest neighbors

bias-variance trade-off

data augmentation (neural networks)

PCA

grid search

gini impurity

ridge regression

convolution (neural networks)

t-SNE

under/overfitting

entropy

lasso regression

dropout (neural networks)

optimization

convex optimization

pruning

feature engineering

hyperparameters
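
As an illustration of what "explaining a term in context" can look like, here is a minimal sketch of entropy as it is used in decision trees: the impurity of the class labels at a node, and the information gain of a split. The function names `entropy` and `information_gain` and the toy labels are made up for this example and are not taken from the course materials.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels at one tree node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # sum over classes of p * log2(1/p), where p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def information_gain(parent, left, right):
    """Entropy reduction when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    )
    return entropy(parent) - weighted_children


# Toy node (hypothetical data): 4 samples of class "a" and 4 of class "b".
parent = ["a"] * 4 + ["b"] * 4
left, right = ["a"] * 4, ["b"] * 4  # a split that separates the classes

print(entropy(parent))                        # 1.0 bit: evenly mixed node
print(information_gain(parent, left, right))  # 1.0: the split removes all impurity
```

A node with a single class has entropy 0, and an evenly mixed two-class node has entropy 1 bit; a decision tree that uses this criterion prefers the split with the largest information gain.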

 

Textbooks:

https://learningds.org/intro.html

https://github.com/jakevdp/PythonDataScienceHandbook/tree/master

Some people recommend this, though I have never tried it myself:

https://developers.google.com/machine-learning

There, you can also find a long list of ML terms:

https://developers.google.com/machine-learning/glossary#loss