Introduction to Data Science
1. Understand basic concepts in data science and machine learning.
2. Be able to apply these methods, using programming, to scientific problems.
This is a basic-level course. It assumes some knowledge of statistics and programming. We will generally keep the theory and math to a minimum.
Presentation
Visualization practice files
EDA practice files
• What is a time series?
• A complex dataset changing through time: daily cycle of gene expression
• How do changing environmental conditions affect a bacterial community structure?
• Identifying breakpoints in a time series
Presentation
TSA practice files
Time-Series-tutorial
Welcome to the first NVIDIA-endorsed DLI course, “Fundamentals of Deep Learning”, hosted by the Data Science Research Center (DSRC) of Haifa University.
This course is hosted by NVIDIA, and uses a specialized platform for the lectures and hands-on exercises. To prepare for the workshop, please do the following:
- Create an NVIDIA account at (http://courses.nvidia.com/join)
- Test your laptop (http://websocketstest.courses.nvidia.com/)
- Review the course datasheet to become familiar with Prerequisites, Learning Objectives, and Agenda Outline. These can be found here: Fundamentals of Deep Learning
* These instructions can also be found at https://developer.nvidia.com/dli/getready.
Note:
- You will get an access code to enter the course lectures and exercises.
To log in to the system, go to https://courses.nvidia.com/dli-event and enter the access code.
In each 1:1 meeting, 5 of the 55 terms below will be chosen at random, and you will need to explain their meaning in the context of what we studied in class. For example, entropy is a general term, but we studied it specifically in the context of decision trees; neural networks exist in brains, but the term means something else in the context of machine learning.
EDA | mean square error | confusion matrix | support vector machine | one hot encoding |
supervised learning | singular value decomposition | ROC curve | elbow method | autocorrelation |
unsupervised learning | log likelihood | decision tree | inertia | de-trending |
classification | batch learning | random forest | dendrogram | differencing |
regression | L1/L2 regularization | logistic regression | agglomerative clustering | hidden layers (neural networks) |
deep learning | k-fold cross validation | k-means | silhouette scores | pooling |
dimensionality reduction | gradient descent | k-nearest neighbors | bias-variance trade-off | data augmentation |
PCA | grid search | gini impurity | ridge regression | convolution (neural networks) |
t-SNE | under/overfitting | entropy | lasso regression | dropout (neural networks) |
optimization | convex optimization | pruning | feature engineering | hyperparameters |
Textbooks:
https://learningds.org/intro.html
https://github.com/jakevdp/PythonDataScienceHandbook/tree/master
Some people recommend this, though I have never tried it myself:
https://developers.google.com/machine-learning
There you can also find a long list of ML terms:
https://developers.google.com/machine-learning/glossary#loss