30+ Best GitHub Repositories For Data Science

In our previous article, we covered the 100+ Most Valuable GitHub Repos for Machine Learning and Deep Learning. In this article, we do the same, but specifically for data science. So, without wasting a second, let’s take a look at some of the best repositories and open source GitHub projects for data science.

Top 10 GitHub Repositories for Data Science From the List

  • Data Science Python Notebooks
  • 1000+ Data Science Blogs
  • Top Data Science Cheatsheets
  • Data Science With Python
  • 100+ Data Science Interviews
  • Data Science Roadmap 2023 (From GitHub)
  • Awesome R
  • Eat TensorFlow 2 In 30 Days
  • Data Engineering Roadmap 2023
  • Awesome Python GitHub Repos

Best GitHub Repositories for Data Science

Awesome Data Science

An open source data science repository for learning data science and applying it to real-world problems. This is a shortcut path to start studying data science. Just follow the steps to answer the questions, “What is Data Science and what should I study to learn Data Science?” If you’re a beginner, this is the most recommended and one of the best data science GitHub repositories for you.

Self Taught Data Science Path

This is a path for those who want to complete the data science undergraduate curriculum on their own time, for free, with courses from the best universities in the world.

Data Science Python Notebooks

The Data Science IPython Notebooks repo from Donne Martin, a tech lead from Facebook, covers a wide range of popular topics and tools, including big data, machine learning, business analysis, Python essentials, and a handful of command-line utilities. TensorFlow, Keras, Pandas, NumPy, Spark, Amazon Web Services, and Matplotlib are just some of the many tools covered by this vast repo.

1000+ Data Science Blogs

A curated list of 1000+ data science blogs for beginner, intermediate, and advanced machine learning and data science enthusiasts.

Theano Tutorials

Bare bones introduction to machine learning from linear regression to convolutional neural networks using Theano.

Data Science Cheat Sheet

A helpful 5-page data science cheatsheet to assist with exam reviews, interview prep, and anything in-between. The reader should have at least a basic understanding of statistics and linear algebra, though beginners may find this resource helpful as well.

Time Series Forecasting Best Practices

This repository provides examples and best practice guidelines for building forecasting solutions. The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in forecasting algorithms to build solutions and operationalize them.

Data Science With Python

A curated list of awesome resources for practicing data science using Python, including not only libraries, but also links to tutorials, code snippets, blog posts and talks.

Essential Keras

This is a directory of tutorials and open-source code repositories for working with Keras, the Python deep learning library.

100+ Data Science Interviews

A curated list of 100+ data science interview questions with answers for beginner, intermediate, and advanced data science enthusiasts.

Spark Internals

A series of notes on the design and implementation of Apache Spark, with a focus on its design principles, execution mechanisms, system architecture, and performance optimization. In addition, there are some comparisons with Hadoop MapReduce in terms of design and implementation.

The Open-Source Data Science Masters

Created by the data scientist Clare Corthell, one of the founding partners of Luminant Data Science Consulting, TOSDSM is an open-source curriculum for learning Data Science. The source material of this curriculum is foundational in nature and is targeted towards beginners stepping into the world of Data Science.

Eat TensorFlow 2 In 30 Days

This is an introductory reference book that aims to be extremely reader-friendly. The authors’ minimum goal is to keep readers from giving up because of the difficulties, while “Don’t let the readers think” is their highest target. Unlike the official documentation, which the authors describe as disordered and as mixing tutorials and guides without systematic logic, this book reorganizes the content according to difficulty, readers’ searching habits, and the architecture of TensorFlow.

Top Latest Data Science Cheatsheets

A list of data science cheat sheets to rule the world. These cheat sheets cover a wide range of topics, including math for data science, libraries, data visualization, big data, business science, and more.

Awesome Python

A curated list of awesome Python frameworks, libraries, software, and resources such as newsletters, podcasts, websites, books, and more for machine learning, data science, computer vision, deep learning, natural language processing, data analysis, data visualization, etc.

Data Engineering

A knowledge hub for data engineering and machine learning enthusiasts. In this GitHub repository, you’ll find interesting articles, structured, level-wise, handpicked tutorials, projects, papers, and a lot more.

Apache Spark

A curated list of awesome Apache Spark packages and resources.

Data Science with R

This repo contains a curated list of R tutorials and packages for data science, NLP, and machine learning. It also serves as a reference guide for several common data analysis tasks.

Awesome TensorFlow

A curated list of awesome TensorFlow experiments, libraries, and projects.

Data Science Question Answer

A repo for data science related questions and answers. The purpose of this repo is to help you (data science practitioners) prepare for data science related interviews and to introduce some basic data science concepts to people who don’t know them but want to learn.

Data Engineer Roadmap 2023

This roadmap aims to give a complete picture of the modern data engineering landscape and serve as a study guide for aspiring data engineers.

Free Data Science Books

This list contains free learning resources for data science and big data related concepts, techniques, and applications. Each entry states the expected audience for the book (beginner, intermediate, or veteran). The rating may be subjective, but it gives some clue of how difficult the book is.

Big Data

A handpicked collection of frameworks, databases, interesting papers, books, streaming, talks, videos and other resources related to big data.

How To Become A Data Engineer

In this GitHub repo, you’ll find a list of useful resources to learn data engineering from scratch.


A curated list of awesome data visualization libraries and resources.

Learning

Become better at data science every day.

Best Data Science Resources

A trove of carefully curated resources and links (on the topics of software, platforms, language, techniques, etc.) related to data science, all in one place.


Long list of geospatial analysis tools.

100 Pandas Puzzles

Since pandas is a large library with many different specialist features and functions, these exercises focus mainly on the fundamentals of manipulating data (indexing, grouping, aggregating, cleaning), making use of the core DataFrame and Series objects. Many of the exercises here are straightforward in that the solutions require no more than a few lines of code.
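
To give a flavor of the fundamentals these exercises target, here is a minimal sketch of indexing, cleaning, grouping, and aggregating with pandas. The toy data and column names are made up for illustration and are not taken from the repository.

```python
import pandas as pd

# Hypothetical toy data; not taken from the puzzles repository.
df = pd.DataFrame(
    {
        "animal": ["cat", "dog", "cat", "snake", "dog"],
        "age": [2.5, 3.0, None, 4.5, 7.0],
        "visits": [1, 3, 2, 1, 2],
    }
)

# Indexing: select rows with a boolean mask.
frequent = df[df["visits"] > 1]

# Cleaning: fill the missing age with the mean of the known ages.
cleaned = df.assign(age=df["age"].fillna(df["age"].mean()))

# Grouping and aggregating: mean age and total visits per animal.
summary = cleaned.groupby("animal").agg(
    mean_age=("age", "mean"),
    total_visits=("visits", "sum"),
)

print(frequent)
print(summary)
```

Each step fits in one or two lines, which matches the spirit of the puzzles: short solutions built on the core DataFrame and Series objects.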

Learn Data Science for Free

This repo contains a curated collection of learning materials for the domain in the form of IPython Notebooks. The repo covers four major topics in Data Science, which include Linear Regression, Logistic Regression, Random Forests, and K-Means Clustering, along with their respective data sets.
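
As a taste of the first of those topics, the sketch below fits a simple linear regression by ordinary least squares in plain Python. The data points and the `fit_line` helper are invented for illustration and do not come from the repository’s notebooks or data sets.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

In practice you would more likely reach for a library such as scikit-learn, but writing the formula out once makes it clear what such a library computes.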

Pandas Exercises

If you are fed up with a ton of tutorials but no easy way to find exercises, this repo might help you.

Awesome R

R is one of the programming languages that data scientists use as an alternative to Python. So, if you’re using R as your default programming language for data science, you should check out this repo. It is filled with awesome R packages, frameworks, and software for data science as well as general R programming.

Data Science with Ruby

This curated list comprises awesome tutorials, libraries, and information sources about various data science applications using the Ruby programming language.

Data Scientist Roadmap

This is one of the best resources on GitHub for getting a good insight into data science. Whether you are a beginner or a midway data science learner, you will definitely find something useful.

It covers a variety of aspects like statistics, programming, machine learning, data visualization, NLP, and many more. It also has a vast roadmap for becoming a data scientist and links to follow the roadmap. This repo is inspired by a roadmap of data science skills by Swami Chandrasekaran.

It would be unfair to call GitHub merely a code repository and collaboration platform; as you’ve seen in this article, it is much more than that. Not many people know it, but GitHub is also a treasure trove of expert-curated and peer-reviewed free resources. Today, we shared some of the top open source GitHub projects for data science with Python and the best GitHub repositories for data science. Don’t forget to bookmark this list, as it will help you on your journey and keep you updated with what’s new in this field.
