30+ Most Valuable GitHub Repositories For Data Science

Best GitHub Repositories For Data Science

Is GitHub a social network for programmers, data scientists, machine learning engineers, and software developers? What do you think? Wherever you found this post, please share your thoughts. In our previous article, we covered 100+ Most Valuable GitHub Repos for Machine Learning and Deep Learning. In this article, we do the same, but specifically for data science. So, without wasting a second, let's take a look at some of the best GitHub repositories for data science.

  • Awesome Data Science - An open-source Data Science repository for learning and applying skills to real-world problems. It is a shortcut for getting started with Data Science: follow the steps to answer the questions, "What is Data Science and what should I study to learn Data Science?"
  • Self Taught Data Science Path - This is a path for those who want to complete an undergraduate Data Science curriculum on their own time, for free, with courses from some of the best universities in the world.
  • Data Science Python Notebooks - The Data Science IPython Notebooks repo from Donne Martin, a tech lead from Facebook, covers a wide range of popular topics and tools, such as Big Data, Machine Learning, Business Analysis, Python essentials, and a handful of command-line utilities. TensorFlow, Keras, Pandas, NumPy, Spark, Amazon Web Services, and Matplotlib are just some of the many tools covered by this vast repo.
  • Data Science Blogs - A curated list of 1000+ data science blogs for beginner, intermediate, and advanced Machine Learning and Data Science enthusiasts.
  • Theano Tutorials - A bare-bones introduction to machine learning, from linear regression to convolutional neural networks, using Theano.
  • Data Science Cheat Sheet - A helpful 5-page data science cheatsheet to assist with exam reviews, interview prep, and anything in-between. The reader should have at least a basic understanding of statistics and linear algebra, though beginners may find this resource helpful as well.
  • Time Series Forecasting Best Practices - This repository provides examples and best practice guidelines for building forecasting solutions. The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in forecasting algorithms to build solutions and operationalize them.
  • Data Science With Python - A curated list of awesome resources for practicing data science using Python, including not only libraries, but also links to tutorials, code snippets, blog posts and talks.
  • Essential Keras - This is a directory of tutorials and open-source code repositories for working with Keras, the Python deep learning library.
  • Data Science Interviews - A curated list of 100+ Data Science interview questions with answers for beginner, intermediate, and advanced Data Science enthusiasts.
  • Spark Internals - A series of notes on the design and implementation of Apache Spark, focusing on its design principles, execution mechanisms, system architecture, and performance optimization. It also includes some comparisons with Hadoop MapReduce in terms of design and implementation.
  • Data Science Wiki - A wiki covering Data Science, Statistics, Maths, R, Python, AI, Machine Learning, Automation, DevOps, Bash, Linux Tutorials, Scripts, and Datasets.

  • The Open-Source Data Science Masters - Created by the data scientist Clare Corthell, one of the founding partners of Luminant Data Science Consulting, TOSDSM is an open-source curriculum for learning Data Science. The source material of this curriculum is foundational in nature and is targeted towards beginners stepping into the world of Data Science. 
  • Eat TensorFlow 2 In 30 Days - This is an introductory reference book that aims to be extremely reader-friendly. The authors' minimum goal is to keep readers from giving up because of the difficulty, while "Don't let the readers think" is their highest aim. Unlike the official documentation, which the authors describe as disordered and as mixing tutorials and guides without a systematic logic, this book reorganizes the content according to difficulty, readers' searching habits, and the architecture of TensorFlow.
  • Data Science Cheatsheets - A list of Data Science cheat sheets to rule the world. These cheat sheets cover a wide range of topics, including math for data science, libraries, data visualization, big data, business science, and more.
  • Data Engineering - A knowledge hub for Data Engineering and Machine Learning enthusiasts. In this GitHub repository, you'll find interesting articles, structured and level-wise handpicked tutorials, projects, papers, and a lot more.
  • Apache Spark - A curated list of awesome Apache Spark packages and resources.
  • Data Science with R - This repo contains a curated list of R tutorials and packages for Data Science, NLP and Machine Learning. This also serves as a reference guide for several common data analysis tasks.
  • Awesome TensorFlow - A curated list of awesome TensorFlow experiments, libraries, and projects.
  • Data Science Question Answer - A repo for data science related questions and answers. Its purpose is to help data science practitioners prepare for data science interviews and to introduce basic data science concepts to people who don't know them yet but want to learn.
  • Data Engineer Roadmap 2021 - This roadmap aims to give a complete picture of the modern data engineering landscape and serve as a study guide for aspiring data engineers. 
  • Free Data Science Books - This list contains free learning resources for data science and big data related concepts, techniques, and applications. Each entry gives the expected audience for the book (beginner, intermediate, or veteran). The rating may be subjective, but it gives some idea of how difficult the book is.

  • Big Data - A handpicked collection of frameworks, databases, interesting papers, books, streaming, talks, videos and other resources related to big data.
  • Dataviz - A curated list of awesome data visualization libraries and resources.
  • Learning - Become better at data science every day.
  • Data Science Resources - A trove of carefully curated resources and links (on the topics of software, platforms, language, techniques, etc.) related to data science, all in one place.
  • Geospatial - A long list of geospatial analysis tools.
  • 100 Pandas Puzzles - Since pandas is a large library with many different specialist features and functions, these exercises focus mainly on the fundamentals of manipulating data (indexing, grouping, aggregating, cleaning), making use of the core DataFrame and Series objects. Many of the exercises here are straightforward in that the solutions require no more than a few lines of code.

  • Learn Data Science - This repo contains a curated collection of data science learning materials in the form of IPython Notebooks. The repo covers four major topics in Data Science: Linear Regression, Logistic Regression, Random Forests, and K-Means Clustering, along with their respective data sets.
  • Pandas Exercises - If you are fed up with a ton of tutorials but no easy way to find exercises, this repo might help you.
  • Awesome R - A curated list of awesome R packages, frameworks, and software for Data Science as well as general R programming.
  • Data Science with Ruby - This curated list comprises awesome tutorials, libraries, and information sources about various Data Science applications using the Ruby programming language.
  • Data Scientist Roadmap - One of the best resources on GitHub for getting a good insight into data science. Whether you are a beginner or a mid-way data science learner, you will likely find something useful. It covers a variety of aspects, such as Statistics, Programming, Machine Learning, Data Visualization, NLP, and many more. It also has a vast roadmap for becoming a data scientist, with links to follow along the way. This repo is inspired by a roadmap of data science skills by Swami Chandrasekaran.

It would be unfair to call GitHub merely a code repository and collaboration platform; as you've seen in this article, it is much more than that. Not many people know it, but GitHub is also a treasure trove of expert-curated and peer-reviewed free resources. Today, we shared some of the best GitHub repositories for data science with you. Don't forget to bookmark this list, as it will help you in your journey and keep you updated with what's new in this field.

Data Science
September 28, 2021