MeganBloemsma.com

DataCamp ‘Python Programmer’ review

“Where can I start learning about Data Science?” is a question I get at least once a week.

…and DataCamp is always one of the first resources I refer to. I had done a course or two, and when researching data science it’s one of the websites that just keeps coming up.

During my master’s degree I learned to code in both R and Python. I wrote my thesis in R, and thus became more proficient in it. Ever since I started working at Microsoft I’ve been coding less and less, and I wanted to get my hands dirty again in the language that seemed applicable to more than just data science: Python.

To do so I compared a lot of programs: Pluralsight, Python course by Nina Zakharenko (who I follow on Twitter), Udemy, 100 days of Code (both free versions and paid)… the possibilities felt endless.
I chose DataCamp because I was already familiar with it, I love a practical way of learning (I need more than just theory) and I am a sucker for beautiful visuals.

Looking at the Python Programmer’s full program, I was most excited about the last couple of courses:

  • Introduction to Shell

  • Conda Essentials

  • Parallel Programming with Dask in Python

  • Software Engineering for Data Scientists in Python

  • Unit Testing for Data Science in Python

The theory of data science is known to me and I’m comfortable with it – so I wasn’t looking for a program that would highlight that. DataCamp’s program offers a great variety of coding fundamentals, data science tooling and practical knowledge (such as unit tests and software engineering principles, something I’ve been missing in my own education).

Program overview and review

The program contains 58 hours of material, spread over 15 courses. I will go through every course and give you my opinion of it. The courses with a 💙 are the ones that I enjoyed most.

  • Introduction to Data Science in Python

Easy start, and fun to do. The story is of a cute dog that gets kidnapped and you need to find out who did it. I went through this quickly and happily.

  • Data Types for Data Science in Python

Here it gets a little more technical. Gives you an introduction to looping (which I’ve always found difficult), as well as Counter() from collections and learning how to work with dates and times. Definitely essentials!
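For anyone who hasn’t met these tools yet, here is a minimal sketch of what looping, Counter() and date parsing look like together. The sample data and variable names are my own illustration (a nod to the dog story above), not taken from the course.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample data: (timestamp string, dog breed) pairs.
sightings = [
    ("2020-03-01 09:15", "poodle"),
    ("2020-03-01 17:40", "beagle"),
    ("2020-03-02 08:05", "poodle"),
    ("2020-03-04 12:30", "poodle"),
]

# Counter tallies how often each value occurs while looping over the data.
breed_counts = Counter(breed for _, breed in sightings)
most_common_breed, count = breed_counts.most_common(1)[0]

# strptime parses a string into a datetime object using an explicit format.
timestamps = [datetime.strptime(ts, "%Y-%m-%d %H:%M") for ts, _ in sightings]

# Subtracting two datetimes gives a timedelta.
span = max(timestamps) - min(timestamps)

print(most_common_breed, count)
print(span.days)
```

The format string passed to strptime has to match the input exactly, which is a large part of why dates and times feel fiddly.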

Dry exercises with variables that all seemed very alike – which was frustrating. No explanation of why you use the things you use (or how exactly), and the hints are not useful.
Passed, but with minimal amount of XP due to the confusing exercises. Left feeling frustrated and unmotivated. Sorry, Jason!

  • Data Manipulation with pandas

Whenever I work with ‘pandas’, all I hear is this song:

The pace is much slower compared to the last one. This is a good thing, as it really takes you through pandas so you actually know how to use it. Unfortunately not all parts of the course are useful, such as indexing. They even state themselves that it’s not useful (but it is good to see it so you can read other people’s code).

This is also the first introduction to visualizing data, which is super useful. The last part is about different dataframes again (repeating information from earlier courses).

  • 💙Python Data Science toolbox (part 1)

THIS ONE IS AMAZING.
It’s slow-paced, it repeats information enough times for you to actually understand what you’re doing, and it goes through functions. Hugo, you’re amazing!!

  • 💙Python Data Science toolbox (part 2)

Teaches you about iteration, how to deal with big data, and how to process it in chunks using ‘chunksize’. Super useful, and I would definitely recommend it. The tempo is good, the exercises build up nicely and Hugo is a hero.
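To give an idea of what working in chunks means, here is a rough sketch using only the standard library. In pandas, ‘chunksize’ is a parameter of read_csv that makes it return an iterator of smaller DataFrames instead of one big one. The generator below imitates that idea on plain rows, and the function name and sample data are assumptions of mine, not course material.

```python
import csv
import io
from itertools import islice


def read_in_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical in-memory CSV standing in for a file too large to load at once.
csv_text = "city,visits\n" + "\n".join(f"city{i},{i}" for i in range(10))
reader = csv.DictReader(io.StringIO(csv_text))

total_visits = 0
chunk_sizes = []
for chunk in read_in_chunks(reader, chunk_size=4):
    # Only one chunk is held in memory at a time.
    chunk_sizes.append(len(chunk))
    total_visits += sum(int(row["visits"]) for row in chunk)

print(chunk_sizes, total_visits)
```

The point is that the running total is updated per chunk, so the full dataset never has to fit in memory.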

  • Writing efficient Python code

This topic is something I wish I had learned during my studies. Although I’ve been taught how to make code readable, that’s not the same as making it run efficiently. So I was very much looking forward to this course!

Logan Thomas is the host, and one of the first things he mentions is the Zen of Python by Tim Peters, also known as PEP 20. “It lists 19 idioms that serve as guiding principles for any Pythonista. Python has hundreds of Python Enhancement Proposals, commonly referred to as PEPs.”
Fun fact: you can view the Zen of Python by running ‘import this’ in a Python shell.

The course offers good techniques to write more efficient Python code. The placement after the DS toolbox courses was perfect, as it builds on top of what you learn there.

All this being said, a few Google or Bing searches would have offered the same information (or more) – and would have cost me less time. Still, being able to practice the techniques in the shell adds value.

  • Working with Dates and Times in Python

Dates and Times are, unfortunately, inevitable when working with data. They are not fun. This course was as fun as it can get.

  • Regular Expressions in Python

Audio was of lesser quality than previous courses. And unfortunately the accent of the host made it difficult to follow (I say this with pain in my heart – I know how difficult it is when English is not your native language!).

The pace of the course was good though, and as boring as regular expressions are, it is good to know how they work. It also touches on Regex, which is useful knowledge for some fields.
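For a taste of what regular expressions do, here is a small sketch with Python’s built-in ‘re’ module. The sample sentence and patterns are hypothetical examples of mine, not exercises from the course.

```python
import re

# Hypothetical log line used only for illustration.
text = "Order 1042 shipped on 2020-03-01; order 1043 ships on 2020-03-05."

# \d{4}-\d{2}-\d{2} matches four digits, a dash, two digits, a dash, two digits.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# Parentheses capture just the digits; re.IGNORECASE matches "Order" and "order".
order_ids = re.findall(r"order (\d+)", text, flags=re.IGNORECASE)

# re.sub replaces every match; here each date is masked.
masked = re.sub(r"\d{4}-\d{2}-\d{2}", "<date>", text)

print(dates)
print(order_ids)
print(masked)
```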

  • Web Scraping in Python

Crash course in HTML (including XPath), so you can navigate HTML when web scraping with Python. From here it moves on to CSS and ends with ‘spiders’: programs that crawl multiple web pages and scrape data.
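To show what navigating HTML with XPath looks like, here is a minimal sketch. Note the assumptions: the course uses scrapy, while this example swaps in the standard library’s ElementTree (which supports only a subset of XPath and needs well-formed markup) so that it runs without installing anything. The page content and URLs are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page; real-world HTML is often messier than this.
html = """
<html>
  <body>
    <div class="course">
      <a href="/courses/intro-to-shell">Introduction to Shell</a>
    </div>
    <div class="course">
      <a href="/courses/conda-essentials">Conda Essentials</a>
    </div>
    <div class="footer">
      <a href="/about">About</a>
    </div>
  </body>
</html>
"""

root = ET.fromstring(html.strip())

# XPath: every <a> that is a direct child of a <div> whose class is "course".
links = root.findall(".//div[@class='course']/a")

titles = [link.text for link in links]
urls = [link.get("href") for link in links]

print(titles)
print(urls)
```

The footer link is skipped because its parent div has a different class, which is the kind of targeting XPath is meant for.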

This course should definitely be viewed as a starting point. It will offer you tools to get started, but you will need to practice in order to get good at web scraping. Or, as the instructor Thomas Laetsch says: “some of what we see in this lesson may seem a bit gnarly. But once we master this step, we have learned the main parsing tool scrapy offers”.

If you’re a more experienced programmer, looking up the ‘scrapy’ module will be more time efficient than going through this course.

  • 💙Writing Functions in Python

Writing functions is so essential to programming in Python that I’m surprised this wasn’t moved up. In my opinion this is so basic that it should be one of the first topics to touch upon. Some of the other courses touched upon it briefly (Python Data Science toolbox part 1), but a whole course digging deep is everything you should want. And I learned some new things!

My only note on this course is that it could have been longer, to allow for more practice.

  • Introduction to Shell

Knowing how to work with the shell is useful, especially if you’re managing infrastructure. It can also be a faster way of setting up files and documents. There were no videos in this one, only exercises, and they slowly build up to more complex things in a nice way. The course offers a nice basis.

  • Conda Essentials

When you are installing Python on your laptop, conda is one of the first things you’ll encounter. That’s why I was surprised that this course was pushed so far back into the program.

The course offers good and practical information. Starting from chapter two it is quite detailed and not necessarily 1:1 applicable in all situations. It felt like the relevant info was stretched over four chapters, whilst two at most would have sufficed.
That being said, this course passed quickly.

  • Parallel Programming with Dask in Python

Python code again! After shell and conda it feels good to see some.
There are a lot of videos, which I did not find super effective. Very dry. The exercises are very step by step though, so you’ll be able to get through it fine.
But again: very dry.

  • 💙Software Engineering for Data Scientists in Python

In a previous article I talked about how I believe the future of data science is software engineering – so I was very happy to see this course added to the Python program in DataCamp.

It focuses on modularity, documentation and testing. It goes over ‘pip’, which you’ll encounter often when working in Python. What I also found exciting was that it teaches you how to create your own packages – something that I never learned to do at uni (which seems ridiculous to me now).

The only thing is that the last part of the course deals with unit testing… which is the topic of the final course right after this one. If you’re solely taking this course this is a great added value, but when following the entire program it feels strange.

What this course does really well is provide you with practical information that will make you a better programmer. And I love that!

  • Unit Testing for Data Science in Python

This course focuses on pytest, which is one of the most popular testing libraries in Python. The pace of the course is quite slow and the exercises are complicated – not in that what they’re asking you to do is complicated, but the variables are unnecessarily long and detailed. This makes the course boring and difficult to focus on.

It is also very 1:1 focused. We need to know the EXACT expected output to perform a unit test. Automated testing is not mentioned, which makes me wonder whether this manual testing is the best way to go. And how do we apply this to a whole production codebase?
It is only in the last part of the course that this is addressed – and it is ‘solved’ by using a dataset where the value is known. Not solving the problem at all (and definitely not applicable to all models OR something that statistically would be recommendable!). And if the model is too complicated the advice is: “do as many sanity checks as you can”.
Have we really not solved this yet? Or are there better methods? Please let me know.
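To make the two approaches concrete, here is a rough sketch of a test with a hand-computed expected value next to a test that only does sanity checks. The function under test is a hypothetical one I made up, not from the course. The tests are written in the plain-assert style that pytest collects (functions whose names start with ‘test_’), but they deliberately use only the standard library so they also run without pytest installed.

```python
import math


def mean_absolute_error(actual, predicted):
    """Hypothetical function under test: average absolute difference."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def test_known_value():
    # Exact expected output, worked out by hand: (1 + 0 + 2) / 3 = 1.0
    assert mean_absolute_error([3, 5, 8], [2, 5, 10]) == 1.0


def test_sanity_properties():
    # Sanity checks that need no hand-computed answer:
    # the error is never negative, and a perfect prediction gives zero.
    actual = [0.1, 0.2, 0.3]
    assert mean_absolute_error(actual, [0.3, 0.2, 0.1]) >= 0
    assert math.isclose(mean_absolute_error(actual, actual), 0.0, abs_tol=1e-12)


def test_mismatched_lengths():
    # Invalid input should fail loudly rather than return a wrong number.
    try:
        mean_absolute_error([1, 2], [1])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Saved as a file named like test_metrics.py (a hypothetical name), running ‘pytest’ in that folder would pick up all three functions. The sanity-check style does not prove the output is right, but it does catch a class of mistakes without needing the exact answer up front.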

All in all I was expecting more from this course. It really goes over the basics of pytest but leaves me with a lot of usability questions. Unit testing in Python is not super popular yet and the user friendliness is not there yet, in my opinion. It feels like reinventing the wheel for Python whilst there are better (and more user-friendly) solutions for other programming languages.

Overall conclusion

It’s a bit of a mess. Because the Python Programmer program consists of 15 courses, there is overlap in topics. And consequently the order of some courses doesn’t fully make sense.
For example: you only learn in-depth about functions in Data Science Toolbox Part 1, whilst these are already used in some code in earlier courses. In ‘Software engineering for data scientists’, at the end of the program, you are introduced to pip, which is one of the first things you’ll use when working with Python on your own computer.

Throughout the courses I’ve given a lot of feedback on assignments. Some are unclear or vague, or their hints offer no help. There were about 3 instances where my answer was identical to the expected answer but not accepted. You can give feedback (which I did), but it feels sloppy when you’re paying this much for a website and these mistakes are in there.

Despite all this I am a real fan of DataCamp. It’s an easy-to-use website with great visuals, and it offers courses on any data science topic that you’d want to learn about.
Although the quality really differs per course, I learned something from each and every one of them. Filling in pieces of code is definitely not the same as writing code yourself – but it’s a good method to learn the concepts.

Discount code

A yearly subscription on DataCamp will cost you $300. You can also start a trial, and after that runs out they offer a 40% discount code.

By signing up for Visual Studio Dev Essentials you can get a free 2-month DataCamp subscription, as well as some other benefits like Azure credits. Himanshu Sharma wrote a great blog post which will take you through it step by step, and I recommend using this route to see whether DataCamp is a good fit for you.