Tools and Data to Experiment with Machine Learning
19 Oct 2021 / 3 minutes to read
Elena Daehnhardt
Introduction
In this post, I write about tools, web platforms, and data for experimenting with Machine Learning.
Libraries and APIs
With a focus on Python libraries, I want to mention scikit-learn (scikit-learn.org), TensorFlow by Google, PyTorch by Facebook, and Keras (an API), which are among the most mature tools providing Machine Learning algorithms. Keras is an API that makes other libraries, such as TensorFlow or Aesara (formerly Theano), easier to use. PyTorch is also more user-friendly than TensorFlow. Although TensorFlow seems more complicated to use, it is more mature and has a larger community and better support.
The open-source Python library scikit-learn provides a comprehensive selection of machine learning techniques (regression, classification, clustering), feature selection, metrics, preprocessing, and other functionality. At the moment, scikit-learn lacks deep learning functionality; however, we can use TensorFlow with the Scikit Flow wrapper to create neural networks in the scikit-learn style.
The XGBoost library is another option for applications requiring multicore parallelism. XGBoost stands for Extreme Gradient Boosting and uses boosted trees to build regression, classification, ranking, and other predictive models.
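As a minimal sketch of how these scikit-learn pieces fit together, the example below chains preprocessing, feature selection, a classifier, and a metric on the bundled Iris dataset. The dataset, the choice of logistic regression, and the parameter values (k=2, test_size=0.25, random_state=42) are my assumptions for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small dataset that ships with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Preprocessing -> feature selection -> classification in one estimator.
model = make_pipeline(
    StandardScaler(),              # scale each feature to zero mean, unit variance
    SelectKBest(f_classif, k=2),   # keep the 2 features with the highest ANOVA F-score
    LogisticRegression(max_iter=200),
)
model.fit(X_train, y_train)

# Evaluate on the held-out data with a standard metric.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the steps in a pipeline means the scaler and the feature selector are fitted on the training split only, which avoids leaking information from the test data.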
Platforms
To learn how Machine Learning works in practice together with a worldwide community, I recommend kaggle.com as one of the first steps in ML experimentation. Kaggle has loads of posts, discussions, shared code, datasets, and competitions on different topics, from simple regression to working with visual media. Kaggle also has courses in Python, Machine Learning, data manipulation and visualisation, SQL, and others.
OpenML.org is a Machine Learning platform created to share machine learning experiments and datasets to facilitate reproducibility in research. OpenML contributors also provide an API that works with ML libraries such as scikit-learn [1].
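One way to pull an OpenML dataset into Python is the fetch_openml helper in scikit-learn, sketched below. The wrapper function load_openml_dataset is a hypothetical name introduced here for illustration, and the default dataset ("iris", version 1) is an assumption. The call needs network access and, with as_frame=True, the pandas package.

```python
from sklearn.datasets import fetch_openml


def load_openml_dataset(name="iris", version=1):
    """Download a dataset from OpenML and return (features, target).

    Hypothetical helper for illustration: it only wraps scikit-learn's
    fetch_openml. The data is downloaded on the first call and cached
    locally by scikit-learn afterwards.
    """
    # as_frame=True returns a pandas DataFrame and Series,
    # which keeps column names and categorical types.
    return fetch_openml(name=name, version=version, as_frame=True, return_X_y=True)
```

Calling load_openml_dataset() returns the feature table and the target column, which can then be passed to any scikit-learn estimator. Pinning the version keeps an experiment reproducible even if a newer version of the dataset is uploaded later.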
Datasets
It is essential to mention that datasets on these websites are already preprocessed and ready for experiments. For real-life training purposes, it is good to try collecting your own data, for instance through web scraping or publicly available APIs. For example, the Twitter streaming API is a good source of real-life data. I have shared the tweet collection code in my GitHub repository if you would like to experiment with Twitter data.
Did you like this post? Please let me know if you have any comments or suggestions.
References
[1] Matthias Feurer, Jan N. van Rijn, Arlind Kadra, Pieter Gijsbers, Neeratyoy Mallik, Sahithya Ravi, Andreas Mueller, Joaquin Vanschoren, Frank Hutter. OpenML-Python: an extensible Python API for OpenML. arXiv:1911.02490 [cs.LG], 2019
About Elena
Elena, a PhD in Computer Science, simplifies AI concepts and helps you use machine learning.