
Machine Learning with Python Cookbook

Practical Solutions from Preprocessing to Deep Learning

Paperback, English, 2023, ISBN 9781098135720


This practical guide provides more than 200 self-contained recipes to help you solve machine learning challenges you may encounter in your work. If you're comfortable with Python and its libraries, including pandas and scikit-learn, you'll be able to address specific problems, from loading data to training models and leveraging neural networks.

Each recipe in this updated edition includes code that you can copy, paste, and run with a toy dataset to ensure that it works. From there, you can adapt these recipes according to your use case or application. Recipes include a discussion that explains the solution and provides meaningful context.
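
To show what that copy, paste, and run format looks like, here is a minimal recipe-style sketch: a k-nearest neighbors classifier applied to a tiny toy dataset. It is not taken from the book, whose recipes rely on libraries such as scikit-learn. It uses only the Python standard library so it runs as-is, and the data points and the function name `knn_predict` are hypothetical.

```python
from collections import Counter
import math

# Hypothetical toy dataset: (feature vector, class label) pairs
training_data = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.8), "b"),
    ((4.9, 5.1), "b"),
]

def knn_predict(data, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Sort training points by Euclidean distance to the query and keep the k closest
    neighbors = sorted(data, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of those neighbors and return the most common one
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict(training_data, (1.1, 0.9)))  # prints: a
print(knn_predict(training_data, (4.8, 5.2)))  # prints: b
```

Adapting the sketch to your own use case means swapping in your data and choosing `k`. This is the same workflow the recipes encourage, with a library implementation in place of the hand-written function.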

Go beyond theory and concepts by learning the nuts and bolts you need to construct working machine learning applications.

You'll find recipes for:
- Vectors, matrices, and arrays
- Working with data from CSV, JSON, SQL databases, cloud storage, and other sources
- Handling numerical and categorical data, text, images, and dates and times
- Dimensionality reduction using feature extraction or feature selection
- Model evaluation and selection
- Linear and logistic regression, trees and forests, and k-nearest neighbors
- Support vector machines (SVMs), naive Bayes, clustering, and tree-based models
- Saving, loading, and serving trained models from multiple frameworks
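
The book's saving and loading recipes are framework-specific (scikit-learn, TensorFlow, PyTorch). As a framework-neutral illustration only, here is a hedged sketch that persists a model's fitted parameters with Python's standard `pickle` module and reloads them. The helpers `fit_line` and `predict` and the toy data are hypothetical and not from the book.

```python
import os
import pickle
import tempfile

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares; return the parameters."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def predict(params, x):
    """Apply the fitted line to a single input value."""
    return params["slope"] * x + params["intercept"]

# Train on a hypothetical toy dataset that lies exactly on y = 2x + 1
params = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model.pkl")
    # Save the fitted parameters to disk
    with open(path, "wb") as f:
        pickle.dump(params, f)
    # Load them back; only unpickle files from sources you trust
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(predict(restored, 10.0))  # prints: 21.0
```

Pickling a plain dictionary keeps the example independent of any class definition. Real projects would typically use the serialization tools of their framework instead.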


Number of pages: 380
Main category: IT management / ICT



About Chris Albon

Chris Albon is a data scientist with a Ph.D. in quantitative political science and a decade of experience working in statistical learning, artificial intelligence, and software engineering. He founded New Knowledge, an artificial intelligence company, and previously worked for the crisis and humanitarian nonprofit Ushahidi. Chris is also a founder and co-host of the data science podcast Partially Derivative.



Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us

1. Working with Vectors, Matrices, and Arrays in NumPy
1.0. Introduction
1.1. Creating a Vector
1.2. Creating a Matrix
1.3. Creating a Sparse Matrix
1.4. Preallocating NumPy Arrays
1.5. Selecting Elements
1.6. Describing a Matrix
1.7. Applying Functions over Each Element
1.8. Finding the Maximum and Minimum Values
1.9. Calculating the Average, Variance, and Standard Deviation
1.10. Reshaping Arrays
1.11. Transposing a Vector or Matrix
1.12. Flattening a Matrix
1.13. Finding the Rank of a Matrix
1.14. Getting the Diagonal of a Matrix
1.15. Calculating the Trace of a Matrix
1.16. Calculating Dot Products
1.17. Adding and Subtracting Matrices
1.18. Multiplying Matrices
1.19. Inverting a Matrix
1.20. Generating Random Values

2. Loading Data
2.0. Introduction
2.1. Loading a Sample Dataset
2.2. Creating a Simulated Dataset
2.3. Loading a CSV File
2.4. Loading an Excel File
2.5. Loading a JSON File
2.6. Loading a Parquet File
2.7. Loading an Avro File
2.8. Querying a SQLite Database
2.9. Querying a Remote SQL Database
2.10. Loading Data from a Google Sheet
2.11. Loading Data from an S3 Bucket
2.12. Loading Unstructured Data

3. Data Wrangling
3.0. Introduction
3.1. Creating a DataFrame
3.2. Getting Information about the Data
3.3. Slicing DataFrames
3.4. Selecting Rows Based on Conditionals
3.5. Sorting Values
3.6. Replacing Values
3.7. Renaming Columns
3.8. Finding the Minimum, Maximum, Sum, Average, and Count
3.9. Finding Unique Values
3.10. Handling Missing Values
3.11. Deleting a Column
3.12. Deleting a Row
3.13. Dropping Duplicate Rows
3.14. Grouping Rows by Values
3.15. Grouping Rows by Time
3.16. Aggregating Operations and Statistics
3.17. Looping over a Column
3.18. Applying a Function over All Elements in a Column
3.19. Applying a Function to Groups
3.20. Concatenating DataFrames
3.21. Merging DataFrames

4. Handling Numerical Data
4.0. Introduction
4.1. Rescaling a Feature
4.2. Standardizing a Feature
4.3. Normalizing Observations
4.4. Generating Polynomial and Interaction Features
4.5. Transforming Features
4.6. Detecting Outliers
4.7. Handling Outliers
4.8. Discretizing Features
4.9. Grouping Observations Using Clustering
4.10. Deleting Observations with Missing Values
4.11. Imputing Missing Values

5. Handling Categorical Data
5.0. Introduction
5.1. Encoding Nominal Categorical Features
5.2. Encoding Ordinal Categorical Features
5.3. Encoding Dictionaries of Features
5.4. Imputing Missing Class Values
5.5. Handling Imbalanced Classes

6. Handling Text
6.0. Introduction
6.1. Cleaning Text
6.2. Parsing and Cleaning HTML
6.3. Removing Punctuation
6.4. Tokenizing Text
6.5. Removing Stop Words
6.6. Stemming Words
6.7. Tagging Parts of Speech
6.8. Performing Named-Entity Recognition
6.9. Encoding Text as a Bag of Words
6.10. Weighting Word Importance
6.11. Using Text Vectors to Calculate Text Similarity in a Search Query
6.12. Using a Sentiment Analysis Classifier

7. Handling Dates and Times
7.0. Introduction
7.1. Converting Strings to Dates
7.2. Handling Time Zones
7.3. Selecting Dates and Times
7.4. Breaking Up Date Data into Multiple Features
7.5. Calculating the Difference Between Dates
7.6. Encoding Days of the Week
7.7. Creating a Lagged Feature
7.8. Using Rolling Time Windows
7.9. Handling Missing Data in Time Series

8. Handling Images
8.0. Introduction
8.1. Loading Images
8.2. Saving Images
8.3. Resizing Images
8.4. Cropping Images
8.5. Blurring Images
8.6. Sharpening Images
8.7. Enhancing Contrast
8.8. Isolating Colors
8.9. Binarizing Images
8.10. Removing Backgrounds
8.11. Detecting Edges
8.12. Detecting Corners
8.13. Creating Features for Machine Learning
8.14. Encoding Color Histograms as Features
8.15. Using Pretrained Embeddings as Features
8.16. Detecting Objects with OpenCV
8.17. Classifying Images with PyTorch

9. Dimensionality Reduction Using Feature Extraction
9.0. Introduction
9.1. Reducing Features Using Principal Components
9.2. Reducing Features When Data Is Linearly Inseparable
9.3. Reducing Features by Maximizing Class Separability
9.4. Reducing Features Using Matrix Factorization
9.5. Reducing Features on Sparse Data

10. Dimensionality Reduction Using Feature Selection
10.0. Introduction
10.1. Thresholding Numerical Feature Variance
10.2. Thresholding Binary Feature Variance
10.3. Handling Highly Correlated Features
10.4. Removing Irrelevant Features for Classification
10.5. Recursively Eliminating Features

11. Model Evaluation
11.0. Introduction
11.1. Cross-Validating Models
11.2. Creating a Baseline Regression Model
11.3. Creating a Baseline Classification Model
11.4. Evaluating Binary Classifier Predictions
11.5. Evaluating Binary Classifier Thresholds
11.6. Evaluating Multiclass Classifier Predictions
11.7. Visualizing a Classifier’s Performance
11.8. Evaluating Regression Models
11.9. Evaluating Clustering Models
11.10. Creating a Custom Evaluation Metric
11.11. Visualizing the Effect of Training Set Size
11.12. Creating a Text Report of Evaluation Metrics
11.13. Visualizing the Effect of Hyperparameter Values

12. Model Selection
12.0. Introduction
12.1. Selecting the Best Models Using Exhaustive Search
12.2. Selecting the Best Models Using Randomized Search
12.3. Selecting the Best Models from Multiple Learning Algorithms
12.4. Selecting the Best Models When Preprocessing
12.5. Speeding Up Model Selection with Parallelization
12.6. Speeding Up Model Selection Using Algorithm-Specific Methods
12.7. Evaluating Performance After Model Selection

13. Linear Regression
13.0. Introduction
13.1. Fitting a Line
13.2. Handling Interactive Effects
13.3. Fitting a Nonlinear Relationship
13.4. Reducing Variance with Regularization
13.5. Reducing Features with Lasso Regression

14. Trees and Forests
14.0. Introduction
14.1. Training a Decision Tree Classifier
14.2. Training a Decision Tree Regressor
14.3. Visualizing a Decision Tree Model
14.4. Training a Random Forest Classifier
14.5. Training a Random Forest Regressor
14.6. Evaluating Random Forests with Out-of-Bag Errors
14.7. Identifying Important Features in Random Forests
14.8. Selecting Important Features in Random Forests
14.9. Handling Imbalanced Classes
14.10. Controlling Tree Size
14.11. Improving Performance Through Boosting
14.12. Training an XGBoost Model
14.13. Improving Real-Time Performance with LightGBM

15. K-Nearest Neighbors
15.0. Introduction
15.1. Finding an Observation’s Nearest Neighbors
15.2. Creating a K-Nearest Neighbors Classifier
15.3. Identifying the Best Neighborhood Size
15.4. Creating a Radius-Based Nearest Neighbors Classifier
15.5. Finding Approximate Nearest Neighbors
15.6. Evaluating Approximate Nearest Neighbors

16. Logistic Regression
16.0. Introduction
16.1. Training a Binary Classifier
16.2. Training a Multiclass Classifier
16.3. Reducing Variance Through Regularization
16.4. Training a Classifier on Very Large Data
16.5. Handling Imbalanced Classes

17. Support Vector Machines
17.0. Introduction
17.1. Training a Linear Classifier
17.2. Handling Linearly Inseparable Classes Using Kernels
17.3. Creating Predicted Probabilities
17.4. Identifying Support Vectors
17.5. Handling Imbalanced Classes

18. Naive Bayes
18.0. Introduction
18.1. Training a Classifier for Continuous Features
18.2. Training a Classifier for Discrete and Count Features
18.3. Training a Naive Bayes Classifier for Binary Features
18.4. Calibrating Predicted Probabilities

19. Clustering
19.0. Introduction
19.1. Clustering Using K-Means
19.2. Speeding Up K-Means Clustering
19.3. Clustering Using Mean Shift
19.4. Clustering Using DBSCAN
19.5. Clustering Using Hierarchical Merging

20. Tensors with PyTorch
20.0. Introduction
20.1. Creating a Tensor
20.2. Creating a Tensor from NumPy
20.3. Creating a Sparse Tensor
20.4. Selecting Elements in a Tensor
20.5. Describing a Tensor
20.6. Applying Operations to Elements
20.7. Finding the Maximum and Minimum Values
20.8. Reshaping Tensors
20.9. Transposing a Tensor
20.10. Flattening a Tensor
20.11. Calculating Dot Products
20.12. Multiplying Tensors

21. Neural Networks
21.0. Introduction
21.1. Using Autograd with PyTorch
21.2. Preprocessing Data for Neural Networks
21.3. Designing a Neural Network
21.4. Training a Binary Classifier
21.5. Training a Multiclass Classifier
21.6. Training a Regressor
21.7. Making Predictions
21.8. Visualizing Training History
21.9. Reducing Overfitting with Weight Regularization
21.10. Reducing Overfitting with Early Stopping
21.11. Reducing Overfitting with Dropout
21.12. Saving Model Training Progress
21.13. Tuning Neural Networks
21.14. Visualizing Neural Networks

22. Neural Networks for Unstructured Data
22.0. Introduction
22.1. Training a Neural Network for Image Classification
22.2. Training a Neural Network for Text Classification
22.3. Fine-Tuning a Pretrained Model for Image Classification
22.4. Fine-Tuning a Pretrained Model for Text Classification

23. Saving, Loading, and Serving Trained Models
23.0. Introduction
23.1. Saving and Loading a scikit-learn Model
23.2. Saving and Loading a TensorFlow Model
23.3. Saving and Loading a PyTorch Model
23.4. Serving scikit-learn Models
23.5. Serving TensorFlow Models
23.6. Serving PyTorch Models in Seldon

About the Authors
