Python for Data Analysis

Data Wrangling with pandas, NumPy, and Jupyter

Paperback, English, 2022, ISBN 9781098104030

Summary

Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.10 and pandas 1.4, the third edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process.

Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.

- Use the IPython shell and Jupyter notebook for exploratory computing
- Learn basic and advanced features in NumPy (Numerical Python)
- Get started with data analysis tools in the pandas library
- Use flexible tools to load, clean, transform, merge, and reshape data
- Create informative visualizations with matplotlib
- Apply the pandas groupby facility to slice, dice, and summarize datasets
- Analyze and manipulate regular and irregular time series data
- Learn how to solve real-world data analysis problems with thorough, detailed examples
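
As a rough illustration of the kind of workflow the list above describes (not an excerpt from the book), here is a minimal sketch that builds a small pandas DataFrame, fills in a missing value, and summarizes the result with groupby. The column names and values are invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "city": ["Utrecht", "Utrecht", "Delft", "Delft"],
    "sales": [10.0, np.nan, 4.0, 6.0],
})

# Clean: replace the missing value with 0.0.
df["sales"] = df["sales"].fillna(0.0)

# Summarize: total and mean sales per city.
summary = df.groupby("city")["sales"].agg(["sum", "mean"])
print(summary)
```

Filling with 0.0 is only one possible choice here; dropping the row or filling with a group mean are equally plausible, depending on what the missing value means.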

Specifications

ISBN-13: 9781098104030
Language: English
Binding: Paperback
Number of pages: 550
Publisher: O'Reilly
Edition: 3
Publication date: 26 August 2022
Main category: IT management / ICT

About Wes McKinney

Wes McKinney is CTO and cofounder of Lambda Foundry, Inc. From 2010 to 2012, he served as a Python consultant to hedge funds and banks while developing pandas, a widely used Python data analysis library. From 2007 to 2010, he researched global macro and credit trading strategies at AQR Capital Management. He graduated from MIT with an S.B. in Mathematics. He is on leave from the Duke University Ph.D. program in Statistics.

Table of Contents

Preface
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
In Memoriam: John D. Hunter (1968–2012)
Acknowledgments for the Third Edition (2022)
Acknowledgments for the Second Edition (2017)
Acknowledgments for the First Edition (2012)

Preliminaries
1.1 What Is This Book About?
What Kinds of Data?
1.2 Why Python for Data Analysis?
Python as Glue
Solving the “Two-Language” Problem
Why Not Python?
1.3 Essential Python Libraries
NumPy
pandas
matplotlib
IPython and Jupyter
SciPy
scikit-learn
statsmodels
Other Packages
1.4 Installation and Setup
Miniconda on Windows
GNU/Linux
Miniconda on macOS
Installing Necessary Packages
Integrated Development Environments and Text Editors
1.5 Community and Conferences
1.6 Navigating This Book
Code Examples
Data for Examples
Import Conventions

Python Language Basics, IPython, and Jupyter Notebooks
2.1 The Python Interpreter
2.2 IPython Basics
Running the IPython Shell
Running the Jupyter Notebook
Tab Completion
Introspection
2.3 Python Language Basics
Language Semantics
Scalar Types
Control Flow
2.4 Conclusion

Built-In Data Structures, Functions, and Files
3.1 Data Structures and Sequences
Tuple
List
Dictionary
Set
Built-In Sequence Functions
List, Set, and Dictionary Comprehensions
3.2 Functions
Namespaces, Scope, and Local Functions
Returning Multiple Values
Functions Are Objects
Anonymous (Lambda) Functions
Generators
Errors and Exception Handling
3.3 Files and the Operating System
Bytes and Unicode with Files
3.4 Conclusion

NumPy Basics: Arrays and Vectorized Computation
4.1 The NumPy ndarray: A Multidimensional Array Object
Creating ndarrays
Data Types for ndarrays
Arithmetic with NumPy Arrays
Basic Indexing and Slicing
Boolean Indexing
Fancy Indexing
Transposing Arrays and Swapping Axes
4.2 Pseudorandom Number Generation
4.3 Universal Functions: Fast Element-Wise Array Functions
4.4 Array-Oriented Programming with Arrays
Expressing Conditional Logic as Array Operations
Mathematical and Statistical Methods
Methods for Boolean Arrays
Sorting
Unique and Other Set Logic
4.5 File Input and Output with Arrays
4.6 Linear Algebra
4.7 Example: Random Walks
Simulating Many Random Walks at Once
4.8 Conclusion

Getting Started with pandas
5.1 Introduction to pandas Data Structures
Series
DataFrame
Index Objects
5.2 Essential Functionality
Reindexing
Dropping Entries from an Axis
Indexing, Selection, and Filtering
Arithmetic and Data Alignment
Function Application and Mapping
Sorting and Ranking
Axis Indexes with Duplicate Labels
5.3 Summarizing and Computing Descriptive Statistics
Correlation and Covariance
Unique Values, Value Counts, and Membership
5.4 Conclusion

Data Loading, Storage, and File Formats
6.1 Reading and Writing Data in Text Format
Reading Text Files in Pieces
Writing Data to Text Format
Working with Other Delimited Formats
JSON Data
XML and HTML: Web Scraping
6.2 Binary Data Formats
Reading Microsoft Excel Files
Using HDF5 Format
6.3 Interacting with Web APIs
6.4 Interacting with Databases
6.5 Conclusion

Data Cleaning and Preparation
7.1 Handling Missing Data
Filtering Out Missing Data
Filling In Missing Data
7.2 Data Transformation
Removing Duplicates
Transforming Data Using a Function or Mapping
Replacing Values
Renaming Axis Indexes
Discretization and Binning
Detecting and Filtering Outliers
Permutation and Random Sampling
Computing Indicator/Dummy Variables
7.3 Extension Data Types
7.4 String Manipulation
Python Built-In String Object Methods
Regular Expressions
String Functions in pandas
7.5 Categorical Data
Background and Motivation
Categorical Extension Type in pandas
Computations with Categoricals
Categorical Methods
7.6 Conclusion

Data Wrangling: Join, Combine, and Reshape
8.1 Hierarchical Indexing
Reordering and Sorting Levels
Summary Statistics by Level
Indexing with a DataFrame’s columns
8.2 Combining and Merging Datasets
Database-Style DataFrame Joins
Merging on Index
Concatenating Along an Axis
Combining Data with Overlap
8.3 Reshaping and Pivoting
Reshaping with Hierarchical Indexing
Pivoting “Long” to “Wide” Format
Pivoting “Wide” to “Long” Format
8.4 Conclusion

Plotting and Visualization
9.1 A Brief matplotlib API Primer
Figures and Subplots
Colors, Markers, and Line Styles
Ticks, Labels, and Legends
Annotations and Drawing on a Subplot
Saving Plots to File
matplotlib Configuration
9.2 Plotting with pandas and seaborn
Line Plots
Bar Plots
Histograms and Density Plots
Scatter or Point Plots
Facet Grids and Categorical Data
9.3 Other Python Visualization Tools
9.4 Conclusion

Data Aggregation and Group Operations
10.1 How to Think About Group Operations
Iterating over Groups
Selecting a Column or Subset of Columns
Grouping with Dictionaries and Series
Grouping with Functions
Grouping by Index Levels
10.2 Data Aggregation
Column-Wise and Multiple Function Application
Returning Aggregated Data Without Row Indexes
10.3 Apply: General split-apply-combine
Suppressing the Group Keys
Quantile and Bucket Analysis
Example: Filling Missing Values with Group-Specific Values
Example: Random Sampling and Permutation
Example: Group Weighted Average and Correlation
Example: Group-Wise Linear Regression
10.4 Group Transforms and “Unwrapped” GroupBys
10.5 Pivot Tables and Cross-Tabulation
Cross-Tabulations: Crosstab
10.6 Conclusion

Time Series
11.1 Date and Time Data Types and Tools
Converting Between String and Datetime
11.2 Time Series Basics
Indexing, Selection, Subsetting
Time Series with Duplicate Indices
11.3 Date Ranges, Frequencies, and Shifting
Generating Date Ranges
Frequencies and Date Offsets
Shifting (Leading and Lagging) Data
11.4 Time Zone Handling
Time Zone Localization and Conversion
Operations with Time Zone-Aware Timestamp Objects
Operations Between Different Time Zones
11.5 Periods and Period Arithmetic
Period Frequency Conversion
Quarterly Period Frequencies
Converting Timestamps to Periods (and Back)
Creating a PeriodIndex from Arrays
11.6 Resampling and Frequency Conversion
Downsampling
Upsampling and Interpolation
Resampling with Periods
Grouped Time Resampling
11.7 Moving Window Functions
Exponentially Weighted Functions
Binary Moving Window Functions
User-Defined Moving Window Functions
11.8 Conclusion

Introduction to Modeling Libraries in Python
12.1 Interfacing Between pandas and Model Code
12.2 Creating Model Descriptions with Patsy
Data Transformations in Patsy Formulas
Categorical Data and Patsy
12.3 Introduction to statsmodels
Estimating Linear Models
Estimating Time Series Processes
12.4 Introduction to scikit-learn
12.5 Conclusion

Data Analysis Examples
13.1 Bitly Data from 1.USA.gov
Counting Time Zones in Pure Python
Counting Time Zones with pandas
13.2 MovieLens 1M Dataset
Measuring Rating Disagreement
13.3 US Baby Names 1880–2010
Analyzing Naming Trends
13.4 USDA Food Database
13.5 2012 Federal Election Commission Database
Donation Statistics by Occupation and Employer
Bucketing Donation Amounts
Donation Statistics by State
13.6 Conclusion

Advanced NumPy
A.1 ndarray Object Internals
NumPy Data Type Hierarchy
A.2 Advanced Array Manipulation
Reshaping Arrays
C Versus FORTRAN Order
Concatenating and Splitting Arrays
Repeating Elements: tile and repeat
Fancy Indexing Equivalents: take and put
A.3 Broadcasting
Broadcasting over Other Axes
Setting Array Values by Broadcasting
A.4 Advanced ufunc Usage
ufunc Instance Methods
Writing New ufuncs in Python
A.5 Structured and Record Arrays
Nested Data Types and Multidimensional Fields
Why Use Structured Arrays?
A.6 More About Sorting
Indirect Sorts: argsort and lexsort
Alternative Sort Algorithms
Partially Sorting Arrays
numpy.searchsorted: Finding Elements in a Sorted Array
A.7 Writing Fast NumPy Functions with Numba
Creating Custom numpy.ufunc Objects with Numba
A.8 Advanced Array Input and Output
Memory-Mapped Files
HDF5 and Other Array Storage Options
A.9 Performance Tips
The Importance of Contiguous Memory

More on the IPython System
B.1 Terminal Keyboard Shortcuts
B.2 About Magic Commands
The %run Command
Executing Code from the Clipboard
B.3 Using the Command History
Searching and Reusing the Command History
Input and Output Variables
B.4 Interacting with the Operating System
Shell Commands and Aliases
Directory Bookmark System
B.5 Software Development Tools
Interactive Debugger
Timing Code: %time and %timeit
Basic Profiling: %prun and %run -p
Profiling a Function Line by Line
B.6 Tips for Productive Code Development Using IPython
Reloading Module Dependencies
Code Design Tips
B.7 Advanced IPython Features
Profiles and Configuration
B.8 Conclusion

Index
About the Author
