KOBV Portal

hits 1 - 5 | 5 hits

Book

Statistical data cleaning with applications in R (2018)

Loo, Mark van der, 1976- ; Jonge, Edwin de, 1972-

Hoboken, NJ : Wiley

UID: almahu_BV044887489

Format: xiii, 300 pages : diagrams

ISBN: 978-1-118-89715-7

Additional Edition: Also published as online edition, PDF, ISBN 978-1-118-89714-0

Additional Edition: Also published as online edition, EPUB, ISBN 978-1-118-89713-3

Language: English

Subjects: Computer Science

RVK: ST 530

Keywords: Statistics ; Data processing ; R

URL: http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030281553&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA

Inter-library loan

HU Berlin

MPI Bildungsforschung

Book

Learning RStudio for R statistical computing : learn to effectively perform R development, statistical analysis, and reporting with the most popular R IDE (2012)

Loo, Mark van der, 1976- ; Jonge, Edwin de, 1972-

Birmingham, UK : Packt Publishing

UID: kobvindex_INT0002975

Format: ii, 111 pages , illustrations (black and white) , 23.5 x 19 cm

ISBN: 9781782160601 , 9781782160618 , 1782160604 , 1782160612

Content: "A practical tutorial covering how to leverage RStudio functionality to effectively perform R Development, analysis, and reporting with RStudio. The book is aimed at R developers and analysts who wish to do R statistical development while taking advantage of RStudio functionality to ease their development efforts. Familiarity with R is assumed. Those who want to get started with R development using RStudio will also find the book useful. Even if you already use R but want to create reproducible statistical analysis projects or extend R with self-written packages, this book shows how to quickly achieve this using RStudio."

Content: "Learn
- Learn to install and run RStudio on a desktop or a web server
- Acquaint yourself with the latest and advanced R console features
- Perform code editing and navigation
- Learn to create advanced and interactive graphics
- Effectively manage your R project and project files
- Learn to build R extension packages
- Perform reproducible statistical analyses within your R projects
- Learn your way through getting

Data is coming at us faster, dirtier, and at an ever increasing rate. The necessity to handle many, complex statistical analysis projects is hitting statisticians and analysts across the globe. This book will show you how to deal with it like never before, thus providing an edge and improving productivity. "Learning RStudio for R Statistical Computing" will teach you how to quickly and efficiently create and manage statistical analysis projects, import data, develop R scripts, and generate reports and graphics. R developers will learn about package development, coding principles, and version control with RStudio. This book will help you to learn and understand RStudio features to effectively perform statistical analysis and reporting, code editing, and R development. The book starts with a quick introduction where you will learn to load data, perform simple analysis, plot a graph, and generate automatic reports. You will then be able to explore the available features for effective coding, graphical analysis, R project management, report generation, and even project management. "Learning RStudio for R Statistical Computing" is stuffed with feature-rich and easy-to-understand examples, through step-by-step instructions helping you to quickly master the most popular IDE for R development.

Features
- A complete practical tutorial for RStudio, designed keeping in mind the needs of analysts and R developers alike
- Step-by-step examples that apply the principles of reproducible research and good programming practices to R projects
- Learn to effectively generate reports, create graphics, and perform analysis, and even build R-packages with RStudio."

Note: 1 Getting Started -- 2 Writing R Scripts and the R Console -- 3 Viewing and Plotting Data -- 4 Managing R Projects -- 5 Generating Reports -- 6 Using RStudio Effectively -- Authors

Language: English

Keywords: Handbooks and manuals

Inter-library loan

Berlin International

Online Resource

Learning RStudio for R statistical computing : learn to effectively perform R development, statistical analysis, and reporting with the most popular R IDE (2012)

Loo, Mark van der, 1976- ; Jonge, Edwin de, 1972-

Birmingham, UK : Packt Pub.

UID: almafu_9959236423202883

Format: 1 online resource (126 pages)

Edition: 1st ed.

ISBN: 1-68015-356-0 , 1-283-93785-9 , 1-78216-061-2

Series Statement: Community experience distilled

Content: A practical tutorial covering how to leverage RStudio functionality to effectively perform R Development, analysis, and reporting with RStudio. The book is aimed at R developers and analysts who wish to do R statistical development while taking advantage of RStudio functionality to ease their development efforts. Familiarity with R is assumed. Those who want to get started with R development using RStudio will also find the book useful. Even if you already use R but want to create reproducible statistical analysis projects or extend R with self-written packages, this book shows how to quickly

Note: Includes index. Cover; Copyright; Credits; About the Authors; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Getting Started; RStudio at a glance; Installing RStudio; Installing R; Installing R on Windows and Mac OS X; Installing R on Linux; Building R from source; Building R using Windows; Installing RStudio; Installing RStudio Server; Installing R packages; Overview: A first R session; Keyboard shortcuts; Getting help; What if I uninstall RStudio?; Further reading; Summary; Chapter 2: Writing R Scripts and the R Console; Moving around RStudio; Features of the R console; Executing commands; Command history; Command completion; Completion of functions and arguments; Object completion; Completion of filenames; Keyboard shortcuts for the console; Features of the source editor; Editing R scripts; Syntax highlighting; Indenting code; Commenting code; Find and replace; Folding, sectioning, and navigation; Code folding; Code navigation; Code sections; Code execution; Summary; Chapter 3: Viewing and Plotting Data; Viewing data and the object browser; Plotting; Zoom; Export; Navigation; Interactive plotting with the manipulate package; The manipulate function; Using more options of manipulate; Advanced topic: retrieving plot parameters from manipulate; Summary; Chapter 4: Managing R Projects; R projects; Creating an R project; Directory structure and file manipulations; Version control; Introduction to version control; Installing GIT or Subversion; Version control for single-person projects; GIT; Subversion; Working with a team; Further reading; Summary; Chapter 5: Generating Reports; Prerequisites for report generation; Notebook; Notebook options; Publishing a notebook; R Markdown and Rhtml; Workflow for R Markdown; An extended example; An introduction to Markdown syntax; Rhtml; Code chunks; Chunk syntax and options; RMarkdown: .Rmd files; Rhtml: .Rhtml files; LaTeX: .Rnw files; RStudio's chunk support and keyboard shortcuts; LaTeX; Further reading; Summary; Chapter 6: Using RStudio Effectively; Additional features for function writing; Function extraction; Function navigation; Introduction to package writing; Prerequisites; Basic structure and workflow; Creating the package directory structure; Documenting functions with Roxygen2; Building your package with devtools; More about the devtools package; Publishing your package; Summary; Index.

Additional Edition: ISBN 1-78216-060-4

Language: English

FU Berlin

Online Resource

Statistical data cleaning with applications in R (2018)

Loo, Mark van der, 1976- ; Jonge, Edwin de, 1972-

Hoboken, NJ : John Wiley & Sons, Inc.

UID: almahu_9948198673602882

Format: 1 online resource (xiii, 300 pages)

ISBN: 9781118897140 , 1118897145 , 9781118897126 , 1118897129 , 9781118897133 , 1118897137

Content: A comprehensive guide to automated statistical data cleaning. The production of clean data is a complex and time-consuming process that requires both technical know-how and statistical expertise. Statistical Data Cleaning brings together a wide range of techniques for cleaning textual, numeric or categorical data. This book examines technical data cleaning methods relating to data representation and data structure. A prominent role is given to statistical data validation, data cleaning based on predefined restrictions, and data cleaning strategy.

Key features:
- Focuses on the automation of data cleaning methods, including both theory and applications written in R.
- Enables the reader to design data cleaning processes for either one-off analytical purposes or for setting up production systems that clean data on a regular basis.
- Explores statistical techniques for solving issues such as incompleteness, contradictions and outliers, integration of data cleaning components and quality monitoring.
- Supported by an accompanying website featuring data and R code.

This book enables data scientists and statistical analysts working with data to deepen their understanding of data cleaning as well as to upgrade their practical data cleaning skills. It can also be used as material for a course in data cleaning and analyses.

Note: Cover -- Title Page -- Copyright -- Contents -- Foreword -- About the Companion Website -- Chapter 1 Data Cleaning -- 1.1 The Statistical Value Chain -- 1.1.1 Raw Data -- 1.1.2 Input Data -- 1.1.3 Valid Data -- 1.1.4 Statistics -- 1.1.5 Output -- 1.2 Notation and Conventions Used in this Book -- Chapter 2 A Brief Introduction to R -- 2.1 R on the Command Line -- 2.1.1 Getting Help and Learning R -- 2.2 Vectors -- 2.2.1 Computing with Vectors -- 2.2.2 Arrays and Matrices -- 2.3 Data Frames -- 2.3.1 The Formula-Data Interface -- 2.3.2 Selecting Rows and Columns -- Boolean Operators -- 2.3.3 Selection with Indices -- 2.3.4 Data Frame Manipulation: The dplyr Package -- 2.4 Special Values -- 2.4.1 Missing Values -- 2.5 Getting Data into and out of R -- 2.5.1 File Paths in R -- 2.5.2 Formats Provided by Packages -- 2.5.3 Reading Data from a Database -- 2.5.4 Working with Data External to R -- 2.6 Functions -- 2.6.1 Using Functions -- 2.6.2 Writing Functions -- 2.7 Packages Used in this Book -- Chapter 3 Technical Representation of Data -- 3.1 Numeric Data -- 3.1.1 Integers -- 3.1.2 Integers in R -- 3.1.3 Real Numbers -- 3.1.4 Double Precision Numbers -- 3.1.5 The Concept of Machine Precision -- 3.1.6 Consequences of Working with Floating Point Numbers -- 3.1.7 Dealing with the Consequences -- 3.1.8 Numeric Data in R -- 3.2 Text Data -- 3.2.1 Terminology and Encodings -- 3.2.2 Unicode -- 3.2.3 Some Popular Encodings -- 3.2.4 Textual Data in R: Objects of Class Character -- 3.2.5 Encoding in R -- 3.2.6 Reading and Writing of Data with Non-Local Encoding -- 3.2.7 Detecting Encoding -- 3.2.8 Collation and Sorting -- 3.3 Times and Dates -- 3.3.1 AIT, UTC, and POSIX Seconds Since the Epoch -- 3.3.2 Time and Date Notation -- 3.3.3 Time and Date Storage in R -- 3.3.4 Time and Date Conversion in R -- 3.3.5 Leap Days, Time Zones, and Daylight Saving Times -- 3.4 Notes on Locale Settings -- Chapter 4 Data Structure -- 4.1 Introduction -- 4.2 Tabular Data -- 4.2.1 data.frame -- 4.2.2 Databases -- 4.2.3 dplyr -- 4.3 Matrix Data -- 4.4 Time Series -- 4.5 Graph Data -- 4.6 Web Data -- 4.6.1 Web Scraping -- 4.6.2 Web API -- 4.7 Other Data -- 4.8 Tidying Tabular Data -- 4.8.1 Variable Per Column -- 4.8.2 Single Observation Stored in Multiple Tables -- Chapter 5 Cleaning Text Data -- 5.1 Character Normalization -- 5.1.1 Encoding Conversion and Unicode Normalization -- 5.1.2 Character Conversion and Transliteration -- 5.2 Pattern Matching with Regular Expressions -- 5.2.1 Basic Regular Expressions -- 5.2.2 Practical Regular Expressions -- 5.2.3 Generating Regular Expressions in R -- 5.3 Common String Processing Tasks in R -- 5.4 Approximate Text Matching -- 5.4.1 String Metrics -- 5.4.2 String Metrics and Approximate Text Matching in R -- Chapter 6 Data Validation -- 6.1 Introduction -- 6.2 A First Look at the validate Package -- 6.2.1 Quick Checks with check_that -- 6.2.2 The Basic Workflow: validator and confront -- 6.2.3 A Little Background on validate and DSLs -- 6.3 Defining Data Validation -- 6.3.1 Formal Definition of Data Validation -- 6.3.2 Operations on Validation Functions -- 6.3.3 Validation and Missing Values -- 6.3.4 Structure of Validation Functions -- 6.3.5 Demarcating Validation Rules in validate -- 6.4 A Formal Typology of Data Validation Functions -- 6.4.1 A Closer Look at Measurement -- 6.4.2 Classification of Validation Rules -- 6.5 Validating Data with the validate Package -- 6.5.1 Validation Rules in the Console and the validator Object -- 6.5.2 Validating in the Pipeline -- 6.5.3 Raising Errors or Warnings -- 6.5.4 Tolerance for Testing Linear Equalities -- 6.5.5 Setting and Resetting Options -- 6.5.6 Importing and Exporting Validation Rules from and to File -- 6.5.7 Checking Variable Types and Metadata -- 6.5.8 Checking Value Ranges and Code Lists -- 6.5.9 Checking In-Record Consistency Rules -- 6.5.10 Checking Cross-Record Validation Rules -- 6.5.11 Checking Functional Dependencies -- 6.5.12 Cross-Dataset Validation -- 6.5.13 Macros, Variable Groups, Keys -- 6.5.14 Analyzing Output: validation Objects -- 6.5.15 Output Dimensionality and Output Selection -- 6.5.15 Exercises for Section -- Chapter 7 Localizing Errors in Data Records -- 7.1 Error Localization -- 7.2 Error Localization with R -- 7.2.1 The Errorlocate Package -- 7.3 Error Localization as MIP-Problem -- 7.3.1 Error Localization and Mixed-Integer Programming -- 7.3.2 Linear Restrictions -- 7.3.3 Categorical Restrictions -- 7.3.4 Mixed-Type Restrictions -- 7.4 Numerical Stability Issues -- 7.4.1 A Short Overview of MIP Solving -- 7.4.2 Scaling Numerical Records -- 7.4.3 Setting Numerical Threshold Values -- 7.5 Practical Issues -- 7.5.1 Setting Reliability Weights -- 7.5.2 Simplifying Conditional Validation Rules -- 7.6 Conclusion -- Chapter 8 Rule Set Maintenance and Simplification -- 8.1 Quality of Validation Rules -- 8.1.1 Completeness -- 8.1.2 Superfluous Rules and Infeasibility -- 8.2 Rules in the Language of Logic -- 8.2.1 Using Logic to Rewrite Rules -- 8.3 Rule Set Issues -- 8.3.1 Infeasible Rule Set -- 8.3.2 Fixed Value -- 8.3.3 Redundant Rule -- 8.3.4 Nonrelaxing Clause -- 8.3.5 Nonconstraining Clause -- 8.4 Detection and Simplification Procedure -- 8.4.1 Mixed-Integer Programming -- 8.4.2 Detecting Feasibility -- 8.4.3 Finding Rules Causing Infeasibility -- 8.4.4 Detecting Conflicting Rules -- 8.4.5 Detect Partial Infeasibility -- 8.4.6 Detect Fixed Values -- 8.4.7 Detect Nonrelaxing Clauses -- 8.4.8 Detect Nonconstraining Clauses -- 8.4.9 Detect Redundant Rules -- 8.5 Conclusion -- Chapter 9 Methods Based on Models for Domain Knowledge -- 9.1 Correction with Data Modifying Rules -- 9.1.1 Modifying Functions -- 9.1.2 A Class of Modifying Functions on Numerical Data -- 9.1.2 Exercises for Section -- 9.2 Rule-Based Correction with dcmodify -- 9.2.1 Reading Rules from File -- 9.2.2 Modifying Rule Syntax -- 9.2.3 Missing Values -- 9.2.4 Sequential and Sequence-Independent Execution -- 9.2.5 Options Settings Management -- 9.3 Deductive Correction -- 9.3.1 Correcting Typing Errors in Numeric Data -- 9.3.1 Exercises for Section -- 9.3.2 Deductive Imputation Using Linear Restrictions -- Chapter 10 Imputation and Adjustment -- 10.1 Missing Data -- 10.1.1 Missing Data Mechanisms -- 10.1.2 Visualizing and Testing for Patterns in Missing Data Using R -- 10.2 Model-Based Imputation -- 10.3 Model-Based Imputation in R -- 10.3.1 Specifying Imputation Methods with simputation -- 10.3.2 Linear Regression-Based Imputation -- 10.3.3 M-Estimation -- 10.3.4 Lasso, Ridge, and Elasticnet Regression -- 10.3.5 Classification and Regression Trees -- 10.3.6 Random Forest -- 10.4 Donor Imputation with R -- 10.4.1 Random and Sequential Hot Deck Imputation -- 10.4.2 k Nearest Neighbors and Predictive Mean Matching -- 10.5 Other Methods in the simputation Package -- 10.6 Imputation Based on the EM Algorithm -- 10.6.1 The EM Algorithm -- 10.6.2 EM Imputation Assuming the Multivariate Normal Distribution -- 10.7 Sampling Variance under Imputation -- 10.8 Multiple Imputations -- 10.8.1 Multiple Imputation Based on the EM Algorithm -- 10.8.2 The Amelia Package -- 10.8.3 Multivariate Imputation with Chained Equations (Mice) -- 10.8.4 Imputation with the mice Package -- 10.9 Analytic Approaches to Estimate Variance of Imputation -- 10.9.1 Imputation as Part of the Estimator -- 10.10 Choosing an Imputation Method -- 10.11 Constraint Value Adjustment -- 10.11.1 Formal Description -- 10.11.2 Application to Imputed Data -- 10.11.3 Adjusting Imputed Values with the rspa Package -- Chapter 11 Example: A Small Data-Cleaning System -- 11.1 Setup -- 11.1.1 Deterministic Methods -- 11.1.2 Error Localization -- 11.1.3 Imputation -- 11.1.4 Adjusting Imputed Data -- 11.2 Monitoring Changes in Data -- 11.2.1 Data Diff (Daff) -- 11.2.2 Summarizing Cell Changes -- 11.2.3 Summarizing Changes in Conformance to Validation Rules -- 11.2.4 Track Changes in Data Automatically with lumberjack -- 11.3 Integration and Automation -- 11.3.1 Using RScript -- 11.3.2 The docopt Package -- 11.3.3 Automated Data Cleaning -- References -- Index -- EULA.

Additional Edition: Print version: Loo, Mark van der, 1976- Statistical data cleaning with applications in R. Hoboken, NJ : John Wiley & Sons, Inc., 2018 ISBN 9781118897157

Language: English

Keywords: Electronic books.

URL: https://onlinelibrary.wiley.com/doi/book/10.1002/9781118897126

HU Berlin

Article

Regionale statistieken 2009 : het CBS in uw buurt (2009)

Beeckmann, Duncan [author] ; Houwelingen, Caroline van [author] ; Jonge, Edwin de [author]

UID: gbv_1747039705

Format: 5 illustrations

ISSN: 1572-5464

Note: Dutch

In: Geo-Info, Deurne : GIN, 2003, 6(2009), 12, pages 36-40, 1572-5464

Language: Dutch

Kooperativer Bibliotheksverbund

Berlin Brandenburg