Web Scraping with Python (2nd Edition, English-Language Reprint)

  • Author: Ryan Mitchell (USA)
  • Publisher: Southeast University Press
  • ISBN: 9787564179779
  • Publication date: November 1, 2018
  • Pages: 288
  • List price: ¥89.00

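    Book Details

    Summary
    If programming is magic, then web scraping is surely a form of wizardry. By writing a simple automated program, you can query web servers, request data, and parse it to extract the information you need. This expanded edition of the practical book not only introduces web scraping but also serves as a comprehensive guide to scraping almost every type of data from the modern web.
    Part I of Web Scraping with Python (2nd Edition, English-Language Reprint) focuses on the mechanics of web scraping: using Python to request information from a web server, performing basic handling of the server's response, and interacting with sites in an automated fashion. Part II explores a variety of more specific tools and applications to fit any web scraping scenario you are likely to encounter.
    Table of Contents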
    Preface

    Part I. Building Scrapers
    1. Your First Web Scraper
    Connecting
    An Introduction to BeautifulSoup
    Installing BeautifulSoup
    Running BeautifulSoup
    Connecting Reliably and Handling Exceptions
    2. Advanced HTML Parsing
    You Don't Always Need a Hammer
    Another Serving of BeautifulSoup
    find() and find_all() with BeautifulSoup
    Other BeautifulSoup Objects
    Navigating Trees
    Regular Expressions
    Regular Expressions and BeautifulSoup
    Accessing Attributes
    Lambda Expressions
    3. Writing Web Crawlers
    Traversing a Single Domain
    Crawling an Entire Site
    Collecting Data Across an Entire Site
    Crawling Across the Internet
    4. Web Crawling Models
    Planning and Defining Objects
    Dealing with Different Website Layouts
    Structuring Crawlers
    Crawling Sites Through Search
    Crawling Sites Through Links
    Crawling Multiple Page Types
    Thinking About Web Crawler Models
    5. Scrapy
    Installing Scrapy
    Initializing a New Spider
    Writing a Simple Scraper
    Spidering with Rules
    Creating Items
    Outputting Items
    The Item Pipeline
    Logging with Scrapy
    More Resources
    6. Storing Data
    Media Files
    Storing Data to CSV
    MySQL
    Installing MySQL
    Some Basic Commands
    Integrating with Python
    Database Techniques and Good Practice
    "Six Degrees" in MySQL
    Email

    Part II. Advanced Scraping
    7. Reading Documents
    Document Encoding
    Text
    Text Encoding and the Global Internet
    CSV
    Reading CSV Files
    PDF
    Microsoft Word and .docx
    8. Cleaning Your Dirty Data
    Cleaning in Code
    Data Normalization
    Cleaning After the Fact
    OpenRefine
    9. Reading and Writing Natural Languages
    Summarizing Data
    Markov Models
    Six Degrees of Wikipedia: Conclusion
    Natural Language Toolkit
    Installation and Setup
    Statistical Analysis with NLTK
    Lexicographical Analysis with NLTK
    Additional Resources
    10. Crawling Through Forms and Logins
    Python Requests Library
    Submitting a Basic Form
    Radio Buttons, Checkboxes, and Other Inputs
    Submitting Files and Images
    Handling Logins and Cookies
    HTTP Basic Access Authentication
    Other Form Problems
    11. Scraping JavaScript
    A Brief Introduction to JavaScript
    Common JavaScript Libraries
    Ajax and Dynamic HTML
    Executing JavaScript in Python with Selenium
    Additional Selenium Webdrivers
    Handling Redirects
    A Final Note on JavaScript
    12. Crawling Through APIs
    A Brief Introduction to APIs
    HTTP Methods and APIs
    More About API Responses
    Parsing JSON
    Undocumented APIs
    Finding Undocumented APIs
    Documenting Undocumented APIs
    Finding and Documenting APIs Automatically
    Combining APIs with Other Data Sources
    More About APIs
    13. Image Processing and Text Recognition
    Overview of Libraries
    Pillow
    Tesseract
    NumPy
    Processing Well-Formatted Text
    Adjusting Images Automatically
    Scraping Text from Images on Websites
    Reading CAPTCHAs and Training Tesseract
    Training Tesseract
    Retrieving CAPTCHAs and Submitting Solutions
    14. Avoiding Scraping Traps
    A Note on Ethics
    Looking Like a Human
    Adjust Your Headers
    Handling Cookies with JavaScript
    Timing Is Everything
    Common Form Security Features
    Hidden Input Field Values
    Avoiding Honeypots
    The Human Checklist
    15. Testing Your Website with Scrapers
    An Introduction to Testing
    What Are Unit Tests?
    Python unittest
    Testing Wikipedia
    Testing with Selenium
    Interacting with the Site
    unittest or Selenium?
    16. Web Crawling in Parallel
    Processes versus Threads
    Multithreaded Crawling
    Race Conditions and Queues
    The threading Module
    Multiprocess Crawling
    Multiprocess Crawling
    Communicating Between Processes
    Multiprocess Crawling: Another Approach
    17. Scraping Remotely
    Why Use Remote Servers?
    Avoiding IP Address Blocking
    Portability and Extensibility
    Tor
    PySocks
    Remote Hosting
    Running from a Website-Hosting Account
    Running from the Cloud
    Additional Resources
    18. The Legalities and Ethics of Web Scraping
    Trademarks, Copyrights, Patents, Oh My!
    Copyright Law
    Trespass to Chattels
    The Computer Fraud and Abuse Act
    robots.txt and Terms of Service
    Three Web Scrapers
    eBay versus Bidder's Edge and Trespass to Chattels
    United States v. Auernheimer and The Computer Fraud and Abuse Act
    Field v. Google: Copyright and robots.txt
    Moving Forward
    Index
