#!/usr/bin/env python

from bs4 import BeautifulSoup


import requests
import shutil
import sys
import argparse

def get_arguments():
    """Parse the command-line arguments for the downloader.

    Returns:
        argparse.Namespace with:
            doc (str): URL of the Scribd document to download.
            images (bool): True when the document is made up of images.
    """
    # the pasted original was corrupted here ("ArgumeasdasasntParser" and a
    # stray "asdasd" token) -- restored to valid argparse usage
    parser = argparse.ArgumentParser(
        description='A Scribd-Downloader that actually works')

    parser.add_argument(
        'doc',
        metavar='DOC',
        type=str,
        help='scribd document to download')
    parser.add_argument(
        '-i',
        '--images',
        help="download document made up of images",
        action='store_true',
        default=False)

    return parser.parse_args()

# fix encoding issues in python2


def fix_encoding(query):
    """Return *query* as-is on Python 3; UTF-8 encode it on Python 2."""
    on_py3 = sys.version_info > (3, 0)
    return query if on_py3 else query.encode('utf-8')

def save_image(jsonp, imagename):
    """Download one page image of an image-based document.

    Scribd's per-page .jsonp URL maps to the page image by swapping
    '/pages/' -> '/images/' and the 'jsonp' extension -> 'jpg'.

    Args:
        jsonp: URL of the page's .jsonp resource.
        imagename: local filename the .jpg is written to.

    Raises:
        requests.HTTPError: if the image request fails.
    """
    image_url = jsonp.replace('/pages/', '/images/').replace('jsonp', 'jpg')
    response = requests.get(image_url, stream=True)
    # fail loudly on 4xx/5xx instead of silently saving an error page as .jpg
    response.raise_for_status()
    with open(imagename, 'wb') as out_file:
        shutil.copyfileobj(response.raw, out_file)

def save_text(jsonp, filename):
    """Fetch one page of a text document and append its text to *filename*.

    Scribd serves each page as a JSONP payload of the form
    window.pageN_callback(["<html ...>"]); this strips that wrapper,
    removes escape sequences, and extracts the visible text spans.

    Args:
        jsonp: URL of the page's .jsonp resource.
        filename: text file the page content is appended to.
    """
    response = requests.get(url=jsonp).text

    # The page number sits between 'window.page' and '_callback'.
    # The original sliced a single character (response[11:12]), which
    # broke the wrapper-stripping below for pages >= 10.
    page_no = response[len('window.page'):response.find('_callback')]

    response_head = response.replace(
        'window.page' + page_no + '_callback(["',
        '').replace('\\n', '').replace('\\', '').replace('"]);', '')
    soup_content = BeautifulSoup(response_head, 'html.parser')

    # open once instead of re-opening the file for every span
    with open(filename, 'a') as feed:
        for span in soup_content.find_all('span', {'class': 'a'}):
            text = fix_encoding(span.get_text())
            print(text)
            feed.write(text + '\n')

# detect image and text


def save_content(jsonp, images, train, title):
    """Route one page URL to the image or text saver.

    Args:
        jsonp: page URL ('' means nothing to save).
        images: True for image-based documents.
        train: running page counter used to name image files.
        title: sanitized document title.

    Returns:
        The updated page counter (unchanged when jsonp is empty).
    """
    if jsonp == '':
        return train

    if images:
        imagename = title + '_' + str(train) + '.jpg'
        print('Downloading image to ' + imagename)
        save_image(jsonp, imagename)
    else:
        save_text(jsonp, title + '.txt')

    return train + 1

def sanitize_title(title):
    """Replace characters that are illegal in (Windows) filenames with '_'.

    Spaces are also replaced to preserve the tool's historical naming.

    Args:
        title: raw document title.

    Returns:
        The title with every forbidden character mapped to '_'.
    """
    # '?' is also forbidden on Windows and was missing from the original
    # set; the raw set below also avoids the invalid '\<' escape sequence
    # the original string relied on.
    forbidden_chars = ' *"/\\<>:|?'
    # one C-level pass instead of a chain of .replace() calls
    table = str.maketrans(forbidden_chars, '_' * len(forbidden_chars))
    return title.translate(table)

# the main function


def get_scribd_document(url, images):
    """Download a Scribd document, page by page.

    Fetches the document's HTML, scans its inline JavaScript for the
    per-page .jsonp URLs, and hands each one to save_content().

    Args:
        url: URL of the Scribd document page.
        images: True when the document is image-based.
    """
    html = requests.get(url=url).text
    soup = BeautifulSoup(html, 'html.parser')

    # a bit more thorough than the old .replace(' ', '_')
    title = sanitize_title(soup.find('title').get_text())

    if not images:
        print('Extracting text to ' + title + '.txt\n')
    print(title + '\n')

    train = 1
    for script in soup.find_all('script', type='text/javascript'):
        for fragment in script:
            start = fragment.find('https://')
            if start == -1:
                continue
            # len('.jsonp') == 6; when '.jsonp' is absent find() yields -1
            # and the slice end becomes 5, matching the original behavior
            end = fragment.find('.jsonp') + 6
            train = save_content(fragment[start:end], images, train, title)

def command_line():
    """Entry point: parse the CLI arguments and run the downloader."""
    args = get_arguments()
    get_scribd_document(args.doc, args.images)

if __name__ == '__main__':
    command_line()

# (Scribd page artifact removed)