Data science
Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from large volumes of data.
Web scraping can be a solution to speed up the data collection process.
Instead of checking a job site every day, you can use Python to automate the repetitive parts of your job search.
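As a rough sketch of what that automation could look like, the snippet below parses a job-listings page with BeautifulSoup and pulls out the posting titles. The HTML here is an inline sample, and the "job-title" class is hypothetical; a real job site will have its own markup, so inspect the page first.

```python
from bs4 import BeautifulSoup

# inline sample standing in for a downloaded job-listings page
# (the "job-title" class is made up for illustration)
html = """
<html><body>
  <div class="job"><h2 class="job-title">Data Analyst</h2></div>
  <div class="job"><h2 class="job-title">Python Developer</h2></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
titles = [h2.get_text() for h2 in soup.find_all("h2", class_="job-title")]
print(titles)  # → ['Data Analyst', 'Python Developer']
```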
1. Web scraping example using Python [extract the title, head, and h1 text from a website]
from bs4 import BeautifulSoup
import requests

url = "https://flipkart.com"
req = requests.get(url)
# "lxml" is a fast HTML/XML parser; the built-in "html.parser" also works
soup = BeautifulSoup(req.text, "lxml")
print(soup.title)
print(soup.head)
print(soup.h1)
# save the h1 tag to a local text file
with open('test.txt', 'w') as file1:
    file1.write(str(soup.h1))
Output of the above program is:

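Beyond the title, head, and h1, the same soup object can collect every link on the page with find_all. A minimal sketch on an inline HTML string, so it runs without a network connection:

```python
from bs4 import BeautifulSoup

# inline HTML so the example does not depend on a live site
html = '<html><body><a href="/a">A</a><a href="/b">B</a></body></html>'
soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag; the href attribute holds the URL
links = [a["href"] for a in soup.find_all("a")]
print(links)  # → ['/a', '/b']
```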
2. Web scraping example [check whether a site allows the request: a status code other than 200 (e.g. 403 Forbidden) means the site is blocking it and should not be scraped]
import requests
from bs4 import BeautifulSoup
r=requests.get("https://www.amazon.in")
print(r.status_code)
output:
200
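A 200 status only means the request succeeded; a site's actual scraping policy lives in its robots.txt file. A small sketch using the standard library's urllib.robotparser on an inline rule set (the Disallow path here is made up for illustration, not taken from any real site):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse an inline robots.txt instead of fetching one over the network
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# can_fetch checks a URL's path against the parsed rules
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/public/page"))   # True
```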
3. Retrieve live COVID-19 data using web scraping and store it locally in .csv format
import bs4
import pandas as pd
import requests

url = 'https://www.worldometers.info/coronavirus/country/india/'
result = requests.get(url)
soup = bs4.BeautifulSoup(result.text, 'lxml')

# the live counters are rendered in divs with class "maincounter-number"
cases = soup.find_all('div', class_='maincounter-number')

# collect the text inside each counter's <span>
data = []
for i in cases:
    span = i.find('span')
    data.append(span.string)

print('Cases', '\tDeaths', '\tRecovered')
print(data)

# build a DataFrame to label the values
df = pd.DataFrame({"CoronaData": data})
# label the rows
df.index = ['TotalCases', 'Deaths', 'Recovered']
# store as a local CSV file
df.to_csv('Corona_Data.csv')
Output of the above code is:
Cases	Deaths	Recovered
['9,857,380', '143,055', '9,357,464']
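To confirm the CSV round-trips correctly, the DataFrame can be read back with pandas. The sketch below uses an in-memory buffer and the same sample numbers so it runs without repeating the scrape:

```python
import io
import pandas as pd

# same shape as the scraped data, hard-coded so no network is needed
df = pd.DataFrame({"CoronaData": ['9,857,380', '143,055', '9,357,464']},
                  index=['TotalCases', 'Deaths', 'Recovered'])

buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)

# index_col=0 restores the row labels written as the first CSV column
df2 = pd.read_csv(buf, index_col=0)
print(df2.loc['Deaths', 'CoronaData'])  # → 143,055
```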



