What is a robots.txt file?

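A robots.txt file is a web-standard file that tells search engine crawlers (the bots that collect information from websites) which pages or files they may or may not access.
The file must be located at the root of the host, for example http://www.example.com/robots.txt. (In practice it usually goes at the root of the 'public' folder.)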
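Because crawlers look for the file only at the root of the host, the robots.txt location for any page can be derived from the page URL alone. The sketch below uses the standard URL class (available in Node.js and browsers); the function name robotsTxtUrl is hypothetical and used only for illustration.

```javascript
// Hypothetical helper: given any page URL, return where a crawler
// looks for robots.txt (the root of the same protocol, host, and port).
function robotsTxtUrl(pageUrl) {
  // Resolving an absolute path against the page URL drops the page's
  // path, query string, and fragment but keeps protocol, host, and port.
  return new URL('/robots.txt', pageUrl).href
}

console.log(robotsTxtUrl('https://www.example.com/blog/post?id=1#top'))
// → https://www.example.com/robots.txt
console.log(robotsTxtUrl('http://example.com:8080/accounts/settings'))
// → http://example.com:8080/robots.txt
```

This also means a robots.txt file on www.example.com does not apply to a different subdomain such as shop.example.com; each host needs its own file.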

Example robots.txt file

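# public/robots.txt

# Block all crawlers from the /accounts path
User-agent: *
Disallow: /accounts

# Allow all crawlers everywhere (this is the default, so in practice it does not need to be stated)
User-agent: *
Allow: /

# Sitemap (must be a fully qualified URL)
Sitemap: http://www.example.com/sitemap.xml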
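To see how the Disallow and Allow lines above interact, here is a simplified sketch of how a crawler might decide whether a path is allowed. Google's robots.txt documentation and RFC 9309 describe the rule: the most specific (longest) matching rule wins, and Allow wins a tie. This sketch is a deliberate simplification: it treats every rule as a plain path prefix, ignores the * and $ wildcards, and assumes the rules for the crawler's user agent have already been selected. The function isAllowed and the rule format are hypothetical.

```javascript
// Hypothetical, simplified robots.txt rule check.
// rules: [{ type: 'allow' | 'disallow', path: '/some/prefix' }]
// Assumes the rules for this crawler's user agent are already selected,
// and treats each path as a plain prefix (no * or $ wildcards).
function isAllowed(rules, path) {
  let best = null
  for (const rule of rules) {
    // An empty path (e.g. "Disallow:") matches nothing
    if (rule.path === '' || !path.startsWith(rule.path)) continue
    const longer = best === null || rule.path.length > best.path.length
    // On equal length, prefer allow (the least restrictive rule wins a tie)
    const tieToAllow =
      best !== null &&
      rule.path.length === best.path.length &&
      rule.type === 'allow'
    if (longer || tieToAllow) best = rule
  }
  // No matching rule means the path is allowed
  return best === null || best.type === 'allow'
}

// Both "User-agent: *" groups from the example, merged into one rule list
const rules = [
  { type: 'disallow', path: '/accounts' },
  { type: 'allow', path: '/' },
]

console.log(isAllowed(rules, '/accounts/settings')) // → false
console.log(isAllowed(rules, '/blog/hello')) // → true
```

Because matching is by prefix, Disallow: /accounts also blocks paths such as /accounts-old.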

Sitemap

A sitemap lists the URLs used on a website. When new content is published, it helps Google detect the update more easily and crawl it sooner.

1. If the site is small and static, like a blog, it is a good idea to create a 'sitemap.xml' file in the public directory.

<?xml version="1.0" encoding="UTF-8"?>
<!-- public/sitemap.xml (the XML declaration above must be the first line of the file) -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/foo</loc>
    <lastmod>2021-06-01</lastmod>
  </url>
</urlset>
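For a static site the file can be written by hand, but it can also be generated from a list of pages, for example in a small build script. The sketch below builds the same structure as the file above; the function names and the entry format { loc, lastmod } are hypothetical. It escapes the five XML special characters, since the sitemap protocol requires URLs to be entity-escaped (for example, & becomes &amp;).

```javascript
// Hypothetical build-time helper for a static site.
// Escape the characters that are not allowed raw in XML text.
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;') // runs first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

// entries: [{ loc: 'https://...', lastmod: 'YYYY-MM-DD' (optional) }]
function buildSitemap(entries) {
  const urls = entries
    .map(({ loc, lastmod }) => {
      const lastmodTag = lastmod ? `<lastmod>${escapeXml(lastmod)}</lastmod>` : ''
      return `  <url><loc>${escapeXml(loc)}</loc>${lastmodTag}</url>`
    })
    .join('\n')
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    urls +
    '\n</urlset>\n'
  )
}

console.log(
  buildSitemap([
    { loc: 'http://www.example.com/foo', lastmod: '2021-06-01' },
    { loc: 'http://www.example.com/search?q=a&page=2' },
  ])
)
```

In a Node.js build script, the returned string could then be saved with fs.writeFileSync('public/sitemap.xml', xml).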

2. When using getServerSideProps

(The XML sitemap is generated on each request.)

// pages/sitemap.xml.js
// Example data source (JSONPlaceholder); replace it with your own API
const EXTERNAL_DATA_URL = 'https://jsonplaceholder.typicode.com/posts'

// Build the sitemap XML string from the list of posts
function generateSiteMap(posts) {
  return `<?xml version="1.0" encoding="UTF-8"?>
   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <!-- We manually set the two URLs we know already -->
     <url>
       <loc>https://jsonplaceholder.typicode.com</loc>
     </url>
     <url>
       <loc>https://jsonplaceholder.typicode.com/guide</loc>
     </url>
     ${posts
       .map(({ id }) => {
         // One <url> entry per post
         return `
       <url>
           <loc>${EXTERNAL_DATA_URL}/${id}</loc>
       </url>
     `
       })
       .join('')}
   </urlset>
 `
}

// The page renders nothing; getServerSideProps writes the response itself
function SiteMap() {
  return null
}

export async function getServerSideProps({ res }) {
  // Call the API to collect the URLs for the site
  const request = await fetch(EXTERNAL_DATA_URL)
  const posts = await request.json()

  // Generate the XML sitemap from the posts data
  const sitemap = generateSiteMap(posts)

  res.setHeader('Content-Type', 'text/xml')
  // Send the XML to the browser and end the response
  res.write(sitemap)
  res.end()

  // getServerSideProps must still return an object with props
  return {
    props: {}
  }
}

export default SiteMap
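The same getServerSideProps approach can serve robots.txt itself, which can be handy when the sitemap URL depends on the deployment. This is a sketch under assumptions: the helper names generateRobotsTxt and sendRobotsTxt and the config shape are hypothetical, and they would be called from a hypothetical pages/robots.txt.js set up like the sitemap page. If you do this, remove any static public/robots.txt, since Next.js reports an error when a public file and a page share the same path.

```javascript
// Hypothetical helpers for serving robots.txt dynamically.
// Build the robots.txt body from a small config object.
function generateRobotsTxt({ disallow = [], sitemapUrl }) {
  const lines = ['User-agent: *']
  // One Disallow line per blocked path
  for (const path of disallow) lines.push(`Disallow: ${path}`)
  // With nothing blocked, state the default explicitly
  if (disallow.length === 0) lines.push('Allow: /')
  // The Sitemap line must be a fully qualified URL
  if (sitemapUrl) lines.push('', `Sitemap: ${sitemapUrl}`)
  return lines.join('\n') + '\n'
}

// Write the file to a Node.js response object (setHeader/write/end),
// the same way the sitemap page writes its XML.
function sendRobotsTxt(res, config) {
  res.setHeader('Content-Type', 'text/plain')
  res.write(generateRobotsTxt(config))
  res.end()
}

// Inside getServerSideProps({ res }) of the hypothetical page:
//   sendRobotsTxt(res, { disallow: ['/accounts'], sitemapUrl: 'https://www.example.com/sitemap.xml' })
//   return { props: {} }
console.log(
  generateRobotsTxt({
    disallow: ['/accounts'],
    sitemapUrl: 'https://www.example.com/sitemap.xml',
  })
)
```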

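When you need a sitemap and when you don't

When you need one
- The site is very large
  - Crawlers may overlook new or recently updated pages; a sitemap prompts them to check those pages again.
- The site is new or has few external links
  - Crawlers discover pages by following links from other sites, so the more external links a site has, the more visible it is. A site with few of them benefits from a sitemap as another entry point.
- The site has a lot of media content
  - Crawlers can pick up additional information about the media content from the sitemap.
When you don't need one
- The site is small
  - With fewer than about 500 pages, normal crawling is usually enough.
- The site has little media or news content
  - Sitemaps help crawlers collect information about videos, images, and news articles, so the less of this content a site has, the less a sitemap is needed.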
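For very large sites, note that the sitemap protocol limits a single sitemap file to 50,000 URLs (and 50 MB uncompressed). Larger sites split their URLs across several sitemaps and list them in a sitemap index file. A minimal sketch, assuming the child files are served at the hypothetical paths /sitemap-1.xml, /sitemap-2.xml, and so on:

```javascript
// Maximum URLs per sitemap file under the sitemaps.org protocol
const MAX_URLS_PER_SITEMAP = 50000

// Split a list into chunks of at most `size` items
function chunk(items, size) {
  const chunks = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Build a sitemap index that points at one child sitemap per chunk.
// The /sitemap-N.xml naming is a hypothetical convention.
function buildSitemapIndex(siteUrl, urls, size = MAX_URLS_PER_SITEMAP) {
  const entries = chunk(urls, size)
    .map((_, i) => `  <sitemap><loc>${siteUrl}/sitemap-${i + 1}.xml</loc></sitemap>`)
    .join('\n')
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries +
    '\n</sitemapindex>\n'
  )
}

// 120,000 URLs are split into 3 child sitemaps
const urls = Array.from({ length: 120000 }, (_, i) => `https://www.example.com/posts/${i}`)
console.log(chunk(urls, MAX_URLS_PER_SITEMAP).map((c) => c.length)) // → [ 50000, 50000, 20000 ]
console.log(buildSitemapIndex('https://www.example.com', urls))
```

Each child sitemap would then contain one chunk of URLs in the ordinary urlset format, and only the index URL needs to appear in robots.txt.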


Developer 폴우정킴

Robots.tsx
