Python Selenium: waiting for elements to appear (waiting for several elements to load)

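This article presents a method for traversing an HTML list with a special structure. It handles elements for which you cannot tell in advance whether a sub-list exists, and it can resume parsing from a remembered position. Using CSS selectors and WebDriver, it shows in detail how to walk the list and handle each element type.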


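Taking Mr.E.'s and Arran's comments into account, I made the list traversal rely entirely on CSS selectors. The tricky parts were my own list structure and markup (classes that change, etc.), and creating the required selectors dynamically and keeping them in memory during the traversal.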
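One way to picture "selectors kept in memory" is a stack of string fragments. Each step of the traversal pushes its part (' > li:nth-child(n)', ' > ul', ' > a'), and joining the stack gives the current CSS selector.

The sketch below is illustrative only. It makes these assumptions:

* The root list is matched by 'ul.ltr', as in the author's code.
* `SelectorStack` and `push` are hypothetical names, not part of Selenium or the original answer.
* It uses a context manager so that every push is paired with a pop. This takes the place of the manual `del selector[-1]` bookkeeping.

```python
from contextlib import contextmanager


class SelectorStack:
    """Hypothetical helper: keeps CSS selector fragments in memory during a traversal."""

    def __init__(self, root):
        self.parts = [root]  # e.g. 'ul.ltr' for the top-level list

    def css(self):
        # The full selector is simply the concatenation of all fragments.
        return ''.join(self.parts)

    @contextmanager
    def push(self, fragment):
        # Add a fragment for the duration of the with-block, then remove it,
        # so the stack is restored even if the block raises.
        self.parts.append(fragment)
        try:
            yield self.css()
        finally:
            self.parts.pop()


stack = SelectorStack('ul.ltr')
with stack.push(' > li:nth-child(2)'):
    with stack.push(' > ul'):
        with stack.push(' > li:nth-child(1)'):
            with stack.push(' > a') as link_css:
                print(link_css)  # ul.ltr > li:nth-child(2) > ul > li:nth-child(1) > a
print(stack.css())  # ul.ltr
```

In a real traversal, `link_css` would be passed to `driver.find_element(By.CSS_SELECTOR, link_css)`.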

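I handled waiting for several elements by searching for anything that is not in the loading state. You can also use the ":nth-child" selector, like this:

```python
# in a for loop with enumerate, for index i
selector.append(' > li:nth-child(%i)' % (i + 1))  # identify child <li> by its order position (:nth-child is 1-based)
```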
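The waiting approach above depends on how explicit waits behave. `WebDriverWait(...).until(condition)` calls the condition repeatedly until it returns a truthy value or the timeout expires. `presence_of_all_elements_located` gives back the list of matching elements, and that list stays empty (falsy) while only the loading placeholder exists.

The browser-free sketch below models that loop. Note that:

* `wait_until`, `snapshots` and `loaded_items` are hypothetical stand-ins, not Selenium's implementation.
* The list snapshots are made-up data.

```python
import time


def wait_until(condition, timeout, poll=0.5):
    """Hypothetical stand-in for WebDriverWait(driver, timeout, poll).until(condition)."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # like until(), hand back the truthy value itself
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %s s' % timeout)
        time.sleep(poll)


# Made-up list states: first only the loading placeholder, then real items.
snapshots = [
    [{'class': 'loading'}],
    [{'class': 'loading'}],
    [{'class': 'leaf', 'text': 'Page 1'}, {'class': 'closed', 'text': 'Category A'}],
]
calls = {'n': 0}


def loaded_items():
    # Each call sees the next snapshot; the last one repeats once loading is done.
    page = snapshots[min(calls['n'], len(snapshots) - 1)]
    calls['n'] += 1
    # Same idea as the CSS selector 'li > a:not([class=loading])':
    # keep only non-placeholder items; an empty list means "keep waiting".
    return [item for item in page if item['class'] != 'loading']


items = wait_until(loaded_items, timeout=2, poll=0.01)
print([item['text'] for item in items])  # ['Page 1', 'Category A']
```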

Here is my heavily commented solution:

```python
from selenium.common.exceptions import InvalidElementStateException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# WAIT_LONG_TIME (seconds), double_click(), process_category_case() and process_page_case()
# come from the rest of the author's project and are not shown here.


def parse_crippled_shifted_list(driver, frame, selector, level=1, parent_id=0, path=None):
    """
    Traversal of an HTML list with a special structure (you can't know whether an element has
    a sub-list unless you enter it).
    Supports starting from a remembered list element.

    Nested lists have the classes "closed" and "last closed" when closed, and "open" and
    "last open" when opened (on <li>).
    Elements themselves have the classes "leaf" and "last leaf" in both cases.
    Nested lists sit in the <li> element as a <ul> list. Each <ul> appears after clicking
    the <a> in each <li>.
    If you click

    driver - WebDriver; frame - frame of the list; selector - selector to the current list (<ul>);
    level - depth level, used only for console output formatting; parent_id - id of the parent
    category (in the DB);
    path - remaining path in categories (ORM objects) to the target category to start with.
    """
    # Add current level list elements.
    # This selects everything except the "loading" placeholder, which is exactly what must be excluded.
    selector.append(' > li > a:not([class=loading])')

    # Wait for the child list to load
    try:
        WebDriverWait(driver, WAIT_LONG_TIME).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))
    except TimeoutException:
        print("%s timed out" % ''.join(selector))
    else:
        # The list is loaded
        del selector[-1]  # selector correction: drop the last part, which only served to detect loaded content
        selector.append(' > li')
        children = driver.find_elements(By.CSS_SELECTOR, ''.join(selector))  # fetch list elements

        # Walk the whole list
        for i, child in enumerate(children):
            del selector[-1]  # delete the non-unique li selector (or the previous child's last fragment)
            # If a previous child's fragment is still there, drop it too
            # ('ul.ltr' is the root list selector in the author's page)
            if selector[-1] != ' > ul' and selector[-1] != 'ul.ltr':
                del selector[-1]
            selector.append(' > li:nth-child(%i)' % (i + 1))  # identify child <li> by its order position
            selector.append(' > a')  # add 'li > a' reference to click
            child_link = driver.find_element(By.CSS_SELECTOR, ''.join(selector))

            # If we parse freely further (no need to start from a remembered position)
            if not path:
                # Open child
                try:
                    double_click(driver, child_link)
                except InvalidElementStateException as e:
                    print("\n\nERROR\n", e.msg, '\n\n')
                else:
                    # Determine its type
                    del selector[-1]  # delete the changed and now useless link reference
                    # If the <li> is a category, it now has a <ul> child and class="open".
                    # Checking the class takes priority, because the <li> exists for sure.
                    current_li = driver.find_element(By.CSS_SELECTOR, ''.join(selector))
                    li_class = current_li.get_attribute('class')

                    # Category case - BRANCH
                    if li_class in ('open', 'last open'):
                        new_parent_id = process_category_case(child_link, parent_id, level)  # add category to DB
                        selector.append(' > ul')  # forward to the nested list
                        # Wait for the nested list to load
                        try:
                            WebDriverWait(driver, WAIT_LONG_TIME).until(
                                EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))
                        except TimeoutException:
                            print("\t" * level, "%s timed out (%i secs). Failed to load nested list."
                                  % (''.join(selector), WAIT_LONG_TIME))
                        else:
                            # Parse the nested list
                            parse_crippled_shifted_list(driver, frame, selector, level + 1, new_parent_id)
                    # Page case - LEAF
                    elif li_class in ('leaf', 'last leaf'):
                        process_page_case(driver, child_link, level)
                    else:
                        raise Exception('Damn! Alien class: %s' % li_class)

            # If it's required to continue from the specified category
            else:
                # Check whether this is the required category
                if child_link.text == path[0].name:
                    # Open the required category
                    try:
                        double_click(driver, child_link)
                    except InvalidElementStateException as e:
                        print("\n\nERROR\n", e.msg, '\n\n')
                    else:
                        # This list element must always be a category (have a nested list)
                        del selector[-1]  # delete the changed and now useless link reference
                        # If the <li> is a category, it now has a <ul> child and class="open".
                        # Checking the class takes priority, because the <li> exists for sure.
                        current_li = driver.find_element(By.CSS_SELECTOR, ''.join(selector))
                        li_class = current_li.get_attribute('class')

                        # Category case - BRANCH
                        if li_class in ('open', 'last open'):
                            selector.append(' > ul')  # forward to the nested list
                            # Wait for the nested list to load
                            try:
                                WebDriverWait(driver, WAIT_LONG_TIME).until(
                                    EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))
                            except TimeoutException:
                                print("\t" * level, "%s timed out (%i secs). Failed to load nested list."
                                      % (''.join(selector), WAIT_LONG_TIME))
                            # Process this nested list
                            else:
                                last = path.pop(0)  # path is shared with the callers, so they see the pop
                                if len(path) > 0:  # more to parse
                                    print("\t" * level, "Going deeper to: %s" % ''.join(selector))
                                    parse_crippled_shifted_list(driver, frame, selector, level + 1,
                                                                parent_id=last.id, path=path)
                                else:  # the current one is the required category
                                    print("\t" * level, "Returning target category: ", ''.join(selector))
                                    path = None  # parse the remaining siblings freely
                                    parse_crippled_shifted_list(driver, frame, selector, level + 1,
                                                                last.id, path=None)
                        # Page case - LEAF
                        elif li_class in ('leaf', 'last leaf'):
                            pass
                else:
                    print("dummy")

    # Drop this level's trailing fragments; the caller's loop trims any leftover part
    del selector[-2:]
```
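The resume-from-a-remembered-element logic (the `path` branch) is easier to follow without a browser. The sketch below is a hypothetical model, not the author's code:

* The tree is made of nested `(name, children)` tuples, where `children` is `None` for a leaf page.
* `path` is a list of names rather than ORM objects.

Like the original, it pops from the shared `path` list as it descends. Once the target is reached, every following sibling, at every level, is therefore parsed freely.

```python
def walk(nodes, path=None, depth=0, out=None):
    """Depth-first walk; with `path`, skip ahead to the remembered category first.

    nodes: list of (name, children) tuples, children is None for a leaf page.
    path:  list of category names leading to the start position; it is consumed
           (mutated) while descending, like the shared list in the original.
    """
    if out is None:
        out = []
    for name, children in nodes:
        if path:  # still seeking the remembered position
            if name != path[0]:
                continue  # a sibling before the target: skip it
            path.pop(0)  # consume one level of the path
            if children is not None:
                walk(children, path, depth + 1, out)  # same list object is shared
            continue  # the target category itself is already known, so it is not re-recorded
        out.append((depth, name))  # free parsing: record category or page
        if children is not None:
            walk(children, None, depth + 1, out)
    return out


tree = [
    ('A', [('A1', None), ('A2', [('A2a', None)])]),
    ('B', [('B1', None), ('B2', None)]),
    ('C', None),
]
print(walk(tree, path=['A', 'A2']))
# [(2, 'A2a'), (0, 'B'), (1, 'B1'), (1, 'B2'), (0, 'C')]
```

Passing the same list object down the recursion is what lets the parent levels notice that the target has been reached: their `path` becomes empty, so their remaining siblings fall through to free parsing.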

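The function compares the whole class attribute with fixed strings such as 'open' or 'last open'. A slightly more tolerant alternative, sketched below, splits the attribute into tokens, so 'open' and 'last open' pass the same test. This is not what the author used, and `li_kind` is a hypothetical helper name. The class names come from the docstring above.

```python
def li_kind(class_attr):
    """Map an <li> class attribute to 'branch', 'leaf' or 'closed' (hypothetical helper).

    Expected values, per the docstring: "open"/"last open",
    "closed"/"last closed", "leaf"/"last leaf".
    """
    tokens = set((class_attr or '').split())  # treat a missing attribute (None) like ''
    if 'open' in tokens:
        return 'branch'  # opened category with a nested <ul>
    if 'leaf' in tokens:
        return 'leaf'  # page
    if 'closed' in tokens:
        return 'closed'  # category not opened yet
    raise ValueError('Alien class: %r' % class_attr)


for attr in ['open', 'last open', 'leaf', 'last leaf', 'last closed']:
    print(attr, '->', li_kind(attr))
```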